Processed Emails Feature Specification
Overview
This document outlines the specification for implementing a feature to persistently track which emails have been processed by the Email Organizer system. The goal is to maintain a record of email processing status to avoid reprocessing the same emails during synchronization and provide accurate pending email counts.
Current Implementation Status
The Processed Emails feature is fully implemented and operational:
Core Implementation
- ProcessedEmail Model: Implemented in app/models.py
- ProcessedEmails Service: Implemented in app/processed_emails_service.py
- Emails Blueprint: Implemented in app/routes/emails.py
- UI Integration: Pending emails dialog and processing functionality
Key Features
- Email UID tracking for processing status
- Pending email counts and management
- Bulk email processing operations
- Email metadata display and management
- Integration with IMAP synchronization process
Requirements
1. Email Tracking Requirements
- Unique Email Identification: Track emails using a unique identifier (UID) provided by the IMAP server, along with the folder name and user ID
- Processing Status: Mark emails as either "pending" (unprocessed) or "processed"
- Minimal Data Storage: Store only essential information - email UID, folder, user, and processing status - not email content, subjects, or bodies
- Persistence: Maintain processing status across application restarts and synchronization cycles
- Efficient Lookup: Quickly determine which emails in a folder are pending processing
2. Synchronization Requirements
- Initial Sync: During first synchronization of a folder, all emails should be marked as "pending"
- Incremental Sync: On subsequent syncs, only emails that haven't been processed should be identified as pending
- Status Update: When an email is processed, update its status from "pending" to "processed"
- Cleanup: Remove records for emails that no longer exist on the IMAP server
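The four synchronization rules above reduce to a set comparison between the UIDs the IMAP server reports and the UIDs already tracked locally. The following is a minimal sketch of that comparison, assuming UIDs are handled as strings; SyncPlan and plan_sync are hypothetical names used only for illustration and are not part of the application.

```python
from typing import Iterable, NamedTuple, Set


class SyncPlan(NamedTuple):
    to_add_as_pending: Set[str]  # UIDs on the server with no local record yet
    to_remove: Set[str]          # local records whose email is gone from the server
    unchanged: Set[str]          # UIDs already tracked; their status is preserved


def plan_sync(server_uids: Iterable[str], tracked_uids: Iterable[str]) -> SyncPlan:
    """Compare the UIDs reported by IMAP with the UIDs already tracked locally."""
    server = set(server_uids)
    tracked = set(tracked_uids)
    return SyncPlan(
        to_add_as_pending=server - tracked,
        to_remove=tracked - server,
        unchanged=server & tracked,
    )


# Initial sync: nothing is tracked yet, so every UID becomes pending.
first = plan_sync(["101", "102", "103"], [])

# Incremental sync: 101 was deleted on the server and 104 arrived.
later = plan_sync(["102", "103", "104"], ["101", "102", "103"])
```

Because already-tracked UIDs land in the unchanged set, an email that was marked as processed keeps that status on every later sync.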
3. Performance Requirements
- Efficient Storage: Use appropriate database indexing for fast lookups
- Minimal Memory Usage: Store only essential data to keep memory footprint low
- Batch Processing: Support batch operations for processing multiple emails efficiently
Data Model Design
ProcessedEmails Table
erDiagram
USER {
int id PK "Primary Key"
string email "Unique, Not Null"
string first_name "Not Null"
string last_name "Not Null"
string password_hash "Not Null"
json imap_config "JSON Configuration"
datetime created_at "Default: UTC Now"
datetime updated_at "Default: UTC Now, On Update"
}
FOLDER {
int id PK "Primary Key"
int user_id FK "Foreign Key to User"
string name "Not Null"
text rule_text "Natural Language Rule"
int priority "Processing Order"
boolean organize_enabled "Default: True"
int total_count "Default: 0"
int pending_count "Default: 0"
json recent_emails "JSON Array"
datetime created_at "Default: UTC Now"
datetime updated_at "Default: UTC Now, On Update"
}
PROCESSED_EMAIL {
int id PK "Primary Key"
int user_id FK "Foreign Key to User"
int folder_id FK "Foreign Key to Folder"
string email_uid "Not Null, IMAP Email UID"
string folder_name "Not Null, IMAP Folder Name"
boolean is_processed "Default: False, Processing Status"
datetime first_seen_at "Default: UTC Now, First seen during sync"
datetime processed_at "Nullable, When email was processed"
datetime created_at "Default: UTC Now"
datetime updated_at "Default: UTC Now, On Update"
}
USER ||--o{ FOLDER : "has"
USER ||--o{ PROCESSED_EMAIL : "has"
FOLDER ||--o{ PROCESSED_EMAIL : "has"
Column Specifications
| Table | Column | Data Type | Constraints | Description |
|---|---|---|---|---|
| PROCESSED_EMAIL | id | Integer | Primary Key, Autoincrement | Unique identifier for each processed email record |
| PROCESSED_EMAIL | user_id | Integer | Foreign Key to User, Not Null | Reference to the user who owns this email |
| PROCESSED_EMAIL | folder_id | Integer | Foreign Key to Folder, Not Null | Reference to the folder this email belongs to |
| PROCESSED_EMAIL | email_uid | String(255) | Not Null | Unique ID of the email from IMAP server |
| PROCESSED_EMAIL | folder_name | String(255) | Not Null | Name of the IMAP folder (for redundancy) |
| PROCESSED_EMAIL | is_processed | Boolean | Default: False | Processing status (false=pending, true=processed) |
| PROCESSED_EMAIL | first_seen_at | DateTime | Default: datetime.utcnow | First time this email was detected during sync |
| PROCESSED_EMAIL | processed_at | DateTime | Nullable | When the email was marked as processed |
| PROCESSED_EMAIL | created_at | DateTime | Default: datetime.utcnow | Record creation timestamp |
| PROCESSED_EMAIL | updated_at | DateTime | Default: datetime.utcnow, On Update | Record update timestamp |
Relationships
- User to ProcessedEmail: One-to-many relationship - each user can have multiple processed email records
- Folder to ProcessedEmail: One-to-many relationship - each folder can have multiple processed email records
- Composite Key: The combination of (user_id, folder_name, email_uid) should be unique to prevent duplicate records
Database Indexes
- Primary key index on id
- Foreign key indexes on user_id and folder_id
- Composite unique index on (user_id, folder_name, email_uid)
- Index on folder_name for faster folder-based queries
- Index on is_processed for filtering pending emails
- Index on first_seen_at for tracking recently added emails
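As an illustration of the columns and indexes listed above, the following sketch builds an equivalent table in an in-memory SQLite database using Python's standard sqlite3 module. The table name processed_emails is taken from the migration notes in this document. The index names are invented for the example, foreign key constraints are omitted because the user and folder tables are not created here, and the On Update behavior of updated_at is not modeled. The application may well define the table differently, for example through an ORM.

```python
import sqlite3

SCHEMA = """
CREATE TABLE processed_emails (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id       INTEGER NOT NULL,
    folder_id     INTEGER NOT NULL,
    email_uid     VARCHAR(255) NOT NULL,
    folder_name   VARCHAR(255) NOT NULL,
    is_processed  BOOLEAN DEFAULT 0,
    first_seen_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    processed_at  DATETIME,
    created_at    DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at    DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Hypothetical index names; only the indexed columns come from the spec.
CREATE UNIQUE INDEX uq_processed_user_folder_uid
    ON processed_emails (user_id, folder_name, email_uid);
CREATE INDEX ix_processed_user_id ON processed_emails (user_id);
CREATE INDEX ix_processed_folder_id ON processed_emails (folder_id);
CREATE INDEX ix_processed_folder_name ON processed_emails (folder_name);
CREATE INDEX ix_processed_is_processed ON processed_emails (is_processed);
CREATE INDEX ix_processed_first_seen_at ON processed_emails (first_seen_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

insert = (
    "INSERT INTO processed_emails (user_id, folder_id, email_uid, folder_name) "
    "VALUES (?, ?, ?, ?)"
)
conn.execute(insert, (1, 10, "101", "INBOX"))

# The same UID in the same folder for the same user violates the unique index.
try:
    conn.execute(insert, (1, 10, "101", "INBOX"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The same UID is still accepted in a different folder.
conn.execute(insert, (1, 11, "101", "Archive"))
row_count = conn.execute("SELECT COUNT(*) FROM processed_emails").fetchone()[0]
```

Including folder_name in the unique index matters because IMAP UIDs are only unique within a single mailbox, so the same UID value can legitimately appear in two folders.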
Service Design
ProcessedEmailsService
The ProcessedEmailsService (app/processed_emails_service.py) provides:
from typing import List

from app.models import User  # assumed: User is defined alongside ProcessedEmail in app/models.py


class ProcessedEmailsService:
    def __init__(self, user: User):
        self.user = user

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Get list of email UIDs that are pending processing in a folder."""

    def mark_email_processed(self, folder_name: str, email_uid: str) -> bool:
        """Mark an email as processed."""

    def mark_emails_processed(self, folder_name: str, email_uids: List[str]) -> int:
        """Mark multiple emails as processed in bulk."""

    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int:
        """Sync email UIDs for a folder, adding new ones as pending."""

    def get_pending_count(self, folder_name: str) -> int:
        """Get count of pending emails for a folder."""

    def cleanup_old_records(self, folder_name: str, current_uids: List[str]) -> int:
        """Remove records for emails that no longer exist in the folder."""
IMAPService Integration
The IMAP service (app/imap_service.py) integrates with the ProcessedEmailsService:
from typing import List, Tuple

from app.models import User  # assumed: User is defined in app/models.py
from app.processed_emails_service import ProcessedEmailsService


class IMAPService:
    def __init__(self, user: User):
        self.user = user
        self.config = user.imap_config or {}
        self.connection = None
        self.processed_emails_service = ProcessedEmailsService(user)

    def get_folder_email_count(self, folder_name: str) -> int:
        """Get the count of emails in a specific folder, considering processed status."""

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Get email UIDs that are pending processing."""

    def sync_folders(self) -> Tuple[bool, str]:
        """Sync IMAP folders with local database, tracking email processing status."""
API Endpoints
HTMX Endpoints for Processed Email Management
- Get Pending Emails for a Folder
  - Method: GET
  - Path: /api/folders/<folder_id>/pending-emails
  - Response: Dialog with list of email metadata for pending emails (subject, date, UID)
  - Features: Email preview, individual processing buttons
- Mark Email as Processed
  - Method: POST
  - Path: /api/folders/<folder_id>/emails/<email_uid>/process
  - Action: Mark a specific email as processed
  - Response: Updated dialog body with new counts
- Sync Emails for a Folder
  - Method: POST
  - Path: /api/folders/<folder_id>/sync-emails
  - Action: Sync emails for a specific folder with processed email tracking
  - Response: Updated counts and sync status
- Process Multiple Emails
  - Method: POST
  - Path: /api/folders/<folder_id>/process-emails
  - Action: Process multiple emails in a folder (mark as processed)
  - Response: Success message with updated counts
Workflow Integration
Email Processing Flow
sequenceDiagram
participant U as User
participant B as Browser
participant M as Main Blueprint
participant I as IMAP Service
participant P as ProcessedEmails Service
participant DB as Database
U->>B: Click "Sync Folders"
B->>M: POST /api/imap/sync
M->>I: Sync folders with processed email tracking
I->>I: Connect to IMAP server
I->>I: Get list of email UIDs for folder
I->>P: sync_folder_emails(folder_name, email_uids)
P->>DB: Create pending email records
P->>I: Return number of emails added as pending
I->>M: Return sync results
M->>B: Update UI with pending counts
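The "Create pending email records" step in the flow above has to be safe to repeat, since every sync sees mostly the same UIDs. One way to get that property is to let the composite unique key reject rows that already exist. The sketch below shows this with SQLite's INSERT OR IGNORE in an in-memory database. It is an illustration under assumptions, not the application's actual code: the function is a free-standing stand-in for the service method of the same name, the table is trimmed to the columns the example needs, and the return value is assumed to be the number of newly added records.

```python
import sqlite3
from typing import List


def sync_folder_emails(conn: sqlite3.Connection, user_id: int,
                       folder_name: str, email_uids: List[str]) -> int:
    """Insert unseen UIDs as pending; rows that already exist are left untouched."""
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO processed_emails "
            "(user_id, folder_name, email_uid) VALUES (?, ?, ?)",
            [(user_id, folder_name, uid) for uid in email_uids],
        )
    # total_changes only counts rows that were actually inserted.
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_emails ("
    "user_id INTEGER NOT NULL, folder_name TEXT NOT NULL, "
    "email_uid TEXT NOT NULL, is_processed BOOLEAN DEFAULT 0, "
    "UNIQUE (user_id, folder_name, email_uid))"
)

first_added = sync_folder_emails(conn, 1, "INBOX", ["101", "102"])

# Simulate processing of 101 between two syncs.
conn.execute("UPDATE processed_emails SET is_processed = 1 WHERE email_uid = '101'")

second_added = sync_folder_emails(conn, 1, "INBOX", ["101", "102", "103"])
pending = [row[0] for row in conn.execute(
    "SELECT email_uid FROM processed_emails "
    "WHERE user_id = 1 AND folder_name = 'INBOX' AND is_processed = 0 "
    "ORDER BY email_uid"
)]
```

The second sync adds only UID 103, and UID 101 keeps its processed status, which is the behavior the Incremental Sync requirement asks for.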
Email Processing Status Update
sequenceDiagram
participant U as User
participant B as Browser
participant M as Main Blueprint
participant P as ProcessedEmails Service
participant DB as Database
U->>B: Trigger email processing
B->>M: POST /api/folders/<folder_id>/process-emails
M->>P: mark_emails_processed(folder_name, email_uids)
P->>DB: Update email processing status
P->>M: Return success count
M->>B: Update UI with new counts
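The "Update email processing status" and "Return success count" steps can be sketched as a single transaction that flips pending rows and reports how many rows changed. The example below uses Python's standard sqlite3 module against a trimmed, in-memory version of the table. The free-standing function signature is an assumption made for the example, since the real service method receives the user through its constructor.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List


def mark_emails_processed(conn: sqlite3.Connection, user_id: int,
                          folder_name: str, email_uids: List[str]) -> int:
    """Flip pending rows to processed in one transaction; return rows changed."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cursor = conn.executemany(
            "UPDATE processed_emails "
            "SET is_processed = 1, processed_at = ?, updated_at = ? "
            "WHERE user_id = ? AND folder_name = ? AND email_uid = ? "
            "AND is_processed = 0",
            [(now, now, user_id, folder_name, uid) for uid in email_uids],
        )
    # For executemany, rowcount is the sum over all executed statements.
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_emails ("
    "user_id INTEGER, folder_name TEXT, email_uid TEXT, "
    "is_processed BOOLEAN DEFAULT 0, processed_at DATETIME, updated_at DATETIME, "
    "UNIQUE (user_id, folder_name, email_uid))"
)
conn.executemany(
    "INSERT INTO processed_emails (user_id, folder_name, email_uid) VALUES (?, ?, ?)",
    [(1, "INBOX", "101"), (1, "INBOX", "102"), (2, "INBOX", "101")],
)

changed = mark_emails_processed(conn, 1, "INBOX", ["101", "102", "999"])
changed_again = mark_emails_processed(conn, 1, "INBOX", ["101"])
pending_user_2 = conn.execute(
    "SELECT COUNT(*) FROM processed_emails WHERE user_id = 2 AND is_processed = 0"
).fetchone()[0]
```

Filtering on user_id in the WHERE clause means a request can never change another user's records, even when both users have an email with the same UID in a folder of the same name.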
Migration Strategy
Current Implementation Status
Phase 1: Data Model Implementation ✅
- Create the processed_emails table with appropriate indexes ✅
- Implement the ProcessedEmailsService class ✅
- Add basic CRUD operations for email processing records ✅
Phase 2: IMAP Service Integration ✅
- Update IMAPService to use ProcessedEmailsService ✅
- Modify folder synchronization to track email UIDs ✅
- Update email count methods to consider processing status ✅
Phase 3: API and UI Integration ✅
- Add API endpoints for processed email management ✅
- Update UI to display accurate pending counts ✅
- Add bulk processing capabilities ✅
Phase 4: Optimization and Cleanup ✅
- Implement batch processing for performance ✅
- Add periodic cleanup of orphaned records ✅
- Optimize database queries for large datasets ✅
Security Considerations
- Access Control: Ensure users can only access their own email processing records
- Data Validation: Validate all email UIDs and folder names to prevent injection attacks
- Rate Limiting: Implement rate limiting for email processing endpoints to prevent abuse
- Data Privacy: Ensure no sensitive email content is stored in the database
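The Data Validation point can be made concrete with two small checks. This is a sketch under stated assumptions: UIDs are assumed to arrive as decimal strings, the upper bound follows the IMAP rule that a UID is a non-zero 32-bit value, and the folder-name limit mirrors the String(255) column in this specification. The function names are hypothetical. Checks like these complement parameterized queries, which remain the primary defense against SQL injection.

```python
import re

MAX_UID = 2**32 - 1           # IMAP UIDs are non-zero 32-bit values
MAX_FOLDER_NAME_LENGTH = 255  # matches the String(255) column in this spec
_UID_PATTERN = re.compile(r"[1-9][0-9]{0,9}")


def is_valid_email_uid(value: str) -> bool:
    """Accept only decimal strings in the range 1..2**32-1."""
    if not isinstance(value, str) or not _UID_PATTERN.fullmatch(value):
        return False
    return int(value) <= MAX_UID


def is_valid_folder_name(value: str) -> bool:
    """Accept non-empty names that fit the column and hold no control characters."""
    if not isinstance(value, str) or not value:
        return False
    if len(value) > MAX_FOLDER_NAME_LENGTH:
        return False
    return not any(ord(ch) < 32 or ord(ch) == 127 for ch in value)
```

For example, is_valid_email_uid("101") passes, while "0", "4294967296", and "1; DROP TABLE" are all rejected before they reach the database layer.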
Performance Considerations
- Database Indexing: Proper indexing on frequently queried fields
- Batch Operations: Use batch operations for processing multiple emails
- Memory Management: Process emails in batches to avoid memory issues with large mailboxes
- Caching: Consider caching frequently accessed email processing status
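The Batch Operations and Memory Management points both come down to splitting a long UID list into fixed-size slices before handing each slice to the database. A minimal sketch follows; the helper name chunked and the batch size of 3 are illustrative choices, not values taken from the application.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


uids = [str(n) for n in range(1, 8)]  # "1" through "7"
batches = list(chunked(uids, 3))
```

Each batch could then be passed to a bulk call such as mark_emails_processed, so that only one slice of UIDs is bound to a statement at a time.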
Future Enhancements
- Email Movement Tracking: Track when emails are moved between folders
- Processing History: Maintain a history of email processing actions
- Email Deduplication: Handle duplicate emails across folders
- Automated Cleanup: Periodic cleanup of old or orphaned processing records
- Analytics: Provide insights into email processing patterns and efficiency