# Processed Emails Feature Specification

## Overview

This document specifies a feature that persistently tracks which emails the Email Organizer system has processed. The goal is to record each email's processing status so that synchronization does not reprocess the same emails and so that pending email counts are accurate.

## Requirements

### 1. Email Tracking Requirements

- **Unique Email Identification**: Track emails using the unique identifier (UID) provided by the IMAP server, along with the folder name and user ID
- **Processing Status**: Mark emails as either "pending" (unprocessed) or "processed"
- **Minimal Data Storage**: Store only essential information - email UID, folder, user, and processing status - not email content, subjects, or bodies
- **Persistence**: Maintain processing status across application restarts and synchronization cycles
- **Efficient Lookup**: Quickly determine which emails in a folder are pending processing

### 2. Synchronization Requirements

- **Initial Sync**: During the first synchronization of a folder, all emails should be marked as "pending"
- **Incremental Sync**: On subsequent syncs, only emails that haven't been processed should be identified as pending
- **Status Update**: When an email is processed, update its status from "pending" to "processed"
- **Cleanup**: Remove records for emails that no longer exist on the IMAP server (optional; deferred to a future enhancement)

### 3. Performance Requirements

- **Efficient Storage**: Use appropriate database indexing for fast lookups
- **Minimal Memory Usage**: Store only essential data to keep the memory footprint low
- **Batch Processing**: Support batch operations for processing multiple emails efficiently

## Data Model Design

### ProcessedEmails Table

```mermaid
erDiagram
    USER {
        int id PK "Primary Key"
        string email "Unique, Not Null"
        string first_name "Not Null"
        string last_name "Not Null"
        string password_hash "Not Null"
        json imap_config "JSON Configuration"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }

    FOLDER {
        int id PK "Primary Key"
        int user_id FK "Foreign Key to User"
        string name "Not Null"
        text rule_text "Natural Language Rule"
        int priority "Processing Order"
        boolean organize_enabled "Default: True"
        int total_count "Default: 0"
        int pending_count "Default: 0"
        json recent_emails "JSON Array"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }

    PROCESSED_EMAIL {
        int id PK "Primary Key"
        int user_id FK "Foreign Key to User"
        int folder_id FK "Foreign Key to Folder"
        string email_uid "Not Null, IMAP Email UID"
        string folder_name "Not Null, IMAP Folder Name"
        boolean is_processed "Default: False, Processing Status"
        datetime first_seen_at "Default: UTC Now, First seen during sync"
        datetime processed_at "Nullable, When email was processed"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }

    USER ||--o{ FOLDER : "has"
    USER ||--o{ PROCESSED_EMAIL : "has"
    FOLDER ||--o{ PROCESSED_EMAIL : "has"
```

### Column Specifications

| Table | Column | Data Type | Constraints | Description |
|-------|--------|-----------|--------------|-------------|
| PROCESSED_EMAIL | id | Integer | Primary Key, Autoincrement | Unique identifier for each processed email record |
| PROCESSED_EMAIL | user_id | Integer | Foreign Key to User, Not Null | Reference to the user who owns this email |
| PROCESSED_EMAIL | folder_id | Integer | Foreign Key to Folder, Not Null | Reference to the folder this email belongs to |
| PROCESSED_EMAIL | email_uid | String(255) | Not Null | Unique ID of the email from the IMAP server |
| PROCESSED_EMAIL | folder_name | String(255) | Not Null | Name of the IMAP folder (for redundancy) |
| PROCESSED_EMAIL | is_processed | Boolean | Default: False | Processing status (false=pending, true=processed) |
| PROCESSED_EMAIL | first_seen_at | DateTime | Default: datetime.utcnow | First time this email was detected during sync |
| PROCESSED_EMAIL | processed_at | DateTime | Nullable | When the email was marked as processed |
| PROCESSED_EMAIL | created_at | DateTime | Default: datetime.utcnow | Record creation timestamp |
| PROCESSED_EMAIL | updated_at | DateTime | Default: datetime.utcnow, On Update | Record update timestamp |

### Relationships

- **User to ProcessedEmail**: One-to-many relationship - each user can have multiple processed email records
- **Folder to ProcessedEmail**: One-to-many relationship - each folder can have multiple processed email records
- **Composite Key**: The combination of (user_id, folder_name, email_uid) should be unique to prevent duplicate records

### Database Indexes

- Primary key index on `id`
- Foreign key indexes on `user_id` and `folder_id`
- Composite unique index on `(user_id, folder_name, email_uid)`
- Index on `folder_name` for faster folder-based queries
- Index on `is_processed` for filtering pending emails
- Index on `first_seen_at` for tracking recently added emails

## Service Design

### ProcessedEmailsService

A new service class will be responsible for managing processed email records:

```python
from typing import List

# `User` is the application's existing user model (see the USER entity above).


class ProcessedEmailsService:
    def __init__(self, user: User):
        self.user = user

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Get list of email UIDs that are pending processing in a folder."""

    def mark_email_processed(self, folder_name: str, email_uid: str) -> bool:
        """Mark an email as processed."""

    def mark_emails_processed(self, folder_name: str, email_uids: List[str]) -> int:
        """Mark multiple emails as processed in bulk."""

    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int:
        """Sync email UIDs for a folder, adding new ones as pending."""

    def get_pending_count(self, folder_name: str) -> int:
        """Get count of pending emails for a folder."""

    def cleanup_old_records(self, folder_name: str, current_uids: List[str]) -> int:
        """Remove records for emails that no longer exist in the folder."""
```

### IMAPService Integration

The existing IMAP service will be enhanced to use the ProcessedEmailsService:

```python
from typing import List, Tuple


class IMAPService:
    def __init__(self, user: User):
        self.user = user
        self.config = user.imap_config or {}
        self.connection = None
        self.processed_emails_service = ProcessedEmailsService(user)

    def get_folder_email_count(self, folder_name: str) -> int:
        """Get the count of emails in a specific folder, considering processed status."""

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Get email UIDs that are pending processing."""

    def sync_folders(self) -> Tuple[bool, str]:
        """Sync IMAP folders with local database, tracking email processing status."""
```

## API Endpoints

### New HTMX Endpoints for Processed Email Management

1. **Get Pending Emails for a Folder**
   - Method: GET
   - Path: `/api/folders//pending-emails`
   - Response: A dialog listing metadata for the pending emails (subject, date, UID), with a button to preview each email (fetched from the IMAP server)

2. **Mark Email as Processed**
   - Method: POST
   - Path: `/api/folders//emails//process`
   - Action: Mark a specific email as processed
   - Response: Updated dialog body
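
To make the data model and the service contract under Service Design more concrete, the following is a minimal sketch built on the standard-library `sqlite3` module. It is an illustration under stated assumptions, not the project's implementation: the class name `SqliteProcessedEmails` is hypothetical, it takes a plain `user_id` instead of a `User` object, and the table is trimmed to the columns the demo needs (no `folder_id`, `created_at`, or `updated_at`).

```python
import sqlite3
from typing import List

# Trimmed, illustrative schema; the unique constraint mirrors the
# composite key (user_id, folder_name, email_uid) from the spec.
SCHEMA = """
CREATE TABLE IF NOT EXISTS processed_emails (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    folder_name TEXT NOT NULL,
    email_uid TEXT NOT NULL,
    is_processed INTEGER NOT NULL DEFAULT 0,
    first_seen_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    processed_at TEXT,
    UNIQUE (user_id, folder_name, email_uid)
)
"""


class SqliteProcessedEmails:
    """Hypothetical stand-in for ProcessedEmailsService, for illustration only."""

    def __init__(self, conn: sqlite3.Connection, user_id: int):
        self.conn = conn
        self.user_id = user_id
        self.conn.execute(SCHEMA)

    def _count(self, folder_name: str, pending_only: bool) -> int:
        sql = ("SELECT COUNT(*) FROM processed_emails "
               "WHERE user_id = ? AND folder_name = ?")
        if pending_only:
            sql += " AND is_processed = 0"
        return self.conn.execute(sql, (self.user_id, folder_name)).fetchone()[0]

    def get_pending_count(self, folder_name: str) -> int:
        return self._count(folder_name, pending_only=True)

    def get_pending_emails(self, folder_name: str) -> List[str]:
        rows = self.conn.execute(
            "SELECT email_uid FROM processed_emails "
            "WHERE user_id = ? AND folder_name = ? AND is_processed = 0 "
            "ORDER BY id",
            (self.user_id, folder_name),
        )
        return [row[0] for row in rows]

    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int:
        """Insert unseen UIDs as pending; return how many rows were added."""
        before = self._count(folder_name, pending_only=False)
        # INSERT OR IGNORE skips UIDs already recorded, so known emails
        # keep their existing status.
        self.conn.executemany(
            "INSERT OR IGNORE INTO processed_emails "
            "(user_id, folder_name, email_uid) VALUES (?, ?, ?)",
            [(self.user_id, folder_name, uid) for uid in email_uids],
        )
        self.conn.commit()
        return self._count(folder_name, pending_only=False) - before

    def mark_emails_processed(self, folder_name: str, email_uids: List[str]) -> int:
        """Flip pending rows to processed; return how many rows changed."""
        before = self.get_pending_count(folder_name)
        self.conn.executemany(
            "UPDATE processed_emails "
            "SET is_processed = 1, processed_at = CURRENT_TIMESTAMP "
            "WHERE user_id = ? AND folder_name = ? AND email_uid = ? "
            "AND is_processed = 0",
            [(self.user_id, folder_name, uid) for uid in email_uids],
        )
        self.conn.commit()
        return before - self.get_pending_count(folder_name)
```

In this sketch the unique constraint does the deduplication work during incremental sync: re-submitting a UID that is already recorded is a no-op, so a second sync only adds emails that were not seen before.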
## Workflow Integration

### Email Processing Flow

```mermaid
sequenceDiagram
    participant U as User
    participant B as Browser
    participant M as Main Blueprint
    participant I as IMAP Service
    participant P as ProcessedEmails Service
    participant DB as Database

    U->>B: Click "Sync Folders"
    B->>M: POST /api/imap/sync
    M->>I: Sync folders with processed email tracking
    I->>I: Connect to IMAP server
    I->>I: Get list of email UIDs for folder
    I->>P: sync_folder_emails(folder_name, email_uids)
    P->>DB: Create pending email records
    P->>I: Return list of pending email UIDs
    I->>M: Return sync results
    M->>B: Update UI with pending counts
```

### Email Processing Status Update

```mermaid
sequenceDiagram
    participant U as User
    participant B as Browser
    participant M as Main Blueprint
    participant P as ProcessedEmails Service
    participant DB as Database

    U->>B: Trigger email processing
    B->>M: POST /api/folders//process-emails
    M->>P: mark_emails_processed(folder_name, email_uids)
    P->>DB: Update email processing status
    P->>M: Return success count
    M->>B: Update UI with new counts
```

## Migration Strategy

### Phase 1: Data Model Implementation

1. Create the `processed_emails` table with appropriate indexes
2. Implement the `ProcessedEmailsService` class
3. Add basic CRUD operations for email processing records

### Phase 2: IMAP Service Integration

1. Update `IMAPService` to use `ProcessedEmailsService`
2. Modify folder synchronization to track email UIDs
3. Update email count methods to consider processing status

### Phase 3: API and UI Integration

1. Add API endpoints for processed email management
2. Update UI to display accurate pending counts
3. Add bulk processing capabilities

### Phase 4: Optimization and Cleanup

1. Implement batch processing for performance
2. Add periodic cleanup of orphaned records
3. Optimize database queries for large datasets

## Security Considerations

1. **Access Control**: Ensure users can only access their own email processing records
2. **Data Validation**: Validate all email UIDs and folder names to prevent injection attacks
3. **Rate Limiting**: Implement rate limiting for email processing endpoints to prevent abuse
4. **Data Privacy**: Ensure no sensitive email content is stored in the database

## Performance Considerations

1. **Database Indexing**: Proper indexing on frequently queried fields
2. **Batch Operations**: Use batch operations for processing multiple emails
3. **Memory Management**: Process emails in batches to avoid memory issues with large mailboxes
4. **Caching**: Consider caching frequently accessed email processing status

## Future Enhancements

1. **Email Movement Tracking**: Track when emails are moved between folders
2. **Processing History**: Maintain a history of email processing actions
3. **Email Deduplication**: Handle duplicate emails across folders
4. **Automated Cleanup**: Periodic cleanup of old or orphaned processing records
5. **Analytics**: Provide insights into email processing patterns and efficiency
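
The batch-processing recommendation under Performance Considerations can be sketched as a small chunking helper. This is only an illustration: the helper names `chunked` and `mark_in_batches` and the default batch size of 500 are assumptions, and `mark_fn` stands in for any callable with the shape of `mark_emails_processed(folder_name, email_uids)`.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(uids: Sequence[str], batch_size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive slices of `uids`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(uids), batch_size):
        yield list(uids[start:start + batch_size])


def mark_in_batches(
    mark_fn: Callable[[str, List[str]], int],
    folder_name: str,
    uids: Sequence[str],
    batch_size: int = 500,
) -> int:
    """Call `mark_fn` once per batch and return the summed success count."""
    return sum(mark_fn(folder_name, batch) for batch in chunked(uids, batch_size))
```

Bounding each call to a fixed batch size keeps the number of parameters per database round trip predictable for large mailboxes, at the cost of more round trips; a suitable batch size would need to be measured against the real database.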