Bryce d6635a42df update docs.

2025-08-09 21:04:21 -07:00

Processed Emails Feature Specification

Overview

This document specifies a feature that persistently tracks which emails the Email Organizer system has processed. Keeping a record of each email's processing status prevents the same emails from being reprocessed during synchronization and yields accurate pending email counts.

Current Implementation Status

The Processed Emails feature is fully implemented and operational:

Core Implementation

ProcessedEmail Model: Implemented in app/models.py
ProcessedEmails Service: Implemented in app/processed_emails_service.py
Emails Blueprint: Implemented in app/routes/emails.py
UI Integration: Pending emails dialog and processing functionality

Key Features

Email UID tracking for processing status
Pending email counts and management
Bulk email processing operations
Email metadata display and management
Integration with IMAP synchronization process

Requirements

1. Email Tracking Requirements

Unique Email Identification: Track emails using a unique identifier (UID) provided by the IMAP server, along with the folder name and user ID
Processing Status: Mark emails as either "pending" (unprocessed) or "processed"
Minimal Data Storage: Store only essential information - email UID, folder, user, and processing status - not email content, subjects, or bodies
Persistence: Maintain processing status across application restarts and synchronization cycles
Efficient Lookup: Quickly determine which emails in a folder are pending processing

2. Synchronization Requirements

Initial Sync: During the first synchronization of a folder, all emails should be marked as "pending"
Incremental Sync: On subsequent syncs, only emails that haven't been processed should be identified as pending
Status Update: When an email is processed, update its status from "pending" to "processed"
Cleanup: Remove records for emails that no longer exist on the IMAP server
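The synchronization rules above reduce to set arithmetic on UIDs. Below is a minimal sketch, not the code in app/processed_emails_service.py. `reconcile_folder` is a hypothetical helper. It takes the UIDs already tracked for one folder (mapped to their is_processed flag) and the UIDs the server currently reports. It returns the UIDs to insert as pending and the records to delete. It assumes UIDs are stored as decimal strings, as IMAP reports them.

```python
from typing import Dict, List, Set, Tuple


def reconcile_folder(
    tracked: Dict[str, bool], server_uids: List[str]
) -> Tuple[List[str], List[str]]:
    """Compare tracked records for one folder with the server's current UIDs.

    tracked maps email_uid -> is_processed (hypothetical in-memory view).
    Returns (uids_to_add_as_pending, uids_to_delete).
    """
    current: Set[str] = set(server_uids)
    known: Set[str] = set(tracked)
    # On the server but never seen before: start life as pending.
    to_add = sorted(current - known, key=int)  # IMAP UIDs are numeric strings
    # Tracked locally but gone from the server: cleanup candidates.
    to_delete = sorted(known - current, key=int)
    return to_add, to_delete
```

Existing records never appear in either list, so a processed email stays processed across re-syncs. On the first sync `tracked` is empty, so every server UID lands in the add list, which matches the Initial Sync rule.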

3. Performance Requirements

Efficient Storage: Use appropriate database indexing for fast lookups
Minimal Memory Usage: Store only essential data to keep memory footprint low
Batch Processing: Support batch operations for processing multiple emails efficiently
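Batch operations usually mean splitting a long UID list into fixed-size chunks. Each bulk INSERT or `UPDATE ... WHERE email_uid IN (...)` then stays within the database's bound-parameter limit (SQLite's SQLITE_MAX_VARIABLE_NUMBER, for example). A minimal sketch of such a helper follows. The name `chunked` and the default size of 500 are illustrative assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int = 500) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        # Pull the next `size` items; an empty list means the input is exhausted.
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

On Python 3.12 and later, `itertools.batched` provides the same behavior but yields tuples.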

Data Model Design

ProcessedEmails Table

```mermaid
erDiagram
    USER {
        int id PK "Primary Key"
        string email "Unique, Not Null"
        string first_name "Not Null"
        string last_name "Not Null"
        string password_hash "Not Null"
        json imap_config "JSON Configuration"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }

    FOLDER {
        int id PK "Primary Key"
        int user_id FK "Foreign Key to User"
        string name "Not Null"
        text rule_text "Natural Language Rule"
        int priority "Processing Order"
        boolean organize_enabled "Default: True"
        int total_count "Default: 0"
        int pending_count "Default: 0"
        json recent_emails "JSON Array"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }

    PROCESSED_EMAIL {
        int id PK "Primary Key"
        int user_id FK "Foreign Key to User"
        int folder_id FK "Foreign Key to Folder"
        string email_uid "Not Null, IMAP Email UID"
        string folder_name "Not Null, IMAP Folder Name"
        boolean is_processed "Default: False, Processing Status"
        datetime first_seen_at "Default: UTC Now, first seen during sync"
        datetime processed_at "Nullable, when email was processed"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }

    USER ||--o{ FOLDER : "has"
    USER ||--o{ PROCESSED_EMAIL : "has"
    FOLDER ||--o{ PROCESSED_EMAIL : "has"
```

Column Specifications

| Table | Column | Data Type | Constraints | Description |
|---|---|---|---|---|
| PROCESSED_EMAIL | id | Integer | Primary Key, Autoincrement | Unique identifier for each processed email record |
| PROCESSED_EMAIL | user_id | Integer | Foreign Key to User, Not Null | Reference to the user who owns this email |
| PROCESSED_EMAIL | folder_id | Integer | Foreign Key to Folder, Not Null | Reference to the folder this email belongs to |
| PROCESSED_EMAIL | email_uid | String(255) | Not Null | Unique ID of the email from the IMAP server |
| PROCESSED_EMAIL | folder_name | String(255) | Not Null | Name of the IMAP folder (for redundancy) |
| PROCESSED_EMAIL | is_processed | Boolean | Default: False | Processing status (false = pending, true = processed) |
| PROCESSED_EMAIL | first_seen_at | DateTime | Default: datetime.utcnow | First time this email was detected during sync |
| PROCESSED_EMAIL | processed_at | DateTime | Nullable | When the email was marked as processed |
| PROCESSED_EMAIL | created_at | DateTime | Default: datetime.utcnow | Record creation timestamp |
| PROCESSED_EMAIL | updated_at | DateTime | Default: datetime.utcnow, On Update | Record update timestamp |
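To reason about the columns outside the ORM, a record can be mirrored as a plain dataclass. This is a hypothetical sketch, not the SQLAlchemy model in app/models.py. It leaves out id, created_at and updated_at, which the database manages. It also uses timezone-aware timestamps, whereas the table lists datetime.utcnow, which returns naive datetimes and is deprecated as of Python 3.12.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple


def _utcnow() -> datetime:
    # Timezone-aware replacement for datetime.utcnow().
    return datetime.now(timezone.utc)


@dataclass
class ProcessedEmailRecord:
    """Hypothetical in-memory mirror of one PROCESSED_EMAIL row (no email content)."""

    user_id: int
    folder_id: int
    email_uid: str
    folder_name: str
    is_processed: bool = False
    first_seen_at: datetime = field(default_factory=_utcnow)
    processed_at: Optional[datetime] = None

    @property
    def key(self) -> Tuple[int, str, str]:
        # Same tuple as the (user_id, folder_name, email_uid) uniqueness rule.
        return (self.user_id, self.folder_name, self.email_uid)

    def mark_processed(self) -> None:
        # Flip pending -> processed and stamp the time only on the first call.
        if not self.is_processed:
            self.is_processed = True
            self.processed_at = _utcnow()
```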

Relationships

User to ProcessedEmail: One-to-many relationship - each user can have multiple processed email records
Folder to ProcessedEmail: One-to-many relationship - each folder can have multiple processed email records
Uniqueness Constraint: The combination of (user_id, folder_name, email_uid) must be unique to prevent duplicate records. This is a unique constraint; the primary key remains id.

Database Indexes

Primary key index on id
Foreign key indexes on user_id and folder_id
Composite unique index on (user_id, folder_name, email_uid)
Index on folder_name for faster folder-based queries
Index on is_processed for filtering pending emails
Index on first_seen_at for tracking recently added emails
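The constraints and indexes above can be expressed as DDL. The sketch below uses SQLite through Python's standard sqlite3 module purely for illustration. Table and index names are assumptions. The foreign-key references to the user and folder tables are omitted so the snippet runs on its own. Because of the unique index, INSERT OR IGNORE turns re-adding an already-tracked UID into a no-op, which is what incremental sync needs.

```python
import sqlite3
from typing import Iterable

# Hypothetical names; foreign keys to users/folders omitted so this runs standalone.
SCHEMA = """
CREATE TABLE processed_emails (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id       INTEGER NOT NULL,
    folder_id     INTEGER NOT NULL,
    email_uid     VARCHAR(255) NOT NULL,
    folder_name   VARCHAR(255) NOT NULL,
    is_processed  BOOLEAN NOT NULL DEFAULT 0,
    first_seen_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    processed_at  DATETIME,
    created_at    DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at    DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE UNIQUE INDEX uq_processed_user_folder_uid
    ON processed_emails (user_id, folder_name, email_uid);
CREATE INDEX ix_processed_user_id ON processed_emails (user_id);
CREATE INDEX ix_processed_folder_id ON processed_emails (folder_id);
CREATE INDEX ix_processed_folder_name ON processed_emails (folder_name);
CREATE INDEX ix_processed_is_processed ON processed_emails (is_processed);
CREATE INDEX ix_processed_first_seen_at ON processed_emails (first_seen_at);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the table and every index listed in the spec."""
    conn.executescript(SCHEMA)


def add_pending(conn: sqlite3.Connection, user_id: int, folder_id: int,
                folder_name: str, uids: Iterable[str]) -> int:
    """Insert UIDs as pending, skipping ones already tracked; return rows added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO processed_emails "
        "(user_id, folder_id, email_uid, folder_name) VALUES (?, ?, ?, ?)",
        [(user_id, folder_id, uid, folder_name) for uid in uids],
    )
    conn.commit()
    # Ignored duplicates do not count as changes, so this is the number inserted.
    return conn.total_changes - before
```

The unique index leads with user_id. Engines that match B-tree indexes by leftmost prefix (SQLite, PostgreSQL and MySQL among them) can already serve user_id lookups from it, so the separate user_id index may be redundant.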

Service Design

ProcessedEmailsService

The ProcessedEmailsService (app/processed_emails_service.py) provides:

```python
from typing import List

from app.models import User  # assumed location of the User model


class ProcessedEmailsService:
    # Interface outline only: method bodies are omitted in this specification.

    def __init__(self, user: User):
        self.user = user

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Get list of email UIDs that are pending processing in a folder."""

    def mark_email_processed(self, folder_name: str, email_uid: str) -> bool:
        """Mark an email as processed."""

    def mark_emails_processed(self, folder_name: str, email_uids: List[str]) -> int:
        """Mark multiple emails as processed in bulk."""

    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int:
        """Sync email UIDs for a folder, adding new ones as pending."""

    def get_pending_count(self, folder_name: str) -> int:
        """Get count of pending emails for a folder."""

    def cleanup_old_records(self, folder_name: str, current_uids: List[str]) -> int:
        """Remove records for emails that no longer exist in the folder."""
```

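The interface above lists signatures only. The dict-backed class below is a hypothetical in-memory stand-in with the same method names, usable as a test double or as one reading of the intended semantics. Its return values (UIDs added, records changed, records deleted) are assumptions inferred from the int return types, not confirmed behavior of app/processed_emails_service.py. Persistence is left out.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


class InMemoryProcessedEmailsService:
    """Hypothetical dict-backed stand-in for ProcessedEmailsService."""

    def __init__(self, user_id: int):
        self.user_id = user_id  # all records are implicitly scoped to this user
        # (folder_name, email_uid) -> processed_at; None means still pending.
        self._records: Dict[Tuple[str, str], Optional[datetime]] = {}

    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int:
        """Add unseen UIDs as pending; return how many were added."""
        added = 0
        for uid in email_uids:
            key = (folder_name, uid)
            if key not in self._records:
                self._records[key] = None
                added += 1
        return added

    def get_pending_emails(self, folder_name: str) -> List[str]:
        return [uid for (folder, uid), processed_at in self._records.items()
                if folder == folder_name and processed_at is None]

    def get_pending_count(self, folder_name: str) -> int:
        return len(self.get_pending_emails(folder_name))

    def mark_emails_processed(self, folder_name: str, email_uids: List[str]) -> int:
        """Mark tracked, still-pending UIDs as processed; return how many changed."""
        now = datetime.now(timezone.utc)
        changed = 0
        for uid in email_uids:
            key = (folder_name, uid)
            if key in self._records and self._records[key] is None:
                self._records[key] = now
                changed += 1
        return changed

    def mark_email_processed(self, folder_name: str, email_uid: str) -> bool:
        return self.mark_emails_processed(folder_name, [email_uid]) == 1

    def cleanup_old_records(self, folder_name: str, current_uids: List[str]) -> int:
        """Delete records whose UID is no longer on the server; return how many."""
        keep = set(current_uids)
        stale = [k for k in self._records if k[0] == folder_name and k[1] not in keep]
        for k in stale:
            del self._records[k]
        return len(stale)
```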
IMAPService Integration

The IMAP service (app/imap_service.py) integrates with the ProcessedEmailsService:

```python
from typing import List, Tuple

from app.models import User  # assumed location of the User model
from app.processed_emails_service import ProcessedEmailsService


class IMAPService:
    # Interface outline only: method bodies are omitted in this specification.

    def __init__(self, user: User):
        self.user = user
        self.config = user.imap_config or {}
        self.connection = None
        self.processed_emails_service = ProcessedEmailsService(user)

    def get_folder_email_count(self, folder_name: str) -> int:
        """Get the count of emails in a specific folder, considering processed status."""

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Get email UIDs that are pending processing."""

    def sync_folders(self) -> Tuple[bool, str]:
        """Sync IMAP folders with local database, tracking email processing status."""
```

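The sync flow needs the list of UIDs in each folder. One way to get it with Python's standard imaplib is to open the folder read-only (imaplib sends EXAMINE when readonly=True) and issue UID SEARCH ALL. This is a sketch, not necessarily what app/imap_service.py does, and the helper names are hypothetical. Folder names with non-ASCII characters would also need modified UTF-7 encoding (RFC 3501), which this sketch does not handle.

```python
import imaplib
from typing import List


def quote_mailbox(name: str) -> str:
    """Quote an ASCII mailbox name for an IMAP command (imaplib does not quote it)."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def parse_uid_search(data: List[bytes]) -> List[str]:
    """Turn a UID SEARCH payload such as [b'3 7 12'] into ['3', '7', '12']."""
    if not data or data[0] is None:
        return []
    return [uid.decode("ascii") for uid in data[0].split()]


def list_folder_uids(conn: imaplib.IMAP4, folder_name: str) -> List[str]:
    """Return every UID in a folder, opening it read-only so no flags change."""
    typ, _ = conn.select(quote_mailbox(folder_name), readonly=True)
    if typ != "OK":
        raise RuntimeError(f"cannot select folder {folder_name!r}")
    # None is skipped by imaplib, so this sends "UID SEARCH ALL".
    typ, data = conn.uid("SEARCH", None, "ALL")
    if typ != "OK":
        raise RuntimeError(f"UID SEARCH failed in {folder_name!r}")
    return parse_uid_search(data)
```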
API Endpoints

HTMX Endpoints for Processed Email Management

Get Pending Emails for a Folder
- Method: GET
- Path: /api/folders/<folder_id>/pending-emails
- Response: Dialog listing metadata (subject, date, UID) for each pending email
- Features: Email preview, individual processing buttons

Mark Email as Processed
- Method: POST
- Path: /api/folders/<folder_id>/emails/<email_uid>/process
- Action: Mark a specific email as processed
- Response: Updated dialog body with new counts

Sync Emails for a Folder
- Method: POST
- Path: /api/folders/<folder_id>/sync-emails
- Action: Sync emails for a specific folder with processed email tracking
- Response: Updated counts and sync status

Process Multiple Emails
- Method: POST
- Path: /api/folders/<folder_id>/process-emails
- Action: Process multiple emails in a folder (mark them as processed)
- Response: Success message with updated counts

Workflow Integration

Email Processing Flow

```mermaid
sequenceDiagram
    participant U as User
    participant B as Browser
    participant M as Main Blueprint
    participant I as IMAP Service
    participant P as ProcessedEmails Service
    participant DB as Database

    U->>B: Click "Sync Folders"
    B->>M: POST /api/imap/sync
    M->>I: Sync folders with processed email tracking
    I->>I: Connect to IMAP server
    I->>I: Get list of email UIDs for folder
    I->>P: sync_folder_emails(folder_name, email_uids)
    P->>DB: Create pending email records
    P->>I: Return number of UIDs added as pending
    I->>M: Return sync results
    M->>B: Update UI with pending counts
```
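The diagram can be read as a loop over folders. The sketch below is a hypothetical, framework-free rendering of that loop. `list_uids` stands in for the IMAP step and `tracker` for ProcessedEmailsService. It also calls cleanup_old_records, taken from the Cleanup requirement even though the diagram does not show it. Placing that call right after sync_folder_emails is an assumption.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class EmailTracker(Protocol):
    """The subset of ProcessedEmailsService that this sync loop relies on."""

    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int: ...

    def cleanup_old_records(self, folder_name: str, current_uids: List[str]) -> int: ...

    def get_pending_count(self, folder_name: str) -> int: ...


def sync_all_folders(
    folder_names: Iterable[str],
    list_uids: Callable[[str], List[str]],
    tracker: EmailTracker,
) -> Dict[str, int]:
    """Run the sync flow for each folder; return pending counts keyed by folder."""
    pending: Dict[str, int] = {}
    for name in folder_names:
        uids = list_uids(name)                   # I->>I: UIDs from the server
        tracker.sync_folder_emails(name, uids)   # I->>P: unseen UIDs become pending
        tracker.cleanup_old_records(name, uids)  # drop records deleted server-side
        pending[name] = tracker.get_pending_count(name)  # counts for the UI
    return pending
```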

Email Processing Status Update

```mermaid
sequenceDiagram
    participant U as User
    participant B as Browser
    participant M as Main Blueprint
    participant P as ProcessedEmails Service
    participant DB as Database

    U->>B: Trigger email processing
    B->>M: POST /api/folders/<folder_id>/process-emails
    M->>P: mark_emails_processed(folder_name, email_uids)
    P->>DB: Update email processing status
    P->>M: Return success count
    M->>B: Update UI with new counts
```

Migration Strategy

Current Implementation Status

Phase 1: Data Model Implementation ✅

Create the processed_emails table with appropriate indexes ✅
Implement the ProcessedEmailsService class ✅
Add basic CRUD operations for email processing records ✅

Phase 2: IMAP Service Integration ✅

Update IMAPService to use ProcessedEmailsService ✅
Modify folder synchronization to track email UIDs ✅
Update email count methods to consider processing status ✅

Phase 3: API and UI Integration ✅

Add API endpoints for processed email management ✅
Update UI to display accurate pending counts ✅
Add bulk processing capabilities ✅

Phase 4: Optimization and Cleanup ✅

Implement batch processing for performance ✅
Add periodic cleanup of orphaned records ✅
Optimize database queries for large datasets ✅

Security Considerations

Access Control: Ensure users can only access their own email processing records
Data Validation: Validate all email UIDs and folder names to prevent injection attacks
Rate Limiting: Implement rate limiting for email processing endpoints to prevent abuse
Data Privacy: Ensure no sensitive email content is stored in the database
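Parameterized queries, which the ORM provides, are the usual defense against SQL injection. For folder names the more specific risk is IMAP command injection: a name containing CR or LF could end one command and start another. The validators below are a hypothetical sketch of the Data Validation point. UIDs are checked against RFC 3501, which defines them as non-zero unsigned 32-bit integers. Folder names are checked against the String(255) column length.

```python
MAX_UID = 2**32 - 1        # RFC 3501: UIDs are non-zero unsigned 32-bit integers
MAX_FOLDER_NAME_LEN = 255  # matches String(255) on folder_name


def validate_email_uid(raw: str) -> str:
    """Return a canonical UID string (no leading zeros) or raise ValueError."""
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"invalid UID: {raw!r}")
    value = int(raw)
    if not 1 <= value <= MAX_UID:
        raise ValueError(f"UID out of range: {raw!r}")
    return str(value)


def validate_folder_name(raw: str) -> str:
    """Reject names that are empty, too long, or could inject extra IMAP lines."""
    if not raw or len(raw) > MAX_FOLDER_NAME_LEN:
        raise ValueError("folder name must be 1-255 characters")
    if any(ch in raw for ch in "\r\n\x00"):
        raise ValueError("folder name contains control characters")
    return raw
```

Canonicalizing UIDs (so "007" becomes "7") also keeps the (user_id, folder_name, email_uid) uniqueness rule from being bypassed by equivalent spellings.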

Performance Considerations

Database Indexing: Proper indexing on frequently queried fields
Batch Operations: Use batch operations for processing multiple emails
Memory Management: Process emails in batches to avoid memory issues with large mailboxes
Caching: Consider caching frequently accessed email processing status

Future Enhancements

Email Movement Tracking: Track when emails are moved between folders
Processing History: Maintain a history of email processing actions
Email Deduplication: Handle duplicate emails across folders
Automated Cleanup: Periodic cleanup of old or orphaned processing records
Analytics: Provide insights into email processing patterns and efficiency
