Processed Emails Feature Specification
Overview
This document outlines the specification for implementing a feature to persistently track which emails have been processed by the Email Organizer system. The goal is to maintain a record of email processing status to avoid reprocessing the same emails during synchronization and provide accurate pending email counts.
Requirements
1. Email Tracking Requirements
- Unique Email Identification: Track emails using the unique identifier (UID) provided by the IMAP server, along with the folder name and user ID. An IMAP UID is only unique within its own folder, so the UID alone cannot identify an email.
- Processing Status: Mark emails as either "pending" (unprocessed) or "processed"
- Minimal Data Storage: Store only essential information - email UID, folder, user, and processing status - not email content, subjects, or bodies
- Persistence: Maintain processing status across application restarts and synchronization cycles
- Efficient Lookup: Quickly determine which emails in a folder are pending processing
2. Synchronization Requirements
- Initial Sync: During first synchronization of a folder, all emails should be marked as "pending"
- Incremental Sync: On subsequent syncs, only emails that haven't been processed should be identified as pending
- Status Update: When an email is processed, update its status from "pending" to "processed"
- Cleanup: Remove records for emails that no longer exist on the IMAP server (optional for future enhancement)
3. Performance Requirements
- Efficient Storage: Use appropriate database indexing for fast lookups
- Minimal Memory Usage: Store only essential data to keep memory footprint low
- Batch Processing: Support batch operations for processing multiple emails efficiently
Data Model Design
ProcessedEmails Table
```mermaid
erDiagram
    USER {
        int id PK "Primary Key"
        string email "Unique, Not Null"
        string first_name "Not Null"
        string last_name "Not Null"
        string password_hash "Not Null"
        json imap_config "JSON Configuration"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }
    FOLDER {
        int id PK "Primary Key"
        int user_id FK "Foreign Key to User"
        string name "Not Null"
        text rule_text "Natural Language Rule"
        int priority "Processing Order"
        boolean organize_enabled "Default: True"
        int total_count "Default: 0"
        int pending_count "Default: 0"
        json recent_emails "JSON Array"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }
    PROCESSED_EMAIL {
        int id PK "Primary Key"
        int user_id FK "Foreign Key to User"
        int folder_id FK "Foreign Key to Folder"
        string email_uid "Not Null, IMAP Email UID"
        string folder_name "Not Null, IMAP Folder Name"
        boolean is_processed "Default: False, Processing Status"
        datetime first_seen_at "Default: UTC Now, First seen during sync"
        datetime processed_at "Nullable, When email was processed"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }
    USER ||--o{ FOLDER : "has"
    USER ||--o{ PROCESSED_EMAIL : "has"
    FOLDER ||--o{ PROCESSED_EMAIL : "has"
```
Column Specifications
| Table | Column | Data Type | Constraints | Description |
|---|---|---|---|---|
| PROCESSED_EMAIL | id | Integer | Primary Key, Autoincrement | Unique identifier for each processed email record |
| PROCESSED_EMAIL | user_id | Integer | Foreign Key to User, Not Null | Reference to the user who owns this email |
| PROCESSED_EMAIL | folder_id | Integer | Foreign Key to Folder, Not Null | Reference to the folder this email belongs to |
| PROCESSED_EMAIL | email_uid | String(255) | Not Null | Unique ID of the email from IMAP server |
| PROCESSED_EMAIL | folder_name | String(255) | Not Null | Name of the IMAP folder (for redundancy) |
| PROCESSED_EMAIL | is_processed | Boolean | Default: False | Processing status (false=pending, true=processed) |
| PROCESSED_EMAIL | first_seen_at | DateTime | Default: datetime.utcnow | First time this email was detected during sync |
| PROCESSED_EMAIL | processed_at | DateTime | Nullable | When the email was marked as processed |
| PROCESSED_EMAIL | created_at | DateTime | Default: datetime.utcnow | Record creation timestamp |
| PROCESSED_EMAIL | updated_at | DateTime | Default: datetime.utcnow, On Update | Record update timestamp |
Relationships
- User to ProcessedEmail: One-to-many relationship - each user can have multiple processed email records
- Folder to ProcessedEmail: One-to-many relationship - each folder can have multiple processed email records
- Composite Key: The combination of (user_id, folder_name, email_uid) should be unique to prevent duplicate records
Database Indexes
- Primary key index on `id`
- Foreign key indexes on `user_id` and `folder_id`
- Composite unique index on `(user_id, folder_name, email_uid)`
- Index on `folder_name` for faster folder-based queries
- Index on `is_processed` for filtering pending emails
- Index on `first_seen_at` for tracking recently added emails
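The following sketch shows the table and the indexes listed above as SQLite DDL, run through Python's standard-library `sqlite3` module. It assumes SQLite and cut-down `users` and `folders` tables (the application presumably creates the real ones through its ORM). The index names are hypothetical. SQLite already indexes an `INTEGER PRIMARY KEY`, so `id` needs no separate index. SQLite also has no `ON UPDATE` clause, so `updated_at` would have to be refreshed by the application or by a trigger.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email VARCHAR(255) NOT NULL UNIQUE
);
CREATE TABLE folders (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL REFERENCES users(id),
    name VARCHAR(255) NOT NULL,
    pending_count INTEGER NOT NULL DEFAULT 0
);
-- SQLite has no ON UPDATE: the application (or a trigger) refreshes updated_at.
CREATE TABLE processed_emails (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL REFERENCES users(id),
    folder_id INTEGER NOT NULL REFERENCES folders(id),
    email_uid VARCHAR(255) NOT NULL,
    folder_name VARCHAR(255) NOT NULL,
    is_processed BOOLEAN NOT NULL DEFAULT 0,
    first_seen_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- UTC in SQLite
    processed_at DATETIME,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Foreign key indexes
CREATE INDEX ix_processed_emails_user_id ON processed_emails (user_id);
CREATE INDEX ix_processed_emails_folder_id ON processed_emails (folder_id);
-- One record per email: the composite key from the Relationships section
CREATE UNIQUE INDEX uq_processed_emails_uid
    ON processed_emails (user_id, folder_name, email_uid);
CREATE INDEX ix_processed_emails_folder_name ON processed_emails (folder_name);
CREATE INDEX ix_processed_emails_is_processed ON processed_emails (is_processed);
CREATE INDEX ix_processed_emails_first_seen_at ON processed_emails (first_seen_at);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all tables and indexes on an open connection."""
    conn.executescript(SCHEMA)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    conn.execute("INSERT INTO folders (user_id, name) VALUES (1, 'INBOX')")
    insert = ("INSERT INTO processed_emails (user_id, folder_id, email_uid, folder_name) "
              "VALUES (1, 1, '42', 'INBOX')")
    conn.execute(insert)
    try:
        conn.execute(insert)  # same (user_id, folder_name, email_uid)
    except sqlite3.IntegrityError as exc:
        print("duplicate rejected:", exc)
```

The second insert fails with `sqlite3.IntegrityError`. That failure is how the composite unique index enforces a single record per `(user_id, folder_name, email_uid)`.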
Service Design
ProcessedEmailsService
A new service class will be responsible for managing processed email records:
```python
from typing import List


class ProcessedEmailsService:
    def __init__(self, user: "User"):
        # "User" is the application's existing user model (forward reference).
        self.user = user

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Get list of email UIDs that are pending processing in a folder."""

    def mark_email_processed(self, folder_name: str, email_uid: str) -> bool:
        """Mark an email as processed."""

    def mark_emails_processed(self, folder_name: str, email_uids: List[str]) -> int:
        """Mark multiple emails as processed in bulk."""

    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int:
        """Sync email UIDs for a folder, adding new ones as pending."""

    def get_pending_count(self, folder_name: str) -> int:
        """Get count of pending emails for a folder."""

    def cleanup_old_records(self, folder_name: str, current_uids: List[str]) -> int:
        """Remove records for emails that no longer exist in the folder."""
```
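One way to fill in these methods is sketched below with the standard-library `sqlite3` module rather than the application's ORM. The sketch makes two simplifications that are assumptions, not part of the spec:

- The service takes a connection and a plain `user_id` instead of a `User`.
- The table omits `folder_id`, `created_at` and `updated_at`.

The composite unique constraint does most of the work. `INSERT OR IGNORE` makes re-syncing idempotent, so an email that was already processed is never reset to pending.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List

# Minimal table for this sketch: folder_id, created_at and updated_at are omitted.
SCHEMA = """
CREATE TABLE IF NOT EXISTS processed_emails (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    folder_name TEXT NOT NULL,
    email_uid TEXT NOT NULL,
    is_processed INTEGER NOT NULL DEFAULT 0,
    first_seen_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    processed_at TEXT,
    UNIQUE (user_id, folder_name, email_uid)
)
"""
PENDING_WHERE = "WHERE user_id = ? AND folder_name = ? AND is_processed = 0"


def utc_now() -> str:
    # Same text format SQLite uses for CURRENT_TIMESTAMP (UTC).
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


class ProcessedEmailsService:
    def __init__(self, conn: sqlite3.Connection, user_id: int):
        self.conn = conn
        self.user_id = user_id  # every query below is scoped to this user
        conn.execute(SCHEMA)

    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int:
        """Add unseen UIDs as pending; return how many were new."""
        if not email_uids:
            return 0
        before = self.conn.total_changes
        # OR IGNORE skips UIDs that already have a record, leaving their status alone.
        self.conn.executemany(
            "INSERT OR IGNORE INTO processed_emails (user_id, folder_name, email_uid) "
            "VALUES (?, ?, ?)",
            [(self.user_id, folder_name, uid) for uid in email_uids],
        )
        self.conn.commit()
        return self.conn.total_changes - before

    def get_pending_emails(self, folder_name: str) -> List[str]:
        rows = self.conn.execute(
            f"SELECT email_uid FROM processed_emails {PENDING_WHERE} "
            "ORDER BY first_seen_at, id",
            (self.user_id, folder_name),
        )
        return [uid for (uid,) in rows]

    def get_pending_count(self, folder_name: str) -> int:
        (count,) = self.conn.execute(
            f"SELECT COUNT(*) FROM processed_emails {PENDING_WHERE}",
            (self.user_id, folder_name),
        ).fetchone()
        return count

    def mark_emails_processed(self, folder_name: str, email_uids: List[str]) -> int:
        """Flip pending records to processed; return how many changed."""
        if not email_uids:
            return 0
        before = self.conn.total_changes
        now = utc_now()
        self.conn.executemany(
            "UPDATE processed_emails SET is_processed = 1, processed_at = ? "
            f"{PENDING_WHERE} AND email_uid = ?",
            [(now, self.user_id, folder_name, uid) for uid in email_uids],
        )
        self.conn.commit()
        return self.conn.total_changes - before

    def mark_email_processed(self, folder_name: str, email_uid: str) -> bool:
        # False when the UID is unknown or was already processed.
        return self.mark_emails_processed(folder_name, [email_uid]) == 1

    def cleanup_old_records(self, folder_name: str, current_uids: List[str]) -> int:
        """Delete records whose UID is no longer in the folder; return count."""
        keep = set(current_uids)
        stored = self.conn.execute(
            "SELECT email_uid FROM processed_emails WHERE user_id = ? AND folder_name = ?",
            (self.user_id, folder_name),
        ).fetchall()
        stale = [(self.user_id, folder_name, uid) for (uid,) in stored if uid not in keep]
        if stale:
            self.conn.executemany(
                "DELETE FROM processed_emails "
                "WHERE user_id = ? AND folder_name = ? AND email_uid = ?",
                stale,
            )
            self.conn.commit()
        return len(stale)
```

The counts come from the connection's `total_changes` delta. As a result, rows skipped by `OR IGNORE` or excluded by `is_processed = 0` are not counted as new or as newly processed.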
IMAPService Integration
The existing IMAP service will be enhanced to use the ProcessedEmailsService:
```python
from typing import List, Tuple


class IMAPService:
    def __init__(self, user: "User"):
        self.user = user
        self.config = user.imap_config or {}
        self.connection = None
        # Tracks pending/processed status for this user's emails
        # (ProcessedEmailsService is the class described above).
        self.processed_emails_service = ProcessedEmailsService(user)

    def get_folder_email_count(self, folder_name: str) -> int:
        """Get the count of emails in a specific folder, considering processed status."""

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Get email UIDs that are pending processing."""

    def sync_folders(self) -> Tuple[bool, str]:
        """Sync IMAP folders with local database, tracking email processing status."""
```
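The "get list of email UIDs for folder" step of `sync_folders` could use `UID SEARCH ALL` through Python's standard-library `imaplib`, as in this sketch. It assumes an already-authenticated `IMAP4`/`IMAP4_SSL` connection, and the helper names are hypothetical. The sketch requests UIDs rather than message sequence numbers for a reason. Sequence numbers shift whenever messages are expunged, whereas UIDs stay stable for a folder as long as its UIDVALIDITY does not change.

```python
import imaplib
from typing import List


def quote_mailbox(name: str) -> str:
    """Quote a mailbox name as an IMAP quoted string.

    Handles spaces and quotes; non-ASCII names would need modified UTF-7,
    which this sketch does not implement.
    """
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def fetch_folder_uids(conn: imaplib.IMAP4, folder_name: str) -> List[str]:
    """Return every message UID in a folder as a list of strings."""
    # readonly=True sends EXAMINE, so the sync does not alter mailbox state.
    typ, _ = conn.select(quote_mailbox(folder_name), readonly=True)
    if typ != "OK":
        raise RuntimeError(f"cannot select folder {folder_name!r}")
    typ, data = conn.uid("SEARCH", None, "ALL")
    if typ != "OK":
        raise RuntimeError(f"UID SEARCH failed in {folder_name!r}")
    # data holds one bytes item of space-separated UIDs, e.g. [b"3 7 12"].
    return data[0].decode("ascii").split() if data and data[0] else []
```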
API Endpoints
New HTMX Endpoints for Processed Email Management
- Get Pending Emails for a Folder
  - Method: GET
  - Path: `/api/folders/<folder_id>/pending-emails`
  - Response: A dialog listing metadata for each pending email (subject, date, UID), with a button to preview the email (fetched from the IMAP server). Because only UIDs are stored locally, the subject and date must also be fetched from the IMAP server when the dialog is rendered.
- Mark Email as Processed
  - Method: POST
  - Path: `/api/folders/<folder_id>/emails/<email_uid>/process`
  - Action: Mark a specific email as processed
  - Response: Updated dialog body
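Both endpoints take `<folder_id>` from the URL. The access-control requirement therefore comes down to resolving that ID only among the current user's folders. The following is a framework-agnostic sketch using `sqlite3`. `handle_mark_processed` and its `(status, pending_count)` return value are hypothetical stand-ins for the real HTMX view, which would render the updated dialog body instead. Answering 404 for another user's folder avoids revealing that the folder exists.

```python
import sqlite3
from typing import Optional, Tuple

# Simplified tables for this sketch only.
SCHEMA = """
CREATE TABLE folders (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, name TEXT NOT NULL);
CREATE TABLE processed_emails (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    folder_id INTEGER NOT NULL,
    folder_name TEXT NOT NULL,
    email_uid TEXT NOT NULL,
    is_processed INTEGER NOT NULL DEFAULT 0,
    processed_at TEXT,
    UNIQUE (user_id, folder_name, email_uid)
);
"""


def handle_mark_processed(
    conn: sqlite3.Connection, user_id: int, folder_id: int, email_uid: str
) -> Tuple[int, Optional[int]]:
    """Return (HTTP status, remaining pending count) for POST .../process."""
    # Scope the folder lookup to the logged-in user: a foreign folder_id
    # behaves exactly like a missing one.
    row = conn.execute(
        "SELECT name FROM folders WHERE id = ? AND user_id = ?", (folder_id, user_id)
    ).fetchone()
    if row is None:
        return 404, None
    folder_name = row[0]
    cur = conn.execute(
        "UPDATE processed_emails SET is_processed = 1, processed_at = CURRENT_TIMESTAMP "
        "WHERE user_id = ? AND folder_name = ? AND email_uid = ?",
        (user_id, folder_name, email_uid),
    )
    if cur.rowcount == 0:
        return 404, None  # no record for this UID in the user's folder
    (pending,) = conn.execute(
        "SELECT COUNT(*) FROM processed_emails "
        "WHERE user_id = ? AND folder_name = ? AND is_processed = 0",
        (user_id, folder_name),
    ).fetchone()
    conn.commit()
    return 200, pending
```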
Workflow Integration
Email Processing Flow
```mermaid
sequenceDiagram
    participant U as User
    participant B as Browser
    participant M as Main Blueprint
    participant I as IMAP Service
    participant P as ProcessedEmails Service
    participant DB as Database
    U->>B: Click "Sync Folders"
    B->>M: POST /api/imap/sync
    M->>I: Sync folders with processed email tracking
    I->>I: Connect to IMAP server
    I->>I: Get list of email UIDs for folder
    I->>P: sync_folder_emails(folder_name, email_uids)
    P->>DB: Create pending records for new UIDs
    P->>I: Return count of newly added pending records
    I->>M: Return sync results
    M->>B: Update UI with pending counts
```
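The sync sequence above could be driven by a loop like the following sketch. To stay self-contained, it receives the UID lookup as a callable and the tracker as an object. `Tracker` is a hypothetical `Protocol` mirroring the `ProcessedEmailsService` methods named in this spec. The returned dictionary stands in for the pending counts that the Main Blueprint renders. Calling `cleanup_old_records` on every sync is an assumption, since the spec lists cleanup as optional.

```python
from typing import Callable, Dict, Iterable, List, Protocol


class Tracker(Protocol):
    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int: ...
    def cleanup_old_records(self, folder_name: str, current_uids: List[str]) -> int: ...
    def get_pending_count(self, folder_name: str) -> int: ...


def sync_folders(
    folder_names: Iterable[str],
    fetch_uids: Callable[[str], List[str]],
    tracker: Tracker,
    cleanup: bool = True,
) -> Dict[str, int]:
    """Record new UIDs as pending in every folder; return pending counts."""
    pending_counts: Dict[str, int] = {}
    for folder_name in folder_names:
        uids = fetch_uids(folder_name)  # e.g. via IMAP UID SEARCH ALL
        tracker.sync_folder_emails(folder_name, uids)
        if cleanup:
            # Optional: drop records for messages deleted or moved on the server.
            tracker.cleanup_old_records(folder_name, uids)
        pending_counts[folder_name] = tracker.get_pending_count(folder_name)
    return pending_counts
```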
Email Processing Status Update
```mermaid
sequenceDiagram
    participant U as User
    participant B as Browser
    participant M as Main Blueprint
    participant P as ProcessedEmails Service
    participant DB as Database
    U->>B: Trigger email processing
    B->>M: POST /api/folders/<folder_id>/process-emails
    M->>P: mark_emails_processed(folder_name, email_uids)
    P->>DB: Update email processing status
    P->>M: Return success count
    M->>B: Update UI with new counts
```
Migration Strategy
Phase 1: Data Model Implementation
- Create the `processed_emails` table with appropriate indexes
- Implement the `ProcessedEmailsService` class
- Add basic CRUD operations for email processing records
Phase 2: IMAP Service Integration
- Update `IMAPService` to use `ProcessedEmailsService`
- Modify folder synchronization to track email UIDs
- Update email count methods to consider processing status
Phase 3: API and UI Integration
- Add API endpoints for processed email management
- Update UI to display accurate pending counts
- Add bulk processing capabilities
Phase 4: Optimization and Cleanup
- Implement batch processing for performance
- Add periodic cleanup of orphaned records
- Optimize database queries for large datasets
Security Considerations
- Access Control: Ensure users can only access their own email processing records
- Data Validation: Validate all email UIDs and folder names before they are used in database queries or IMAP commands, to prevent injection attacks
- Rate Limiting: Implement rate limiting for email processing endpoints to prevent abuse
- Data Privacy: Ensure no sensitive email content is stored in the database
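For the data-validation requirement, one option is to check inputs against formats that IMAP already defines before they reach a query or an IMAP command. RFC 3501 defines a UID as a non-zero unsigned 32-bit integer, and the folder name has to fit the `String(255)` column. The control-character rule and the helper names are assumptions. Validation of this kind complements parameterized queries rather than replacing them.

```python
import re

_UID_RE = re.compile(r"[1-9][0-9]{0,9}")  # ASCII digits only, no leading zero
MAX_UID = 2**32 - 1  # RFC 3501: non-zero unsigned 32-bit integer
MAX_FOLDER_NAME_LEN = 255  # matches the String(255) folder_name column


def validate_email_uid(value: str) -> str:
    """Return the UID unchanged, or raise ValueError if it is not a valid IMAP UID."""
    if not _UID_RE.fullmatch(value) or int(value) > MAX_UID:
        raise ValueError(f"invalid IMAP UID: {value!r}")
    return value


def validate_folder_name(value: str) -> str:
    """Return the folder name unchanged, or raise ValueError."""
    if not value or len(value) > MAX_FOLDER_NAME_LEN:
        raise ValueError("folder name must be 1 to 255 characters")
    # Reject control characters: CR/LF would break IMAP command framing.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in value):
        raise ValueError("folder name contains control characters")
    return value
```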
Performance Considerations
- Database Indexing: Proper indexing on frequently queried fields
- Batch Operations: Use batch operations for processing multiple emails
- Memory Management: Process emails in batches to avoid memory issues with large mailboxes
- Caching: Consider caching frequently accessed email processing status
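In practice, the batching points above might look like this: split the UIDs into fixed-size chunks and issue one parameterized `UPDATE ... IN (...)` per chunk. This bounds memory use, transaction length, and the number of bound parameters per statement (SQLite historically limited a statement to 999 host parameters). The sketch uses `sqlite3`. The chunk size of 500 and the helper names are assumptions to tune for the actual database.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List

CHUNK_SIZE = 500  # assumption: comfortably under SQLite's host-parameter limit


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items without loading them all."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def mark_processed_in_batches(
    conn: sqlite3.Connection,
    user_id: int,
    folder_name: str,
    email_uids: Iterable[str],
    size: int = CHUNK_SIZE,
) -> int:
    """Mark pending UIDs as processed, one UPDATE per chunk; return rows changed."""
    updated = 0
    for batch in chunked(email_uids, size):
        # Only "?" placeholders are interpolated; the UIDs themselves are bound.
        placeholders = ",".join("?" * len(batch))
        cur = conn.execute(
            "UPDATE processed_emails SET is_processed = 1, "
            "processed_at = CURRENT_TIMESTAMP "
            "WHERE user_id = ? AND folder_name = ? AND is_processed = 0 "
            f"AND email_uid IN ({placeholders})",
            (user_id, folder_name, *batch),
        )
        updated += cur.rowcount
        conn.commit()  # one short transaction per chunk
    return updated
```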
Future Enhancements
- Email Movement Tracking: Track when emails are moved between folders
- Processing History: Maintain a history of email processing actions
- Email Deduplication: Handle duplicate emails across folders
- Automated Cleanup: Periodic cleanup of old or orphaned processing records
- Analytics: Provide insights into email processing patterns and efficiency