# Processed Emails Feature Specification

## Overview

This document specifies a feature that persistently tracks which emails have been processed by the Email Organizer system. The goal is to keep a record of each email's processing status so that synchronization does not reprocess the same emails and pending email counts are accurate.

## Requirements

### 1. Email Tracking Requirements

- **Unique Email Identification**: Track emails using a unique identifier (UID) provided by the IMAP server, along with the folder name and user ID
- **Processing Status**: Mark emails as either "pending" (unprocessed) or "processed"
- **Minimal Data Storage**: Store only essential information - email UID, folder, user, and processing status - not email content, subjects, or bodies
- **Persistence**: Maintain processing status across application restarts and synchronization cycles
- **Efficient Lookup**: Quickly determine which emails in a folder are pending processing

### 2. Synchronization Requirements

- **Initial Sync**: During the first synchronization of a folder, all emails should be marked as "pending"
- **Incremental Sync**: On subsequent syncs, newly discovered emails are added as "pending", and only emails that have not yet been processed are reported as pending
- **Status Update**: When an email is processed, update its status from "pending" to "processed"
- **Cleanup**: Remove records for emails that no longer exist on the IMAP server (optional, for a future enhancement)

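The synchronization rules above reduce to set arithmetic over UIDs. The following is a minimal sketch, assuming the caller already holds the UIDs currently on the server and the UIDs already recorded locally. The names `SyncPlan` and `plan_sync` are hypothetical and not part of this specification.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class SyncPlan:
    """Outcome of comparing server UIDs with tracked UIDs (hypothetical type)."""
    new_pending: Set[str]  # on the server but not tracked yet: record as "pending"
    stale: Set[str]        # tracked locally but gone from the server: cleanup candidates


def plan_sync(server_uids: Iterable[str], tracked_uids: Iterable[str]) -> SyncPlan:
    """Decide which records to create and which to remove for one folder."""
    server = set(server_uids)
    tracked = set(tracked_uids)
    return SyncPlan(new_pending=server - tracked, stale=tracked - server)


# Initial sync: nothing is tracked yet, so every UID becomes pending.
initial = plan_sync(["101", "102", "103"], [])

# Incremental sync: "104" is new and "101" has disappeared from the server.
incremental = plan_sync(["102", "103", "104"], ["101", "102", "103"])
```

Records that are already tracked keep whatever status they have, so a processed email is never reset to pending by a later sync.
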
### 3. Performance Requirements

- **Efficient Storage**: Use appropriate database indexing for fast lookups
- **Minimal Memory Usage**: Store only essential data to keep memory footprint low
- **Batch Processing**: Support batch operations for processing multiple emails efficiently

## Data Model Design

### ProcessedEmails Table

```mermaid
erDiagram
    USER {
        int id PK "Primary Key"
        string email "Unique, Not Null"
        string first_name "Not Null"
        string last_name "Not Null"
        string password_hash "Not Null"
        json imap_config "JSON Configuration"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }

    FOLDER {
        int id PK "Primary Key"
        int user_id FK "Foreign Key to User"
        string name "Not Null"
        text rule_text "Natural Language Rule"
        int priority "Processing Order"
        boolean organize_enabled "Default: True"
        int total_count "Default: 0"
        int pending_count "Default: 0"
        json recent_emails "JSON Array"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }

    PROCESSED_EMAIL {
        int id PK "Primary Key"
        int user_id FK "Foreign Key to User"
        int folder_id FK "Foreign Key to Folder"
        string email_uid "Not Null, IMAP Email UID"
        string folder_name "Not Null, IMAP Folder Name"
        boolean is_processed "Default: False, Processing Status"
        datetime first_seen_at "Default: UTC Now, First seen during sync"
        datetime processed_at "Nullable, When email was processed"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }

    USER ||--o{ FOLDER : "has"
    USER ||--o{ PROCESSED_EMAIL : "has"
    FOLDER ||--o{ PROCESSED_EMAIL : "has"
```

### Column Specifications

| Table | Column | Data Type | Constraints | Description |
|-------|--------|-----------|-------------|-------------|
| PROCESSED_EMAIL | id | Integer | Primary Key, Autoincrement | Unique identifier for each processed email record |
| PROCESSED_EMAIL | user_id | Integer | Foreign Key to User, Not Null | Reference to the user who owns this email |
| PROCESSED_EMAIL | folder_id | Integer | Foreign Key to Folder, Not Null | Reference to the folder this email belongs to |
| PROCESSED_EMAIL | email_uid | String(255) | Not Null | Unique ID of the email from IMAP server |
| PROCESSED_EMAIL | folder_name | String(255) | Not Null | Name of the IMAP folder (for redundancy) |
| PROCESSED_EMAIL | is_processed | Boolean | Default: False | Processing status (false=pending, true=processed) |
| PROCESSED_EMAIL | first_seen_at | DateTime | Default: datetime.utcnow | First time this email was detected during sync |
| PROCESSED_EMAIL | processed_at | DateTime | Nullable | When the email was marked as processed |
| PROCESSED_EMAIL | created_at | DateTime | Default: datetime.utcnow | Record creation timestamp |
| PROCESSED_EMAIL | updated_at | DateTime | Default: datetime.utcnow, On Update | Record update timestamp |

### Relationships

- **User to ProcessedEmail**: One-to-many relationship - each user can have multiple processed email records
- **Folder to ProcessedEmail**: One-to-many relationship - each folder can have multiple processed email records
- **Composite Unique Constraint**: The combination of (user_id, folder_name, email_uid) must be unique to prevent duplicate records; `id` remains the primary key

### Database Indexes

- Primary key index on `id`
- Foreign key indexes on `user_id` and `folder_id`
- Composite unique index on `(user_id, folder_name, email_uid)`
- Index on `folder_name` for faster folder-based queries
- Index on `is_processed` for filtering pending emails
- Index on `first_seen_at` for tracking recently added emails

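To make the constraints and indexes above concrete, here is a rough sketch using Python's standard `sqlite3` module. It is an illustration under assumptions, not the project's actual schema: the index names are invented, foreign keys are omitted because the `users` and `folders` tables are not created here, and `created_at` / `updated_at` are left out for brevity.

```python
import sqlite3

# Hypothetical DDL mirroring the column table; index names are illustrative only.
SCHEMA = """
CREATE TABLE processed_emails (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id       INTEGER NOT NULL,
    folder_id     INTEGER NOT NULL,
    email_uid     TEXT    NOT NULL,
    folder_name   TEXT    NOT NULL,
    is_processed  INTEGER NOT NULL DEFAULT 0,
    first_seen_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    processed_at  TEXT,
    UNIQUE (user_id, folder_name, email_uid)
);
CREATE INDEX ix_processed_emails_user_id ON processed_emails (user_id);
CREATE INDEX ix_processed_emails_folder_id ON processed_emails (folder_id);
CREATE INDEX ix_processed_emails_folder_name ON processed_emails (folder_name);
CREATE INDEX ix_processed_emails_is_processed ON processed_emails (is_processed);
CREATE INDEX ix_processed_emails_first_seen_at ON processed_emails (first_seen_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

insert = (
    "INSERT INTO processed_emails (user_id, folder_id, email_uid, folder_name) "
    "VALUES (?, ?, ?, ?)"
)
row = (1, 10, "101", "INBOX")
conn.execute(insert, row)

# A second insert with the same (user_id, folder_name, email_uid) violates the
# composite unique constraint and is rejected by the database.
try:
    conn.execute(insert, row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

record_count = conn.execute("SELECT COUNT(*) FROM processed_emails").fetchone()[0]
```

Enforcing uniqueness in the database rather than in application code means a repeated sync cannot create duplicate records even if two syncs overlap.
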
## Service Design

### ProcessedEmailsService

A new service class will be responsible for managing processed email records:

```python
from typing import List

# User is the application's existing user model.


class ProcessedEmailsService:
    def __init__(self, user: User):
        self.user = user

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Get list of email UIDs that are pending processing in a folder."""

    def mark_email_processed(self, folder_name: str, email_uid: str) -> bool:
        """Mark an email as processed."""

    def mark_emails_processed(self, folder_name: str, email_uids: List[str]) -> int:
        """Mark multiple emails as processed in bulk."""

    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int:
        """Sync email UIDs for a folder, adding new ones as pending."""

    def get_pending_count(self, folder_name: str) -> int:
        """Get count of pending emails for a folder."""

    def cleanup_old_records(self, folder_name: str, current_uids: List[str]) -> int:
        """Remove records for emails that no longer exist in the folder."""
```

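The interface above leaves the method bodies open. As one possible reading of the intended semantics, the sketch below implements three of the methods on top of the standard `sqlite3` module. The class name `ProcessedEmailsSketch` is hypothetical, it takes a plain `user_id` instead of a `User` object, and the real service may well be built on a different persistence layer.

```python
import sqlite3
from typing import List


class ProcessedEmailsSketch:
    """Hypothetical stand-in for ProcessedEmailsService, backed by sqlite3."""

    def __init__(self, conn: sqlite3.Connection, user_id: int):
        self.conn = conn
        self.user_id = user_id
        conn.execute(
            "CREATE TABLE IF NOT EXISTS processed_emails ("
            " user_id INTEGER NOT NULL, folder_name TEXT NOT NULL,"
            " email_uid TEXT NOT NULL, is_processed INTEGER NOT NULL DEFAULT 0,"
            " processed_at TEXT,"
            " UNIQUE (user_id, folder_name, email_uid))"
        )

    def sync_folder_emails(self, folder_name: str, email_uids: List[str]) -> int:
        """Insert unseen UIDs as pending; return how many rows were added."""
        before = self.conn.total_changes
        self.conn.executemany(
            "INSERT OR IGNORE INTO processed_emails (user_id, folder_name, email_uid)"
            " VALUES (?, ?, ?)",
            [(self.user_id, folder_name, uid) for uid in email_uids],
        )
        return self.conn.total_changes - before

    def mark_emails_processed(self, folder_name: str, email_uids: List[str]) -> int:
        """Flip pending rows to processed; return how many rows changed."""
        before = self.conn.total_changes
        self.conn.executemany(
            "UPDATE processed_emails"
            " SET is_processed = 1, processed_at = CURRENT_TIMESTAMP"
            " WHERE user_id = ? AND folder_name = ? AND email_uid = ?"
            " AND is_processed = 0",
            [(self.user_id, folder_name, uid) for uid in email_uids],
        )
        return self.conn.total_changes - before

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Return UIDs in the folder that are still pending."""
        rows = self.conn.execute(
            "SELECT email_uid FROM processed_emails"
            " WHERE user_id = ? AND folder_name = ? AND is_processed = 0"
            " ORDER BY email_uid",
            (self.user_id, folder_name),
        )
        return [uid for (uid,) in rows]


service = ProcessedEmailsSketch(sqlite3.connect(":memory:"), user_id=1)
added_first = service.sync_folder_emails("INBOX", ["101", "102", "103"])
added_again = service.sync_folder_emails("INBOX", ["102", "103", "104"])
marked = service.mark_emails_processed("INBOX", ["101", "102"])
pending = service.get_pending_emails("INBOX")
```

Every query filters on `user_id`, which is also how the sketch keeps one user's records invisible to another.
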
### IMAPService Integration

The existing IMAP service will be enhanced to use the ProcessedEmailsService:

```python
from typing import List, Tuple


class IMAPService:
    def __init__(self, user: User):
        self.user = user
        self.config = user.imap_config or {}
        self.connection = None
        self.processed_emails_service = ProcessedEmailsService(user)

    def get_folder_email_count(self, folder_name: str) -> int:
        """Get the count of emails in a specific folder, considering processed status."""

    def get_pending_emails(self, folder_name: str) -> List[str]:
        """Get email UIDs that are pending processing."""

    def sync_folders(self) -> Tuple[bool, str]:
        """Sync IMAP folders with local database, tracking email processing status."""
```

## API Endpoints

### New HTMX Endpoints for Processed Email Management

1. **Get Pending Emails for a Folder**
   - Method: GET
   - Path: `/api/folders/<folder_id>/pending-emails`
   - Response: A dialog listing metadata for each pending email (subject, date, UID), with a button to preview the email (fetched from the IMAP server)

2. **Mark Email as Processed**
   - Method: POST
   - Path: `/api/folders/<folder_id>/emails/<email_uid>/process`
   - Action: Mark a specific email as processed
   - Response: Updated dialog body

## Workflow Integration

### Email Processing Flow

```mermaid
sequenceDiagram
    participant U as User
    participant B as Browser
    participant M as Main Blueprint
    participant I as IMAP Service
    participant P as ProcessedEmails Service
    participant DB as Database

    U->>B: Click "Sync Folders"
    B->>M: POST /api/imap/sync
    M->>I: Sync folders with processed email tracking
    I->>I: Connect to IMAP server
    I->>I: Get list of email UIDs for folder
    I->>P: sync_folder_emails(folder_name, email_uids)
    P->>DB: Create pending email records
    P->>I: Return count of newly added pending emails
    I->>M: Return sync results
    M->>B: Update UI with pending counts
```

### Email Processing Status Update

```mermaid
sequenceDiagram
    participant U as User
    participant B as Browser
    participant M as Main Blueprint
    participant P as ProcessedEmails Service
    participant DB as Database

    U->>B: Trigger email processing
    B->>M: POST /api/folders/<folder_id>/process-emails
    M->>P: mark_emails_processed(folder_name, email_uids)
    P->>DB: Update email processing status
    P->>M: Return success count
    M->>B: Update UI with new counts
```

## Migration Strategy

### Phase 1: Data Model Implementation
1. Create the `processed_emails` table with appropriate indexes
2. Implement the `ProcessedEmailsService` class
3. Add basic CRUD operations for email processing records

### Phase 2: IMAP Service Integration
1. Update `IMAPService` to use `ProcessedEmailsService`
2. Modify folder synchronization to track email UIDs
3. Update email count methods to consider processing status

### Phase 3: API and UI Integration
1. Add API endpoints for processed email management
2. Update UI to display accurate pending counts
3. Add bulk processing capabilities

### Phase 4: Optimization and Cleanup
1. Implement batch processing for performance
2. Add periodic cleanup of orphaned records
3. Optimize database queries for large datasets

## Security Considerations

1. **Access Control**: Ensure users can only access their own email processing records
2. **Data Validation**: Validate all email UIDs and folder names to prevent injection attacks
3. **Rate Limiting**: Implement rate limiting for email processing endpoints to prevent abuse
4. **Data Privacy**: Ensure no sensitive email content is stored in the database

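As an illustration of the data validation item, the sketch below checks UIDs and folder names before they reach a query. It assumes `email_uid` holds the numeric IMAP UID, which the IMAP protocol defines as a non-zero unsigned 32-bit integer. The function names and the folder-name rules are hypothetical, and validation of this kind complements parameterized queries rather than replacing them.

```python
import re

MAX_FOLDER_NAME_LENGTH = 255  # matches the String(255) column in the data model
_UID_PATTERN = re.compile(r"[1-9][0-9]{0,9}")  # decimal digits, no leading zero


def is_valid_email_uid(email_uid: str) -> bool:
    """Accept only decimal strings that fit a non-zero unsigned 32-bit UID."""
    if not _UID_PATTERN.fullmatch(email_uid):
        return False
    return int(email_uid) <= 0xFFFFFFFF


def is_valid_folder_name(folder_name: str) -> bool:
    """Reject empty names, over-long names, and names with control characters."""
    if not folder_name or len(folder_name) > MAX_FOLDER_NAME_LENGTH:
        return False
    return not any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in folder_name)
```

An endpoint could call both checks on the `<folder_id>` lookup result and the `<email_uid>` path segment and return a 400 response when either fails.
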
## Performance Considerations

1. **Database Indexing**: Index the columns used in frequent lookups (`user_id`, `folder_name`, `is_processed`)
2. **Batch Operations**: Use batch operations for processing multiple emails
3. **Memory Management**: Process emails in batches to avoid memory issues with large mailboxes
4. **Caching**: Consider caching frequently accessed email processing status

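Batching can be sketched with a small helper that splits a UID list into fixed-size slices, so each database call or IMAP fetch handles a bounded number of emails. The helper name `chunked` and the batch size used here are illustrative assumptions, not values from this specification.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Seven UIDs split into batches of three: the last batch is shorter.
uids = [str(n) for n in range(1, 8)]
batches = list(chunked(uids, 3))
```

Each batch could then be passed to `mark_emails_processed` in turn, which keeps memory use and transaction size bounded for large mailboxes.
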
## Future Enhancements

1. **Email Movement Tracking**: Track when emails are moved between folders
2. **Processing History**: Maintain a history of email processing actions
3. **Email Deduplication**: Handle duplicate emails across folders
4. **Automated Cleanup**: Periodic cleanup of old or orphaned processing records
5. **Analytics**: Provide insights into email processing patterns and efficiency