email-organizer/docs/design/data-model.md
2025-08-06 21:30:33 -07:00

Email Organizer Data Model

Overview

This document describes the data model for the Email Organizer application, including entities, attributes, relationships, and constraints. The system uses PostgreSQL with SQLAlchemy ORM for data persistence.

Entity Relationship Diagram

erDiagram
    USER {
        int id PK "Primary Key"
        string first_name "Not Null"
        string last_name "Not Null"
        string email "Unique, Not Null"
        string password_hash "Not Null"
        json imap_config "JSON Configuration"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }
    
    FOLDER {
        int id PK "Primary Key"
        int user_id FK "Foreign Key to User"
        string name "Not Null"
        text rule_text "Natural Language Rule"
        int priority "Processing Order"
        boolean organize_enabled "Default: True"
        string folder_type "Default: 'destination'"
        int total_count "Default: 0"
        int pending_count "Default: 0"
        int emails_count "Default: 0"
        json recent_emails "JSON Array"
        datetime created_at "Default: UTC Now"
        datetime updated_at "Default: UTC Now, On Update"
    }
    
    USER ||--o{ FOLDER : "has"
    
    %% User-Folder relationship is one-to-many: each user can have multiple folders

Entities

User Entity

The User entity stores account information and authentication data for each user.

Attributes

| Column Name | Data Type | Constraints | Description |
|---|---|---|---|
| id | Integer | Primary Key, Autoincrement | Unique identifier for each user |
| first_name | String(255) | Not Null | User's first name |
| last_name | String(255) | Not Null | User's last name |
| email | String(255) | Unique, Not Null | User's email address (login identifier) |
| password_hash | String(2048) | Not Null | Hashed password for authentication |
| imap_config | JSON | Nullable | IMAP server configuration settings |
| created_at | DateTime | Default: datetime.utcnow | Timestamp of account creation |
| updated_at | DateTime | Default: datetime.utcnow, On Update | Timestamp of last update |

Relationships

  • One-to-Many: Each User can have multiple Folder instances
  • No self-referencing relationship: User rows do not reference other User rows

Business Rules

  • Email must be unique across all users
  • Password is stored as a hash, never in plain text
  • IMAP configuration is stored as JSON for flexibility

Folder Entity

The Folder entity stores email organization rules and metadata for each user's email folders.

Attributes

| Column Name | Data Type | Constraints | Description |
|---|---|---|---|
| id | Integer | Primary Key, Autoincrement | Unique identifier for each folder |
| user_id | Integer | Foreign Key to User, Not Null | Reference to the owning user |
| name | String(255) | Not Null | Display name of the folder |
| rule_text | Text | Nullable | Natural language description of the folder rule |
| priority | Integer | Nullable | Processing order (0=normal, 1=high) |
| organize_enabled | Boolean | Default: True | Whether the organization rule is active |
| folder_type | String(20) | Default: 'destination' | Folder type: 'tidy', 'destination', or 'ignore' |
| total_count | Integer | Default: 0 | Total number of emails in the folder |
| pending_count | Integer | Default: 0 | Number of emails waiting to be processed |
| emails_count | Integer | Default: 0 | Number of emails moved to this destination folder |
| recent_emails | JSON | Default: [] | Array of recent email metadata |
| created_at | DateTime | Default: datetime.utcnow | Timestamp of folder creation |
| updated_at | DateTime | Default: datetime.utcnow, On Update | Timestamp of last update |

Relationships

  • Many-to-One: Each Folder belongs to one User
  • No self-referencing relationship: Folder rows do not reference other Folder rows

Business Rules

  • Each folder must belong to a user
  • Folder name must be unique per user
  • Rule text can be null (for manually created folders)
  • Priority values: 0 (normal), 1 (high priority)
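  • Folder types:
    • 'tidy': Folders containing emails to be processed (e.g., Inbox)
    • 'destination': Folders that are targets for email organization (default)
    • 'ignore': Folders that are stored but neither scanned nor used as targets (e.g., Archive, Spam, Drafts)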
  • Recent emails array stores JSON objects with subject and date information

Data Constraints

Primary Keys

  • User.id: Integer, auto-incrementing
  • Folder.id: Integer, auto-incrementing

Foreign Keys

  • Folder.user_id: References User.id with ON DELETE CASCADE

Unique Constraints

  • User.email: Ensures no duplicate email addresses
  • Composite unique constraint on (Folder.user_id, Folder.name) to prevent duplicate folder names per user

Not Null Constraints

  • User.first_name, User.last_name, User.email, User.password_hash
  • Folder.user_id, Folder.name

Default Values

  • User.created_at, User.updated_at: Current UTC timestamp
  • Folder.created_at, Folder.updated_at: Current UTC timestamp
  • Folder.organize_enabled: True
  • Folder.folder_type: 'destination'
  • Folder.total_count, Folder.pending_count, Folder.emails_count: 0
  • Folder.recent_emails: Empty array
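The constraints and defaults listed in this section can be tried out in a throwaway database. The sketch below uses the standard-library `sqlite3` module in place of PostgreSQL and SQLAlchemy so that it runs without extra dependencies; the table names `users` and `folders`, the helper `open_db`, and the SQLite column types are assumptions. The "On Update" behaviour of `updated_at` is not expressed here, since this sketch assumes that value would be set by the application layer on write.

```python
import sqlite3

# Assumed table names and SQLite types; the real schema lives in PostgreSQL.
SCHEMA = """
CREATE TABLE users (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    first_name    TEXT NOT NULL,
    last_name     TEXT NOT NULL,
    email         TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL,
    imap_config   TEXT,
    created_at    TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at    TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE folders (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id          INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    name             TEXT NOT NULL,
    rule_text        TEXT,
    priority         INTEGER,
    organize_enabled INTEGER DEFAULT 1,
    folder_type      TEXT DEFAULT 'destination',
    total_count      INTEGER DEFAULT 0,
    pending_count    INTEGER DEFAULT 0,
    emails_count     INTEGER DEFAULT 0,
    recent_emails    TEXT DEFAULT '[]',
    created_at       TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at       TEXT DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (user_id, name)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys (and ON DELETE CASCADE) only when enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With this schema, inserting a second folder with the same `user_id` and `name` raises `sqlite3.IntegrityError`, and deleting a user removes that user's folders through the cascade.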

JSON Data Structures

IMAP Configuration

The imap_config field stores JSON with the following structure:

{
  "server": "imap.gmail.com",
  "port": 993,
  "username": "user@example.com",
  "password": "app-specific-password",
  "use_ssl": true,
  "use_tls": false,
  "connection_timeout": 30
}
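Because the column is free-form JSON, it can help to check the structure when it is read. The following is a minimal sketch of such a check; the class name `ImapConfig`, the choice of which keys are required, and the default values (taken from the example above) are assumptions rather than documented behaviour.

```python
import json
from dataclasses import dataclass

# Assumed to be mandatory; the document does not say which keys are optional.
REQUIRED_KEYS = {"server", "username", "password"}


@dataclass(frozen=True)
class ImapConfig:
    server: str
    username: str
    password: str
    port: int = 993
    use_ssl: bool = True
    use_tls: bool = False
    connection_timeout: int = 30

    @classmethod
    def from_json(cls, raw: str) -> "ImapConfig":
        """Parse an imap_config JSON string, rejecting missing required keys."""
        data = json.loads(raw)
        missing = REQUIRED_KEYS - set(data)
        if missing:
            raise ValueError(f"imap_config is missing keys: {sorted(missing)}")
        # Unknown keys raise TypeError here, which surfaces typos early.
        return cls(**data)
```

A typed wrapper like this keeps the flexibility of the JSON column while giving the rest of the code named attributes instead of dictionary lookups.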

Recent Emails

The recent_emails field stores an array of JSON objects:

[
  {
    "subject": "Order Confirmation",
    "date": "2023-11-15T10:30:00Z"
  },
  {
    "subject": "Meeting Reminder",
    "date": "2023-11-14T14:45:00Z"
  }
]
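One way to maintain this array is to prepend each new entry and trim the list to a fixed length. The sketch below assumes newest-first ordering (as in the example above) and a cap of 10 entries; neither the cap nor the helper name `add_recent_email` comes from the document.

```python
import json
from datetime import datetime, timezone

MAX_RECENT = 10  # assumed cap; the document does not state a limit


def add_recent_email(recent_json: str, subject: str, received: datetime) -> str:
    """Prepend one entry to a recent_emails JSON array and trim it to MAX_RECENT.

    `received` is expected to be timezone-aware; it is converted to UTC and
    written in the same 'YYYY-MM-DDTHH:MM:SSZ' form as the example above.
    """
    entries = json.loads(recent_json or "[]")
    stamp = received.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entries.insert(0, {"subject": subject, "date": stamp})
    return json.dumps(entries[:MAX_RECENT])
```

Trimming on write keeps the column small, so reading a folder never has to load an unbounded list.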

Database Indexes

Current Indexes

  • Primary key indexes on User.id and Folder.id
  • Foreign key index on Folder.user_id
  • Index on User.email for faster login lookups
  • Composite index on (user_id, name) for folder uniqueness checks
  • Index on Folder.priority for filtering by priority
  • Index on Folder.organize_enabled for active/inactive filtering

Data Migration History

Migration Files

  1. Initial Migration (migrations/versions/02a7c13515a4_initial.py)

    • Created basic User and Folder tables
    • Established primary keys and foreign keys
  2. Add Name Fields (migrations/versions/28e8e0be0355_add_first_name_last_name_and_timestamp_.py)

    • Added first_name and last_name columns to User table
    • Added created_at and updated_at timestamps
  3. Add Email Count Fields (migrations/versions/a3ad1b9a0e5f_add_email_count_fields_to_folders.py)

    • Added total_count and pending_count columns to Folder table
    • Added organize_enabled boolean flag
  4. Add Recent Emails Field (migrations/versions/9a88c7e94083_add_recent_emails_field_to_folders_table.py)

    • Added recent_emails JSON column to Folder table
    • Default value set to empty array
  5. Add Toggle Feature (migrations/versions/f8ba65458ba2_adding_toggle.py)

    • Added organize_enabled toggle functionality
    • Enhanced folder management features

Performance Considerations

  1. User Authentication

    • Index on email column for fast login lookups
    • Password hash comparison is done in application code
  2. Folder Operations

    • Foreign key index on user_id for efficient filtering
    • Consider pagination for users with many folders
  3. IMAP Sync Operations

    • Batch updates for email counts
    • JSON operations for recent emails metadata
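The batch update mentioned under IMAP Sync Operations could look roughly like the sketch below. It uses the standard-library `sqlite3` module as a stand-in for the PostgreSQL connection, and the table name `folders` and the function name `update_counts` are assumptions; the point is only that all count changes from one sync pass go through a single statement in a single transaction.

```python
import sqlite3
from typing import Iterable, Tuple

Counts = Tuple[int, int, int]  # (folder_id, total_count, pending_count)


def update_counts(conn: sqlite3.Connection, counts: Iterable[Counts]) -> int:
    """Apply all count changes in one transaction; return the number of rows updated."""
    # Reorder each tuple to match the placeholder order in the statement.
    params = [(total, pending, folder_id) for folder_id, total, pending in counts]
    with conn:  # commits on success, rolls back if any update fails
        cur = conn.executemany(
            "UPDATE folders SET total_count = ?, pending_count = ? WHERE id = ?",
            params,
        )
    return cur.rowcount
```

Grouping the updates this way avoids one commit per folder, which matters most for accounts with many folders.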

Folder Types

The system supports three distinct types of folders, each with different purposes and behaviors:

Tidy Folders

Folders with folder_type = 'tidy' are source folders that contain emails waiting to be processed and organized.

Characteristics:

  • Display pending and processed email counts
  • Can have organization rules enabled/disabled
  • Support viewing pending emails
  • Example: Inbox folder

UI Representation:

  • Shows "pending count" and "processed count" badges
  • Includes "View Pending" button if there are pending emails
  • May include priority indicators

Destination Folders

Folders with folder_type = 'destination' are target folders where emails are moved from other folders during organization.

Characteristics:

  • Display count of emails moved to this folder
  • Typically don't have organization rules (or they're ignored)
  • Focus on showing how many emails have been organized into them
  • Example: "Projects", "Finance", "Personal" folders

UI Representation:

  • Shows "emails count" badge
  • Simpler interface without pending/processed indicators
  • Focus on folder management and viewing contents

Ignore Folders

Folders with folder_type = 'ignore' are stored in the database but are neither scanned for tidying nor used as destinations.

Characteristics:

  • Hidden by default in the user interface
  • Not processed by AI for organization
  • No organization rules specified
  • emails_count is reset to 0 when a folder is changed to this type
  • Example: Archive, Spam, Drafts folders

UI Representation:

  • Hidden by default unless "Show Hidden" checkbox is checked
  • When visible, shows minimal information
  • No action buttons for organization or processing

Folder Type Determination

Folder types are determined as follows:

  • During IMAP synchronization:
    • First step: the IMAP connection is tested
    • Second step: a folder type selection modal lists the folders in a table
    • Default folder types:
      • Inbox: Tidy
      • Archive/Spam/Drafts: Ignore
      • All others: Destination
  • Manually created folders default to 'destination'
  • Folder type can be changed through the user interface
  • When changing to 'ignore', emails_count is reset to 0
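The defaults and the reset rule above can be sketched as two small functions. The case-insensitive exact match on folder names, the in-memory `Folder` stand-in, and the function names are assumptions for illustration; the actual application may match IMAP folder names differently.

```python
from dataclasses import dataclass

VALID_TYPES = {"tidy", "destination", "ignore"}
# Names that default to 'ignore' during IMAP synchronization.
IGNORED_NAMES = {"archive", "spam", "drafts"}


def default_folder_type(imap_name: str) -> str:
    """Suggest a folder type for a folder discovered during IMAP sync."""
    name = imap_name.strip().lower()  # assumed: case-insensitive exact match
    if name == "inbox":
        return "tidy"
    if name in IGNORED_NAMES:
        return "ignore"
    return "destination"


@dataclass
class Folder:
    """Minimal stand-in for the Folder entity, holding only the fields used here."""
    name: str
    folder_type: str = "destination"
    emails_count: int = 0


def change_folder_type(folder: Folder, new_type: str) -> None:
    """Change the type and reset emails_count when the folder becomes ignored."""
    if new_type not in VALID_TYPES:
        raise ValueError(f"unknown folder type: {new_type!r}")
    folder.folder_type = new_type
    if new_type == "ignore":
        folder.emails_count = 0
```

For example, `default_folder_type("INBOX")` yields `'tidy'` and `default_folder_type("Projects")` yields `'destination'`, matching the defaults listed above.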

Future Data Model Considerations

Potential Enhancements

  1. Email Entity

    • Store email metadata for better analytics
    • Track email movement between folders
  2. Rule Engine

    • Store parsed rule structures for better processing
    • Version control for rule changes
  3. User Preferences

    • Additional customization options
    • UI preference storage
  4. Audit Log

    • Track changes to user data
    • Monitor folder operations