Compare commits
4 Commits
sales-clea
...
98a3e0dda6
| Author | SHA1 | Date | |
|---|---|---|---|
| 98a3e0dda6 | |||
| f4366fe98e | |||
| d95e24a1d7 | |||
| 37351e5f92 |
BIN
dev-resources/INVOICE - 03881260.pdf
Executable file
Binary file not shown.
229
docs/plans/2026-02-07-feat-add-invoice-template-03881260-plan.md
Normal file
@@ -0,0 +1,229 @@
---
title: Add New Invoice Template for Produce Distributor (Invoice 03881260)
type: feat
date: 2026-02-07
status: completed
---

# Add New Invoice Template for Produce Distributor (Invoice 03881260)

**Status:** ✅ Completed

**Summary:** Successfully implemented a new PDF parsing template for Bonanza Produce invoices. All tests pass.

## Overview

Implement a new PDF parsing template for a produce/food distributor invoice type. The invoice originates from a distributor with multiple locations (South Lake Tahoe, Sparks NV, Elko NV) and serves customers like "NICK THE GREEK".

## Problem Statement / Motivation

Currently, invoices from this produce distributor cannot be automatically parsed, requiring manual data entry. The invoice has a unique layout with multiple warehouse locations and specific formatting that doesn't match existing templates.

## Proposed Solution

Add a new template entry to `src/clj/auto_ap/parse/templates.clj` for **Bonanza Produce** with regex patterns to extract:
- Invoice number
- Date (MM/dd/yy format)
- Customer identifier (including address for disambiguation)
- Total amount

## Technical Considerations

### Vendor Identification Strategy

**Vendor Name:** Bonanza Produce

Based on the PDF analysis, use these unique identifiers as keywords:
- `"3717 OSGOOD AVE"` - Unique South Lake Tahoe address
- `"SPARKS, NEVADA"` - Primary warehouse location
- `"1925 FREEPORT BLVD"` - Sparks warehouse address

**Recommended keyword combination:** `[#"3717 OSGOOD AVE" #"SPARKS, NEVADA"]` - These two together uniquely identify this vendor.
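
A minimal sketch of how such a keyword pair could be checked, assuming a template applies only when every keyword regex is found in the extracted text. The helper `matches-all-keywords?` and both sample strings are hypothetical; the project's actual matching logic lives in `src/clj/auto_ap/parse.clj` and may differ.

```clojure
(def bonanza-keywords
  [#"3717 OSGOOD AVE" #"SPARKS, NEVADA"])

(defn matches-all-keywords?
  "True when every keyword regex is found somewhere in text.
  Assumption: keywords are combined with AND semantics."
  [keywords text]
  (every? #(some? (re-find % text)) keywords))

;; Hypothetical snippets standing in for pdftotext output.
(def bonanza-text "3717 OSGOOD AVE.\nSPARKS, NEVADA   ELKO, NEVADA")
(def other-text "SPARKS, NEVADA\n100 MAIN ST")

(matches-all-keywords? bonanza-keywords bonanza-text) ; both keywords present
(matches-all-keywords? bonanza-keywords other-text)   ; address keyword missing
```

Requiring both patterns guards against another vendor that happens to share a city name but not the street address.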

### Extract Patterns Required

From the PDF text analysis:

| Field | Value in PDF | Proposed Regex |
|-------|--------------|----------------|
| `:invoice-number` | `03881260` | `#"INVOICE\s+(\d+)"` |
| `:date` | `01/20/26` | `#"(\d{2}/\d{2}/\d{2})"` (after invoice #) |
| `:customer-identifier` | `NICK THE GREEK` | `#"BILL TO.*\n\s+([A-Z][A-Z\s]+)"` |
| `:total` | `23.22` | `#"TOTAL\s+([\d\.]+)"` or `#"TOTAL\s+([\d\.]+)\s*$"` (end of line) |

### Parser Configuration

```clojure
:parser {:date [:clj-time "MM/dd/yy"]
         :total [:trim-commas nil]}
```

**Date format note:** The invoice uses 2-digit year format (`01/20/26`), so use `"MM/dd/yy"` format string.
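
As an illustration of the two-digit year handling, here is a small sketch that uses the JDK's `java.time` rather than clj-time, purely so it runs without extra dependencies. In `java.time`, the `yy` pattern resolves against base year 2000. Joda-Time (which clj-time wraps) resolves two-digit years relative to a pivot year instead, so the `:clj-time` result should still be confirmed in the REPL. The names `invoice-date-format` and `parse-invoice-date` are hypothetical.

```clojure
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter))

;; Same pattern string as the template's :parser entry.
(def invoice-date-format
  (DateTimeFormatter/ofPattern "MM/dd/yy"))

(defn parse-invoice-date
  "Parses a date such as \"01/20/26\" into a java.time.LocalDate."
  [s]
  (LocalDate/parse s invoice-date-format))

;; Break the parsed date into [year month day] for inspection.
(let [d (parse-invoice-date "01/20/26")]
  [(.getYear d) (.getMonthValue d) (.getDayOfMonth d)])
```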

### Template Structure

```clojure
{:vendor "Bonanza Produce"
 :keywords [#"3717 OSGOOD AVE" #"SPARKS, NEVADA"]
 :extract {:invoice-number #"INVOICE\s+(\d+)"
           :date #"INVOICE\s+\d+\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"BILL TO.*?\n\s+([A-Z][A-Z\s]+)(?:\s{2,}|\n)"
           :total #"TOTAL\s+([\d\.]+)(?:\s*$|\s+TOTAL)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}
```

## Open Questions

1. **Is this a single invoice or multi-invoice document?**
   - Current PDF shows single invoice
   - Check if statements from this vendor contain multiple invoices
   - If multi-invoice, need `:multi` and `:multi-match?` keys

2. **Are credit memos formatted differently?**
   - Current example shows standard invoice
   - Need to verify if credits have different layout
   - May need separate template for credit memos

3. **How to capture the full customer address in the regex?**
   - The customer name is on one line: "NICK THE GREEK"
   - The street address is on the next line: "600 VISTA WAY"
   - The city/state/zip is on the third line: "MILPITAS, CA 95035"
   - The regex needs to span multiple lines to capture all three components
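
For the third question, one possible shape of a multi-line pattern is sketched below. It runs against a simplified, hypothetical `BILL TO` block with one field per line. The real `pdftotext -layout` output interleaves bill-to and ship-to columns (see Known PDF Structure), so the pattern would need adapting before use. `bill-to-block`, `customer-address-pattern`, and `extract-customer-address` are illustrative names only.

```clojure
;; Simplified, hypothetical stand-in for the BILL TO block.
(def bill-to-block
  (str "BILL TO\n"
       "  NICK THE GREEK\n"
       "  600 VISTA WAY\n"
       "  MILPITAS, CA 95035\n"))

;; `.` does not cross newlines by default, so each group captures one line.
(def customer-address-pattern
  #"BILL TO\s*\n\s*(.+)\n\s*(.+)\n\s*(.+)")

(defn extract-customer-address
  "Returns a map of the three address lines, or nil when the block is absent."
  [text]
  (when-let [[_ customer street city-state-zip]
             (re-find customer-address-pattern text)]
    {:customer customer
     :street street
     :city-state-zip city-state-zip}))

(extract-customer-address bill-to-block)
```

Capturing each line in its own group keeps the pieces separate, so the name and street can be combined later for disambiguation without re-parsing.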

## Acceptance Criteria

- [ ] Template successfully matches invoices from this vendor
- [ ] Correctly extracts invoice number (e.g., `03881260`)
- [ ] Correctly extracts date and parses to proper format
- [ ] Correctly extracts customer identifier (e.g., `NICK THE GREEK`)
- [ ] Correctly extracts total amount (e.g., `23.22`)
- [ ] Parser handles edge cases (commas in amounts, different date formats)
- [ ] Tested with at least 3 different invoices from this vendor

## Implementation Steps

### Phase 1: Extract PDF Text

```bash
# Convert PDF to text for analysis
pdftotext -layout "dev-resources/INVOICE - 03881260.pdf" -
```

### Phase 2: Determine Vendor Name

1. Examine the PDF header for company name/logo
2. Search for known identifiers (phone numbers, addresses)
3. Identify the vendor code for `:vendor` field

### Phase 3: Develop Regex Patterns

Test patterns in REPL:

```clojure
(require '[clojure.string :as str])

(def text "...") ; paste PDF text here

;; Test invoice number pattern
(re-find #"INVOICE\s+(\d+)" text)

;; Test date pattern
(re-find #"INVOICE\s+\d+\s+(\d{2}/\d{2}/\d{2})" text)

;; Test customer pattern
(re-find #"BILL TO.*?\n\s+([A-Z][A-Z\s]+)" text)

;; Test total pattern
(re-find #"TOTAL\s+([\d\.]+)" text)
```
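
With a capturing group, `re-find` returns a vector of the whole match followed by the groups. A small helper can return only the captured value, which is what a template field ultimately needs. The sketch below assumes a hypothetical `extract-field` helper and a made-up two-line sample. It is not the project's extraction code.

```clojure
(defn extract-field
  "Returns the first capture group of pattern in text, or nil when nothing matches."
  [pattern text]
  (when-let [match (re-find pattern text)]
    ;; re-find yields a vector when the pattern has groups, a string otherwise.
    (if (vector? match) (second match) match)))

;; Hypothetical sample shaped to fit the patterns under test.
(def sample-text
  "INVOICE 03881260 01/20/26\nTOTAL       23.22")

(extract-field #"INVOICE\s+(\d+)" sample-text)
(extract-field #"INVOICE\s+\d+\s+(\d{2}/\d{2}/\d{2})" sample-text)
(extract-field #"TOTAL\s+([\d\.]+)" sample-text)
```

Running the same helper over the real `pdftotext` output shows quickly whether a pattern matches nothing (`nil`) or captures the wrong token.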

### Phase 4: Add Template

Add to `src/clj/auto_ap/parse/templates.clj` in the `pdf-templates` vector:

```clojure
;; Bonanza Produce
{:vendor "Bonanza Produce"
 :keywords [#"3717 OSGOOD AVE" #"SPARKS, NEVADA"]
 :extract {:invoice-number #"INVOICE\s+(\d+)"
           :date #"INVOICE\s+\d+\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"BILL TO.*?\n\s+([A-Z][A-Z\s]+)(?:\s{2,}|\n)"
           :total #"TOTAL\s+([\d\.]+)(?:\s*$|\s+TOTAL)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}
```

### Phase 5: Test Implementation

```clojure
;; Load the namespace
(require '[auto-ap.parse :as p])
(require '[auto-ap.parse.templates :as t])

;; Test parsing
(p/parse "...pdf text here...")

;; Or test full file
(p/parse-file "dev-resources/INVOICE - 03881260.pdf" "INVOICE - 03881260.pdf")
```

## Testing Considerations

1. **Date edge cases:** Ensure 2-digit year parsing works correctly (26 → 2026)
2. **Amount edge cases:** Test with larger amounts that may include commas
3. **Customer name variations:** Test with different customer names/lengths
4. **Multi-page invoices:** Verify template handles page breaks if applicable
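
The amount edge case is worth checking concretely: a character class of `[\d\.]` stops at the first comma, so a total such as `1,234.56` would be captured as `1`. The sketch below contrasts that with a class that admits commas and then strips them. The stripping is an assumption about what a comma-trimming parser step does, and `extract-total` is a hypothetical helper.

```clojure
(require '[clojure.string :as str])

(def total-without-commas #"TOTAL\s+([\d\.]+)")
(def total-with-commas    #"TOTAL\s+([\d,\.]+)")

(defn extract-total
  "Captures the amount after TOTAL and removes thousands separators."
  [pattern text]
  (some-> (re-find pattern text)
          second
          (str/replace "," "")))

(extract-total total-without-commas "TOTAL     1,234.56") ; capture stops at the comma
(extract-total total-with-commas    "TOTAL     1,234.56") ; full amount is kept
(extract-total total-with-commas    "TOTAL     23.22")    ; small amounts are unaffected
```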

## Known PDF Structure

```
SOUTH LAKE TAHOE, CA
3717 OSGOOD AVE.
...
SPARKS, NEVADA ELKO, NEVADA
1925 FREEPORT BLVD... 428 RIVER ST...

CUST. PHONE 775-622-0159 ... INVOICE DATE
... 03881260 01/20/26
B NICKGR
I NICK THE GREEK S NICK THE GREEK
L NICK THE GREEK H NICK THE GREEK
L 600 VISTA WAY I VIA MICHELE
...
TOTAL
TOTAL 23.22
```

## References & Research

### Similar Templates for Reference

Based on `src/clj/auto_ap/parse/templates.clj`, these templates have similar patterns:

1. **Gstar Seafood** (lines 19-26) - Simple single invoice, uses `:trim-commas`
2. **Don Vito Ozuna Food Corp** (lines 121-127) - Uses customer-identifier with multiline pattern
3. **C&L Produce** (lines 260-267) - Similar "Bill To" pattern for customer extraction

### File Locations

- Templates: `src/clj/auto_ap/parse/templates.clj`
- Parser logic: `src/clj/auto_ap/parse.clj`
- Utility functions: `src/clj/auto_ap/parse/util.clj`
- Test PDF: `dev-resources/INVOICE - 03881260.pdf`

### Parser Utilities Available

From `src/clj/auto_ap/parse/util.clj`:
- `:clj-time` - Date parsing with format strings
- `:trim-commas` - Remove commas from numbers
- `:trim-commas-and-negate` - Handle credit/negative amounts
- `:month-day-year` - Special format for space-separated dates

## Next Steps

1. **Identify the vendor name** by examining the PDF more closely or asking the user
2. **Test regex patterns** in the REPL with the actual PDF text
3. **Refine patterns** based on edge cases discovered during testing
4. **Add template** to templates.clj
5. **Test with multiple invoices** from this vendor to ensure robustness
@@ -0,0 +1,376 @@
---
title: Rebase Invoice Templates, Merge to Master, and Integrate Branches
type: refactor
date: 2026-02-08
---

# Rebase Invoice Templates, Merge to Master, and Integrate Branches

## Overview

This plan outlines a series of git operations to reorganize the branch structure by:
1. Creating a rebased branch that carries all invoice template changes
2. Applying those changes onto `master`
3. Removing them from the current `clauding` branch
4. Merging `master` back into `clauding`
5. Finally merging `clauding` into `get-transactions2-page-working`

## Current State

### Branch Structure (as of Feb 8, 2026)

```
master (dc021b8c)
  ├─ deploy/master (dc021b8c)
  └─ (other branches)
      └─ clauding (0155d91e) - HEAD
          ├─ 16 commits ahead of master
          └─ Contains invoice template work for Bonanza Produce
              ├─ db1cb194 Add Bonanza Produce invoice template
              ├─ ec754233 Improve Bonanza Produce customer identifier extraction
              ├─ af7bc324 Add location extraction for Bonanza Produce invoices
              ├─ 62107c99 Extract customer name and address for Bonanza Produce
              ├─ 7ecd569e Add invoice-template-creator skill for automated template generation
              └─ 0155d91e Add Bonanza Produce multi-invoice statement template
```

### Merge Base

- **Merge base between `clauding` and `master`**: `dc021b8c`
- **Commits on `clauding` since merge base**: 16 commits
- **Invoice template commits**: 6 commits (db1cb194 through 0155d91e)

## Problem Statement

The current branch structure has:
1. Invoice template work mixed with other feature development in `clauding`
2. No clear separation between invoice template changes and transaction page work
3. A desire to get invoice template changes merged to `master` independently
4. A need to reorganize branches to prepare for merging `get-transactions2-page-working`

## Proposed Solution

Use git cherry-pick, rebase, and merge operations to create a cleaner branch hierarchy:

1. **Create a new branch** (`invoice-templates-rebased`) with only invoice template commits
2. **Rebase those commits** onto current `master` (by cherry-picking them onto a branch cut from `master`)
3. **Merge** this clean branch to `master`
4. **Remove invoice template commits** from `clauding` branch
5. **Merge `master` into `clauding`** to sync
6. **Merge `clauding` into `get-transactions2-page-working`**

## Implementation Steps

### Phase 1: Extract and Rebase Invoice Templates

#### Step 1.1: Identify Invoice Template Commits

```bash
# From clauding branch, find the range of invoice template commits
git log --oneline --reverse dc021b8c..clauding
```

**Invoice template commits to extract** (6 commits in order):
1. `db1cb194` - Add Bonanza Produce invoice template
2. `ec754233` - Improve Bonanza Produce customer identifier extraction
3. `af7bc324` - Add location extraction for Bonanza Produce invoices
4. `62107c99` - Extract customer name and address for Bonanza Produce
5. `7ecd569e` - Add invoice-template-creator skill for automated template generation
6. `0155d91e` - Add Bonanza Produce multi-invoice statement template

#### Step 1.2: Create Rebased Branch

```bash
# Create a new branch from master with only invoice template commits
git checkout master
git pull origin master  # Ensure master is up to date
git checkout -b invoice-templates-rebased

# Cherry-pick the invoice template commits in order
git cherry-pick db1cb194
git cherry-pick ec754233
git cherry-pick af7bc324
git cherry-pick 62107c99
git cherry-pick 7ecd569e
git cherry-pick 0155d91e

# Resolve any conflicts that arise during cherry-pick
# Run tests after each cherry-pick if conflicts occur
```

#### Step 1.3: Verify Rebased Branch

```bash
# Verify the commits are correctly applied
git log --oneline master..invoice-templates-rebased

# Run tests to ensure invoice templates still work
lein test auto-ap.parse.templates-test
```

#### Step 1.4: Merge to Master

```bash
# Merge the clean invoice templates to master
git checkout master
git merge invoice-templates-rebased --no-edit

# Push to remote
git push origin master
```

### Phase 2: Clean Up Clauding Branch

#### Step 2.1: Remove Invoice Template Commits from Clauding

```bash
# Find the parent of the first invoice template commit
# (git log lists newest first, so grepping the log for the line "before" would return the wrong commit)
git rev-parse --short db1cb194^

# Call that parent COMMIT_X, then rebase clauding to remove the invoice templates
git checkout clauding

# Option A: Interactive rebase (recommended for cleanup)
git rebase -i <commit-before-invoice-templates>

# In the editor, delete lines corresponding to invoice template commits:
# db1cb194
# ec754233
# af7bc324
# 62107c99
# 7ecd569e
# 0155d91e

# Save and exit to rebase

# Resolve any conflicts that occur
# Run tests after rebase
```

**OR**

```bash
# Option B: Hard reset to commit before invoice templates
# Identify the commit hash before db1cb194 (let's call it COMMIT_X)
git reset --hard COMMIT_X

# Then add back any non-invoice template commits from clauding
# (commits after the invoice templates that should remain)
git cherry-pick <non-invoice-commits-if-any>
```

#### Step 2.2: Verify Clauding Branch Cleanup

```bash
# Verify invoice template commits are removed
git log --oneline | grep -i "bonanza"  # Should be empty

# Verify other commits remain
git log --oneline -20

# Run tests to ensure nothing broke
lein test
```

#### Step 2.3: Force Push Updated Clauding

```bash
# Force push the cleaned branch (use --force-with-lease for safety)
git push --force-with-lease origin clauding
```

### Phase 3: Sync Clauding with Master

#### Step 3.1: Merge Master into Clauding

```bash
git checkout clauding
git merge master --no-edit

# Resolve any conflicts
# Run tests
```

#### Step 3.2: Push Synced Clauding

```bash
git push origin clauding
```

### Phase 4: Final Merge to get-transactions2-page-working

#### Step 4.1: Merge Clauding to get-transactions2-page-working

```bash
git checkout get-transactions2-page-working
git merge clauding --no-edit

# Resolve any conflicts
# Run tests
```

#### Step 4.2: Push Final Branch

```bash
git push origin get-transactions2-page-working
```

## Acceptance Criteria

### Pre-operations Validation
- [ ] All invoice template commits identified correctly (6 commits)
- [ ] Merge base commit (`dc021b8c`) confirmed
- [ ] Current branch state documented
- [ ] Team notified of branch manipulation

### Post-Rebase Validation
- [ ] `invoice-templates-rebased` branch created from `master`
- [ ] All 6 invoice template commits applied correctly
- [ ] All invoice template tests pass
- [ ] No conflicts or unexpected changes during cherry-pick

### Post-Master Validation
- [ ] Invoice templates merged to `master`
- [ ] Changes pushed to remote `master`
- [ ] CI/CD passes on `master`

### Post-Cleanup Validation
- [ ] `clauding` branch has only non-invoice template commits
- [ ] No Bonanza Produce commits remain in `clauding` history
- [ ] All `clauding` tests pass
- [ ] Force push successful

### Post-Sync Validation
- [ ] `clauding` merged with `master`
- [ ] All conflicts resolved
- [ ] Changes pushed to remote

### Final Merge Validation
- [ ] `get-transactions2-page-working` merged with `clauding`
- [ ] All conflicts resolved
- [ ] Final tests pass
- [ ] Changes pushed to remote

## Success Metrics

- **Branch structure**: Invoice templates cleanly separated on `master`
- **Commit history**: Linear, no duplicate invoice template commits
- **Tests passing**: 100% of existing tests pass after each step
- **No data loss**: All work preserved in appropriate branches
- **Branch clarity**: Each branch has a clear, focused purpose

## Dependencies & Risks

### Dependencies
- [ ] All current work on `clauding` should be backed up or committed
- [ ] Team should be aware of the branch manipulation so that nobody pushes to `clauding` while its history is being rewritten
- [ ] CI/CD should be monitored during operations

### Risks
1. **Force push risk**: Force pushing `clauding` will rewrite history
   - **Mitigation**: Use `--force-with-lease`, notify team beforehand

2. **Conflict resolution**: Multiple merge/conflict resolution points
   - **Mitigation**: Test after each step, resolve conflicts carefully

3. **Work loss**: Potential to lose commits if operations go wrong
   - **Mitigation**: Create backups, verify each step before proceeding

4. **CI/CD disruption**: Force pushes may affect CI/CD pipelines
   - **Mitigation**: Coordinate with team, avoid during active deployments

### Contingency Plan

If something goes wrong:
1. **Recover `clauding` branch**:
   ```bash
   git checkout clauding
   git reset --hard origin/clauding  # Restore from the remote copy (only valid before the force push in Step 2.3)
   ```

2. **Recover master**:
   ```bash
   git checkout master
   git reset --hard origin/master  # Restore from the remote copy of master
   ```

3. **Manual cherry-pick recovery**: If rebasing failed, manually cherry-pick remaining commits

## Alternative Approaches Considered

### Approach 1: Squash and Merge
**Pros**: Single clean commit, simple history
**Cons**: Loses individual commit history and context

**Rejected because**: Team uses merge commits (not squash), and individual commit history is valuable for tracking invoice template development.

### Approach 2: Keep Branches Separate
**Pros**: No branch manipulation needed
**Cons**: Branches remain tangled, harder to track progress

**Rejected because**: Goal is to cleanly separate invoice templates from transaction work.

### Approach 3: Rebase Clauding Onto Master
**Pros**: Linear history
**Cons**: Requires force push, may lose merge context

**Rejected because**: Current team workflow uses merge commits, and merging master into clauding preserves the integration point.

### Approach 4: Create New Branch Instead of Cleanup
**Pros**: Less risky, preserves full history
**Cons**: Accumulates branches, harder to track

**Rejected because**: Goal is cleanup and reorganization, not preservation.

## Related Work

- **Previous invoice template work**: `2026-02-07-feat-add-invoice-template-03881260-plan.md`
- **Current branch structure**: `clauding` has hierarchical relationship with `get-transactions2-page-working`
- **Team git workflow**: Uses merge commits (not rebasing), per repo research

## References & Research

### Internal References
- **Branch management patterns**: Repo research analysis (see `task_id: ses_3c2287be8ffe9icFi5jHEspaqh`)
- **Invoice template location**: `src/clj/auto_ap/parse/templates.clj`
- **Current branch structure**: Git log analysis

### Git Operations Documentation
- **Cherry-pick**: `git cherry-pick <commit>`
- **Interactive rebase**: `git rebase -i <base>`
- **Force push with lease**: `git push --force-with-lease`
- **Merge commits**: `git merge <branch> --no-edit`

### File Locations
- Templates: `src/clj/auto_ap/parse/templates.clj`
- Parser logic: `src/clj/auto_ap/parse.clj`
- Invoice PDF: `dev-resources/INVOICE - 03881260.pdf`

## Testing Plan

### Before Each Major Step
```bash
# Verify current branch state
git branch -vv
git log --oneline -10

# Run all tests
lein test

# Run specific invoice template tests
lein test auto-ap.parse.templates-test
```

### After Each Major Step
- Verify commit count and order
- Run full test suite
- Check for unintended changes
- Verify remote branch state matches local

## Notes

- **Team coordination**: Inform team before force pushing to avoid conflicts
- **Backup strategy**: All commits are preserved in the rebase process
- **Testing**: Verify at each step to catch issues early
- **Safety first**: Use `--force-with-lease` instead of `--force`
- **Documentation**: This plan serves as documentation for the operation
@@ -0,0 +1,133 @@
---
module: SSR Admin
component: testing_framework
date: '2026-02-07'
problem_type: best_practice
resolution_type: test_fix
severity: medium
root_cause: inadequate_documentation
symptoms:
  - Route tests only verified HTTP status codes (200), not actual HTML content
  - No verification that route responses contain expected page elements
  - Could have false positives where routes return empty or wrong content
rails_version: 7.1.0
tags:
  - testing
  - routes
  - hiccup
  - html-verification
  - clojure
  - str-includes
---

# Enhancing Route Tests with HTML Content Verification

## Problem

Route tests for the SSR admin modules (vendors and transaction-rules) were only verifying HTTP status codes, making them vulnerable to false positives. A route could return a 200 status but with empty or incorrect HTML content, and the tests would still pass.

## Symptoms

- Tests like `(is (= 200 (:status response)))` only checked HTTP status
- No assertions about the actual HTML content returned
- Route handlers could return malformed or empty hiccup vectors without test failures
- Dialog routes could return generic HTML without the expected content

## Root Cause

Missing best practice for route testing in Clojure SSR applications. Unlike Rails controller tests that can use `assert_select` or Capybara matchers, there was no established pattern for verifying hiccup-rendered HTML content.

## Solution

Enhanced route tests to verify HTML content using `clojure.string/includes?` checks on the rendered HTML string.

### Implementation Pattern

```clojure
;; BEFORE: Only status check
(deftest page-route-returns-html-response
  (testing "Page route returns HTML response"
    (let [request {:identity (admin-token)}
          response ((get sut/key->handler :auto-ap.routes.admin.transaction-rules/page) request)]
      (is (= 200 (:status response))))))

;; AFTER: Status + content verification
(deftest page-route-returns-html-response
  (testing "Page route returns HTML response"
    (let [request {:identity (admin-token)}
          response ((get sut/key->handler :auto-ap.routes.admin.transaction-rules/page) request)
          html-str (apply str (:body response))]
      (is (= 200 (:status response)))
      (is (str/includes? html-str "Transaction Rules")))))
```

### Key Changes

1. **Convert body to string**: Use `(apply str (:body response))` to join the response body into a single string for matching (`apply str` only concatenates what the handler returned; it does not itself render hiccup)
2. **Add content assertions**: Use `clojure.string/includes?` to verify expected content exists
3. **Test-specific content**: Match content unique to that route (page titles, button text, entity names)
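
The three changes above can be folded into a small reusable helper. The sketch below is one possible shape, not code from the test suite: `response-html`, `missing-fragments`, and the `fake-response` map are hypothetical, and the body is assumed to be either a string or a sequence of strings.

```clojure
(require '[clojure.string :as str])

(defn response-html
  "Joins the response body into one string; works for a string or a seq of strings."
  [response]
  (apply str (:body response)))

(defn missing-fragments
  "Returns the expected fragments that do not appear in the response body."
  [response expected]
  (let [html (response-html response)]
    (remove #(str/includes? html %) expected)))

;; Hypothetical handler output, standing in for a real route response.
(def fake-response
  {:status 200
   :body ["<h1>Transaction Rules</h1>"
          "<button>New Transaction Rule</button>"]})

(missing-fragments fake-response ["Transaction Rules" "New Transaction Rule"])
(missing-fragments fake-response ["Transaction Rules" "Vendors"])
```

In a test, `(is (empty? (missing-fragments response [...])))` makes the failure output name the fragments that were not found, which a bare `str/includes?` assertion does not.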

### Files Modified

- `test/clj/auto_ap/ssr/admin/vendors_test.clj`
  - Added `vendor-page-route-contains-vendor-content` test

- `test/clj/auto_ap/ssr/admin/transaction_rules_test.clj`
  - Enhanced 7 route tests with content verification:
    - `page-route-returns-html-response` → checks for "Transaction Rules"
    - `table-route-returns-table-data` → checks for "New Transaction Rule"
    - `edit-dialog-route-returns-dialog` → checks for entity-specific content
    - `account-typeahead-route-works` → checks for "account"
    - `location-select-route-works` → checks for "location"
    - `execute-dialog-route-works` → checks for "Code transactions"
    - `new-dialog-route-returns-empty-form` → checks for "Transaction rule"

### Testing Strategy

For each route, identify the minimal but specific content that indicates the route is working:

- **Page routes**: Check for page title or heading
- **Dialog routes**: Check for dialog-specific button text or the entity name being edited
- **Typeahead routes**: Check for the resource type (e.g., "account")
- **Table routes**: Check for action buttons or empty state messages

## Prevention

When writing route tests, always:

1. ✅ Verify HTTP status code (200, 302, etc.)
2. ✅ Verify response contains expected HTML content
3. ✅ Use specific content unique to that route
4. ✅ Avoid overly generic strings that might appear on any page

### Template for Route Tests

```clojure
(deftest [route-name]-returns-expected-content
  (testing "[Route description]"
    (let [request {:identity (admin-token)
                   ;; Add route-params, query-params as needed
                   }
          response ((get sut/key->handler :auto-ap.routes.[module]/[route]) request)
          html-str (apply str (:body response))]
      (is (= 200 (:status response)))
      (is (str/includes? html-str "[Expected content]")))))
```

## Tools Used

- `clojure.string/includes?` - Simple string containment check
- `apply str` - Joins the response body into a single string for matching
- No additional dependencies needed

## Benefits

- **Catches regressions**: Tests fail if route returns wrong content
- **Self-documenting**: Test assertions describe expected behavior
- **Lightweight**: No complex HTML parsing libraries required
- **Fast**: String operations are performant

## Related

- Similar pattern could apply to any Clojure SSR application using hiccup
- For more complex DOM assertions, consider adding hickory or enlive for structured HTML parsing
@@ -5,7 +5,6 @@
|
||||
[clojure.string :as str]
|
||||
[auto-ap.time :as atime]))
|
||||
|
||||
|
||||
(def pdf-templates
|
||||
[;; CHEF's WAREHOUSE
|
||||
{:vendor "CHFW"
|
||||
@@ -45,8 +44,7 @@
|
||||
:parser {:date [:clj-time "MM/dd/yy"]}
|
||||
:multi #"\f\f"}
|
||||
|
||||
|
||||
;; IMPACT PAPER
|
||||
;; IMPACT PAPER
|
||||
{:vendor "Impact Paper & Ink LTD"
|
||||
:keywords [#"650-692-5598"]
|
||||
:extract {:total #"Total Amount\s+\$([\d\.\,\-]+)"
|
||||
@@ -369,8 +367,7 @@
|
||||
:parser {:date [:clj-time "MM/dd/yyyy"]
|
||||
:total [:trim-commas nil]}}
|
||||
|
||||
|
||||
;; Breakthru Bev
|
||||
;; Breakthru Bev
|
||||
{:vendor "Wine Warehouse"
|
||||
:keywords [#"BREAKTHRU BEVERAGE"]
|
||||
:extract {:date #"Invoice Date:\s+([0-9]+/[0-9]+/[0-9]+)"
|
||||
@@ -686,13 +683,13 @@
|
||||
|
||||
;; TODO DISABLING TO FOCUS ON STATEMENT
|
||||
#_{:vendor "Reel Produce"
|
||||
:keywords [#"reelproduce.com"]
|
||||
:extract {:date #"([0-9]+/[0-9]+/[0-9]+)"
|
||||
:customer-identifier #"Bill To(?:.*?)\n\n\s+(.*?)\s{2,}"
|
||||
:invoice-number #"Invoice #\n.*?\n.*?([\d\-]+)\n"
|
||||
:total #"Total\s*\n\s+\$([\d\-,]+\.\d{2,2}+)"}
|
||||
:parser {:date [:clj-time "MM/dd/yy"]
|
||||
:total [:trim-commas-and-negate nil]}}
|
||||
:keywords [#"reelproduce.com"]
|
||||
:extract {:date #"([0-9]+/[0-9]+/[0-9]+)"
|
||||
:customer-identifier #"Bill To(?:.*?)\n\n\s+(.*?)\s{2,}"
|
||||
:invoice-number #"Invoice #\n.*?\n.*?([\d\-]+)\n"
|
||||
:total #"Total\s*\n\s+\$([\d\-,]+\.\d{2,2}+)"}
|
||||
:parser {:date [:clj-time "MM/dd/yy"]
|
||||
:total [:trim-commas-and-negate nil]}}
|
||||
|
||||
{:vendor "Eddie's Produce"
|
||||
:keywords [#"Eddie's Produce"]
|
||||
@@ -754,7 +751,18 @@
     :parser {:date [:clj-time "MM/dd/yyyy"]
              :total [:trim-commas-and-negate nil]}
     :multi #"\n"
-    :multi-match? #"INV #"}])
+    :multi-match? #"INV #"}
+
+   ;; Bonanza Produce
+   {:vendor "Bonanza Produce"
+    :keywords [#"530-544-4136"]
+    :extract {:invoice-number #"NO\s+(\d{8,})\s+\d{2}/\d{2}/\d{2}"
+              :date #"NO\s+\d{8,}\s+(\d{2}/\d{2}/\d{2})"
+              :customer-identifier #"(?s)I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
+              :account-number #"(?s)L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
+              :total #"SHIPPED\s+[\d\.]+\s+TOTAL\s+([\d\.]+)"}
+    :parser {:date [:clj-time "MM/dd/yy"]
+             :total [:trim-commas nil]}}])
 
 (def excel-templates
   [{:vendor "Mama Lu's Foods"
@@ -784,43 +792,41 @@
|
||||
{:vendor "Daylight Foods"
|
||||
:keywords [#"CUSTNO"]
|
||||
:extract (fn [sheet vendor]
|
||||
(alog/peek ::daylight-invoices
|
||||
(transduce (comp
|
||||
(drop 1)
|
||||
(filter
|
||||
(fn [r]
|
||||
(and
|
||||
(seq r)
|
||||
(->> r first not-empty))))
|
||||
(map
|
||||
(fn [[customer-number _ _ _ invoice-number date amount :as row]]
|
||||
(println "DAT E is" date)
|
||||
{:customer-identifier customer-number
|
||||
:text (str/join " " row)
|
||||
:full-text (str/join " " row)
|
||||
:date (try (or (u/parse-value :clj-time "MM/dd/yyyy" (str/trim date))
|
||||
(try
|
||||
(atime/as-local-time
|
||||
(time/plus (time/date-time 1900 1 1)
|
||||
(time/days (dec (dec (Integer/parseInt "45663"))))))
|
||||
(catch Exception e
|
||||
nil)
|
||||
))
|
||||
|
||||
(catch Exception e
|
||||
(try
|
||||
(atime/as-local-time
|
||||
(time/plus (time/date-time 1900 1 1)
|
||||
(time/days (dec (dec (Integer/parseInt "45663"))))))
|
||||
(catch Exception e
|
||||
nil)
|
||||
)
|
||||
))
|
||||
:invoice-number invoice-number
|
||||
:total (str amount)
|
||||
:vendor-code vendor})))
|
||||
conj
|
||||
[]
|
||||
sheet)))}])
|
||||
(alog/peek ::daylight-invoices
|
||||
(transduce (comp
|
||||
(drop 1)
|
||||
(filter
|
||||
(fn [r]
|
||||
(and
|
||||
(seq r)
|
||||
(->> r first not-empty))))
|
||||
(map
|
||||
(fn [[customer-number _ _ _ invoice-number date amount :as row]]
|
||||
(println "DAT E is" date)
|
||||
{:customer-identifier customer-number
|
||||
:text (str/join " " row)
|
||||
:full-text (str/join " " row)
|
||||
:date (try (or (u/parse-value :clj-time "MM/dd/yyyy" (str/trim date))
|
||||
(try
|
||||
(atime/as-local-time
|
||||
(time/plus (time/date-time 1900 1 1)
|
||||
(time/days (dec (dec (Integer/parseInt "45663"))))))
|
||||
(catch Exception e
|
||||
nil)))
|
||||
|
||||
(catch Exception e
|
||||
(try
|
||||
(atime/as-local-time
|
||||
(time/plus (time/date-time 1900 1 1)
|
||||
(time/days (dec (dec (Integer/parseInt "45663"))))))
|
||||
(catch Exception e
|
||||
nil))))
|
||||
|
||||
:invoice-number invoice-number
|
||||
:total (str amount)
|
||||
:vendor-code vendor})))
|
||||
conj
|
||||
[]
|
||||
sheet)))}])
|
||||
|
||||
|
||||
|
||||
34
test/clj/auto_ap/parse/templates_test.clj
Normal file
@@ -0,0 +1,34 @@
(ns auto-ap.parse.templates-test
  (:require [auto-ap.parse :as sut]
            [clojure.test :refer [deftest is testing]]
            [clojure.java.io :as io]
            [clojure.string :as str]
            [clj-time.core :as time]))

(deftest parse-bonanza-produce-invoice-03881260
  (testing "Should parse Bonanza Produce invoice 03881260 with customer identifier including address"
    (let [pdf-file (io/file "dev-resources/INVOICE - 03881260.pdf")
          ;; Extract text same way parse-file does
          pdf-text (:out (clojure.java.shell/sh "pdftotext" "-layout" (str pdf-file) "-"))
          results (sut/parse pdf-text)
          result (first results)]
      (is (some? results) "parse should return a result")
      (is (some? result) "Template should match and return a result")
      (when result
        (println "DEBUG: customer-identifier =" (pr-str (:customer-identifier result)))
        (println "DEBUG: account-number =" (pr-str (:account-number result)))
        (is (= "Bonanza Produce" (:vendor-code result)))
        (is (= "03881260" (:invoice-number result)))
        ;; Date is parsed as org.joda.time.DateTime - compare year/month/day
        (let [d (:date result)]
          (is (= 2026 (time/year d)))
          (is (= 1 (time/month d)))
          (is (= 20 (time/day d))))
        ;; Customer identifier includes name, account-number includes street address
        ;; Together they form the full customer identification
        (is (= "NICK THE GREEK" (:customer-identifier result)))
        (is (= "600 VISTA WAY" (str/trim (:account-number result))))
        (is (= "NICK THE GREEK 600 VISTA WAY"
               (str (:customer-identifier result) " " (str/trim (:account-number result)))))
        ;; Total is parsed as string, not number (per current behavior)
        (is (= "23.22" (:total result)))))))