- Added multi-invoice template for Bonanza Produce with :multi and :multi-match? flags
- Template uses keywords for statement header to identify multi-invoice format
- Extracts invoice-number, date, customer-identifier (from RETURN line), and total
- Parses 4 invoices from statement PDF 13595522.pdf
- All tests pass (29 assertions, 0 failures, 0 errors)
- Added test: parse-bonanza-produce-statement-13595522
- Updated invoice-template-creator skill: emphasized test-first approach
| module | date | problem_type | component | symptoms | root_cause | resolution_type | severity | tags |
|---|---|---|---|---|---|---|---|---|
| Invoice Parsing | 2026-02-07 | integration_failure | pdf_template_parser | | template_inadequate | template_fix | high | |
Bonanza Produce Multi-Invoice Statement Template Fix
Problem
Bonanza Produce sends two different invoice formats:
- Single invoices (e.g., 03881260.pdf) with I/L markers and specific layout
- Multi-invoice statements (e.g., 13595522.pdf) containing 4 invoices per page
The single invoice template failed to parse multi-invoice statements because:
- Multi-invoice statements lack the I/L (Invoice/Location) markers used in single invoice templates
- The layout structure is completely different, with invoices listed as table rows instead of distinct sections
- Customer identifier extraction requires a different regex pattern
Environment
- Component: PDF Template Parser (Clojure)
- Date: 2026-02-07
- Test File: test/clj/auto_ap/parse/templates_test.clj
- Template File: src/clj/auto_ap/parse/templates.clj
- Test Document: dev-resources/13595522.pdf (4 invoices on single page)
Symptoms
- Single invoice template only parses first invoice from multi-invoice statement
- Parse returns single result instead of 4 separate invoice records
- :customer-identifier extraction returns empty or incorrect values for statements
- Test parse-bonanza-produce-statement-13595522 expects 4 results but receives 1
What Didn't Work
Attempted Solution 1: Reuse single invoice template with :multi flag
- Added :multi #"\n" and :multi-match? pattern to existing single invoice template
- Why it failed: The single invoice template's regex patterns (e.g., I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+) expect I/L markers that don't exist in multi-invoice statements. The layout structure is fundamentally different.
Attempted Solution 2: Using simpler customer identifier pattern
- Tried pattern #"(.*?)\s+RETURN" extracted from multi-invoice statement text
- Why it failed: This pattern alone doesn't account for the statement's column-based layout. It has to be combined with the :multi and :multi-match? flags to parse multiple invoices.
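The mismatch behind Attempted Solution 1 can be reproduced at a REPL. The sketch below is illustrative only: statement-row is an invented sample line (the real text of 13595522.pdf is not reproduced here), and the var names are hypothetical.

```clojure
;; Hypothetical statement row, invented for illustration.
(def statement-row "   01/15/26   123456   INVOICE   245.50")

;; Single-invoice pattern quoted above; it needs an "I ... L " marker pair.
(def single-invoice-pattern #"I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+")

;; Row pattern with the same shape as the statement template's :multi-match? value.
(def statement-row-pattern #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE")

;; No standalone I or L marker on the row, so the single-invoice pattern finds nothing.
(re-find single-invoice-pattern statement-row)        ;; => nil

;; The date / number / INVOICE shape does match the row.
(some? (re-find statement-row-pattern statement-row)) ;; => true
```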
Solution
Added a dedicated multi-invoice template that:
- Uses different keywords to identify multi-invoice statements
- Employs :multi and :multi-match? flags for multiple invoice extraction
- Uses simpler regex patterns suitable for the statement layout
Implementation:
;; Bonanza Produce Statement (multi-invoice)
{:vendor "Bonanza Produce"
:keywords [#"The perishable agricultural commodities" #"SPARKS, NEVADA"]
:extract {:invoice-number #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+([0-9]+)\s+INVOICE"
:customer-identifier #"(.*?)\s+RETURN"
:date #"^\s+([0-9]{2}/[0-9]{2}/[0-9]{2})"
:total #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE\s+([\d.]+)"}
:parser {:date [:clj-time "MM/dd/yy"]
:total [:trim-commas nil]}
:multi #"\n"
:multi-match? #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE"}
Key differences from single invoice template:
- :keywords: Look for statement header text instead of phone number
- :customer-identifier: Pattern #"(.*?)\s+RETURN" works for statement format
- :multi #"\n": Split results on newline boundaries
- :multi-match?: Match invoice header pattern to identify individual invoices
- No I/L markers: Patterns scan from left margin without location markers
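To see what each :extract pattern captures, the patterns can be applied to sample lines with re-find. This is a minimal sketch, not the parser's own code path: invoice-line and customer-line are invented examples, and capture is a hypothetical helper.

```clojure
(require '[clojure.string :as str])

;; Hypothetical statement lines, invented for illustration.
(def invoice-line  "   01/15/26   123456   INVOICE   245.50")
(def customer-line "ACME CAFE        RETURN")

;; Patterns copied from the template's :extract map.
(def extract-patterns
  {:invoice-number      #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+([0-9]+)\s+INVOICE"
   :customer-identifier #"(.*?)\s+RETURN"
   :date                #"^\s+([0-9]{2}/[0-9]{2}/[0-9]{2})"
   :total               #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE\s+([\d.]+)"})

(defn capture
  "Hypothetical helper: first capture group of the pattern for field, trimmed, or nil."
  [field text]
  (some-> (re-find (extract-patterns field) text) second str/trim))

(capture :invoice-number invoice-line)       ;; => "123456"
(capture :date invoice-line)                 ;; => "01/15/26"
(capture :total invoice-line)                ;; => "245.50"
(capture :customer-identifier customer-line) ;; => "ACME CAFE"
```

One thing this makes visible: the :total capture group [\d.]+ stops at a comma, so a row whose amount is written as 1,245.50 would yield "1". If statement totals can include thousands separators, the group would likely need to allow commas for the :trim-commas parser step to have anything to trim.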
Why This Works
- Statement-specific keywords: "The perishable agricultural commodities" and "SPARKS, NEVADA" uniquely identify multi-invoice statements vs. single invoices (which have phone number 530-544-4136)
- Multi-flag parsing: The :multi and :multi-match? flags tell the parser to split the document on newlines and identify individual invoices using the date/invoice-number pattern, rather than treating the whole page as one invoice
- Simplified patterns: Without I/L markers, patterns scan from line start (^\s+) and extract columns based on whitespace positions. The :customer-identifier pattern (.*?)\s+RETURN captures everything before "RETURN" on each line
- Separate templates: Having distinct templates for single invoices vs. statements prevents conflict and allows optimization for each format
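The split-then-filter behavior described above can be sketched in a few lines. This is an assumption about how :multi and :multi-match? interact, written only to illustrate the idea; it is not the implementation in src/clj/auto_ap/parse.clj. parse-multi and statement-text are hypothetical, and the sample text is invented.

```clojure
(require '[clojure.string :as str])

;; Hypothetical statement text with two invoice rows, invented for illustration.
(def statement-text
  (str "BONANZA PRODUCE STATEMENT\n"
       "   01/15/26   123456   INVOICE   245.50\n"
       "   01/16/26   123457   INVOICE   89.10\n"
       "   TOTAL DUE   334.60\n"))

;; Subset of the statement template shown in the Solution section.
(def template
  {:multi        #"\n"
   :multi-match? #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE"
   :extract      {:invoice-number #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+([0-9]+)\s+INVOICE"
                  :total          #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE\s+([\d.]+)"}})

(defn parse-multi
  "Illustrative only, NOT the auto-ap parser: split text on :multi, keep the
  chunks that match :multi-match?, and run every :extract pattern on each chunk."
  [{:keys [multi multi-match? extract]} text]
  (->> (str/split text multi)
       (filter #(re-find multi-match? %))
       (mapv (fn [chunk]
               (into {}
                     (map (fn [[field pattern]]
                            [field (second (re-find pattern chunk))]))
                     extract)))))

(parse-multi template statement-text)
;; => [{:invoice-number "123456", :total "245.50"}
;;     {:invoice-number "123457", :total "89.10"}]
```

The header and TOTAL DUE lines carry no date/number/INVOICE sequence, so the filter drops them and only the invoice rows produce records.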
Prevention
When adding templates for vendors with multiple document formats:
- Create separate templates: Don't try to make one template handle both formats. Use distinct keywords to identify each format
- Test both single and multi-invoice documents: Ensure templates parse expected number of invoices:
  (is (= 4 (count results)) "Should parse 4 invoices from statement")
- Verify :multi usage: Multi-invoice templates should have both :multi and :multi-match? flags:
  :multi #"\n"
  :multi-match? #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE"
- Check pattern scope: Multi-invoice statements often lack structural markers (I/L), so patterns should:
  - Use ^\s+ to anchor at line start
  - Extract from whitespace-separated columns
  - Avoid patterns requiring specific markers
- Run all template tests: Before committing, run:
  lein test auto-ap.parse.templates-test
Related Issues
- Single invoice template: src/clj/auto_ap/parse/templates.clj lines 756-765
- Similar multi-invoice patterns: Search for :multi and :multi-match? in src/clj/auto_ap/parse/templates.clj
Key Files
- Tests: test/clj/auto_ap/parse/templates_test.clj (lines 36-53)
- Template: src/clj/auto_ap/parse/templates.clj (lines 767-777)
- Test document: dev-resources/13595522.pdf
- Template parser: src/clj/auto_ap/parse.clj