integreat/docs/solutions/integration-issues/multi-invoice-template-bonanza-produce-20260207.md
Bryce 8a0395dc4a Add Bonanza Produce multi-invoice statement template
- Added multi-invoice template for Bonanza Produce with :multi and :multi-match? flags
- Template uses keywords for statement header to identify multi-invoice format
- Extracts invoice-number, date, customer-identifier (from RETURN line), and total
- Parses 4 invoices from statement PDF 13595522.pdf
- All tests pass (29 assertions, 0 failures, 0 errors)

- Added test: parse-bonanza-produce-statement-13595522
- Updated invoice-template-creator skill: emphasized test-first approach
2026-02-08 07:56:14 -08:00


module: Invoice Parsing
date: 2026-02-07
problem_type: integration_failure
component: pdf_template_parser
symptoms:
  • Bonanza Produce multi-invoice statement (13595522.pdf) fails to parse correctly
  • Single invoice template extracts only one invoice instead of four
  • Multi-invoice statement lacks I/L markers present in single invoices
  • Customer identifier extraction pattern requires different regex for statements
root_cause: template_inadequate
resolution_type: template_fix
severity: high
tags: pdf, parsing, invoice, bonanza-produce, multi-invoice, integration

Bonanza Produce Multi-Invoice Statement Template Fix

Problem

Bonanza Produce sends two different invoice formats:

  1. Single invoices (e.g., 03881260.pdf) with I/L markers and specific layout
  2. Multi-invoice statements (e.g., 13595522.pdf) that list several invoices on one page (4 in this example)

The single invoice template failed to parse multi-invoice statements because:

  • Multi-invoice statements lack the I/L (Invoice/Location) markers used in single invoice templates
  • The layout structure is completely different, with invoices listed as table rows instead of distinct sections
  • Customer identifier extraction requires a different regex pattern

Environment

  • Component: PDF Template Parser (Clojure)
  • Date: 2026-02-07
  • Test File: test/clj/auto_ap/parse/templates_test.clj
  • Template File: src/clj/auto_ap/parse/templates.clj
  • Test Document: dev-resources/13595522.pdf (4 invoices on single page)

Symptoms

  • Single invoice template parses only the first invoice from a multi-invoice statement
  • Parse returns a single result instead of 4 separate invoice records
  • :customer-identifier extraction returns empty or incorrect values for statements
  • Test parse-bonanza-produce-statement-13595522 expects 4 results but receives 1

What Didn't Work

Attempted Solution 1: Reuse single invoice template with :multi flag

  • Added :multi #"\n" and :multi-match? pattern to existing single invoice template
  • Why it failed: The single invoice template's regex patterns (e.g., I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+) expect I/L markers that don't exist in multi-invoice statements. The layout structure is fundamentally different.

Attempted Solution 2: Using simpler customer identifier pattern

  • Tried pattern #"(.*?)\s+RETURN" extracted from multi-invoice statement text
  • Why it failed: This pattern alone does not account for the statement's column-based layout. It must be combined with the :multi and :multi-match? flags to parse multiple invoices.
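
The failure in Attempted Solution 1 can be reproduced at a REPL. The sketch below uses the pattern fragment quoted above; both input lines are invented for illustration, since this document does not reproduce the raw text of either PDF.

```clojure
;; Fragment of the single-invoice pattern quoted above. It needs an "I"
;; marker, a name, a gap of two or more spaces, and later an "L" marker.
(def single-invoice-fragment #"I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+")

;; HYPOTHETICAL lines, invented for illustration only.
(def single-invoice-line "I  SOME CUSTOMER NAME    123 MAIN ST   L  SOME LOCATION")
(def statement-line      "   01/15/26  03881260  INVOICE  123.45")

;; With I/L markers present, the name is captured in group 1.
(second (re-find single-invoice-fragment single-invoice-line))
;; => "SOME CUSTOMER NAME"

;; A statement row has no "I" followed by whitespace, so nothing matches.
(re-find single-invoice-fragment statement-line)
;; => nil
```

Because the fragment never matches a statement row, adding :multi to the single invoice template cannot help: splitting produces rows that the extraction patterns still reject.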

Solution

Added a dedicated multi-invoice template that:

  1. Uses different keywords to identify multi-invoice statements
  2. Employs :multi and :multi-match? flags for multiple invoice extraction
  3. Uses simpler regex patterns suitable for the statement layout

Implementation:

;; Bonanza Produce Statement (multi-invoice)
{:vendor "Bonanza Produce"
 :keywords [#"The perishable agricultural commodities" #"SPARKS, NEVADA"]
 :extract {:invoice-number #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+([0-9]+)\s+INVOICE"
           :customer-identifier #"(.*?)\s+RETURN"
           :date #"^\s+([0-9]{2}/[0-9]{2}/[0-9]{2})"
           :total #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE\s+([\d.]+)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}
 :multi #"\n"
 :multi-match? #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE"}
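
To see what the row-level patterns capture, they can be applied directly to one line. This is a minimal sketch: `sample-line` and `extract-fields` are hypothetical names introduced here, the sample row is invented, and the real parser in src/clj/auto_ap/parse.clj may apply the patterns differently.

```clojure
;; Row-level patterns copied from the template above.
(def extract
  {:invoice-number #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+([0-9]+)\s+INVOICE"
   :date           #"^\s+([0-9]{2}/[0-9]{2}/[0-9]{2})"
   :total          #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE\s+([\d.]+)"})

;; HYPOTHETICAL statement row, invented for illustration.
(def sample-line "   01/15/26  03881260  INVOICE  123.45")

(defn extract-fields
  "Applies each pattern to one line and keeps capture group 1 (nil if no match)."
  [line]
  (into {} (for [[k re] extract]
             [k (second (re-find re line))])))

(extract-fields sample-line)
;; => {:invoice-number "03881260", :date "01/15/26", :total "123.45"}

;; The :total capture group [\d.]+ stops at the first comma.
(second (re-find (:total extract) "   01/15/26  03881260  INVOICE  1,234.56"))
;; => "1"
```

The last expression suggests one thing to check against real documents: if a statement total ever contains a thousands separator, the capture would need to be widened (for example to `[\d,.]+`) before the :trim-commas parser has anything to trim. Whether Bonanza statements print totals that way is not established here.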

Key differences from single invoice template:

  • :keywords: Look for statement header text instead of phone number
  • :customer-identifier: Pattern #"(.*?)\s+RETURN" works for statement format
  • :multi #"\n": Split the document text on newline boundaries
  • :multi-match?: Match invoice header pattern to identify individual invoices
  • No I/L markers: Patterns scan from left margin without location markers
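
The keyword difference can be illustrated with a small sketch. It assumes a template applies only when every regex in :keywords is found in the document text; that selection rule, the helper `matches-all-keywords?`, and both text snippets are assumptions made for this example, not taken from the parser source.

```clojure
;; Keywords copied from the statement template above.
(def statement-keywords
  [#"The perishable agricultural commodities" #"SPARKS, NEVADA"])

(defn matches-all-keywords?
  "ASSUMPTION: a template is selected only when every keyword regex matches."
  [keywords text]
  (every? #(re-find % text) keywords))

;; HYPOTHETICAL document text, invented for illustration.
(def statement-text
  "BONANZA PRODUCE\nSPARKS, NEVADA\nThe perishable agricultural commodities listed on this statement")
(def single-invoice-text
  "BONANZA PRODUCE  530-544-4136\nI  SOME CUSTOMER NAME    L  SOME LOCATION")

(matches-all-keywords? statement-keywords statement-text)      ; => true
(matches-all-keywords? statement-keywords single-invoice-text) ; => false
```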

Why This Works

  1. Statement-specific keywords: "The perishable agricultural commodities" and "SPARKS, NEVADA" uniquely identify multi-invoice statements vs. single invoices (which have phone number 530-544-4136)

  2. Multi-flag parsing: The :multi and :multi-match? flags tell the parser to split the document on newlines and identify individual invoices using the date/invoice-number pattern, rather than treating the whole page as one invoice

  3. Simplified patterns: Without I/L markers, patterns scan from line start (^\s+) and extract columns based on whitespace positions. The :customer-identifier pattern (.*?)\s+RETURN captures everything before "RETURN" on each line

  4. Separate templates: Having distinct templates for single invoices vs. statements prevents conflict and allows optimization for each format
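
The split-then-filter behaviour described in point 2 can be sketched as follows. This is not the auto-ap parser: `invoice-lines` is a hypothetical helper and the statement text is invented. It only shows how splitting on :multi and filtering with :multi-match? turns one page into one chunk per invoice.

```clojure
(require '[clojure.string :as str])

;; Values copied from the template.
(def multi        #"\n")
(def multi-match? #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE")

(defn invoice-lines
  "Splits text on `multi` and keeps only the chunks that look like invoice rows."
  [text]
  (->> (str/split text multi)
       (filter #(re-find multi-match? %))))

;; HYPOTHETICAL statement text, invented for illustration.
(def statement-text
  (str/join "\n"
            ["BONANZA PRODUCE            SPARKS, NEVADA"
             "   01/12/26  10000001  INVOICE  100.00"
             "   01/13/26  10000002  INVOICE  200.50"
             "   01/14/26  10000003  INVOICE  50.25"
             "   01/15/26  10000004  INVOICE  75.00"
             "                           TOTAL DUE  425.75"]))

(count (invoice-lines statement-text))
;; => 4
```

The header and total lines are dropped because neither contains a date followed by a number and the word INVOICE, which is why :multi-match? is needed in addition to :multi.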

Prevention

When adding templates for vendors with multiple document formats:

  1. Create separate templates: Don't try to make one template handle both formats. Use distinct keywords to identify each format

  2. Test both single and multi-invoice documents: Ensure templates parse the expected number of invoices:

    (is (= 4 (count results)) "Should parse 4 invoices from statement")
    
  3. Verify :multi usage: Multi-invoice templates should have both :multi and :multi-match? flags:

    :multi #"\n"
    :multi-match? #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE"
    
  4. Check pattern scope: Multi-invoice statements often lack structural markers (I/L), so patterns should:

    • Use ^\s+ to anchor at line start
    • Extract from whitespace-separated columns
    • Avoid patterns requiring specific markers

  5. Run all template tests: Before committing, run:

    lein test auto-ap.parse.templates-test
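
A count assertion alone can pass while individual fields are wrong, so a statement test may also check each record. The sketch below shows the shape of such a test with clojure.test. `parse-statement-stub`, `row-re`, and `fixture` are hypothetical stand-ins; the real test parse-bonanza-produce-statement-13595522 calls the project's parser against dev-resources/13595522.pdf, which is not reproduced here.

```clojure
(require '[clojure.test :refer [deftest is]]
         '[clojure.string :as str])

;; HYPOTHETICAL stand-in for the real parser entry point.
(def row-re #"^\s+([0-9]{2}/[0-9]{2}/[0-9]{2})\s+([0-9]+)\s+INVOICE\s+([\d.]+)")

(defn parse-statement-stub
  "Returns one map per line that looks like an invoice row."
  [text]
  (for [line (str/split text #"\n")
        :let [[_ date number total] (re-find row-re line)]
        :when number]
    {:date date :invoice-number number :total total}))

;; HYPOTHETICAL fixture text, invented for illustration.
(def fixture
  (str/join "\n" ["STATEMENT HEADER"
                  "   01/12/26  10000001  INVOICE  100.00"
                  "   01/13/26  10000002  INVOICE  200.50"
                  "   01/14/26  10000003  INVOICE  50.25"
                  "   01/15/26  10000004  INVOICE  75.00"]))

(deftest statement-yields-one-record-per-invoice
  (let [results (parse-statement-stub fixture)]
    (is (= 4 (count results)) "Should parse 4 invoices from statement")
    ;; Guard against the same row being emitted more than once.
    (is (= 4 (count (distinct (map :invoice-number results)))))
    ;; Every record should carry a date and a total.
    (is (every? :date results))
    (is (every? :total results))))
```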
    
Related

  • Single invoice template: src/clj/auto_ap/parse/templates.clj lines 756-765
  • Similar multi-invoice patterns: Search for :multi and :multi-match? in src/clj/auto_ap/parse/templates.clj

Key Files

  • Tests: test/clj/auto_ap/parse/templates_test.clj (lines 36-53)
  • Template: src/clj/auto_ap/parse/templates.clj (lines 767-777)
  • Test document: dev-resources/13595522.pdf
  • Template parser: src/clj/auto_ap/parse.clj