Bryce 8a0395dc4a Add Bonanza Produce multi-invoice statement template
- Added multi-invoice template for Bonanza Produce with :multi and :multi-match? flags
- Template uses keywords for statement header to identify multi-invoice format
- Extracts invoice-number, date, customer-identifier (from RETURN line), and total
- Parses 4 invoices from statement PDF 13595522.pdf
- All tests pass (29 assertions, 0 failures, 0 errors)

- Added test: parse-bonanza-produce-statement-13595522
- Updated invoice-template-creator skill: emphasized test-first approach
2026-02-08 07:56:14 -08:00


Invoice Template Examples

Simple Single Invoice

{:vendor "Gstar Seafood"
 :keywords [#"G Star Seafood"]
 :extract {:total #"Total\s{2,}([\d\-,]+\.\d{2})"
           :customer-identifier #"(.*?)(?:\s+)Invoice #"
           :date #"Invoice Date\s{2,}([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"Invoice #\s+(\d+)"}
 :parser {:date [:clj-time "MM/dd/yyyy"]
          :total [:trim-commas nil]}}
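
To see what the :extract patterns capture, they can be tried against a snippet of text. The sketch below is not the project's parser: extract-fields is a hypothetical helper that assumes each field takes the first capture group of its regex, and the sample text is invented for illustration.

```clojure
;; Invented sample text; real PDF-extracted text will differ.
(def sample-text
  (str "ACME CAFE    Invoice # 12345\n"
       "Invoice Date    01/15/2026\n"
       "Total    1,234.50\n"))

;; Patterns modeled on the Gstar Seafood template above.
(def extract-patterns
  {:total               #"Total\s{2,}([\d\-,]+\.\d{2})"
   :customer-identifier #"(.*?)(?:\s+)Invoice #"
   :date                #"Invoice Date\s{2,}([0-9]+/[0-9]+/[0-9]+)"
   :invoice-number      #"Invoice #\s+(\d+)"})

(defn extract-fields
  "Hypothetical helper: runs each regex against text and keeps the
  first capture group (nil when the pattern does not match)."
  [patterns text]
  (into {}
        (for [[field pattern] patterns]
          [field (second (re-find pattern text))])))

(def fields (extract-fields extract-patterns sample-text))
;; fields maps :total to "1,234.50", :customer-identifier to "ACME CAFE",
;; :date to "01/15/2026" and :invoice-number to "12345"
```

The values are still raw strings at this point; the :parser entries describe how they are converted afterwards.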

Multi-Invoice Statement

{:vendor "Southbay Fresh Produce"
 :keywords [#"(SOUTH BAY FRESH PRODUCE|SOUTH BAY PRODUCE)"]
 :extract {:date #"^([0-9]+/[0-9]+/[0-9]+)"
           :customer-identifier #"To:[^\n]*\n\s+([A-Za-z' ]+)\s{2}"
           :invoice-number #"INV #\/(\d+)"
           :total #"\$([0-9.]+)\."}
 :parser {:date [:clj-time "MM/dd/yyyy"]}
 :multi #"\n"
 :multi-match? #"^[0-9]+/[0-9]+/[0-9]+\s+INV "}
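
One plausible reading of :multi and :multi-match? is that the statement text is split on the :multi regex and only the chunks matching :multi-match? are treated as invoices. The sketch below assumes exactly that; split-invoices and parse-statement are hypothetical helpers, the statement text is invented, and the real parser may behave differently.

```clojure
(require '[clojure.string :as str])

;; Invented statement text for illustration only.
(def statement-text
  (str "SOUTH BAY FRESH PRODUCE\n"
       "Statement of Account\n"
       "01/05/2026  INV #/1001  120.50\n"
       "01/12/2026  INV #/1002  89.00\n"
       "Balance Due  209.50\n"))

(def template
  {:multi        #"\n"
   :multi-match? #"^[0-9]+/[0-9]+/[0-9]+\s+INV "
   :extract      {:date           #"^([0-9]+/[0-9]+/[0-9]+)"
                  :invoice-number #"INV #\/(\d+)"}})

(defn split-invoices
  "Assumed behaviour: split on :multi, keep chunks matching :multi-match?."
  [{:keys [multi multi-match?]} text]
  (->> (str/split text multi)
       (filter #(re-find multi-match? %))))

(defn parse-statement
  "Hypothetical: apply every :extract regex to each invoice chunk."
  [{:keys [extract] :as template} text]
  (for [chunk (split-invoices template text)]
    (into {}
          (for [[field pattern] extract]
            [field (second (re-find pattern chunk))]))))

(def invoices (vec (parse-statement template statement-text)))
;; two invoices: 1001 dated 01/05/2026 and 1002 dated 01/12/2026
```

Header and footer lines drop out because they do not start with a date followed by INV, which is why :multi-match? is anchored with ^.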

Customer with Address (Multi-line)

{:vendor "Bonanza Produce"
 :keywords [#"530-544-4136"]
 :extract {:invoice-number #"NO\s+(\d{8,})\s+\d{2}/\d{2}/\d{2}"
           :date #"NO\s+\d{8,}\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"(?s)I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :account-number #"(?s)L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :total #"SHIPPED\s+[\d\.]+\s+TOTAL\s+([\d\.]+)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}

Credit Memo (Negative Amounts)

{:vendor "General Produce Company"
 :keywords [#"916-552-6495"]
 :extract {:date #"DATE.*\n.*\n.*?([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"CREDIT NO.*\n.*\n.*?(\d{5,}?)\s+"
           :account-number #"CUST NO.*\n.*\n\s+(\d+)"
           :total #"TOTAL:\s+\|\s*(.*)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas-and-negate nil]}}

Complex Date Parsing

{:vendor "Ben E. Keith"
 :keywords [#"BEN E\. KEITH"]
 :extract {:date #"Customer No Mo Day Yr.*?\n.*?\d{5,}\s{2,}(\d+\s+\d+\s+\d+)"
           :customer-identifier #"Customer No Mo Day Yr.*?\n.*?(\d{5,})"
           :invoice-number #"Invoice No.*?\n.*?(\d{8,})"
           :total #"Total Invoice.*?\n.*?([\-]?[0-9]+\.[0-9]{2,})"}
 :parser {:date [:month-day-year nil]
          :total [:trim-commas-and-negate nil]}}

Multiple Date Formats

{:vendor "RNDC"
 :keywords [#"P\.O\.Box 743564"]
 :extract {:date #"(?:INVOICE|CREDIT) DATE\n(?:.*?)(\S+)\n"
           :account-number #"Store Number:\s+(\d+)"
           :invoice-number #"(?:INVOICE|CREDIT) DATE\n(?:.*?)\s{2,}(\d+?)\s+\S+\n"
           :total #"Net Amount(?:.*\n){4}(?:.*?)([\-]?[0-9\.]+)\n"}
 :parser {:date [:clj-time ["MM/dd/yy" "dd-MMM-yy"]]
          :total [:trim-commas-and-negate nil]}}

Common Regex Patterns

Phone Numbers

#"\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}"

Dollar Amounts

#"\$?([0-9,]+\.[0-9]{2})"

Dates (MM/dd/yy)

#"([0-9]{2}/[0-9]{2}/[0-9]{2})"

Dates (MM/dd/yyyy)

#"([0-9]{2}/[0-9]{2}/[0-9]{4})"

Multi-line Text (dotall mode)

#"(?s)start.*?end"

Non-greedy Match

#"(pattern.+?)"

Lookahead Boundary

#"value(?=\s{2,}|\n)"
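
The patterns above can be checked at a REPL with re-find before they go into a template. The inputs below are invented; note that re-find returns a plain string when the pattern has no capture group and a vector of [whole-match group-1 ...] when it does.

```clojure
;; No capture group: re-find returns the matched string.
(def phone
  (re-find #"\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}"
           "Call (530) 544-4136 today"))
;; "(530) 544-4136"

;; One capture group: take the second element of the result vector.
(def amount
  (second (re-find #"\$?([0-9,]+\.[0-9]{2})" "Total $1,234.50")))
;; "1,234.50"

;; The two-digit-year pattern also matches the front of a four-digit year,
;; so it silently truncates when the format is not what was expected.
(def short-year
  (second (re-find #"([0-9]{2}/[0-9]{2}/[0-9]{2})" "01/15/2026")))
;; "01/15/20"

;; Without (?s) the dot stops at newlines, so this finds nothing ...
(def without-dotall (re-find #"start.*?end" "start\nmiddle\nend"))
;; nil
;; ... while dotall mode lets .*? span lines.
(def with-dotall (re-find #"(?s)start.*?end" "start\nmiddle\nend"))
;; "start\nmiddle\nend"

;; The lookahead requires two or more spaces (or a newline) after the
;; match, without consuming them.
(def boundary-hit  (re-find #"value(?=\s{2,}|\n)" "value  next"))
(def boundary-miss (re-find #"value(?=\s{2,}|\n)" "value next"))
;; "value" and nil
```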

Field Extraction Strategies

1. Simple Line-based

Use [^\n]+ to capture everything up to the end of the line:

#"Invoice:\s+([^\n]+)"

2. Whitespace Boundary

Use (?=\s{2,}|\n) to stop at a run of two or more whitespace characters or at a newline:

#"Customer:\s+(.+?)(?=\s{2,}|\n)"
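
The difference between the line-based and whitespace-boundary strategies shows up on column-formatted text, where several fields share a line. A small comparison on an invented line:

```clojure
;; Invented line with two columns separated by a wide gap.
(def line "Customer:  ACME CAFE      Terms: NET 30\n")

;; Line-based: captures the rest of the line, including the next column.
(def whole-line
  (second (re-find #"Customer:\s+([^\n]+)" line)))
;; "ACME CAFE      Terms: NET 30"

;; Whitespace boundary: the lazy .+? stops at the first wide gap, so the
;; single space inside "ACME CAFE" is kept but the next column is not.
(def first-column
  (second (re-find #"Customer:\s+(.+?)(?=\s{2,}|\n)" line)))
;; "ACME CAFE"
```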

3. Specific Marker

Match until a specific pattern is found:

#"(?s)Start(.*?)End"

4. Multi-part Extraction

Use multiple capture groups for related fields:

#"Date:\s+(\d{2})/(\d{2})/(\d{2})"
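
With several capture groups, re-find returns the whole match followed by each group, which destructures directly. A short sketch on an invented input:

```clojure
;; Destructure the result vector: whole match first, then each group.
(def date-parts
  (let [[_ mm dd yy] (re-find #"Date:\s+(\d{2})/(\d{2})/(\d{2})"
                              "Date: 01/15/26")]
    {:month (Long/parseLong mm)
     :day   (Long/parseLong dd)
     :year  (Long/parseLong yy)}))
;; {:month 1, :day 15, :year 26}
```

The year is still two digits here; turning it into a full date is the job of the date parsers described under Parser Options.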

Parser Options

Date Parsers

  • [:clj-time "MM/dd/yyyy"] - Standard US date
  • [:clj-time "MM/dd/yy"] - 2-digit year
  • [:clj-time "MMM dd, yyyy"] - Named month
  • [:clj-time ["MM/dd/yy" "yyyy-MM-dd"]] - Multiple formats
  • [:month-day-year nil] - Space-separated (1 15 26)
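
The multiple-formats option presumably tries each format in turn until one parses. The sketch below illustrates that idea with java.time rather than clj-time, purely to stay dependency-free; parse-date is a hypothetical helper and not the project's parser, and java.time pattern letters can differ from Joda-Time's in edge cases.

```clojure
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter DateTimeParseException)
        '(java.util Locale))

(defn parse-date
  "Hypothetical helper: accepts one format string or a vector of them and
  returns the first LocalDate that parses, or nil if none do."
  [formats s]
  (some (fn [fmt]
          (try
            (LocalDate/parse s (DateTimeFormatter/ofPattern fmt Locale/ENGLISH))
            (catch DateTimeParseException _ nil)))
        (if (string? formats) [formats] formats)))

(def us-date    (parse-date "MM/dd/yyyy" "01/15/2026"))
(def short-date (parse-date ["MM/dd/yy" "dd-MMM-yy"] "01/15/26"))
(def named-date (parse-date ["MM/dd/yy" "dd-MMM-yy"] "15-Jan-26"))
(def bad-date   (parse-date ["MM/dd/yy" "dd-MMM-yy"] "not a date"))
;; the first three all print as 2026-01-15; bad-date is nil
```

Order matters when formats overlap: the first format that parses wins, so put the stricter or more common format first.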

Number Parsers

  • [:trim-commas nil] - Remove commas from numbers
  • [:trim-commas-and-negate nil] - Handle negative/credit amounts
  • [:trim-commas-and-remove-dollars nil] - Remove $ and commas
  • nil - No parsing, return raw string
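
As a rough picture of what the comma and dollar-sign options do to a captured string, here is a minimal sketch. The function names mirror the parser keywords but are hypothetical stand-ins, not the project's code. :trim-commas-and-negate is left out because its sign convention is not spelled out here.

```clojure
(require '[clojure.string :as str])

(defn trim-commas
  "Stand-in for :trim-commas: drops thousands separators."
  [s]
  (str/replace s "," ""))

(defn trim-commas-and-remove-dollars
  "Stand-in for :trim-commas-and-remove-dollars: drops $ and commas."
  [s]
  (str/replace s #"[$,]" ""))

(def plain   (trim-commas "1,234.50"))
(def dollars (trim-commas-and-remove-dollars "$1,234.50"))
(def number  (Double/parseDouble plain))
;; plain and dollars are both "1234.50"; number is 1234.5
```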

Testing Patterns

Basic Test Structure

(deftest parse-vendor-invoice
  (testing "Should parse vendor invoice"
    (let [results (sut/parse-file (io/file "dev-resources/INVOICE.pdf")
                                  "INVOICE.pdf")
          result (first results)]
      (is (some? result))
      (is (= "Vendor" (:vendor-code result)))
      (is (= "12345" (:invoice-number result))))))

Date Testing

(let [d (:date result)]
  (is (= 2026 (time/year d)))
  (is (= 1 (time/month d)))
  (is (= 15 (time/day d))))

Multi-field Verification

(is (= "Expected Name" (:customer-identifier result)))
(is (= "Expected Street" (str/trim (:account-number result))))
(is (= "Expected City, ST 12345" (str/trim (:location result))))