- Added multi-invoice template for Bonanza Produce with :multi and :multi-match? flags
- Template uses keywords for statement header to identify multi-invoice format
- Extracts invoice-number, date, customer-identifier (from RETURN line), and total
- Parses 4 invoices from statement PDF 13595522.pdf
- All tests pass (29 assertions, 0 failures, 0 errors)
- Added test: parse-bonanza-produce-statement-13595522
- Updated invoice-template-creator skill: emphasized test-first approach
Invoice Template Examples
Simple Single Invoice
{:vendor "Gstar Seafood"
 :keywords [#"G Star Seafood"]
 :extract {:total #"Total\s{2,}([\d\-,]+\.\d{2,2}+)"
           :customer-identifier #"(.*?)(?:\s+)Invoice #"
           :date #"Invoice Date\s{2,}([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"Invoice #\s+(\d+)"}
 :parser {:date [:clj-time "MM/dd/yyyy"]
          :total [:trim-commas nil]}}
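To see what the :extract patterns capture, here is a minimal sketch that runs each regex against some text and keeps the first capture group. The helper `extract-fields` and the sample text are hypothetical and only for illustration; the project's actual parser may apply the template differently.

```clojure
;; Hypothetical helper, not the project's parser: run every :extract regex
;; against the text and keep the first capture group of each match.
(defn extract-fields
  [template text]
  (into {}
        (for [[field re] (:extract template)]
          [field (some-> (re-find re text) second)])))

;; Only the :extract part of the Gstar Seafood template is needed here.
(def gstar-template
  {:extract {:total #"Total\s{2,}([\d\-,]+\.\d{2,2}+)"
             :customer-identifier #"(.*?)(?:\s+)Invoice #"
             :date #"Invoice Date\s{2,}([0-9]+/[0-9]+/[0-9]+)"
             :invoice-number #"Invoice #\s+(\d+)"}})

;; Made-up text standing in for PDF-to-text output.
(def sample-text
  "ACME CAFE   Invoice # 12345\nInvoice Date  01/15/2026\nTotal  1,234.50")

(def extracted (extract-fields gstar-template sample-text))
;; extracted holds raw strings, e.g. (:total extracted) is "1,234.50"
```

The values are still raw strings at this point; the :parser entries describe how :date and :total are converted afterwards.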
Multi-Invoice Statement
{:vendor "Southbay Fresh Produce"
 :keywords [#"(SOUTH BAY FRESH PRODUCE|SOUTH BAY PRODUCE)"]
 :extract {:date #"^([0-9]+/[0-9]+/[0-9]+)"
           :customer-identifier #"To:[^\n]*\n\s+([A-Za-z' ]+)\s{2}"
           :invoice-number #"INV #\/(\d+)"
           :total #"\$([0-9.]+)\."}
 :parser {:date [:clj-time "MM/dd/yyyy"]}
 :multi #"\n"
 :multi-match? #"^[0-9]+/[0-9]+/[0-9]+\s+INV "}
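One plausible reading of the two multi flags is that :multi splits the statement text into chunks and :multi-match? keeps only the chunks that look like invoice rows. The sketch below assumes that reading; `split-invoices` and the statement text are hypothetical, and the real parser may behave differently.

```clojure
(require '[clojure.string :as str])

;; Trimmed copy of the template: only the keys this sketch uses.
(def southbay
  {:extract {:invoice-number #"INV #\/(\d+)"}
   :multi #"\n"
   :multi-match? #"^[0-9]+/[0-9]+/[0-9]+\s+INV "})

;; Assumed semantics: split on :multi, keep chunks that match :multi-match?.
(defn split-invoices
  [{:keys [multi multi-match?]} text]
  (filterv #(re-find multi-match? %) (str/split text multi)))

;; Made-up statement text: two header lines, two invoice rows, one footer.
(def statement-text
  (str "SOUTH BAY FRESH PRODUCE\n"
       "Date        Reference\n"
       "01/05/2026  INV #/1001\n"
       "01/12/2026  INV #/1002\n"
       "Please remit promptly"))

(def invoice-lines (split-invoices southbay statement-text))

;; Apply a per-invoice :extract pattern to each surviving chunk.
(def invoice-numbers
  (mapv #(second (re-find (get-in southbay [:extract :invoice-number]) %))
        invoice-lines))
```

Under this reading the header and footer lines are discarded because they do not start with a date followed by "INV ".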
Customer with Address (Multi-line)
{:vendor "Bonanza Produce"
 :keywords [#"530-544-4136"]
 :extract {:invoice-number #"NO\s+(\d{8,})\s+\d{2}/\d{2}/\d{2}"
           :date #"NO\s+\d{8,}\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"(?s)I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :account-number #"(?s)L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :total #"SHIPPED\s+[\d\.]+\s+TOTAL\s+([\d\.]+)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}
Credit Memo (Negative Amounts)
{:vendor "General Produce Company"
 :keywords [#"916-552-6495"]
 :extract {:date #"DATE.*\n.*\n.*?([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"CREDIT NO.*\n.*\n.*?(\d{5,}?)\s+"
           :account-number #"CUST NO.*\n.*\n\s+(\d+)"
           :total #"TOTAL:\s+\|\s*(.*)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas-and-negate nil]}}
Complex Date Parsing
{:vendor "Ben E. Keith"
 :keywords [#"BEN E. KEITH"]
 :extract {:date #"Customer No Mo Day Yr.*?\n.*?\d{5,}\s{2,}(\d+\s+\d+\s+\d+)"
           :customer-identifier #"Customer No Mo Day Yr.*?\n.*?(\d{5,})"
           :invoice-number #"Invoice No.*?\n.*?(\d{8,})"
           :total #"Total Invoice.*?\n.*?([\-]?[0-9]+\.[0-9]{2,})"}
 :parser {:date [:month-day-year nil]
          :total [:trim-commas-and-negate nil]}}
Multiple Date Formats
{:vendor "RNDC"
 :keywords [#"P.O.Box 743564"]
 :extract {:date #"(?:INVOICE|CREDIT) DATE\n(?:.*?)(\S+)\n"
           :account-number #"Store Number:\s+(\d+)"
           :invoice-number #"(?:INVOICE|CREDIT) DATE\n(?:.*?)\s{2,}(\d+?)\s+\S+\n"
           :total #"Net Amount(?:.*\n){4}(?:.*?)([\-]?[0-9\.]+)\n"}
 :parser {:date [:clj-time ["MM/dd/yy" "dd-MMM-yy"]]
          :total [:trim-commas-and-negate nil]}}
Common Regex Patterns
Phone Numbers
#"\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}"
Dollar Amounts
#"\$?([0-9,]+\.[0-9]{2})"
Dates (MM/dd/yy)
#"([0-9]{2}/[0-9]{2}/[0-9]{2})"
Dates (MM/dd/yyyy)
#"([0-9]{2}/[0-9]{2}/[0-9]{4})"
Multi-line Text (dotall mode)
#"(?s)start.*?end"
Non-greedy Match
#"(pattern.+?)"
Lookahead Boundary
#"value(?=\s{2,}|\n)"
Field Extraction Strategies
1. Simple Line-based
Use [^\n]+ to capture everything up to the end of the line:
#"Invoice:\s+([^\n]+)"
2. Whitespace Boundary
Use (?=\s{2,}|\n) to stop at a run of two or more whitespace characters or at a newline:
#"Customer:\s+(.+?)(?=\s{2,}|\n)"
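The difference between the line-based and whitespace-boundary strategies shows up on columnar text, where several fields share one line. The sample lines below are made up for illustration.

```clojure
;; Made-up columnar line: two fields separated by a wide gap.
(def line "Customer:   ACME CAFE      Terms: Net 30")

;; Line-based capture runs to the end of the line and swallows the next column.
(def line-based
  (second (re-find #"Customer:\s+([^\n]+)" line)))

;; The lookahead stops the lazy capture at the first run of 2+ whitespace chars.
(def bounded
  (second (re-find #"Customer:\s+(.+?)(?=\s{2,}|\n)" line)))

;; Caveat: with no trailing gap or newline the lookahead cannot succeed,
;; so a value at the very end of the text is not matched at all.
(def at-end-of-text
  (re-find #"Customer:\s+(.+?)(?=\s{2,}|\n)" "Customer:   ACME CAFE"))

;; Adding |$ to the lookahead covers that end-of-text case.
(def at-end-fixed
  (second (re-find #"Customer:\s+(.+?)(?=\s{2,}|\n|$)" "Customer:   ACME CAFE")))
```

Single spaces inside the value ("ACME CAFE") survive because the boundary requires at least two whitespace characters.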
3. Specific Marker
Match until a specific pattern is found:
#"(?s)Start(.*?)End"
4. Multi-part Extraction
Use multiple capture groups for related fields:
#"Date:\s+(\d{2})/(\d{2})/(\d{2})"
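With several capture groups, re-find returns a vector holding the whole match followed by each group, so the parts can be destructured directly. The function name `split-date` is hypothetical.

```clojure
;; re-find returns [whole-match group1 group2 group3] when the regex has groups,
;; or nil when nothing matches.
(defn split-date
  [text]
  (when-let [[_ month day year]
             (re-find #"Date:\s+(\d{2})/(\d{2})/(\d{2})" text)]
    {:month month :day day :year year}))

(def parts (split-date "Date: 01/15/26"))
;; parts is {:month "01", :day "15", :year "26"}; all values are strings
```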
Parser Options
Date Parsers
- [:clj-time "MM/dd/yyyy"] - Standard US date
- [:clj-time "MM/dd/yy"] - 2-digit year
- [:clj-time "MMM dd, yyyy"] - Named month
- [:clj-time ["MM/dd/yy" "yyyy-MM-dd"]] - Multiple formats
- [:month-day-year nil] - Space-separated (1 15 26)
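To illustrate what the multiple-format and space-separated options imply, here is a rough sketch. It uses java.time as a dependency-free stand-in for clj-time, so it is not the project's implementation; `parse-date-any` and `parse-month-day-year` are hypothetical names, and mapping a 2-digit year to 20xx is an assumption.

```clojure
(require '[clojure.string :as str])
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter DateTimeParseException))

;; Try each format in order and return the first successful parse, else nil.
(defn parse-date-any
  [formats s]
  (some (fn [fmt]
          (try
            (LocalDate/parse s (DateTimeFormatter/ofPattern fmt))
            (catch DateTimeParseException _ nil)))
        formats))

;; Space-separated "month day year", e.g. "1 15 26".
;; Assumption: a 2-digit year belongs to the 2000s.
(defn parse-month-day-year
  [s]
  (let [[m d y] (map #(Long/parseLong %) (str/split (str/trim s) #"\s+"))]
    (LocalDate/of (int (+ 2000 y)) (int m) (int d))))

(def from-short (parse-date-any ["MM/dd/yy" "yyyy-MM-dd"] "01/15/26"))
(def from-iso   (parse-date-any ["MM/dd/yy" "yyyy-MM-dd"] "2026-01-15"))
(def from-parts (parse-month-day-year "1 15 26"))
```

All three inputs resolve to 15 January 2026 in this sketch. Joda-Time, which clj-time wraps, may resolve 2-digit years with a different pivot than java.time.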
Number Parsers
- [:trim-commas nil] - Remove commas from numbers
- [:trim-commas-and-negate nil] - Handle negative/credit amounts
- [:trim-commas-and-remove-dollars nil] - Remove $ and commas
- nil - No parsing, return raw string
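As a rough illustration of the comma and dollar-sign cleanup, the sketch below strips those characters from a captured string. The function names mirror the parser keywords but are hypothetical, and returning a cleaned string rather than a number is an assumption. The negate variant is left out because its sign rule is not spelled out here.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for [:trim-commas nil]: drop thousands separators.
(defn trim-commas
  [s]
  (str/replace s "," ""))

;; Hypothetical stand-in for [:trim-commas-and-remove-dollars nil].
;; Inside a character class, $ is a literal dollar sign.
(defn trim-commas-and-remove-dollars
  [s]
  (str/replace s #"[,$]" ""))

(def plain   (trim-commas "1,234.50"))
(def dollars (trim-commas-and-remove-dollars "$1,234.50"))

;; The cleaned string can then be read as an exact decimal.
(def amount (bigdec dollars))
```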
Testing Patterns
Basic Test Structure
(deftest parse-vendor-invoice
  (testing "Should parse vendor invoice"
    (let [results (sut/parse-file (io/file "dev-resources/INVOICE.pdf")
                                  "INVOICE.pdf")
          result (first results)]
      (is (some? result))
      (is (= "Vendor" (:vendor-code result)))
      (is (= "12345" (:invoice-number result))))))
Date Testing
(let [d (:date result)]
  (is (= 2026 (time/year d)))
  (is (= 1 (time/month d)))
  (is (= 15 (time/day d))))
Multi-field Verification
(is (= "Expected Name" (:customer-identifier result)))
(is (= "Expected Street" (str/trim (:account-number result))))
(is (= "Expected City, ST 12345" (str/trim (:location result))))