integreat/.opencode/skills/invoice-template-creator/references/examples.md
Bryce 8a0395dc4a Add Bonanza Produce multi-invoice statement template
- Added multi-invoice template for Bonanza Produce with :multi and :multi-match? flags
- Template uses keywords for statement header to identify multi-invoice format
- Extracts invoice-number, date, customer-identifier (from RETURN line), and total
- Parses 4 invoices from statement PDF 13595522.pdf
- All tests pass (29 assertions, 0 failures, 0 errors)

- Added test: parse-bonanza-produce-statement-13595522
- Updated invoice-template-creator skill: emphasized test-first approach
2026-02-08 07:56:14 -08:00


# Invoice Template Examples
## Simple Single Invoice
```clojure
{:vendor "Gstar Seafood"
 :keywords [#"G Star Seafood"]
 :extract {:total #"Total\s{2,}([\d\-,]+\.\d{2})"
           :customer-identifier #"(.*?)(?:\s+)Invoice #"
           :date #"Invoice Date\s{2,}([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"Invoice #\s+(\d+)"}
 :parser {:date [:clj-time "MM/dd/yyyy"]
          :total [:trim-commas nil]}}
```
## Multi-Invoice Statement
```clojure
{:vendor "Southbay Fresh Produce"
 :keywords [#"(SOUTH BAY FRESH PRODUCE|SOUTH BAY PRODUCE)"]
 :extract {:date #"^([0-9]+/[0-9]+/[0-9]+)"
           :customer-identifier #"To:[^\n]*\n\s+([A-Za-z' ]+)\s{2}"
           :invoice-number #"INV #\/(\d+)"
           :total #"\$([0-9.]+)\."}
 :parser {:date [:clj-time "MM/dd/yyyy"]}
 :multi #"\n"
 :multi-match? #"^[0-9]+/[0-9]+/[0-9]+\s+INV "}
```
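The following is a minimal sketch of how `:multi` and `:multi-match?` could work together, assuming the parser splits the document text on `:multi` and keeps only the segments that match `:multi-match?`. That behaviour is an assumption, not taken from the parser source, and `statement-text` and `invoice-segments` are hypothetical names with invented sample data.

```clojure
(require '[clojure.string :as str])

;; Hypothetical statement text; the layout is invented for illustration.
(def statement-text
  (str "SOUTH BAY FRESH PRODUCE\n"
       "Statement of Account\n"
       "01/15/2026  INV #/1001\n"
       "01/16/2026  INV #/1002\n"
       "Thank you for your business"))

(def multi #"\n")
(def multi-match? #"^[0-9]+/[0-9]+/[0-9]+\s+INV ")

;; Assumed behaviour: split on :multi, keep segments matching :multi-match?.
(defn invoice-segments [text]
  (filter #(re-find multi-match? %) (str/split text multi)))

;; Apply two of the template's :extract patterns to each kept segment.
(def rows
  (mapv (fn [segment]
          {:date           (second (re-find #"^([0-9]+/[0-9]+/[0-9]+)" segment))
           :invoice-number (second (re-find #"INV #\/(\d+)" segment))})
        (invoice-segments statement-text)))
```

Under this reading, the `^` anchors in `:date` and `:multi-match?` apply to the start of each segment, which is why the template needs no multiline flag.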
## Customer with Address (Multi-line)
```clojure
{:vendor "Bonanza Produce"
 :keywords [#"530-544-4136"]
 :extract {:invoice-number #"NO\s+(\d{8,})\s+\d{2}/\d{2}/\d{2}"
           :date #"NO\s+\d{8,}\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"(?s)I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :account-number #"(?s)L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :total #"SHIPPED\s+[\d\.]+\s+TOTAL\s+([\d\.]+)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}
```
## Credit Memo (Negative Amounts)
```clojure
{:vendor "General Produce Company"
 :keywords [#"916-552-6495"]
 :extract {:date #"DATE.*\n.*\n.*?([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"CREDIT NO.*\n.*\n.*?(\d{5,}?)\s+"
           :account-number #"CUST NO.*\n.*\n\s+(\d+)"
           :total #"TOTAL:\s+\|\s*(.*)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas-and-negate nil]}}
```
## Complex Date Parsing
```clojure
{:vendor "Ben E. Keith"
 :keywords [#"BEN E. KEITH"]
 :extract {:date #"Customer No Mo Day Yr.*?\n.*?\d{5,}\s{2,}(\d+\s+\d+\s+\d+)"
           :customer-identifier #"Customer No Mo Day Yr.*?\n.*?(\d{5,})"
           :invoice-number #"Invoice No.*?\n.*?(\d{8,})"
           :total #"Total Invoice.*?\n.*?([\-]?[0-9]+\.[0-9]{2,})"}
 :parser {:date [:month-day-year nil]
          :total [:trim-commas-and-negate nil]}}
```
## Multiple Date Formats
```clojure
{:vendor "RNDC"
 :keywords [#"P.O.Box 743564"]
 :extract {:date #"(?:INVOICE|CREDIT) DATE\n(?:.*?)(\S+)\n"
           :account-number #"Store Number:\s+(\d+)"
           :invoice-number #"(?:INVOICE|CREDIT) DATE\n(?:.*?)\s{2,}(\d+?)\s+\S+\n"
           :total #"Net Amount(?:.*\n){4}(?:.*?)([\-]?[0-9\.]+)\n"}
 :parser {:date [:clj-time ["MM/dd/yy" "dd-MMM-yy"]]
          :total [:trim-commas-and-negate nil]}}
```
## Common Regex Patterns
### Phone Numbers
```clojure
#"\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}"
```
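As a quick check, the pattern can be tried against two common phone layouts with `re-find`. The sample strings and the names `phone-re`, `dashed`, and `parenthesized` are made up for illustration.

```clojure
(def phone-re #"\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}")

;; The pattern has no capture groups, so re-find returns the matched string.
(def dashed        (re-find phone-re "Phone: 530-544-4136"))
(def parenthesized (re-find phone-re "Call (530) 544-4136 today"))
```

Here `dashed` is `"530-544-4136"` and `parenthesized` is `"(530) 544-4136"`.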
### Dollar Amounts
```clojure
#"\$?([0-9,]+\.[0-9]{2})"
```
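A short sketch of what this pattern returns, using invented sample lines. Because the pattern has one capture group, `re-find` returns a vector of the whole match followed by the group, and the optional `$` stays outside the group.

```clojure
(def amount-re #"\$?([0-9,]+\.[0-9]{2})")

;; With a capture group, re-find returns [whole-match group-1].
(def with-dollar (re-find amount-re "TOTAL  $1,234.56"))

;; second picks out the captured amount; the comma is still present here.
(def without-dollar (second (re-find amount-re "Balance 80.00")))
```

The captured string still contains commas, so a number parser is usually needed before the value can be treated as numeric.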
### Dates (MM/dd/yy)
```clojure
#"([0-9]{2}/[0-9]{2}/[0-9]{2})"
```
### Dates (MM/dd/yyyy)
```clojure
#"([0-9]{2}/[0-9]{2}/[0-9]{4})"
```
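One caveat worth illustrating: the two-digit-year pattern also matches the front of a four-digit-year date. The sketch below shows the effect and one possible guard using `\b` word boundaries; the bounded variant is a suggestion, not a pattern from the templates above.

```clojure
(def short-date-re         #"([0-9]{2}/[0-9]{2}/[0-9]{2})")
(def bounded-short-date-re #"\b([0-9]{2}/[0-9]{2}/[0-9]{2})\b")

;; Unbounded: captures only the first two digits of the four-digit year.
(def truncated (second (re-find short-date-re "01/15/2026")))

;; Bounded: refuses the four-digit year, accepts a true two-digit year.
(def rejected (re-find bounded-short-date-re "01/15/2026"))
(def accepted (second (re-find bounded-short-date-re "Due 01/15/26")))
```

Here `truncated` is `"01/15/20"`, `rejected` is `nil`, and `accepted` is `"01/15/26"`.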
### Multi-line Text (dotall mode)
```clojure
#"(?s)start.*?end"
```
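To see why the `(?s)` flag matters, compare the same pattern with and without it on a made-up three-line string. By default `.` does not match a newline in Java regular expressions, which Clojure uses.

```clojure
(def sample "start\nmiddle\nend")

;; Without (?s), . stops at the first newline, so there is no match.
(def without-dotall (re-find #"start.*?end" sample))

;; With (?s), . also matches newlines and the whole span is found.
(def with-dotall (re-find #"(?s)start.*?end" sample))
```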
### Non-greedy Match
```clojure
#"prefix(.+?)suffix"
```
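The difference between greedy and non-greedy matching is easiest to see side by side. The sketch below uses an invented sample string; it also shows that a lazy quantifier with nothing after it captures only a single character, so it needs a following delimiter to be useful.

```clojure
(def tagged "<a><b>")

;; Greedy: .+ runs to the last > in the string.
(def greedy (second (re-find #"<(.+)>" tagged)))

;; Non-greedy: .+? stops at the first >.
(def lazy (second (re-find #"<(.+?)>" tagged)))

;; A trailing lazy quantifier matches the minimum, here one character.
(def trailing-lazy (re-find #"x(.+?)" "xyz"))
```

Here `greedy` is `"a><b"`, `lazy` is `"a"`, and `trailing-lazy` is `["xy" "y"]`.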
### Lookahead Boundary
```clojure
#"value(?=\s{2,}|\n)"
```
## Field Extraction Strategies
### 1. Simple Line-based
Use `[^\n]+` to capture the rest of the line:
```clojure
#"Invoice:\s+([^\n]+)"
```
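A small usage sketch with invented input, showing that the capture stops at the newline:

```clojure
(def line-match
  (re-find #"Invoice:\s+([^\n]+)" "Invoice: 12345\nDate: 01/15/26"))

;; The captured value is the second element of the result vector.
(def invoice-value (second line-match))
```

Note that `\s+` can itself match a newline, so if the label is followed by an empty value the capture may come from the next line.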
### 2. Whitespace Boundary
Use `(?=\s{2,}|\n)` to stop at multiple spaces or newline:
```clojure
#"Customer:\s+(.+?)(?=\s{2,}|\n)"
```
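The sketch below applies this pattern to two invented lines, one where the value is followed by a run of spaces and one where it ends the line. The lookahead checks the boundary without consuming it, so the captured value carries no trailing whitespace.

```clojure
(def customer-re #"Customer:\s+(.+?)(?=\s{2,}|\n)")

;; Value followed by a column gap of several spaces.
(def before-gap
  (second (re-find customer-re "Customer: ACME FOODS INC   Invoice # 123\n")))

;; Value followed directly by a newline.
(def before-newline
  (second (re-find customer-re "Customer: ACME FOODS INC\nDate: 01/15/26")))
```

Both results are `"ACME FOODS INC"`; the single spaces inside the name do not end the match because the boundary requires at least two.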
### 3. Specific Marker
Match until a specific pattern is found:
```clojure
#"(?s)Start(.*?)End"
```
### 4. Multi-part Extraction
Use multiple capture groups for related fields:
```clojure
#"Date:\s+(\d{2})/(\d{2})/(\d{2})"
```
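With several capture groups, `re-find` returns the whole match followed by each group in order, which pairs naturally with destructuring. A minimal sketch with invented input; `date-parts` is a hypothetical name.

```clojure
(def date-parts
  ;; _ skips the whole match; mm, dd and yy bind the three groups.
  (let [[_ mm dd yy] (re-find #"Date:\s+(\d{2})/(\d{2})/(\d{2})"
                              "Date: 01/15/26")]
    {:month mm :day dd :year yy}))
```

If the pattern does not match, `re-find` returns `nil` and every destructured name is `nil`, so the result should be checked before use.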
## Parser Options
### Date Parsers
- `[:clj-time "MM/dd/yyyy"]` - Standard US date
- `[:clj-time "MM/dd/yy"]` - 2-digit year
- `[:clj-time "MMM dd, yyyy"]` - Named month
- `[:clj-time ["MM/dd/yy" "yyyy-MM-dd"]]` - Multiple formats
- `[:month-day-year nil]` - Space-separated (1 15 26)
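To illustrate the idea behind a multiple-format entry, here is a sketch that tries each format in order and returns the first successful parse. It uses `java.time` through interop rather than clj-time, and `parse-with-formats` is a hypothetical helper, so this is not the project's parser and its behaviour may differ.

```clojure
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter DateTimeParseException)
        '(java.util Locale))

;; Hypothetical helper: returns the first successful parse, or nil.
(defn parse-with-formats [patterns s]
  (some (fn [pattern]
          (try
            (LocalDate/parse s (DateTimeFormatter/ofPattern pattern Locale/US))
            (catch DateTimeParseException _ nil)))
        patterns))

(def formats ["MM/dd/yy" "dd-MMM-yy"])

(def from-numeric (parse-with-formats formats "01/15/26"))
(def from-named   (parse-with-formats formats "15-Jan-26"))
(def unparsable   (parse-with-formats formats "not a date"))
```

Order matters when formats overlap: the first pattern that accepts the string wins, so the more specific format should generally come first.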
### Number Parsers
- `[:trim-commas nil]` - Remove commas from numbers
- `[:trim-commas-and-negate nil]` - Handle negative/credit amounts
- `[:trim-commas-and-remove-dollars nil]` - Remove $ and commas
- `nil` - No parsing, return raw string
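As a rough picture of the two simplest number parsers, the sketch below implements the string cleanup their descriptions imply. The function names mirror the keywords but are hypothetical; the real parsers may also convert the result to a number, which is left out here. `:trim-commas-and-negate` is not sketched because its exact sign handling is not described above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for :trim-commas.
(defn trim-commas [s]
  (str/replace s "," ""))

;; Hypothetical stand-in for :trim-commas-and-remove-dollars.
(defn trim-commas-and-remove-dollars [s]
  (str/replace s #"[,$]" ""))

(def plain  (trim-commas "1,234.56"))
(def dollar (trim-commas-and-remove-dollars "$1,234.56"))
```

Both `plain` and `dollar` are the string `"1234.56"`.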
## Testing Patterns
### Basic Test Structure
```clojure
(deftest parse-vendor-invoice
  (testing "Should parse vendor invoice"
    (let [results (sut/parse-file (io/file "dev-resources/INVOICE.pdf")
                                  "INVOICE.pdf")
          result (first results)]
      (is (some? result))
      (is (= "Vendor" (:vendor-code result)))
      (is (= "12345" (:invoice-number result))))))
```
### Date Testing
```clojure
(let [d (:date result)]
  (is (= 2026 (time/year d)))
  (is (= 1 (time/month d)))
  (is (= 15 (time/day d))))
```
### Multi-field Verification
```clojure
(is (= "Expected Name" (:customer-identifier result)))
(is (= "Expected Street" (str/trim (:account-number result))))
(is (= "Expected City, ST 12345" (str/trim (:location result))))
```