# Invoice Template Examples

## Simple Single Invoice

```clojure
{:vendor "Gstar Seafood"
 :keywords [#"G Star Seafood"]
 :extract {:total #"Total\s{2,}([\d\-,]+\.\d{2,2}+)"
           :customer-identifier #"(.*?)(?:\s+)Invoice #"
           :date #"Invoice Date\s{2,}([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"Invoice #\s+(\d+)"}
 :parser {:date [:clj-time "MM/dd/yyyy"]
          :total [:trim-commas nil]}}
```
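
To see what each `:extract` regex pulls out, it helps to run them by hand. This is a minimal sketch that assumes the parser applies each regex with `re-find` to the extracted PDF text and keeps capture group 1; `extract-fields` and the sample text are hypothetical, not the parser's actual code.

```clojure
;; Hypothetical helper (an assumption about how templates are evaluated):
;; run every regex in an :extract map and keep capture group 1.
(defn extract-fields [extract text]
  (into {}
        (for [[field re] extract
              :let [m (re-find re text)]
              :when m]
          ;; re-find returns a string when the regex has no groups,
          ;; otherwise a vector [whole-match group-1 ...]
          [field (if (vector? m) (second m) m)])))

;; Made-up text laid out the way the Gstar regexes expect.
(def sample-text
  (str "ACME CAFE   Invoice # 10482\n"
       "Invoice Date   01/15/2026\n"
       "Total   1,234.50\n"))

(def gstar-extract
  {:total               #"Total\s{2,}([\d\-,]+\.\d{2,2}+)"
   :customer-identifier #"(.*?)(?:\s+)Invoice #"
   :date                #"Invoice Date\s{2,}([0-9]+/[0-9]+/[0-9]+)"
   :invoice-number      #"Invoice #\s+(\d+)"})

(extract-fields gstar-extract sample-text)
;; → {:total "1,234.50", :customer-identifier "ACME CAFE", :date "01/15/2026", :invoice-number "10482"}
```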

## Multi-Invoice Statement

```clojure
{:vendor "Southbay Fresh Produce"
 :keywords [#"(SOUTH BAY FRESH PRODUCE|SOUTH BAY PRODUCE)"]
 :extract {:date #"^([0-9]+/[0-9]+/[0-9]+)"
           :customer-identifier #"To:[^\n]*\n\s+([A-Za-z' ]+)\s{2}"
           :invoice-number #"INV #\/(\d+)"
           :total #"\$([0-9.]+)\."}
 :parser {:date [:clj-time "MM/dd/yyyy"]}
 :multi #"\n"
 :multi-match? #"^[0-9]+/[0-9]+/[0-9]+\s+INV "}
```
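
The `:multi` / `:multi-match?` pair suggests a two-step flow: split the statement text on `:multi`, keep only the chunks that match `:multi-match?`, then extract fields from each chunk. The sketch below assumes exactly that; `split-invoices` and the statement text are made up for illustration, and the real parser may combine the steps differently.

```clojure
(require '[clojure.string :as str])

;; Hypothetical: split on :multi, keep the chunks matching :multi-match?.
(defn split-invoices [{:keys [multi multi-match?]} text]
  (->> (str/split text multi)
       (filter #(re-find multi-match? %))))

(def southbay-multi
  {:multi        #"\n"
   :multi-match? #"^[0-9]+/[0-9]+/[0-9]+\s+INV "})

;; Made-up statement: one header line, two invoice lines, one footer line.
(def statement-text
  (str "SOUTH BAY FRESH PRODUCE  STATEMENT\n"
       "01/05/2026  INV #/10021  $412.50.\n"
       "01/12/2026  INV #/10058  $98.00.\n"
       "Balance due  $510.50\n"))

;; Each surviving line is run through the per-invoice regexes on its own.
(def invoices
  (for [chunk (split-invoices southbay-multi statement-text)]
    {:date           (second (re-find #"^([0-9]+/[0-9]+/[0-9]+)" chunk))
     :invoice-number (second (re-find #"INV #\/(\d+)" chunk))
     :total          (second (re-find #"\$([0-9.]+)\." chunk))}))
;; → ({:date "01/05/2026", :invoice-number "10021", :total "412.50"}
;;    {:date "01/12/2026", :invoice-number "10058", :total "98.00"})
```

The header and footer lines drop out because they do not start with a date followed by `INV`.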

## Customer with Address (Multi-line)

```clojure
{:vendor "Bonanza Produce"
 :keywords [#"530-544-4136"]
 :extract {:invoice-number #"NO\s+(\d{8,})\s+\d{2}/\d{2}/\d{2}"
           :date #"NO\s+\d{8,}\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"(?s)I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :account-number #"(?s)L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :total #"SHIPPED\s+[\d\.]+\s+TOTAL\s+([\d\.]+)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}
```
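
The Bonanza `:customer-identifier` regex has two capture groups, so `re-find` returns the whole match followed by both groups. How the parser treats the second group is not shown here; this sketch only checks what the regex itself captures. The text is invented, and it assumes the "BILL TO" label is printed vertically (B, I, L, L), so the name follows `I` and the street follows `L`.

```clojure
;; Invented Bonanza-style text (assumed layout: vertical "BILL TO" label).
(def bonanza-text
  (str "BONANZA PRODUCE   530-544-4136\n"
       "NO 12345678 01/15/26\n"
       "B\n"
       "I  ACME CAFE          SHIP TO\n"
       "L  123 MAIN ST\n"
       "L  SPRINGFIELD CA 95000\n"
       "SHIPPED 12 TOTAL 245.80\n"))

(def customer-re
  #"(?s)I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)")

;; Two groups: re-find returns [whole-match name street].
(def customer-groups
  (let [[_ cust-name street] (re-find customer-re bonanza-text)]
    [cust-name street]))
;; → ["ACME CAFE" "123 MAIN ST"]

;; The :account-number pattern is the second half of that regex on its own.
(def street
  (second (re-find #"(?s)L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)" bonanza-text)))
;; → "123 MAIN ST"
```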

## Credit Memo (Negative Amounts)

```clojure
{:vendor "General Produce Company"
 :keywords [#"916-552-6495"]
 :extract {:date #"DATE.*\n.*\n.*?([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"CREDIT NO.*\n.*\n.*?(\d{5,}?)\s+"
           :account-number #"CUST NO.*\n.*\n\s+(\d+)"
           :total #"TOTAL:\s+\|\s*(.*)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas-and-negate nil]}}
```
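
Most of the General Produce patterns share one idea: `.*\n.*\n` consumes the rest of the label line plus the line after it, so the capture lands two lines below the label. The text below is invented to fit that layout (a separator line between each label and its value, and a trailing minus on the credit total); the real credit memo may be laid out differently.

```clojure
;; Invented credit-memo text: label line, separator line, value line.
(def credit-text
  (str "GENERAL PRODUCE COMPANY   916-552-6495\n"
       "CREDIT NO       DATE\n"
       "----------------------\n"
       "77123   01/15/26\n"
       "CUST NO\n"
       "----------------------\n"
       "  40123\n"
       "TOTAL:   |  1,234.50-\n"))

;; ".*\n.*\n" skips the rest of the label line and the separator line.
(def credit-fields
  {:date           (second (re-find #"DATE.*\n.*\n.*?([0-9]+/[0-9]+/[0-9]+)" credit-text))
   :invoice-number (second (re-find #"CREDIT NO.*\n.*\n.*?(\d{5,}?)\s+" credit-text))
   :account-number (second (re-find #"CUST NO.*\n.*\n\s+(\d+)" credit-text))
   :total          (second (re-find #"TOTAL:\s+\|\s*(.*)" credit-text))})
;; → {:date "01/15/26", :invoice-number "77123", :account-number "40123", :total "1,234.50-"}
```

The raw `:total` string still carries the comma and sign; turning it into a negative number is left to the `:trim-commas-and-negate` parser.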

## Complex Date Parsing

```clojure
{:vendor "Ben E. Keith"
 :keywords [#"BEN E. KEITH"]
 :extract {:date #"Customer No Mo Day Yr.*?\n.*?\d{5,}\s{2,}(\d+\s+\d+\s+\d+)"
           :customer-identifier #"Customer No Mo Day Yr.*?\n.*?(\d{5,})"
           :invoice-number #"Invoice No.*?\n.*?(\d{8,})"
           :total #"Total Invoice.*?\n.*?([\-]?[0-9]+\.[0-9]{2,})"}
 :parser {:date [:month-day-year nil]
          :total [:trim-commas-and-negate nil]}}
```
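
Ben E. Keith prints the date as separate month, day and year numbers (for example `1 15 26`), which the template hands to `[:month-day-year nil]` rather than a `clj-time` pattern. The function below is a hypothetical stand-in for that parser, not its real implementation: it returns a `java.time.LocalDate` and assumes two-digit years fall in the 2000s.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for [:month-day-year nil]: "M D YY" -> LocalDate.
;; Assumption: two-digit years belong to the 2000s.
(defn month-day-year [s]
  (let [[m d y] (map #(Long/parseLong %) (str/split (str/trim s) #"\s+"))]
    (java.time.LocalDate/of (int (if (< y 100) (+ 2000 y) y))
                            (int m)
                            (int d))))

;; Invented text: the :date regex captures all three numbers as one string.
(def date-text
  (str "Customer No Mo Day Yr\n"
       "  123456  1 15 26\n"))

(def raw-date
  (second (re-find #"Customer No Mo Day Yr.*?\n.*?\d{5,}\s{2,}(\d+\s+\d+\s+\d+)"
                   date-text)))
;; → "1 15 26"

(str (month-day-year raw-date))
;; → "2026-01-15"
```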

## Multiple Date Formats

```clojure
{:vendor "RNDC"
 :keywords [#"P.O.Box 743564"]
 :extract {:date #"(?:INVOICE|CREDIT) DATE\n(?:.*?)(\S+)\n"
           :account-number #"Store Number:\s+(\d+)"
           :invoice-number #"(?:INVOICE|CREDIT) DATE\n(?:.*?)\s{2,}(\d+?)\s+\S+\n"
           :total #"Net Amount(?:.*\n){4}(?:.*?)([\-]?[0-9\.]+)\n"}
 :parser {:date [:clj-time ["MM/dd/yy" "dd-MMM-yy"]]
          :total [:trim-commas-and-negate nil]}}
```
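
A vector of patterns lets one template read dates printed in more than one style. The sketch below assumes the patterns are tried in order and the first successful parse wins; the rule is not spelled out here. It swaps `clj-time` (Joda-Time) for the JDK's `java.time` so it runs without dependencies, and it parses month names case-insensitively in `Locale/US`, another assumption about how values like `15-JAN-26` are printed. `formatter` and `parse-first-matching` are hypothetical helpers.

```clojure
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatterBuilder DateTimeParseException)
        '(java.util Locale))

;; Case-insensitive US month names, so "JAN" and "Jan" both parse.
(defn formatter [pattern]
  (-> (DateTimeFormatterBuilder.)
      (.parseCaseInsensitive)
      (.appendPattern pattern)
      (.toFormatter Locale/US)))

;; Try each pattern in order; return the first successful parse, else nil.
(defn parse-first-matching [patterns s]
  (some (fn [pattern]
          (try
            (LocalDate/parse s (formatter pattern))
            (catch DateTimeParseException _ nil)))
        patterns))

(def rndc-formats ["MM/dd/yy" "dd-MMM-yy"])

(str (parse-first-matching rndc-formats "01/15/26"))   ;; → "2026-01-15"
(str (parse-first-matching rndc-formats "15-JAN-26"))  ;; → "2026-01-15"
```

In `java.time`, the two-letter `yy` pattern maps to years 2000 through 2099.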

## Common Regex Patterns

### Phone Numbers

```clojure
#"\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}"
```

### Dollar Amounts

```clojure
#"\$?([0-9,]+\.[0-9]{2})"
```

### Dates (MM/dd/yy)

```clojure
#"([0-9]{2}/[0-9]{2}/[0-9]{2})"
```

### Dates (MM/dd/yyyy)

```clojure
#"([0-9]{2}/[0-9]{2}/[0-9]{4})"
```
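
A quick REPL check of these patterns against made-up strings. It also shows a pitfall of the unanchored `MM/dd/yy` pattern: it matches the first eight characters of a four-digit-year date. Wrapping it in `\b` word boundaries is one way to avoid that; `mmddyy-strict-re` is an illustrative name, not part of any template.

```clojure
(def phone-re  #"\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}")
(def dollar-re #"\$?([0-9,]+\.[0-9]{2})")
(def mmddyy-re #"([0-9]{2}/[0-9]{2}/[0-9]{2})")

;; No capture group, so re-find returns the matched string itself.
(re-find phone-re "Call (916) 552-6495 today")       ;; → "(916) 552-6495"

;; One capture group: take the second element of the result vector.
(second (re-find dollar-re "Amount due $1,234.50"))  ;; → "1,234.50"

;; Pitfall: MM/dd/yy also matches the start of an MM/dd/yyyy date.
(second (re-find mmddyy-re "01/15/2026"))            ;; → "01/15/20"

;; Word boundaries reject the partial match but still accept a real yy date.
(def mmddyy-strict-re #"\b([0-9]{2}/[0-9]{2}/[0-9]{2})\b")
(re-find mmddyy-strict-re "01/15/2026")              ;; → nil
(second (re-find mmddyy-strict-re "01/15/26"))       ;; → "01/15/26"
```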

### Multi-line Text (dotall mode)

```clojure
#"(?s)start.*?end"
```
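
### Non-greedy Match

Follow a lazy quantifier with a delimiter: it expands only until the rest of the pattern can match, so a trailing `.+?` with nothing after it captures just one character:

```clojure
#"(pattern.+?)\s{2,}"
```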

### Lookahead Boundary

```clojure
#"value(?=\s{2,}|\n)"
```
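
The dotall, non-greedy and lookahead patterns are easiest to tell apart side by side. This sketch runs them against invented strings with `re-find`. The last case shows a limit of the lookahead form: with no double space or newline after the value, it finds nothing, and adding `$` to the alternation is one way around that.

```clojure
;; Dotall: without (?s), "." does not match newlines.
(def block "Start\nline one\nline two\nEnd")
(re-find #"Start(.*?)End" block)               ;; → nil
(second (re-find #"(?s)Start(.*?)End" block))  ;; → "\nline one\nline two\n"

;; Greedy vs non-greedy before a two-space delimiter.
(def row "Customer: ACME CAFE  Acct  991")
(second (re-find #"Customer:\s+(.+)\s{2,}" row))   ;; → "ACME CAFE  Acct"
(second (re-find #"Customer:\s+(.+?)\s{2,}" row))  ;; → "ACME CAFE"

;; Lookahead: stops before the delimiter without consuming it.
(def boundary-re #"Customer:\s+(.+?)(?=\s{2,}|\n)")
(second (re-find boundary-re row))             ;; → "ACME CAFE"

;; At the very end of the text there is no delimiter, so nothing matches
;; unless end-of-input ($) is added as another alternative.
(re-find boundary-re "Customer: ACME CAFE")    ;; → nil
(second (re-find #"Customer:\s+(.+?)(?=\s{2,}|\n|$)" "Customer: ACME CAFE"))
;; → "ACME CAFE"
```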

## Field Extraction Strategies

### 1. Simple Line-based

Use `[^\n]+` to capture everything up to the end of the line (`[^\n]*` also accepts an empty value):

```clojure
#"Invoice:\s+([^\n]+)"
```

### 2. Whitespace Boundary

Use the lookahead `(?=\s{2,}|\n)` to stop before a run of two or more whitespace characters or a newline, without consuming it:

```clojure
#"Customer:\s+(.+?)(?=\s{2,}|\n)"
```

### 3. Specific Marker

Capture everything up to a specific end marker; `(?s)` lets the match span several lines:

```clojure
#"(?s)Start(.*?)End"
```

### 4. Multi-part Extraction

Use multiple capture groups for related fields:

```clojure
#"Date:\s+(\d{2})/(\d{2})/(\d{2})"
```
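
Strategies 1 and 4 on invented text. With several capture groups, `re-find` returns the whole match followed by each group, so destructuring splits them apart. `date-parts` is a hypothetical helper, and mapping `yy` to 20yy is an assumption.

```clojure
;; Strategy 1: [^\n]+ stops at the end of the line.
(second (re-find #"Invoice:\s+([^\n]+)" "Invoice: 10482 (net 30)\nTotal: 5.00"))
;; → "10482 (net 30)"

;; Strategy 4: three groups come back as [whole mm dd yy].
(defn date-parts [text]
  (when-let [[_ mm dd yy] (re-find #"Date:\s+(\d{2})/(\d{2})/(\d{2})" text)]
    {:month (Long/parseLong mm)
     :day   (Long/parseLong dd)
     :year  (+ 2000 (Long/parseLong yy))}))

(date-parts "Date: 01/15/26")
;; → {:month 1, :day 15, :year 2026}
```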

## Parser Options

### Date Parsers

- `[:clj-time "MM/dd/yyyy"]` - Standard US date
- `[:clj-time "MM/dd/yy"]` - 2-digit year
- `[:clj-time "MMM dd, yyyy"]` - Named month
- `[:clj-time ["MM/dd/yy" "yyyy-MM-dd"]]` - Multiple formats
- `[:month-day-year nil]` - Space-separated (1 15 26)

### Number Parsers

- `[:trim-commas nil]` - Remove commas from numbers
- `[:trim-commas-and-negate nil]` - Handle negative/credit amounts
- `[:trim-commas-and-remove-dollars nil]` - Remove $ and commas
- `nil` - No parsing, return raw string
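
The number parsers' implementations are not shown here. As a rough mental model only, the functions below are hypothetical stand-ins named after the keywords: they strip the characters each name mentions and return a `BigDecimal`, and the negate variant assumes a leading or trailing `-` marks a credit. The real parsers may differ in return type and in which negative notations they accept.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-ins, NOT the real parser functions.
(defn trim-commas [s]
  (bigdec (str/replace (str/trim s) "," "")))

(defn trim-commas-and-remove-dollars [s]
  (trim-commas (str/replace s "$" "")))

;; Assumption: a "-" at either end of the amount means a credit.
(defn trim-commas-and-negate [s]
  (let [t         (str/trim s)
        negative? (or (str/starts-with? t "-") (str/ends-with? t "-"))
        amount    (trim-commas (str/replace t "-" ""))]
    (if negative? (- amount) amount)))

(trim-commas "1,234.50")                      ;; → 1234.50M
(trim-commas-and-remove-dollars "$1,234.50")  ;; → 1234.50M
(trim-commas-and-negate "1,234.50-")          ;; → -1234.50M
(trim-commas-and-negate "-98.00")             ;; → -98.00M
```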

## Testing Patterns

### Basic Test Structure

```clojure
;; Assumes clojure.test (deftest, testing, is), clojure.java.io aliased
;; as io, and the parser namespace under test aliased as sut.
(deftest parse-vendor-invoice
  (testing "Should parse vendor invoice"
    (let [results (sut/parse-file (io/file "dev-resources/INVOICE.pdf")
                                  "INVOICE.pdf")
          result (first results)]
      (is (some? result))
      (is (= "Vendor" (:vendor-code result)))
      (is (= "12345" (:invoice-number result))))))
```

### Date Testing

```clojure
;; Assumes clj-time.core aliased as time.
(let [d (:date result)]
  (is (= 2026 (time/year d)))
  (is (= 1 (time/month d)))
  (is (= 15 (time/day d))))
```

### Multi-field Verification

```clojure
;; Assumes clojure.string aliased as str.
(is (= "Expected Name" (:customer-identifier result)))
(is (= "Expected Street" (str/trim (:account-number result))))
(is (= "Expected City, ST 12345" (str/trim (:location result))))
```
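
A full `parse-file` test needs the PDF, so it can also help to pin each regex down with a small test on a snippet of extracted text before wiring it into a template. A sketch with an invented snippet; only `clojure.test` is assumed, and `bonanza-header-regexes` is an illustrative test name.

```clojure
(require '[clojure.test :refer [deftest is testing]])

;; Invented snippet in the shape the Bonanza :invoice-number and :date
;; regexes expect; replace it with text copied from the real PDF.
(def snippet "NO 12345678 01/15/26")

(deftest bonanza-header-regexes
  (testing "invoice number and date come from the same NO line"
    (is (= "12345678"
           (second (re-find #"NO\s+(\d{8,})\s+\d{2}/\d{2}/\d{2}" snippet))))
    (is (= "01/15/26"
           (second (re-find #"NO\s+\d{8,}\s+(\d{2}/\d{2}/\d{2})" snippet))))))
```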