Add invoice-template-creator skill for automated template generation
New repository-based skill at .claude/skills/invoice-template-creator/:
- SKILL.md: Complete guide for creating invoice parsing templates
- references/examples.md: Common patterns and template examples
- Covers vendor identification, regex patterns, field extraction
- Includes testing strategies and common pitfalls

Updated AGENTS.md with reference to the new skill.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
201
.claude/skills/invoice-template-creator/SKILL.md
Normal file
@@ -0,0 +1,201 @@
---
name: invoice-template-creator
description: This skill creates PDF invoice parsing templates for the Integreat system. It should be used when adding support for a new vendor invoice format that needs to be automatically parsed.
license: Complete terms in LICENSE.txt
---

# Invoice Template Creator

This skill automates the creation of invoice parsing templates for the Integreat system. It generates both the template definition and a corresponding test file based on a sample PDF invoice.

## When to Use This Skill

Use this skill when you need to add support for a new vendor invoice format that cannot be parsed by existing templates. This typically happens when:

- A new vendor sends invoices in a unique format
- An existing vendor changes their invoice layout
- You encounter an invoice that fails to parse with current templates

## Prerequisites

Before using this skill, ensure you have:

1. A sample PDF invoice file placed in the `dev-resources/` directory
2. Identified the vendor name
3. Identified unique text patterns in the invoice (phone numbers, addresses, etc.) that can distinguish this vendor
4. Determined the expected values for key fields (invoice number, date, customer name, total)

## Usage Workflow

### Step 1: Analyze the PDF

First, extract and analyze the PDF text to understand its structure:

```bash
pdftotext -layout "dev-resources/FILENAME.pdf" -
```

Look for:
- **Vendor identifiers**: Phone numbers, addresses, or unique text that identifies this vendor
- **Field patterns**: How invoice number, date, customer name, and total appear in the text
- **Layout quirks**: Multi-line fields, special formatting, or unusual spacing

### Step 2: Define Expected Values

Document the expected values for each field:

| Field | Expected Value | Notes |
|-------|---------------|-------|
| Vendor Name | "Vendor Name" | Company name as it should appear |
| Invoice Number | "12345" | The invoice identifier |
| Date | "01/15/26" | Format found in PDF |
| Customer Name | "Customer Name" | As it appears on invoice |
| Customer Address | "123 Main St" | Street address if available |
| Total | "100.00" | Amount |

### Step 3: Create the Template and Test

The skill will:

1. **Create a test file** at `test/clj/auto_ap/parse/templates_test.clj` (or add to existing)
   - Test parses the PDF file
   - Verifies all expected values are extracted correctly
   - Follows existing test patterns

2. **Add template to** `src/clj/auto_ap/parse/templates.clj`
   - Adds entry to `pdf-templates` vector
   - Includes:
     - `:vendor` - Vendor name
     - `:keywords` - Regex patterns to identify this vendor (must match all)
     - `:extract` - Regex patterns for each field
     - `:parser` - Optional date/number parsers

### Step 4: Iterative Refinement

Run the test to see if it passes:

```bash
lein test auto-ap.parse.templates-test
```

If it fails, examine the debug output and refine the regex patterns. Common issues:

- **Template doesn't match**: Keywords don't actually appear in the PDF text
- **Field is nil**: Regex capture group doesn't match the actual text format
- **Wrong value captured**: Regex is too greedy or matches wrong text
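
To work out which of these issues applies, it can help to try the patterns at a REPL before rerunning the whole test. The following is a minimal sketch, not the Integreat matching code: `sample-text`, `keywords`, `keywords-match?`, and `extract-field` are hypothetical names, and the text is an invented stand-in for `pdftotext -layout` output.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample text; not taken from a real invoice.
(def sample-text
  (str "ACME PRODUCE        555-010-0199\n"
       "Invoice # 12345     Date: 01/15/26\n"
       "Bill To: CUSTOMER NAME      Terms: Net 30\n"
       "Total: $1,234.50\n"))

;; Hypothetical keyword list for the sample above.
(def keywords [#"ACME PRODUCE" #"555-010-0199"])

;; Mirrors the "must match all" rule: every keyword has to be found.
(defn keywords-match? [text]
  (every? #(re-find % text) keywords))

;; For a regex with one capture group, re-find returns
;; [whole-match group-1], or nil when nothing matches.
(defn extract-field [regex text]
  (some-> (re-find regex text) second str/trim))

(keywords-match? sample-text)                               ;; => true
(extract-field #"Invoice\s+#\s+(\d+)" sample-text)          ;; => "12345"
(extract-field #"Total:\s+\$([\d,]+\.\d{2})" sample-text)   ;; => "1,234.50"
(extract-field #"PO Number:\s+(\d+)" sample-text)           ;; => nil
```

A `nil` from `extract-field` corresponds to the "Field is nil" case; a `false` from `keywords-match?` corresponds to "Template doesn't match".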

## Template Structure Reference

### Basic Template Format

```clojure
{:vendor "Vendor Name"
 :keywords [#"unique-pattern-1" #"unique-pattern-2"]
 :extract {:invoice-number #"Invoice\s+#\s+(\d+)"
           :date #"Date:\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"Bill To:\s+([A-Za-z\s]+)"
           :total #"Total:\s+\$([\d,]+\.\d{2})"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}
```

### Field Extraction Patterns

**Invoice Number:**
- Look for: `"Invoice #12345"` or `"INV: 12345"`
- Pattern: `#"Invoice\s*#?\s*(\d+)"` or `#"INV:\s*(\d+)"`

**Date:**
- Common formats: `"01/15/26"`, `"Jan 15, 2026"`, `"2026-01-15"`
- Pattern: `#"(\d{2}/\d{2}/\d{2})"` for MM/dd/yy
- Parser: `:date [:clj-time "MM/dd/yy"]`

**Customer Identifier:**
- Look for: `"Bill To: Customer Name"` or `"Sold To: Customer Name"`
- Pattern: `#"Bill To:\s+([A-Za-z\s]+?)(?=\s{2,}|\n)"`
- Use non-greedy `+?` and lookahead `(?=...)` to stop at boundaries

**Total:**
- Look for: `"Total: $100.00"` or `"Amount Due: 100.00"`
- Pattern: `#"Total:\s+\$?([\d,]+\.\d{2})"`
- Parser: `:total [:trim-commas nil]` removes commas
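
The difference between the greedy and non-greedy customer patterns is easiest to see on a concrete line. This sketch uses plain Clojure `re-find` on an invented two-column line; `line`, `greedy`, `bounded`, and `total` are hypothetical names introduced only for illustration.

```clojure
;; Hypothetical line of layout-preserved text with two columns.
(def line "Bill To: CUSTOMER NAME      Terms: Net 30\n")

;; Greedy: the character class also matches spaces, so the capture runs
;; through the column padding and only stops at the first ":".
(def greedy
  (second (re-find #"Bill To:\s+([A-Za-z\s]+)" line)))
;; greedy ends with "Terms" instead of stopping at the customer name

;; Non-greedy with a lookahead: stops before the first run of 2+ spaces.
(def bounded
  (second (re-find #"Bill To:\s+([A-Za-z\s]+?)(?=\s{2,}|\n)" line)))
;; => "CUSTOMER NAME"

;; The optional "\$?" lets one pattern accept totals with or without "$".
(def total
  (second (re-find #"Total:\s+\$?([\d,]+\.\d{2})" "Total: $1,234.50")))
;; => "1,234.50"
```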

### Advanced Patterns

**Multi-line customer address:**
When customer info spans multiple lines (name + address):

```clojure
:customer-identifier #"(?s)I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
:account-number #"(?s)L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
```

The `(?s)` flag makes `.` match newlines. Use non-greedy `+?` and lookaheads `(?=...)` to capture clean values.
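
A small REPL check of the `(?s)` behavior, using an invented two-line string (`text`, `without-dotall`, and `with-dotall` are hypothetical names):

```clojure
;; Hypothetical text where the label and its value sit on different lines.
(def text "Customer No\n  48213")

;; Without (?s), "." stops at the newline, so the pattern cannot span lines.
(def without-dotall (re-find #"Customer No.*?(\d+)" text))
;; => nil

;; With (?s), "." also matches "\n" and the value on the next line is captured.
(def with-dotall (re-find #"(?s)Customer No.*?(\d+)" text))
;; => ["Customer No\n  48213" "48213"]
```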

**Multiple date formats:**

```clojure
:parser {:date [:clj-time ["MM/dd/yy" "yyyy-MM-dd"]]}
```

**Credit memos (negative amounts):**

```clojure
:parser {:total [:trim-commas-and-negate nil]}
```

## Testing Best Practices

1. **Start with a failing test** - Define expected values before implementing
2. **Test actual PDF parsing** - Use `parse-file` or `parse` with real PDF text
3. **Verify each field individually** - Separate assertions for clarity
4. **Handle date comparisons carefully** - Compare year/month/day separately if needed
5. **Use `str/trim`** - Account for extra whitespace in extracted values

## Example Test Structure

```clojure
(deftest parse-vendor-invoice-12345
  (testing "Should parse Vendor invoice with expected values"
    (let [results (sut/parse-file (io/file "dev-resources/INVOICE.pdf")
                                  "INVOICE.pdf")
          result (first results)]
      (is (some? results) "Should return results")
      (is (some? result) "Template should match")
      (when result
        (is (= "Vendor Name" (:vendor-code result)))
        (is (= "12345" (:invoice-number result)))
        (is (= "Customer Name" (:customer-identifier result)))
        (is (= "100.00" (:total result)))))))
```

## Common Pitfalls

1. **Keywords must all match** - Every pattern in `:keywords` must be found in the PDF
2. **Capture groups required** - Regexes need `()` to extract values
3. **PDF text != visual text** - Layout may differ from what you see visually
4. **Greedy quantifiers** - Use `+?` instead of `+` to avoid over-matching
5. **Case sensitivity** - Regex is case-sensitive unless you use `(?i)` flag
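
Pitfalls 2 and 5 can be reproduced with plain Clojure `re-find`. This sketch only shows the regex behavior itself; how the template engine consumes the capture groups is not shown here, and the `def` names are hypothetical.

```clojure
;; No capture group: re-find returns the whole match as a string.
(def no-group (re-find #"Invoice #\d+" "Invoice #12345"))
;; => "Invoice #12345"

;; With a capture group: a vector of [whole-match group-1].
(def with-group (re-find #"Invoice #(\d+)" "Invoice #12345"))
;; => ["Invoice #12345" "12345"]

;; Matching is case-sensitive by default.
(def case-sensitive (re-find #"total" "TOTAL: 100.00"))
;; => nil

;; The (?i) flag makes the match case-insensitive.
(def case-insensitive (re-find #"(?i)total" "TOTAL: 100.00"))
;; => "TOTAL"
```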

## Post-Creation Checklist

After creating the template:

- [ ] Test passes: `lein test auto-ap.parse.templates-test`
- [ ] Format is correct: `lein cljfmt check`
- [ ] Code compiles: `lein check`
- [ ] Template is in correct position in `pdf-templates` vector
- [ ] Keywords uniquely identify this vendor (won't match other templates)
- [ ] Test file follows naming conventions

## Integration with Workflow

This skill is typically used as part of a larger workflow:

1. User provides PDF and requirements
2. This skill creates template and test
3. User reviews and refines if needed
4. Test is run to verify extraction
5. Code is committed

The skill ensures consistency with existing patterns and reduces manual boilerplate when adding new vendor support.
188
.claude/skills/invoice-template-creator/references/examples.md
Normal file
@@ -0,0 +1,188 @@
# Invoice Template Examples

## Simple Single Invoice

```clojure
{:vendor "Gstar Seafood"
 :keywords [#"G Star Seafood"]
 :extract {:total #"Total\s{2,}([\d\-,]+\.\d{2,2}+)"
           :customer-identifier #"(.*?)(?:\s+)Invoice #"
           :date #"Invoice Date\s{2,}([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"Invoice #\s+(\d+)"}
 :parser {:date [:clj-time "MM/dd/yyyy"]
          :total [:trim-commas nil]}}
```

## Multi-Invoice Statement

```clojure
{:vendor "Southbay Fresh Produce"
 :keywords [#"(SOUTH BAY FRESH PRODUCE|SOUTH BAY PRODUCE)"]
 :extract {:date #"^([0-9]+/[0-9]+/[0-9]+)"
           :customer-identifier #"To:[^\n]*\n\s+([A-Za-z' ]+)\s{2}"
           :invoice-number #"INV #\/(\d+)"
           :total #"\$([0-9.]+)\."}
 :parser {:date [:clj-time "MM/dd/yyyy"]}
 :multi #"\n"
 :multi-match? #"^[0-9]+/[0-9]+/[0-9]+\s+INV "}
```

## Customer with Address (Multi-line)

```clojure
{:vendor "Bonanza Produce"
 :keywords [#"530-544-4136"]
 :extract {:invoice-number #"NO\s+(\d{8,})\s+\d{2}/\d{2}/\d{2}"
           :date #"NO\s+\d{8,}\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"(?s)I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :account-number #"(?s)L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :total #"SHIPPED\s+[\d\.]+\s+TOTAL\s+([\d\.]+)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}
```

## Credit Memo (Negative Amounts)

```clojure
{:vendor "General Produce Company"
 :keywords [#"916-552-6495"]
 :extract {:date #"DATE.*\n.*\n.*?([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"CREDIT NO.*\n.*\n.*?(\d{5,}?)\s+"
           :account-number #"CUST NO.*\n.*\n\s+(\d+)"
           :total #"TOTAL:\s+\|\s*(.*)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas-and-negate nil]}}
```

## Complex Date Parsing

```clojure
{:vendor "Ben E. Keith"
 :keywords [#"BEN E. KEITH"]
 :extract {:date #"Customer No Mo Day Yr.*?\n.*?\d{5,}\s{2,}(\d+\s+\d+\s+\d+)"
           :customer-identifier #"Customer No Mo Day Yr.*?\n.*?(\d{5,})"
           :invoice-number #"Invoice No.*?\n.*?(\d{8,})"
           :total #"Total Invoice.*?\n.*?([\-]?[0-9]+\.[0-9]{2,})"}
 :parser {:date [:month-day-year nil]
          :total [:trim-commas-and-negate nil]}}
```

## Multiple Date Formats

```clojure
{:vendor "RNDC"
 :keywords [#"P.O.Box 743564"]
 :extract {:date #"(?:INVOICE|CREDIT) DATE\n(?:.*?)(\S+)\n"
           :account-number #"Store Number:\s+(\d+)"
           :invoice-number #"(?:INVOICE|CREDIT) DATE\n(?:.*?)\s{2,}(\d+?)\s+\S+\n"
           :total #"Net Amount(?:.*\n){4}(?:.*?)([\-]?[0-9\.]+)\n"}
 :parser {:date [:clj-time ["MM/dd/yy" "dd-MMM-yy"]]
          :total [:trim-commas-and-negate nil]}}
```

## Common Regex Patterns

### Phone Numbers
```clojure
#"\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}"
```

### Dollar Amounts
```clojure
#"\$?([0-9,]+\.[0-9]{2})"
```

### Dates (MM/dd/yy)
```clojure
#"([0-9]{2}/[0-9]{2}/[0-9]{2})"
```

### Dates (MM/dd/yyyy)
```clojure
#"([0-9]{2}/[0-9]{2}/[0-9]{4})"
```

### Multi-line Text (dotall mode)
```clojure
#"(?s)start.*?end"
```

### Non-greedy Match
```clojure
#"(pattern.+?)"
```

### Lookahead Boundary
```clojure
#"value(?=\s{2,}|\n)"
```

## Field Extraction Strategies

### 1. Simple Line-based
Use `[^\n]+` to capture everything up to the end of the line:
```clojure
#"Invoice:\s+([^\n]+)"
```

### 2. Whitespace Boundary
Use `(?=\s{2,}|\n)` to stop at multiple spaces or newline:
```clojure
#"Customer:\s+(.+?)(?=\s{2,}|\n)"
```

### 3. Specific Marker
Match until a specific pattern is found:
```clojure
#"(?s)Start(.*?)End"
```

### 4. Multi-part Extraction
Use multiple capture groups for related fields:
```clojure
#"Date:\s+(\d{2})/(\d{2})/(\d{2})"
```
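
With several capture groups, Clojure's `re-find` returns one vector entry per group after the whole match, which destructures naturally. A minimal sketch, where `split-date` is a hypothetical helper rather than part of the template engine:

```clojure
;; Hypothetical helper: pull the three date parts out of a "Date:" line.
;; re-find returns [whole-match group-1 group-2 group-3], or nil on no match.
(defn split-date [text]
  (when-let [[_ month day year]
             (re-find #"Date:\s+(\d{2})/(\d{2})/(\d{2})" text)]
    {:month month :day day :year year}))

(split-date "Date: 01/15/26")
;; => {:month "01", :day "15", :year "26"}

(split-date "no date here")
;; => nil
```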

## Parser Options

### Date Parsers
- `[:clj-time "MM/dd/yyyy"]` - Standard US date
- `[:clj-time "MM/dd/yy"]` - 2-digit year
- `[:clj-time "MMM dd, yyyy"]` - Named month
- `[:clj-time ["MM/dd/yy" "yyyy-MM-dd"]]` - Multiple formats
- `[:month-day-year nil]` - Space-separated (1 15 26)
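
To check a format string against a captured date before putting it in a template, a quick REPL experiment can help. The sketch below is a stand-in, not the `:clj-time` parser itself: it uses the JDK's `java.time` so that it runs without extra dependencies, and `parse-date` is a hypothetical helper. The pattern letters shown here overlap with the formats listed above, but the two libraries can differ in details, so treat the results as indicative only.

```clojure
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter DateTimeParseException))

;; Hypothetical helper: try each pattern in order and return the first
;; successful parse, or nil when none of the patterns fit.
(defn parse-date [patterns s]
  (some (fn [pattern]
          (try
            (LocalDate/parse s (DateTimeFormatter/ofPattern pattern))
            (catch DateTimeParseException _ nil)))
        patterns))

(str (parse-date ["MM/dd/yy" "yyyy-MM-dd"] "01/15/26"))
;; => "2026-01-15"

(str (parse-date ["MM/dd/yy" "yyyy-MM-dd"] "2026-01-15"))
;; => "2026-01-15"

(parse-date ["MM/dd/yy"] "Jan 15, 2026")
;; => nil
```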

### Number Parsers
- `[:trim-commas nil]` - Remove commas from numbers
- `[:trim-commas-and-negate nil]` - Handle negative/credit amounts
- `[:trim-commas-and-remove-dollars nil]` - Remove $ and commas
- `nil` - No parsing, return raw string

## Testing Patterns

### Basic Test Structure
```clojure
(deftest parse-vendor-invoice
  (testing "Should parse vendor invoice"
    (let [results (sut/parse-file (io/file "dev-resources/INVOICE.pdf")
                                  "INVOICE.pdf")
          result (first results)]
      (is (some? result))
      (is (= "Vendor" (:vendor-code result)))
      (is (= "12345" (:invoice-number result))))))
```

### Date Testing
```clojure
(let [d (:date result)]
  (is (= 2026 (time/year d)))
  (is (= 1 (time/month d)))
  (is (= 15 (time/day d))))
```

### Multi-field Verification
```clojure
(is (= "Expected Name" (:customer-identifier result)))
(is (= "Expected Street" (str/trim (:account-number result))))
(is (= "Expected City, ST 12345" (str/trim (:location result))))
```