Merge branch 'master' into clauding
@@ -0,0 +1,132 @@
---
module: Invoice Parsing
date: 2026-02-07
problem_type: integration_failure
component: pdf_template_parser
symptoms:
- "Bonanza Produce multi-invoice statement (13595522.pdf) fails to parse correctly"
- "Single invoice template extracts only one invoice instead of four"
- "Multi-invoice statement lacks I/L markers present in single invoices"
- "Customer identifier extraction pattern requires different regex for statements"
root_cause: template_inadequate
resolution_type: template_fix
severity: high
tags: [pdf, parsing, invoice, bonanza-produce, multi-invoice, integration]
---

# Bonanza Produce Multi-Invoice Statement Template Fix

## Problem

Bonanza Produce sends two different invoice formats:
1. **Single invoices** (e.g., 03881260.pdf) with I/L markers and a specific layout
2. **Multi-invoice statements** (e.g., 13595522.pdf) containing 4 invoices per page

The single invoice template failed to parse multi-invoice statements because:
- Multi-invoice statements lack the I/L (Invoice/Location) markers that the single invoice template relies on
- The layout is fundamentally different: invoices are listed as table rows instead of distinct sections
- Customer identifier extraction requires a different regex pattern

## Environment

- Component: PDF Template Parser (Clojure)
- Date: 2026-02-07
- Test File: `test/clj/auto_ap/parse/templates_test.clj`
- Template File: `src/clj/auto_ap/parse/templates.clj`
- Test Document: `dev-resources/13595522.pdf` (4 invoices on a single page)

## Symptoms

- The single invoice template parses only the first invoice from a multi-invoice statement
- Parsing returns a single result instead of 4 separate invoice records
- `:customer-identifier` extraction returns empty or incorrect values for statements
- Test `parse-bonanza-produce-statement-13595522` expects 4 results but receives 1

## What Didn't Work

**Attempted Solution 1: Reuse the single invoice template with a `:multi` flag**
- Added `:multi #"\n"` and a `:multi-match?` pattern to the existing single invoice template
- **Why it failed:** The single invoice template's regex patterns (e.g., `I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+`) expect I/L markers that don't exist in multi-invoice statements. The layout structure is fundamentally different.

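A quick REPL check makes the first failure concrete. Both rows below are invented for illustration (the real PDF text is not reproduced in this note); the regex is the I/L example pattern quoted above. With the markers present the capture group finds a name, while a statement-style row produces no match at all.

```clojure
;; Invented sample rows; neither is real PDF text.
(def single-invoice-row "I  ACME GRILL       123 MAIN ST   L  SPARKS")
(def statement-row "  01/15/26  1001  INVOICE   100.00")

;; I/L example pattern from the single invoice template, quoted above.
(def il-pattern #"I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+")

;; With "I ... L" markers present, group 1 captures the name.
(second (re-find il-pattern single-invoice-row))
;; => "ACME GRILL"

;; A statement row has no markers ("INVOICE" never has whitespace after an I),
;; so the pattern does not match.
(re-find il-pattern statement-row)
;; => nil
```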
**Attempted Solution 2: Use a simpler customer identifier pattern**
- Tried the pattern `#"(.*?)\s+RETURN"`, derived from the multi-invoice statement text
- **Why it failed:** On its own, this pattern doesn't account for the statement's column-based layout. It has to be combined with the `:multi` and `:multi-match?` flags to parse multiple invoices.

## Solution

Added a dedicated multi-invoice template that:
1. Uses different keywords to identify multi-invoice statements
2. Employs `:multi` and `:multi-match?` flags for multiple invoice extraction
3. Uses simpler regex patterns suited to the statement layout

**Implementation:**

```clojure
;; Bonanza Produce Statement (multi-invoice)
{:vendor "Bonanza Produce"
 :keywords [#"The perishable agricultural commodities" #"SPARKS, NEVADA"]
 :extract {:invoice-number #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+([0-9]+)\s+INVOICE"
           :customer-identifier #"(.*?)\s+RETURN"
           :date #"^\s+([0-9]{2}/[0-9]{2}/[0-9]{2})"
           :total #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE\s+([\d.]+)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}
 :multi #"\n"
 :multi-match? #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE"}
```

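How `:keywords` chooses between the two Bonanza templates is not shown in this note. A minimal sketch, assuming a template applies only when every one of its `:keywords` regexes is found in the page text, and assuming the single invoice template keys on the phone number alone; `matching-templates`, the `:name` values, and the sample text are hypothetical.

```clojure
;; Assumed selection rule: a template applies when ALL of its :keywords match.
;; This is an illustration, not necessarily how auto-ap.parse selects templates.
(def templates
  [{:name :bonanza-single                      ; hypothetical name
    :keywords [#"530-544-4136"]}               ; phone number, per the text
   {:name :bonanza-statement                   ; hypothetical name
    :keywords [#"The perishable agricultural commodities" #"SPARKS, NEVADA"]}])

(defn matching-templates
  "Names of the templates whose keywords all occur in text."
  [text]
  (->> templates
       (filter (fn [t] (every? #(re-find % text) (:keywords t))))
       (map :name)))

;; Invented page snippet, not the real 13595522.pdf text.
(matching-templates
 "BONANZA PRODUCE  SPARKS, NEVADA\nThe perishable agricultural commodities listed...")
;; => (:bonanza-statement)
```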
**Key differences from the single invoice template:**

- `:keywords`: Looks for statement header text instead of the phone number
- `:customer-identifier`: Pattern `#"(.*?)\s+RETURN"` works for the statement format
- `:multi #"\n"`: Splits the document text on newlines
- `:multi-match?`: Matches the invoice row pattern to identify individual invoices
- No I/L markers: Patterns are anchored at the line start instead of relying on location markers

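This note does not show how `auto-ap.parse` consumes `:multi` and `:multi-match?`. The following is a minimal sketch of one plausible reading of the description in this section: split the text on `:multi`, keep the segments that match `:multi-match?`, and run each `:extract` regex on every kept segment. `parse-multi` and the sample rows are hypothetical; `:customer-identifier` is left out because the statement text around "RETURN" is not shown, and the `:parser` conversions (`:clj-time`, `:trim-commas`) are omitted.

```clojure
(require '[clojure.string :as str])

;; :extract, :multi and :multi-match? copied from the statement template
;; (:customer-identifier omitted).
(def statement-template
  {:extract {:invoice-number #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+([0-9]+)\s+INVOICE"
             :date #"^\s+([0-9]{2}/[0-9]{2}/[0-9]{2})"
             :total #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE\s+([\d.]+)"}
   :multi #"\n"
   :multi-match? #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE"})

(defn parse-multi
  "Hypothetical reading of :multi/:multi-match?: split text on :multi, keep
  segments matching :multi-match?, and apply every :extract regex to each.
  Returns one map of raw captured strings per invoice."
  [{:keys [extract multi multi-match?]} text]
  (for [segment (str/split text multi)
        :when (re-find multi-match? segment)]
    (into {}
          (for [[field re] extract]
            [field (second (re-find re segment))]))))

;; Invented sample: a header line followed by four invoice rows.
(def sample
  (str "STATEMENT\n"
       "  01/15/26  1001  INVOICE   100.00\n"
       "  01/16/26  1002  INVOICE   200.00\n"
       "  01/17/26  1003  INVOICE   300.00\n"
       "  01/18/26  1004  INVOICE   400.00"))

(count (parse-multi statement-template sample))
;; => 4
(first (parse-multi statement-template sample))
;; => {:invoice-number "1001", :date "01/15/26", :total "100.00"}
```

The header line is dropped because it does not match `:multi-match?`, which is the role the text assigns to that flag.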
## Why This Works

1. **Statement-specific keywords:** "The perishable agricultural commodities" and "SPARKS, NEVADA" distinguish multi-invoice statements from single invoices, which are identified by the phone number 530-544-4136

2. **Multi-flag parsing:** The `:multi` and `:multi-match?` flags tell the parser to split the document on newlines and to identify individual invoices by the date/invoice-number pattern, rather than treating the whole page as one invoice

3. **Simplified patterns:** Without I/L markers, patterns scan from the line start (`^\s+`) and extract columns based on whitespace positions. The `:customer-identifier` pattern `(.*?)\s+RETURN` captures the text preceding "RETURN" on a line

4. **Separate templates:** Keeping distinct templates for single invoices and statements prevents one format's patterns from interfering with the other and lets each template be tuned to its own layout

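Item 3 relies on a regex detail worth making concrete: without a `(?m)` flag, `^` matches only at the start of the string being searched. A minimal sketch with an invented four-row sample shows why the `^\s+`-anchored `:invoice-number` pattern needs the newline split described in item 2.

```clojure
(require '[clojure.string :as str])

;; Invented four-row sample, not the real statement text.
(def statement-text
  (str "  01/15/26  1001  INVOICE   100.00\n"
       "  01/16/26  1002  INVOICE   200.00\n"
       "  01/17/26  1003  INVOICE   300.00\n"
       "  01/18/26  1004  INVOICE   400.00"))

;; :invoice-number pattern from the statement template.
(def invoice-number #"^\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+([0-9]+)\s+INVOICE")

;; Without (?m), ^ anchors only at the start of the whole input,
;; so scanning the unsplit text finds just the first invoice.
(map second (re-seq invoice-number statement-text))
;; => ("1001")

;; Split on newlines (as :multi #"\n" is described to do): ^ now anchors at each row.
(keep #(second (re-find invoice-number %)) (str/split statement-text #"\n"))
;; => ("1001" "1002" "1003" "1004")
```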
## Prevention

**When adding templates for vendors with multiple document formats:**

1. **Create separate templates:** Don't try to make one template handle both formats. Use distinct keywords to identify each format

2. **Test both single and multi-invoice documents:** Ensure each template parses the expected number of invoices:
   ```clojure
   (is (= 4 (count results)) "Should parse 4 invoices from statement")
   ```

3. **Verify `:multi` usage:** Multi-invoice templates should have both the `:multi` and `:multi-match?` flags:
   ```clojure
   :multi #"\n"
   :multi-match? #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE"
   ```

4. **Check pattern scope:** Multi-invoice statements often lack structural markers (I/L), so patterns should:
   - Use `^\s+` to anchor at line start (without a `(?m)` flag, `^` matches only at the start of the text being searched, so this depends on `:multi` splitting the text into lines)
   - Extract from whitespace-separated columns
   - Avoid patterns requiring specific markers

5. **Run all template tests:** Before committing, run:
   ```bash
   lein test auto-ap.parse.templates-test
   ```

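Item 3 can also be checked mechanically. A minimal sketch, assuming the templates are available as a sequence of maps (the actual var in `auto-ap.parse.templates` is not named in this note, and the example entries, including "Hypothetical Vendor", are illustrative): flag any template that sets only one of the two keys.

```clojure
;; Hypothetical lint: return every template that sets exactly one of
;; :multi and :multi-match?. The example entries are illustrative only.
(defn unpaired-multi-flags
  "Templates that have one of :multi / :multi-match? but not the other."
  [templates]
  (filter #(not= (contains? % :multi) (contains? % :multi-match?)) templates))

(def example-templates
  [{:vendor "Bonanza Produce" :keywords [#"530-544-4136"]}
   {:vendor "Bonanza Produce"
    :keywords [#"The perishable agricultural commodities" #"SPARKS, NEVADA"]
    :multi #"\n"
    :multi-match? #"\s+[0-9]{2}/[0-9]{2}/[0-9]{2}\s+[0-9]+\s+INVOICE"}
   {:vendor "Hypothetical Vendor" :multi #"\n"}]) ; missing :multi-match?

(map :vendor (unpaired-multi-flags example-templates))
;; => ("Hypothetical Vendor")
```

In a test, the same check could read `(is (empty? (unpaired-multi-flags all-templates)))`, where `all-templates` is a placeholder for the real template collection.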
## Related Issues

- Single invoice template: `src/clj/auto_ap/parse/templates.clj` lines 756-765
- Similar multi-invoice patterns: Search for `:multi` and `:multi-match?` in `src/clj/auto_ap/parse/templates.clj`

## Key Files

- **Tests:** `test/clj/auto_ap/parse/templates_test.clj` (lines 36-53)
- **Template:** `src/clj/auto_ap/parse/templates.clj` (lines 767-777)
- **Test document:** `dev-resources/13595522.pdf`
- **Template parser:** `src/clj/auto_ap/parse.clj`
@@ -0,0 +1,133 @@
---
module: SSR Admin
component: testing_framework
date: '2026-02-07'
problem_type: best_practice
resolution_type: test_fix
severity: medium
root_cause: inadequate_documentation
symptoms:
- Route tests only verified HTTP status codes (200), not actual HTML content
- No verification that route responses contain expected page elements
- Could have false positives where routes return empty or wrong content
tags:
- testing
- routes
- hiccup
- html-verification
- clojure
- str-includes
---

# Enhancing Route Tests with HTML Content Verification

## Problem

Route tests for the SSR admin modules (vendors and transaction-rules) verified only HTTP status codes, which made them vulnerable to false positives. A route could return a 200 status with empty or incorrect HTML content, and the tests would still pass.

## Symptoms

- Tests like `(is (= 200 (:status response)))` only checked HTTP status
- No assertions about the actual HTML content returned
- Route handlers could return malformed or empty hiccup vectors without test failures
- Dialog routes could return generic HTML without the expected content

## Root Cause

There was no established practice for route testing in this Clojure SSR codebase. Rails controller tests can use `assert_select` or Capybara matchers, but nothing equivalent existed here for verifying hiccup-rendered HTML content.

## Solution

Enhanced route tests to verify HTML content with `clojure.string/includes?` checks against the response body converted to a string.

### Implementation Pattern

```clojure
;; Assumes the test ns requires [clojure.string :as str] and
;; [clojure.test :refer [deftest testing is]].

;; BEFORE: Only status check
(deftest page-route-returns-html-response
  (testing "Page route returns HTML response"
    (let [request {:identity (admin-token)}
          response ((get sut/key->handler :auto-ap.routes.admin.transaction-rules/page) request)]
      (is (= 200 (:status response))))))

;; AFTER: Status + content verification
(deftest page-route-returns-html-response
  (testing "Page route returns HTML response"
    (let [request {:identity (admin-token)}
          response ((get sut/key->handler :auto-ap.routes.admin.transaction-rules/page) request)
          ;; Flatten the body (string or seq of strings) into one string
          html-str (apply str (:body response))]
      (is (= 200 (:status response)))
      (is (str/includes? html-str "Transaction Rules")))))
```

### Key Changes

1. **Convert body to string**: Use `(apply str (:body response))` to flatten the response body into a single string. This concatenates a string or a seq of string chunks; it does not render hiccup, so an unrendered hiccup vector comes out in its printed form rather than as HTML
2. **Add content assertions**: Use `clojure.string/includes?` to verify that the expected content exists
3. **Route-specific content**: Match content unique to that route (page titles, button text, entity names)

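Which body shape the auto-ap handlers return is not stated in this note, so the snippet below shows what `(apply str body)` produces for three plausible shapes. The markup strings are invented for illustration.

```clojure
;; 1. Already-rendered HTML string: apply str walks the characters
;;    and rebuilds the same string.
(apply str "<h1>Transaction Rules</h1>")
;; => "<h1>Transaction Rules</h1>"

;; 2. A seq of string chunks: the chunks are concatenated.
(apply str ["<h1>" "Transaction Rules" "</h1>"])
;; => "<h1>Transaction Rules</h1>"

;; 3. An unrendered hiccup vector: each element is passed to str, so the
;;    nested vector appears in printed form, not as HTML.
(apply str [:div [:h1 "Transaction Rules"]])
;; => ":div[:h1 \"Transaction Rules\"]"
```

In all three cases `str/includes?` still finds "Transaction Rules", which is why the substring assertions work regardless of shape. Only the first two contain real markup, so assertions on tags or attributes would need the hiccup to be rendered to HTML first.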
### Files Modified

- `test/clj/auto_ap/ssr/admin/vendors_test.clj`
  - Added `vendor-page-route-contains-vendor-content` test

- `test/clj/auto_ap/ssr/admin/transaction_rules_test.clj`
  - Enhanced 7 route tests with content verification:
    - `page-route-returns-html-response` → checks for "Transaction Rules"
    - `table-route-returns-table-data` → checks for "New Transaction Rule"
    - `edit-dialog-route-returns-dialog` → checks for entity-specific content
    - `account-typeahead-route-works` → checks for "account"
    - `location-select-route-works` → checks for "location"
    - `execute-dialog-route-works` → checks for "Code transactions"
    - `new-dialog-route-returns-empty-form` → checks for "Transaction rule"

### Testing Strategy

For each route, identify the minimal but specific content that shows the route is working:

- **Page routes**: Check for the page title or heading
- **Dialog routes**: Check for dialog-specific button text or the name of the entity being edited
- **Typeahead routes**: Check for the resource type (e.g., "account")
- **Table routes**: Check for action buttons or empty-state messages

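When a route has more than one identifying string (for example, a heading plus a button label), a small helper keeps the assertion readable. A minimal sketch; `missing-content` is a hypothetical helper and the dialog markup is invented.

```clojure
(require '[clojure.string :as str])

(defn missing-content
  "Returns the expected strings that do not occur in html-str, in order.
  An empty result means every expected string was found."
  [html-str expected]
  (remove #(str/includes? html-str %) expected))

;; Invented dialog markup for illustration.
(def dialog-html "<form><h2>Edit Transaction Rule</h2><button>Save</button></form>")

(missing-content dialog-html ["Edit Transaction Rule" "Save"])   ;; => ()
(missing-content dialog-html ["Edit Transaction Rule" "Delete"]) ;; => ("Delete")
```

Written as `(is (empty? (missing-content html-str [...])))`, a failing test reports exactly which strings were absent, because `clojure.test` prints the evaluated arguments of a failed predicate call.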
## Prevention

When writing route tests, always:

1. ✅ Verify HTTP status code (200, 302, etc.)
2. ✅ Verify response contains expected HTML content
3. ✅ Use specific content unique to that route
4. ✅ Avoid overly generic strings that might appear on any page

### Template for Route Tests

```clojure
(deftest [route-name]-returns-expected-content
  (testing "[Route description]"
    (let [request {:identity (admin-token)
                   ;; Add route-params, query-params as needed
                   }
          response ((get sut/key->handler :auto-ap.routes.[module]/[route]) request)
          html-str (apply str (:body response))]
      (is (= 200 (:status response)))
      (is (str/includes? html-str "[Expected content]")))))
```

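A concrete, runnable instance of the template. The real tests obtain handlers from `sut/key->handler` and build the request with `admin-token`; here a hypothetical stub handler, a local `key->handler` map, and a placeholder identity stand in for them so the example runs without the application.

```clojure
(ns example.route-content-test
  (:require [clojure.string :as str]
            [clojure.test :refer [deftest testing is run-tests]]))

;; Hypothetical stand-in for a real route handler: returns a Ring response map.
(defn stub-page-handler [_request]
  {:status 200
   :headers {"Content-Type" "text/html"}
   :body "<h1>Transaction Rules</h1><button>New Transaction Rule</button>"})

;; Hypothetical local lookup table in place of sut/key->handler.
(def key->handler {::page stub-page-handler})

(deftest page-route-returns-expected-content
  (testing "Page route renders its heading and primary action"
    (let [request {:identity {:user/role "admin"}} ; placeholder for (admin-token)
          response ((get key->handler ::page) request)
          html-str (apply str (:body response))]
      (is (= 200 (:status response)))
      (is (str/includes? html-str "Transaction Rules"))
      (is (str/includes? html-str "New Transaction Rule")))))

(run-tests)
```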
## Tools Used

- `clojure.string/includes?` - Simple string containment check
- `apply str` - Flattens the response body (a string or a seq of strings) into one string; it does not render hiccup
- No additional dependencies needed

## Benefits

- **Catches regressions**: Tests fail if a route returns a 200 without its expected content
- **Self-documenting**: Test assertions describe the expected behavior
- **Lightweight**: No HTML parsing libraries required
- **Fast**: A substring check adds negligible time compared with invoking the handler

## Related

- A similar pattern could apply to any Clojure SSR application that renders with hiccup
- For more complex DOM assertions, consider adding hickory or enlive for structured HTML parsing