integreat/docs/plans/2026-02-07-feat-add-invoice-template-03881260-plan.md
Bryce 37351e5f92 Add Bonanza Produce invoice template
- Add new PDF template for Bonanza Produce vendor
- Template uses phone number 530-544-4136 as unique identifier
- Extracts invoice number, date, customer identifier, and total
- Includes passing test for invoice 03881260

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-08 07:50:42 -08:00


| title | type | date | status |
|---|---|---|---|
| Add New Invoice Template for Produce Distributor (Invoice 03881260) | feat | 2026-02-07 | completed |

Add New Invoice Template for Produce Distributor (Invoice 03881260)

Status: Completed

Summary: Successfully implemented a new PDF parsing template for Bonanza Produce invoices. All tests pass.

Overview

Implement a new PDF parsing template for a produce/food distributor invoice type. The invoice originates from a distributor with multiple locations (South Lake Tahoe, Sparks NV, Elko NV) and serves customers like "NICK THE GREEK".

Problem Statement / Motivation

Invoices from this produce distributor currently cannot be parsed automatically and require manual data entry. The invoice layout, with multiple warehouse locations and vendor-specific formatting, does not match any existing template.

Proposed Solution

Add a new template entry to src/clj/auto_ap/parse/templates.clj for Bonanza Produce with regex patterns to extract:

  • Invoice number
  • Date (MM/dd/yy format)
  • Customer identifier (including address for disambiguation)
  • Total amount

Technical Considerations

Vendor Identification Strategy

Vendor Name: Bonanza Produce

Based on the PDF analysis, use these unique identifiers as keywords:

  • "3717 OSGOOD AVE" - Unique South Lake Tahoe address
  • "SPARKS, NEVADA" - Primary warehouse location
  • "1925 FREEPORT BLVD" - Sparks warehouse address

Recommended keyword combination: [#"3717 OSGOOD AVE" #"SPARKS, NEVADA"] - These two together uniquely identify this vendor.
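As a rough illustration of why two keywords are combined, the sketch below assumes a template applies only when every keyword regex is found somewhere in the extracted text. The helper name `matches-all-keywords?` is hypothetical and is not taken from auto_ap; the header text is copied from the known PDF structure in this plan.

```clojure
;; Hypothetical helper (not from auto_ap): assumes a template is selected
;; only when EVERY keyword regex occurs somewhere in the PDF text.
(defn matches-all-keywords? [keywords text]
  (every? #(re-find % text) keywords))

(def bonanza-keywords [#"3717 OSGOOD AVE" #"SPARKS, NEVADA"])

;; Header lines as they appear in the known PDF structure.
(def header-text
  (str "SOUTH LAKE TAHOE, CA\n"
       "3717 OSGOOD AVE.\n"
       "                    SPARKS, NEVADA                    ELKO, NEVADA\n"))

(matches-all-keywords? bonanza-keywords header-text)
;; => true

;; A document that mentions only one of the two keywords is rejected.
(matches-all-keywords? bonanza-keywords "SPARKS, NEVADA")
;; => false
```

If the real matcher accepts a template when any single keyword matches, the combination would be weaker than intended, so this is worth confirming in src/clj/auto_ap/parse.clj.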

Extract Patterns Required

From the PDF text analysis:

| Field | Value in PDF | Proposed Regex |
|---|---|---|
| :invoice-number | 03881260 | #"INVOICE\s+(\d+)" |
| :date | 01/20/26 | #"(\d{2}/\d{2}/\d{2})" (after invoice #) |
| :customer-identifier | NICK THE GREEK | #"BILL TO.*\n\s+([A-Z][A-Z\s]+)" |
| :total | 23.22 | #"TOTAL\s+([\d\.]+)" or #"TOTAL\s+([\d\.]+)\s*$" (end of line) |
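One caveat on the second :total candidate: Clojure regex literals compile to java.util.regex patterns, where `$` matches only at the end of the input (or before a final line terminator) unless the MULTILINE flag `(?m)` is set. The sketch below uses a hypothetical text in which the total line is followed by more content, to show the difference.

```clojure
;; Hypothetical text: the total line is NOT the last line of the input.
(def text-with-footer
  (str "TOTAL      23.22\n"
       "THANK YOU\n"))

;; Without (?m), `$` anchors at the end of the whole input, so this fails.
(re-find #"TOTAL\s+([\d\.]+)\s*$" text-with-footer)
;; => nil

;; With (?m), `$` also matches just before each line terminator.
(second (re-find #"(?m)TOTAL\s+([\d\.]+)\s*$" text-with-footer))
;; => "23.22"
```

Whether this matters depends on what follows the total in the real pdftotext output, which this plan does not show in full.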

Parser Configuration

:parser {:date [:clj-time "MM/dd/yy"]
         :total [:trim-commas nil]}

Date format note: The invoice prints the date with a two-digit year (01/20/26), so the format string must be "MM/dd/yy" rather than a four-digit-year pattern.
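To illustrate the two-digit year, here is a minimal sketch that uses java.time as a stand-in for the :clj-time parser. This is an assumption for illustration only: clj-time wraps Joda-Time, whose pivot-year rule for "yy" may differ from java.time's fixed base year of 2000, so the real parser's result should still be checked in the REPL. The names `invoice-date-format` and `parse-invoice-date` are hypothetical.

```clojure
(import '(java.time LocalDate)
        '(java.time.format DateTimeFormatter))

;; Same pattern string as the template's :parser entry.
(def invoice-date-format (DateTimeFormatter/ofPattern "MM/dd/yy"))

;; Hypothetical helper: java.time stand-in for the :clj-time parser.
(defn parse-invoice-date [s]
  (LocalDate/parse s invoice-date-format))

(str (parse-invoice-date "01/20/26"))
;; => "2026-01-20"
```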

Template Structure

{:vendor "Bonanza Produce"
 :keywords [#"3717 OSGOOD AVE" #"SPARKS, NEVADA"]
 :extract {:invoice-number #"INVOICE\s+(\d+)"
           :date #"INVOICE\s+\d+\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"BILL TO.*?\n\s+([A-Z][A-Z\s]+)(?:\s{2,}|\n)"
           :total #"TOTAL\s+([\d\.]+)(?:\s*$|\s+TOTAL)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}

Open Questions

  1. Is this a single invoice or multi-invoice document?

    • Current PDF shows single invoice
    • Check if statements from this vendor contain multiple invoices
    • If a document can contain multiple invoices, the template needs the :multi and :multi-match? keys
  2. Are credit memos formatted differently?

    • Current example shows standard invoice
    • Need to verify if credits have different layout
    • May need separate template for credit memos
  3. How to capture the full customer address in the regex?

    • The customer name is on one line: "NICK THE GREEK"
    • The street address is on the next line: "600 VISTA WAY"
    • The city/state/zip is on the third line: "MILPITAS, CA 95035"
    • The regex needs to span multiple lines to capture all three components
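For the third question, one possible shape of a multi-line pattern is sketched below. It runs against a hypothetical, simplified single-column excerpt; the real pdftotext output places bill-to and ship-to side by side, so this pattern would need adjusting before it could be used in the template. The names `sample-bill-to` and `customer-with-address` are illustrative only.

```clojure
(require '[clojure.string :as str])

;; Hypothetical single-column excerpt, NOT the real two-column layout.
(def sample-bill-to
  (str "BILL TO\n"
       "     NICK THE GREEK\n"
       "     600 VISTA WAY\n"
       "     MILPITAS, CA 95035\n"))

;; One capture group per line: name, street, then city/state/zip.
(def customer-with-address
  #"BILL TO.*\n\s+([A-Z][A-Z ]+?)\s*\n\s+(\d+ [A-Z0-9 ]+?)\s*\n\s+([A-Z ]+, [A-Z]{2} \d{5})")

(let [[_ customer street city-state-zip]
      (re-find customer-with-address sample-bill-to)]
  (str/join ", " [customer street city-state-zip]))
;; => "NICK THE GREEK, 600 VISTA WAY, MILPITAS, CA 95035"
```

Using `[A-Z ]` rather than `[A-Z\s]` inside the groups keeps each capture on a single line, since `\s` also matches newlines.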

Acceptance Criteria

  • Template successfully matches invoices from this vendor
  • Correctly extracts invoice number (e.g., 03881260)
  • Correctly extracts date and parses to proper format
  • Correctly extracts customer identifier (e.g., NICK THE GREEK)
  • Correctly extracts total amount (e.g., 23.22)
  • Parser handles edge cases (commas in amounts, different date formats)
  • Tested with at least 3 different invoices from this vendor

Implementation Steps

Phase 1: Extract PDF Text

# Convert PDF to text for analysis
pdftotext -layout "dev-resources/INVOICE - 03881260.pdf" -
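The same extraction can be driven from the REPL so the text lands directly in a var for pattern testing. This is a minimal sketch, assuming pdftotext is installed and on the PATH; `pdf->text` is a hypothetical helper and not necessarily how auto_ap invokes the tool.

```clojure
(require '[clojure.java.shell :as shell])

;; Hypothetical helper: runs `pdftotext -layout <pdf-path> -` and returns
;; the text written to stdout, or throws if the command exits non-zero.
(defn pdf->text [pdf-path]
  (let [{:keys [exit out err]} (shell/sh "pdftotext" "-layout" pdf-path "-")]
    (if (zero? exit)
      out
      (throw (ex-info "pdftotext failed"
                      {:exit exit :err err :path pdf-path})))))

;; Usage (requires the test PDF to be present):
;; (def text (pdf->text "dev-resources/INVOICE - 03881260.pdf"))
```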

Phase 2: Determine Vendor Name

  1. Examine the PDF header for company name/logo
  2. Search for known identifiers (phone numbers, addresses)
  3. Identify the vendor name to use for the :vendor field

Phase 3: Develop Regex Patterns

Test patterns in REPL:

(require '[clojure.string :as str])

(def text "...") ; paste PDF text here

;; Test invoice number pattern
(re-find #"INVOICE\s+(\d+)" text)

;; Test date pattern
(re-find #"INVOICE\s+\d+\s+(\d{2}/\d{2}/\d{2})" text)

;; Test customer pattern
(re-find #"BILL TO.*?\n\s+([A-Z][A-Z\s]+)" text)

;; Test total pattern
(re-find #"TOTAL\s+([\d\.]+)" text)
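The individual checks above can also be run in one pass. The sketch below applies the template's :extract patterns to a hypothetical, simplified sample text (the real layout is wider and has bill-to and ship-to columns side by side). `extract-fields` is an illustrative helper, not the auto_ap parser.

```clojure
(require '[clojure.string :as str])

;; Hypothetical, simplified stand-in for the pdftotext output.
(def sample-text
  (str "INVOICE 03881260 01/20/26\n"
       "BILL TO\n"
       "     NICK THE GREEK\n"
       "     600 VISTA WAY\n"
       "                    TOTAL      23.22\n"))

;; The :extract patterns proposed in this plan.
(def extract-patterns
  {:invoice-number      #"INVOICE\s+(\d+)"
   :date                #"INVOICE\s+\d+\s+(\d{2}/\d{2}/\d{2})"
   :customer-identifier #"BILL TO.*?\n\s+([A-Z][A-Z\s]+)(?:\s{2,}|\n)"
   :total               #"TOTAL\s+([\d\.]+)(?:\s*$|\s+TOTAL)"})

;; Hypothetical helper: trimmed first capture group per field,
;; or nil when a pattern does not match.
(defn extract-fields [patterns text]
  (into {}
        (for [[field re] patterns]
          [field (some-> (re-find re text) second str/trim)])))

(extract-fields extract-patterns sample-text)
;; => {:invoice-number "03881260", :date "01/20/26",
;;     :customer-identifier "NICK THE GREEK", :total "23.22"}
```

The trim is needed here because `[A-Z\s]+` also matches newlines and spaces, so the raw :customer-identifier capture on this sample ends with a newline and leading whitespace from the next line. If the line after the customer name were all uppercase letters, it would be captured as well.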

Phase 4: Add Template

Add to src/clj/auto_ap/parse/templates.clj in the pdf-templates vector:

;; Bonanza Produce
{:vendor "Bonanza Produce"
 :keywords [#"3717 OSGOOD AVE" #"SPARKS, NEVADA"]
 :extract {:invoice-number #"INVOICE\s+(\d+)"
           :date #"INVOICE\s+\d+\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"BILL TO.*?\n\s+([A-Z][A-Z\s]+)(?:\s{2,}|\n)"
           :total #"TOTAL\s+([\d\.]+)(?:\s*$|\s+TOTAL)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}

Phase 5: Test Implementation

;; Load the namespace
(require '[auto-ap.parse :as p])
(require '[auto-ap.parse.templates :as t])

;; Test parsing
(p/parse "...pdf text here...")

;; Or test full file
(p/parse-file "dev-resources/INVOICE - 03881260.pdf" "INVOICE - 03881260.pdf")

Testing Considerations

  1. Date edge cases: Ensure 2-digit year parsing works correctly (26 → 2026)
  2. Amount edge cases: Test with larger amounts that may include commas
  3. Customer name variations: Test with different customer names/lengths
  4. Multi-page invoices: Verify template handles page breaks if applicable
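The amount edge case can be checked directly. In the sketch below, the :total pattern from this plan uses the character class `[\d\.]`, which does not include a comma, so on a hypothetical total of 1,234.56 it finds no match at all. The variant `comma-total` and the helper `strip-commas` are illustrative assumptions; `strip-commas` only approximates what :trim-commas is described as doing (removing commas from numbers).

```clojure
(require '[clojure.string :as str])

;; Pattern as written in the plan, and a hypothetical variant that
;; also accepts commas inside the amount.
(def plan-total  #"TOTAL\s+([\d\.]+)(?:\s*$|\s+TOTAL)")
(def comma-total #"TOTAL\s+([\d,\.]+)(?:\s*$|\s+TOTAL)")

;; Stand-in for :trim-commas (assumed behavior: remove all commas).
(defn strip-commas [s]
  (str/replace s "," ""))

;; Hypothetical total line with a thousands separator.
(def large-total-text "TOTAL   1,234.56\n")

(re-find plan-total large-total-text)
;; => nil

(second (re-find comma-total large-total-text))
;; => "1,234.56"

(-> (re-find comma-total large-total-text) second strip-commas bigdec)
;; => 1234.56M
```

The simpler candidate `#"TOTAL\s+([\d\.]+)"` would match the same text but capture only "1", which is a silent error rather than a failed parse, so a test invoice with a four-digit total is worth adding.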

Known PDF Structure

SOUTH LAKE TAHOE, CA
3717 OSGOOD AVE.
...
                    SPARKS, NEVADA                    ELKO, NEVADA
                    1925 FREEPORT BLVD...             428 RIVER ST...

     CUST. PHONE              775-622-0159    ...     INVOICE      DATE
     ...                                                  03881260   01/20/26
     B     NICKGR                                           
     I     NICK THE GREEK                              S    NICK THE GREEK
     L     NICK THE GREEK                              H    NICK THE GREEK
     L     600 VISTA WAY                               I    VIA MICHELE
     ...
                                                                      TOTAL
                                                             TOTAL      23.22

References & Research

Similar Templates for Reference

Based on src/clj/auto_ap/parse/templates.clj, these templates have similar patterns:

  1. Gstar Seafood (lines 19-26) - Simple single invoice, uses :trim-commas
  2. Don Vito Ozuna Food Corp (lines 121-127) - Uses customer-identifier with multiline pattern
  3. C&L Produce (lines 260-267) - Similar "Bill To" pattern for customer extraction

File Locations

  • Templates: src/clj/auto_ap/parse/templates.clj
  • Parser logic: src/clj/auto_ap/parse.clj
  • Utility functions: src/clj/auto_ap/parse/util.clj
  • Test PDF: dev-resources/INVOICE - 03881260.pdf

Parser Utilities Available

From src/clj/auto_ap/parse/util.clj:

  • :clj-time - Date parsing with format strings
  • :trim-commas - Remove commas from numbers
  • :trim-commas-and-negate - Handle credit/negative amounts
  • :month-day-year - Special format for space-separated dates

Next Steps

  1. Identify the vendor name by examining the PDF more closely or asking the user
  2. Test regex patterns in the REPL with the actual PDF text
  3. Refine patterns based on edge cases discovered during testing
  4. Add template to templates.clj
  5. Test with multiple invoices from this vendor to ensure robustness