# Invoice Template Examples

## Simple Single Invoice

```clojure
{:vendor "Gstar Seafood"
 :keywords [#"G Star Seafood"]
 :extract {:total #"Total\s{2,}([\d\-,]+\.\d{2,2}+)"
           :customer-identifier #"(.*?)(?:\s+)Invoice #"
           :date #"Invoice Date\s{2,}([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"Invoice #\s+(\d+)"}
 :parser {:date [:clj-time "MM/dd/yyyy"]
          :total [:trim-commas nil]}}
```

## Multi-Invoice Statement

```clojure
{:vendor "Southbay Fresh Produce"
 :keywords [#"(SOUTH BAY FRESH PRODUCE|SOUTH BAY PRODUCE)"]
 :extract {:date #"^([0-9]+/[0-9]+/[0-9]+)"
           :customer-identifier #"To:[^\n]*\n\s+([A-Za-z' ]+)\s{2}"
           :invoice-number #"INV #\/(\d+)"
           :total #"\$([0-9.]+)\."}
 :parser {:date [:clj-time "MM/dd/yyyy"]}
 :multi #"\n"
 :multi-match? #"^[0-9]+/[0-9]+/[0-9]+\s+INV "}
```

## Customer with Address (Multi-line)

```clojure
{:vendor "Bonanza Produce"
 :keywords [#"530-544-4136"]
 :extract {:invoice-number #"NO\s+(\d{8,})\s+\d{2}/\d{2}/\d{2}"
           :date #"NO\s+\d{8,}\s+(\d{2}/\d{2}/\d{2})"
           :customer-identifier #"(?s)I\s+([A-Z][A-Z\s]+?)\s{2,}.*?L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :account-number #"(?s)L\s+([0-9][A-Z0-9\s]+?)(?=\s{2,}|\n)"
           :total #"SHIPPED\s+[\d\.]+\s+TOTAL\s+([\d\.]+)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas nil]}}
```

## Credit Memo (Negative Amounts)

```clojure
{:vendor "General Produce Company"
 :keywords [#"916-552-6495"]
 :extract {:date #"DATE.*\n.*\n.*?([0-9]+/[0-9]+/[0-9]+)"
           :invoice-number #"CREDIT NO.*\n.*\n.*?(\d{5,}?)\s+"
           :account-number #"CUST NO.*\n.*\n\s+(\d+)"
           :total #"TOTAL:\s+\|\s*(.*)"}
 :parser {:date [:clj-time "MM/dd/yy"]
          :total [:trim-commas-and-negate nil]}}
```

## Complex Date Parsing

```clojure
{:vendor "Ben E. Keith"
 :keywords [#"BEN E. KEITH"]
 :extract {:date #"Customer No Mo Day Yr.*?\n.*?\d{5,}\s{2,}(\d+\s+\d+\s+\d+)"
           :customer-identifier #"Customer No Mo Day Yr.*?\n.*?(\d{5,})"
           :invoice-number #"Invoice No.*?\n.*?(\d{8,})"
           :total #"Total Invoice.*?\n.*?([\-]?[0-9]+\.[0-9]{2,})"}
 :parser {:date [:month-day-year nil]
          :total [:trim-commas-and-negate nil]}}
```

## Multiple Date Formats

```clojure
{:vendor "RNDC"
 :keywords [#"P.O.Box 743564"]
 :extract {:date #"(?:INVOICE|CREDIT) DATE\n(?:.*?)(\S+)\n"
           :account-number #"Store Number:\s+(\d+)"
           :invoice-number #"(?:INVOICE|CREDIT) DATE\n(?:.*?)\s{2,}(\d+?)\s+\S+\n"
           :total #"Net Amount(?:.*\n){4}(?:.*?)([\-]?[0-9\.]+)\n"}
 :parser {:date [:clj-time ["MM/dd/yy" "dd-MMM-yy"]]
          :total [:trim-commas-and-negate nil]}}
```

## Common Regex Patterns

### Phone Numbers

```clojure
#"\(?[0-9]{3}\)?[-.\s]?[0-9]{3}[-.\s]?[0-9]{4}"
```

### Dollar Amounts

```clojure
#"\$?([0-9,]+\.[0-9]{2})"
```

### Dates (MM/dd/yy)

```clojure
#"([0-9]{2}/[0-9]{2}/[0-9]{2})"
```

### Dates (MM/dd/yyyy)

```clojure
#"([0-9]{2}/[0-9]{2}/[0-9]{4})"
```

### Multi-line Text (dotall mode)

```clojure
#"(?s)start.*?end"
```

### Non-greedy Match

```clojure
#"(pattern.+?)"
```

### Lookahead Boundary

```clojure
#"value(?=\s{2,}|\n)"
```

## Field Extraction Strategies

### 1. Simple Line-based

Use `[^\n]+` to capture everything up to the end of the line:

```clojure
#"Invoice:\s+([^\n]+)"
```

### 2. Whitespace Boundary

Use `(?=\s{2,}|\n)` to stop at a run of two or more spaces or at a newline:

```clojure
#"Customer:\s+(.+?)(?=\s{2,}|\n)"
```

### 3. Specific Marker

Match until a specific pattern is found:

```clojure
#"(?s)Start(.*?)End"
```

### 4. Multi-part Extraction

Use multiple capture groups for related fields:

```clojure
#"Date:\s+(\d{2})/(\d{2})/(\d{2})"
```

## Parser Options

### Date Parsers

- `[:clj-time "MM/dd/yyyy"]` - Standard US date
- `[:clj-time "MM/dd/yy"]` - 2-digit year
- `[:clj-time "MMM dd, yyyy"]` - Named month
- `[:clj-time ["MM/dd/yy" "yyyy-MM-dd"]]` - Multiple formats
- `[:month-day-year nil]` - Space-separated (1 15 26)

### Number Parsers

- `[:trim-commas nil]` - Remove commas from numbers
- `[:trim-commas-and-negate nil]` - Handle negative/credit amounts
- `[:trim-commas-and-remove-dollars nil]` - Remove $ and commas
- `nil` - No parsing, return raw string

## Testing Patterns

### Basic Test Structure

```clojure
(deftest parse-vendor-invoice
  (testing "Should parse vendor invoice"
    (let [results (sut/parse-file (io/file "dev-resources/INVOICE.pdf") "INVOICE.pdf")
          result (first results)]
      (is (some? result))
      (is (= "Vendor" (:vendor-code result)))
      (is (= "12345" (:invoice-number result))))))
```

### Date Testing

```clojure
(let [d (:date result)]
  (is (= 2026 (time/year d)))
  (is (= 1 (time/month d)))
  (is (= 15 (time/day d))))
```

### Multi-field Verification

```clojure
(is (= "Expected Name" (:customer-identifier result)))
(is (= "Expected Street" (str/trim (:account-number result))))
(is (= "Expected City, ST 12345" (str/trim (:location result))))
```
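
### Checking Patterns at the REPL

Before writing a full test against a PDF, it can help to try the `:extract` regexes on a small text sample. The sketch below is only an illustration, not the parser's actual implementation: `extract-fields`, `trim-commas`, `parse-us-date`, and the sample text are hypothetical names and data invented here. It assumes each field is taken from the first capture group of its regex, that comma trimming is a plain string replace, and it uses `java.time` as a stand-in for `clj-time` so the snippet runs without extra dependencies.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: run every regex in an :extract map against the
;; text and keep the first capture group of each match.
(defn extract-fields
  [extract text]
  (into {}
        (for [[field pattern] extract
              :let [match (re-find pattern text)]
              :when match]
          ;; re-find returns [whole-match group-1 ...] when the pattern has
          ;; capture groups, and a plain string when it has none.
          [field (if (vector? match) (second match) match)])))

;; Made-up sample text shaped to fit the "Simple Single Invoice" patterns.
(def sample-text
  (str "Acme Diner   Invoice # 10482\n"
       "Invoice Date   01/15/2026\n"
       "Total   1,234.50\n"))

(def sample-extract
  {:total               #"Total\s{2,}([\d\-,]+\.\d{2,2}+)"
   :customer-identifier #"(.*?)(?:\s+)Invoice #"
   :date                #"Invoice Date\s{2,}([0-9]+/[0-9]+/[0-9]+)"
   :invoice-number      #"Invoice #\s+(\d+)"})

(def extracted (extract-fields sample-extract sample-text))

;; Assumption: comma trimming amounts to removing "," from the raw string.
(defn trim-commas [s]
  (str/replace s "," ""))

;; Assumption: java.time stands in for the :clj-time "MM/dd/yyyy" parser.
(defn parse-us-date [s]
  (java.time.LocalDate/parse
   s
   (java.time.format.DateTimeFormatter/ofPattern "MM/dd/yyyy")))

(println extracted)
(println (trim-commas (:total extracted)))
(println (str (parse-us-date (:date extracted))))
```

If a field is missing from the printed map, its regex did not match the sample, which is usually quicker to diagnose here than through a failing PDF test.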