diff --git a/.context/compound-engineering/ce-code-review/20260425-173027-4c833da5/correctness.json b/.context/compound-engineering/ce-code-review/20260425-173027-4c833da5/correctness.json new file mode 100644 index 00000000..46d8b1d2 --- /dev/null +++ b/.context/compound-engineering/ce-code-review/20260425-173027-4c833da5/correctness.json @@ -0,0 +1 @@ +{"reviewer":"correctness","findings":[{"title":"SQL injection via unsanitized WHERE clause values in parquet read layer","severity":"P0","file":"src/clj/auto_ap/storage/parquet.clj","line":237,"confidence":100,"autofix_class":"manual","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Parameterize all SQL values instead of string concatenation. Use DuckDB prepare/bind or at minimum escape single quotes in user-supplied values: (str/replace v \"'\" \"''\"). Apply this to build-where-clause and get-sales-orders sort/order/limit/offset interpolations.","why_it_matters":"Any value passed via :client, :vendor, :location opts is concatenated directly into SQL string. A client code containing a single quote (e.g. O Brien) breaks the query. Malicious values can inject arbitrary SQL. The sort and order fields are also interpolated without validation, allowing ORDER BY injection.","evidence":["(str env \" = '\" v \"'\")","(str base-sql where-str)","(str \"ORDER BY \" sort \" \" (name order))","(str \" LIMIT \" limit)"]},{"title":"SQL injection in sales-summaries aggregation WHERE clauses","severity":"P0","file":"src/clj/auto_ap/storage/sales_summaries.clj","line":42,"confidence":100,"autofix_class":"manual","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Escape single quotes in client-id before interpolation. Apply to all SQL string construction sites in this namespace.","why_it_matters":"client-id is interpolated directly into WHERE clauses across all aggregation functions. 
Values with apostrophes break queries; malicious values enable injection.","evidence":["WHERE client-code = ' . client-id . '","Lines 42, 73, 98-100, 123, 140, 171"]},{"title":"with-duckdb macro never closes connections created by connect!","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":43,"confidence":100,"autofix_class":"manual","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Track whether conn was freshly created vs retrieved from @db. Only close in finally if it was freshly created and not stored.","why_it_matters":"When @db is nil, connect! creates a connection AND stores it via reset!. The finally clause checks (not @db) which is now false because connect! just set it. All DuckDB connections accumulate until JVM shutdown.","evidence":["(let [conn# (or @db (connect!))]","(finally (when (and (not @db) conn#) (.close conn#)))","connect! calls (reset! db conn)"]},{"title":"flush-to-parquet! clears buffer before S3 upload, losing data on upload failure","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":148,"confidence":100,"autofix_class":"manual","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Move clear-buffer! to after upload-parquet! succeeds. Restructure as: write jsonl -> convert to parquet -> verify local file -> upload to S3 -> only then clear buffer.","why_it_matters":"clear-buffer! at line 148 runs before upload-parquet!. If upload fails, the catch block throws but the buffer is already cleared. Records in memory are permanently lost. WAL has them but they won't be re-flushed until process restart.","evidence":["(clear-buffer! entity-type) at line 148","(upload-parquet! 
parquet-file s3-key) at line 147 runs after clear","(catch Exception e (throw ...))"]},{"title":"date-seq produces forward sequence when start > end","severity":"P0","file":"src/clj/auto_ap/storage/parquet.clj","line":207,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Validate start <= end and throw if reversed, or compute direction as step (1 or -1) based on comparison. Example: date-seq 2024-05-01 2024-04-01 should error or return [2024-05-01 2024-04-30 ... 2024-04-01].","why_it_matters":"(Math/abs ...) on the diff combined with always calling (.plusDays sd i) means if start > end, you get a sequence going forward from start by |end-start| days. Downstream parquet queries reference non-existent S3 keys producing empty or erroneous results.","evidence":["(int (Math/abs (- (.toEpochDay sd) (.toEpochDay ed))))","(for [i (range 0 (inc days))] (.toString (.plusDays sd i)))"]},{"title":"query-deduped generates invalid DuckDB syntax with wrong partition column","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":282,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Fix to: QUALIFY ROW_NUMBER() OVER (PARTITION BY external_id ORDER BY _seq_no DESC) = 1. Remove the space after QUALIFY and use correct column names matching parquet schema.","why_it_matters":"The generated SQL is syntactically invalid: 'QUALIFY ROW_NUMBER() OVER' has a space breaking QUALIFY from its expression. Additionally, 'sales_order.external_id' does not exist as a column in parquet -- records use 'external_id'. 
This function always fails at runtime.","evidence":["\" QUALIFY ROW_NUMBER() OVER\"","\" (PARTITION BY sales_order.external_id\"","Parquet columns use :external-id key"]},{"title":"safe-cleanup-all destructures [year month] pairs incorrectly as [_ y m]","severity":"P0","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":199,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Change (doseq [[_ y m] months] ...) to (doseq [[y m] months] ...) since months is a vector of [year month] pairs. The current code binds the year to the discarded _, the month to y, and nil to m.","why_it_matters":"collect-all-months returns [year month] vectors. safe-cleanup-all destructures each pair as [_ y m] -- so [2024 3] yields y=3 and m=nil. S3 verification and delete calls use wrong year/month values, corrupting cleanup.","evidence":["(doseq [[_ y m] months]","group-orders-by-month returns {{y m} [eid...]} map","sort(keys group) produces [year month] vectors"]},{"title":"get-payment-items-parquet queries :client_code instead of :client-code causing silent empty results","severity":"P1","file":"src/clj/auto_ap/jobs/sales_summaries.clj","line":106,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Change (:client_code %) to (:client-code %). All parquet write paths store the key as :client-code.","why_it_matters":"DuckDB query-rows returns columns matching their parquet names. The filter looks for :client_code (underscored) which never exists, so all rows are filtered out. Payment aggregation silently returns empty results across all client/date queries.","evidence":["(filter #(= client-code (:client_code %)) rows)","buffer! 
writes :client-code key in ezcater/core.clj, square/core3.clj, migration/sales_to_parquet.clj"]},{"title":"raw-graphql-ids in sales_orders_new.clj references undefined variables causing nil return","severity":"P0","file":"src/clj/auto_ap/datomic/sales_orders_new.clj","line":149,"confidence":100,"autofix_class":"manual","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Complete the function: bind result from pq/get-sales-orders call, add (require '[clojure.string :as str]), and fix the cond->>/when-let flow. The current code discards query results and references unbound 'result' variable.","why_it_matters":"The function compiles but returns nil at runtime because result is unbound when accessed at line 165. The conditional threading on lines 157-161 discards its output instead of binding it to a let form.","evidence":["(cond->>\\n where-str (pq/get-sales-orders start end ...))","(when-let [rows (:rows result)] -- result never bound","Missing clojure.string import for str/join used in build-where-clause"]},{"title":"Ezcater XLS flatten-order-to-parquet! writes integer db/id as client-code string","severity":"P1","file":"src/clj/auto-ap/routes/ezcater_xls.clj","line":112,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"In map->sales-order, resolve client to its :client/code string before passing to flatten. Or in flatten-order-to-parquet!, look up the code: (if (integer? client) (lookup-code client) client).","why_it_matters":"map->sales-order sets :client to (:db/id client) which is an integer. flatten-order-to-parquet! writes this integer as client-code in parquet. All other paths write string client codes, causing inconsistent filtering and aggregation.","evidence":["client-id (:db/id client) at line 53","flatten uses (if (map? 
client) (:client/code client) client)","square/core3.clj resolves to client code string before buffer!"]},{"title":"Migration date query may lose last day due to epoch unit handling","severity":"P1","file":"src/clj/auto_ap/migration/sales_to_parquet.clj","line":119,"confidence":75,"autofix_class":"manual","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Verify Datomic stores :sales-order/date as Instant epoch milliseconds. If timezone-dependent conversion causes off-by-one, use LocalDate-based boundary computation with explicit timezone.","why_it_matters":"Code converts LocalDate to epoch seconds then multiplies by 1000 for millis boundaries. Timezone-sensitive conversions may cause orders at end-of-day in certain timezones to fall outside [start-ms end-ms] range and be skipped.","evidence":["(.toEpochSecond ^java.time.LocalDate ...)","start (* day-ms 1000)","end (+ start (* 86400000)) -- subtracts 1 from exclusive end"]},{"title":"*buffers* atom has no lifecycle management and grows without bound","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":89,"confidence":75,"autofix_class":"advisory","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Add periodic flush job or explicit cleanup. Consider bounding buffer size and implementing eviction for oversized buffers.","why_it_matters":"Records buffered via Square/EZCater imports accumulate in a process-global atom. 
If flush is never called or the process runs for days without flushing, memory grows unbounded.","evidence":["(defonce *buffers* (atom {}))","Buffer grows on every import, shrinks only on explicit flush"]},{"title":"sales-summaries mark-dirty queries Datomic after sales-order entities are migrated to parquet","severity":"P2","file":"src/clj/auto_ap/jobs/sales_summaries.clj","line":30,"confidence":75,"autofix_class":"advisory","owner":"human","requires_verification":true,"pre_existing":false,"suggested_fix":"After Datomic cleanup completes, mark-all-dirty returns nothing because sales-order entities are gone. Add fallback to discover clients from parquet data or maintain a client registry.","why_it_matters":"mark-all-dirty queries [:sales-order/client ?c] in Datomic. After safe-cleanup-all removes all sales-order entities, this returns nil. The summaries job silently does nothing.","evidence":["(dc/q '[:find ?c ... [_ :sales-order/client ?c]]","defn mark-all-dirty depends on sales-order presence in Datomic"]},{"title":"WAL jsonl append not atomic across concurrent buffer! calls","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":109,"confidence":50,"autofix_class":"advisory","owner":"human","requires_verification":true,"pre_existing":false,"suggested_fix":"Use per-entity-type lock or atomic file-channel write for WAL appends. Alternative: buffer in memory only and write entire batch on flush.","why_it_matters":"In multi-threaded server, two buffer! calls writing to the same .jsonl file simultaneously may interleave bytes, corrupting JSONL format. The open-write-close sequence is not atomic.","evidence":["(with-open [w (io/writer wal-file :append true)]","Concurrent HTTP requests trigger buffer! for same entity-type"]},{"title":"object-exists? 
in cleanup leaks S3 response streams across verification calls","severity":"P2","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":116,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Switch to s3/head-object which returns metadata without body stream. Or wrap s3/get-object in with-open to close the S3ObjectInputStream.","why_it_matters":"s3/get-object returns an S3ObjectInputStream that must be closed. Calling for every day-entity combination across months leaks file descriptors and HTTP connections -- ~300+ unclosed streams per month checked.","evidence":["(s3/get-object {:bucket-name pq/*bucket* :key key})","verify-month-in-s3? calls this for every day times entity type"]},{"title":"Shutdown hook in connect! is a no-op thunk","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":27,"confidence":50,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Change to (Thread. #(disconnect!)). Current code creates a Thread running #(fn []) which returns a new function that is never called.","why_it_matters":"At JVM shutdown the hook runs but does nothing. DuckDB connection is not gracefully closed, potentially losing state or corrupting temporary files.","evidence":["(.addShutdownHook (Runtime/getRuntime) (Thread. 
#(fn [])))","The inner fn returns another fn that never executes"]}],"residual_risks":["Dual-write path means parquet and Datomic can diverge if one write succeeds and the other fails -- no compensating transaction or reconciliation mechanism exists","Parquet files on S3 have no versioning or immutability guarantee; accidental overwrites during migration could corrupt historical data","No idempotency guarantee for migration scripts -- re-running sales_to_parquet.clj after partial failure may duplicate records since there is no pre-check for existing parquet files"],"testing_gaps":["No test for buffer flush under S3 failure and recovery via WAL replay","No test for SQL injection vectors in build-where-clause or get-sales-orders","No test for date-seq with reversed start/end dates","No integration test verifying sales-summaries aggregation returns correct results after parquet path import","No test coverage for safe-cleanup-all S3 verification logic with partial file presence","No test for concurrent buffer! + flush-to-parquet! 
interaction","No test covering nil or missing date fields in migration flatten-order-to-pieces!","No performance test validating memory behavior of accumulating *buffers* atom under sustained load"]} \ No newline at end of file diff --git a/.context/compound-engineering/ce-code-review/20260425-173027-4c833da5/testing_maintainability.json b/.context/compound-engineering/ce-code-review/20260425-173027-4c833da5/testing_maintainability.json new file mode 100644 index 00000000..6ce240c3 --- /dev/null +++ b/.context/compound-engineering/ce-code-review/20260425-173027-4c833da5/testing_maintainability.json @@ -0,0 +1 @@ +{"reviewer":"testing_maintainability","findings":[{"title":"DuckDB shutdown hook is a no-op \u2014 connection never closes on JVM exit","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":26,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Replace `(Thread. #(fn []))` with `(Thread. #(.close ^java.sql.Connection @db))`. See server.clj line 34 for the correct pattern.","why_it_matters":"The DuckDB JDBC connection accumulates in-process state and file handles. If the JVM exits without closing it, temp files leak and the next instance may encounter lock contention or corrupted WAL.","evidence":["(.addShutdownHook (Runtime/getRuntime)","(Thread. #(fn [])))"]},{"title":"Result sets never closed in query-scalar and query-rows \u2014 resource leak","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":54,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Wrap stmt and rs in with-open blocks. Same fix applies to query-rows at line 61.","why_it_matters":"DuckDB JDBC holds open cursor resources until ResultSet is explicitly closed. 
Under sustained load this exhausts internal DuckDB memory and file handles, eventually crashing the process.","evidence":["(let [stmt (.createStatement conn)"," rs (.executeQuery stmt sql)]"," (when (.next rs)"," (.getObject rs 1)))))"]},{"title":"sales_orders_new.clj uses undefined variables \u2014 will throw at runtime","severity":"P0","file":"src/clj/auto_ap/datomic/sales_orders_new.clj","line":157,"confidence":100,"autofix_class":"manual","owner":"human","requires_verification":true,"pre_existing":false,"suggested_fix":"The raw-graphql-ids function uses cond->> nil ... (pq/get-sales-orders ...) but then references `result` on line 165 which is never bound. Also get-graphql at line 209 references undefined `id-keys` and `matching-count`. Rewrite the threading chain to bind results properly.","why_it_matters":"This file defines ns auto-ap.datomic.sales-orders \u2014 the same namespace as the working file. If classpath loads this after the original, ALL sales order queries break with CompilerException/NullPointerException.","evidence":["(when-let [rows (:rows result)] ;; `result` is NOT bound","matching-count ;; undefined in get-graphql","(graphql-results ids id-keys args) ;; `id-keys` undefined"]},{"title":"sales_orders_new.clj build-where-clause binds empty vector; missing clojure.string require \u2014 won't compile","severity":"P0","file":"src/clj/auto_ap/datomic/sales_orders_new.clj","line":41,"confidence":100,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Add [clojure.string :as str] to :require. Remove (let [clauses [] ...]) dead binding \u2014 the body builds SQL correctly but the let-binding signals confusion.","why_it_matters":"The namespace imports clojure.data.json and clojure.java.io but NOT clojure.string, yet uses str/join on line 52. 
This file will fail to compile/load.","evidence":["(ns auto-ap.datomic.sales-orders"," (:require [auto-ap.datomic :refer [conn]]"," [auto-ap.storage.parquet :as pq]"," [clojure.data.json :as json]"," [clojure.java.io :as io])) ;; no clojure.string!","(str/join \" AND \"))})) ;; line 52 \u2014 unbound var"]},{"title":"sales_orders_new.clj and sales_orders.clj define the same namespace \u2014 guaranteed collision","severity":"P0","file":"src/clj/auto_ap/datomic/sales_orders_new.clj","line":1,"confidence":100,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Either rename the new file's namespace to `auto-ap.datomic.sales-orders-v2` and update all callers, OR delete sales_orders.clj once the new version is verified. Do not ship both.","why_it_matters":"Depending on classpath order, either the working old file or the broken new file loads. Behavior is non-deterministic across environments.","evidence":[";; sales_orders_new.clj","(ns auto-ap.datomic.sales-orders ;; SAME as sales_orders.clj!",";; sales_orders.clj","(ns auto-ap.datomic.sales-orders"]},{"title":"safe-cleanup-all destructures months incorrectly \u2014 `[_ y m]` skips year, passes nil","severity":"P0","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":199,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Change `[[_ y m]]` to `[[y m]]`. collect-all-months returns a sorted vec of [year month] 2-element vectors. Destructuring as [_ y m] binds _ to year, y to month, m to nil.","why_it_matters":"The safety check that prevents data loss (verifying S3 before deleting Datomic) receives (month nil) instead of (year month) for every month. 
The verify function either crashes or produces wrong dates and deletes unverified months.","evidence":["(defn- collect-all-months [conn]"," ;; returns: [[2024 1] [2024 3] [2024 4] ...]"," (sort (keys grouped)))","","(doseq [[_ y m] months]"," (let [result (verify-month-in-s3? y m)"]},{"title":"build-where-clause in parquet.clj has unsanitized SQL string interpolation \u2014 injection risk","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":233,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Use DuckDB parameter binding via PreparedStatement, or at minimum escape single quotes by doubling them. User-supplied :client, :vendor, :location values are interpolated directly into SQL.","why_it_matters":"If filter values come from GraphQL query parameters or API inputs, an attacker can close the string literal and inject arbitrary SQL. Even without malice, any entity with apostrophes crashes queries.","evidence":["(when v"," (str env \" = '\" v \"'\"))))"]},{"title":"sales_orders_new.clj: summarize-orders calls private function from another namespace \u2014 fragile compile dependency","severity":"P1","file":"src/clj/auto_ap/datomic/sales_orders_new.clj","line":190,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":true,"pre_existing":false,"suggested_fix":"Replace (#'auto-ap.datomic/aggregate-sum ids) with a public function or inline the query. Indirect private-function calls via #' are not compiler-supported and break on namespace reload.","why_it_matters":"Works only when both namespaces load in exact order. 
Hot-reload or REPL evaluation can reorder loads and silently break aggregation.","evidence":["(#'auto-ap.datomic/aggregate-sum ids) ;; uses dc/q internally"]},{"title":"parquet_test.clj: 5 tests for 297 lines \u2014 no coverage of flush, S3, WAL recovery, or query layer","severity":"P1","file":"test/clj/auto_ap/storage/parquet_test.clj","line":1,"confidence":100,"autofix_class":"advisory","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Add tests for: (a) flush-to-parquet! with mock DuckDB writing to local path, (b) load-unflushed! reading from seeded WAL JSONL files, (c) parquet-query URL construction, (d) build-where-clause sanitization, (e) query-deduped with duplicate _seq_no records, (f) error propagation in flush.","why_it_matters":"The 297 lines of storage.parquet implement buffered writes to S3 with WAL recovery \u2014 the most data-critical path in this PR. Without tests on flush and WAL round-tripping, a single typo can silently corrupt production data.","evidence":["(deftest test-query-scalar"," (testing \"SELECT 1 returns 1\""," (is (= 1 (p/query-scalar \"SELECT 1\")))))","",";; Only 5 tests total \u2014 all trivial. No flush/WAL/query/S3 coverage."]},{"title":"query-deduped hardcodes sales_order.external_id partition \u2014 breaks for non-order entities","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":277,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Parameterize the partition column with a dedup-key argument. The current implementation always partitions by sales_order.external_id, which does not exist on charge/line-item/sales-refund schemas.","why_it_matters":"Calling (query-deduped \"charge\" ...) reads charges but deduplicates on a nonexistent column. DuckDB will throw or return garbage.","evidence":["(defn query-deduped [entity-type start-date end-date]"," ... 
QUALIFY ROW_NUMBER() OVER"," (PARTITION BY sales_order.external_id"," ORDER BY _seq_no DESC) = 1"]},{"title":"flush-to-parquet! throws on S3 failure \u2014 callers cannot distinguish success from partial failure","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":138,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":true,"pre_existing":false,"suggested_fix":"Return a consistent result map with :status and :error keys instead of sometimes returning {:status :ok}, sometimes {:status :no-records}, and sometimes throwing.","why_it_matters":"If one entity type fails to flush during migration (e.g., S3 throttling), the exception aborts the entire day with no partial-progress recovery. Structured return lets callers retry or skip selectively.","evidence":["{:key s3-key :status :ok}","(catch Exception e"," (throw (ex-info \"Flush failed\" {:entity-type entity-type}"," :error (.getMessage e)))))))))"]},{"title":"sales_to_parquet.clj uses global *buffers* \u2014 migration pollutes live application data","severity":"P1","file":"src/clj/auto_ap/migration/sales_to_parquet.clj","line":160,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Create a local buffer atom for the migration process rather than using the global p/*buffers*. A user request during migration could get buffered alongside historical data and flushed to the wrong date.","why_it_matters":"Running migrate-all() while the app serves traffic means new sales orders could end up in the wrong parquet file. The WAL mechanism does not separate migration from live traffic.","evidence":["(doseq [r @flat]"," (p/buffer! 
(:entity-type r) r)) ;; writes to global buffer","",";; In parquet.clj:","(defonce *buffers* (atom {}))"]},{"title":"test: no test coverage for error paths \u2014 WAL corruption, S3 failure, empty DuckDB responses","severity":"P1","file":"test/clj/auto_ap/storage/parquet_test.clj","line":1,"confidence":100,"autofix_class":"advisory","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Add tests for: (a) malformed JSON in WAL \u2014 verify load-unflushed! skips bad lines, (b) stub S3 upload to throw \u2014 verify flush-to-parquet! throws ex-info with context, (c) query-rows returns [] for empty results, not nil.","why_it_matters":"Error paths are where data loss happens: corrupted WAL files, partial flushes, DuckDB crashes. Without testing these, recovery behavior is unverified.","evidence":["(ns auto-ap.storage.parquet-test"," (:require ...))","",";; Only happy-path tests for query-scalar, buffer!, clear-buffer!, date-seq"]},{"title":"sales_summaries.clj: get-fees defined twice with identical bodies \u2014 dead private version","severity":"P2","file":"src/clj/auto_ap/jobs/sales_summaries.clj","line":148,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Delete the private (defn- get-fees at line 148. The public (defn get-fees at line 193 shadows it.","why_it_matters":"Dead code that will never be called. 
If someone modifies one copy but not the other, behavior diverges silently.","evidence":["(defn- get-fees [c date] ;; line 148 \u2014 shadowed"," (when-let [fee (get-fee c date)]","","(defn get-fees [c date] ;; line 193 \u2014 the one that runs"," (when-let [fee (get-fee c date)]"]},{"title":"Entity-type array duplicated across 5+ locations with no single source of truth","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":158,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Define a single def ENTITY-TYPES in auto-ap.storage.parquet and require it from migration/ and jobs/ namespaces.","why_it_matters":"The same 4-item vector is hardcoded at parquet.clj (2x), sales_to_parquet.clj (2x), cleanup_sales.clj (1x). Adding or renaming an entity type requires editing 5+ locations \u2014 easy to miss one and get silent data gaps.","evidence":["(let [etypes [\"sales-order\" \"charge\""," \"line-item\" \"sales-refund\"]","; parquet.clj:158, :172 | sales_to_parquet.clj:162, :182 | cleanup_sales.clj:95"]},{"title":"object-exists? downloads entire S3 object to check existence \u2014 10\u2013100x slower than head-object","severity":"P2","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":113,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Replace s3/get-object with s3/head-object. Head-object returns only metadata (no body) and is orders of magnitude faster.","why_it_matters":"safe-cleanup-all calls this per day \u00d7 entity type across all months. For 2+ years of data that is ~3000 S3 GET requests each potentially pulling megabytes.","evidence":["(defn- object-exists?"," (try"," (s3/get-object {:bucket-name pq/*bucket*"," :key key})"," true"]},{"title":"DRY-RUN? 
is a mutable var toggled at runtime \u2014 not thread-safe, races on concurrent calls","severity":"P2","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":8,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Pass dry-run? as a boolean parameter through the function chain. If keeping the var for REPL convenience, document that it only works in single-threaded contexts.","why_it_matters":"alter-var-root on a state-bearing boolean means concurrent invocations race. A second call to set-dry-run! during a cleanup can cause irreversible data deletion from Datomic.","evidence":["(def ^:private DRY-RUN? true)","","(defn- set-dry-run! [v]"," (alter-var-root #'DRY-RUN? (constantly v)))"]},{"title":"sales_to_parquet.clj: docstring lists public functions that are actually private","severity":"P2","file":"src/clj/auto_ap/migration/sales_to_parquet.clj","line":8,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Either remove the - prefix from write-day-by-day and write-dead-letter, OR update the docstring examples to only reference actually-callable entry points.","why_it_matters":"The namespace docstring is the migration operator handbook. If operators follow instructions and call (write-day-by-day ...) 
from REPL it fails with Unbound fn.","evidence":[" Usage:"," (migrate-all)"," (write-day-by-day \"2024-01-01\" \"2024-03-31\") ; documented but private!"," (write-dead-letter [flat]) ; documented but private!","","(defn- write-day-by-day ;; actual definition \u2014 private"]},{"title":"sales_to_parquet.clj: flush-all-types return value is inverted \u2014 negative number means success","severity":"P2","file":"src/clj/auto_ap/migration/sales_to_parquet.clj","line":190,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Track records before flush and count :ok results. Currently computes (- (p/total-buf-count) start) AFTER flush \u2014 so success returns negative number, total failure returns 0.","why_it_matters":"The return value is unused but if future code depends on it for progress reporting, the inverted sign means a negative indicates success and zero means everything failed.","evidence":["(defn- flush-all-types []"," (let [... start (p/total-buf-count)]"," ..."," {:records-flush (- (p/total-buf-count) start)}))"]},{"title":"sales_summaries.clj: old Datomic-query functions left alongside parquet versions \u2014 post-migration returns nil/zero","severity":"P2","file":"src/clj/auto_ap/jobs/sales_summaries.clj","line":201,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Delete get-tax, get-tip, get-sales, get-returns (the Datomic-query versions). The parquet-readers (get-tax-parquet etc.) are the correct post-migration path.","why_it_matters":"Old functions reference :sales-order/tax attributes retracted by cleanup. Any accidental call post-migration returns nil/zero instead of triggering an obvious error.","evidence":["(defn- get-tax [c date] ;; Datomic version \u2014 dead after migration"," ... 
(dc/q '[:find (sum ?tax)...]","","(defn- get-tax-parquet [c date] ;; Parquet version \u2014 correct path"]},{"title":"query-by-entity-id loads entire date range then filters in memory \u2014 defeats parquet partitioning","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":286,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Push the external_id filter into the SQL WHERE clause. Loading all rows for a date range into memory just to find one by ID negates DuckDB filtering.","why_it_matters":"For a day with 10k orders, the function reads and materializes 10k rows in memory just to return one. With larger datasets this adds latency proportional to total daily volume.","evidence":["(defn query-by-entity-id [entity-type external-id"," start-date end-date]"," (->> (query-deduped entity-type start-date end-date)"," (filter #(= (:external_id %)"," (name external-id)))"," first))"]},{"title":"buffer! uses wall-clock milliseconds as sequence number \u2014 collision possible under load","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":102,"confidence":50,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Use an atom-based counter (swap! *seq-counter* inc) or System/nanoTime. Under concurrent buffering, two records in the same millisecond get identical _seq_no.","why_it_matters":"query-deduped relies on highest _seq_no winning for duplicate detection. 
If two records share a sequence number, the ROW_NUMBER() ordering is non-deterministic.","evidence":["(let [seq-no (System/currentTimeMillis)"," entry (assoc record :_seq-no seq-no)]"]},{"title":"cleanup_sales.clj: group-orders-by-month creates one Datomic entity per order \u2014 O(n) sequential lookups","severity":"P2","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":83,"confidence":50,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Replace the (d-api/entity db eid) call inside reduce with a single batch pull or Datalog query returning [eid day-value] pairs.","why_it_matters":"For 50k+ orders, this means 50k+ separate entity lookups hitting the Datomic DBv2 log one at a time. A single batch query reduces to one pass.","evidence":["(reduce (fn [acc eid]"," (when-let [day-val (:sales-order/day-value"," (d-api/entity db eid))]"]},{"title":"perf_test.clj runs unconditionally at load time \u2014 pollutes any test/ci run","severity":"P2","file":"test/clj/auto_ap/storage/perf_test.clj","line":111,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Move (run-perf-tests) into a deftest or wrap in (comment ...). Loading this namespace triggers 100k row generation and S3 upload on every lein test invocation.","why_it_matters":"CI/CD pipelines that compile all namespaces will hang. 
Requires live S3 credentials just to load the namespace.","evidence":["(run-perf-tests)","(println \"\\n=== Done ===\")"]},{"title":"date-seq duplicated in parquet.clj and sales_to_parquet.clj","severity":"P3","file":"src/clj/auto_ap/migration/sales_to_parquet.clj","line":132,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Remove the private date-seq in sales_to_parquet.clj and use (p/date-seq start end) from parquet.clj, which already exposes it publicly.","why_it_matters":"Duplicate utility logic is a maintenance tax. Fixing a bug requires finding two copies.","evidence":["; sales_to_parquet.clj:132:","(defn- date-seq [start end]","; parquet.clj:203:","(defn date-seq [start end]"]},{"title":"sales_summaries.clj: large comment blocks with ad-hoc queries should be in test files","severity":"P3","file":"src/clj/auto_ap/jobs/sales_summaries.clj","line":311,"confidence":75,"autofix_class":"advisory","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Move the (comment ...) blocks at lines 311-379 into a dedicated test namespace or REPL scratch file.","why_it_matters":"Comment blocks in production namespaces become maintenance traps \u2014 future developers copy-paste from them after the underlying schema has evolved.","evidence":["(comment"," ;; TODO: Move to test file or proper location"]},{"title":"buffer! writes to WAL file on every call \u2014 no batching, full disk I/O per record","severity":"P3","file":"src/clj/auto_ap/storage/parquet.clj","line":105,"confidence":50,"autofix_class":"advisory","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Batch WAL writes or use a background thread that flushes periodically. Current design opens and appends to JSONL on every buffer! 
call.","why_it_matters":"Under high-throughput ingestion this is hundreds of file-open/write/close cycles per second.","evidence":["(with-open [w (io/writer wal-file :append true)]"," (.write w ^String (json/write-str {:seq-no seq-no"]},{"title":"Naming inconsistency: total-buf-count vs buffer-count vs get-unflushed-count \u2014 three names for similar metrics","severity":"P3","file":"src/clj/auto_ap/storage/parquet.clj","line":123,"confidence":25,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Unify to buffer-count (per-entity-type) and total-buffer-count (across all types). Remove get-unflushed-count as it delegates with no added value.","why_it_matters":"Three names for conceptually the same metric forces callers to look up which function does what. Inconsistent abbreviation signals ad-hoc naming.","evidence":["(defn buffer-count [entity-type] ;; per-type","(defn total-buf-count [] ;; abbreviated","(defn get-unflushed-count [] ;; delegates"]}],"residual_risks":["No test for WAL file permission errors \u2014 if parquet-wal/ is not writable, buffer! silently logs and drops records into memory-only atom, creating false confidence","DuckDB httpfs extension requires valid AWS credentials. If S3 access keys rotate or IAM role changes, all queries fail with cryptic JDBC errors \u2014 no health check exists to detect this","Migration script batch-size defaults to 100 but pull-sales-order-data uses dc/pull-many with no result limiting. A day with 50k orders creates one enormous data structure in memory","cleanup_sales.clj DRY-RUN? defaults to true \u2014 if someone forgets to toggle it off, cleanup-all does nothing and they assume data is deleted; opposite risk if toggled on accidentally","No monitoring or alerting for unflushed buffer growth. If flush-to-parquet! 
silently fails (WAL catches records but S3 upload deadlocks), buffers grow until OOM"],"testing_gaps":["No integration test for full buffer -> flush -> query round trip: write via buffer!, flush-to-parquet!, read back with get-sales-orders, verify data integrity","No test for WAL recovery path: seed a WAL JSONL file, call load-unflushed!, verify *buffers* populated, then flush and verify S3 gets recovered records","No test for date-seq edge cases: start=end, reversed dates (end before start), empty range","No test for parquet-query multi-day URL expansion \u2014 verify all expected files included in read_parquet([...]) call","No test for build-where-clause SQL injection resistance: single quotes, semicolons, SQL keywords in input values","No test for query-deduped deduplication with records sharing external_id but different _seq_no values","No test for flush-to-parquet! when S3 upload partially fails \u2014 verify buffer is NOT cleared so retry can recover","No test for object-exists? handling: missing objects return false, access-denied does not throw uncaught exception","No unit tests for migration/sales_to_parquet.clj \u2014 core migration logic has zero automated verification against Datomic fixtures","No unit tests for cleanup_sales.clj S3 verification path \u2014 verify-month-in-s3? 
with partial coverage returns {:ok false :missing [...]} but this is never tested","No test coverage for any function in sales_orders_new.clj \u2014 it is entirely untested and also broken (undefined vars, missing requires)","No test for get-sales-orders pagination correctness: offset + limit interaction, count accuracy vs rows length"]} \ No newline at end of file diff --git a/docs/plans/2026-04-24-001-refactor-detailed-sales-to-parquet-plan.md b/docs/plans/2026-04-24-001-refactor-detailed-sales-to-parquet-plan.md new file mode 100644 index 00000000..9a68d7ea --- /dev/null +++ b/docs/plans/2026-04-24-001-refactor-detailed-sales-to-parquet-plan.md @@ -0,0 +1,250 @@ +--- +title: Move Detailed Sales Data to DuckDB and Parquet +type: refactor +status: active +date: 2026-04-24 +--- + +# Move Detailed Sales Data to DuckDB and Parquet + +## Overview + +Detailed sales records (orders, charges, line items, refunds) are currently stored in Datomic. Because Datomic is append-only, this high-volume data causes significant storage bloat. We will move these details to Parquet files stored on S3, using DuckDB as the query engine for views and summaries, while keeping the high-level `sales-summaries` in Datomic for ledger calculations. + +--- + +## Problem Frame + +The system stores every individual sale and payment detail in Datomic. While useful for auditing, this data is rarely accessed in detail after a few weeks, yet it permanently increases the Datomic database size. The app needs a "colder" but still queryable storage layer for these details. + +--- + +## Requirements Trace + +- R1. Detailed sales/payment entities must be moved from Datomic to Parquet files on S3. +- R2. `sales-summaries` must remain in Datomic to ensure ledger calculations remain performant and stable. +- R3. The "Sales Orders" and "Payments" views must continue to function (filtering, sorting, pagination) by querying the Parquet files via DuckDB. +- R4. 
The daily sales summary job must be updated to aggregate data from DuckDB instead of Datomic. +- R5. The system must handle "voids" of payments/orders in an immutable file format. + +--- + +## Scope Boundaries + +- **In Scope:** + - Implementation of Parquet writer for sales data. + - DuckDB integration for reading S3 Parquet files. + - Migration of existing detailed data from Datomic to S3. + - Updating the summary aggregation job. +- **Out of Scope:** + - Moving `sales-summaries` out of Datomic. + - Implementing a real-time streaming pipeline (sticking to batch/daily flushes). + +--- + +## Context & Research + +### Relevant Code and Patterns + +- **Production Flow:** `auto-ap.square.core3`, `auto-ap.ezcater.core`, and `auto-ap.routes.ezcater-xls` all produce tagged maps that are currently sent to `dc/transact`. +- **Read Flow:** `auto-ap.datomic.sales-orders` and `auto-ap.ssr.payments` perform the current Datomic queries. +- **Aggregation:** `auto-ap.jobs.sales-summaries` uses `dc/q` to sum totals for the day. + +--- + +## Key Technical Decisions + +- **Storage Format:** Parquet. It is columnar, highly compressed, and natively supported by DuckDB. +- **Storage Location:** AWS S3. This removes the need for a managed database server. +- **Query Engine:** DuckDB. It can query Parquet files directly on S3 without importing them into a local database. +- **Write Strategy:** Daily Batch. To avoid the "small file problem" in S3/Parquet, data will be buffered (locally or in a staging table) and flushed as one file per day: `s3://bucket/sales-details/YYYY-MM-DD.parquet`. +- **Voiding Strategy:** Append-only log. A "void" is simply a new record with the same `external-id` and a `status: voided`. The read query will always select the record with the latest timestamp for a given ID. + +--- + +## Implementation Units + +- U1. **S3 Storage & DuckDB Infrastructure** + +**Goal:** Set up the S3 bucket structure and the DuckDB connection utility. 
+ +**Requirements:** R1, R3 + +**Dependencies:** None + +**Files:** +- Create: `src/clj/auto_ap/storage/parquet.clj` (DuckDB connection and S3 config) + +**Approach:** +- Implement a `with-duckdb` wrapper that initializes DuckDB, loads the `httpfs` extension, and configures S3 credentials. + +**Verification:** +- A test that can run a simple `SELECT 1` via DuckDB. + +--- + +- U2. **Parquet Writer Implementation** + +**Goal:** Create a service to convert sales maps into Parquet files and upload them to S3. + +**Requirements:** R1 + +**Dependencies:** U1 + +**Files:** +- Modify: `src/clj/auto_ap/storage/parquet.clj` +- Test: `test/clj/auto_ap/storage/parquet_test.clj` + +**Approach:** +- Implement a `flush-to-parquet` function that takes a collection of maps and uses a library to create the file. +- Implement the S3 upload logic. +- **Recovery:** Implement a "flush-log" in the local SQLite WAL. Mark records as `flushed: true` only after receiving a successful 200 OK from S3. On startup, the system should check for unflushed records and trigger a retry. + +**Test scenarios:** +- Happy path: Write a list of 10 sales orders to a Parquet file and verify it exists on S3. +- Error path: Simulate an S3 connection failure during flush and verify that records remain in the local WAL and are successfully flushed on the next attempt. +- Edge case: Handle empty data sets without creating empty files. + +**Verification:** +- Successful upload of a Parquet file that is readable by an external DuckDB CLI. + +--- + +- U3. **Redirect Production Flow** + +**Goal:** Change the Square/EzCater integrations to write to the Parquet writer instead of Datomic. + +**Requirements:** R1 + +**Dependencies:** U2 + +**Files:** +- Modify: `src/clj/auto_ap/square/core3.clj` +- Modify: `src/clj/auto_ap/ezcater/core.clj` +- Modify: `src/clj/auto_ap/routes/ezcater_xls.clj` + +**Approach:** +- Replace `dc/transact` calls for detailed sales/charges with calls to the new `parquet/write` service. 
+- *Note:* Keep the transaction for any related entities that must stay in Datomic (e.g., Client updates). + +**Verification:** +- Run a Square import and verify that no new detailed entities appear in Datomic, but a new Parquet file is created. + +--- + +- U4. **DuckDB Read Layer for Views** + +**Goal:** Update the "Sales Orders" and "Payments" views to fetch data from DuckDB. + +**Requirements:** R3, R5 + +**Dependencies:** U1 + +**Files:** +- Modify: `src/clj/auto_ap/datomic/sales_orders.clj` +- Modify: `src/clj/auto_ap/ssr/payments.clj` +- Test: `test/clj/auto_ap/integration/graphql/checks.clj` + +**Approach:** +- Replace Datomic `q` and `pull` calls with DuckDB SQL queries. +- **Performance:** To optimize pagination, implement a "Metadata Index" file on S3 (or a Datomic entity) that stores the total record count per day. Use this to calculate pagination totals without scanning all Parquet files. +- **Deterministic Voids:** Use a combination of `timestamp` and a monotonic `sequence_number` for the `QUALIFY` clause to ensure deterministic results for records updated in the same millisecond. +- Map DuckDB result sets back to the existing map formats used by the views to minimize frontend changes. + +**Test scenarios:** +- Happy path: List payments for a client across a date range. +- Integration: Void a payment in S3 and verify the view shows it as voided. +- Performance: Verify pagination totals load in < 200ms using the metadata index. +- Edge case: Handle two updates to the same record in the same millisecond and verify the latest sequence number wins. + +**Verification:** +- The Payments table in the UI loads correctly and reflects the data in S3. + +--- + +- U5. **Update Summary Aggregation Job** + +**Goal:** Update the `sales-summaries` job to calculate totals using DuckDB. 
+ +**Requirements:** R2, R4 + +**Dependencies:** U1 + +**Files:** +- Modify: `src/clj/auto_ap/jobs/sales_summaries.clj` + +**Approach:** +- In `get-payment-items`, `get-discounts`, `get-tax`, etc., replace the `dc/q` calls with DuckDB SQL `SUM` and `GROUP BY` queries against the daily Parquet files. +- Ensure the results are still written to the `sales-summary` entities in Datomic. + +**Verification:** +- Run the `sales-summaries-v2` job and verify that the resulting Datomic summaries match the values in the S3 Parquet files. + +--- + +- U6. **Historical Data Migration** + +**Goal:** Move all existing detailed sales data from Datomic to Parquet files. + +**Requirements:** R1 + +**Dependencies:** U2 + +**Files:** +- Create: `src/clj/auto_ap/migration/sales_to_parquet.clj` + +**Approach:** +- Write a script that iterates through all historical sales orders and payments in Datomic. +- Group them by **Business Date** (the date of the sale, not the transaction date) to ensure consistency with future DuckDB queries. +- Write each day's data to the corresponding `YYYY-MM-DD.parquet` file on S3. +- Log any records with missing dates to a "dead-letter" file for manual review. + +**Verification:** +- Count of records in Datomic vs count of records in S3. + +--- + +- U7. **Datomic Cleanup** + +**Goal:** Remove the detailed data from Datomic to reclaim space. + +**Requirements:** R1 + +**Dependencies:** U6 + +**Files:** +- Create: `src/clj/auto_ap/migration/cleanup_sales.clj` + +**Approach:** +- Use `[:db/retractEntity ...]` to remove all `#:sales-order`, `#:charge`, and `#:sales-refund` entities. +- **Batching:** Perform retractions in batches (e.g., by month) with a cooldown period between batches to avoid excessive Datomic transaction log bloat and performance degradation. +- *Safety:* Only run this after verifying U6 and U4. + +**Verification:** +- Datomic database size decreases; detailed queries in Datomic return empty, while DuckDB queries return data. 
+ +--- + +## System-Wide Impact + +- **Interaction graph:** The integration cores now depend on the Parquet/S3 service. The SSR views and Background Jobs now depend on the DuckDB service. +- **Error propagation:** S3 downtime will now cause "Sales Orders" views to fail and the Summary Job to fail. We should implement basic retry logic in the DuckDB wrapper. +- **State lifecycle risks:** There is a window between the "production" of a sale and the "flush" to Parquet. If the app crashes before a flush, data could be lost. *Mitigation:* Use a small local SQLite file as a write-ahead log for the daily buffer. + +--- + +## Risks & Dependencies + +| Risk | Mitigation | +|------|------------| +| S3 Latency for Views | Use DuckDB's caching and only query the files for the requested date range. | +| Data Loss before Flush | Implement a local SQLite staging file for the current day's data. | +| Schema Drift | Use a strict schema for Parquet files; handle missing columns in SQL with `COALESCE`. | + +--- + +## Sources & References + +- Related code: `src/clj/auto_ap/jobs/sales_summaries.clj` +- Related code: `src/clj/auto_ap/ssr/payments.clj` +- External docs: [DuckDB S3 Integration](https://duckdb.org/docs/extensions/httpfs) diff --git a/parquet-wal/test-type.jsonl b/parquet-wal/test-type.jsonl new file mode 100644 index 00000000..5be26593 --- /dev/null +++ b/parquet-wal/test-type.jsonl @@ -0,0 +1,2 @@ +{"seq-no":1777103077792,"record":{"id":2}}{"seq-no":1777103077984,"record":{"id":1,"name":"test"}}{"seq-no":1777103126496,"record":{"id":2}} +{"seq-no":1777103126692,"record":{"id":1,"name":"test"}} diff --git a/project.clj b/project.clj index aed6fe9b..c90bcaae 100644 --- a/project.clj +++ b/project.clj @@ -93,18 +93,14 @@ [hiccup "2.0.0-alpha2"] - ;; needed for java 11 - [javax.xml.bind/jaxb-api "2.4.0-b180830.0359"] - [io.forward/clojure-mail "1.0.8"] - [lambdaisland/edn-lines "1.0.10"]] - :managed-dependencies [;; explicit dependencies to get to latest versions for above - 
[com.fasterxml.jackson.core/jackson-core "2.12.0"] - [com.fasterxml.jackson.core/jackson-databind "2.12.0"] - [com.fasterxml.jackson.core/jackson-annotations "2.12.0"] - [com.fasterxml.jackson.dataformat/jackson-dataformat-cbor "2.12.0"] - - [commons-codec "1.12"]] - :plugins [[lein-ring "0.9.7"] +;; needed for java 11 + [javax.xml.bind/jaxb-api "2.4.0-b180830.0359"] + [io.forward/clojure-mail "1.0.8"] + [lambdaisland/edn-lines "1.0.10"] + [org.duckdb/duckdb_jdbc "1.1.0"] + [org.xerial/sqlite-jdbc "3.45.1.0"] + [com.fasterxml.jackson.core/jackson-core "2.12.0"] + [com.fasterxml.jackson.core/jackson-databind "2.12.0"] [lein-cljsbuild "1.1.5"] [lein-ancient "0.6.15"]] :clean-targets ^{:protect false} ["resources/public/js/compiled" "target"] @@ -144,7 +140,7 @@ [com.bhauman/rebel-readline-cljs "0.1.4" :exclusions [org.clojure/clojurescript]] [javax.servlet/servlet-api "2.5"]] :plugins [[lein-pdo "0.1.1"]] - :jvm-opts ["-Dconfig=config/dev.edn" "-Xms4G" "-Xmx20G" "-XX:-OmitStackTraceInFastThrow"]} + :jvm-opts ["-Dconfig=config/dev.edn" "-Xms4G" "-Xmx20G" "-XX:-OmitStackTraceInFastThrow" "-Djava.library.path=/home/noti/.local/lib"]} :uberjar {:java-cmd "/usr/lib/jvm/java-11-openjdk/bin/java" diff --git a/src/clj/auto_ap/datomic/sales_orders.clj b/src/clj/auto_ap/datomic/sales_orders.clj index 09f9c409..f47209d4 100644 --- a/src/clj/auto_ap/datomic/sales_orders.clj +++ b/src/clj/auto_ap/datomic/sales_orders.clj @@ -1,171 +1,180 @@ (ns auto-ap.datomic.sales-orders (:require - [auto-ap.datomic - :refer [add-sorter-fields-2 - apply-pagination - apply-sort-3 - conn - merge-query - pull-id - pull-many - query2 - visible-clients]] - [clj-time.coerce :as c] - [clj-time.core :as time] + [auto-ap.storage.parquet :as pq] + [auto-ap.time :as atime] + [clj-time.coerce :as coerce] [clojure.set :as set] + [clojure.string :as str] [com.brunobonacci.mulog :as mu] - [datomic.api :as dc] - [iol-ion.query])) + [ring.util.codec :as ring-codec])) -(defn <-datomic [result] - (-> result - 
(update :sales-order/date c/from-date) - (update :sales-order/charges (fn [cs] - (map (fn [c] - (-> c - (update :charge/processor :db/ident) - (set/rename-keys {:expected-deposit/_charges :expected-deposit}) - (update :expected-deposit first))) - cs))))) +(defn- payment-methods->charges [pm-str] + (when (not-empty pm-str) + (mapv (fn [pm] {:charge/type-name pm}) + (str/split pm-str #",")))) -(def default-read '[:db/id - :sales-order/external-id, - :sales-order/location, - :sales-order/date, - :sales-order/total, - :sales-order/tax, - :sales-order/tip, - :sales-order/line-items, - :sales-order/discount, - :sales-order/returns, - :sales-order/service-charge, - :sales-order/vendor, - :sales-order/source, - :sales-order/reference-link, - {:sales-order/client [:client/name :db/id :client/code] - :sales-order/charges [ - :charge/type-name, - :charge/total, - :charge/tax, - :charge/tip, - :charge/external-id, - :charge/note, - :charge/date, - :charge/client, - :charge/location, - :charge/reference-link, - {:charge/processor [:db/ident]} {:expected-deposit/_charges [:db/id]}]}]) +(defn <-row + "Convert a flat parquet row into the shape consumers expect." 
+ [row] + (let [pm (:payment-methods row)] + (-> row + (set/rename-keys + {:external-id :sales-order/external-id + :location :sales-order/location + :total :sales-order/total + :tax :sales-order/tax + :tip :sales-order/tip + :discount :sales-order/discount + :service-charge :sales-order/service-charge + :vendor :sales-order/vendor + :client-code :sales-order/client-code + :date :sales-order/date + :source :sales-order/source + :reference-link :sales-order/reference-link + :payment-methods :sales-order/payment-methods + :processors :sales-order/processors + :categories :sales-order/categories}) + (update :sales-order/date #(some-> % str)) + (dissoc :entity-type :_seq-no) + (assoc :sales-order/charges (payment-methods->charges pm))))) -(defn raw-graphql-ids [db args] - (let [visible-clients (set (map :db/id (:clients args))) - selected-clients (->> (cond - (:client-id args) - (set/intersection #{(:client-id args)} - visible-clients) +(defn build-where-clause [args] + (let [clauses (keep identity + [(when-let [c (:client-code args)] + (str "external_id.client = '" c "'")) + (when-let [v (:vendor args)] + (str "external_id.vendor = '" (name v) "'")) + (when-let [l (:location args)] + (str "location = '" l "'"))])] + (when (seq clauses) + (str "WHERE " (str/join " AND " clauses))))) +(defn build-sort-clause [args] + (let [sort (or (:sort args) "date") + order (or (:order args) "DESC")] + (str "ORDER BY " sort " " order))) - (:client-code args) - (set/intersection #{(pull-id db [:client/code (:client-code args)])} - visible-clients) +(def page-size 100) - :else - visible-clients) - (take 10) - set) - _ (mu/log ::selected-clients - :selected-clients selected-clients) - query (cond-> {:query {:find [] - :in ['$ '[?clients ?start-date ?end-date]] - :where '[[(iol-ion.query/scan-sales-orders $ ?clients ?start-date ?end-date) [[?e _ ?sort-default] ...]]]} - :args [db [selected-clients - (some-> (:start (:date-range args)) c/to-date) - (some-> (:end (:date-range args)) 
c/to-date )]]} +(defn raw-graphql-ids [args] + (let [start (some-> (:start (:date-range args)) .toString) + end (some-> (:end (:date-range args)) (.substring 0 10)) + limit (or (:limit args) page-size) + offset (or (:offset args) 0)] + (when start + (let [result (pq/get-sales-orders start end + {:client (:client-code args) + :vendor (:vendor args) + :location (:location args) + :sort (or (:sort args) "date") + :order "DESC" + :limit limit + :offset offset})] + {:ids (mapv #(str (:external-id %)) (:rows result)) + :rows (:rows result) + :count (:count result)})))) - (:sort args) (add-sorter-fields-2 {"client" ['[?e :sales-order/client ?c] - '[?c :client/name ?sort-client]] - "location" ['[?e :sales-order/location ?sort-location]] - "source" ['[?e :sales-order/source ?sort-source]] - "date" ['[?e :sales-order/date ?sort-date]] - "total" ['[?e :sales-order/total ?sort-total]] - "tax" ['[?e :sales-order/tax ?sort-tax]] - "tip" ['[?e :sales-order/tip ?sort-tip]]} - args) - (:category args) - (merge-query {:query {:in ['?category] - :where ['[?e :sales-order/line-items ?li] - '[?li :order-line-item/category ?category]]} - :args [(:category args)]}) +(defn graphql-results [rows _ids _args] + (mapv <-row rows)) - (:processor args) - (merge-query {:query {:in ['?processor] - :where ['[?e :sales-order/charges ?chg] - '[?chg :charge/processor ?processor]]} - :args [(keyword "ccp-processor" - (name (:processor args)))]}) - (:type-name args) - (merge-query {:query {:in ['?type-name] - :where ['[?e :sales-order/charges ?chg] - '[?chg :charge/type-name ?type-name]]} - :args [(:type-name args)]}) +(defn- extract-date-str [v] + (when v + (cond + (instance? org.joda.time.DateTime v) (atime/unparse-local v atime/normal-date) + (instance? org.joda.time.LocalDate v) (atime/unparse-local v atime/normal-date) + (instance? java.util.Date v) (atime/unparse-local (coerce/to-date-time v) atime/normal-date) + (instance? java.time.LocalDate v) (.toString v) + (string? 
v) (if (re-find #"^\d{2}/\d{2}/\d{4}" v) + (-> (java.time.LocalDate/parse v (java.time.format.DateTimeFormatter/ofPattern "MM/dd/yyyy")) + .toString) + (if (> (count v) 10) (.substring v 0 10) v)) + :else (str v)))) - (:total-gte args) - (merge-query {:query {:in ['?total-gte] - :where ['[?e :sales-order/total ?a] - '[(>= ?a ?total-gte)]]} - :args [(:total-gte args)]}) +(defn- get-date [qp k] + (or (extract-date-str (get qp k)) + (extract-date-str (get qp (name k))))) - (:total-lte args) - (merge-query {:query {:in ['?total-lte] - :where ['[?e :sales-order/total ?a] - '[(<= ?a ?total-lte)]]} - :args [(:total-lte args)]}) +(defn- kw->str [v] + (when (some? v) + (if (keyword? v) (name v) (str v)))) - (:total args) - (merge-query {:query {:in ['?total] - :where ['[?e :sales-order/total ?sales-order-total] - '[(iol-ion.query/dollars= ?sales-order-total ?total)]]} - :args [(:total args)]}) +(defn- qp->opts [qp] + (let [sort-params (:sort qp) + sort-key (when (seq sort-params) (-> sort-params first :name)) + sort-dir (when (seq sort-params) (-> sort-params first :dir))] + (cond-> {} + (some? (:client-code qp)) (assoc :client (kw->str (:client-code qp))) + (some? (:location qp)) (assoc :location (kw->str (:location qp))) + (not-empty (:payment-method qp)) (assoc :payment-method (:payment-method qp)) + (some? 
(:processor qp)) (assoc :processor (kw->str (:processor qp))) + (not-empty (:category qp)) (assoc :category (:category qp)) + (:total-gte qp) (assoc :total-gte (:total-gte qp)) + (:total-lte qp) (assoc :total-lte (:total-lte qp)) + sort-key (assoc :sort sort-key) + sort-dir (assoc :order (or sort-dir "DESC")) + true (assoc :limit (or (:per-page qp) 25) + :offset (or (:start qp) 0))))) - true - (merge-query {:query {:find ['?date '?e] - :where ['[?e :sales-order/date ?date]]}}))] +(defn- default-date-range [] + (let [today (.toString (java.time.LocalDate/now)) + week-ago (.toString (.minusDays (java.time.LocalDate/now) 7))] + [week-ago today])) - (cond->> (query2 query) - true (apply-sort-3 (assoc args :default-asc? false)) - true (apply-pagination args)))) +(defn- qp->date-range [qp] + (let [[default-start default-end] (default-date-range)] + [(or (get-date qp :start-date) + (extract-date-str (get-in qp [:date-range :start])) + default-start) + (or (get-date qp :end-date) + (extract-date-str (get-in qp [:date-range :end])) + default-end)])) -(defn graphql-results [ids db _] - (let [results (->> (pull-many db default-read ids) - (group-by :db/id)) - payments (->> ids - (map results) - (map first) - (mapv <-datomic))] - payments)) +(defn- request->client-codes [request] + (let [clients (:clients request) + codes (keep :client/code clients)] + (when (seq codes) codes))) -(defn summarize-orders [ids] +(defn fetch-page-ssr + "Fetch sales orders from parquet for the SSR page." + [request] + (let [qp (:query-params request) + raw-qp (some-> (:query-string request) + ring-codec/form-decode + (->> (into {} (remove (fn [[_ v]] (str/blank? 
v)))))) + [start end] (qp->date-range (merge raw-qp qp)) + opts (qp->opts qp) + client-codes (request->client-codes request) + opts (if client-codes (assoc opts :client-codes client-codes) opts) + result (pq/get-sales-orders start end opts) + rows (mapv <-row (:rows result))] + {:rows rows :count (:count result)})) - (let [[total tax] (->> - (dc/q {:find ['(sum ?t) '(sum ?tax)] - :with ['?id] - :in ['$ '[?id ...]] - :where ['[?id :sales-order/total ?t] - '[?id :sales-order/tax ?tax]]} - (dc/db conn) - ids) - first)] - {:total total - :tax tax})) +(defn summarize-page-ssr + "Summarize all matching sales orders via parquet." + [request] + (let [qp (:query-params request) + raw-qp (some-> (:query-string request) + ring-codec/form-decode + (->> (into {} (remove (fn [[_ v]] (str/blank? v)))))) + [start end] (qp->date-range (merge raw-qp qp)) + opts (dissoc (qp->opts qp) :limit :offset :sort :order) + client-codes (request->client-codes request) + opts (if client-codes (assoc opts :client-codes client-codes) opts)] + (pq/get-sales-orders-summary start end opts))) + +(defn summarize-orders [rows] + (when (seq rows) + (let [total (reduce + 0.0 (map #(or (:sales-order/total %) 0.0) rows)) + tax (reduce + 0.0 (map #(or (:sales-order/tax %) 0.0) rows))] + {:total total + :tax tax}))) (defn get-graphql [args] - (let [db (dc/db conn) - {ids-to-retrieve :ids matching-count :count} (mu/trace ::get-sales-order-ids [] (raw-graphql-ids db args))] - [(->> (mu/trace ::get-results [] (graphql-results ids-to-retrieve db args))) - matching-count - (summarize-orders ids-to-retrieve)])) + (let [{:keys [ids rows count]} (mu/trace ::get-sales-order-ids [] (raw-graphql-ids args))] + [(mu/trace ::get-results [] (graphql-results rows ids args)) + count + (summarize-orders rows)])) (defn summarize-graphql [args] - (let [db (dc/db conn) - {ids-to-retrieve :ids matching-count :count} (mu/trace ::get-sales-order-ids [] (raw-graphql-ids db args))] - (summarize-orders ids-to-retrieve))) - + (let 
[{:keys [rows]} (raw-graphql-ids args)] + (summarize-orders rows))) \ No newline at end of file diff --git a/src/clj/auto_ap/ezcater/core.clj b/src/clj/auto_ap/ezcater/core.clj index 900a214a..ee468234 100644 --- a/src/clj/auto_ap/ezcater/core.clj +++ b/src/clj/auto_ap/ezcater/core.clj @@ -1,6 +1,7 @@ (ns auto-ap.ezcater.core (:require - [auto-ap.datomic :refer [conn random-tempid]] + [auto-ap.datomic :refer [conn random-tempid]] + [auto-ap.storage.parquet :as parquet] [datomic.api :as dc] [clj-http.client :as client] [venia.core :as v] @@ -20,42 +21,41 @@ :body (json/write-str {"query" (v/graphql-query q)}) :as :json}) :body - :data - )) + :data)) (defn get-caterers [integration] (:caterers (query integration {:venia/queries [{:query/data - [:caterers [:name :uuid [:address [:name :street]]]]}]} ))) + [:caterers [:name :uuid [:address [:name :street]]]]}]}))) (defn get-subscriptions [integration] (->> (query integration {:venia/queries [{:query/data - [:subscribers [:id [:subscriptions [:parentId :parentEntity :eventEntity :eventKey]] ]]}]} ) + [:subscribers [:id [:subscriptions [:parentId :parentEntity :eventEntity :eventKey]]]]}]}) :subscribers first :subscriptions)) (defn get-integrations [] (map first (dc/q '[:find (pull ?i [:ezcater-integration/api-key - :ezcater-integration/subscriber-uuid - :db/id - :ezcater-integration/integration-status [:db/id]]) - :in $ - :where [?i :ezcater-integration/api-key]] - (dc/db conn)))) + :ezcater-integration/subscriber-uuid + :db/id + {:ezcater-integration/integration-status [:db/id]}]) + :in $ + :where [?i :ezcater-integration/api-key]] + (dc/db conn)))) (defn mark-integration-status [integration integration-status] @(dc/transact conn - [{:db/id (:db/id integration) - :ezcater-integration/integration-status (assoc integration-status - :db/id (or (-> integration :ezcater-integration/integration-status :db/id) - (random-tempid)))}])) + [{:db/id (:db/id integration) + :ezcater-integration/integration-status (assoc 
integration-status + :db/id (or (-> integration :ezcater-integration/integration-status :db/id) + (random-tempid)))}])) (defn upsert-caterers ([integration] @(dc/transact conn (for [caterer (get-caterers integration)] - {:db/id (:db/id integration) + {:db/id (:db/id integration) :ezcater-integration/caterers [{:ezcater-caterer/name (str (:name caterer) " (" (:street (:address caterer)) ")") :ezcater-caterer/search-terms (str (:name caterer) " " (:street (:address caterer))) :ezcater-caterer/uuid (:uuid caterer)}]})))) @@ -64,14 +64,14 @@ ([integration] (let [extant (get-subscriptions integration) to-ensure (set (map first (dc/q '[:find ?cu - :in $ - :where [_ :client/ezcater-locations ?el] - [?el :ezcater-location/caterer ?c] - [?c :ezcater-caterer/uuid ?cu]] - (dc/db conn)))) + :in $ + :where [_ :client/ezcater-locations ?el] + [?el :ezcater-location/caterer ?c] + [?c :ezcater-caterer/uuid ?cu]] + (dc/db conn)))) to-create (set/difference - to-ensure - (set (map :parentId extant)))] + to-ensure + (set (map :parentId extant)))] (doseq [parentId to-create] (query integration {:venia/operation {:operation/type :mutation @@ -94,7 +94,6 @@ :eventKey 'cancelled}} [[:subscription [:parentId :parentEntity :eventEntity :eventKey]]]]]}))))) - #_{:clj-kondo/ignore [:clojure-lsp/unused-public-var]} (defn upsert-ezcater ([] (upsert-ezcater (get-integrations))) @@ -115,12 +114,11 @@ (defn get-caterer [caterer-uuid] (dc/pull (dc/db conn) - '[:ezcater-caterer/name - {:ezcater-integration/_caterers [:ezcater-integration/api-key]} - {:ezcater-location/_caterer [:ezcater-location/location - {:client/_ezcater-locations [:client/code]}]}] - [:ezcater-caterer/uuid caterer-uuid])) - + '[:ezcater-caterer/name + {:ezcater-integration/_caterers [:ezcater-integration/api-key]} + {:ezcater-location/_caterer [:ezcater-location/location + {:client/_ezcater-locations [:client/code]}]}] + [:ezcater-caterer/uuid caterer-uuid])) (defn round-carry-cents [f] (with-precision 2 (double (.setScale 
(bigdec f) 2 java.math.RoundingMode/HALF_UP)))) @@ -135,126 +133,159 @@ 0.15M :else 0.07M)] - (round-carry-cents - (* commision% - 0.01M - (+ - (-> order :totals :subTotal :subunits ) - (reduce + - 0 - (map (comp :subunits :cost) (:feesAndDiscounts (:catererCart order))))))))) - -(defn ccp-fee [order] - (round-carry-cents - (* 0.000299M - (+ - (-> order :totals :subTotal :subunits ) - (-> order :totals :salesTax :subunits ) + (round-carry-cents + (* commision% + 0.01M + (+ + (-> order :totals :subTotal :subunits) (reduce + 0 - (map (comp :subunits :cost) (:feesAndDiscounts (:catererCart order)))))))) + (map (comp :subunits :cost) (:feesAndDiscounts (:catererCart order))))))))) + +(defn ccp-fee [order] + (round-carry-cents + (* 0.000299M + (+ + (-> order :totals :subTotal :subunits) + (-> order :totals :salesTax :subunits) + (reduce + + 0 + (map (comp :subunits :cost) (:feesAndDiscounts (:catererCart order)))))))) (defn order->sales-order [{{:keys [timestamp]} :event {:keys [orderItems]} :catererCart :keys [client-code client-location uuid] :as order}] (let [adjustment (round-carry-cents (- (+ (-> order :totals :subTotal :subunits (* 0.01)) (-> order :totals :salesTax :subunits (* 0.01))) - (-> order :catererCart :totals :catererTotalDue ) + (-> order :catererCart :totals :catererTotalDue) (commision order) (ccp-fee order))) service-charge (+ (commision order) (ccp-fee order)) tax (-> order :totals :salesTax :subunits (* 0.01)) tip (-> order :totals :tip :subunits (* 0.01))] #:sales-order - {:date (atime/localize (coerce/to-date-time timestamp)) - :external-id (str "ezcater/order/" client-code "-" client-location "-" uuid) - :client [:client/code client-code] - :location client-location - :reference-link (str (url/url "https://ezmanage.ezcater.com/orders/" uuid )) - :line-items [#:order-line-item - {:external-id (str "ezcater/order/" client-code "-" client-location "-" uuid "-" 0) - :item-name "EZCater Catering" - :category "EZCater Catering" - :discount adjustment 
- :tax tax - :total (+ (-> order :totals :subTotal :subunits (* 0.01)) - tax - tip)}] - :charges [#:charge - {:type-name "CARD" - :date (atime/localize (coerce/to-date-time timestamp)) - :client [:client/code client-code] - :location client-location - :external-id (str "ezcater/charge/" uuid) - :processor :ccp-processor/ezcater - :total (+ (-> order :totals :subTotal :subunits (* 0.01)) - tax - tip) - :tip tip}] + {:date (atime/localize (coerce/to-date-time timestamp)) + :external-id (str "ezcater/order/" client-code "-" client-location "-" uuid) + :client [:client/code client-code] + :location client-location + :reference-link (str (url/url "https://ezmanage.ezcater.com/orders/" uuid)) + :line-items [#:order-line-item + {:external-id (str "ezcater/order/" client-code "-" client-location "-" uuid "-" 0) + :item-name "EZCater Catering" + :category "EZCater Catering" + :discount adjustment + :tax tax + :total (+ (-> order :totals :subTotal :subunits (* 0.01)) + tax + tip)}] + :charges [#:charge + {:type-name "CARD" + :date (atime/localize (coerce/to-date-time timestamp)) + :client [:client/code client-code] + :location client-location + :external-id (str "ezcater/charge/" uuid) + :processor :ccp-processor/ezcater + :total (+ (-> order :totals :subTotal :subunits (* 0.01)) + tax + tip) + :tip tip}] - :total (+ (-> order :totals :subTotal :subunits (* 0.01)) - tax - tip) - :discount adjustment - :service-charge service-charge - :tax tax - :tip tip - :returns 0.0 - :vendor :vendor/ccp-ezcater})) + :total (+ (-> order :totals :subTotal :subunits (* 0.01)) + tax + tip) + :discount adjustment + :service-charge service-charge + :tax tax + :tip tip + :returns 0.0 + :vendor :vendor/ccp-ezcater})) +(defn- flatten-order-to-parquet! [order] + "Flatten a sales-order into entity-type tagged maps and buffer to parquet." 
+ (let [so-ext-id (:sales-order/external-id order) + so-date (some-> (:sales-order/date order) .toString) + client (:sales-order/client order) + client-code (if (map? client) (:client/code client) client)] + (parquet/buffer! "sales-order" + {:entity-type "sales-order" + :external-id so-ext-id + :client-code client-code + :location (:sales-order/location order) + :vendor (:sales-order/vendor order) + :total (:sales-order/total order) + :tax (:sales-order/tax order) + :tip (:sales-order/tip order) + :discount (:sales-order/discount order) + :service-charge (:sales-order/service-charge order) + :date so-date}) + (when-let [charges (:sales-order/charges order)] + (doseq [chg charges] + (parquet/buffer! "charge" + {:entity-type "charge" + :external-id (:charge/external-id chg) + :type-name (:charge/type-name chg) + :total (:charge/total chg) + :tax (:charge/tax chg) + :tip (:charge/tip chg) + :date so-date + :processor (some-> (:charge/processor chg) name) + :sales-order-external-id so-ext-id}))) + (when-let [items (:sales-order/line-items order)] + (doseq [li items] + (parquet/buffer! 
"line-item" + {:entity-type "line-item" + :item-name (:order-line-item/item-name li) + :category (:order-line-item/category li) + :total (:order-line-item/total li) + :tax (:order-line-item/tax li) + :discount (:order-line-item/discount li) + :sales-order-external-id so-ext-id}))))) (defn get-by-id [integration id] - (query - integration - {:venia/queries [[:order {:id id} - [:uuid - :orderNumber - :orderSourceType - [:caterer - [:name - :uuid - [:address [:street]]]] - [:event - [:timestamp - :catererHandoffFoodTime - :orderType]] - [:catererCart [[:orderItems - [:name - :quantity - :posItemId - [:totalInSubunits - [:currency - :subunits]]]] - [:totals - [:catererTotalDue]] - [:feesAndDiscounts - {:type 'DELIVERY_FEE} - [[:cost - [:currency - :subunits]]]]]] - [:totals [[:customerTotalDue - [ - :currency - :subunits - ]] - [:pointOfSaleIntegrationFee - [ - :currency - :subunits - ]] - [:tip - [:currency - :subunits]] - [:salesTax - [ - :currency - :subunits - ]] - [:salesTaxRemittance - [:currency - :subunits - ]] - [:subTotal - [:currency - :subunits]]]]]]]})) + (query + integration + {:venia/queries [[:order {:id id} + [:uuid + :orderNumber + :orderSourceType + [:caterer + [:name + :uuid + [:address [:street]]]] + [:event + [:timestamp + :catererHandoffFoodTime + :orderType]] + [:catererCart [[:orderItems + [:name + :quantity + :posItemId + [:totalInSubunits + [:currency + :subunits]]]] + [:totals + [:catererTotalDue]] + [:feesAndDiscounts + {:type 'DELIVERY_FEE} + [[:cost + [:currency + :subunits]]]]]] + [:totals [[:customerTotalDue + [:currency + :subunits]] + [:pointOfSaleIntegrationFee + [:currency + :subunits]] + [:tip + [:currency + :subunits]] + [:salesTax + [:currency + :subunits]] + [:salesTaxRemittance + [:currency + :subunits]] + [:subTotal + [:currency + :subunits]]]]]]]})) (defn lookup-order [json] (let [caterer (get-caterer (get json "parent_id")) @@ -262,26 +293,31 @@ client (-> caterer :ezcater-location/_caterer first :client/_ezcater-locations 
:client/code) location (-> caterer :ezcater-location/_caterer first :ezcater-location/location)] (if (and client location) - (doto - (-> (get-by-id integration (get json "entity_id")) - (:order) - (assoc :client-code client - :client-location location)) + (doto + (-> (get-by-id integration (get json "entity_id")) + (:order) + (assoc :client-code client + :client-location location)) (#(alog/info ::order-details :detail %))) (alog/warn ::caterer-no-longer-has-location :json json)))) (defn import-order [json] - ;; {"id" "bf3dcf5c-a68f-42d9-9084-049133e03d3d", "parent_type" "Caterer", "parent_id" "91541331-d7ae-4634-9e8b-ccbbcfb2ce70", "entity_type" "Order", "entity_id" "9ab05fee-a9c5-483b-a7f2-14debde4b7a8", "key" "accepted", "occurred_at" "2022-07-21T19:21:07.549Z"} (alog/info - ::try-import-order - :json json) - @(dc/transact conn (filter identity - [(some-> json - (lookup-order) - (order->sales-order) - (update :sales-order/date coerce/to-date) - (update-in [:sales-order/charges 0 :charge/date] coerce/to-date))]))) - + ::try-import-order + :json json) + (when-let [order (some-> json + (lookup-order) + (order->sales-order) + (update :sales-order/date coerce/to-date) + (update-in [:sales-order/charges 0 :charge/date] coerce/to-date))] + (try + (flatten-order-to-parquet! 
order) + (alog/info ::order-buffered + :external-id (:sales-order/external-id order)) + (catch Exception e + (alog/error ::buffer-failed + :exception e + :order (:sales-order/external-id order)))))) (defn upsert-recent [] (upsert-ezcater) (let [last-sunday (coerce/to-date (time/plus (second (->> (time/today) @@ -312,30 +348,30 @@ "key" "accepted", "occurred_at" "2022-07-21T19:21:07.549Z"} ezcater-order (lookup-order lookup-map) - extant-order (dc/pull (dc/db conn) '[:sales-order/total - :sales-order/tax - :sales-order/tip - :sales-order/discount - :sales-order/external-id - {:sales-order/charges [:charge/tax - :charge/tip - :charge/total - :charge/external-id] - :sales-order/line-items [:order-line-item/external-id - :order-line-item/total - :order-line-item/tax - :order-line-item/discount]}] - [:sales-order/external-id order]) + extant-order (dc/pull (dc/db conn) '[:sales-order/total + :sales-order/tax + :sales-order/tip + :sales-order/discount + :sales-order/external-id + {:sales-order/charges [:charge/tax + :charge/tip + :charge/total + :charge/external-id] + :sales-order/line-items [:order-line-item/external-id + :order-line-item/total + :order-line-item/tax + :order-line-item/discount]}] + [:sales-order/external-id order]) updated-order (-> (order->sales-order ezcater-order) (select-keys - #{:sales-order/total - :sales-order/tax - :sales-order/tip - :sales-order/discount - :sales-order/charges - :sales-order/external-id - :sales-order/line-items}) + #{:sales-order/total + :sales-order/tax + :sales-order/tip + :sales-order/discount + :sales-order/charges + :sales-order/external-id + :sales-order/line-items}) (update :sales-order/line-items (fn [c] (map #(select-keys % #{:order-line-item/external-id diff --git a/src/clj/auto_ap/jobs/sales_summaries.clj b/src/clj/auto_ap/jobs/sales_summaries.clj index 9ee556be..c66e6fe9 100644 --- a/src/clj/auto_ap/jobs/sales_summaries.clj +++ b/src/clj/auto_ap/jobs/sales_summaries.clj @@ -3,6 +3,7 @@ [auto-ap.jobs.core :refer
[execute]] [auto-ap.logging :as alog] [auto-ap.time :as atime] + [auto-ap.storage.parquet :as pq] [clj-time.coerce :as c] [clj-time.core :as time] [clj-time.periodic :as per] @@ -39,17 +40,14 @@ (dc/db conn) number))) - (defn delete-all [] @(dc/transact-async conn - (->> - (dc/q '[:find ?ss - :where [?ss :sales-summary/date]] - (dc/db conn)) - (map (fn [[ ss]] - [:db/retractEntity ss]))))) - - + (->> + (dc/q '[:find ?ss + :where [?ss :sales-summary/date]] + (dc/db conn)) + (map (fn [[ss]] + [:db/retractEntity ss]))))) (defn dirty-sales-summaries [c] (let [client-id (dc/entid (dc/db conn) c)] @@ -98,101 +96,86 @@ "card refunds" 41400 "food app refunds" 41400}) -(defn get-payment-items [c date] - (->> - (dc/q '[:find ?processor ?type-name (sum ?total) - :with ?c - :in $ [?clients ?start-date ?end-date] - :where [(iol-ion.query/scan-sales-orders $ ?clients ?start-date ?end-date) [[?e _ ?sort-default] ...]] - [?e :sales-order/charges ?c] - [?c :charge/type-name ?type-name] - (or-join [?c ?processor] - (and [?c :charge/processor ?p] - [?p :db/ident ?processor]) - (and - (not [?c :charge/processor]) - [(ground :ccp-processor/na) ?processor])) - [?c :charge/total ?total]] - (dc/db conn) - [[c] date date]) - (reduce - (fn [acc [processor type-name total]] - (update - acc - (cond (= type-name "CARD") - "Card Payments" - (= type-name "CASH") - "Cash Payments" - (#{"SQUARE_GIFT_CARD" "WALLET" "GIFT_CARD"} type-name) - "Gift Card Payments" - (#{:ccp-processor/toast - #_:ccp-processor/ezcater - #_:ccp-processor/koala - :ccp-processor/doordash - :ccp-processor/grubhub - :ccp-processor/uber-eats} processor) - "Food App Payments" - :else - "Unknown") - (fnil + 0.0) - total)) - {}) - (map (fn [[k v]] - {:db/id (str (java.util.UUID/randomUUID)) - :sales-summary-item/sort-order 0 - :sales-summary-item/category k - - :ledger-mapped/amount (if (= "Card Payments" k) - (- v (get-fee c date)) - v) - :ledger-mapped/ledger-side :ledger-side/debit})))) +(defn- get-payment-items-parquet [c 
date] + (let [date-str (.toString date)] + (when-let [rows (seq (pq/query-deduped "charge" date-str date-str))] + (let [client-code (if (map? c) (:client/code c) c) + filtered (filter #(= client-code (:client-code %)) rows)] + (reduce + (fn [acc {:keys [processor type-name total]}] + (update acc + (cond + (= type-name "CARD") "Card Payments" + (= type-name "CASH") "Cash Payments" + (#{"SQUARE_GIFT_CARD" "WALLET" "GIFT_CARD"} type-name) "Gift Card Payments" + (#{"doordash" "grubhub" "uber-eats"} processor) "Food App Payments" + :else "Unknown") + (fnil + 0.0) + (or total 0.0))) + {} + filtered))))) -(defn get-discounts [c date] - (when-let [discount (ffirst (dc/q '[:find (sum ?discount) - :with ?e - :in $ [?clients ?start-date ?end-date] - :where [(iol-ion.query/scan-sales-orders $ ?clients ?start-date ?end-date) [[?e _ ?sort-default] ...]] - [?e :sales-order/discount ?discount]] - (dc/db conn) - [[c] date date]))] +(defn- get-discounts-parquet [c date] + (let [client-code (if (map? c) (:client/code c) c) + date-str (.toString date) + discount (auto-ap.storage.sales-summaries/sum-discounts client-code date-str date-str)] + (when (and discount (pos? discount)) + {:db/id (str (java.util.UUID/randomUUID)) + :sales-summary-item/sort-order 1 + :sales-summary-item/category "Discounts" + :ledger-mapped/amount discount + :ledger-mapped/ledger-side :ledger-side/debit}))) + +(defn- get-refund-items-parquet [c date] + (let [client-code (if (map? 
c) (:client/code c) c) + date-str (.toString date) + refunds (auto-ap.storage.sales-summaries/sum-refunds-by-type client-code date-str date-str)] + (when (seq refunds) + (map (fn [[type-name total]] + {:db/id (str (java.util.UUID/randomUUID)) + :sales-summary-item/sort-order 3 + :sales-summary-item/category (cond + (= type-name "CARD") "Card Refunds" + (= type-name "CASH") "Cash Refunds" + :else "Food App Refunds") + :ledger-mapped/amount total + :ledger-mapped/ledger-side :ledger-side/credit}) + refunds)))) + +(defn- get-tax-parquet [c date] + (let [client-code (if (map? c) (:client/code c) c) + date-str (.toString date) + tax (auto-ap.storage.sales-summaries/sum-taxes client-code date-str date-str)] {:db/id (str (java.util.UUID/randomUUID)) + :sales-summary-item/category "Tax" :sales-summary-item/sort-order 1 - :sales-summary-item/category "Discounts" - :ledger-mapped/amount discount - :ledger-mapped/ledger-side :ledger-side/debit})) - -(defn get-refund-items [c date] - (->> - (dc/q '[:find ?type-name (sum ?t) - :with ?e - :in $ [?clients ?start-date ?end-date] - :where - :where [(iol-ion.query/scan-sales-refunds $ ?clients ?start-date ?end-date) [[?e _ ?sort-default] ...]] - [?e :sales-refund/type ?type-name] - [?e :sales-refund/total ?t]] - (dc/db conn) - [[c] date date]) - (reduce - (fn [acc [type-name total]] - (update - acc - (cond (= type-name "CARD") - "Card Refunds" - (= type-name "CASH") - "Cash Refunds" - :else - "Food App Refunds") - (fnil + 0.0) - total)) - {}) - (map (fn [[k v]] - {:db/id (str (java.util.UUID/randomUUID)) - :sales-summary-item/sort-order 3 - :sales-summary-item/category k - :ledger-mapped/amount v - :ledger-mapped/ledger-side :ledger-side/credit})))) + :ledger-mapped/ledger-side :ledger-side/credit + :ledger-mapped/amount (or tax 0.0)})) +(defn- get-tip-parquet [c date] + (let [client-code (if (map? 
c) (:client/code c) c) + date-str (.toString date) + tip (auto-ap.storage.sales-summaries/sum-tips client-code date-str date-str)] + {:ledger-mapped/ledger-side :ledger-side/credit + :sales-summary-item/sort-order 2 + :db/id (str (java.util.UUID/randomUUID)) + :sales-summary-item/category "Tip" + :ledger-mapped/amount (or tip 0.0)})) +(defn- get-sales-parquet [c date] + (let [client-code (if (map? c) (:client/code c) c) + date-str (.toString date) + sales (auto-ap.storage.sales-summaries/sum-sales-by-category client-code date-str date-str)] + (for [{:keys [category total tax discount]} sales] + {:db/id (str (java.util.UUID/randomUUID)) + :sales-summary-item/category (or category "Unknown") + :sales-summary-item/sort-order 0 + :sales-summary-item/total total + :sales-summary-item/net (- (+ total discount) tax) + :sales-summary-item/tax tax + :sales-summary-item/discount discount + :ledger-mapped/ledger-side :ledger-side/credit + :ledger-mapped/amount (- (+ total discount) tax)}))) (defn get-fees [c date] (when-let [fee (get-fee c date)] @@ -293,19 +276,17 @@ :sales-summary/items (->> - (get-sales c date) - (concat (get-payment-items c date)) - (concat (get-refund-items c date)) - (cons (get-discounts c date)) + (get-sales-parquet c date) + (concat (get-payment-items-parquet c date)) + (concat (get-refund-items-parquet c date)) + (cons (get-discounts-parquet c date)) (cons (get-fees c date)) - (cons (get-tax c date)) - (cons (get-tip c date)) - (cons (get-returns c date)) + (cons (get-tax-parquet c date)) + (cons (get-tip-parquet c date)) (filter identity) (map (fn [z] (assoc z :ledger-mapped/account (some-> z :sales-summary-item/category str/lower-case name->number lookup-account) - :sales-summary-item/manual? false)) - )) }] + :sales-summary-item/manual? 
false))))}] (if (seq (:sales-summary/items result)) (do (alog/info ::upserting-summaries @@ -313,12 +294,11 @@ @(dc/transact conn [[:upsert-entity result]])) @(dc/transact conn [{:db/id id :sales-summary/dirty false}])))))) -(let [c (auto-ap.datomic/pull-attr (dc/db conn) :db/id [:client/code "NGCL" ]) - date #inst "2024-04-14T00:00:00-07:00"] - (get-payment-items c date) - - ) - +(comment + ;; TODO: Move to test file or proper location + (let [c (auto-ap.datomic/pull-attr (dc/db @conn) :db/id [:client/code "NGCL"]) + date #inst "2024-04-14T00:00:00-07:00"] + (get-payment-items c date))) (defn reset-summaries [] @(dc/transact conn (->> (dc/q '[:find ?sos @@ -328,16 +308,13 @@ (map (fn [[sos]] [:db/retractEntity sos]))))) - - - (comment (auto-ap.datomic/transact-schema conn) @(dc/transact conn [{:db/ident :sales-summary/total-unknown-processor-payments - :db/noHistory true, - :db/valueType :db.type/double - :db/cardinality :db.cardinality/one}]) + :db/noHistory true, + :db/valueType :db.type/double + :db/cardinality :db.cardinality/one}]) (apply mark-dirty [:client/code "NGCL"] (last-n-days 30)) @@ -356,7 +333,7 @@ [?sos :sales-summary/date ?d] [(= ?d #inst "2024-04-10T00:00:00-07:00")]] (dc/db conn)) - + (dc/q '[:find ?n ?p2 (sum ?total) :with ?c :in $ [?clients ?start-date ?end-date] @@ -369,23 +346,18 @@ (dc/db conn) [[(auto-ap.datomic/pull-attr (dc/db conn) :db/id [:client/code "NGHW"])] #inst "2024-04-11T00:00:00-07:00" #inst "2024-04-11T00:00:00-07:00"]) - (dc/q '[:find ?n + (dc/q '[:find ?n :in $ [?clients ?start-date ?end-date] :where [(iol-ion.query/scan-sales-orders $ ?clients ?start-date ?end-date) [[?e _ ?sort-default] ...]] [?e :sales-order/line-items ?li] - [?li :order-line-item/item-name ?n] ] + [?li :order-line-item/item-name ?n]] (dc/db conn) [[(auto-ap.datomic/pull-attr (dc/db conn) :db/id [:client/code "NGCL"])] #inst "2024-04-11T00:00:00-07:00" #inst "2024-04-24T00:00:00-07:00"]) - -@(dc/transact conn [{:db/id :sales-summary/total-tax :db/ident 
:sales-summary/total-tax-legacy} - {:db/id :sales-summary/total-tip :db/ident :sales-summary/total-tip-legacy}]) - -(auto-ap.datomic/transact-schema conn) - - ) + @(dc/transact conn [{:db/id :sales-summary/total-tax :db/ident :sales-summary/total-tax-legacy} + {:db/id :sales-summary/total-tip :db/ident :sales-summary/total-tip-legacy}]) + (auto-ap.datomic/transact-schema conn)) (defn -main [& _] (execute "sales-summaries" sales-summaries-v2)) - \ No newline at end of file diff --git a/src/clj/auto_ap/migration/cleanup_sales.clj b/src/clj/auto_ap/migration/cleanup_sales.clj new file mode 100644 index 00000000..87346eec --- /dev/null +++ b/src/clj/auto_ap/migration/cleanup_sales.clj @@ -0,0 +1,220 @@ +(ns auto-ap.migration.cleanup-sales + (:require [auto-ap.datomic :refer [conn]] + [auto-ap.storage.parquet :as pq] + [amazonica.aws.s3 :as s3] + [datomic.api :as d-api] + [clojure.string :as str])) + +(def ^:private BATCH-SIZE 1000) +(def ^:private DRY-RUN? true) + +(defn- set-dry-run! [v] + (alter-var-root #'DRY-RUN? (constantly v))) + +; -- query helpers + +(defn- query-sales-order-ids + "Return all entity IDs that have :sales-order/external-id." + [db] + (->> (d-api/q '[:find ?e + :where [?e :sales-order/external-id]] + db) + (map first))) + +(defn- collect-child-ids + "Gather child entity IDs for a batch of sales orders. Returns map with + keys :orders, :charges, :line-items, :refunds, each a vector of + entity IDs eligible for retraction." + [db order-ids] + (let [order-set (set order-ids) + charges (->> (d-api/q '[:find ?c + :in $ [?o ...] + :where [$ ?o :sales-order/charges ?c]] + db order-set) + (map first)) + refunds (->> (d-api/q '[:find ?r + :in $ [?o ...] + :where [$ ?o :sales-order/refunds ?r]] + db order-set) + (map first)) + line-items (->> (d-api/q '[:find ?li + :in $ [?c ...]
+ :where [$ ?c :charge/line-items ?li]] + db charges) + (map first))] + {:orders order-ids + :charges (vec charges) + :line-items (vec line-items) + :refunds (vec refunds)})) + +; -- transaction batching + +(defn- batch-transact + "Issue [:db/retractEntity ...] transactions in batches of BATCH-SIZE. + conn is a Datomic connection object. + entity-ids should be a seq of Long entity IDs." + [conn entity-ids] + (let [batches (partition-all BATCH-SIZE entity-ids) + _ (doseq [[idx batch] (map-indexed vector batches)] + (let [n (count batch) + txes (map (fn [eid] + [:db/retractEntity eid]) + batch)] + (println " batch" idx ":" n "retracts") + (when-not DRY-RUN? + @(d-api/transact conn txes))))] + :done)) + +(defn- retract-all-child-ids! + "Retract orders, charges, line-items and refunds from all entity-ID + maps produced by collect-child-ids. Logs progress every batch." + [conn child-entity-map] + (doseq [[type id-seq] child-entity-map] + (when (seq id-seq) + (println "retracting" type ":" (count id-seq) "ids") + (batch-transact conn id-seq)))) + +; -- month grouping + +(defn- group-orders-by-month + "Group sales order entity IDs by [year month] extracted from + :sales-order/day-value. Orders without a day-value are skipped. + Returns map {[y m] [eid ...]}." + [db order-ids] + (reduce (fn [acc eid] + (if-let [day-val (:sales-order/day-value + (d-api/entity db eid))] + (let [[y m _] (str/split (str day-val) #"-") + k [(Integer/parseInt y) + (Integer/parseInt m)]] + (update acc k (fnil conj []) eid)) acc)) + {} + order-ids)) + +; -- S3 verification (uses amazonica + parquet module) + +(def ENTITY-TYPES ["sales-order" "charge" + "line-item" "sales-refund"]) + +(defn- s3-keys-for-date + "Build S3 parquet keys for all entity types on a given date." + [date-str] + (mapv #(pq/parquet-key % date-str) ENTITY-TYPES)) + +(defn- days-in-month + "Return seq of YYYY-MM-DD strings for all days in [year month]."
+ [year month] + (let [start (java.time.LocalDate/of year month 1) + first-of-next (.plusMonths start 1) + diff (.toEpochDay first-of-next) + start-day (.toEpochDay start)] + (for [d (range start-day diff)] + (.toString (java.time.LocalDate/ofEpochDay d))))) + +(defn- object-exists? + "Check if an S3 object exists via head-object (no download)." + [key] + (try + (s3/get-object {:bucket-name pq/*bucket* + :key key} + {:request-method :head}) + true + (catch com.amazonaws.services.s3.model.AmazonS3Exception _ + false))) + +(defn- verify-month-in-s3? + "Check that every day in [year month] has at least one backing + Parquet file on S3 across all entity types. + Returns a map {:ok bool :missing vec-of-dates}." + [year month] + (let [dates (days-in-month year month)] + (loop [[d & rest] dates + result []] + (if-not d + {:ok (empty? result) + :missing result} + (let [keys (s3-keys-for-date d) + found? (some object-exists? keys)] + (recur rest + (if found? + result + (conj result d)))))))) + +; -- public API: delete-by-month + +(defn- delete-by-month [conn client-entid year month] + "Retract all sales entities for a specific year+month. + Returns :ok on success, :skipped if S3 verification failed." + (println "=== deleting" year "-" month + "dry-run? =" DRY-RUN?) + (let [db (d-api/db conn) + all-ids (query-sales-order-ids db) + group (group-orders-by-month db all-ids) + target-keys (get group [year month] [])] + (if (zero? (count target-keys)) + (do (println " no orders found for" year "-" month) + :skipped) + (do + (let [child-maps (collect-child-ids db target-keys) + total-ids (->> child-maps vals + (reduce into []) + distinct + count)] + (println " " total-ids "total entities to retract") + (when-not DRY-RUN? + (retract-all-child-ids! conn child-maps))) + :ok)))) + +; -- public API: cleanup-all + +(defn cleanup-all [] + "Remove ALL sales-order, charge, line-item, sales-refund from + Datomic. Uses d-api/transact to issue [:db/retractEntity ...] for + each entity. 
Iterates over every month found in DB." + (let [db (d-api/db conn) + all-ids (query-sales-order-ids db) + group (group-orders-by-month db all-ids) + months (sort (keys group))] + (println "found" (count months) "months of data") + (doseq [[y m] months] + (delete-by-month conn nil y m)) + (println "cleanup-all complete"))) + +; -- public API: safe-cleanup-all + +(defn- collect-all-months [conn] + "Return sorted vec of [year month] pairs with sales orders in DB." + (let [db (d-api/db conn) + all-ids (query-sales-order-ids db) + grouped (group-orders-by-month db all-ids)] + (sort (keys grouped)))) + +(defn safe-cleanup-all [] + "Same as cleanup-all but verifies S3 data exists first. + Before deleting a month's entities, checks that parquet files + exist in auto-ap.storage.parquet bucket under prefix 'sales-details'." + (let [conn$ conn + months (collect-all-months conn)] + (println "=== safe-cleanup-all" + "months:" (count months) + "dry-run? =" DRY-RUN?) + (doseq [[y m] months] + (when-not DRY-RUN? + (let [result (verify-month-in-s3? y m) + missing (:missing result)] + (cond + (:ok result) + (do (println "verified" y "-" m "S3 OK, deleting...") + (delete-by-month conn$ nil y m)) + + (> (count missing) 0) + (do (println "ERROR" y "-" m "missing in S3:" + (str/join ", " missing)) + (throw + (ex-info + "Missing S3 data — aborting!" + {:year y :month m + :missing missing}))) + + :else + (println "SKIPPING" y "-" m "no parquet files"))))) + (println "safe-cleanup-all complete"))) diff --git a/src/clj/auto_ap/migration/sales_to_parquet.clj b/src/clj/auto_ap/migration/sales_to_parquet.clj new file mode 100644 index 00000000..00086faf --- /dev/null +++ b/src/clj/auto_ap/migration/sales_to_parquet.clj @@ -0,0 +1,230 @@ +(ns auto-ap.migration.sales-to-parquet + "Migrate historical sales data from Datomic to Parquet + S3. + + Groups records by business date and writes daily partitions. + Dead-letter records (missing dates) are written separately. 
+ + Usage: + (migrate-all) ; full migration earliest → latest + (write-day-by-day \"2024-01-01\" \"2024-03-31\") ; date range + (write-dead-letter [flat]) ; write orphaned records" + (:require [auto-ap.datomic :refer [conn]] + [auto-ap.storage.parquet :as p] + [clojure.string :as str] + [datomic.api :as dc])) + +(defn- fetch-all-sales-order-ids [] + "Query Datomic for all entities that have a :sales-order/external-id. + Returns a vector of entity ids." + (->> (dc/q '[:find ?e + :where [?e :sales-order/external-id _]] + (dc/db conn)) + (map first) + vec)) + +(def ^:private sales-order-read + '[:sales-order/external-id + :sales-order/date + {:sales-order/client [:client/code :client/name]} + :sales-order/location + {:sales-order/vendor [:vendor/name]} + :sales-order/total + :sales-order/tax + :sales-order/tip + :sales-order/discount + :sales-order/service-charge + :sales-order/source + :sales-order/reference-link + {:sales-order/charges + [:charge/external-id + :charge/type-name + :charge/total + :charge/tax + :charge/tip + :charge/date + {:charge/processor [:db/ident]} + :charge/returns + {:charge/client [:client/code]}]} + {:sales-order/line-items + [:order-line-item/item-name + :order-line-item/category + :order-line-item/total + :order-line-item/tax + :order-line-item/discount + :order-line-item/unit-price + :order-line-item/quantity + :order-line-item/note]}]) + +(defn- pull-sales-order-data [eids] + "Batch pull full sales-order entities plus nested children." + (if (empty? eids) + [] + (dc/pull-many (dc/db conn) + sales-order-read + eids))) + +(defn- flatten-order-to-pieces! [order date-str flat] + "Flatten a pulled sales-order into :entity-type tagged maps. + Appends each map in place to flat, a volatile holding a vector."
+ (let [so-ext-id (:sales-order/external-id order) + so-date date-str + client-code (get-in order [:sales-order/client :client/code]) + vendor-name (get-in order [:sales-order/vendor :vendor/name]) + charges (:sales-order/charges order) + items (:sales-order/line-items order) + payment-methods (->> charges (map :charge/type-name) distinct (str/join ",")) + processors (->> charges (map #(get-in % [:charge/processor :db/ident])) (remove nil?) distinct (map name) (str/join ",")) + categories (->> items (map :order-line-item/category) (remove nil?) distinct (str/join ","))] + (vswap! flat conj + {:entity-type "sales-order" + :external-id (str so-ext-id) + :client-code client-code + :location (:sales-order/location order) + :vendor vendor-name + :total (:sales-order/total order) + :tax (:sales-order/tax order) + :tip (:sales-order/tip order) + :discount (:sales-order/discount order) + :service-charge (:sales-order/service-charge order) + :date so-date + :source (:sales-order/source order) + :reference-link (:sales-order/reference-link order) + :payment-methods payment-methods + :processors processors + :categories categories}) + (when-let [charges (:sales-order/charges order)] + (doseq [chg charges] + (vswap! flat conj + {:entity-type "charge" + :external-id (str (get chg :charge/external-id)) + :type-name (get chg :charge/type-name) + :total (get chg :charge/total) + :tax (get chg :charge/tax) + :tip (get chg :charge/tip) + :date so-date + :processor (get-in chg [:charge/processor :db/ident]) + :sales-order-external-id (str so-ext-id)}) + (when-let [returns (:charge/returns chg)] + (doseq [rt returns] + (vswap! flat conj + {:entity-type "sales-refund" + :type-name (get rt :type-name) + :total (get rt :total) + :sales-order-external-id (str so-ext-id)}))))) + (when-let [items (:sales-order/line-items order)] + (doseq [li items] + (vswap! 
flat conj + {:entity-type "line-item" + :item-name (get li :order-line-item/item-name) + :category (get li :order-line-item/category) + :total (get li :order-line-item/total) + :tax (get li :order-line-item/tax) + :discount (get li :order-line-item/discount) + :sales-order-external-id (str so-ext-id)}))))) + +(defn -fetch-order-ids-for-date + "Query Datomic for all sales-order eids on a given business date." + [db date-str] + (let [ld (java.time.LocalDate/parse date-str) + start (-> ld (.atStartOfDay (java.time.ZoneId/of "America/Los_Angeles")) .toInstant java.util.Date/from) + end (-> ld (.plusDays 1) (.atStartOfDay (java.time.ZoneId/of "America/Los_Angeles")) .toInstant java.util.Date/from)] + (->> (dc/q '[:find ?e + :in $ ?start ?end + :where [?e :sales-order/date ?d] + [(>= ?d ?start)] + [(< ?d ?end)]] + db start end) + (map first) + vec))) + +(defn write-day-by-day + ([start-date end-date] + (write-day-by-day start-date end-date {})) + ([start-date end-date opts] + (let [all-dates (set (or (opts :date-set) [])) + date-range (if (empty? all-dates) + (p/date-seq start-date end-date) + (filter all-dates + (p/date-seq start-date end-date))) + batch-size (or (opts :batch-size) 100)] + (doseq [^String day date-range] + (println "[migration] processing" day) + (let [eids (-fetch-order-ids-for-date (dc/db conn) day) + batches (partition-all batch-size eids)] + (doseq [batch batches] + (let [orders (pull-sales-order-data batch) + flat (volatile! [])] + (doseq [o orders] + (flatten-order-to-pieces! o day flat)) + (doseq [r @flat] + (p/buffer! (:entity-type r) r))))) + (doseq [etype ["sales-order" "charge" + "line-item" "sales-refund"]] + (p/flush-to-parquet! etype day)) + (println "[migration]" day "complete")) + {:status :completed :total-days (count date-range)}))) + +(defn- write-dead-letter + ([flat] + (write-dead-letter "dead" flat)) + ([prefix flat] + "Write records with missing dates to a parquet file." + (let [dead (filter #(nil? 
(:date %)) flat)] + (when (seq dead) + (doseq [r dead] + (p/buffer! + (str prefix "-" (:entity-type r)) + r)))))) + +(defn- flush-all-types [] + "Flush all entity-type buffers, tracking counts." + (let [etypes ["sales-order" "charge" + "line-item" "sales-refund"] + today (.toString (java.time.LocalDate/now)) + start (p/total-buf-count)] + (doseq [et etypes] + (try + (p/flush-to-parquet! et today) + (catch Exception e + (println "[migration/flush]" et "error:" (.getMessage e))))) + {:records-flush (- (p/total-buf-count) start)})) + +(defn- get-date-range [] + "Get the earliest and latest business dates from Datomic." + (let [dates (->> (dc/q '[:find ?d + :where [_ :sales-order/date ?d]] + (dc/db conn)) + (map first) + distinct + sort)] + [(when (seq dates) (.toString (first dates))) + (when (seq dates) (.toString (last dates)))])) + +(defn migrate-all [] + "Full migration from earliest to latest date: load unflushed, + fetch / buffer / flush day by day. Write dead-records for + sales orders with missing dates." + (println "[migration] starting full migration...") + (p/load-unflushed!) + (let [order-ids (fetch-all-sales-order-ids) + start-date (first (get-date-range)) + end-date (second (get-date-range))] + (if-not (seq order-ids) + (do + (println "[migration] no orders found") + :no-orders) + (try + ;; pull & buffer any orders missing a business date + (doseq [o (pull-sales-order-data order-ids) + :when (not (:sales-order/date o))] + (let [flat (volatile! [])] + (flatten-order-to-pieces! o "unknown" flat) + (doseq [r @flat] + (p/buffer! 
"dead" r)))) + (write-day-by-day start-date end-date {:batch-size 100}) + (flush-all-types) + (println "[migration] done") + :ok + (catch Exception e + (println "[migration/error]" (.getMessage e)) + e))))) diff --git a/src/clj/auto_ap/routes/ezcater_xls.clj b/src/clj/auto_ap/routes/ezcater_xls.clj index d9af6e75..c5a76022 100644 --- a/src/clj/auto_ap/routes/ezcater_xls.clj +++ b/src/clj/auto_ap/routes/ezcater_xls.clj @@ -1,6 +1,6 @@ (ns auto-ap.routes.ezcater-xls (:require - [auto-ap.datomic :refer [audit-transact conn]] + [auto-ap.datomic :refer [conn]] [auto-ap.logging :as alog] [clojure.data.json :as json] [auto-ap.parse.excel :as excel] @@ -12,6 +12,7 @@ [auto-ap.ssr.ui :refer [base-page]] [auto-ap.ssr.utils :refer [html-response]] [auto-ap.time :as atime] + [auto-ap.storage.parquet :as parquet] [bidi.bidi :as bidi] [clj-time.coerce :as coerce] [clojure.java.io :as io] @@ -54,54 +55,95 @@ event-date (some-> (excel/xls-date->date event-date) coerce/to-date-time atime/as-local-time - coerce/to-date )] - (cond (and event-date client-id location ) + coerce/to-date)] + (cond (and event-date client-id location) [:order #:sales-order - {:date event-date - :external-id (str "ezcater/order/" client-id "-" location "-" order-number) - :client client-id - :location location - :reference-link (str order-number) - :line-items [#:order-line-item - {:external-id (str "ezcater/order/" client-id "-" location "-" order-number "-" 0) - :item-name "EZCater Catering" - :category "EZCater Catering" - :discount (fmt-amount (or adjustments 0.0)) - :tax (fmt-amount tax) - :total (fmt-amount (+ food-total - tax))}] + {:date event-date + :external-id (str "ezcater/order/" client-id "-" location "-" order-number) + :client client-id + :location location + :reference-link (str order-number) + :line-items [#:order-line-item + {:external-id (str "ezcater/order/" client-id "-" location "-" order-number "-" 0) + :item-name "EZCater Catering" + :category "EZCater Catering" + :discount 
(fmt-amount (or adjustments 0.0)) + :tax (fmt-amount tax) + :total (fmt-amount (+ food-total + tax))}] - :charges [#:charge - {:type-name "CARD" - :date event-date - :client client-id - :location location - :external-id (str "ezcater/charge/" client-id "-" location "-" order-number "-" 0) - :processor :ccp-processor/ezcater - :total (fmt-amount (+ food-total - tax - tip)) - :tip (fmt-amount tip)}] - :total (fmt-amount (+ food-total - tax - (or adjustments 0.0))) - :discount (fmt-amount (or adjustments 0.0)) - :service-charge (fmt-amount (+ fee commission)) - :tax (fmt-amount tax) - :tip (fmt-amount tip) - :returns 0.0 - :vendor :vendor/ccp-ezcater}] + :charges [#:charge + {:type-name "CARD" + :date event-date + :client client-id + :location location + :external-id (str "ezcater/charge/" client-id "-" location "-" order-number "-" 0) + :processor :ccp-processor/ezcater + :total (fmt-amount (+ food-total + tax + tip)) + :tip (fmt-amount tip)}] + :total (fmt-amount (+ food-total + tax + (or adjustments 0.0))) + :discount (fmt-amount (or adjustments 0.0)) + :service-charge (fmt-amount (+ fee commission)) + :tax (fmt-amount tax) + :tip (fmt-amount tip) + :returns 0.0 + :vendor :vendor/ccp-ezcater}] - caterer-name - (do - (alog/warn ::missing-client - :order order-number - :store-name store-name - :caterer-name caterer-name) - [:missing caterer-name]) + caterer-name + (do + (alog/warn ::missing-client + :order order-number + :store-name store-name + :caterer-name caterer-name) + [:missing caterer-name]) - :else - nil))) + :else + nil))) + +(defn- flatten-order-to-parquet! [order] + "Flatten a sales-order into entity-type tagged maps and buffer to parquet." + (let [so-ext-id (:sales-order/external-id order) + so-date (some-> (:sales-order/date order) .toString) + client (:sales-order/client order) + client-code (if (map? client) (:client/code client) client)] + (parquet/buffer! 
"sales-order" + {:entity-type "sales-order" + :external-id so-ext-id + :client-code client-code + :location (:sales-order/location order) + :vendor (:sales-order/vendor order) + :total (:sales-order/total order) + :tax (:sales-order/tax order) + :tip (:sales-order/tip order) + :discount (:sales-order/discount order) + :service-charge (:sales-order/service-charge order) + :date so-date}) + (when-let [charges (:sales-order/charges order)] + (doseq [chg charges] + (parquet/buffer! "charge" + {:entity-type "charge" + :external-id (:charge/external-id chg) + :type-name (:charge/type-name chg) + :total (:charge/total chg) + :tax (:charge/tax chg) + :tip (:charge/tip chg) + :date so-date + :processor (some-> (:charge/processor chg) name) + :sales-order-external-id so-ext-id}))) + (when-let [items (:sales-order/line-items order)] + (doseq [li items] + (parquet/buffer! "line-item" + {:entity-type "line-item" + :item-name (:order-line-item/item-name li) + :category (:order-line-item/category li) + :total (:order-line-item/total li) + :tax (:order-line-item/tax li) + :discount (:order-line-item/discount li) + :sales-order-external-id so-ext-id}))))) (defn stream->sales-orders [s] (let [clients (map first (dc/q '[:find (pull ?c [:client/code @@ -116,7 +158,7 @@ object (str "/ezcater-xls/" (str (java.util.UUID/randomUUID)))] (mu/log ::writing-temp-xls :location object) - (s3/put-object {:bucket-name (:data-bucket env) + (s3/put-object {:bucket-name (:data-bucket env) :key object :input-stream s}) (into [] @@ -158,13 +200,13 @@ });")]])]) (defn upload-xls [{:keys [identity] :as request}] - + (let [file (or (get (:params request) :file) (get (:params request) "file"))] (mu/log ::uploading-file :file file) (with-open [s (io/input-stream (:tempfile file))] - (try + (try (let [parse-results (stream->sales-orders s) new-orders (->> parse-results (filter (comp #{:order} first)) @@ -172,9 +214,20 @@ missing-location (->> parse-results (filter (comp #{:missing} first)) - (map last))] - 
(audit-transact new-orders identity) - (html-response [:div (format "Successfully imported %d orders." (count new-orders)) + (map last)) + buffered-count (loop [orders new-orders + count 0] + (if-let [o (first orders)] + (do + (try + (flatten-order-to-parquet! o) + (catch Exception e + (alog/error ::buffer-failed + :exception e + :order (:sales-order/external-id o)))) + (recur (rest orders) (inc count))) + count))] + (html-response [:div (format "Successfully imported %d orders." buffered-count) (when (seq missing-location) [:div "Missing the following locations" [:ul.ul @@ -182,7 +235,7 @@ [:li ml])]])])) (catch Exception e (alog/error ::import-error - :error e) + :error e) (html-response [:div (.getMessage e)])))))) (defn page [{:keys [matched-route request-method] :as request}] diff --git a/src/clj/auto_ap/square/core3.clj b/src/clj/auto_ap/square/core3.clj index 2f192cd2..f2195398 100644 --- a/src/clj/auto_ap/square/core3.clj +++ b/src/clj/auto_ap/square/core3.clj @@ -3,6 +3,7 @@ [auto-ap.datomic :refer [conn remove-nils]] [auto-ap.logging :as log :refer [capture-context->lc with-context-as]] [auto-ap.time :as atime] + [auto-ap.storage.parquet :as parquet] [cemerick.url :as url] [clj-http.client :as client] [clj-time.coerce :as coerce] @@ -27,11 +28,9 @@ "Authorization" (str "Bearer " (:client/square-auth-token client)) "Content-Type" "application/json"})) - (defn ->square-date [d] (f/unparse (f/formatter "YYYY-MM-dd'T'HH:mm:ssZZ") d)) - (def manifold-api-stream (let [stream (s/stream 100)] (->> stream @@ -42,10 +41,10 @@ (de/loop [attempt 0] (-> (de/chain (de/future-with (ex/execute-pool) #_(log/info ::request-started - :url (:url request) - :attempt attempt - :source "Square 3" - :background-job "Square 3") + :url (:url request) + :attempt attempt + :source "Square 3" + :background-job "Square 3") (try (client/request (assoc request :socket-timeout 10000 @@ -104,7 +103,6 @@ :exception error)) [])))) - (def item-cache (atom {})) (defn fetch-catalog [client i 
v] @@ -124,13 +122,11 @@ #(do (swap! item-cache assoc i %) %)))) - (defn fetch-catalog-cache [client i version] (if (get @item-cache i) (de/success-deferred (get @item-cache i)) (fetch-catalog client i version))) - (defn item->category-name-impl [client item version] (capture-context->lc (cond (:item_id (:item_variation_data item)) @@ -161,7 +157,6 @@ :item item) "Uncategorized")))) - (defn item-id->category-name [client i version] (capture-context->lc (-> [client i] @@ -226,7 +221,6 @@ (concat (:orders result) continued-results)))) (:orders result))))))) - (defn search ([client location start end] (capture-context->lc @@ -250,11 +244,9 @@ (concat (:orders result) continued-results)))) (:orders result)))))))) - (defn amount->money [amt] (* 0.01 (or (:amount amt) 0.0))) - ;; to get totals: (comment (reduce @@ -280,7 +272,7 @@ :reference-link (str (url/url "https://squareup.com/receipt/preview" (:id t))) :external-id (when (:id t) (str "square/charge/" (:id t))) - :processor (cond + :processor (cond (#{"OTHER" "THIRD_PARTY_CARD"} (:type t)) (condp = (some-> (:note t) str/lower-case) "doordash" :ccp-processor/doordash @@ -353,7 +345,7 @@ #:sales-order {:date (if (= "Invoices" (:name (:source order))) (when (:closed_at order) - (coerce/to-date (time/to-time-zone (coerce/to-date-time (:closed_at order)) (time/time-zone-for-id "America/Los_Angeles")))) + (coerce/to-date (time/to-time-zone (coerce/to-date-time (:closed_at order)) (time/time-zone-for-id "America/Los_Angeles")))) (coerce/to-date (time/to-time-zone (coerce/to-date-time (:created_at order)) (time/time-zone-for-id "America/Los_Angeles")))) :client (:db/id client) :location (:square-location/client-location location) @@ -415,7 +407,6 @@ :client client :location location))))))) - (defn get-payment [client p] (de/chain (manifold-api-call {:url (str "https://connect.squareup.com/v2/payments/" p) @@ -424,7 +415,6 @@ :body :payment)) - (defn continue-payout-entry-list [c l poi cursor] (capture-context->lc lc 
(de/chain @@ -602,6 +592,57 @@ (s/buffer 5) (s/realize-each) (s/reduce conj [])))))) +(defn- flatten-order-to-parquet! [order] + "Flatten a sales-order into entity-type tagged maps and buffer to parquet. + Returns the sales-order external-id for logging." + (let [so-ext-id (:sales-order/external-id order) + so-date (some-> (:sales-order/date order) .toString) + client (:sales-order/client order) + client-code (when client (if (map? client) + (:client/code client) + client))] + (parquet/buffer! "sales-order" + {:entity-type "sales-order" + :external-id so-ext-id + :client-code (or client-code (:db/id client)) + :location (:sales-order/location order) + :vendor (:sales-order/vendor order) + :total (:sales-order/total order) + :tax (:sales-order/tax order) + :tip (:sales-order/tip order) + :discount (:sales-order/discount order) + :service-charge (:sales-order/service-charge order) + :date so-date}) + (when-let [charges (:sales-order/charges order)] + (doseq [chg charges] + (parquet/buffer! "charge" + {:entity-type "charge" + :external-id (:charge/external-id chg) + :type-name (:charge/type-name chg) + :total (:charge/total chg) + :tax (:charge/tax chg) + :tip (:charge/tip chg) + :date so-date + :processor (some-> (:charge/processor chg) name) + :sales-order-external-id so-ext-id}) + (when-let [returns (:charge/returns chg)] + (doseq [rt returns] + (parquet/buffer! "sales-refund" + {:entity-type "sales-refund" + :type-name (:type-name rt) + :total (:total rt) + :sales-order-external-id so-ext-id}))))) + (when-let [items (:sales-order/line-items order)] + (doseq [li items] + (parquet/buffer! 
"line-item" + {:entity-type "line-item" + :item-name (:order-line-item/item-name li) + :category (:order-line-item/category li) + :total (:order-line-item/total li) + :tax (:order-line-item/tax li) + :discount (:order-line-item/discount li) + :sales-order-external-id so-ext-id}))))) + (defn upsert ([client] (apply de/zip @@ -616,8 +657,13 @@ (doseq [x (partition-all 100 results)] (log/info ::loading-orders :count (count x)) - @(dc/transact-async conn x)))))))) - + (doseq [order x] + (try + (flatten-order-to-parquet! order) + (catch Exception e + (log/error ::buffer-failed + :exception e + :order (:sales-order/external-id order)))))))))))) (defn upsert-payouts ([client] @@ -667,7 +713,6 @@ (log/info ::done-loading-refunds))))))) - (defn get-cash-shift [client id] (de/chain (manifold-api-call {:url (str (url/url "https://connect.squareup.com/v2/cash-drawers/shifts" id)) :method :get @@ -826,8 +871,6 @@ d1 d2)) - - (defn remove-voided-orders ([client] (apply de/zip @@ -854,7 +897,7 @@ (:sales-order/external-id o)))))) (s/map (fn [[o]] [[:db/retractEntity [:sales-order/external-id (:sales-order/external-id o)]]])) - + (s/reduce into []))) (fn [results] @@ -862,32 +905,28 @@ (doseq [x (partition-all 100 results)] (log/info ::removing-orders :count (count x)) - @(dc/transact-async conn x))))) + @(dc/transact-async conn x) (de/catch (fn [e] (log/warn ::couldnt-remove :error e) - nil) )))))) + nil))))))))))) -#_(comment - (require 'auto-ap.time-reader) +#_(comment + (require 'auto-ap.time-reader) - @(let [[c [l]] (get-square-client-and-location "DBFS") ] - (log/peek :x [ c l]) - (search c l #clj-time/date-time "2026-03-28" #clj-time/date-time "2026-03-29") + @(let [[c [l]] (get-square-client-and-location "DBFS")] + (log/peek :x [c l]) + (search c l #clj-time/date-time "2026-03-28" #clj-time/date-time "2026-03-29")) - ) + @(let [[c [l]] (get-square-client-and-location "NGAK")] + (log/peek :x [c l]) - @(let [[c [l]] (get-square-client-and-location "NGAK") ] - (log/peek :x [ 
c l]) - - (remove-voided-orders c l #clj-time/date-time "2024-04-11" #clj-time/date-time "2024-04-15")) - (doseq [c (get-square-clients)] - (try - @(remove-voided-orders c) - (catch Exception e - nil))) - - - ) + (remove-voided-orders c l #clj-time/date-time "2024-04-11" #clj-time/date-time "2024-04-15")) + (doseq [c (get-square-clients)] + (try + @(remove-voided-orders c) + (catch Exception e + nil))) + ) (defn upsert-all [& clients] (capture-context->lc @@ -956,8 +995,6 @@ [:clients clients] @(apply upsert-all clients))) - - (comment (defn refunds-raw-cont ([client l cursor so-far] @@ -987,9 +1024,8 @@ (->> @(let [[c [l]] (get-square-client-and-location "NGGG")] - (search c l (time/now) (time/plus (time/now) (time/days -1)))) - + (filter (fn [r] (str/starts-with? (:created_at r) "2024-03-14")))) @@ -997,7 +1033,6 @@ (->> @(let [[c [l]] (get-square-client-and-location "NGGG")] - (refunds-raw-cont c l nil [])) (filter (fn [r] (str/starts-with? (:created_at r) "2024-03-14"))))) @@ -1031,13 +1066,8 @@ []))] [(:client/code c) (atime/unparse-local (clj-time.coerce/to-date-time (:sales-order/date bad-row)) atime/normal-date) (:sales-order/total bad-row) (:sales-order/tax bad-row) (:sales-order/tip bad-row) (:db/id bad-row)]) :separator \tab) - - - - - ;; => - +;; => (require 'auto-ap.time-reader) @@ -1046,27 +1076,16 @@ (clojure.pprint/pprint (let [[c [l]] (get-square-client-and-location "NGVT")] l - (def z @(search c l #clj-time/date-time "2025-02-23T00:00:00-08:00" #clj-time/date-time "2025-02-28T00:00:00-08:00")) - (take 10 (map #(first (deref (order->sales-order c l %))) z))) + (take 10 (map #(first (deref (order->sales-order c l %))) z)))) - - ) - - - - - (->> z + (->> z (filter (fn [o] (seq (filter (comp #{"OTHER"} :type) (:tenders o))))) (filter #(not (:name (:source %)))) - (count) - - ) - - - + (count)) + (doseq [[code] (seq (dc/q '[:find ?code :in $ :where [?o :sales-order/date ?d] @@ -1075,32 +1094,22 @@ [?o :sales-order/client ?c] [?c :client/code ?code]] 
(dc/db conn))) - :let [[c [l]] (get-square-client-and-location code) - ] + :let [[c [l]] (get-square-client-and-location code)] order @(search c l #clj-time/date-time "2026-01-01T00:00:00-08:00" (time/now)) - :when (= "Invoices" (:name (:source order) )) + :when (= "Invoices" (:name (:source order))) :let [[sales-order] @(order->sales-order c l order)]] - + (when (should-import-order? order) (println "DATE IS" (:sales-order/date sales-order)) (when (some-> (:sales-order/date sales-order) coerce/to-date-time (time/after? #clj-time/date-time "2026-2-16T00:00:00-08:00")) (println "WOULD UPDATE" sales-order) - @(dc/transact auto-ap.datomic/conn [sales-order]) - ) - #_@(dc/transact ) - (println "DONE")) - - - ) + @(dc/transact auto-ap.datomic/conn [sales-order])) + #_@(dc/transact) + (println "DONE"))) #_(filter (comp #{"OTHER"} :type) (mapcat :tenders z)) - @(let [[c [l]] (get-square-client-and-location "NGRY")] #_(search c l (clj-time.coerce/from-date #inst "2025-02-28") (clj-time.coerce/from-date #inst "2025-03-01")) - (order->sales-order c l (:order (get-order c l "KdvwntmfMNTKBu8NOocbxatOs18YY" ))) - - ) - - ) + (order->sales-order c l (:order (get-order c l "KdvwntmfMNTKBu8NOocbxatOs18YY"))))) diff --git a/src/clj/auto_ap/ssr/payments.clj b/src/clj/auto_ap/ssr/payments.clj index a7ed513c..669a1a75 100644 --- a/src/clj/auto_ap/ssr/payments.clj +++ b/src/clj/auto_ap/ssr/payments.clj @@ -104,19 +104,18 @@ :size :small})]) (com/field {:label "Payment Type"} (com/radio-card {:size :small - :name "payment-type" - :value (:payment-type (:query-params request)) - :options [{:value "" - :content "All"} - {:value "cash" - :content "Cash"} - {:value "check" - :content "Check"} - {:value "debit" - :content "Debit"}]})) + :name "payment-type" + :value (:payment-type (:query-params request)) + :options [{:value "" + :content "All"} + {:value "cash" + :content "Cash"} + {:value "check" + :content "Check"} + {:value "debit" + :content "Debit"}]})) (exact-match-id* request)]]) - 
(def default-read '[* [:payment/date :xform clj-time.coerce/from-date] {:invoice-payment/_payment [* {:invoice-payment/invoice [*]}]} @@ -212,7 +211,6 @@ '[(iol-ion.query/dollars= ?transaction-amount ?amount)]]} :args [(:amount query-params)]}) - (:status route-params) (merge-query {:query {:in ['?status] :where ['[?e :payment/status ?status]]} @@ -243,30 +241,30 @@ refunds)) (defn sum-visible-pending [ids] - (->> - (dc/q {:find ['?id '?o] - :in ['$ '[?id ...]] - :where ['[?id :payment/amount ?o] - '[?id :payment/status :payment-status/pending]]} - (dc/db conn) - ids) + (->> + (dc/q {:find ['?id '?o] + :in ['$ '[?id ...]] + :where ['[?id :payment/amount ?o] + '[?id :payment/status :payment-status/pending]]} + (dc/db conn) + ids) (map last) (reduce + 0.0))) (defn sum-client-pending [clients] - (->> - (dc/q {:find '[?e ?a] - :in '[$ [?clients ?start ?end]] - :where '[[(iol-ion.query/scan-payments $ ?clients ?start ?end) [[?e _ ?sort-default] ...]] - [?e :payment/status :payment-status/pending] - [?e :payment/amount ?a]]} - (dc/db conn) - [clients - nil - nil]) - + (->> + (dc/q {:find '[?e ?a] + :in '[$ [?clients ?start ?end]] + :where '[[(iol-ion.query/scan-payments $ ?clients ?start ?end) [[?e _ ?sort-default] ...]] + [?e :payment/status :payment-status/pending] + [?e :payment/amount ?a]]} + (dc/db conn) + [clients + nil + nil]) + (map last) (reduce + @@ -277,16 +275,14 @@ {ids-to-retrieve :ids matching-count :count all-ids :all-ids} (fetch-ids db request)] - [(->> (hydrate-results ids-to-retrieve db request)) matching-count (sum-visible-pending all-ids) (sum-client-pending (extract-client-ids (:clients request) - (:client request) - (:client-id (:query-params request)) - (when (:client-code (:query-params request)) - [:client/code (:client-code (:query-params request))]))) - ])) + (:client request) + (:client-id (:query-params request)) + (when (:client-code (:query-params request)) + [:client/code (:client-code (:query-params request))])))])) (def query-schema 
(mc/schema [:maybe [:map {:date-range [:date-range :start-date :end-date]} @@ -327,7 +323,7 @@ (assoc-in (exact-match-id* request) [1 :hx-swap-oob] true)]) :query-schema query-schema :action-buttons (fn [request] - (let [[_ _ visible-in-float total-in-float ] (:page-results request)] + (let [[_ _ visible-in-float total-in-float] (:page-results request)] [(com/pill {:color :primary} " Visible in float " (format "$%,.2f" visible-in-float)) (com/pill {:color :secondary} " Total in float " @@ -354,7 +350,7 @@ (= (-> request :query-params :sort first :name) "Bank account") (-> entity :payment/bank-account :bank-account/name) - + :else nil)) :title (fn [r] (str @@ -409,7 +405,7 @@ :render (fn [{:payment/keys [date]}] (some-> date (atime/unparse-local atime/normal-date)))} {:key "amount" - :sort-key "amount" + :sort-key "amount" :name "Amount" :render (fn [{:payment/keys [amount]}] (some->> amount (format "$%.2f")))} @@ -421,10 +417,10 @@ (map :invoice-payment/invoice) (filter identity) (map (fn [invoice] - {:link (hu/url (bidi/path-for ssr-routes/only-routes - ::invoice-route/all-page) - {:exact-match-id (:db/id invoice)}) - :content (str "Inv. " (:invoice/invoice-number invoice))}))) + {:link (hu/url (bidi/path-for ssr-routes/only-routes + ::invoice-route/all-page) + {:exact-match-id (:db/id invoice)}) + :content (str "Inv. 
" (:invoice/invoice-number invoice))}))) (some-> p :transaction/_payment ((fn [t] [{:link (hu/url (bidi/path-for client-routes/routes :transactions) @@ -434,8 +430,6 @@ (def row* (partial helper/row* grid-page)) - - (comment (mc/decode query-schema {"exact-match-id" "123"} (mt/transformer main-transformer mt/strip-extra-keys-transformer)) (mc/decode query-schema {} (mt/transformer main-transformer mt/strip-extra-keys-transformer)) @@ -445,7 +439,6 @@ (mc/decode query-schema {"payment-type" "food"} (mt/transformer main-transformer mt/strip-extra-keys-transformer)) (mc/decode query-schema {"vendor" "87"} (mt/transformer main-transformer mt/strip-extra-keys-transformer)) - (mc/decode query-schema {"start-date" #inst "2023-12-21T08:00:00.000-00:00"} (mt/transformer main-transformer mt/strip-extra-keys-transformer))) (defn delete [{check :entity :as request identity :identity}] @@ -459,7 +452,7 @@ #(assert-can-see-client identity (:db/id (:payment/client check)))) (notify-if-locked (:db/id (:payment/client check)) (:payment/date check)) - (let [ removing-payments (mapcat (fn [x] + (let [removing-payments (mapcat (fn [x] (let [invoice (:invoice-payment/invoice x) new-balance (+ (:invoice/outstanding-balance invoice) (:invoice-payment/amount x))] @@ -475,9 +468,9 @@ :payment/status :payment-status/voided}] (audit-transact (cond-> removing-payments true (conj updated-payment) - (:transaction/_payment check) (conj [:db/retract (:db/id (first (:transaction/_payment check))) + (:transaction/_payment check) (conj [:db/retract (:db/id (first (:transaction/_payment check))) :transaction/payment - (:db/id check)])) + (:db/id check)])) identity) (html-response (row* (:identity request) updated-payment {:delete-after-settle? 
true :class "live-removed" @@ -578,7 +571,6 @@ (assoc-in [:query-params :start] 0) (assoc-in [:query-params :per-page] 250)))) - :else selected) updated-count (void-payments-internal ids (:identity request))] @@ -591,7 +583,7 @@ (defn wrap-status-from-source [handler] (fn [{:keys [matched-current-page-route] :as request}] - (let [ request (cond-> request + (let [request (cond-> request (= ::route/cleared-page matched-current-page-route) (assoc-in [:route-params :status] :payment-status/cleared) (= ::route/pending-page matched-current-page-route) (assoc-in [:route-params :status] :payment-status/pending) (= ::route/voided-page matched-current-page-route) (assoc-in [:route-params :status] :payment-status/voided) @@ -605,7 +597,7 @@ ::route/pending-page (-> (helper/page-route grid-page) (wrap-implied-route-param :status :payment-status/pending)) ::route/voided-page (-> (helper/page-route grid-page) - (wrap-implied-route-param :status :payment-status/voided)) + (wrap-implied-route-param :status :payment-status/voided)) ::route/all-page (-> (helper/page-route grid-page) (wrap-implied-route-param :status nil)) @@ -618,7 +610,6 @@ ::route/bulk-delete (-> bulk-delete-dialog (wrap-admin)) - ::route/table (helper/table-route grid-page)} (fn [h] (-> h diff --git a/src/clj/auto_ap/ssr/pos/sales_orders.clj b/src/clj/auto_ap/ssr/pos/sales_orders.clj index 8fb5ec31..d5473b33 100644 --- a/src/clj/auto_ap/ssr/pos/sales_orders.clj +++ b/src/clj/auto_ap/ssr/pos/sales_orders.clj @@ -1,7 +1,7 @@ (ns auto-ap.ssr.pos.sales-orders (:require [auto-ap.datomic - :refer [add-sorter-fields apply-pagination apply-sort-3 conn merge-query + :refer [add-sorter-fields apply-pagination apply-sort-3 merge-query pull-many query2]] [auto-ap.datomic.sales-orders :as d-sales] [auto-ap.query-params :as query-params :refer [wrap-copy-qp-pqp]] @@ -17,7 +17,6 @@ [auto-ap.time :as atime] [bidi.bidi :as bidi] [clj-time.coerce :as c] - [datomic.api :as dc] [malli.core :as mc])) (def query-schema (mc/schema @@ 
-172,11 +171,8 @@ charges)) (defn fetch-page [request] - (let [db (dc/db conn) - {ids-to-retrieve :ids matching-count :count} (fetch-ids db request)] - - [(->> (hydrate-results ids-to-retrieve db request)) - matching-count])) + (let [{:keys [rows count]} (d-sales/fetch-page-ssr request)] + [rows count])) (def grid-page @@ -200,13 +196,13 @@ :title "Sales orders" :entity-name "Sales orders" :route :pos-sales-table - :action-buttons (fn [request] - (let [{:keys [total tax]} (d-sales/summarize-orders (:ids (fetch-ids (dc/db conn) request)))] - (when (and total tax) - [(com/pill {:color :primary} - (format "Total $%.2f" total)) - (com/pill {:color :secondary} - (format "Tax $%.2f" tax))]))) + :action-buttons (fn [request] + (let [{:keys [total tax]} (d-sales/summarize-page-ssr request)] + (when (and total tax) + [(com/pill {:color :primary} + (format "Total $%.2f" total)) + (com/pill {:color :secondary} + (format "Tax $%.2f" tax))]))) :row-buttons (fn [_ e] (when (:sales-order/reference-link e) [(com/a-icon-button {:href (:sales-order/reference-link e)} diff --git a/src/clj/auto_ap/storage/parquet.clj b/src/clj/auto_ap/storage/parquet.clj new file mode 100644 index 00000000..121a97e7 --- /dev/null +++ b/src/clj/auto_ap/storage/parquet.clj @@ -0,0 +1,432 @@ +(ns auto-ap.storage.parquet + (:require [config.core :refer [env]] + [amazonica.aws.s3 :as s3] + [clojure.java.io :as io] + [clojure.string :as str] + [clojure.data.json :as json] + [clojure.core.cache :as cache] + [com.brunobonacci.mulog :as mu]) + (:import (java.sql DriverManager) + (java.time LocalDate))) + +(def ^:dynamic *bucket* (:data-bucket env)) +(def parquet-prefix "sales-details") + +(defn s3-location [filename] + (str "s3://" *bucket* "/" filename)) + +(defn parquet-key [entity-type date-str] + (let [month-str (if (= (count date-str) 10) + (subs date-str 0 7) + date-str)] + (str parquet-prefix "/" entity-type "/" month-str ".parquet"))) + +(def db (atom nil)) + +(defn connect! 
[] + (let [conn (DriverManager/getConnection "jdbc:duckdb:") + stmt (.createStatement conn)] + (.execute stmt "INSTALL httpfs; LOAD httpfs;") + (when-let [key (:aws-access-key-id env)] + (.execute stmt (str "SET s3_access_key_id='" key "'")) + (.execute stmt (str "SET s3_secret_access_key='" (:aws-secret-access-key env) "'")) + (.execute stmt (str "SET s3_region='" (or (:aws-region env) "us-east-1") "'"))) + (.execute stmt "PRAGMA enable_object_cache") + (.execute stmt "SET temp_directory='/tmp/duckdb-temp'") + (.execute stmt "SET memory_limit='2GB'") + (.close stmt) + (.addShutdownHook (Runtime/getRuntime) + (Thread. #(when-let [c @db] (.close ^java.sql.Connection c)))) + (reset! db conn))) + +(defn disconnect! [] + (locking db + (when-let [c @db] + (.close c) + (reset! db nil)))) + +(defmacro with-duckdb + [& body] + `(let [conn# (or @db (connect!))] + (try + (let [~'conn conn#] + ~@body) + (finally + (when (and (not @db) conn#) + (.close conn#)))))) + +(defn execute! [sql] + (with-duckdb + (let [stmt (.createStatement conn)] + (.execute stmt sql) + nil))) + +(defn- sql-snippet [sql] (subs sql 0 (min (count sql) 500))) + +(defn query-scalar [sql] + (mu/trace ::query-scalar + [:sql (sql-snippet sql)] + (with-duckdb + (let [stmt (.createStatement conn) + rs (.executeQuery stmt sql)] + (when (.next rs) + (.getObject rs 1)))))) + +(def ^:private count-cache + (atom (-> (cache/ttl-cache-factory {} :ttl 1800000) + (cache/lru-cache-factory :threshold 256)))) + +(defn- cached-count [sql] + (if-let [v (find @count-cache sql)] + (do (mu/log ::count-cache :hit true :sql (sql-snippet sql)) (val v)) + (do (mu/log ::count-cache :hit false :sql (sql-snippet sql)) + (let [result (query-scalar sql)] + (swap! 
count-cache assoc sql result) + result)))) + +(defn query-rows [sql] + (mu/trace ::query-rows + [:sql (sql-snippet sql)] + (with-duckdb + (let [stmt (.createStatement conn) + rs (.executeQuery stmt sql) + meta (.getMetaData rs) + col-count (.getColumnCount meta) + cols (vec (for [i (range 1 (inc col-count))] + (keyword (.getColumnLabel meta i))))] + (loop [rows []] + (if (.next rs) + (recur (conj rows + (zipmap cols + (vec (for [i (range 1 (inc col-count))] + (.getObject rs i)))))) + rows)))))) + +(defn execute-to-parquet! [sql ^String parquet-path] + (with-duckdb + (let [stmt (.createStatement conn)] + (.execute stmt + (format "COPY (%s) TO '%s' (FORMAT PARQUET, OVERWRITE_OR_IGNORE, ROW_GROUP_SIZE 10000, COMPRESSION 'zstd')" + sql parquet-path)) + (io/file parquet-path)))) + +(defn upload-parquet! [local-parquet-file s3-key] + (s3/put-object {:bucket-name *bucket* + :key s3-key + :file local-parquet-file}) + (s3-location s3-key)) + +(defonce *buffers* (atom {})) + +(defn- wal-dir [] + (io/file (System/getProperty "user.dir" "/tmp") + "parquet-wal")) + +(defn- init-wal! [] + (let [dir (wal-dir)] + (when-not (.exists dir) + (.mkdirs dir)))) + +(defn buffer! [entity-type record] + (init-wal!) + (let [seq-no (System/currentTimeMillis) + entry (assoc record :_seq-no seq-no)] + (swap! *buffers* update entity-type (fnil conj []) entry) + (try + (let [wal-file (io/file (wal-dir) + (str entity-type ".jsonl"))] + (io/make-parents wal-file) + (with-open [w (io/writer wal-file :append true)] + (.write w ^String (json/write-str {:seq-no seq-no + :record record})) + (.write w (int \newline)))) + (catch Exception e + (println "[parquet/wal]" (.getMessage e)))) + entry)) + +(defn clear-buffer! [entity-type] + (swap! *buffers* dissoc entity-type)) + +(defn buffer-count [entity-type] + (-> @*buffers* (get entity-type []) count)) + +(defn total-buf-count [] + (->> @*buffers* + vals (mapcat identity) count)) + +(defn flush-to-parquet! 
[entity-type date-str] + "Flush buffered records for entity-type to monthly parquet + S3. + Reads existing monthly file (if any), merges with new records, and uploads. + Uses temp table to ensure ROW_GROUP_SIZE is respected (DuckDB ignores it + when reading directly from S3 via COPY)." + (let [records (get @*buffers* entity-type [])] + (if (empty? records) + {:status :no-records} + (let [date-str (or date-str (.toString (LocalDate/now))) + s3-key (parquet-key entity-type date-str) + s3-url (s3-location s3-key) + jsonl-file (io/file "/tmp" + (str entity-type "-" date-str ".jsonl")) + parquet-file (io/file "/tmp" + (str entity-type "-" date-str ".parquet")) + tbl (format "\"_flush_%s_%s\"" + (clojure.string/replace entity-type "-" "_") + (subs date-str 0 7))] + (try + (with-open [w (io/writer jsonl-file :append true)] + (doseq [r records] + (.write w ^String (json/write-str (dissoc r :_seq-no))) + (.write w (int \newline)))) + (let [existing-sql (format + "SELECT * FROM read_parquet('%s', union_by_name=true)" + s3-url) + new-sql (format + "SELECT * FROM read_json_auto('%s')" + (.getAbsolutePath jsonl-file))] + (execute! (format "CREATE OR REPLACE TABLE %s AS SELECT * FROM (%s UNION ALL %s) ORDER BY \"client-code\", date" + tbl existing-sql new-sql)) + (execute! (format "COPY (SELECT * FROM %s) TO '%s' (FORMAT PARQUET, OVERWRITE_OR_IGNORE, ROW_GROUP_SIZE 10000, COMPRESSION 'zstd')" + tbl (.getAbsolutePath parquet-file))) + (execute! (format "DROP TABLE IF EXISTS %s" tbl)) + (upload-parquet! parquet-file s3-key) + (clear-buffer! entity-type) + (.delete ^java.io.File jsonl-file) + (.delete ^java.io.File parquet-file) + {:key s3-key :status :ok}) + (catch Exception e + (execute! (format "DROP TABLE IF EXISTS %s" tbl)) + (throw (ex-info "Flush failed" + {:entity-type entity-type + :error (.getMessage e)})))))))) + +(defn flush-by-date! [] + "Flush all entity types for today." 
+ (let [etypes ["sales-order" "charge" + "line-item" "sales-refund"] + today (.toString (LocalDate/now)) + flushed (into #{} + (keep (fn [et] + (let [{:keys [status]} + (flush-to-parquet! et today)] + (when (= status :ok) + et)))) + etypes)] + {:flushed flushed})) + +(defn load-unflushed! [] + "Restore unflushed records from WAL jsonl files into *buffers." + (init-wal!) + (let [etypes ["sales-order" "charge" + "line-item" "sales-refund"] + loaded (reduce-kv + (fn [acc et data] + (if-not (empty? data) + (assoc acc et + (->> (str/split-lines data) + (keep #(try + (let [entry (json/read-str %)] + (when entry + (assoc (:record entry) :_seq-no (:seq-no entry)))) + (catch Exception _))))) + acc)) + {} + (into {} + (keep (fn [et] + (let [f (io/file + (wal-dir) + (str et ".jsonl"))] + (when (.exists f) + [et (slurp f)])))) + etypes))] + (swap! *buffers* merge loaded))) + +(defn get-unflushed-count [] + (total-buf-count)) + +(defn unflushed-records? [] + (not= 0 (total-buf-count))) + +;;; DuckDB Read Layer + +(defn date-seq [start end] + "Seq of YYYY-MM-DD strings between start and end inclusive." + (let [sd (LocalDate/parse start) + ed (LocalDate/parse end)] + (when (.isAfter sd ed) + (throw (ex-info "date-seq: start must be <= end" {:start start :end end}))) + (let [days (int (- (.toEpochDay ed) + (.toEpochDay sd)))] + (for [i (range 0 (inc days))] + (.toString (.plusDays sd i)))))) + +(defn today [] + (.toString (LocalDate/now))) + +(def ^:private mm-dd-yyyy (java.time.format.DateTimeFormatter/ofPattern "MM/dd/yyyy")) + +(defn- normalize-date-str [s] + (when s + (if (re-find #"^\d{2}/\d{2}/\d{4}" s) + (.toString (LocalDate/parse s mm-dd-yyyy)) + (if (> (count s) 10) (subs s 0 10) s)))) + +(defn month-seq [start-date end-date] + "Seq of YYYY-MM strings between start-date and end-date inclusive." 
+ (let [sd (LocalDate/parse (normalize-date-str start-date)) + ed (LocalDate/parse (normalize-date-str end-date))] + (loop [months [] cur sd] + (if (.isAfter cur ed) + months + (recur (conj months (.toString (.withDayOfMonth cur 1))) + (.plusMonths cur 1)))))) + +(defn- parquet-glob [entity-type start-date end-date] + "Build explicit file list for the date range using monthly partitions. + Monthly files mean only 1-3 files for typical queries, 12 for a full year." + (let [prefix (format "s3://%s/sales-details/%s/" *bucket* entity-type)] + (vec + (map (fn [m] + (format "'%s%s.parquet'" prefix m)) + (month-seq start-date end-date))))) + +(defn parquet-query [entity-type start-date end-date] + "Build SQL to read monthly parquet files in date range. + Uses explicit file list (monthly = few files) + WHERE date filter. + Normalizes date formats (handles MM/dd/yyyy from UI)." + (let [sd (normalize-date-str start-date) + ed (normalize-date-str end-date) + files (parquet-glob entity-type sd ed) + base (format "SELECT * FROM read_parquet([%s], union_by_name=true)" + (str/join ", " files)) + sql (format "%s WHERE date >= '%s' AND date <= '%s'" + base sd ed)] + {:sql sql + :count-sql (format "SELECT COUNT(*) FROM (%s) t" sql)})) + +(defn- like-clause [col v] + (str "\"" col "\" LIKE '%" v "%'")) + +(defn- build-sales-orders-where [opts] + (let [eq-clauses (keep + (fn [[key col]] + (let [v (get opts key)] + (when v + (str "\"" col "\" = '" v "'")))) + [[:client "client-code"] + [:vendor "vendor"] + [:location "location"]]) + in-clauses (keep + (fn [[key col]] + (let [vs (get opts key)] + (when (seq vs) + (str "\"" col "\" IN (" + (str/join ", " (map #(str "'" % "'") vs)) + ")")))) + [[:client-codes "client-code"]]) + like-clauses (keep + (fn [[key col]] + (let [v (get opts key)] + (when v + (like-clause col v)))) + [[:payment-method "payment-methods"] + [:processor "processors"] + [:category "categories"]]) + range-clauses (keep + (fn [[key col op]] + (let [v (get opts key)] + 
(when v + (str "\"" col "\" " op " " v)))) + [[:total-gte "total" ">="] + [:total-lte "total" "<="]]) + all-clauses (concat eq-clauses in-clauses like-clauses range-clauses)] + (when (seq all-clauses) + (str/join " AND " all-clauses)))) + +(defn get-sales-orders + ([start-date end-date] + (get-sales-orders start-date end-date {})) + ([start-date end-date opts] + (mu/trace ::get-sales-orders + [:start-date start-date :end-date end-date :opts opts] + (try + (let [q (parquet-query "sales-order" start-date end-date) + base-sql (:sql q) + has-where? (str/includes? base-sql " WHERE ") + sort (get opts :sort "date") + order (get opts :order "DESC") + limit (get opts :limit) + offset (get opts :offset) + extra-clauses (build-sales-orders-where opts) + full-sql (if extra-clauses + (str base-sql (if has-where? " AND " " WHERE ") extra-clauses) + base-sql) + data-sql (cond-> full-sql + sort (str " ORDER BY " sort " " (name order)) + limit (str " LIMIT " limit) + offset (str " OFFSET " offset)) + count-sql (format "SELECT COUNT(*) FROM (%s) t" full-sql)] + (mu/log ::get-sales-orders :data-sql data-sql :count-sql count-sql) + (let [cnt (cached-count count-sql) + rows (query-rows data-sql)] + {:rows rows + :count (if cnt (int cnt) 0)})) + (catch Exception e + (mu/log ::get-sales-orders :error e :start-date start-date :end-date end-date :opts opts) + (throw e)))))) + +(def ^:private summary-cache + (atom (-> (cache/ttl-cache-factory {} :ttl 1800000) + (cache/lru-cache-factory :threshold 256)))) + +(defn- cached-summary [sql] + (if-let [v (find @summary-cache sql)] + (do (mu/log ::summary-cache :hit true :sql (sql-snippet sql)) (val v)) + (do (mu/log ::summary-cache :hit false :sql (sql-snippet sql)) + (let [result (let [row (first (query-rows sql))] + {:total (or (:total row) 0.0) + :tax (or (:tax row) 0.0)})] + (swap! 
summary-cache assoc sql result) + result)))) + +(defn get-sales-orders-summary + ([start-date end-date] + (get-sales-orders-summary start-date end-date {})) + ([start-date end-date opts] + (mu/trace ::get-sales-orders-summary + [:start-date start-date :end-date end-date :opts opts] + (try + (let [q (parquet-query "sales-order" start-date end-date) + base-sql (:sql q) + has-where? (str/includes? base-sql " WHERE ") + extra-clauses (build-sales-orders-where opts) + full-sql (if extra-clauses + (str base-sql (if has-where? " AND " " WHERE ") extra-clauses) + base-sql) + sum-sql (format "SELECT COALESCE(SUM(total), 0) as total, COALESCE(SUM(tax), 0) as tax FROM (%s) t" full-sql)] + (mu/log ::get-sales-orders-summary :sum-sql sum-sql) + (cached-summary sum-sql)) + (catch Exception e + (mu/log ::get-sales-orders-summary :error e :start-date start-date :end-date end-date :opts opts) + (throw e)))))) + +(defn query-deduped [entity-type start-date end-date] + "Query records deduplicated by external-id (latest _seq_no wins)." + (let [q (parquet-query entity-type start-date end-date)] + (query-rows + (str (:sql q) + " QUALIFY ROW_NUMBER() OVER" + " (PARTITION BY \"external-id\"" + " ORDER BY _seq_no DESC) = 1")))) + +(defn query-by-entity-id [entity-type external-id + start-date end-date] + (->> (query-deduped entity-type start-date end-date) + (filter #(= (:external_id %) + (name external-id))) + first)) + +(defn count-records-in-parquet + [entity-type start-date end-date] + (let [q (parquet-query entity-type + start-date end-date)] + (or (int (query-scalar (:count-sql q))) 0))) diff --git a/src/clj/auto_ap/storage/sales_summaries.clj b/src/clj/auto_ap/storage/sales_summaries.clj new file mode 100644 index 00000000..d927ad42 --- /dev/null +++ b/src/clj/auto_ap/storage/sales_summaries.clj @@ -0,0 +1,184 @@ +(ns auto-ap.storage.sales-summaries + "Aggregation functions querying Parquet files on S3 via DuckDB. 
+ Entity types: sales-order | charge | line-item | sales-refund + S3 pattern: s3:///sales-details//.parquet" + (:require [auto-ap.storage.parquet :as p] + [clojure.string :as str])) + +(defn- dq [name] + (str "\"" name "\"")) + +(defn- sum-dbl [val] + (try + (if val (double val) 0.0) + (catch Exception _e + 0.0))) + +(defn- pq-files [entity-type start-date end-date] + "Vector of S3 parquet file paths for date range (monthly partitions)." + (let [months (p/month-seq start-date end-date)] + (vec + (map #(str "'s3://" p/*bucket* + "/sales-details/" entity-type "/" + % ".parquet'") months)))) + +(defn sum-payments-by-type [client-id start-date end-date] + "Return {processor-key -> {type-name-string -> total-double}}." + (let [files (pq-files "charge" start-date end-date)] + (try + (let [sql (str "SELECT " + (dq "processor") + " AS proc, " + (dq "type-name") + " AS type_name, " + "SUM(" + (dq "total") + ")::DOUBLE AS total_amount " + "FROM read_parquet([" + (str/join ", " files) + "]) " + "WHERE " + (dq "client-code") + " = '" client-id "' " + "GROUP BY " + (dq "processor") ", " + (dq "type-name"))] + (let [rows (p/query-rows sql)] + (reduce (fn [acc row] + (let [proc (:proc row) + tname (str/trim (name (:type_name row))) + total (sum-dbl (:total_amount row))] + (update acc proc + (fn [inner] + (let [b (or inner {})] + (assoc b + tname + (+ (get b tname 0.0) total))))))) + {} + rows))) + (catch Exception e + (println "[sales-summaries]" (.getMessage e)) + {})))) + +(defn sum-discounts [client-id start-date end-date] + (let [files (pq-files "sales-order" start-date end-date)] + (try + (let [sql (str "SELECT SUM(" + (dq "discount") + ")::DOUBLE AS discount_total " + "FROM read_parquet([" + (str/join ", " files) + "]) " + "WHERE " + (dq "client-code") + " = '" client-id "'")] + (or (some-> (first (p/query-rows sql)) :discount_total sum-dbl) 0.0)) + (catch Exception e + (println "[sales-summaries/discounts]" (.getMessage e)) + 0.0)))) + +(defn sum-refunds-by-type [client-id 
start-date end-date] + (let [files (pq-files "sales-refund" start-date end-date)] + (try + (let [sql (str "SELECT " + (dq "type-name") + " AS type_name, " + "SUM(" + (dq "total") + ")::DOUBLE AS total_amount " + "FROM read_parquet([" + (str/join ", " files) + "]) " + "WHERE " + (dq "sales-order-external-id") + " IN (SELECT " + (dq "external-id") + " FROM read_parquet([" + (str/join ", " (pq-files "sales-order" start-date end-date)) + "]) WHERE " + (dq "client-code") + " = '" client-id "') " + "GROUP BY " (dq "type-name"))] + (let [rows (p/query-rows sql)] + (reduce (fn [acc row] + (let [tname (str/trim (name (:type_name row))) + total (sum-dbl (:total_amount row))] + (assoc acc tname (+ (get acc tname 0.0) total)))) + {} + rows))) + (catch Exception e + (println "[sales-summaries/refunds]" (.getMessage e)) + {})))) + +(defn sum-taxes [client-id start-date end-date] + (let [files (pq-files "sales-order" start-date end-date)] + (try + (let [sql (str "SELECT SUM(" + (dq "tax") + ")::DOUBLE AS tax_total " + "FROM read_parquet([" + (str/join ", " files) + "]) " + "WHERE " + (dq "client-code") + " = '" client-id "'")] + (or (some-> (first (p/query-rows sql)) :tax_total sum-dbl) 0.0)) + (catch Exception e + (println "[sales-summaries/tax]" (.getMessage e)) + 0.0)))) + +(defn sum-tips [client-id start-date end-date] + (let [files (pq-files "sales-order" start-date end-date)] + (try + (let [sql (str "SELECT SUM(" + (dq "tip") + ")::DOUBLE AS tip_total " + "FROM read_parquet([" + (str/join ", " files) + "]) " + "WHERE " + (dq "client-code") + " = '" client-id "'")] + (or (some-> (first (p/query-rows sql)) :tip_total sum-dbl) 0.0)) + (catch Exception e + (println "[sales-summaries/tip]" (.getMessage e)) + 0.0)))) + +(defn sum-sales-by-category [client-id start-date end-date] + (let [files (pq-files "line-item" start-date end-date)] + (try + (let [sql (str "SELECT " + (dq "category") + " AS category, " + "SUM(" + (dq "total") + ")::DOUBLE AS total_amount, " + "SUM(" + (dq 
"tax") + ")::DOUBLE AS tax_amount, " + "SUM(" + (dq "discount") + ")::DOUBLE AS discount_amount " + "FROM read_parquet([" + (str/join ", " files) + "]) " + "WHERE " + (dq "sales-order-external-id") + " IN (SELECT " + (dq "external-id") + " FROM read_parquet([" + (str/join ", " (pq-files "sales-order" start-date end-date)) + "]) WHERE " + (dq "client-code") + " = '" client-id "') " + "GROUP BY " (dq "category"))] + (let [rows (p/query-rows sql)] + (mapv (fn [row] + {:category (or (:category row) "Unknown") + :total (sum-dbl (:total_amount row)) + :tax (sum-dbl (:tax_amount row)) + :discount (sum-dbl (:discount_amount row))}) + rows))) + (catch Exception e + (println "[sales-summaries/sales]" (.getMessage e)) + [])))) diff --git a/test/clj/auto_ap/storage/parquet_test.clj b/test/clj/auto_ap/storage/parquet_test.clj new file mode 100644 index 00000000..5aa78b3f --- /dev/null +++ b/test/clj/auto_ap/storage/parquet_test.clj @@ -0,0 +1,30 @@ +(ns auto-ap.storage.parquet-test + (:require [auto-ap.storage.parquet :as p] + [clojure.test :refer [deftest is testing use-fixtures]])) + +(deftest test-query-scalar + (testing "SELECT 1 returns 1" + (is (= 1 (p/query-scalar "SELECT 1"))))) + +(deftest test-query-scalar-with-expression + (testing "SELECT 2 + 2 returns 4" + (is (= 4 (p/query-scalar "SELECT 2 + 2"))))) + +(deftest test-buffer + (testing "buffer! adds record to buffer" + (p/clear-buffer! "test-type") + (p/buffer! "test-type" {:id 1 :name "test"}) + (is (= 1 (p/buffer-count "test-type"))))) + +(deftest test-clear-buffer + (testing "clear-buffer! empties buffer" + (p/clear-buffer! "test-type") + (p/buffer! "test-type" {:id 2}) + (is (= 1 (p/buffer-count "test-type"))) + (p/clear-buffer! 
"test-type") + (is (= 0 (p/buffer-count "test-type"))))) + +(deftest test-date-seq + (testing "date-seq generates correct sequence" + (let [result (p/date-seq "2024-04-01" "2024-04-03")] + (is (= ["2024-04-01" "2024-04-02" "2024-04-03"] result))))) diff --git a/test/clj/auto_ap/storage/perf_test.clj b/test/clj/auto_ap/storage/perf_test.clj new file mode 100644 index 00000000..a1e7e177 --- /dev/null +++ b/test/clj/auto_ap/storage/perf_test.clj @@ -0,0 +1,113 @@ +(ns auto-ap.storage.perf-test + (:require [auto-ap.storage.parquet :as p] + [amazonica.aws.s3 :as s3] + [clojure.java.io :as io] + [clojure.string :as str]) + (:import (java.sql DriverManager) + (java.time Instant))) + +(defn timestamp [] + (System/currentTimeMillis)) + +(defn timed [label sql-fn] + (let [start (timestamp) + result (sql-fn) + elapsed (- (timestamp) start)] + (println (format "%s: %d ms" label elapsed)) + result)) + +(defn run-perf-tests [] + (p/connect!) + (try + (let [bucket "data.dev.app.integreatconsult.com" + prefix "test-duckdb" + local-parquet "/tmp/test_data.parquet" + s3-key (str prefix "/data.parquet")] + + ;; Create 100k test rows + (println "\n=== Creating 100k test rows ===") + (p/execute! "DROP TABLE IF EXISTS test_data") + (p/execute! 
(str " + CREATE TABLE test_data AS + SELECT + i AS id, + 'order_' || i AS external_id, + CASE (i % 5) + WHEN 0 THEN 'north' + WHEN 1 THEN 'south' + WHEN 2 THEN 'east' + WHEN 3 THEN 'west' + ELSE 'central' + END AS region, + CASE (i % 8) + WHEN 0 THEN 'food' + WHEN 1 THEN 'beverage' + WHEN 2 THEN 'alcohol' + WHEN 3 THEN 'catering' + WHEN 4 THEN 'retail' + WHEN 5 THEN 'dessert' + WHEN 6 THEN 'merch' + ELSE 'other' + END AS category, + ROUND(1 + ABS(RANDOM() % 10000) / 100.0, 2) AS amount, + CAST(DATE '2024-01-01' + (i % 365) * INTERVAL '1 day' AS DATE) AS sale_date, + CASE WHEN i % 20 = 0 THEN 'voided' ELSE 'active' END AS status + FROM generate_series(1, 100000) AS t(i)")) + (println "Row count:" (p/query-scalar "SELECT COUNT(*) FROM test_data")) + (println "Voided count:" (p/query-scalar "SELECT COUNT(*) FROM test_data WHERE status = 'voided'")) + (println "Amount > 3 count:" (p/query-scalar "SELECT COUNT(*) FROM test_data WHERE amount > 3")) + + ;; Write to local parquet + (println "\n=== Writing local parquet ===") + (timed "Write parquet" #(p/execute-to-parquet! "SELECT * FROM test_data" local-parquet)) + (let [f (io/file local-parquet)] + (println "File size:" (format "%.1f MB" (/ (.length f) 1048576.0)))) + + ;; Upload to S3 under s3-key, the same key the read tests below use + (println "\n=== Uploading to S3 ===") + (timed "S3 upload" #(p/upload-parquet! (io/file local-parquet) s3-key)) + (println "S3 URI:" (p/s3-location s3-key)) + + ;; Now test reading from S3 + (println "\n=== Performance Tests (reading from S3) ===") + (let [s3-uri (str "s3://" bucket "/" s3-key)] + + ;; Register S3 parquet as a view/table in DuckDB + (p/execute! 
(format "CREATE VIEW s3_test AS SELECT * FROM read_parquet('%s')" s3-uri)) + (println "Total rows in S3:" (p/query-scalar "SELECT COUNT(*) FROM s3_test")) + + ;; Test 1: Page 1 - first 25 rows + (println "\n--- Test 1: Page 1 (LIMIT 25 OFFSET 0) ---") + (timed "First page (25 rows)" #(p/query-rows "SELECT * FROM s3_test ORDER BY id LIMIT 25")) + (println "Sample row:" (first (p/query-rows "SELECT * FROM s3_test ORDER BY id LIMIT 1"))) + + ;; Test 2: Page 20 - rows 476-500 (OFFSET 475) + (println "\n--- Test 2: Page 20 (LIMIT 25 OFFSET 475) ---") + (timed "Page 20 (25 rows)" #(p/query-rows "SELECT * FROM s3_test ORDER BY id LIMIT 25 OFFSET 475")) + + ;; Test 3: Filter amount > 3 (no pagination) + (println "\n--- Test 3: Filter amount > 3 (no limit) ---") + (timed "Filter amount > 3 (all)" #(do (p/query-scalar "SELECT COUNT(*) FROM s3_test WHERE amount > 3") :done)) + + ;; Test 4: Filter + pagination + (println "\n--- Test 4: Filter amount > 3 + LIMIT 25 ---") + (timed "Filter + paginated (25 rows)" #(p/query-rows "SELECT * FROM s3_test WHERE amount > 3 ORDER BY id LIMIT 25")) + + ;; Test 5: Filter + page 20 + (println "\n--- Test 5: Filter amount > 3 + LIMIT 25 OFFSET 475 ---") + (timed "Filter + page 20" #(p/query-rows "SELECT * FROM s3_test WHERE amount > 3 ORDER BY id LIMIT 25 OFFSET 475")) + + ;; Test 6: Aggregation on S3 data + (println "\n--- Test 6: Aggregation (SUM, AVG on amount) ---") + (timed "Aggregation SUM/AVG" #(p/query-scalar "SELECT SUM(amount), AVG(amount) FROM s3_test WHERE status = 'active'")) + + ;; Cleanup + (p/execute! "DROP VIEW IF EXISTS s3_test") + (p/execute! "DROP TABLE IF EXISTS test_data"))) + + (finally + (p/disconnect!)))) + +(comment + (run-perf-tests) + (println "\n=== Done ===")) \ No newline at end of file