- Remove sales_orders_new.clj (unreferenced, duplicate ns) - Add [clojure.string :as str] to sales_orders.clj ns
1 line
14 KiB
JSON
1 line
14 KiB
JSON
{"reviewer":"correctness","findings":[{"title":"SQL injection via unsanitized WHERE clause values in parquet read layer","severity":"P0","file":"src/clj/auto_ap/storage/parquet.clj","line":237,"confidence":100,"autofix_class":"manual","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Parameterize all SQL values instead of string concatenation. Use DuckDB prepare/bind or at minimum escape single quotes in user-supplied values: (str/replace v \"'\" \"''\"). Apply this to build-where-clause and get-sales-orders sort/order/limit/offset interpolations.","why_it_matters":"Any value passed via :client, :vendor, :location opts is concatenated directly into SQL string. A client code containing a single quote (e.g. O Brien) breaks the query. Malicious values can inject arbitrary SQL. The sort and order fields are also interpolated without validation, allowing ORDER BY injection.","evidence":["(str env \" = '\" v \"'\")","(str base-sql where-str)","(str \"ORDER BY \" sort \" \" (name order))","(str \" LIMIT \" limit)"]},{"title":"SQL injection in sales-summaries aggregation WHERE clauses","severity":"P0","file":"src/clj/auto_ap/storage/sales_summaries.clj","line":42,"confidence":100,"autofix_class":"manual","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Escape single quotes in client-id before interpolation. Apply to all SQL string construction sites in this namespace.","why_it_matters":"client-id is interpolated directly into WHERE clauses across all aggregation functions. Values with apostrophes break queries; malicious values enable injection.","evidence":["WHERE client-code = ' . client-id . '","Lines 42, 73, 98-100, 123, 140, 171"]},{"title":"with-duckdb macro never closes connections created by connect!","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":43,"confidence":100,"autofix_class":"manual","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Track whether conn was freshly created vs retrieved from @db. Only close in finally if it was freshly created and not stored.","why_it_matters":"When @db is nil, connect! creates a connection AND stores it via reset!. The finally clause checks (not @db) which is now false because connect! just set it. All DuckDB connections accumulate until JVM shutdown.","evidence":["(let [conn# (or @db (connect!))]","(finally (when (and (not @db) conn#) (.close conn#)))","connect! calls (reset! db conn)"]},{"title":"flush-to-parquet! clears buffer before S3 upload, losing data on upload failure","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":148,"confidence":100,"autofix_class":"manual","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Move clear-buffer! to after upload-parquet! succeeds. Restructure as: write jsonl -> convert to parquet -> verify local file -> upload to S3 -> only then clear buffer.","why_it_matters":"clear-buffer! at line 148 runs before upload-parquet!. If upload fails, the catch block throws but the buffer is already cleared. Records in memory are permanently lost. WAL has them but they won't be re-flushed until process restart.","evidence":["(clear-buffer! entity-type) at line 148","(upload-parquet! parquet-file s3-key) at line 147 runs after clear","(catch Exception e (throw ...))"]},{"title":"date-seq produces forward sequence when start > end","severity":"P0","file":"src/clj/auto_ap/storage/parquet.clj","line":207,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Validate start <= end and throw if reversed, or compute direction as step (1 or -1) based on comparison. Example: date-seq 2024-05-01 2024-04-01 should error or return [2024-05-01 2024-04-30 ... 2024-04-01].","why_it_matters":"(Math/abs ...) on the diff combined with always calling (.plusDays sd i) means if start > end, you get a sequence going forward from start by |end-start| days. Downstream parquet queries reference non-existent S3 keys producing empty or erroneous results.","evidence":["(int (Math/abs (- (.toEpochDay sd) (.toEpochDay ed))))","(for [i (range 0 (inc days))] (.toString (.plusDays sd i)))"]},{"title":"query-deduped generates invalid DuckDB syntax with wrong partition column","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":282,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Fix to: QUALIFY ROW_NUMBER() OVER (PARTITION BY external_id ORDER BY _seq_no DESC) = 1. Remove the space after QUALIFY and use correct column names matching parquet schema.","why_it_matters":"The generated SQL is syntactically invalid: 'QUALIFY ROW_NUMBER() OVER' has a space breaking QUALIFY from its expression. Additionally, 'sales_order.external_id' does not exist as a column in parquet -- records use 'external_id'. This function always fails at runtime.","evidence":["\" QUALIFY ROW_NUMBER() OVER\"","\" (PARTITION BY sales_order.external_id\"","Parquet columns use :external-id key"]},{"title":"safe-cleanup-all destructures [year month] pairs incorrectly as [_ y m]","severity":"P0","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":199,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Change (doseq [_ y m] months) to (doseq [y m] months) since months is a vector of [year month] pairs. Current code assigns year to discarded _, month to y, and m is nil.","why_it_matters":"collect-all-months returns [year month] vectors. Safe-cleanup-all iterates as [_ y m] -- so [2024 3] yields y=3 and m=nil. S3 verification and delete calls use wrong year/month values, corrupting cleanup.","evidence":["(doseq [[_y m] months]","group-orders-by-month returns {{y m} [eid...]} map","sort(keys group) produces [year month] vectors"]},{"title":"get-payment-items-parquet queries :client_code instead of :client-code causing silent empty results","severity":"P1","file":"src/clj/auto_ap/jobs/sales_summaries.clj","line":106,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Change (:client_code %) to (:client-code %). All parquet write paths store the key as :client-code.","why_it_matters":"DuckDB query-rows returns columns matching their parquet names. The filter looks for :client_code (underscored) which never exists, so all rows are filtered out. Payment aggregation silently returns empty results across all client/date queries.","evidence":["(filter #(= client-code (:client_code %)) rows)","buffer! writes :client-code key in ezcater/core.clj, square/core3.clj, migration/sales_to_parquet.clj"]},{"title":"raw-graphql-ids in sales_orders_new.clj references undefined variables causing nil return","severity":"P0","file":"src/clj/auto_ap/datomic/sales_orders_new.clj","line":149,"confidence":100,"autofix_class":"manual","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Complete the function: bind result from pq/get-sales-orders call, add (require '[clojure.string :as str]), and fix the cond->>/when-let flow. The current code discards query results and references unbound 'result' variable.","why_it_matters":"The function compiles but returns nil at runtime because result is unbound when accessed at line 165. The conditional threading on lines 157-161 discards its output instead of binding it to a let form.","evidence":["(cond->>\\n where-str (pq/get-sales-orders start end ...))","(when-let [rows (:rows result)] -- result never bound","Missing clojure.string import for str/join used in build-where-clause"]},{"title":"Ezcater XLS flatten-order-to-parquet! writes integer db/id as client-code string","severity":"P1","file":"src/clj/auto-ap/routes/ezcater_xls.clj","line":112,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"In map->sales-order, resolve client to its :client/code string before passing to flatten. Or in flatten-order-to-parquet!, look up the code: (if (integer? client) (lookup-code client) client).","why_it_matters":"map->sales-order sets :client to (:db/id client) which is an integer. flatten-order-to-parquet! writes this integer as client-code in parquet. All other paths write string client codes, causing inconsistent filtering and aggregation.","evidence":["client-id (:db/id client) at line 53","flatten uses (if (map? client) (:client/code client) client)","square/core3.clj resolves to client code string before buffer!"]},{"title":"Migration date query may lose last day due to epoch unit handling","severity":"P1","file":"src/clj/auto_ap/migration/sales_to_parquet.clj","line":119,"confidence":75,"autofix_class":"manual","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Verify Datomic stores :sales-order/date as Instant epoch milliseconds. If timezone-dependent conversion causes off-by-one, use LocalDate-based boundary computation with explicit timezone.","why_it_matters":"Code converts LocalDate to epoch seconds then multiplies by 1000 for millis boundaries. Timezone-sensitive conversions may cause orders at end-of-day in certain timezones to fall outside [start-ms end-ms] range and be skipped.","evidence":["(.toEpochSecond ^java.time.LocalDate ...)","start (* day-ms 1000)","end (+ start (* 86400000)) -- subtracts 1 from exclusive end"]},{"title":"*buffers* atom has no lifecycle management and grows without bound","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":89,"confidence":75,"autofix_class":"advisory","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Add periodic flush job or explicit cleanup. Consider bounding buffer size and implementing eviction for oversized buffers.","why_it_matters":"Records buffered via Square/EZCater imports accumulate in a process-global atom. If flush is never called or the process runs for days without flushing, memory grows unbounded.","evidence":["(defonce *buffers* (atom {}))","Buffer grows on every import, shrinks only on explicit flush"]},{"title":"sales-summaries mark-dirty queries Datomic after sales-order entities are migrated to parquet","severity":"P2","file":"src/clj/auto_ap/jobs/sales_summaries.clj","line":30,"confidence":75,"autofix_class":"advisory","owner":"human","requires_verification":true,"pre_existing":false,"suggested_fix":"After Datomic cleanup completes, mark-all-dirty returns nothing because sales-order entities are gone. Add fallback to discover clients from parquet data or maintain a client registry.","why_it_matters":"mark-all-dirty queries [:sales-order/client ?c] in Datomic. After safe-cleanup-all removes all sales-order entities, this returns nil. The summaries job silently does nothing.","evidence":["(dc/q '[:find ?c ... [_ :sales-order/client ?c]]","defn mark-all-dirty depends on sales-order presence in Datomic"]},{"title":"WAL jsonl append not atomic across concurrent buffer! calls","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":109,"confidence":50,"autofix_class":"advisory","owner":"human","requires_verification":true,"pre_existing":false,"suggested_fix":"Use per-entity-type lock or atomic file-channel write for WAL appends. Alternative: buffer in memory only and write entire batch on flush.","why_it_matters":"In multi-threaded server, two buffer! calls writing to the same .jsonl file simultaneously may interleave bytes, corrupting JSONL format. The open-write-close sequence is not atomic.","evidence":["(with-open [w (io/writer wal-file :append true)]","Concurrent HTTP requests trigger buffer! for same entity-type"]},{"title":"object-exists? in cleanup leaks S3 response streams across verification calls","severity":"P2","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":116,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Switch to s3/head-object which returns metadata without body stream. Or wrap s3/get-object in with-open to close the S3ObjectInputStream.","why_it_matters":"s3/get-object returns an S3ObjectInputStream that must be closed. Calling for every day-entity combination across months leaks file descriptors and HTTP connections -- ~300+ unclosed streams per month checked.","evidence":["(s3/get-object {:bucket-name pq/*bucket* :key key})","verify-month-in-s3? calls this for every day times entity type"]},{"title":"Shutdown hook in connect! is a no-op thunk","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":27,"confidence":50,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Change to (Thread. #(disconnect!)). Current code creates a Thread running #(fn []) which returns a new function that is never called.","why_it_matters":"At JVM shutdown the hook runs but does nothing. DuckDB connection is not gracefully closed, potentially losing state or corrupting temporary files.","evidence":["(.addShutdownHook (Runtime/getRuntime) (Thread. #(fn [])))","The inner fn returns another fn that never executes"]}],"residual_risks":["Dual-write path means parquet and Datomic can diverge if one write succeeds and the other fails -- no compensating transaction or reconciliation mechanism exists","Parquet files on S3 have no versioning or immutability guarantee; accidental overwrites during migration could corrupt historical data","No idempotency guarantee for migration scripts -- re-running sales_to_parquet.clj after partial failure may duplicate records since there is no pre-check for existing parquet files"],"testing_gaps":["No test for buffer flush under S3 failure and recovery via WAL replay","No test for SQL injection vectors in build-where-clause or get-sales-orders","No test for date-seq with reversed start/end dates","No integration test verifying sales-summaries aggregation returns correct results after parquet path import","No test coverage for safe-cleanup-all S3 verification logic with partial file presence","No test for concurrent buffer! + flush-to-parquet! interaction","No test covering nil or missing date fields in migration flatten-order-to-pieces!","No performance test validating memory behavior of accumulating *buffers* atom under sustained load"]} |