{"reviewer":"testing_maintainability","findings":[{"title":"DuckDB shutdown hook is a no-op \u2014 connection never closes on JVM exit","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":26,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Replace `(Thread. #(fn []))` with `(Thread. #(.close ^java.sql.Connection @db))`. The current hook body `#(fn [])` is a function that merely returns another function, so it does nothing. See server.clj line 34 for the correct pattern.","why_it_matters":"The DuckDB JDBC connection accumulates in-process state and file handles. If the JVM exits without closing it, temp files leak and the next instance may encounter lock contention or a corrupted WAL.","evidence":["(.addShutdownHook (Runtime/getRuntime)","(Thread. #(fn [])))"]},{"title":"Result sets never closed in query-scalar and query-rows \u2014 resource leak","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":54,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Wrap stmt and rs in with-open, e.g. `(with-open [stmt (.createStatement conn) rs (.executeQuery stmt sql)] (when (.next rs) (.getObject rs 1)))`. The same fix applies to query-rows at line 61.","why_it_matters":"DuckDB JDBC holds open cursor resources until the ResultSet is explicitly closed. Under sustained load this exhausts internal DuckDB memory and file handles, eventually crashing the process.","evidence":["(let [stmt (.createStatement conn)"," rs (.executeQuery stmt sql)]"," (when (.next rs)"," (.getObject rs 1)))))"]},{"title":"sales_orders_new.clj uses undefined variables \u2014 will throw at runtime","severity":"P0","file":"src/clj/auto_ap/datomic/sales_orders_new.clj","line":157,"confidence":100,"autofix_class":"manual","owner":"human","requires_verification":true,"pre_existing":false,"suggested_fix":"The raw-graphql-ids function threads with `cond->> nil ... (pq/get-sales-orders ...)` but then references `result` on line 165, which is never bound. get-graphql at line 209 likewise references the undefined `id-keys` and `matching-count`. Rewrite the threading chain to bind results explicitly (e.g. in a `let`) before they are referenced.","why_it_matters":"This file defines ns auto-ap.datomic.sales-orders \u2014 the same namespace as the working file. If the classpath loads this after the original, ALL sales order queries break with a CompilerException/NullPointerException.","evidence":["(when-let [rows (:rows result)] ;; `result` is NOT bound","matching-count ;; undefined in get-graphql","(graphql-results ids id-keys args) ;; `id-keys` undefined"]},{"title":"sales_orders_new.clj build-where-clause binds empty vector; missing clojure.string require \u2014 won't compile","severity":"P0","file":"src/clj/auto_ap/datomic/sales_orders_new.clj","line":41,"confidence":100,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Add `[clojure.string :as str]` to the :require vector. Remove the dead `(let [clauses [] ...])` binding \u2014 the body builds the SQL correctly, but the unused binding signals confusion.","why_it_matters":"The namespace requires clojure.data.json and clojure.java.io but NOT clojure.string, yet uses str/join on line 52. The file will fail to compile/load.","evidence":["(ns auto-ap.datomic.sales-orders"," (:require [auto-ap.datomic :refer [conn]]"," [auto-ap.storage.parquet :as pq]"," [clojure.data.json :as json]"," [clojure.java.io :as io])) ;; no clojure.string!","(str/join \" AND \"))})) ;; line 52 \u2014 unbound var"]},{"title":"sales_orders_new.clj and sales_orders.clj define the same namespace \u2014 guaranteed collision","severity":"P0","file":"src/clj/auto_ap/datomic/sales_orders_new.clj","line":1,"confidence":100,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Either rename the new file's namespace to `auto-ap.datomic.sales-orders-v2` and update all callers, OR delete sales_orders.clj once the new version is verified. Do not ship both.","why_it_matters":"Depending on classpath order, either the working old file or the broken new file loads. Behavior is non-deterministic across environments.","evidence":[";; sales_orders_new.clj","(ns auto-ap.datomic.sales-orders ;; SAME as sales_orders.clj!",";; sales_orders.clj","(ns auto-ap.datomic.sales-orders"]},{"title":"safe-cleanup-all destructures months incorrectly \u2014 `[_ y m]` skips year, passes nil","severity":"P0","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":199,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Change `[[_ y m]]` to `[[y m]]`. collect-all-months returns a sorted vector of 2-element [year month] vectors, so destructuring as [_ y m] binds _ to the year, y to the month, and m to nil.","why_it_matters":"The safety check that prevents data loss (verifying S3 before deleting from Datomic) receives (nil nil) for every month. The verify function either crashes or produces wrong dates and deletes unverified months.","evidence":["(defn- collect-all-months [conn]"," ;; returns: [[2024 1] [2024 3] [2024 4] ...]"," (sort (keys grouped)))","","(doseq [[_ y m] months]"," (let [result (verify-month-in-s3? y m)"]},{"title":"build-where-clause in parquet.clj has unsanitized SQL string interpolation \u2014 injection risk","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":233,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Use DuckDB parameter binding via a PreparedStatement with `?` placeholders, or at minimum escape single quotes by doubling them. User-supplied :client, :vendor, and :location values are interpolated directly into the SQL.","why_it_matters":"If the filter values come from GraphQL query parameters or API inputs, an attacker can close the string literal and inject arbitrary SQL. Even without malice, any entity name containing an apostrophe crashes queries.","evidence":["(when v"," (str env \" = '\" v \"'\"))))"]},{"title":"sales_orders_new.clj: summarize-orders calls private function from another namespace \u2014 fragile compile dependency","severity":"P1","file":"src/clj/auto_ap/datomic/sales_orders_new.clj","line":190,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":true,"pre_existing":false,"suggested_fix":"Replace (#'auto-ap.datomic/aggregate-sum ids) with a public function or inline the query. Indirect private-function calls via #' bypass the privacy check and break on namespace reload.","why_it_matters":"This works only when both namespaces load in exactly the right order. Hot-reload or REPL evaluation can reorder loads and silently break aggregation.","evidence":["(#'auto-ap.datomic/aggregate-sum ids) ;; uses dc/q internally"]},{"title":"parquet_test.clj: 5 tests for 297 lines \u2014 no coverage of flush, S3, WAL recovery, or query layer","severity":"P1","file":"test/clj/auto_ap/storage/parquet_test.clj","line":1,"confidence":100,"autofix_class":"advisory","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Add tests for: (a) flush-to-parquet! with a mock DuckDB writing to a local path, (b) load-unflushed! reading from seeded WAL JSONL files, (c) parquet-query URL construction, (d) build-where-clause sanitization, (e) query-deduped with duplicate _seq_no records, (f) error propagation in flush.","why_it_matters":"The 297 lines of storage.parquet implement buffered writes to S3 with WAL recovery \u2014 the most data-critical path in this PR. Without tests on flush and WAL round-tripping, a single typo can silently corrupt production data.","evidence":["(deftest test-query-scalar"," (testing \"SELECT 1 returns 1\""," (is (= 1 (p/query-scalar \"SELECT 1\")))))","",";; Only 5 tests total \u2014 all trivial. No flush/WAL/query/S3 coverage."]},{"title":"query-deduped hardcodes sales_order.external_id partition \u2014 breaks for non-order entities","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":277,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Parameterize the partition column with a dedup-key argument. The current implementation always partitions by sales_order.external_id, which does not exist in the charge/line-item/sales-refund schemas.","why_it_matters":"Calling (query-deduped \"charge\" ...) reads charges but deduplicates on a nonexistent column. DuckDB will throw or return garbage.","evidence":["(defn query-deduped [entity-type start-date end-date]"," ... QUALIFY ROW_NUMBER() OVER"," (PARTITION BY sales_order.external_id"," ORDER BY _seq_no DESC) = 1"]},{"title":"flush-to-parquet! throws on S3 failure \u2014 callers cannot distinguish success from partial failure","severity":"P1","file":"src/clj/auto_ap/storage/parquet.clj","line":138,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":true,"pre_existing":false,"suggested_fix":"Return a consistent result map with :status and :error keys instead of sometimes returning {:status :ok}, sometimes {:status :no-records}, and sometimes throwing.","why_it_matters":"If one entity type fails to flush during migration (e.g. S3 throttling), the exception aborts the entire day with no partial-progress recovery. A structured return value lets callers retry or skip selectively.","evidence":["{:key s3-key :status :ok}","(catch Exception e"," (throw (ex-info \"Flush failed\" {:entity-type entity-type}"," :error (.getMessage e)))))))))"]},{"title":"sales_to_parquet.clj uses global *buffers* \u2014 migration pollutes live application data","severity":"P1","file":"src/clj/auto_ap/migration/sales_to_parquet.clj","line":160,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Create a local buffer atom for the migration process rather than using the global p/*buffers*. A user request during migration could get buffered alongside historical data and flushed to the wrong date.","why_it_matters":"Running migrate-all while the app serves traffic means new sales orders can end up in the wrong parquet file. The WAL mechanism does not separate migration traffic from live traffic.","evidence":["(doseq [r @flat]"," (p/buffer! (:entity-type r) r)) ;; writes to global buffer","",";; In parquet.clj:","(defonce *buffers* (atom {}))"]},{"title":"test: no coverage for error paths \u2014 WAL corruption, S3 failure, empty DuckDB responses","severity":"P1","file":"test/clj/auto_ap/storage/parquet_test.clj","line":1,"confidence":100,"autofix_class":"advisory","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Add tests for: (a) malformed JSON in the WAL \u2014 verify load-unflushed! skips bad lines, (b) an S3 upload stub that throws \u2014 verify flush-to-parquet! throws ex-info with context, (c) query-rows returning [] for empty results, not nil.","why_it_matters":"Error paths are where data loss happens: corrupted WAL files, partial flushes, DuckDB crashes. Without tests for these, recovery behavior is unverified.","evidence":["(ns auto-ap.storage.parquet-test"," (:require ...))","",";; Only happy-path tests for query-scalar, buffer!, clear-buffer!, date-seq"]},{"title":"sales_summaries.clj: get-fees defined twice with identical bodies \u2014 dead private version","severity":"P2","file":"src/clj/auto_ap/jobs/sales_summaries.clj","line":148,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Delete the private (defn- get-fees ...) at line 148. The public (defn get-fees ...) at line 193 shadows it.","why_it_matters":"Dead code that will never be called. If someone modifies one copy but not the other, behavior diverges silently.","evidence":["(defn- get-fees [c date] ;; line 148 \u2014 shadowed"," (when-let [fee (get-fee c date)]","","(defn get-fees [c date] ;; line 193 \u2014 the one that runs"," (when-let [fee (get-fee c date)]"]},{"title":"Entity-type array duplicated across 5+ locations with no single source of truth","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":158,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Define a single ENTITY-TYPES def in auto-ap.storage.parquet and require it from the migration/ and jobs/ namespaces.","why_it_matters":"The same 4-item vector is hardcoded at parquet.clj (2x), sales_to_parquet.clj (2x), and cleanup_sales.clj (1x). Adding or renaming an entity type requires editing 5+ locations \u2014 easy to miss one and get silent data gaps.","evidence":["(let [etypes [\"sales-order\" \"charge\""," \"line-item\" \"sales-refund\"]","; parquet.clj:158, :172 | sales_to_parquet.clj:162, :182 | cleanup_sales.clj:95"]},{"title":"object-exists? downloads entire S3 object to check existence \u2014 10\u2013100x slower than head-object","severity":"P2","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":113,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":true,"pre_existing":false,"suggested_fix":"Replace s3/get-object with s3/head-object. head-object returns only metadata (no body) and is orders of magnitude faster.","why_it_matters":"safe-cleanup-all calls this per day \u00d7 entity type across all months. For 2+ years of data that is ~3000 S3 GET requests, each potentially pulling megabytes.","evidence":["(defn- object-exists?"," (try"," (s3/get-object {:bucket-name pq/*bucket*"," :key key})"," true"]},{"title":"DRY-RUN? is a mutable var toggled at runtime \u2014 not thread-safe, races on concurrent calls","severity":"P2","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":8,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Pass dry-run? as a boolean parameter through the function chain. If keeping the var for REPL convenience, document that it only works in single-threaded contexts.","why_it_matters":"alter-var-root on a state-bearing boolean means concurrent invocations race. A second call to set-dry-run! during a cleanup can cause irreversible data deletion from Datomic.","evidence":["(def ^:private DRY-RUN? true)","","(defn- set-dry-run! [v]"," (alter-var-root #'DRY-RUN? (constantly v)))"]},{"title":"sales_to_parquet.clj: docstring lists public functions that are actually private","severity":"P2","file":"src/clj/auto_ap/migration/sales_to_parquet.clj","line":8,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Either make write-day-by-day and write-dead-letter public (defn instead of defn-), OR update the docstring examples to reference only actually-callable entry points.","why_it_matters":"The namespace docstring is the migration operator handbook. If operators follow the instructions and call (write-day-by-day ...) from the REPL, it fails with a var-is-not-public error.","evidence":[" Usage:"," (migrate-all)"," (write-day-by-day \"2024-01-01\" \"2024-03-31\") ; documented but private!"," (write-dead-letter [flat]) ; documented but private!","","(defn- write-day-by-day ;; actual definition \u2014 private"]},{"title":"sales_to_parquet.clj: flush-all-types return value is inverted \u2014 negative number means success","severity":"P2","file":"src/clj/auto_ap/migration/sales_to_parquet.clj","line":190,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Capture the record count before the flush and count :ok results. The code currently computes (- (p/total-buf-count) start) AFTER the flush, so success returns a negative number and total failure returns 0.","why_it_matters":"The return value is currently unused, but if future code depends on it for progress reporting, the inverted sign means a negative number indicates success and zero means everything failed.","evidence":["(defn- flush-all-types []"," (let [... start (p/total-buf-count)]"," ..."," {:records-flush (- (p/total-buf-count) start)}))"]},{"title":"sales_summaries.clj: old Datomic-query functions left alongside parquet versions \u2014 return nil/zero post-migration","severity":"P2","file":"src/clj/auto_ap/jobs/sales_summaries.clj","line":201,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Delete get-tax, get-tip, get-sales, and get-returns (the Datomic-query versions). The parquet readers (get-tax-parquet etc.) are the correct post-migration path.","why_it_matters":"The old functions reference :sales-order/tax attributes that cleanup retracts. Any accidental call post-migration returns nil/zero instead of triggering an obvious error.","evidence":["(defn- get-tax [c date] ;; Datomic version \u2014 dead after migration"," ... (dc/q '[:find (sum ?tax)...]","","(defn- get-tax-parquet [c date] ;; Parquet version \u2014 correct path"]},{"title":"query-by-entity-id loads entire date range then filters in-memory \u2014 defeats parquet partitioning","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":286,"confidence":75,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Push the external_id filter into the SQL WHERE clause. Loading all rows for a date range into memory just to find one by ID negates DuckDB's filtering.","why_it_matters":"For a day with 10k orders, this reads and materializes 10k rows in memory just to return one. With larger datasets the added latency is proportional to total daily volume.","evidence":["(defn query-by-entity-id [entity-type external-id"," start-date end-date]"," (->> (query-deduped entity-type start-date end-date)"," (filter #(= (:external_id %)"," (name external-id)))"," first))"]},{"title":"buffer! uses wall-clock milliseconds as sequence number \u2014 collision possible under load","severity":"P2","file":"src/clj/auto_ap/storage/parquet.clj","line":102,"confidence":50,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Use an atom-based counter, e.g. `(defonce *seq-counter* (atom 0))` with `(swap! *seq-counter* inc)`, or System/nanoTime. Under concurrent buffering, two records in the same millisecond get identical _seq_no values.","why_it_matters":"query-deduped relies on the highest _seq_no winning for duplicate detection. If two records share a sequence number, the ROW_NUMBER() ordering is non-deterministic.","evidence":["(let [seq-no (System/currentTimeMillis)"," entry (assoc record :_seq-no seq-no)]"]},{"title":"cleanup_sales.clj: group-orders-by-month creates one Datomic entity per order \u2014 O(n) sequential lookups","severity":"P2","file":"src/clj/auto_ap/migration/cleanup_sales.clj","line":83,"confidence":50,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Replace the (d-api/entity db eid) call inside the reduce with a single batch pull or Datalog query returning [eid day-value] pairs.","why_it_matters":"For 50k+ orders this means 50k+ separate entity lookups hitting the Datomic DBv2 log one at a time. A single batch query reduces this to one pass.","evidence":["(reduce (fn [acc eid]"," (when-let [day-val (:sales-order/day-value"," (d-api/entity db eid))]"]},{"title":"perf_test.clj runs unconditionally at load time \u2014 pollutes any test/CI run","severity":"P2","file":"test/clj/auto_ap/storage/perf_test.clj","line":111,"confidence":100,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Move (run-perf-tests) into a deftest or wrap it in (comment ...). Loading this namespace triggers 100k-row generation and an S3 upload on every lein test invocation.","why_it_matters":"CI/CD pipelines that compile all namespaces will hang. It also requires live S3 credentials just to load the namespace.","evidence":["(run-perf-tests)","(println \"\\n=== Done ===\")"]},{"title":"date-seq duplicated in parquet.clj and sales_to_parquet.clj","severity":"P3","file":"src/clj/auto_ap/migration/sales_to_parquet.clj","line":132,"confidence":75,"autofix_class":"safe_auto","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Remove the private date-seq in sales_to_parquet.clj and use (p/date-seq start end) from parquet, which already exposes this publicly.","why_it_matters":"Duplicate utility logic is a maintenance tax: fixing a bug requires finding both copies.","evidence":["; sales_to_parquet.clj:132:","(defn- date-seq [start end]","; parquet.clj:203:","(defn date-seq [start end]"]},{"title":"sales_summaries.clj: large comment blocks with ad-hoc queries should live in test files","severity":"P3","file":"src/clj/auto_ap/jobs/sales_summaries.clj","line":311,"confidence":75,"autofix_class":"advisory","owner":"review-fixer","requires_verification":false,"pre_existing":false,"suggested_fix":"Move the (comment ...) blocks at lines 311-379 into a dedicated test namespace or a REPL scratch file.","why_it_matters":"Comment blocks in production namespaces become maintenance traps \u2014 future developers copy-paste from them after the underlying schema has evolved.","evidence":["(comment"," ;; TODO: Move to test file or proper location"]},{"title":"buffer! writes to the WAL file on every call \u2014 no batching, full disk I/O per record","severity":"P3","file":"src/clj/auto_ap/storage/parquet.clj","line":105,"confidence":50,"autofix_class":"advisory","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Batch WAL writes or use a background thread that flushes periodically. The current design opens and appends to the JSONL file on every buffer! call.","why_it_matters":"Under high-throughput ingestion this means hundreds of file-open/write/close cycles per second.","evidence":["(with-open [w (io/writer wal-file :append true)]"," (.write w ^String (json/write-str {:seq-no seq-no"]},{"title":"Naming inconsistency: total-buf-count vs buffer-count vs get-unflushed-count \u2014 three names for similar metrics","severity":"P3","file":"src/clj/auto_ap/storage/parquet.clj","line":123,"confidence":25,"autofix_class":"manual","owner":"human","requires_verification":false,"pre_existing":false,"suggested_fix":"Unify to buffer-count (per entity type) and total-buffer-count (across all types). Remove get-unflushed-count, as it delegates with no added value.","why_it_matters":"Three names for conceptually the same metric force callers to look up which function does what. Inconsistent abbreviation signals ad-hoc naming.","evidence":["(defn buffer-count [entity-type] ;; per-type","(defn total-buf-count [] ;; abbreviated","(defn get-unflushed-count [] ;; delegates"]}],"residual_risks":["No test for WAL file permission errors \u2014 if parquet-wal/ is not writable, buffer! silently logs and drops records into the memory-only atom, creating false confidence","The DuckDB httpfs extension requires valid AWS credentials. If S3 access keys rotate or the IAM role changes, all queries fail with cryptic JDBC errors \u2014 no health check exists to detect this","The migration script batch-size defaults to 100, but pull-sales-order-data uses dc/pull-many with no result limiting. A day with 50k orders creates one enormous in-memory data structure","cleanup_sales.clj DRY-RUN? defaults to true \u2014 if someone forgets to toggle it off, cleanup-all does nothing and they assume data was deleted; the opposite risk applies if it is toggled on accidentally","No monitoring or alerting for unflushed buffer growth. If flush-to-parquet! silently fails (the WAL catches records but the S3 upload deadlocks), buffers grow until OOM"],"testing_gaps":["No integration test for the full buffer -> flush -> query round trip: write via buffer!, flush with flush-to-parquet!, read back with get-sales-orders, verify data integrity","No test for the WAL recovery path: seed a WAL JSONL file, call load-unflushed!, verify *buffers* is populated, then flush and verify S3 receives the recovered records","No test for date-seq edge cases: start=end, reversed dates (end before start), empty range","No test for parquet-query multi-day URL expansion \u2014 verify all expected files are included in the read_parquet([...]) call","No test for build-where-clause SQL injection resistance: single quotes, semicolons, SQL keywords in input values","No test for query-deduped deduplication with records sharing an external_id but different _seq_no values","No test for flush-to-parquet! when the S3 upload partially fails \u2014 verify the buffer is NOT cleared so a retry can recover","No test for object-exists? handling: missing objects return false, access-denied does not throw an uncaught exception","No unit tests for migration/sales_to_parquet.clj \u2014 the core migration logic has zero automated verification against Datomic fixtures","No unit tests for the cleanup_sales.clj S3 verification path \u2014 verify-month-in-s3? with partial coverage returns {:ok false :missing [...]} but this is never tested","No test coverage for any function in sales_orders_new.clj \u2014 it is entirely untested and also broken (undefined vars, missing requires)","No test for get-sales-orders pagination correctness: offset + limit interaction, count accuracy vs rows length"]}