Files
integreat/.claude/skills/ssr-form-migration/reference/gotchas.md
Bryce c5dc305854 refactor(ssr): Phase 9 — migrate New/Edit Vendor onto the engine (5-step wizard)
A five-step linear wizard (info → terms → account → address → legal) plus a
separate Merge dialog, migrated off mm/* + form-cursor + the EDN snapshot onto the
session-backed engine (wizard2), following the Phase 8 template.

Latent bug found + fixed: the old "Next" PUT /admin/vendor/navigat carried a
[:map [:db/id entity-id]] route-schema on a route with no :db/id path param, so empty
route-params {} → main-transformer's parse-empty-as-nil → nil → 500 on every advance
(the same quirk as Phase 8's query-params, now via route-params). The engine's submit
is a POST with no such schema; the dead navigate route is deleted.

What changed:
- defrecord 5 → 0 (InfoModal/TermsModal/AccountModal/AddressModal/LegalEntityModal +
  VendorWizard), mm/ 0, fc/ cursor refs 0 (wizard AND the de-cursored Merge dialog),
  step-params[…] 0.
- 5 de-cursored step renders (plain data + path->name2 + a *errors* binding); the 3
  repeated grids became add-row-handler + a blank-row row render; the timeline is
  preserved as a per-step side panel.
- :init-fn branches new (empty) vs edit (entity split across the 5 steps' :init-data,
  seeded as per-step step-data so edit opens populated); per-step :validate via
  mc/validate + me/humanize replaces wrap-ensure-step; vendor-step wraps
  handle-step-submit in try+ to surface create-time validation as a 4xx.

Two new gotchas found + fixed + documented:
- empty-step decode: an all-blank step collapses to nil (parse-empty-as-nil), which a
  schema :validate rejects as "invalid type"; decode-with coerces nil → {} so optional-
  only steps advance while required-field steps still fail on the missing key.
- blank nested entity: an untouched Address (all-nil, no :db/id) makes :upsert-entity
  mint a tempid used only as value (datomic error); blank-address? drops it.

Verification: full e2e suite 65/65 (61 prior + 4 new: info renders + timeline; create
across all 5 steps persists; edit opens prefilled and a rename persists; a too-short
name blocks advancing). Create + edit confirmed at the REPL incl. the cookie-session
EDN round-trip. Skill fed (scorecard Phase 9; gotchas for both new traps).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 22:39:10 -07:00

26 KiB
Raw Blame History

Gotchas

GROWS every migration. One entry per surprise. Also the home for any written exception to the scorecard ratchet (a metric that regressed for a documented reason).


Stale $refs / tippy after a swap

A whole-form swap can run an Alpine event handler before the component re-initialises, so a handler that dereferences $refs.input.__x_tippy or calls tippy.show() throws. Always null-guard: $refs.input?.__x_tippy?.hide(), tippy?.show(). The transaction-edit-swap.spec.ts trackErrors() helper fails the test on any pageerror or console.error, which is exactly how a stale-ref throw surfaces.

Let the server value win — don't preserve Alpine state across a server-driven change

When a server change should update a component (e.g. choosing a vendor sets its default account), rebuild that section fresh on the swap so the server-provided value lands without keying tricks. The bug this prevents: "changing the vendor a second time doesn't update the account" because preserved Alpine state shadowed the new server value. If you must preserve a component, key it by value so a change forces re-init: (assoc attrs :key (str id "--" current-value)).

Focus dies if the typed input is inside its own swapped region

The single most important invariant. Amount field → swap a sibling tbody, not the row. Memo → swap nothing. If a caret test (sameNode) fails, the input is in its own swap region — re-target to a sibling/ancestor that excludes it.

Faked cursors breed duplicate render fns

A with-cursor/MapCursor re-root to fake a deep start forces a *-no-cursor* twin. Removing the fake lets you delete the twin. Don't "fix" a faked cursor in place — top-root it and collapse to one render fn. (See render-functions.md.)

Edit Clojure with clojure-mcp tools, not the file editor

clojure_edit / clojure_edit_replace_sexp. If a file won't compile: clj-paren-repair the file, then retry; if still broken, lein cljfmt check. Run tests via clojure-eval / clj-nrepl-eval -p PORT, never lein test (slow, last resort).

Solr/typeahead in tests

Account/vendor search is backed by Solr, unavailable in tests. To drive a typeahead in e2e: type under the 3-char threshold, then inject a result into Alpine state (Alpine.$data(el).elements = [{value, label}]) and click it — the real click handler, tippy.hide(), Alpine reactivity, and the HTMX swap all run as in production. Entity ids come from GET /test-info.


UI-only control fields must be stripped before a Datomic upsert

The wizard snapshot/step-params carry UI control fields that are not schema attributes — :action, :amount-mode, and (added by the simple/advanced work) :mode. The :manual save handler stripped :action/:amount-mode but not :mode, so every advanced manual save passed :mode "advanced" into :upsert-transaction and 500'd with :db.error/not-an-entity :mode. Lesson: when a save derives its tx-data from the form snapshot, strip every non-schema control key before transacting. The session-backed wizard engine (Phase 6) avoids this class of bug by storing per-step validated data only — UI control fields never enter the combined data. This was a real production bug surfaced by the e2e gate, not a test artifact.

E2E helpers must use the Alpine v3 API, not the v2 __x internal

The app loads Alpine v3 (cdn.jsdelivr.net/npm/alpinejs@3.x.x). The v2 internal el.__x.$data is goneel.__x is undefined, so any helper that pokes it silently no-ops. A stale selectAccountFromTypeahead did this and left the posted account empty (account-controlled by x-model, so the raw DOM .value you set is overwritten from Alpine's empty state). Drive components the real way instead: window.Alpine.$data(el), open the tippy dropdown, inject elements, click the result — exactly as transaction-edit-swap.spec.ts does. Probe with { hasLegacy__x: !!el.__x, hasAlpineData: !!window.Alpine.$data(el) }.

Diagnosing a "modal won't close after save"

The edit modal closes on an hx-trigger: modalclose from a successful save; a validation failure re-renders the #wizard-form (200), and a server exception returns 500 (caught by wrap-error). To find which: capture POST responses in Playwright (page.on('response', …)), read the edit-submit body — a <form id="wizard-form"> means validation re-render; a #error {…} stack means a 500. Then serialize the form right before save (new FormData(document.querySelector('#wizard-form'))) to see exactly what posts. This is how the :mode 500 and the empty-account bugs above were isolated.

De-faking a cursor is not a drop-in — with-field-default mutates

Tempting fix for a faked deep cursor (with-cursor + synthetic MapCursor at index 0): replace it with (fc/with-field-default 0 {}) to advance naturally. It broke the simple-mode swap (transaction-edit-swap test 1 threw). with-field-default calls cursor/transact! — it mutates the form cursor (assoc-ing the default row) as a render side effect, which changes simple-mode behavior. The read-only synthetic MapCursor did not. Lesson: removing a faked cursor on these modals is not a one-liner — it's part of the larger render-fn extraction (render the row from explicit data, construct field names directly, look up errors explicitly), done when the simple/advanced rows are reworked into pure render fns / Selmer. Don't swap one cursor primitive for another and assume parity; verify against the swap spec, and expect the de-fake to come with the render-fn rewrite.

Snapshot operations read stale state and drop live form values (heuristic 2)

The whole-form operation handlers (apply-new-account, apply-remove-account, apply-toggle-amount-mode) rebuild the account rows from the decoded :snapshot (the hidden EDN field), not from the live posted :step-params. So any value the user has typed but that hasn't been re-serialised into the snapshot yet — e.g. an amount typed right before clicking "New account" — is silently lost when the operation re-renders. This is the snapshot round-trip fragility the migration removes (heuristic 2: → 0 merges; state should ride in the form, not a parallel snapshot). It bit the percentage-split e2e: typing 50% then adding a row reverted the first row to its snapshot value, yielding a 66.67/33.33 split. Two ways it shows up and how to handle until the snapshot is gone:

Fixed (Stage 1): the operation handlers read the live :step-params rows (already schema-decoded by mm/wrap-wizard) so typed values survive add/remove/toggle.

Done (Stage 2 — the snapshot round-trip is gone). The EDN snapshot hidden field + custom readers + merge-multi-form-state are removed. A db/id hidden rides in the form; wrap-derive-state rebuilds :multi-form-state per request from entity step-params, and EditWizard.render-wizard renders a plain form (no snapshot/edit-path/current-step hiddens). The ~34 :snapshot reads still work — :snapshot is now a derived map, not a round-tripped blob.

Trap that cost hours — derive entity step-params correctly. First cut was (merge base step-params). Bug: base always carries the entity's persisted accounts, so after the user removes every row (step-params has no accounts key) the merge falls back to base → the persisted accounts resurrect on the next operation. Fix: editable fields (accounts, vendor, memo, approval, action, mode, amount-mode) come only from the live form (absent = cleared); only entity-only fields (db/id, client, amount, description, status, type) come from the entity. Lesson: with a posted form, "field absent" means cleared, not "use the persisted value" — never merge the entity's editable fields back in.

Verify the snapshot removal on a FRESH server, and don't trust a long-lived in-process test server. Protocol/defrecord (EditWizard.render-wizard) and middleware reloads do not fully take in a running REPL — the server kept rendering the old snapshot field after :reload, and an in-process server that isn't reseeded between npx playwright invocations accumulates state that makes order-dependent tests flake. Both produced hours of phantom failures. Restart the REPL clean (or reseed) before trusting an e2e result; CI boots a fresh server per run, so the fresh-server number (38 pass / 1 unrelated) is the real one.

Characterization tests rot against table order and removed wizard chrome

Two stale-test traps surfaced once the masking failure was fixed (a mode: 'serial' file hides every test after the first failure, so fixing one unmasks the next):

  • Hard-coded amounts per table row index (openEditModal(page, 3) then expect(amount).toBeCloseTo(400)) break because same-date seed transactions have no pinned row order. Read the actual value (e.g. the grid's .account-grand-total-row) instead of hard-coding.
  • Helpers that navigate the old multi-step wizard (click('button:has-text("Transaction Actions")')) hang once the modal is single-page. Drop the navigation; the action tabs are present immediately.

Flat decode leaks stray form fields into the saved entity (the method 500)

Dropping the wizard's step-params[...] field-name prefix and decoding posted params straight into the form schema means the decode now captures every posted field, not just the namespaced ones. A single stray field breaks the save:

  • The tab switcher is (com/button-group {:name "method"} …), which emits <input type="hidden" name="method">. Under the wizard, method lived outside step-params[...] so it never entered the decoded map. After the rename it decodes to :method "" (malli :map is open and passes unknown keys), rides into snapshottx-data, and :upsert-transaction rejects it → HTTP 500 on save.
  • Symptom: the save POST fires (confirm with a println in the submit handler) but the modal never closes, because the 500 trips htmx:response-error. The server error may go to mulog, not stdout — an empty stdout log does not mean "no error." Reproduce the exact POST with curl (add/remove one field) to isolate the offender fast.

Fix: strip the decoded map to the schema's known top-level keys before threading on (select-keys decoded edit-form-keys); keep that allowlist next to the schema. Nested account sub-maps decode fine — only the top level needs the guard.

REPL reload does not refresh a running jetty's routes — restart the JVM

handler/match->handler-lookup is a top-level def capturing (merge ssr/key->handler …) at load, through a chain of module-level defs (editssr.transactionssr.corehandler). Reloading the leaf edit.clj updates it but not the captured merges, and a jetty started (run-jetty app …) holds a static app that doesn't re-deref the lookup per request. Net: after a handler/route/record change, an already-running dev server keeps serving the old code — curl shows the pre-change response (e.g. the old wizard transitioner) while your REPL renders the new one. Restart in a fresh JVM for route/record/middleware changes. For e2e, the Playwright test server (lein run -m auto-ap.test-server) is a fresh JVM compiling from disk — but kill any stale :3333 first (reuseExistingServer reuses it), and kill by port (ss -tlnp | grep :3333), never pkill -f test-server (it matches its own command line).

Full-suite e2e flakes are shared-seed interference

The test server seeds once at boot; edit tests save (mutate) those seed transactions. Run in parallel, workers race the same rows and earlier saves pollute later reads → phantom failures that pass in isolation.

Proper fix (landed on staging, adopted at the rebase): a /test-reset endpoint (test_server.cljreset-test-data! recreates + re-seeds the in-memory db) called from a test.beforeEach in each spec, plus fullyParallel: false + workers: 1 in playwright.config.ts. Every test starts from the same deterministic dataset regardless of run order. This supersedes the earlier --workers=1-only workaround (which kept order dependence; it merely serialized the races instead of eliminating cross-test state). Post-adoption baseline is 39 pass / 0 fail — the previously-flaky transaction-navigation.spec.ts date-range test is now green, because /test-reset removes the residual mutation it was tripping over.

A value-bound typeahead hidden goes stale across a whole-form swap unless keyed

A typeahead (sc/typeahead) posts its value through a hidden <input :value="value.value"> whose DOM .value is set by Alpine, not by the server-rendered static value attr. After a whole-form outerHTML swap that re-renders the typeahead, Alpine may preserve the previous component's empty .value instead of binding the new server value — so the field posts blank on the next submit. Fix: pass :id to sc/typeahead (the account typeahead already does). :id makes the wrapper emit :key (str id "--" value), and the value-keyed :key forces a clean Alpine re-init that lands the server value. The bulk-code vendor typeahead hit this (account rows didn't, because they pass :id) — symptom: "vendor not preserved on a validation re-render." Note the testing trap: reading the hidden's .value in isolation (inputValue() / toHaveValue) is an unreliable probe — it lags Alpine. Assert what the form actually posts instead: new FormData(form).get('vendor') (wrap in expect.poll).

Round-trip a multi-row selection as ids[], not as an EDN/filter snapshot

A bulk modal acts on a selection of N entities (bulk-code: the checked transactions), the analog of a single modal's one db/id. The wizard stashed the whole search-params blob (filters

  • selected + all-selected) in the EDN snapshot and re-ran the filter query on every post. Don't carry that forward. Instead resolve the selection to a concrete id vector once at open (selected->ids → the not-locked set) and ride it back in hidden ids[0..n] fields; re-read it on each post ([:vector {:coerce? true} entity-id] + the coerce-vector transformer turns the {"0" "123"} index-map into [123]). No snapshot, no filter round-trip, and it's more correct — you code exactly the rows the user saw, immune to data changing between open and submit. This is heuristic 2 → 0 for a multi-select modal.

No parity gate? Build one first — seed + characterization spec, before touching code

A modal with no e2e coverage (and no test-server seed for its domain) cannot be migrated safely — "behavior parity is proven by tests, not by reading" is the skill's #1 non-negotiable. Phase 4 (POS Sales Summary) had zero coverage. The fix: (1) seed a representative entity in test_server.clj's seed-test-data and surface its id via /test-info; (2) write a characterization spec against the unmodified modal and confirm it green; (3) commit the gate separately, ahead of the rewrite. Reach the modal the real way (grid → row's edit button), not a direct fragment URL. To discover the actual rendered structure (field names, ids, swap targets) — especially when the code has dead/buggy render fns — dump the live modal HTML with a throwaway spec first; assert against what renders, not what the code looks like.

Characterize before you fix; never assert a bug as working

Writing the gate often surfaces pre-existing bugs (Phase 4: a "New Summary Item" button that threw newRowIndex is not defined, and a totals display whose malformed Hiccup discarded its own labels). Do not assert the broken behavior as if it works, and do not silently "fix" it mid-refactor — surface it and let the user decide fix-vs-preserve. If they choose fix: the spec first documents the break (a passing test of the current inert behavior or an explicit note), then is rewritten to assert the fixed behavior as part of the migration commit.

htmx keyup-triggered inputs need real keystrokes in tests

A money/text input wired hx-trigger="keyup changed delay:300ms" does not fire on Playwright .fill() + dispatchEvent('change')fill sets the value without keyup events. Use .click() then .pressSequentially('500') (types char-by-char, firing keyup) so the targeted swap actually triggers. (A change-triggered control is the opposite — dispatchEvent('change') is fine there.)

clojure-mcp structural edits reformat the whole file — use text Edit in big shared files

clojure_edit / clojure_edit_replace_sexp re-emit the entire file through the formatter. In a small single-modal file that's fine (cljfmt-clean output). In a large multi-modal file (Phase 5: invoices.clj, 1812 lines) a one-line require addition produced a 650-line spurious whitespace diff that buries the real change and makes review impossible. For a surgical migration inside a big shared file, use the text-based Edit tool (exact-string match — no reformat); this is the AGENTS.md "edit Clojure with file tools only when absolutely necessary" carve-out. Verify with load-file (compile) + lein cljfmt check, not by eyeballing. Confirm the diff is contained with git diff -U0 <file> | grep '^@@' — the hunks should cluster only where you edited (requires + the modal region), nothing else.

Wiring a modal onto the wizard2 engine — use the engine's primitives, don't re-roll them

Phase 6's first migration (Transaction Rule) hit three traps; an adversarial review pointed out the engine had the information to prevent all three, so the engine now absorbs them. A consumer is just a config map + the step :render fns — reach for these instead of re-implementing them (and re-hitting the bug):

  • Nav fields are stripped for you. handle-step-submit dissocs its own wizard-id/current-step/direction from :form-params before calling a step's :decode (wizard2.clj), so your decode sees only real fields and they can't ride into the saved entity. (The old failure was a 500 on save:db.error/not-an-entity :current-step — because an open :map decode kept them. No allowlist needed anymore.)
  • wizard2/open-wizard owns the modal wrap. Give the config an :open-response fn (e.g. (fn [form] (modal-response [:div#transitioner.flex-1 form]))); then the new/edit routes are literally (partial wizard2/open-wizard config). Don't hand-roll create!/render/wrap/thread — that boilerplate was duplicating engine internals.
  • Add rows with wizard2/blank-row. It supplies a temp :db/id (so a row schema requiring [:db/id [:or entity-id temp-id]] validates and the step actually advances — the old symptom was "the Next/Test button does nothing") plus :new? for the appear animation: (wizard2/blank-row :foo/location "Shared").
  • Footer with wizard2/nav-footer. It emits the direction submit buttons (Back / primary advance / Save), marks the advance/save button data-primary, and the form's Enter guard (wizard2/wizard-form) triggers data-primary — so Enter and Back/Save aren't left to per-consumer convention. (Testing note that survives: Back and Save are both type=submit, so target a save button by its text, not button[type=submit].)

Scorecard exceptions (ratchet violations with a reason)

Heuristic 4 (LOC net ↓) — exception (Phase 3, Transaction Bulk Code: 420→506). When the modal's wizard was a thin shell that delegated almost everything to mm/* defaults (default-render-step, default-render-wizard, submit-handler, open-wizard-handler), ripping the wizard out moves that previously-shared plumbing into the file as explicit render/decode/submit/handler code, so the single-file LOC rises even though total system complexity drops. This is the opposite of a fat wizard (edit went 1608→1548). The trade is intended and every other heuristic improved sharply (mm coupling 19→0, snapshot merges 4→0, wizard records 3→0, routes 4→3, find *→explicit-id swap). Watch for it on the small "single-step wearing a wizard costume" modals — LOC is the wrong headline metric there; the mm-coupling / snapshot / route counts are.

Heuristic 9 (Hiccup in render path) — partial exception (Phase 2-final). The post-save com/success-modal confirmation dialogs in save-handler keep ~6 [:p …] Hiccup lines. They are terminal responses (shown after the form closes), reuse a shared dialog component, and sit outside the form's interactive render path. Migrating them means porting the shared success-modal to Selmer — a Phase 11 cross-cutting task, not a single-modal one.

The session-backed engine stores per-step data + context in the Ring session, and this app's session store is a cookie-store (ring.middleware.session.cookie) that serializes with pr-str and reads back with plain clojure.edn/read-stringno custom tag readers. So anything you put in a wizard's :context or that a step :decode returns (which put-step persists) must round-trip through bare EDN. A clj-time DateTime does not: it pr-strs as #clj-time/date-time "…" and the read side 500s with "No reader function for tag clj-time/date-time" on the next request that reads the cookie.

This first bit Invoice Pay (Phase 7), whose context defaulted :handwritten-date (time/now). Rules of thumb:

  • Context: store only EDN-safe primitives (numbers, strings, keywords, vectors, maps, #inst/java.util.Date). Compute clj-time defaults in the render fn, not in context.
  • Step data: a clj-time value decoded by a step is fine in memory on the terminal (:done) path — get-all reads it before forget clears the wizard, so it never reaches the cookie. It only bites if a clj-time value survives in a step that gets re-persisted (a non-terminal put-step). When in doubt, decode dates to #inst or keep them as strings until the done-fn.
  • The old mm wizard dodged this because it read its EDN snapshot with clojure.edn/read-string {:readers clj-time.coerce/data-readers} (see multi_modal.clj) — the cookie store has no such readers. (A durable/typed session backend would remove this constraint; until then, EDN-safe is the rule. See form-vs-wizard.md open question.)

A bare [:map …] query-schema 500s on empty query-params (the {}→nil trap)

auto-ap.ssr.utils/main-transformer includes parse-empty-as-nil, whose :map decoder turns any map with no truthy values into nil ((if (seq (filter identity (vals m))) m nil)). So (mc/coerce [:map [:k {:optional true} …]] {} main-transformer) decodes {}nil, then validates nil against [:map …]:malli.core/invalid-type500.

Ring's wrap-params sets :query-params to {} (not nil) for a request with no query string. So any handler wrapped with wrap-schema-enforce :query-schema [:map …] 500s on a PUT/POST that carries no ?query(and query-schema query-params) is truthy for {}, so the coercion runs and blows up. This is exactly why the pre-migration New Invoice basic-details "Save" was broken: its button hx-puts /invoice/new/navigate (no ?to), and mm/next-handler's [:to {:optional true} …] query-schema 500d every time (the CustomNext/308-to-submit logic never even ran).

  • A [:maybe [:map …]] query-schema survives (nil is valid) — that's why the grid query-schema, hit by the same empty POST, doesn't throw.
  • The engine sidesteps this entirely: handle-step-submit is a POST with no query-schema, so empty query-params never reach a [:map] coercion. Migrating a wizard off the mm navigate route removes the bug; you don't need to fix the old route.

Keep wizard dates as #inst, not clj-time, in step-data

Reinforcing the EDN-safety rule above: a new+edit wizard that stores dates across a non-terminal step (New Invoice: basic-details holds :invoice/date while you visit accounts) must keep them EDN-safe. Decode them to java.util.Date (coerce/to-date) before they land in step-data, and coerce back to clj-time only for display (coerce/from-dateatime/unparse-local). A helper that maps over the date keys (->edn-safe-dates) right after mc/decode is the clean seam — both the step :decode and the edit :init-fn run the posted/persisted map through it. Datomic's upsert wants java.util.Date anyway, so the done-fn needs no extra conversion.

The {}→nil trap has a THIRD face: empty-step decode → validation "invalid type"

Beyond query-params (Phase 8) and route-params (Phase 9's /navigat), the same parse-empty-as-nil :map decoder bites a wizard step whose fields are all blank: an all-empty step posts only blank inputs → the decoded all-nil map collapses to nil. If that nil then flows into a :validate that does (mc/validate step-schema data), validation fails with [invalid type] (nil isn't a map) and the step can never advance — even though every field is optional. The legal/address steps (all-optional) hit this.

Fix at the seam: have the step :decode coerce nil back to {}:

(defn- decode-with [schema request]
  (or (mc/decode schema (... nested form-params ...) main-transformer) {}))

Now an optional-only step validates {} (passes, advances) while a required-field step (e.g. account needs :vendor/default-account) still fails on the missing key, not on a spurious nil. Don't "fix" it by skipping validation when data is nil — that lets a genuinely empty required step through.

A new (db/id-less) nested entity with all-nil fields → datomic "tempid used only as value"

The empty Address step decodes to {:vendor/address {:address/street1 nil, …}} — a map of nils with no :db/id. :upsert-entity mints a tempid for that nested map but, since every attribute is nil, the address entity has nothing transacted, so the tempid is referenced as a ref value but never defined → :db.error/tempid-not-an-entity … used only as value. Drop such blank nested maps before the upsert:

(defn- blank-address? [a] (and (map? a) (not (:db/id a)) (every? nil? (vals a))))

This is the nested-entity analogue of "don't create empty rows"; the engine's blank-row gives added rows a tempid, but a never-touched optional nested entity must be elided.