
Momentic

Momentic is AI-native end-to-end testing: you “describe test behavior in natural language,” and “an AI agent turns your prompts into reliable steps, runs them against your app, and auto-heals brittle locators” (Docs). Tests live in your repo as YAML and run against web, iOS, and Android — pitched as the “modern alternative to Selenium, Cypress, and Playwright” (YC). The interesting part isn’t the natural-language front door — it’s the intent-based step cache underneath that lets Momentic call an LLM on ~1 step in 20 and replay the other 19 deterministically (95%+ hit, ~300ms vs >5s uncached). The product is a cache wrapped in an agent (intent blog).
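The 1-in-20 figure is simply the miss rate implied by a 95% hit rate. Below is a back-of-envelope sketch that assumes exactly 95% hits, 300 ms per cached replay, and 5 s per miss. The source says "over 5s", so the miss cost used here is a floor.

```typescript
// Back-of-envelope model of a cached step pipeline. Inputs are the rounded
// public figures; integer percentages and milliseconds keep the math exact.
interface CacheProfile {
  hitRatePct: number; // percentage of steps replayed from the step cache
  hitMs: number;      // latency of a cached replay
  missMs: number;     // latency of a miss that needs one LLM completion
}

function expectedStepMs(p: CacheProfile): number {
  const missPct = 100 - p.hitRatePct;
  return (p.hitRatePct * p.hitMs + missPct * p.missMs) / 100;
}

function llmCallsPer(steps: number, p: CacheProfile): number {
  return (steps * (100 - p.hitRatePct)) / 100;
}

const momentic: CacheProfile = { hitRatePct: 95, hitMs: 300, missMs: 5000 };
console.log(expectedStepMs(momentic)); // 535
console.log(llmCallsPer(20, momentic)); // 1
```

Under these assumptions a step averages about 535 ms, roughly 9x faster than calling the model every time. The 5% of misses contribute 250 ms of that, nearly half, so each point of hit rate is worth about 47 ms per step.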

Vitals: founded 2023 · YC W24 · $15M Series A (Standard Capital) + $3.7M seed · ~12 people · SF (on-site).

Business context — founders, funding, customers
  • Founders Wei-Wei Wu (CEO — ex-Assembled, founding engineer at Nashi → acq. Density 2021, staff engineer at Density) and Jeff An (ex-Splunk/Google; led testing at Robinhood and enterprise quality at Retool; U. Waterloo) (YC) — “two engineers who dreaded testing so much we founded a company to do it for us.”
  • Series A: $15M led by Standard Capital, with Dropbox Ventures and existing investors (Y Combinator, FCVC, Transpose Platform, Karman Ventures), on top of a $3.7M seed in March 2025 (TechCrunch).
  • 2,600 users across “1000+ engineer organizations” — Notion, Xero, Bilt, Webflow, Retool, Quora, plus Pocus, Nuvo, Mutiny, CoverGo, Coframe, GPTZero (TechCrunch, intent blog, home).
  • Wu estimates Momentic “automated more than 200 million test steps” in the last month (TechCrunch).
  • A locator is a compiled multi-signal matcher. A step’s NL description resolves once into stored signals — on-screen position, appearance, text, accessibility + structural attributes — plus validity conditions; replay matches those against the live page with no LLM call, so inference runs on ~1 step in 20 (intent blog, step cache).
  • Invalidation keys on intent, not DOM identity. The cache busts when the element no longer satisfies the attributes / related-elements the user named — not when the DOM node changes — so randomized classnames and restructures don’t bust it, but a renamed semantic does (intent blog).
  • The cache is an OLTP lookup on ClickHouse. It is keyed by test / step / version / branch / commit and served via a sparse primary index + materialized view at ~250ms over ~20B entry-touches/day: an OLAP engine repurposed for high-write key-value reads after Postgres hit lock contention (ClickHouse blog).
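A minimal sketch of intent-keyed invalidation makes the difference from DOM identity concrete. The intent post names the two condition types (attributes the agent relied on, and related elements) but not how they are encoded. The LiveElement and IntentConditions shapes and the bounding-box test for "above" are therefore hypothetical.

```typescript
// Hypothetical encoding of the two intent-condition types: attributes the
// locator agent relied on, and related elements.
interface Box { x: number; y: number; width: number; height: number }

interface LiveElement {
  text: string;
  attrs: Record<string, string>; // role, class, data-*, color, ...
  box: Box;
}

type Relation = "above" | "below" | "leftOf" | "rightOf";

interface IntentConditions {
  attributes: Record<string, string>; // only what the agent used; "text" = visible text
  related: { relation: Relation; anchorText: string }[];
}

const RELATIONS: Record<Relation, (a: Box, b: Box) => boolean> = {
  above: (a, b) => a.y + a.height <= b.y,
  below: (a, b) => a.y >= b.y + b.height,
  leftOf: (a, b) => a.x + a.width <= b.x,
  rightOf: (a, b) => a.x >= b.x + b.width,
};

// A cached entry stays valid iff the element still satisfies the stated intent.
// Attributes the agent did not rely on (e.g. a randomized class) are never checked.
function stillMatchesIntent(el: LiveElement, page: LiveElement[], cond: IntentConditions): boolean {
  for (const name of Object.keys(cond.attributes)) {
    const actual = name === "text" ? el.text : el.attrs[name];
    if (actual !== cond.attributes[name]) return false;
  }
  return cond.related.every(({ relation, anchorText }) => {
    const anchor = page.find((p) => p !== el && p.text === anchorText);
    return anchor !== undefined && RELATIONS[relation](el.box, anchor.box);
  });
}

const signUp: LiveElement = { text: "Sign up", attrs: { role: "button", class: "c-a91c" }, box: { x: 0, y: 100, width: 80, height: 30 } };
const login: LiveElement = { text: "Log in", attrs: { role: "button", class: "c-x7f3" }, box: { x: 0, y: 40, width: 80, height: 30 } };
// "the Log in button above the Sign up button"
const cond: IntentConditions = {
  attributes: { text: "Log in", role: "button" },
  related: [{ relation: "above", anchorText: "Sign up" }],
};

const rebuilt = { ...login, attrs: { ...login.attrs, class: "c-0b2e" } }; // classname churn
const renamed = { ...login, text: "Sign in" };                             // semantic change
console.log(stillMatchesIntent(rebuilt, [rebuilt, signUp], cond)); // true
console.log(stillMatchesIntent(renamed, [renamed, signUp], cond)); // false
```

The randomized class never enters the check because the agent did not cite it, while the renamed label and a swapped order both fail it. Those are the false-miss and false-hit cases the intent post describes.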

A TypeScript-first CLI, tests as YAML in git, and a ClickHouse cache plane. Every row is named in a first-party doc, repo, or engineering post.

| Layer | Choice | Evidence |
|---|---|---|
| Languages | TypeScript (primary), Python | GitHub org top languages |
| Distribution | npm CLI (`npx momentic`); CLI-first, cloud authoring deprecated | Docs, config |
| Test format | YAML in the repo (`*.test.yaml`, `*.module.yaml`) | How it works, config |
| Editor | CodeMirror + TypeScript (low-code local editor) | codemirror-ts fork |
| Cache store | ClickHouse (ReplacingMergeTree, sparse PK, materialized view); migrated off Postgres + Redis | ClickHouse blog |
| Browser automation | Chromium driver (Playwright-class), local or managed runner | Docs, Playwright cmp |
| Mobile | iOS simulators · Android emulators, remote-hosted (regioned) | Docs, config |
| LLM layer | managed, multi-provider with cross-provider failover (models unnamed) | Playwright cmp, AI config |
| Coding-agent integration | Claude Agent SDK skill (`npx skills add momentic-ai/skills`) + MCP | GitHub skills, Docs |
| CI targets | GitHub Actions · CircleCI (orb) · Bitrise | Docs, orb repo |
| Execution | managed, multi-region runner | Playwright cmp |

The in-product agents run on “latest 2025 models”, but Momentic never names the provider: the model layer is “managed; cross-provider failover handled by the platform” (Playwright cmp, AI config). The one verified Anthropic touchpoint is the open-source skills repo, “Claude Agent SDK with a E2E testing tool” (GitHub).
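Momentic does not document how its failover works. Below is a minimal sketch of ordered failover. It assumes a synchronous completion function per vendor (real clients are async) and uses hypothetical vendor names. It illustrates the general technique, not Momentic's router.

```typescript
// Hypothetical ordered failover across model providers. Provider names and
// the Completion signature are illustrative only.
type Completion = (prompt: string) => string;

interface Provider {
  name: string;
  complete: Completion;
}

class AllProvidersFailed extends Error {
  constructor(readonly errors: { provider: string; error: unknown }[]) {
    super(`all ${errors.length} providers failed`);
  }
}

// Try each provider in priority order and return the first successful completion.
function completeWithFailover(prompt: string, providers: Provider[]): { provider: string; text: string } {
  const errors: { provider: string; error: unknown }[] = [];
  for (const p of providers) {
    try {
      return { provider: p.name, text: p.complete(prompt) };
    } catch (error) {
      errors.push({ provider: p.name, error }); // record it, then fall through to the next vendor
    }
  }
  throw new AllProvidersFailed(errors);
}

const vendorA: Provider = { name: "vendor-a", complete: () => { throw new Error("503"); } };
const vendorB: Provider = { name: "vendor-b", complete: (p) => `ok:${p}` };
console.log(completeWithFailover("find Sign in", [vendorA, vendorB]).provider); // vendor-b
```

Priority order is the simplest policy. A production router would plausibly also distinguish retryable errors (rate limits, 5xx) from fatal ones and track provider health. None of that is documented publicly.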

The parts an engineer at this company loses sleep over. Public signal is cited (verified); likely approach is labeled speculation — best-practice fill-in, hedged.

| Problem | Why it’s hard | Public signal | Likely approach (speculative) |
|---|---|---|---|
| Flaky tests / cache correctness | NL intent is ambiguous; a cache too strict busts on cosmetic change, too loose grabs the wrong element; branches and CLI versions pollute a shared cache | Four documented failure modes; “1M potential flakes across 200M resolutions” (Feb 2026); 95%+ hit rate (intent blog) | Intent conditions (attributes + related elements) from the locator agent; per-branch/version isolation with merge-base seeding; already shipped, now tuning SVG/icon and relativity checks |
| Inference cost + latency | An LLM per step is ~5s and expensive across 2M+ resolves/day | “300ms cached vs over 5s uncached”; LLM fires only on cache miss (intent blog, how it works) | Aggressive caching as the default path; small specialized agents per task; cap agentic plan depth; only the heal path pays for inference |
| Cache storage at scale | ~20B entry-touches/day, high concurrent read+write, query cost must not grow with data | Postgres lock contention at ~1B entries → ClickHouse; ~250ms avg (ClickHouse blog) | ClickHouse ReplacingMergeTree + sparse PK + materialized view of commit timestamps; insert-only TTL; async dedupe |
| Testing non-deterministic apps | Gen-AI products don’t return the same output twice, so string-match assertions fail | Poe/Quora case: validate “AI chatbot responses, even when they weren’t deterministic” (home); assert/assertVisually are agent-scored (Playwright cmp) | Assertion + visual-assertion agents reason over intent (“chart is visible and not cut off”) rather than literal text; never-cache AI-evaluated steps |

The infrastructure Momentic doesn’t name publicly, inferred from the stack it does:

| Component | Likely choice | Basis |
|---|---|---|
| LLM providers | OpenAI + Anthropic + Google, routed | “cross-provider failover” (Playwright cmp); Anthropic confirmed for the skill (GitHub skills); failover implies ≥2 frontier vendors |
| App-graph embeddings | a hosted embedding API (OpenAI/Cohere-class) over per-state semantic summaries | states are “embedded” and clustered (app graph); no in-house model signal on a ~12-person team |
| Mobile runner hosting | a managed device cloud or self-run emulators on a cloud | emulators are “remote-hosted” and regioned (config); provider not named |
| Run-artifact store | S3-class object storage for videos/traces | dashboard serves “run videos, traces, network” (Playwright cmp); object storage is the default for this |
| Control-plane DB | Postgres (retained for app/org/auth data after the cache moved to ClickHouse) | they “eliminate[d] the Redis layer” but only moved the cache off Postgres (ClickHouse blog); relational data likely stays |
| Hosting | a major cloud (AWS or GCP) with managed ClickHouse | multi-region runner + ClickHouse at this scale (Playwright cmp, ClickHouse blog); managed ClickHouse Cloud is the low-ops path for ~12 people |
| Auth | enterprise SSO (SAML/OIDC), API keys | “custom SSO” offered (YC); MOMENTIC_API_KEY for CLI auth (config) |

A step’s life is prompt → context → action → verify → cache → replay → heal. The agent “reads the page (DOM, accessibility tree, screenshot),” picks an element, acts, waits for “the network and DOM to settle,” then writes the resolved locator to cache. “On the next run, Momentic replays from cache, no LLM call, until something changes” — and only when “the cached locator misses, auto-heal uses the AI agent to find the element again and updates the cache” (How it works). This is the inversion that controls both cost and latency: the LLM is invoked “only when it’s actually needed.”

Momentic step lifecycle: a natural-language step enters resolution where Momentic checks whether the step cache hits — if the stored signals still match the live page (~95% of the time) it replays from cache in ~300ms with no LLM call; on a miss the locator agent re-resolves the description against DOM, accessibility tree, and screenshot in ~5s using one LLM completion; either way the action is issued, a stability check waits for network and DOM to settle, and the resolved locator plus intent conditions are written back to the step cache for the next run.

Mermaid source
```mermaid
flowchart LR
  classDef io fill:#fdf4e8,stroke:#d97706,stroke-width:1.5px,color:#0f172a;
  classDef agent fill:#eafbf1,stroke:#16a34a,stroke-width:1.5px,color:#0f172a;
  classDef cache fill:#eef0fe,stroke:#6366f1,stroke-width:1.5px,color:#0f172a;
  classDef data fill:#e8f1fd,stroke:#2563eb,stroke-width:1.5px,color:#0f172a;
  Step(["NL step<br/>'Click the Sign in button'"]):::io
  subgraph Resolve["Resolve a step"]
    direction TB
    Hit{"Step cache hit?<br/>signals match live page?"}:::cache
    Replay("Replay from cache<br/>~300ms · no LLM call"):::cache
    Heal("Auto-heal: locator agent<br/>re-resolves NL vs DOM + a11y + screenshot<br/>~5s · 1 LLM completion"):::agent
  end
  subgraph Act["Act + verify"]
    direction TB
    Do("Issue action<br/>click · type · scroll · check"):::agent
    Settle("Stability check<br/>wait for network + DOM to settle"):::data
  end
  Save[("Write resolved locator<br/>+ intent conditions to step cache")]:::cache
  Done(["Step done"]):::io
  Step --> Hit
  Hit -->|hit ~95%| Replay --> Do
  Hit -->|miss| Heal --> Do
  Do --> Settle --> Save --> Done
  Save -. "next run" .-> Hit
```
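The lifecycle in the diagram fits in a few lines. This is a minimal, synchronous sketch. Page, Healer, and the string locators are hypothetical stand-ins: the docs describe multi-signal locators and an async browser driver. The Healer is the one place an LLM completion would happen.

```typescript
// Minimal sketch of resolve -> act -> settle -> save. All types are hypothetical.
type Locator = string;

interface Page {
  resolves(loc: Locator): boolean; // do the cached signals still match the live page?
  act(loc: Locator, action: string): void;
  waitForSettle(): void; // stand-in for "wait for network + DOM to settle"
}

// The only place an LLM call would happen (the auto-heal path).
type Healer = (description: string, page: Page) => Locator;

function runStep(
  stepId: string,
  description: string,
  action: string,
  page: Page,
  cache: Map<string, Locator>,
  heal: Healer,
): "hit" | "healed" {
  const cached = cache.get(stepId);
  let loc: Locator;
  let outcome: "hit" | "healed";
  if (cached !== undefined && page.resolves(cached)) {
    loc = cached; // replay: no model call
    outcome = "hit";
  } else {
    loc = heal(description, page); // miss: re-resolve from the NL description
    outcome = "healed";
  }
  page.act(loc, action);
  page.waitForSettle();
  cache.set(stepId, loc); // write back either way, for the next run
  return outcome;
}
```

Running the same step twice costs one model call. Changing the page so the cached locator no longer resolves costs exactly one more, after which replays resume.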

A cached step “stores more than one way to find its target: where the element sits on screen, what it looks like, what text it contains, and the accessibility and structural attributes around it” — a multi-modal locator. Which signals matter “is inferred from the step’s natural-language description”: “the red Cancel button below the Order Summary header” leans visual+positional; “the Sign in button” leans accessibility+text (step cache, Playwright cmp). Step-based tests are “deterministic and fast”; the act primitive runs agentic flows where “you give Momentic a goal, and an AI agent figures out the steps on the fly” — and the V3 act agent is “planner-style … drafts the full flow up front, caches the resolved steps … and self-heals” (agentic).
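Here is a sketch of what a multi-signal locator with description-dependent weights could look like. The signal set, the weights, the 0.8 acceptance threshold, and the viewport diagonal are all assumptions for illustration. The docs say which signals matter is inferred from the description, but they do not publish a scoring rule.

```typescript
// Hypothetical multi-signal locator; not Momentic's stored format.
interface Signals {
  center: { x: number; y: number }; // where the element sits on screen
  color: string;                     // what it looks like (reduced to one color)
  text: string;                      // what text it contains
  role: string;                      // accessibility role
}

type Weights = Record<keyof Signals, number>;

const SIGNALS: (keyof Signals)[] = ["center", "color", "text", "role"];

// Weighted average of per-signal similarities, in [0, 1].
function score(stored: Signals, live: Signals, w: Weights, viewportDiag: number): number {
  const dist = Math.hypot(stored.center.x - live.center.x, stored.center.y - live.center.y);
  const sim: Record<keyof Signals, number> = {
    center: 1 - Math.min(1, dist / viewportDiag),
    color: stored.color === live.color ? 1 : 0,
    text: stored.text === live.text ? 1 : 0,
    role: stored.role === live.role ? 1 : 0,
  };
  let total = 0;
  let weightSum = 0;
  for (const k of SIGNALS) {
    total += w[k] * sim[k];
    weightSum += w[k];
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Index of the best live candidate, or -1 for a miss (the auto-heal path).
function matchCached(stored: Signals, candidates: Signals[], w: Weights, threshold = 0.8, viewportDiag = 1500): number {
  let best = -1;
  let bestScore = -Infinity;
  candidates.forEach((c, i) => {
    const s = score(stored, c, w, viewportDiag);
    if (s > bestScore) {
      best = i;
      bestScore = s;
    }
  });
  return bestScore >= threshold ? best : -1;
}

// Weights a model might infer from two different step descriptions.
const signInWeights: Weights = { center: 0.1, color: 0, text: 0.45, role: 0.45 }; // "the Sign in button"
const lookWeights: Weights = { center: 0.5, color: 0.5, text: 0, role: 0 };       // "the blue button at top left"

const stored: Signals = { center: { x: 100, y: 50 }, color: "blue", text: "Sign in", role: "button" };
const candidates: Signals[] = [
  { center: { x: 100, y: 50 }, color: "blue", text: "Sign up", role: "button" },  // look-alike in the old spot
  { center: { x: 400, y: 300 }, color: "blue", text: "Sign in", role: "button" }, // real target, moved
];

console.log(matchCached(stored, candidates, signInWeights)); // 1
console.log(matchCached(stored, candidates, lookWeights));   // 0
```

The same stored signals pick different elements depending on the weights. That is why the description, not a fixed selector, decides what counts as "the same element".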

The cache plane: an OLAP database doing OLTP work


The hard engineering is in the cache store. Adding signals to the key took Momentic from “around 80k active cache entries to now approximately 1B”, and the original “single table in Postgres … started to show cracks”: “lock contention from queries trying to read and write to the cache concurrently” (ClickHouse blog). They moved the store to ClickHouse to exploit its sparse primary index: the cache is keyed by “test ID, step ID, Momentic version, git branch, and commit timestamp,” so a known-key lookup “narrow[s] down the search space to just a few granules,” and the rows read per lookup track the entries for that key rather than the size of the table.
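Below is a toy model of why the sparse index helps. Rows are sorted by the composite key, and the index keeps only the first key of each granule (ClickHouse's default index_granularity is 8192 rows). A binary search over those marks then bounds how many rows a lookup reads. The field names and the "latest entry at or before this commit" lookup rule are assumptions, and this is not ClickHouse's implementation.

```typescript
// Toy sparse primary index over a hypothetical composite cache key.
interface CacheKey {
  testId: string;
  stepId: string;
  version: string;
  branch: string;
  commitTs: number;
}

interface CacheRow {
  key: CacheKey;
  locator: string; // stand-in for the stored multi-signal locator
}

const STRING_FIELDS = ["testId", "stepId", "version", "branch"] as const;

// Compare keys on the first `depth` components (4 = everything but commitTs).
function compareKeys(a: CacheKey, b: CacheKey, depth = 5): number {
  for (const f of STRING_FIELDS.slice(0, depth)) {
    if (a[f] !== b[f]) return a[f] < b[f] ? -1 : 1;
  }
  return depth < 5 ? 0 : Math.sign(a.commitTs - b.commitTs);
}

class SparseIndex {
  private readonly rows: CacheRow[];
  private readonly marks: CacheKey[] = []; // first key of every granule

  constructor(rows: CacheRow[], private readonly granuleSize = 8192) {
    this.rows = [...rows].sort((x, y) => compareKeys(x.key, y.key));
    for (let i = 0; i < this.rows.length; i += granuleSize) this.marks.push(this.rows[i].key);
  }

  // Binary-search the marks for granules that can hold rows with this
  // (test, step, version, branch) prefix; returns [firstGranule, endGranule).
  private candidateGranules(prefix: CacheKey): [number, number] {
    let lo = 0;
    let hi = this.marks.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (compareKeys(this.marks[mid], prefix, 4) < 0) lo = mid + 1;
      else hi = mid;
    }
    let end = lo; // lo = first mark whose prefix is >= the target
    while (end < this.marks.length && compareKeys(this.marks[end], prefix, 4) === 0) end++;
    return [Math.max(0, lo - 1), end]; // granule lo-1 may hold matches in its tail
  }

  // Latest entry for the prefix whose commitTs <= prefix.commitTs.
  lookup(prefix: CacheKey): { row: CacheRow | undefined; rowsScanned: number } {
    const [first, end] = this.candidateGranules(prefix);
    const from = first * this.granuleSize;
    const to = Math.min(end * this.granuleSize, this.rows.length);
    let row: CacheRow | undefined;
    for (let i = from; i < to; i++) {
      const r = this.rows[i];
      if (compareKeys(r.key, prefix, 4) !== 0 || r.key.commitTs > prefix.commitTs) continue;
      if (!row || r.key.commitTs > row.key.commitTs) row = r;
    }
    return { row, rowsScanned: Math.max(0, to - from) };
  }
}

// 100 tests x 10 steps x 3 commits, indexed with small 64-row granules.
const rows: CacheRow[] = [];
for (let t = 0; t < 100; t++)
  for (let s = 0; s < 10; s++)
    for (let c = 1; c <= 3; c++)
      rows.push({
        key: { testId: `t${1000 + t}`, stepId: `s${s}`, version: "v1", branch: "main", commitTs: c },
        locator: `t${t}/s${s}@${c}`,
      });
const index = new SparseIndex(rows, 64);
const hit = index.lookup({ testId: "t1042", stepId: "s3", version: "v1", branch: "main", commitTs: 2 });
console.log(hit.row?.locator, hit.rowsScanned, rows.length); // t42/s3@2 64 3000
```

Here the lookup reads one 64-row granule out of 3,000 rows. The rows read depend on how many entries share the key, not on how many tests the table holds.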

Momentic cache plane: a CLI run issues a resolve query against a composite cache key of test ID, step ID, CLI version, git branch and commit timestamp; the locator agent also emits intent conditions — required attributes (text, color, role, arbitrary HTML) and related elements — that are stored alongside each entry; the store is ClickHouse using a ReplacingMergeTree with a sparse primary index and insert-only TTL extension, plus a materialized view of available commit timestamps per test to bound main-branch scans; it serves ~250ms average lookups at a 95%+ hit rate, having replaced an earlier single Postgres table plus Redis that hit lock contention at ~1B entries, migrated via double-write then double-read consistency check then cutover.

Mermaid source
```mermaid
flowchart LR
  classDef io fill:#fdf4e8,stroke:#d97706,stroke-width:1.5px,color:#0f172a;
  classDef agent fill:#eafbf1,stroke:#16a34a,stroke-width:1.5px,color:#0f172a;
  classDef cache fill:#eef0fe,stroke:#6366f1,stroke-width:1.5px,color:#0f172a;
  classDef data fill:#e8f1fd,stroke:#2563eb,stroke-width:1.5px,color:#0f172a;
  classDef old fill:#fdecec,stroke:#e0564f,stroke-width:1.5px,color:#0f172a;
  CLI(["CLI run · local or CI"]):::io
  subgraph Key["Cache key (composite)"]
    direction TB
    K("test ID · step ID<br/>CLI version · git branch<br/>commit timestamp"):::data
  end
  subgraph Intent["Intent conditions (locator agent emits)"]
    direction TB
    Attr("Attributes<br/>text · color · role · arbitrary HTML"):::agent
    Rel("Related elements<br/>'login above sign-up'"):::agent
  end
  subgraph CH["ClickHouse · cache plane"]
    direction TB
    RMT[("ReplacingMergeTree<br/>sparse primary index · insert-only TTL")]:::cache
    MV[("Materialized view<br/>available commit timestamps per test")]:::cache
    RMT --- MV
  end
  Old["was: single Postgres table + Redis<br/>lock contention at ~1B entries"]:::old
  CLI -->|"resolve query"| Key
  Key --> CH
  Intent --> RMT
  CH -->|"~250ms avg · 95%+ hit"| CLI
  Old -. "migrated: double-write -> double-read check -> cutover" .-> CH
```

Two ClickHouse-native moves carry the design. Main-branch scans still read “500k+ rows,” so they added “a materialized view to precompute all of the available commit timestamps for a given test ID,” narrowing reads back to “one or two parts.” And because “2/3 queries are updates, which aren’t very performant” in ClickHouse, they went insert-only: SELECT the cache entries, re-INSERT the ones that were used to extend their TTL, INSERT new entries, “and let ClickHouse take care of deduplicating entries asynchronously” via ReplacingMergeTree. That was “such an improvement that we were able to fully eliminate the Redis layer.” The migration itself ran double-write → double-read consistency check → gradual cutover (ClickHouse blog). Result: “over two million cache queries per day, processing almost 20 billion cache entries every day while maintaining ~250ms resolution latency on average.”
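The insert-only pattern can be modeled in memory to show why it avoids UPDATEs. Here, touching an entry appends a fresh copy. A merge step then keeps the newest row per key, roughly what ReplacingMergeTree does with a version column, and drops rows past their TTL. The class, its methods, and the TTL arithmetic are hypothetical. ClickHouse merges in the background, while this sketch merges only when asked.

```typescript
// Toy model of insert-only TTL extension with deferred deduplication.
interface Row {
  key: string;
  locator: string;
  insertedAt: number; // doubles as the "version" used to pick the newest row
}

class InsertOnlyCache {
  private parts: Row[] = []; // append-only; may hold duplicates until merge()

  constructor(private readonly ttl: number) {}

  insert(key: string, locator: string, now: number): void {
    this.parts.push({ key, locator, insertedAt: now });
  }

  // SELECT the newest live row; on a hit, re-INSERT it to extend its TTL.
  get(key: string, now: number): string | undefined {
    let newest: Row | undefined;
    for (const r of this.parts) {
      if (r.key === key && (!newest || r.insertedAt > newest.insertedAt)) newest = r;
    }
    if (!newest || newest.insertedAt + this.ttl <= now) return undefined;
    this.insert(key, newest.locator, now);
    return newest.locator;
  }

  // Background work in ClickHouse, explicit here: dedupe by key, then apply TTL.
  merge(now: number): void {
    const latest = new Map<string, Row>();
    for (const r of this.parts) {
      const prev = latest.get(r.key);
      if (!prev || r.insertedAt >= prev.insertedAt) latest.set(r.key, r);
    }
    this.parts = Array.from(latest.values()).filter((r) => r.insertedAt + this.ttl > now);
  }

  get rowCount(): number {
    return this.parts.length;
  }
}

const cache = new InsertOnlyCache(100); // TTL of 100 time units
cache.insert("login", "#a", 0);
cache.insert("signup", "#b", 0);
cache.get("login", 50); // hit: re-inserted at t=50, so its TTL now runs to t=150
cache.merge(120);       // dedupes login, expires signup (0 + 100 <= 120)
console.log(cache.rowCount); // 1
```

Every write is an append, so readers and writers never contend on a row. The price is that reads must tolerate duplicates until a merge runs, which is why get() picks the newest copy itself.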

The reliability claim hinges on caching user intent rather than a DOM snapshot. The earlier “does this look like the element we saw before?” check failed four ways at scale: cross-branch pollution, cross-version pollution, false misses (randomized classnames bust the cache), and false hits (nth-child selectors grab the wrong row when order changes) (intent blog). The fix: the locator agent now “classif[ies] which attributes it used in its reasoning” and emits two condition types — attributes (“text, color, or any arbitrary HTML attribute”) and related elements (“the login button above the sign up button”). The question became “does this element still match what the user meant?” — so “the blue button” strictly enforces blue. Branch/version isolation was solved by git-aware cache seeding: new branches “seed from the cache at their merge base,” and merges fold the branch cache back into main (step cache, intent blog).
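Below is a minimal sketch of git-aware isolation. A branch reads its own writes first, and otherwise falls through to main's entries as of the branch's merge base. That read-through is equivalent to seeding the branch from the merge-base cache. A merge folds the branch's writes back into main. The class, its methods, and the use of plain timestamps for commits are assumptions, not Momentic's schema.

```typescript
// Hypothetical git-aware step cache with merge-base seeding.
interface MainEntry {
  stepKey: string;
  locator: string;
  commitTs: number;
}

class GitAwareCache {
  private main: MainEntry[] = [];
  private branches = new Map<string, { mergeBaseTs: number; writes: Map<string, string> }>();

  writeMain(stepKey: string, locator: string, commitTs: number): void {
    this.main.push({ stepKey, locator, commitTs });
  }

  // Latest main entry at or before asOfTs.
  readMain(stepKey: string, asOfTs: number): string | undefined {
    let best: MainEntry | undefined;
    for (const e of this.main) {
      if (e.stepKey === stepKey && e.commitTs <= asOfTs && (!best || e.commitTs > best.commitTs)) best = e;
    }
    return best?.locator;
  }

  createBranch(name: string, mergeBaseTs: number): void {
    this.branches.set(name, { mergeBaseTs, writes: new Map() });
  }

  writeBranch(name: string, stepKey: string, locator: string): void {
    this.branch(name).writes.set(stepKey, locator);
  }

  // Branch-local write first, else main as of the merge base (the "seed").
  readBranch(name: string, stepKey: string): string | undefined {
    const b = this.branch(name);
    return b.writes.get(stepKey) ?? this.readMain(stepKey, b.mergeBaseTs);
  }

  // Fold only the branch's own writes into main at the merge commit.
  merge(name: string, mergeTs: number): void {
    this.branch(name).writes.forEach((locator, stepKey) => this.writeMain(stepKey, locator, mergeTs));
    this.branches.delete(name);
  }

  private branch(name: string) {
    const b = this.branches.get(name);
    if (!b) throw new Error(`unknown branch ${name}`);
    return b;
  }
}

const gc = new GitAwareCache();
gc.writeMain("login", "#a", 1);
gc.writeMain("login", "#b", 5);
gc.createBranch("feat", 3);               // branched before commit 5 landed
console.log(gc.readBranch("feat", "login")); // #a
```

Folding back only the branch's own writes matters. If unchanged seeded entries were folded back too, a merge could overwrite newer main entries with stale ones from the merge base.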

Two healing tiers: in-run auto-heal re-resolves locators and waits for stability, persisting each fix as a cache entry, and only when the run is eligible to save cache (auto-heal). The post-run triage agent (momentic ai triage / heal) “permanently rewrites the failing tests, and opens a pull request (or emits a patch)”, respecting the repo’s PULL_REQUEST_TEMPLATE.md. A separate app graph models coverage from run traces: each UI state is “fingerprinted (canonical URL plus a normalized, minhashed view of the DOM),” a semantic summary is “embedded,” and states cluster into “product areas, features, journeys, variants” to show which flows are Covered / Partial / Missing (app graph).
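Here is a sketch of a state fingerprint of the kind the app-graph docs describe: a canonical URL plus a MinHash signature over DOM shingles. The canonicalization rules (drop query and fragment, collapse numeric path segments), the shingle size, the 0.8 threshold, and the XOR-seeded FNV-1a hash family are all assumptions chosen for brevity. A production MinHash would use better-mixed, independent hashes.

```typescript
// Hypothetical UI-state fingerprint: canonical URL + MinHash of DOM shingles.
function canonicalUrl(url: string): string {
  return url
    .replace(/[?#].*$/, "")              // drop query string and fragment
    .replace(/\/\d+(?=\/|$)/g, "/:id");  // collapse numeric path ids
}

// 32-bit FNV-1a; XOR-ing the seed into the offset basis gives a cheap hash family.
function fnv1a(s: string, seed: number): number {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function shingles(tokens: string[], k = 3): string[] {
  const out: string[] = [];
  for (let i = 0; i + k <= tokens.length; i++) out.push(tokens.slice(i, i + k).join(" "));
  return out;
}

// One minimum per hash function; equal minima estimate Jaccard similarity.
function minhash(items: string[], numHashes = 64): number[] {
  const sig: number[] = [];
  for (let j = 0; j < numHashes; j++) {
    let min = Infinity;
    for (const it of items) min = Math.min(min, fnv1a(it, Math.imul(j + 1, 0x9e3779b1)));
    sig.push(min);
  }
  return sig;
}

function similarity(a: number[], b: number[]): number {
  let same = 0;
  for (let i = 0; i < a.length; i++) if (a[i] === b[i]) same++;
  return same / a.length;
}

interface StateFingerprint { url: string; sig: number[] }

function fingerprint(url: string, domTokens: string[]): StateFingerprint {
  return { url: canonicalUrl(url), sig: minhash(shingles(domTokens)) };
}

function sameState(a: StateFingerprint, b: StateFingerprint, threshold = 0.8): boolean {
  return a.url === b.url && similarity(a.sig, b.sig) >= threshold;
}

const tokens = ["nav", "h1:Orders", "table", "row", "row", "button:Export"];
const order123 = fingerprint("https://app.example.com/orders/123?tab=open", tokens);
const order456 = fingerprint("https://app.example.com/orders/456#top", tokens);
const signInForm = fingerprint("https://app.example.com/orders/789", ["form", "input:Email", "input:Password", "button:Sign in"]);
console.log(order123.url);                  // https://app.example.com/orders/:id
console.log(sameState(order123, order456)); // true
```

Two order pages with different ids collapse into one state. A page that shares the URL pattern but renders a different DOM does not.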

Two founders, ~12 people at the Series A; San Francisco, on-site (YC).

| Role | Person | Source |
|---|---|---|
| Co-founder / CEO | Wei-Wei Wu (ex-Assembled; founding eng at Nashi → acq. Density 2021; staff eng at Density) | YC |
| Co-founder | Jeff An (ex-Splunk, Google; led testing at Robinhood, enterprise quality at Retool) | YC |
| Engineering | Henry Haefliger (author of the caching engineering posts) | ClickHouse blog, intent blog |

The founder DNA is testing and reliability at scale: Jeff An “led testing at Robinhood and enterprise quality at Retool”; Wu led “product reliability” at Density (YC). The product philosophy is tests-as-code, engineer-owned: Momentic is “CLI-first … authoring and running tests in the cloud is deprecated” (Docs), tests are YAML in the repo, and the company markets “a migration … from outsourced QA to engineering-owned tests” (blog). Cache eligibility is git-aware (CI runs always save; local runs save only on branches other than main or a protected branch), and healing is wired into the SCM workflow: a successful heal can open a PR, draft PR, direct commit, patch, or leave changes on disk (step cache, auto-heal). The stated creed: “truth-driven development … you cannot verify what you cannot reason,” keeping behavioral tests green “at Cursor speed” (blog). Open roles are GTM (founding AE/SDR) plus a “Founding Engineer (Frontend)”, which points to a sales-led growth phase on a still-tiny eng team (Ashby, YC).
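The cache-save rule is small enough to state as a predicate. This sketch encodes only what the docs say: CI always saves, and local runs save only off main or a protected branch. The RunContext shape and the default protected-branch list are hypothetical. The rationale that a developer's local run should not write into the shared main cache is an inference, not a documented reason.

```typescript
// Hypothetical encoding of the documented cache-save eligibility rule.
interface RunContext {
  ci: boolean;    // running in CI?
  branch: string; // current git branch
}

function canSaveCache(ctx: RunContext, protectedBranches: readonly string[] = ["main"]): boolean {
  if (ctx.ci) return true; // CI runs always save
  return protectedBranches.indexOf(ctx.branch) === -1; // local: only off main/protected
}

console.log(canSaveCache({ ci: false, branch: "feat/login" })); // true
console.log(canSaveCache({ ci: false, branch: "main" }));       // false
```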

Reconstructed from public sources only — no insider information. Crawled 2026-06-08 via Chrome MCP (logged-out browsing) + the public docs, engineering blog, GitHub org, Ashby board, and YC profile. Claim tiers: verified (stated on a public page, linked) · inferred (reasoned from a cited signal, confidence flagged) · speculative (best-practice fill-in, labeled). Links are live; pages change, so the supporting quote for each claim is kept in this repo’s evidence map (evidence/momentic-evidence-map.md).

| # | Source | Link |
|---|---|---|
| S1 | Homepage | https://momentic.ai/ |
| S2 | Docs: Welcome | https://momentic.ai/docs |
| S3 | Docs: How Momentic works | https://momentic.ai/docs/get-started/how-momentic-works |
| S4 | Docs: Step caching | https://momentic.ai/docs/reliability/step-cache |
| S5 | Docs: Auto-healing | https://momentic.ai/docs/reliability/auto-heal |
| S6 | Docs: Agentic testing | https://momentic.ai/docs/core-concepts/agentic-testing |
| S7 | Docs: Finding elements | https://momentic.ai/docs/core-concepts/finding-elements |
| S8 | Docs: App graph | https://momentic.ai/docs/ai/app-graph |
| S9 | Docs: Memory | https://momentic.ai/docs/ai/memory |
| S10 | Docs: momentic.config.yaml | https://momentic.ai/docs/configuration/momentic-config |
| S11 | Docs: AI configuration | https://momentic.ai/docs/configuration/ai |
| S12 | Docs: vs Playwright | https://momentic.ai/docs/comparisons/playwright |
| S13 | Blog: Postgres → ClickHouse | https://momentic.ai/blog/postgres-to-clickhouse-migration |
| S14 | Blog: Intent-based caching | https://momentic.ai/blog/teaching-browser-agents-user-intent |
| S15 | Blog index | https://momentic.ai/blog |
| S16 | GitHub org (momentic-ai) | https://github.com/momentic-ai |
| S17 | GitHub: skills (Claude Agent SDK) | https://github.com/momentic-ai/skills |
| S18 | Ashby job board | https://jobs.ashbyhq.com/momentic |
| S19 | Y Combinator profile | https://www.ycombinator.com/companies/momentic |
| S20 | TechCrunch: $15M Series A | https://techcrunch.com/2025/11/24/momentic-raises-15m-to-automate-software-testing/ |