Confido

Confido is “the AI infrastructure powering CPG brands from deduction to production plan” — one platform unifying “cash application, deductions, disputes, trade promotion management, forecasting, demand planning, and analytics” as “the single source of truth” for a consumer-packaged-goods brand’s finance, accounting, sales, and ops teams (Ashby, home). The engineering problem isn’t a chat box: it’s LLM extraction over genuinely messy retailer/distributor paperwork, agentic retrieval across legacy systems with no clean API, and a correctness bar set by the fact that every number is money owed.

Vitals: founded 2020 · YC S21 · $15M Series A (Footwork) + seed (Watchfire) · ~28 people · NYC, on-site (YC, Series A note).

Business context — founders, funding, customers, moat

Founders: Justin Hunter (CEO — ex-Capital One corporate strategy; Harvard) and Kara Holinski (technical founder — engineering at MIT; ex-APM Schmidt Futures) (YC). Head of AI Matan Friedmann joined May 2026 (ex Co-Founder/CTO Clearly Labs; ex Nexar; ex Q.ai, acq. Apple; AutoPrompt, 3K★) (AI hire).
Funding: $15M Series A led by Footwork, plus a “previously unannounced seed round led by Watchfire Ventures” (Series A note); the combined raise was reported in trade press as ~$20M. Board member Mike Smith (public-retailer + brand board experience).
Traction: “trusted by 200+ brands managing $20B+ in revenue, including OLIPOP, Simple Mills, Dr. Squatch, Tropicana” (Ashby) — also DUDE Wipes, Serenity Kids, Cappello’s, Rebel Creamery (Series A note, home).
Founding insight: finance teams managing “high-growth environments, without often increasing headcount,” where “deductions, trade, and planning were all mission critical, but horribly disconnected and manual” (Series A note).
Moat (inferred): the per-customer data — learned SOPs, validated extractions, mapped legacy connectors — compounds with tenure and doesn’t transfer to a rival; 50+ hard-won retailer/distributor/ERP/accounting integrations raise switching cost.
Hiring: 8 engineering + 4 product roles open at the time of writing, all NYC on-site (Ashby).

The heavy lifting

The AI surface is structured extraction, not chat. Per-retailer deduction backup, invoices, and reports parse into a fixed line-item schema via a “format-agnostic” pipeline — one extractor across sources, not a template per retailer (ML JD, AI hire).
Agents substitute for missing APIs. Retailer/distributor systems often expose no clean API; agents “retrieve data from legacy systems” to assemble the context — POS, contracts, promo plans — a deduction needs to be adjudicated (ML JD).
Human review is the validation gate. Low-confidence / high-dollar items route to a reviewer, with “automated and human-in-the-loop validation” and “full traceability” per record — what lets a finance team post AI-extracted figures to the ledger (AI hire).
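
As a rough illustration of the extraction-plus-gate pattern described above, here is a minimal sketch of a fixed line-item schema and a review router. Everything in it is an assumption for illustration: the `LineItem` fields, the `MIN_CONFIDENCE` and `MAX_AUTO_AMOUNT` thresholds, and the function names are hypothetical, not Confido's actual schema or rules.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical thresholds; real values would be tuned per customer.
MIN_CONFIDENCE = 0.90
MAX_AUTO_AMOUNT = Decimal("5000.00")


@dataclass(frozen=True)
class LineItem:
    """One extracted line in a fixed, source-independent schema."""
    source: str        # retailer or distributor name
    document_id: str   # pointer back to the original backup document
    description: str
    amount: Decimal    # money as Decimal, never float
    confidence: float  # extractor's self-reported score in [0, 1]


def needs_review(item: LineItem) -> bool:
    """Flag items where the extractor is unsure or the dollar amount is large."""
    return item.confidence < MIN_CONFIDENCE or abs(item.amount) > MAX_AUTO_AMOUNT


def route(items: list[LineItem]) -> tuple[list[LineItem], list[LineItem]]:
    """Split items into (auto_approved, review_queue), preserving input order."""
    auto: list[LineItem] = []
    review: list[LineItem] = []
    for item in items:
        (review if needs_review(item) else auto).append(item)
    return auto, review
```

Keeping `document_id` on every item is what would let a reviewer jump from a figure back to the page it came from, which is one plausible reading of "full traceability" per record.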

Stack

The JDs describe what the systems do (AI document ingestion, financial data pipelines, analytics) but don’t name languages or frameworks. This table therefore covers only the AI + data stack that is public; the conventional infra (languages, DB, cloud) is unconfirmed and reconstructed in Likely internals.

Layer	Choice	Evidence
Document understanding	LLM-powered extraction of structured line items from messy invoices, deductions, retailer reports	ML JD, Staff SWE JD, AI hire
Agents	agentic workflows that “retrieve and reason over data across fragmented enterprise systems” / legacy systems	ML JD
Predictive ML	sales/financial forecasting, anomaly detection on retail data, promotion-optimization recommenders	ML JD
Foundation models	LLMs/NLP; fine-tuning “(Llama, GPT, etc.)” listed as a hire signal	ML JD
Model strategy	building “proprietary models and agentic architectures specifically tuned for … CPG”	AI hire
Validation	automated + human-in-the-loop validation; “comprehensive databases for full traceability”	AI hire
Ingestion	50+ connectors — retailers (Costco, Albertsons, Aldi, BJ’s, Ahold), distributors (C&S, Core-Mark, AWG), ERP + accounting	Integrations
External data	syndicated IRI / Circana, retailer POS, distributor & customer-inventory feeds	home
Workplace tooling	MacBooks; 401(k) via Vestwell; fully on-site NYC	SWE JD

Hard problems

The parts an engineer at this company loses sleep over. Public signal is cited (verified); likely approach is labeled speculation — best-practice fill-in, hedged.

Problem	Why it’s hard	Public signal	Likely approach (speculative)
Messy-document extraction	Every retailer/distributor formats invoices, deductions, and backup differently; layouts shift; scans are noisy	“extract structured data from complex financial documents”; “format-agnostic pipeline” for “messy” docs (ML JD, AI hire)	Multimodal LLM + OCR behind one shared extractor, with per-source hints rather than hard templates; confidence scoring; route low-confidence to humans whose corrections fine-tune the extractor
Fragmented / legacy integration	50+ sources, many behind retailer portals or EDI with no clean API; data is incomplete and inconsistent	50+ connectors (Integrations); agents that “retrieve data from legacy systems” (ML JD)	Connector framework + agentic browsing/scraping for portals; FDEs to onboard each brand’s source mix; normalize into the unified model
Financial correctness / trust	Outputs are money owed; a wrong deduction classification or dispute is a real loss and erodes trust	“automated and human-in-the-loop validation to ensure 100% reliability”; “full traceability” (AI hire)	HITL gates on low-confidence/high-dollar items; immutable audit trail per record; reconciliation against the ledger
Forecasting on sparse retail data	POS and syndicated data are laggy, partial, and noisy across hundreds of SKUs and retailers	forecasting from “statistical models and live sales data” (home); “anomaly detection,” “promotion optimization models” (ML JD)	Hierarchical statistical + ML forecasts blending IRI/Circana + POS; anomaly flags feed planners; promo-lift models for TPM ROI
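
The "immutable audit trail per record" in the speculative column could take many forms. One common pattern is a hash-chained, append-only log, sketched below. This is an assumption about how such a trail might be built, not a description of Confido's system; the `AuditTrail` class and the event fields are hypothetical, and a hash chain makes tampering detectable rather than impossible.

```python
import hashlib
import json


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only log in which each entry commits to the one before it."""

    GENESIS = "0" * 64  # placeholder hash for the first entry

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, event: dict) -> str:
        """Record an event and return its chained hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry_hash = _digest(prev_hash, event)
        self._entries.append({"event": dict(event), "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True
```

In use, each extraction, correction, and approval for a record would be appended as its own event, and `verify()` would run before figures are posted.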

Likely internals

The infrastructure Confido doesn’t name publicly, inferred from the stack it does:

Component	Likely choice	Basis
Backend / API	TypeScript/Node + Python services	“backend services and APIs” (SWE JD); ML-heavy product implies Python beside a TS web tier
Frontend	React / TypeScript	Senior Frontend + Design-Engineer roles (Ashby)
Cloud	AWS	default for a NYC YC B2B SaaS at this stage
Primary DB	Postgres	relational financial / ledger data
Document AI	multimodal LLM + OCR, one extractor with per-source hints	“format-agnostic” extraction of “messy” docs (AI hire)
LLM providers	OpenAI (GPT) + open-weight Llama fine-tunes	fine-tuning “(Llama, GPT, etc.)” named (ML JD)
Retrieval	managed vector DB for agentic RAG	“retrieval systems,” “agentic workflows” (ML JD)
Retailer-portal access	EDI + agentic browsing where no API exists	“retrieve data from legacy systems” (ML JD); exact mechanism unconfirmed
Proprietary models	mostly fine-tuned / prompted today; bespoke is the stated direction	named as a goal, not a shipped fact (AI hire)
Auth	enterprise SSO (SAML/OIDC)	finance buyers at 200+ brands

Architecture

Fragmented sources → document AI → one source of truth

Confido’s spine is an ingestion-and-extraction pipeline that collapses incompatible inputs into a single financial data model. Connectors pull from “50+ critical data sources” (Integrations); an LLM layer reads the “messy” documents those sources emit — “invoices, deductions, and retailer reports” — and “extract[s] structured data from complex financial documents” (ML JD, Staff SWE JD). The Head of AI’s prior work names the pattern exactly: an “end-to-end, format-agnostic pipeline that transformed unstructured, real-world documents into clean, system-ready insights,” paired with “automated and human-in-the-loop validation to ensure 100% reliability” (AI hire). The cleaned data becomes the “single source of truth” every product surface reads from.

Mermaid source

flowchart LR
  classDef src fill:#eef2f8,stroke:#94a3b8,stroke-width:1.5px,color:#0f172a;
  classDef ai fill:#eafbf1,stroke:#16a34a,stroke-width:1.5px,color:#0f172a;
  classDef data fill:#e8f1fd,stroke:#2563eb,stroke-width:1.5px,color:#0f172a;
  classDef prod fill:#fdf4e8,stroke:#d97706,stroke-width:1.5px,color:#0f172a;

  subgraph Sources["Fragmented sources · 50+ connectors"]
    direction TB
    Ret("Retailers<br/>Costco · Albertsons · Aldi · BJ's · Ahold<br/>POS · deduction backup"):::src
    Dist("Distributors<br/>C&amp;S · Core-Mark · AWG"):::src
    ERP("ERP + accounting systems"):::src
    Synd("Syndicated data<br/>IRI / Circana"):::src
  end

  subgraph Ingest["Ingestion + AI extraction"]
    direction TB
    Conn("Connectors<br/>portals · EDI · files"):::data
    Doc("LLM document understanding<br/>messy invoices/deductions/reports<br/>-> line-item structured data"):::ai
    HITL("Automated + human-in-the-loop<br/>validation · full traceability"):::ai
    Conn --> Doc --> HITL
  end

  SoT[("Unified financial data model<br/>single source of truth<br/>finance · accounting · sales · ops")]:::data

  subgraph Products["Product surfaces"]
    direction TB
    P1("Cash Application"):::prod
    P2("Deduction Mgmt · Auto-Disputes"):::prod
    P3("Trade Promotion Mgmt"):::prod
    P4("Sales Forecasting · Demand Planning"):::prod
    P5("Sales Analytics"):::prod
  end

  Ret --> Conn
  Dist --> Conn
  ERP --> Conn
  Synd --> Conn
  HITL --> SoT
  SoT --> Products
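
To make "collapsing incompatible inputs into a single financial data model" concrete, here is a minimal sketch of per-source adapters feeding one record type. The `UnifiedRecord` shape, the adapter names, and every input field name (`InvoiceNo`, `DeductAmt`, `amount_cents`, and so on) are invented for illustration; real retailer and distributor feeds will differ.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class UnifiedRecord:
    """Hypothetical target shape shared by every source."""
    source: str
    invoice_number: str
    amount: Decimal
    posted_on: date


def from_retailer_portal(row: dict) -> UnifiedRecord:
    """Assumed portal export: formatted dollar strings and ISO dates."""
    return UnifiedRecord(
        source=row["retailer"],
        invoice_number=row["InvoiceNo"].strip(),
        amount=Decimal(row["DeductAmt"].replace("$", "").replace(",", "")),
        posted_on=date.fromisoformat(row["PostDate"]),
    )


def from_distributor_file(row: dict) -> UnifiedRecord:
    """Assumed flat file: zero-padded invoice numbers, cents, MM/DD/YYYY dates."""
    month, day, year = (int(part) for part in row["date_mmddyyyy"].split("/"))
    return UnifiedRecord(
        source=row["distributor"],
        invoice_number=row["inv"].lstrip("0"),
        amount=Decimal(row["amount_cents"]) / 100,
        posted_on=date(year, month, day),
    )


# One adapter per source kind; downstream code only ever sees UnifiedRecord.
ADAPTERS = {
    "retailer_portal": from_retailer_portal,
    "distributor_file": from_distributor_file,
}


def normalize(kind: str, row: dict) -> UnifiedRecord:
    """Dispatch a raw row to the adapter registered for its source kind."""
    return ADAPTERS[kind](row)
```

The design choice being illustrated is that source quirks stay inside the adapters, so product surfaces can join on `invoice_number` and `amount` without knowing where a record came from.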

The deduction loop: where document AI, agents, and humans meet

The flagship workflow shows why this is hard. A retailer pays an invoice short and attaches “deduction backup”, often a scanned or PDF justification in a format unique to that retailer. Confido extracts the line items, an agent “retrieve[s] data from legacy systems” to gather the matching context (contracts, promo plans, POS), and the system classifies the deduction as either valid trade spend or an invalid chargeback to fight. Low-confidence or high-dollar cases route to a human; the rest proceed automatically, with invalid deductions going to Auto-Disputes and the result posted through cash application against the ledger.

Mermaid source

flowchart LR
  classDef io fill:#fdf4e8,stroke:#d97706,stroke-width:1.5px,color:#0f172a;
  classDef ai fill:#eafbf1,stroke:#16a34a,stroke-width:1.5px,color:#0f172a;
  classDef data fill:#e8f1fd,stroke:#2563eb,stroke-width:1.5px,color:#0f172a;
  classDef human fill:#eef0fe,stroke:#6366f1,stroke-width:1.5px,color:#0f172a;

  Pay(["Retailer pays short<br/>+ deduction backup docs"]):::io

  subgraph Agent["Agentic deduction workflow"]
    direction TB
    Extract("Extract line items<br/>from messy backup (LLM)"):::ai
    Retrieve("Agent retrieves context<br/>across fragmented/legacy systems<br/>(POS, contracts, promo plans)"):::ai
    Classify("Classify + match deduction<br/>valid trade spend vs invalid?"):::ai
  end

  Review{"Confidence<br/>high?"}:::data
  Human("Human-in-the-loop<br/>review / correction"):::human
  Dispute("Auto-dispute invalid deductions<br/>file claim + evidence"):::ai
  Ledger[("Cash application -> ledger<br/>single source of truth")]:::data

  Pay --> Extract --> Retrieve --> Classify --> Review
  Review -->|yes| Dispute
  Review -->|low / high $| Human --> Dispute
  Dispute --> Ledger
  Human -. "labels feed back" .-> Classify
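
The classify step in the loop above can be pictured as matching each deduction against the promo plans the agent retrieved. The sketch below is a deliberately simple rule-based stand-in, assuming a per-unit allowance model; the `PromoPlan` and `Deduction` types, the `classify` function, and the three labels are hypothetical, and a production classifier would likely combine rules like these with model output.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class PromoPlan:
    retailer: str
    sku: str
    start: date
    end: date
    allowance_per_unit: Decimal


@dataclass(frozen=True)
class Deduction:
    retailer: str
    sku: str
    ship_date: date
    units: int
    amount: Decimal


def classify(
    deduction: Deduction,
    plans: list[PromoPlan],
    tolerance: Decimal = Decimal("0.01"),
) -> str:
    """Return 'valid', 'overdeducted', or 'invalid' for one deduction.

    'valid': a plan covers it and the amount is within the agreed allowance.
    'overdeducted': a plan covers it but the retailer took more than agreed.
    'invalid': no plan covers this retailer, SKU, and ship date.
    """
    for plan in plans:
        covers = (
            plan.retailer == deduction.retailer
            and plan.sku == deduction.sku
            and plan.start <= deduction.ship_date <= plan.end
        )
        if not covers:
            continue
        expected = plan.allowance_per_unit * deduction.units
        return "valid" if deduction.amount <= expected + tolerance else "overdeducted"
    return "invalid"
```

Under this sketch, 'overdeducted' and 'invalid' results would be the candidates for a dispute, with the matching plan (or its absence) serving as the evidence.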

On top of the data model sit the analytical products: Trade Promotion Management (“plan, track, and analyze trade promotions … with clear visibility into spend and ROI”), Sales Forecasting / Demand Planning (“statistical models and live sales data”), and Sales Analytics over syndicated + POS feeds (home). The ML team also builds “anomaly detection across retailer performance data” and “promotion optimization models” (ML JD).
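
The sources do not say how the anomaly detection works. As one standard baseline for noisy retail series, the sketch below flags outliers with the modified z-score (median and median absolute deviation), which is less distorted by the spikes it is trying to find than a mean-based score. The function name, the sample data, and the 3.5 cutoff (a conventional default for this statistic) are assumptions for illustration.

```python
from statistics import median


def flag_anomalies(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indexes whose modified z-score exceeds the threshold.

    Modified z-score: 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    if len(values) < 3:
        return []  # too little data to call anything an outlier
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # flat series: the score is undefined, so flag nothing
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - center) / mad) > threshold
    ]


# Example: weekly unit sales for one SKU at one retailer, with one spike.
weekly_units = [100, 104, 98, 101, 250, 99, 103]
spikes = flag_anomalies(weekly_units)
```

A flagged week is only a prompt for a planner to look closer; a promotion, a stock-out, or a data-feed error can all produce the same spike.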

Team & process

A technical-founder-led, ~28-person team (YC) hiring hard in NYC.

Role	Person	Source
Co-founder (CEO)	Justin Hunter — ex-Capital One strategy; Harvard	YC
Co-founder (technical)	Kara Holinski — engineering at MIT; ex-APM Schmidt Futures	YC
Head of AI	Matan Friedmann — ex Clearly Labs CTO; ex Nexar; ex Q.ai (acq. Apple)	AI hire

The build is design-partner-driven and forward-deployed: Confido shipped “nights and weekends … with our brand partners” and still spends “hundreds of hours with our brand partners every week” (Series A note), with a dedicated Forward Deployed Engineer embedding to wire up each customer’s retailer/distributor data (Ashby). The org runs intense and fully in-person — all NYC on-site, “Nightly Team Dinners for those staying past 6:30pm” (SWE JD); comp spans SWE $170–200K to Staff ML/AI $300–350K + bonus.

Sources

Reconstructed from public sources only — no insider information. Crawled 2026-06-09 via Chrome MCP (logged-out) + web. First-party (confidotech.com, the Confido Aisle blog, Confido’s Ashby JDs) prioritized; YC profile labeled third-party. Claim tiers: verified (stated on a public page, linked) · inferred (reasoned from a cited signal, confidence flagged) · speculative (best-practice fill-in, labeled). Links are live; pages change, so the supporting quote for each claim is kept in this repo’s evidence map (evidence/confido-evidence-map.md).

#	Source	Link
S1	Homepage	https://www.confidotech.com/
S2	About	https://www.confidotech.com/about
S3	Careers	https://www.confidotech.com/careers
S4	Integrations	https://www.confidotech.com/integrations
S5	Blog — Series A founders’ note	https://www.confidotech.com/blogs/a-note-from-our-founders-raising-our-series-a-to-build-the-future-of-cpg-finance
S6	Blog — Head of AI (Matan Friedmann)	https://www.confidotech.com/blogs/scaling-ai-in-cpg-matan-friedmann-joins-confido-as-head-of-ai
S7	Ashby job board	https://jobs.ashbyhq.com/confido
S8	Staff Software Engineer (JD)	https://jobs.ashbyhq.com/confido/b1d615bc-2040-4593-84ba-54039a5a8c75
S9	Staff ML / AI Engineer (JD)	https://jobs.ashbyhq.com/confido/c133c8b1-12a9-450d-8fa5-715ae123ee69
S10	Software Engineer (JD)	https://jobs.ashbyhq.com/confido/d5520ce5-bc5f-4947-8912-292615b0c5ac
S11	Y Combinator profile (third-party)	https://www.ycombinator.com/companies/confido