
The agentic COGS stack
As head of AI R&D, I spend a lot of time with architects and CTOs, and the conversation almost always lands on a COGS breakdown that mirrors the agent’s architecture:
- Model inference: Tokens across planner, executor and verifier calls; usually the largest contributor to COGS in agentic software.
- Tools and side effects: Paid APIs (e.g., web search), per-record automation fees, retries and idempotent write safeguards.
- Orchestration runtime: Workers, queues, state storage and sandboxed execution for code and documents.
- Memory and retrieval: Embeddings, vector storage, index refresh and context-building or summarization checkpoints.
- Governance and observability: Tracing, evaluation suites, safety filters and audit retention.
- Humans in the loop: Review time, escalations and support load created by agent mistakes.
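
To make the breakdown concrete, here is a minimal Python sketch that rolls the six layers above into a per-run COGS figure and shows each layer's share. The names `AgentRunCost` and `inference_cost` are hypothetical, and every price and token count is a placeholder rather than a benchmark; substitute your own provider rates and telemetry.

```python
from dataclasses import dataclass, fields


@dataclass
class AgentRunCost:
    """Cost of one agent run in USD, split by the six layers listed above."""
    inference: float = 0.0      # planner/executor/verifier tokens
    tools: float = 0.0          # paid APIs, per-record fees, retries
    orchestration: float = 0.0  # workers, queues, state storage, sandboxes
    memory: float = 0.0         # embeddings, vector storage, summaries
    governance: float = 0.0     # tracing, evaluations, safety filters, audit
    human_review: float = 0.0   # review time, escalations, support load

    def total(self) -> float:
        """Sum of all layers: the COGS of this single run."""
        return sum(getattr(self, f.name) for f in fields(self))

    def shares(self) -> dict[str, float]:
        """Fraction of total cost per layer (all zeros if nothing was spent)."""
        total = self.total()
        if total == 0:
            return {f.name: 0.0 for f in fields(self)}
        return {f.name: getattr(self, f.name) / total for f in fields(self)}


def inference_cost(input_tokens: int, output_tokens: int,
                   usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Token cost in USD, given prices quoted per million tokens."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


# Illustrative run; all figures below are assumed placeholders.
run = AgentRunCost(
    inference=inference_cost(40_000, 6_000,
                             usd_per_m_input=3.0, usd_per_m_output=15.0),
    tools=0.02,
    orchestration=0.01,
    memory=0.005,
    governance=0.005,
    human_review=0.10,
)

print(f"COGS per run: ${run.total():.3f}")
for layer, share in run.shares().items():
    print(f"{layer:>14}: {share:.0%}")
```

Keeping human review as its own field matters: it is easy to leave out of a cloud bill, yet it can rival inference once agent mistakes trigger escalations.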
How does FinOps help standardize unit economics when outcomes span actions, workflows and tasks?
Gartner has cautioned that cost pressure can derail agentic programs, which makes unit economics a delivery requirement.
With most SaaS products, customers don’t buy raw tokens; they buy progress toward completing their work: cases resolved, pipelines updated, reports produced or exceptions handled. Unit economics becomes actionable when we measure at the boundary where that value is delivered, and that boundary expands as your agentic SaaS matures: from answers in the UI, to a single approved operation, to a multi-step process and eventually to a recurring responsibility the agent runs end-to-end. The following table lays out this structure, with the unit metric and outcome to meter at each level of scope.

