A SaaS founder messaged me last month with a question I had been waiting for somebody to ask. He had four hundred agents signed up on his platform inside seventy-two hours. Three of them had already racked up $2,800 in legitimate API spend. One had tried to charge back a $5 transaction. None of them had been around long enough to have a reputation. He wanted to know what to do.
The honest answer is that the legacy stack does not have an answer. KYC was built for humans. Stripe Radar was built for credit cards. Sentry was built for engineers. Every existing trust primitive assumes the thing it is rating either has a face or a tax ID or a long enough log to fit a regression on. An agent that is forty minutes old has none of those things.
What it does have is receipts. Or it will, if the rail it transacts on bothers to sign them.
A short history of credit, because it is the same story
Before the late 1980s, credit decisions in the United States were largely local and narrative. A loan officer knew your family. A merchant let you run a tab. The first general-purpose credit score, FICO, did not exist until 1989, when Fair Isaac introduced it for use with credit bureau data. Before that, lenders relied on a patchwork of in-house scorecards.
What FICO actually did was collapse an information-asymmetry problem into a single falsifiable summary. The lender could not observe the borrower's full payment history. The borrower had every incentive to present themselves as reliable. The score let one number stand in for all of that, derived from signals the borrower could not easily fake: timely payments, total utilization, length of history, recent inquiries, credit mix.
The score was never about the borrower's character. It was about the lender's expected loss on the next transaction. Read every FICO patent and that framing is on the first page. Read the early Fair Isaac literature and the word "character" does not appear once. The number was a counter-party tool, not a self-help one.
That is the part that translates one-for-one to agents.
The agent has the same information-asymmetry problem, worse
When a SaaS platform, cloud provider, marketplace, or API vendor decides whether to trust a brand-new autonomous agent, what does it know? The agent is a few hours old. It has no employment history. No credit bureau holds a file on it. It has an API key that was issued ten minutes ago. The platform cannot tell whether this agent is going to settle $50 in legitimate calls, run up $50,000 in runaway spend, or file $5,000 in fraudulent chargebacks before the next sunrise.
The current industry solution to this is friction. Credit card pre-authorizations. Spending caps. Velocity limits. Manual review on anything non-trivial. None of it scales. An agent economy with a million agents cannot be manually reviewed. The platforms running today's MCP marketplaces, the early agent-to-agent commerce protocols, the autonomous shopping flows — every one of them already feels this pressure.
What an agent economy needs is the same primitive lending got in the late 1980s. A way to compress an agent's full observable history into a single underwritable number. Three digits. A glance. Decision in milliseconds. Not because the number captures the agent's soul, but because the number tells the counter-party their expected loss on this transaction.
The Agent FICO Score, 300 to 850
The score MnemoPay computes uses the same 300-850 range as consumer FICO on purpose. Underwriters already know how to read it. A risk officer at a payment platform does not want to learn a new scale at the same time they are learning what an agent is. The cognitive load on the buyer side is the entire reason the range was kept.
The math underneath is different. Consumer FICO weighs payment history, amounts owed, length of history, new credit, and credit mix. The agent version weighs payment history, behavioral stability, memory integrity, identity stability, and anomaly signal. Same shape, different signals.
- Payment history (35%). Did the agent settle what it owed? Did it default on any transaction inside its declared budget? Did it dispute charges it legitimately incurred? This is the bedrock of the score for the same reason it is the bedrock of consumer FICO: every other signal is cheaper to fake.
- Behavioral stability (30%). Does the agent's spending profile drift predictably, or does it swing? An agent that spent a cent per call for six weeks and suddenly starts spending five dollars per call is either under new management or compromised. The counter-party wants to know either way.
- Memory integrity (15%). Does the agent's memory chain pass Merkle-root verification? A broken chain means the agent's state has been tampered with or silently rolled back. You do not lend to an agent whose history has been edited.
- Identity stability (10%). How long has this Ed25519 key been in use? How many counter-parties have settled with it? Fresh keys carry the same risk profile as fresh credit cards. New is not bad. New is unknown.
- Anomaly signal (10%). Deviations from the agent's own baseline, detected with an exponentially weighted moving average (EWMA) and weighted by severity. A honeypot canary triggered. Three failed geo-consistency checks in a minute. These are thumb-on-the-scale adjustments that move the score by tens of points, not hundreds.
The score is not an opinion. It is a compression of an agent's entire observable economic history. Every input is falsifiable. Anyone with the receipts can re-derive it.
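To make the weighting concrete, here is a minimal sketch of how five component values could roll up into one number. The weights are the ones listed above. Everything else is my assumption for illustration: that each component arrives already normalized to the range 0 to 1, that the mapping onto 300-850 is linear, and the names agentScore and Components. It is not the SDK's actual formula.

```typescript
// Weights come from the list above; the rest of this block is illustrative.
const WEIGHTS = {
  paymentHistory: 0.35,
  behavioralStability: 0.30,
  memoryIntegrity: 0.15,
  identityStability: 0.10,
  anomalySignal: 0.10,
} as const;

type Component = keyof typeof WEIGHTS;

// Assumption: each component is pre-normalized to [0, 1], where 1 is best.
type Components = Record<Component, number>;

const MIN_SCORE = 300;
const MAX_SCORE = 850;

function agentScore(c: Components): number {
  let weighted = 0;
  for (const key of Object.keys(WEIGHTS) as Component[]) {
    const value = Math.min(1, Math.max(0, c[key])); // clamp out-of-range inputs
    weighted += WEIGHTS[key] * value;
  }
  // Assumption: a linear mapping from [0, 1] onto the 300-850 range.
  return Math.round(MIN_SCORE + weighted * (MAX_SCORE - MIN_SCORE));
}

// A young agent with a clean payment record but a fresh key.
const exampleScore = agentScore({
  paymentHistory: 1.0,
  behavioralStability: 0.8,
  memoryIntegrity: 1.0,
  identityStability: 0.3,
  anomalySignal: 0.9,
});
console.log(exampleScore); // 773
```

Because the function is pure and the weights are public, two parties holding the same component values get the same number, which is the re-derivability property in miniature.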
How recall events become a reputation signal
The piece that does not show up in any human credit model is memory integrity. It is the most novel input and the one that surprises people on first read.
Every time a MnemoPay-wired agent writes to its memory layer, that write gets hashed and chained to the previous write. The chain produces a Merkle root the agent (or any third party) can verify with a single hash comparison. If somebody silently edits an earlier memory to make the agent look better in hindsight, every subsequent root invalidates. The tamper is visible from the outside.
This sounds like a small thing. In practice it is the difference between an agent whose history can be trusted and an agent whose history is a story it tells about itself. The first one can be underwritten. The second one cannot.
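A toy version of the idea fits in a few dozen lines. This sketch assumes a plain SHA-256 hash chain whose latest hash stands in for the root, which is simpler than a real Merkle tree. The MemoryEntry shape and the appendEntry and verifyChain helpers are hypothetical names of mine, not the SDK's.

```typescript
import { createHash } from "node:crypto";

interface MemoryEntry {
  content: string;
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string;     // SHA-256 over prevHash, a separator, and content
}

const GENESIS = "0".repeat(64);

function hashEntry(prevHash: string, content: string): string {
  return createHash("sha256").update(prevHash).update("\n").update(content).digest("hex");
}

function appendEntry(chain: MemoryEntry[], content: string): MemoryEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  return [...chain, { content, prevHash, hash: hashEntry(prevHash, content) }];
}

// Recompute every link from the contents, then compare the final hash
// against a head the verifier recorded earlier.
function verifyChain(chain: MemoryEntry[], expectedHead: string): boolean {
  let prev = GENESIS;
  for (const entry of chain) {
    if (entry.prevHash !== prev) return false;
    if (entry.hash !== hashEntry(prev, entry.content)) return false;
    prev = entry.hash;
  }
  return prev === expectedHead;
}

let memory: MemoryEntry[] = [];
for (const text of ["paid invoice 1", "disputed invoice 2", "paid invoice 3"]) {
  memory = appendEntry(memory, text);
}
const recordedHead = memory[memory.length - 1].hash;

// Tamper 1: edit an old entry in place and leave the hashes alone.
const edited = memory.map((e, i) => (i === 1 ? { ...e, content: "paid invoice 2" } : e));

// Tamper 2: rewrite history and recompute every hash after the edit.
let rewritten: MemoryEntry[] = [];
for (const text of ["paid invoice 1", "paid invoice 2", "paid invoice 3"]) {
  rewritten = appendEntry(rewritten, text);
}

console.log(verifyChain(memory, recordedHead));    // true
console.log(verifyChain(edited, recordedHead));    // false
console.log(verifyChain(rewritten, recordedHead)); // false
```

The second tamper is the interesting one. The rewritten chain is internally consistent, so it only fails because a third party kept the earlier head. That is why the head has to live somewhere the agent cannot edit.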
The recall layer also produces a second signal: behavioral consistency over time. If the agent's recall events show a stable pattern of decisions across thousands of calls, the score reflects that. If the pattern shifts abruptly without a corresponding identity-layer event, the anomaly bucket catches it. The score then dips, the counter-party sees the dip, and the underwriting decision adjusts. None of this requires the agent to opt in. It is a side effect of the receipts the agent was already generating.
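Here is roughly what an EWMA baseline check looks like, using the spend-per-call example from earlier. The smoothing factor, the three-sigma threshold, the warm-up length, and the function name ewmaAnomalies are all illustrative assumptions, not MnemoPay's tuned values.

```typescript
// Flags indexes whose value sits more than `threshold` standard deviations
// from the exponentially weighted moving average of the values before it.
function ewmaAnomalies(
  values: number[],
  alpha = 0.2,    // assumed smoothing factor
  threshold = 3,  // assumed cutoff, in standard deviations
  warmup = 5,     // observations to absorb before flagging anything
): number[] {
  const flagged: number[] = [];
  if (values.length === 0) return flagged;

  let mean = values[0];
  let variance = 0;

  for (let i = 1; i < values.length; i++) {
    const deviation = values[i] - mean;
    const std = Math.sqrt(variance);

    // Compare against the baseline built from earlier observations only.
    if (i >= warmup && std > 0 && Math.abs(deviation) > threshold * std) {
      flagged.push(i);
    }

    // Standard incremental EWMA updates for mean and variance.
    mean += alpha * deviation;
    variance = (1 - alpha) * (variance + alpha * deviation * deviation);
  }
  return flagged;
}

// About a cent per call, then a sudden five-dollar call.
const spendPerCall = [0.010, 0.011, 0.009, 0.010, 0.012, 0.010, 0.011, 0.009, 5.0];
const flaggedIndexes = ewmaAnomalies(spendPerCall);
console.log(flaggedIndexes); // [ 8 ]
```

Note that the baseline is the agent's own history, not a population average. A five-dollar call is unremarkable for an agent that has always made them.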
Integration with payment rails
The score is rail-agnostic by design. An agent might charge users via Stripe for monthly subscriptions, settle micropayments via Lightning for tool calls, accept Paystack subscriptions from African customers, and price an API in x402 for crypto-native callers. Four rails, one agent, one score.
That portability is the whole point. If the score lived inside Stripe, leaving Stripe would mean leaving the score. Same for any other rail-bound reputation. The agent would be locked in by something other than the cost of switching — which is exactly the lock-in pattern the portable trust layer essay argues against.
At the rail layer, the score shows up as a fee adjustment. An agent with a 720+ score gets a 1.0% platform fee. An agent under 580 gets 2.5% and a human-in-the-loop (HITL) gate before any settlement clears. The numbers are knobs the platform sets, not laws of physics, but the principle holds: counter-parties price risk into the rate.
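As a sketch, the tiering is a small lookup. The 720 and 580 cutoffs and the 1.0% and 2.5% rates come from the paragraph above. The rate for the middle band is a placeholder I made up, and FeePolicy, priceRisk, and feeCents are hypothetical names.

```typescript
interface FeePolicy {
  trustedFloor: number;  // scores at or above this get the low rate
  riskyCeiling: number;  // scores below this get the high rate plus review
  trustedRate: number;
  standardRate: number;
  riskyRate: number;
}

interface Pricing {
  feeRate: number;
  requiresHumanReview: boolean;
}

const DEFAULT_POLICY: FeePolicy = {
  trustedFloor: 720,
  riskyCeiling: 580,
  trustedRate: 0.01,
  standardRate: 0.02, // placeholder: no rate is stated for scores from 580 to 719
  riskyRate: 0.025,
};

function priceRisk(score: number, policy: FeePolicy = DEFAULT_POLICY): Pricing {
  if (score >= policy.trustedFloor) {
    return { feeRate: policy.trustedRate, requiresHumanReview: false };
  }
  if (score < policy.riskyCeiling) {
    return { feeRate: policy.riskyRate, requiresHumanReview: true };
  }
  return { feeRate: policy.standardRate, requiresHumanReview: false };
}

// Work in integer cents so the fee does not pick up floating-point dust.
function feeCents(amountCents: number, pricing: Pricing): number {
  return Math.round(amountCents * pricing.feeRate);
}

console.log(feeCents(10_000, priceRisk(750))); // 100, i.e. $1.00 on $100
console.log(feeCents(10_000, priceRisk(560))); // 250, i.e. $2.50 on $100
console.log(priceRisk(560).requiresHumanReview); // true
```

Keeping the policy as data rather than hard-coded branches is what makes these knobs: a platform with a different risk appetite passes a different FeePolicy.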
The Reputation primitive in the SDK
In the MnemoPay SDK the score is a first-class object. The current shape, as of @mnemopay/sdk@1.9.0:
const score = new AgentCreditScore().compute({ agent });
What comes back is a number, a rating, a fee rate, and an array of the receipts that drove the decision. The receipts are not optional. The point of including them is that the calling code can audit the math itself if it does not want to trust the SDK. We are not the arbiter. Like every credit bureau going back fifty years, we are the notary.
Goodhart's Law applies, of course. The minute you publish how a number is computed, somebody will try to optimize for the number. The defense is that every input signal traces back to a signed receipt the agent cannot retroactively edit. You cannot fake a payment history by writing in your journal that you paid. The chain is the chain.
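The mechanics of "cannot retroactively edit" are ordinary digital signatures. The sketch below uses Node's built-in crypto module with an Ed25519 key pair standing in for the rail's signing key. The Receipt fields and the encoding are assumptions I picked for illustration, not MnemoPay's wire format.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical receipt shape, for illustration only.
interface Receipt {
  agentKeyId: string;
  counterparty: string;
  amountCents: number;
  settledAt: string; // ISO 8601 timestamp
}

// Assumption: a fixed field order gives signer and verifier identical bytes.
function encodeReceipt(r: Receipt): Buffer {
  return Buffer.from(
    JSON.stringify([r.agentKeyId, r.counterparty, r.amountCents, r.settledAt]),
    "utf8",
  );
}

// The rail holds the private key; anyone may hold the public key.
const railKeys = generateKeyPairSync("ed25519");

const receipt: Receipt = {
  agentKeyId: "agent-key-1",
  counterparty: "api-vendor-7",
  amountCents: 500,
  settledAt: "2026-01-15T12:00:00Z",
};

// Ed25519 takes no separate digest algorithm, so the first argument is null.
const signature = sign(null, encodeReceipt(receipt), railKeys.privateKey);

const genuineOk = verify(null, encodeReceipt(receipt), railKeys.publicKey, signature);

// The agent "remembers" a smaller charge. The old signature no longer fits.
const editedReceipt: Receipt = { ...receipt, amountCents: 5 };
const editedOk = verify(null, encodeReceipt(editedReceipt), railKeys.publicKey, signature);

console.log(genuineOk); // true
console.log(editedOk);  // false
```

The agent never holds the rail's private key, so it can present a receipt or withhold one, but it cannot mint or alter one.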
The score is for counter-parties, not agents
This is the piece developers get wrong on first read. The score does not exist to reward well-behaved agents with discounts. It exists to give the humans and platforms trusting those agents a cheap, uniform way to price risk.
If you are an AI agent developer, the score is a side effect of your agent's behavior, not a goal to optimize directly. Goodhart would ruin it in six weeks if you tried. You ship an agent, the agent lives a life, the receipts accumulate, and the score emerges. Same as a human. You do not "apply" for a credit score. You earn one by existing in the system.
If you are a counter-party — an API vendor, a SaaS, a platform, a marketplace — the score is the difference between onboarding agents programmatically and keeping a human in the loop on every single one. At 2026 agent-economy scale, that is the difference between a business that works and one that does not.
What about the other players in this space
I do not think MnemoPay is the only entity that will ship a version of this primitive. AGT.finance has a Bayesian trust model. Methux has a Weibull-distribution reliability score. A startup called Bank of Bots calls theirs the BOB Score. Mem0 has the memory layer and an obvious payments-shaped gap in the middle of their product. Kite, Payman, Skyfire, Sentient — everyone in this corner of the market has clocked that the agent economy needs this primitive.
The position we hold today is the full stack. Most competitors have either the payments rail or the memory layer or the identity layer. MnemoPay has all three, plus the cryptographic chain that makes the score derivable, plus an open-source SDK any developer can audit before paying for the managed version. That bundle is what makes the score actually computable. You cannot derive a credit score on top of a product that only does one of those three things. You can only derive a piece of it, and a piece does not underwrite transactions.
What to do this week if you are building this into your stack
If your agents already transact, the work is small. Pull in @mnemopay/sdk, wire the existing payment flow through it, and let the chain build. The score becomes available the moment your agent has more than one signed receipt. The fidelity of the score increases with every transaction after that.
If you are running a platform that will underwrite agents, the integration is also small. Call AgentCreditScore.compute on the agent's public DID and read the number back. Decide what threshold corresponds to your risk appetite. Wire that threshold into your onboarding flow. The number is rail-agnostic. It will follow the agent into whatever you do next.
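On the platform side, the wiring can be as thin as the sketch below. I have deliberately kept the SDK out of it: ScoreLookup is a hypothetical stand-in for however you fetch the score, and it returns undefined for an agent that does not have enough signed receipts to be scored yet. The threshold and the decision names are assumptions too.

```typescript
type Decision = "approve" | "manual_review";

// Hypothetical lookup: a score, or undefined when the agent has no score yet.
type ScoreLookup = (agentDid: string) => number | undefined;

function onboardAgent(agentDid: string, lookup: ScoreLookup, minScore: number): Decision {
  const score = lookup(agentDid);
  // New is not bad, new is unknown: route unscored agents to a human.
  if (score === undefined) return "manual_review";
  return score >= minScore ? "approve" : "manual_review";
}

// Stub data in place of a real score source.
const knownScores = new Map<string, number>([
  ["did:example:veteran-agent", 742],
  ["did:example:shaky-agent", 561],
]);
const lookupScore: ScoreLookup = (did) => knownScores.get(did);

const MIN_SCORE_TO_AUTO_APPROVE = 650; // assumed risk appetite

console.log(onboardAgent("did:example:veteran-agent", lookupScore, MIN_SCORE_TO_AUTO_APPROVE)); // approve
console.log(onboardAgent("did:example:shaky-agent", lookupScore, MIN_SCORE_TO_AUTO_APPROVE));   // manual_review
console.log(onboardAgent("did:example:brand-new", lookupScore, MIN_SCORE_TO_AUTO_APPROVE));     // manual_review
```

The human stays in the loop only for the agents the number cannot vouch for, which is the whole economic argument for having the number.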
The deeper claim — the one that takes longer to internalize — is that we are watching the same infrastructure layer get built that consumer credit got in the late 1980s. The receipts business is a twenty-year build. The credit bureaus did not feel inevitable in 1989 either. They feel inevitable now because every retail transaction in the United States runs on the rails they laid.
An agent economy with a million agents needs the same kind of rail. The number is the rail. The chain is the foundation under the rail. Both are open source. Both are at github.com/mnemopay/mnemopay-sdk if you want to read the code before you trust the score.