Why High Accuracy Can Still Lose Money
A payments company deployed a fraud detector. 99% accuracy. Beautiful metrics. Six months later, they ripped it out and replaced it. Too expensive.
Here's why: a false positive (blocking a real customer) cost them $80. Customer service, friction, an abandoned transaction, lost trust. A missed fraud cost $250. The model was tuned for accuracy, and in practice that made it quick to block anything that looked suspicious. It caught fraud. It also blocked a ton of legitimate customers, and the false positive costs swamped whatever the fraud catches saved.
A model with 91% accuracy, built to minimize cost instead of maximizing accuracy, would've made them $2M more annually. Lower accuracy, better business result. Sounds backwards, but that's the trap: optimizing for accuracy is not the same as optimizing for profit, and the two can pull in opposite directions.
Errors Don't Cost the Same
Here's what textbooks don't tell you: accuracy assumes all mistakes are created equal. One false positive, one false negative, same penalty. Works fine for "is this a cat" problems. Brutal for business.
Business errors have completely asymmetric costs:
- Lending: Reject a good loan? $12K opportunity loss. Approve a bad loan? $3K loss. Four-times asymmetry.
- Fraud: Block a real customer? $80 cost. Miss actual fraud? $250 cost. Roughly a 3x gap.
- Maintenance: Schedule unnecessary maintenance? $500. Miss a critical failure? $50K. That's 100x asymmetry.
- Hiring: Hire a bad candidate? $200K in onboarding and sunk cost. Pass on a good candidate? $30K in lost value. Lopsided.
In every case, one error type is radically more expensive than the other. But accuracy sees them as equal. So models optimized for accuracy will make systematically wrong tradeoffs.
The Numbers Don't Lie (They Just Confuse)
Let me walk you through actual numbers. Makes it concrete.
You've got a payments platform processing 100M transactions monthly. Fraud rate? 0.5% (500K actual frauds). You ship a model with 99% accuracy.
What does 99% accuracy mean in this context?
- You're right on 99% of transactions overall.
- You're wrong on the remaining sliver.
- And because fraud is only 0.5% of traffic, almost all of those wrong predictions are false positives (legitimate transactions you're flagging as fraud).
Math time:
| Scenario | Count | Cost per Item | Total Cost |
|---|---|---|---|
| True Positives (Fraud Caught) | 495,000 | $0 (fraud prevented) | $0 |
| False Positives (Legit Blocked) | 498,000 | -$80 | -$39.8M |
| False Negatives (Fraud Missed) | 5,000 | -$250 | -$1.25M |
| True Negatives | 99,002,000 | $0 | $0 |
Your 99%-accurate model? Roughly $41M in monthly cost, about $492M a year. You're hemorrhaging money, and squeezing out another point of accuracy won't stop the bleeding.
Now contrast with a lower-accuracy model run through the same kind of threshold, one that simply misses more fraud:
| Scenario | Count | Cost per Item | Total Cost |
|---|---|---|---|
| True Positives (Fraud Caught) | 450,000 | $0 | $0 |
| False Positives (Legit Blocked) | 500,000 | -$80 | -$40M |
| False Negatives (Fraud Missed) | 50,000 | -$250 | -$12.5M |
| True Negatives | 99,000,000 | $0 | $0 |
That's $52.5M monthly. Worse. Giving up accuracy by itself buys you nothing; what matters is where the errors land. But both of these assume the same threshold. What if we tune the threshold for cost instead of accuracy? We catch less fraud but block far fewer customers:
| Scenario | Count | Cost per Item | Total Cost |
|---|---|---|---|
| True Positives (Fraud Caught) | 400,000 | $0 | $0 |
| False Positives (Legit Blocked) | 200,000 | -$80 | -$16M |
| False Negatives (Fraud Missed) | 100,000 | -$250 | -$25M |
| True Negatives | 99,300,000 | $0 | $0 |
That cost-tuned version? $41M monthly, back to even with the 99%-accurate model, even though it catches only 80% of fraud. Retuning the threshold recovers everything the weaker model gave up. The real payoff comes when the whole model is designed to minimize cost rather than chase accuracy.
Best case scenario: a model that catches 450K frauds (90% recall) while only falsely flagging 100K transactions:
- False positive cost: 100K × $80 = $8M
- False negative cost: 50K × $250 = $12.5M
- Monthly total: $20.5M
- Annually: $246M
Raw accuracy was never the objective here. But cost-wise? $246M a year versus $492M for the 99%-accurate model. That's a $246M annual difference. Which one do you want?
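If you want to sanity-check the arithmetic, here's a minimal sketch that recomputes the monthly and annual error cost for each confusion matrix above, using the $80 false-positive and $250 false-negative figures (the totals land within rounding of the numbers quoted):

```python
# Recompute the error cost of each scenario in this section.
COST_FP = 80    # legitimate transaction blocked
COST_FN = 250   # fraud missed

# (scenario, false positives per month, false negatives per month)
scenarios = [
    ("99%-accurate model",       498_000,   5_000),
    ("Lower-accuracy model",     500_000,  50_000),
    ("Cost-tuned threshold",     200_000, 100_000),
    ("Cost-optimized best case", 100_000,  50_000),
]

for name, fp, fn in scenarios:
    monthly = fp * COST_FP + fn * COST_FN
    print(f"{name:26s}  ${monthly / 1e6:5.1f}M / month   ${monthly * 12 / 1e6:6.1f}M / year")
```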
The Cost-Weighted Alternative
Stop optimizing for accuracy. Start optimizing for this:
Business Value = (TP × Value_TP) + (TN × Value_TN) - (FP × Cost_FP) - (FN × Cost_FN)
This metric actually tells you: does your model make or lose money? You optimize the model to maximize this, not accuracy.
For fraud, it'd look like:
- TP (caught fraud) = $0 (you prevented harm, that's the baseline)
- TN (legit allowed) = $0.02 (tiny processing fee)
- FP (legit blocked) = -$80 (customer service, friction, lost repeat business)
- FN (fraud missed) = -$250 (chargeback, liability)
Now when you evaluate, you don't ask "which model is most accurate?" You ask "which generates most business value?" Often, the answer isn't the accurate one.
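As a sketch of what that comparison looks like in code, here's the formula above with the fraud values plugged in. The confusion-matrix counts are the illustrative monthly figures from the tables earlier:

```python
def business_value(tp, tn, fp, fn,
                   value_tp=0.0, value_tn=0.02, cost_fp=80.0, cost_fn=250.0):
    """Business Value = TP*Value_TP + TN*Value_TN - FP*Cost_FP - FN*Cost_FN."""
    return tp * value_tp + tn * value_tn - fp * cost_fp - fn * cost_fn

# One month of traffic for the two models compared in this section.
models = {
    "accuracy-optimized": dict(tp=495_000, tn=99_002_000, fp=498_000, fn=5_000),
    "cost-optimized":     dict(tp=450_000, tn=99_400_000, fp=100_000, fn=50_000),
}

for name, counts in models.items():
    print(f"{name:20s} ${business_value(**counts) / 1e6:,.1f}M per month")
```

Both figures come out negative under this accounting (a fraud model is a cost-reduction play, not a revenue line), and that's fine: the comparison is the point, and the cost-optimized confusion matrix loses roughly $20M less per month.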
Type I vs Type II Errors (The Business Version)
Statisticians call them Type I and Type II errors. Business calls them "saying yes when you meant no" and "saying no when you meant yes." Different costs, always.
- Type I (False Positive): The model says yes when the truth is no. Approve a bad loan. Make a bad hire. Flag a good customer as fraud and block them. The cost is immediate and visible.
- Type II (False Negative): The model says no when the truth is yes. Pass on a good loan or a good hire; wave real fraud through. The cost is opportunity loss or silent leakage. Usually invisible. Sometimes larger.
Most teams end up minimizing Type I errors, because those are the costs everyone can see, and quietly absorb the Type II losses. But the data doesn't care which error is more visible. In most domains, one type is just plain more expensive than the other. Cost-weighted metrics show you which, then you optimize accordingly.
How to Actually Make This Work
Shift from accuracy to cost. Four pieces:
1. Write Down Your Cost Matrix
For each outcome (TP, TN, FP, FN), what's the cost or value? Real numbers.
Fraud:
TP: $0 (fraud prevented)
TN: $0.02 (transaction fee)
FP: -$80 (customer service, friction)
FN: -$250 (fraud loss)
Loans:
TP: $12K (loan profit)
TN: $3K (loss prevented)
FP: -$3K (default loss)
FN: -$12K (opportunity loss)
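One way to make this step stick (a minimal sketch, using the illustrative numbers above) is to keep the signed dollar values as plain data that the business team has signed off on and that your evaluation code imports:

```python
# Signed dollar value of each outcome: positive means value created, negative means cost.
# These are the illustrative figures from this section; yours come from the business team.
FRAUD_COSTS = {
    "tp": 0.00,     # fraud prevented (the baseline)
    "tn": 0.02,     # legitimate transaction allowed: processing fee
    "fp": -80.00,   # legitimate customer blocked
    "fn": -250.00,  # fraud missed
}

LOAN_COSTS = {
    "tp": 12_000,   # good loan approved: loan profit
    "tn": 3_000,    # bad loan rejected: loss prevented
    "fp": -3_000,   # bad loan approved: default loss
    "fn": -12_000,  # good loan rejected: opportunity loss
}
```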
2. Optimize the Threshold, Not Accuracy
Most models output probabilities. You pick a threshold: "if confidence > 0.6, predict positive." Different thresholds change your precision/recall tradeoff. Instead of picking the threshold that maximizes accuracy, pick the one that maximizes business value, as in the sketch below. That often means accepting lower accuracy in exchange for a better error distribution.
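Here's a minimal sketch of that sweep. The labels and scores are synthetic stand-ins for your model's validation-set output (0.5% positives, with fraud tending to score higher), and the dollar values are the fraud figures from earlier; swap in your own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a validation set: 0.5% fraud, model scores in [0, 1].
n = 200_000
y_true = (rng.random(n) < 0.005).astype(int)
scores = np.where(y_true == 1,
                  rng.beta(5, 2, n),    # fraud tends to score high
                  rng.beta(2, 8, n))    # legitimate traffic tends to score low

def business_value(y_true, y_pred,
                   value_tp=0.0, value_tn=0.02, cost_fp=80.0, cost_fn=250.0):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return tp * value_tp + tn * value_tn - fp * cost_fp - fn * cost_fn

# Sweep thresholds and keep the one with the highest business value, not accuracy.
best = None
for threshold in np.linspace(0.05, 0.95, 19):
    y_pred = (scores >= threshold).astype(int)
    accuracy = float(np.mean(y_pred == y_true))
    value = float(business_value(y_true, y_pred))
    if best is None or value > best[2]:
        best = (threshold, accuracy, value)

threshold, accuracy, value = best
print(f"best threshold {threshold:.2f}: accuracy {accuracy:.4f}, value ${value:,.0f}")
```

Swap `scores` and `y_true` for your model's real validation output and the same loop gives you the threshold to ship.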
3. Measure Business Impact, Not Benchmark Performance
When you evaluate:
- Compute raw accuracy (for context)
- Compute business value (the real metric)
- Calculate cost-per-error
- Do break-even analysis (at what volume does this go ROI-positive?)
Highest business value wins. Period. Even if accuracy is lower.
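A sketch of that evaluation, using the fraud costs and the monthly confusion matrices from this section. The fixed monthly model cost is a made-up number for the break-even step, and savings are measured against the baseline of shipping no model at all (every fraud goes through, nobody gets blocked):

```python
COST_FP, COST_FN = 80.0, 250.0
TRANSACTIONS, FRAUDS = 100_000_000, 500_000   # one month of traffic, 0.5% fraud
MONTHLY_MODEL_COST = 150_000                  # hypothetical serving + ops + review cost

def error_cost(fp, fn):
    return fp * COST_FP + fn * COST_FN

baseline = error_cost(fp=0, fn=FRAUDS)        # no model: every fraud slips through

def report(name, tp, tn, fp, fn):
    accuracy = (tp + tn) / TRANSACTIONS       # for context only
    cost = error_cost(fp, fn)
    savings = baseline - cost                 # business value vs. shipping nothing
    cost_per_error = cost / (fp + fn)
    savings_per_txn = savings / TRANSACTIONS
    breakeven = MONTHLY_MODEL_COST / savings_per_txn if savings_per_txn > 0 else float("inf")
    print(f"{name:20s} accuracy={accuracy:.4f}  savings=${savings / 1e6:.1f}M/mo  "
          f"cost/error=${cost_per_error:.0f}  break-even at {breakeven:,.0f} txns/mo")

report("accuracy-optimized", tp=495_000, tn=99_002_000, fp=498_000, fn=5_000)
report("cost-optimized",     tp=450_000, tn=99_400_000, fp=100_000, fn=50_000)
```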
4. Track Cost Drift in Production
In production, watch whether the cost profile stays stable. If false positive costs change (customers care less about friction now?), the optimal threshold shifts. If false negative costs change (fraud losses shrink), re-tune the threshold, and possibly the model, again.
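To see why the threshold has to move with the costs: if the model's scores are reasonably calibrated probabilities and you ignore the small true-negative value, the cost-minimizing threshold has a simple closed form. A sketch:

```python
# Flag a transaction when the expected cost of letting it through (p * cost_fn)
# exceeds the expected cost of blocking it ((1 - p) * cost_fp).
# Solving for p gives the threshold below.
def optimal_threshold(cost_fp, cost_fn):
    return cost_fp / (cost_fp + cost_fn)

print(optimal_threshold(cost_fp=80, cost_fn=250))   # ~0.24 with today's costs
print(optimal_threshold(cost_fp=40, cost_fn=250))   # ~0.14 if friction gets cheaper: flag more
print(optimal_threshold(cost_fp=80, cost_fn=150))   # ~0.35 if fraud losses shrink: flag less
```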
Accuracy is what you measure because it's easy, not because it matters. What matters is: does the business make money? Optimize directly for that. "Lower accuracy" models that maximize business value beat "higher accuracy" models that lose money. Always.
Actually Making the Shift
This is why high-accuracy models can still lose money: we borrowed our evaluation approach from academia. Academic world? Optimize accuracy. Business world? Optimize profit.
Making the shift is uncomfortable. You have to sit down with your business team and ask: "How much does it cost when we're wrong?" Then listen to the asymmetry. It's weird at first. "False positives cost 3x false negatives?" Feels unbalanced. It's real.
Once you've made the switch (once you're building toward business value instead of accuracy), you'll notice that "lower accuracy" models often outperform "higher accuracy" ones. Not because the models changed. Because the yardstick did.
Start with your cost matrix. Get your business team to sign it. Rebuild your evaluation framework around cost, not accuracy. Your CFO wins. Your model wins. Money actually gets made.
Is Your Model Optimized for Accuracy or Value?
We help organizations shift from accuracy metrics to business-value metrics, often revealing $1M+ in hidden value.
Get a Cost Analysis