Real deployments, real results.
Not slides. Actual systems in production. Banks using this stuff. Insurance companies running it daily. We built what worked and learned from what didn't. This is what we'd build for you.
What They Struggled With
A Top-5 Indian bank was drowning in KYC applications. 150,000+ every month. Each one took 6-8 hours of a compliance officer's time. Manual document verification. They had an 8% failure rate, mostly because documents were unreadable, papers were missing, or data didn't line up across Aadhaar, PAN, licenses, and passports. All in different scripts (Devanagari, Tamil, Telugu).
Hiring more people wasn't an option. They needed automation that could handle India's document chaos and RBI's strict rules. Not just faster. Reliable. They wanted 15 minutes per application, not 6 hours. And they needed the system to catch mistakes before they became compliance problems.
What We Built
We designed a pipeline with specialized agents handling each stage. Here's how it works:
OCR That Actually Works: We integrated Bulbul 3.0, an Indic-language OCR engine that extracts text from Aadhaar, PAN, and licenses in multiple scripts. It runs at 99.1% accuracy. It handles blurry images, rotated documents, and low-quality scans. It re-scans automatically if something looks off.
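That "re-scan if something looks off" step is just a confidence-gated retry. A minimal sketch follows; the `ocr_extract` callable stands in for a wrapper around the OCR engine, and the preprocessing steps and 0.90 floor are assumptions, not the bank's production values.

```python
# Minimal sketch of the confidence-gated re-scan loop. `ocr_extract` is a stand-in
# wrapper around the OCR engine; preprocessing steps and thresholds are assumptions.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class OcrResult:
    text: str
    confidence: float            # 0.0-1.0, averaged over detected tokens
    needs_human_review: bool = False

def extract_with_rescan(
    image: bytes,
    ocr_extract: Callable[[bytes], OcrResult],
    preprocessors: Sequence[Callable[[bytes], bytes]],   # e.g. identity, deskew, denoise + upscale
    confidence_floor: float = 0.90,
) -> OcrResult:
    """Re-run OCR with progressively heavier preprocessing until confidence clears the floor."""
    best = OcrResult(text="", confidence=0.0)
    for prepare in preprocessors:
        result = ocr_extract(prepare(image))
        if result.confidence >= confidence_floor:
            return result
        if result.confidence > best.confidence:
            best = result
    best.needs_human_review = True   # nothing cleared the floor: route to manual verification
    return best
```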
Four Agents in Sequence: Each one does one job well:
- Data Extraction: Takes OCR output and structures it into clean JSON (name, DOB, address, ID numbers). Claude checks for inconsistencies. If something doesn't match, it flags it.
- Verification: Hits the UIDAI Aadhaar API. Checks the PAN database. Compares against external sources. If something's off, a human gets it for review.
- Risk Scoring: Runs RBI risk matrices. Document age. Address consistency. PEP screening. Sanctions lists. Returns a risk level: Green (approve), Amber (review), Red (human decision).
- Escalation: Amber and Red cases route to compliance officers with full context. Green cases? Approved automatically. No bureaucracy for straightforward approvals.
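Under the hood, that sequence is just small agent functions chained over one shared application record. Here's a minimal sketch; the record fields, risk labels, and routing names are illustrative, not the production schema.

```python
# Illustrative chaining of the four stages over one application record.
# Each agent takes and returns a plain dict; field names are illustrative.
from typing import Callable, Dict, List

Agent = Callable[[Dict], Dict]

def run_kyc_pipeline(application: Dict, agents: List[Agent]) -> Dict:
    """Run extraction -> verification -> risk scoring -> escalation; stop early on a hard flag."""
    record = dict(application)
    for agent in agents:
        record = agent(record)
        if record.get("needs_human"):            # e.g. a verification mismatch
            record["route"] = "compliance_officer"
            return record
    risk = record.get("risk_level", "amber")
    record["route"] = "auto_approve" if risk == "green" else "compliance_officer"
    return record
```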
Security & Compliance: Aadhaar numbers never leave the system unencrypted. PII is masked everywhere. Every decision gets logged for RBI audits. Nothing sketchy.
The Stack: Python. FastAPI. Redis for state. PostgreSQL for logs. React dashboard for officers to review escalations.
The Results
"We went from 6+ hours to under 10 minutes per application. But honestly, the bigger win was freeing our compliance team. They were drowning in paperwork. Now they focus on the edge cases and strategy. This is what automation should actually look like. It makes humans better, not redundant."
Head of KYC Operations, Top-5 Indian Bank
The Problem
A leading legal process outsourcing (LPO) firm. 500+ contracts monthly. Each one took 4-6 hours. Legal analysts were buried in paperwork: extracting terms, finding risks, flagging unusual clauses. Revenue was capped by how fast they could work. Burnout was high.
Their customers (global F500 companies) needed fast turnarounds. And 99%+ accuracy wasn't optional. Miss a liability cap? That's millions of dollars of exposure. So they needed AI that augmented their analysts, not replaced them. Something that extracted terms, provided reasoning, caught the edge cases, and escalated the ambiguous stuff to humans.
What We Built
RAG system specifically for contracts. Here's how it works:
Smart Chunking for Legal Documents: We built a chunker that understands legal structure. It doesn't just split at a fixed token count. It keeps clauses together. Maintains party references. Preserves numerical values. Semantic chunks, not fixed-size blocks.
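A sketch of what clause-aware splitting can look like; the clause-heading pattern and the size cap are assumptions, not the firm's actual rules.

```python
import re
from typing import List

# Split at clause headings ("12.", "12.3", "Section 12.3") instead of at a fixed token
# count, so each chunk carries a whole clause. The heading regex is illustrative.
CLAUSE_HEADING = re.compile(r"(?m)^\s*(?:Section\s+)?\d+(?:\.\d+)*[.)]?\s+")

def chunk_by_clause(contract_text: str, max_chars: int = 4000) -> List[str]:
    starts = sorted({0, len(contract_text),
                     *(m.start() for m in CLAUSE_HEADING.finditer(contract_text))})
    chunks: List[str] = []
    for begin, end in zip(starts, starts[1:]):
        clause = contract_text[begin:end].strip()
        while len(clause) > max_chars:            # hard-split only if a single clause is enormous
            chunks.append(clause[:max_chars])
            clause = clause[max_chars:]
        if clause:
            chunks.append(clause)
    return chunks
```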
Search That Gets Legal Language: Embedded clauses with Claude's embeddings. Indexed them in Pinecone. Added metadata (party names, clause types, dollar amounts). So when you ask for "liability cap language," it finds similar clauses from their whole contract library.
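The retrieval step itself is a filtered vector query. This is a minimal sketch; the index name, the metadata fields, and the `embed()` helper (whichever embedding model the index was built with) are assumptions.

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("contract-clauses")               # index name is an assumption

def find_similar_clauses(query_text: str, clause_type: str, embed, top_k: int = 5):
    """Return (score, metadata) for the closest clauses of one type, e.g. 'liability_cap'."""
    response = index.query(
        vector=embed(query_text),                  # embed() must match the model used to build the index
        top_k=top_k,
        filter={"clause_type": {"$eq": clause_type}},   # metadata filter narrows to one clause type
        include_metadata=True,                     # bring back party names, amounts, source document
    )
    return [(match.score, match.metadata) for match in response.matches]
```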
Four Agents Working Together:
- Clause Extraction: Retrieves similar clauses via semantic search. Uses Claude to extract key terms and conditions from the current contract. Learns from examples.
- Risk Scoring: Checks extracted terms against risk matrices. Liability caps. Indemnification. Force majeure. Flags non-standard language and explains why it's risky.
- Cross-Referencing: Traces obligations back to the party definitions they depend on. Spots conflicts (e.g., a liability cap that contradicts the indemnity scope). Catches the hidden problems.
- Market Comparison: Compares against the firm's standard templates and market norms. Shows deviations with context. "This liability cap is 30% lower than market for this industry."
Humans Still Decide: All flagged items route to senior analysts. They review, validate, correct. System learns from their feedback. Gets better over time.
The Stack: LlamaIndex. Pinecone. Claude. FastAPI. Vue.js for the UI.
The Results
"The system unblocked us. Senior analysts were stuck doing grunt work. Now the AI extracts, flags risks, compares to benchmarks. Our team reviews and validates. Turnaround went from 4 days to 6 hours. We picked up two new F500 clients because we can finally handle their volume. And the accuracy is solid."
Managing Partner, Leading Indian LPO
The Situation
Growing D2C fashion brand. 120,000 support tickets a month. WhatsApp, email, website chat. Tickets took 18 minutes each. CSAT was 3.1 out of 5. Support was eating 12% of revenue. Hiring more people wasn't an option. The market expected 2-minute response times.
Most tickets were simple (order status, returns, size guides, delayed deliveries). Repetitive stuff. But context mattered. They needed an AI that handled the simple ones automatically and routed the messy ones to humans without feeling like a rigid chatbot. Customers get frustrated with robots.
What We Built
Three-tier agent system. Smart routing. Real context awareness. Here's the flow:
Tier 1: Auto-Resolve: Bulbul 3.0 detects intent (order status, refund, size guide). Queries the order database. For simple questions ("Where's my order?", "What's your return policy?") it replies immediately with a natural-language answer from Claude. Handles about 73% of tickets without human help.
Tier 2: Reasoning: Tougher issues (partial refunds, exchanges, delivery disputes). Claude looks at order history, customer lifetime value, reason for return. Applies business rules. Generates personalized responses that feel human. Routes to humans only if it requires an exception override.
Tier 3: Humans: Complicated stuff (disputes, custom orders, reputation risk). Gets escalated with full context. Humans make the judgment calls.
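The routing decision underneath those three tiers can be this small. Intent labels, the 0.85 confidence bar, and the escalation triggers below are placeholders, not the brand's production values.

```python
# Illustrative tier routing for an incoming ticket.
AUTO_RESOLVE_INTENTS = {"order_status", "return_policy", "size_guide"}
ESCALATION_TRIGGERS = {"legal_threat", "press_mention", "chargeback_dispute"}

def route_ticket(intent: str, confidence: float, sentiment: float, flags: set) -> str:
    """Return the handling tier: tier1_auto, tier2_reasoning, or tier3_human."""
    if flags & ESCALATION_TRIGGERS or sentiment < -0.6:
        return "tier3_human"                       # reputation risk or very negative sentiment
    if intent in AUTO_RESOLVE_INTENTS and confidence >= 0.85:
        return "tier1_auto"                        # simple, high-confidence: answer immediately
    return "tier2_reasoning"                       # everything else goes to the reasoning agent first
```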
The Stack: N8N orchestrates the flow. Bulbul 3.0 for language understanding. Claude for reasoning and tone. Guardrails keeps the system honest (no false promises, no aggressive language). PostgreSQL for order data.
Smart Features: Unified input from WhatsApp, email, chat. Sentiment analysis flags escalations. Conversation history cached. A/B testing on response variants to improve quality.
The Results
"This changed how we work. We're handling 120K tickets with the same team size. CSAT went from 3.1 to 4.4. And our team isn't stressed. They don't waste time on order status questions anymore. They focus on the customers who actually need help. This is what we wanted. Automation that makes human work better."
VP Customer Success, D2C Fashion Brand
The Mess
Automotive supplier, $800M annual revenue. Supply chain was broken. Inventory was running 18% over target. Capital stuck in slow-moving stock. Demand forecasting was guesswork. Stockouts on critical parts. Overstock on others. Steel prices up 15%. Aluminum down. Plastics volatile.
Decision-making was slow. Procurement managers staring at spreadsheets. Spot prices changing hourly. They needed intelligence that could watch everything at once (sales forecasts, commodity markets, shipping costs, geopolitical risk) and recommend purchases and inventory moves. With guardrails so humans still approved big decisions.
What We Engineered
Multi-agent system that watches supply chain and makes recommendations. Here's how:
Data Ingestion: Real-time connectors to their SAP ERP (inventory, orders, costs). Logistics APIs (shipping rates, transit times). Commodity feeds (spot prices for steel, aluminum, plastics). News feeds for geopolitical risks.
Analysis Agents: Time-series forecasting (sales trends, demand by component type). Cost modeling (procurement costs, shipping, storage). Risk assessment (can suppliers deliver? Price volatility? Geopolitical disruption risk?).
Forecast Agent: Combines historical data, ML (Prophet for demand, regression for costs), and Claude for context (e.g., "EV contract expected Q2 → 30% battery connector demand spike").
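The Prophet piece of that, taken on its own, is only a few lines. This sketch assumes a per-component daily history in Prophet's required ds/y column format; the seasonality settings and 90-day horizon are assumptions.

```python
import pandas as pd
from prophet import Prophet

def forecast_component_demand(history: pd.DataFrame, horizon_days: int = 90) -> pd.DataFrame:
    """history: one row per day with columns ds (date) and y (units shipped for one component)."""
    model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
    model.fit(history)
    future = model.make_future_dataframe(periods=horizon_days)
    forecast = model.predict(future)
    # Hand the point forecast and its uncertainty band to the cost-modeling agent downstream.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(horizon_days)
```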
Action Agents: Generate concrete recommendations: "Buy 500 units of Steel Coil X at ₹45/unit (3-month average ₹48, favorable window)", "Reroute Shipment Y from Port A to B (save $12K)". Standard recommendations auto-approve. Big ones (>₹500 lakh) go to the VP.
The Stack: OpenClaw on-premise (supply chain data never leaves the building). Python agents. SAP APIs. Apache Airflow. Kafka for real-time. PostgreSQL for logs.
The Results
"It's recommending smart buys and inventory moves constantly. We took inventory from 18% over-target to optimal. Saved millions on commodity timing. Stockouts dropped. And our procurement team isn't buried in spreadsheets anymore. They're building relationships with suppliers and sourcing new components. That's actually strategic work."
VP Supply Chain, $800M Auto Components Supplier
The Problem
Hospital system across southern India. 12 hospitals. Patient intake was bottlenecking everything. 45-60 minutes per patient just for registration and triage. Paper forms or ancient tablets. Data-entry mistakes. Triage decisions were sloppy: staff missed red flags and assigned the wrong urgency levels.
They needed to handle Telugu, Hindi, English. Process handwritten medical history documents. Extract clinical data accurately. Triage correctly. Route to the right department. All automatically. And clinical safety wasn't optional. No missed critical symptoms, no wrong urgency calls. In healthcare, mistakes cost lives.
What We Built
End-to-end intake system with clinical safety baked in:
Multilingual Bot: Conversational in Telugu, Hindi, English. Bulbul 3.0 understands language. Claude talks like a human. Collects demographics, chief complaint, medical history through conversation, not rigid forms. Patients actually like interacting with it.
Document Scanning: OCR for scanned medical documents (lab reports, medication lists, previous records). Bulbul handles handwritten notes and different document languages. Extracts structured data: medications, allergies, conditions, surgeries.
Clinical Profile: Merges conversation data with scanned documents into one patient profile. Claude does clinical reasoning: diabetic patient with chest pain? That's cardiac risk. A history of anaphylaxis? Flagged where everyone can see it.
Triage Scoring: Applies ESI or custom triage protocol. Claude evaluates symptom severity, vital signs, comorbidities. Returns triage level (1-5) with reasoning. Clinicians can review and override if needed.
Smart Routing: Routes to Emergency, Cardiology, Orthopedics, General OPD based on chief complaint and triage level. Guardrails: chest pain always goes to Emergency. Altered consciousness always goes to Emergency. No exceptions.
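In code, those guardrails sit as deterministic overrides above the model's triage level. The red-flag list and the simplified department mapping below are illustrative, not the clinical protocol itself.

```python
# Sketch of hard routing guardrails layered over the model's triage score.
# Deterministic red-flag rules always win over the model's own level.
RED_FLAGS_TO_EMERGENCY = {"chest pain", "altered consciousness"}   # illustrative list

def route_patient(chief_complaint: str, model_triage_level: int) -> tuple[str, int]:
    """Return (department, final triage level 1-5, where 1 is most urgent)."""
    complaint = chief_complaint.lower()
    if any(flag in complaint for flag in RED_FLAGS_TO_EMERGENCY):
        return "Emergency", 1                     # guardrail: never downgrade these, no exceptions
    if model_triage_level <= 2:
        return "Emergency", model_triage_level
    # Non-urgent cases go to the specialty suggested by the complaint (mapping omitted here).
    return "General OPD", model_triage_level
```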
Security: End-to-end encryption. No PII in logs. Full audit trail for every clinical decision. Integrates with their EHR (Epic, Cerner) so doctors see everything seamlessly.
The Stack: Bulbul 3.0. Claude. LangChain. FastAPI. React. HL7 FHIR for EHR integration.
The Results
"Patients actually prefer the conversational bot to paper forms. And clinicians love it because triage is now consistent and accurate. Nothing important gets missed. We cut ED wait times significantly. Registration staff went from form-filling to actually helping patients. That's what good automation looks like."
Chief Medical Information Officer, 12-Hospital Health System
The Situation
Top-3 insurance company. 50,000+ claims monthly. Motor and health. Settlement was taking 14 days. Claims assessors were spending 4-6 hours per claim. Photos, PDFs, handwritten forms, prescriptions. Checking coverage. Assessing damage. Screening for fraud. All manually.
IRDA says settle claims fast. Customers expect fast. But fraud was 2.1% of claim value. Manual fraud detection wasn't working. They needed automation that could look at photos, read handwritten notes, understand policies, spot fraud patterns, and approve the obvious ones automatically. With humans handling exceptions.
What We Engineered
End-to-end claims system. Multi-modal. Fraud-aware. Here's the architecture:
Multi-Modal Ingestion: Claims come in any format. PDFs. Images. Damage photos. Medical reports. Handwritten notes. Video links. Claude's vision can analyze all of it. Assess vehicle damage. Evaluate injury severity. The system sees what humans see.
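Sending a damage photo to Claude goes through the standard Anthropic Messages API with an image content block. In the sketch below, the model name and the prompt are placeholders, and the expectation of a structured JSON answer is an assumption.

```python
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def assess_damage_photo(jpeg_path: str) -> str:
    with open(jpeg_path, "rb") as f:
        photo_b64 = base64.standard_b64encode(f.read()).decode("utf-8")
    message = client.messages.create(
        model="claude-sonnet-4-20250514",   # placeholder: whichever vision-capable Claude model is in use
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/jpeg", "data": photo_b64}},
                {"type": "text",
                 "text": "Describe the visible vehicle damage and rate repair severity "
                         "(minor / moderate / severe). Answer as JSON."},
            ],
        }],
    )
    return message.content[0].text
```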
Analysis Agent: Looks at damage photos. Estimates repair costs against internal benchmarks. Cross-checks against claim amount. For health claims, reviews medical documentation. Verifies covered procedures and policy limits.
Policy Matching: Pulls the policy from the database. Verifies: is this event covered? Any exclusions? Within limits? Extracts deductibles and co-insurance. Shows exactly what should be paid.
Fraud Detection: Pattern matching across historical claims. Same claimant with multiple claims? Similar photos? Suspicious frequency? Claude reasons about it: "Studio lighting in accident scene photos (unusual) + claim 2.3x typical repair costs for this damage = fraud signal." Risk score from 0-100.
Payout Calculation: Takes damage assessment, policy terms, and fraud risk. Calculates recommended payout. Applies deductibles. Applies co-insurance. Within policy limits. For routine cases (low fraud, clear coverage), recommends auto-approval. For edge cases, routes to a senior assessor with full context.
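The arithmetic itself is simple. A worked sketch follows; the deductible-before-co-insurance ordering is an assumption, since real policies define their own sequence.

```python
# Worked sketch of the payout arithmetic. Ordering and rounding are assumptions.
def recommended_payout(assessed_amount: float, deductible: float,
                       coinsurance_rate: float, policy_limit: float) -> float:
    """coinsurance_rate is the insurer's share, e.g. 0.8 means the insurer pays 80%."""
    after_deductible = max(assessed_amount - deductible, 0.0)
    insurer_share = after_deductible * coinsurance_rate
    return round(min(insurer_share, policy_limit), 2)

# e.g. assessed 120,000 with a 10,000 deductible, 80% co-insurance, 500,000 limit:
# (120000 - 10000) * 0.8 = 88,000, under the limit, so recommend 88,000.
```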
Workflow: Routine claims approve automatically. Complex ones get human review with all the analysis, risk flags, policy details, and the recommended amount. Guardrails keep it honest: never pay over policy limits, never auto-approve a claim with open fraud flags.
The Stack: Claude API with vision. Python orchestration. Custom fraud models. Insurance core systems integration. React dashboard.
The Results
"We went from 14 days to 18 hours average settlement. Customers love it. Our claims team isn't drowning in paperwork anymore. They're handling exceptions and building relationships. Fraud losses dropped. Compliance is tight. And we're processing way more volume without hiring proportionally. This is what we needed."
Head of Claims Operations, Top-3 Insurance Company