Real deployments, real results.
Not slides. Actual systems in production. Banks using this stuff. Insurance companies running it daily. We built what worked and learned from what didn't. This is what we'd build for you.
What They Struggled With
A Top-5 Indian bank was drowning in KYC applications. 150,000+ every month. Each one took 6-8 hours of a compliance officer's time. Manual document verification. They had an 8% failure rate, mostly because documents were unreadable, papers were missing, or data didn't line up across Aadhaar, PAN, licenses, and passports. All in different scripts (Devanagari, Tamil, Telugu).
Hiring more people wasn't an option. They needed automation that could handle India's document chaos and RBI's strict rules. Not just faster. Reliable. They wanted 15 minutes per application, not 6 hours. And they needed the system to catch mistakes before they became compliance problems.
What We Built
We designed a pipeline with specialized agents handling each stage. Here's how it works:
OCR That Actually Works: We integrated Bulbul 3.0, an Indic-language OCR engine that extracts text from Aadhaar, PAN, and licenses in multiple scripts. It runs at 99.1% accuracy. It handles blurry images, rotated documents, and low-quality scans. It re-scans automatically if something looks off.
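That "re-scan if something looks off" step is just a confidence-gated retry. A minimal sketch follows; the `ocr_extract` callable stands in for a wrapper around the OCR engine, and the preprocessing steps and 0.90 floor are assumptions, not the bank's production values.

```python
# Minimal sketch of the confidence-gated re-scan loop. `ocr_extract` is a stand-in
# wrapper around the OCR engine; preprocessing steps and thresholds are assumptions.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class OcrResult:
    text: str
    confidence: float            # 0.0-1.0, averaged over detected tokens
    needs_human_review: bool = False

def extract_with_rescan(
    image: bytes,
    ocr_extract: Callable[[bytes], OcrResult],
    preprocessors: Sequence[Callable[[bytes], bytes]],   # e.g. identity, deskew, denoise + upscale
    confidence_floor: float = 0.90,
) -> OcrResult:
    """Re-run OCR with progressively heavier preprocessing until confidence clears the floor."""
    best = OcrResult(text="", confidence=0.0)
    for prepare in preprocessors:
        result = ocr_extract(prepare(image))
        if result.confidence >= confidence_floor:
            return result
        if result.confidence > best.confidence:
            best = result
    best.needs_human_review = True   # nothing cleared the floor: route to manual verification
    return best
```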
Four Agents in Sequence: Each one does one job well:
- Data Extraction: Takes OCR output and structures it into clean JSON (name, DOB, address, ID numbers). Claude checks for inconsistencies. If something doesn't match, it flags it.
- Verification: Hits the UIDAI Aadhaar API. Checks the PAN database. Compares against external sources. If something's off, a human gets it for review.
- Risk Scoring: Runs RBI risk matrices. Document age. Address consistency. PEP screening. Sanctions lists. Returns a risk level: Green (approve), Amber (review), Red (human decision).
- Escalation: Amber and Red cases route to compliance officers with full context. Green cases? Approved automatically. No bureaucracy for straightforward approvals.
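Under the hood, that sequence is just small agent functions chained over one shared application record. Here's a minimal sketch; the record fields, risk labels, and routing names are illustrative, not the production schema.

```python
# Illustrative chaining of the four stages over one application record.
# Each agent takes and returns a plain dict; field names are illustrative.
from typing import Callable, Dict, List

Agent = Callable[[Dict], Dict]

def run_kyc_pipeline(application: Dict, agents: List[Agent]) -> Dict:
    """Run extraction -> verification -> risk scoring -> escalation; stop early on a hard flag."""
    record = dict(application)
    for agent in agents:
        record = agent(record)
        if record.get("needs_human"):            # e.g. a verification mismatch
            record["route"] = "compliance_officer"
            return record
    risk = record.get("risk_level", "amber")
    record["route"] = "auto_approve" if risk == "green" else "compliance_officer"
    return record
```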
Security & Compliance: Aadhaar numbers never leave the system unencrypted. PII is masked everywhere. Every decision gets logged for RBI audits. Nothing sketchy.
The Stack: Python. FastAPI. Redis for state. PostgreSQL for logs. React dashboard for officers to review escalations.
The Results
"We went from 6+ hours to under 10 minutes per application. But honestly, the bigger win was freeing our compliance team. They were drowning in paperwork. Now they focus on the edge cases and strategy. This is what automation should actually look like. It makes humans better, not redundant."
Head of KYC Operations, Top-5 Indian Bank
The Problem
A leading legal process outsourcing (LPO) firm. 500+ contracts monthly. Each one took 4-6 hours. Legal analysts were buried in paperwork: extracting terms, finding risks, flagging unusual clauses. Revenue was capped by how fast they could work. Burnout was high.
Their customers (global F500 companies) needed fast turnarounds. And 99%+ accuracy wasn't optional. Miss a liability cap? That's millions of dollars of exposure. So they needed AI that augmented their analysts, not replaced them. Something that extracted terms, provided reasoning, caught the edge cases, and escalated the ambiguous stuff to humans.
What We Built
RAG system specifically for contracts. Here's how it works:
Smart Chunking for Legal Documents: We built a chunker that understands legal structure. It doesn't just split at a fixed token count. It keeps clauses together. Maintains party references. Preserves numerical values. Semantic chunks, not fixed-size blocks.
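A sketch of what clause-aware splitting can look like; the clause-heading pattern and the size cap are assumptions, not the firm's actual rules.

```python
import re
from typing import List

# Split at clause headings ("12.", "12.3", "Section 12.3") instead of at a fixed token
# count, so each chunk carries a whole clause. The heading regex is illustrative.
CLAUSE_HEADING = re.compile(r"(?m)^\s*(?:Section\s+)?\d+(?:\.\d+)*[.)]?\s+")

def chunk_by_clause(contract_text: str, max_chars: int = 4000) -> List[str]:
    starts = sorted({0, len(contract_text),
                     *(m.start() for m in CLAUSE_HEADING.finditer(contract_text))})
    chunks: List[str] = []
    for begin, end in zip(starts, starts[1:]):
        clause = contract_text[begin:end].strip()
        while len(clause) > max_chars:            # hard-split only if a single clause is enormous
            chunks.append(clause[:max_chars])
            clause = clause[max_chars:]
        if clause:
            chunks.append(clause)
    return chunks
```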
Search That Gets Legal Language: Embedded clauses with Claude's embeddings. Indexed them in Pinecone. Added metadata (party names, clause types, dollar amounts). So when you ask for "liability cap language," it finds similar clauses from their whole contract library.
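The retrieval step itself is a filtered vector query. This is a minimal sketch; the index name, the metadata fields, and the `embed()` helper (whichever embedding model the index was built with) are assumptions.

```python
import os
from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("contract-clauses")               # index name is an assumption

def find_similar_clauses(query_text: str, clause_type: str, embed, top_k: int = 5):
    """Return (score, metadata) for the closest clauses of one type, e.g. 'liability_cap'."""
    response = index.query(
        vector=embed(query_text),                  # embed() must match the model used to build the index
        top_k=top_k,
        filter={"clause_type": {"$eq": clause_type}},   # metadata filter narrows to one clause type
        include_metadata=True,                     # bring back party names, amounts, source document
    )
    return [(match.score, match.metadata) for match in response.matches]
```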
Four Agents Working Together:
- Clause Extraction: Retrieves similar clauses via semantic search. Uses Claude to extract key terms and conditions from the current contract. Learns from examples.
- Risk Scoring: Checks extracted terms against risk matrices. Liability caps. Indemnification. Force majeure. Flags non-standard language and explains why it's risky.
- Cross-Referencing: Traces obligations back to the party definitions they depend on. Spots conflicts (e.g., a liability cap that contradicts the indemnity scope). Catches the hidden problems.
- Market Comparison: Compares against the firm's standard templates and market norms. Shows deviations with context. "This liability cap is 30% lower than market for this industry."
Humans Still Decide: All flagged items route to senior analysts. They review, validate, correct. System learns from their feedback. Gets better over time.
The Stack: LlamaIndex. Pinecone. Claude. FastAPI. Vue.js for the UI.
The Results
"The system unblocked us. Senior analysts were stuck doing grunt work. Now the AI extracts, flags risks, compares to benchmarks. Our team reviews and validates. Turnaround went from 4 days to 6 hours. We picked up two new F500 clients because we can finally handle their volume. And the accuracy is solid."
Managing Partner, Leading Indian LPO
The Situation
Growing D2C fashion brand. 120,000 support tickets a month. WhatsApp, email, website chat. Tickets took 18 minutes each. CSAT was 3.1 out of 5. Support was eating 12% of revenue. Hiring more people wasn't an option. The market expected 2-minute response times.
Most tickets were simple (order status, returns, size guides, delayed deliveries). Repetitive stuff. But context mattered. They needed an AI that handled the simple ones automatically and routed the messy ones to humans without feeling like a rigid chatbot. Customers get frustrated with robots.
What We Built
Three-tier agent system. Smart routing. Real context awareness. Here's the flow:
Tier 1: Auto-Resolve: Bulbul 3.0 detects intent (order status, refund, size guide). Queries the order database. For simple questions ("Where's my order?", "What's your return policy?") it replies immediately with a natural-language answer from Claude. Handles about 73% of tickets without human help.
Tier 2: Reasoning: Tougher issues (partial refunds, exchanges, delivery disputes). Claude looks at order history, customer lifetime value, reason for return. Applies business rules. Generates personalized responses that feel human. Routes to humans only if it requires an exception override.
Tier 3: Humans: Complicated stuff (disputes, custom orders, reputation risk). Gets escalated with full context. Humans make the judgment calls.
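The routing decision underneath those three tiers can be this small. Intent labels, the 0.85 confidence bar, and the escalation triggers below are placeholders, not the brand's production values.

```python
# Illustrative tier routing for an incoming ticket.
AUTO_RESOLVE_INTENTS = {"order_status", "return_policy", "size_guide"}
ESCALATION_TRIGGERS = {"legal_threat", "press_mention", "chargeback_dispute"}

def route_ticket(intent: str, confidence: float, sentiment: float, flags: set) -> str:
    """Return the handling tier: tier1_auto, tier2_reasoning, or tier3_human."""
    if flags & ESCALATION_TRIGGERS or sentiment < -0.6:
        return "tier3_human"                       # reputation risk or very negative sentiment
    if intent in AUTO_RESOLVE_INTENTS and confidence >= 0.85:
        return "tier1_auto"                        # simple, high-confidence: answer immediately
    return "tier2_reasoning"                       # everything else goes to the reasoning agent first
```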
The Stack: N8N orchestrates the flow. Bulbul 3.0 for language understanding. Claude for reasoning and tone. Guardrails keeps the system honest (no false promises, no aggressive language). PostgreSQL for order data.
Smart Features: Unified input from WhatsApp, email, chat. Sentiment analysis flags escalations. Conversation history cached. A/B testing on response variants to improve quality.
The Results
"This changed how we work. We're handling 120K tickets with the same team size. CSAT went from 3.1 to 4.4. And our team isn't stressed. They don't waste time on order status questions anymore. They focus on the customers who actually need help. This is what we wanted. Automation that makes human work better."
VP Customer Success, D2C Fashion Brand
The Mess
Automotive supplier, $800M annual revenue. Supply chain was broken. Inventory was running 18% over target. Capital stuck in slow-moving stock. Demand forecasting was guesswork. Stockouts on critical parts. Overstock on others. Steel prices up 15%. Aluminum down. Plastics volatile.
Decision-making was slow. Procurement managers staring at spreadsheets. Spot prices changing hourly. They needed intelligence that could watch everything at once (sales forecasts, commodity markets, shipping costs, geopolitical risk) and recommend purchases and inventory moves. With guardrails so humans still approved big decisions.
What We Engineered
Multi-agent system that watches supply chain and makes recommendations. Here's how:
Data Ingestion: Real-time connectors to their SAP ERP (inventory, orders, costs). Logistics APIs (shipping rates, transit times). Commodity feeds (spot prices for steel, aluminum, plastics). News feeds for geopolitical risks.
Analysis Agents: Time-series forecasting (sales trends, demand by component type). Cost modeling (procurement costs, shipping, storage). Risk assessment (can suppliers deliver? Price volatility? Geopolitical disruption risk?).
Forecast Agent: Combines historical data, ML (Prophet for demand, regression for costs), and Claude for context (e.g., "EV contract expected Q2 → 30% battery connector demand spike").
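The Prophet piece of that, taken on its own, is only a few lines. This sketch assumes a per-component daily history in Prophet's required ds/y column format; the seasonality settings and 90-day horizon are assumptions.

```python
import pandas as pd
from prophet import Prophet

def forecast_component_demand(history: pd.DataFrame, horizon_days: int = 90) -> pd.DataFrame:
    """history: one row per day with columns ds (date) and y (units shipped for one component)."""
    model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
    model.fit(history)
    future = model.make_future_dataframe(periods=horizon_days)
    forecast = model.predict(future)
    # Hand the point forecast and its uncertainty band to the cost-modeling agent downstream.
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(horizon_days)
```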
Action Agents: Generate concrete recommendations: "Buy 500 units of Steel Coil X at ₹45/unit (3-month average ₹48, favorable window)", "Reroute Shipment Y from Port A to B (save $12K)". Standard recommendations auto-approve. Big ones (>₹500 lakh) go to the VP.
The Stack: OpenClaw on-premise (supply chain data never leaves the building). Python agents. SAP APIs. Apache Airflow. Kafka for real-time. PostgreSQL for logs.
The Results
"It's recommending smart buys and inventory moves constantly. We took inventory from 18% over-target to optimal. Saved millions on commodity timing. Stockouts dropped. And our procurement team isn't buried in spreadsheets anymore. They're building relationships with suppliers and sourcing new components. That's actually strategic work."
VP Supply Chain, $800M Auto Components Supplier
The Problem
Hospital system across southern India. 12 hospitals. Patient intake was bottlenecking everything. 45-60 minutes per patient just for registration and triage. Paper forms or ancient tablets. Data-entry mistakes. Triage decisions were sloppy: staff missed red flags and assigned the wrong urgency levels.
They needed to handle Telugu, Hindi, English. Process handwritten medical history documents. Extract clinical data accurately. Triage correctly. Route to the right department. All automatically. And clinical safety wasn't optional. No missed critical symptoms, no wrong urgency calls. In healthcare, mistakes cost lives.
What We Built
End-to-end intake system with clinical safety baked in:
Multilingual Bot: Conversational in Telugu, Hindi, English. Bulbul 3.0 understands language. Claude talks like a human. Collects demographics, chief complaint, medical history through conversation, not rigid forms. Patients actually like interacting with it.
Document Scanning: OCR for scanned medical documents (lab reports, medication lists, previous records). Bulbul handles handwritten notes and different document languages. Extracts structured data: medications, allergies, conditions, surgeries.
Clinical Profile: Merges conversation data with scanned documents into one patient profile. Claude does clinical reasoning: diabetic patient with chest pain? That's cardiac risk. A history of anaphylaxis? Flagged where everyone can see it.
Triage Scoring: Applies ESI or custom triage protocol. Claude evaluates symptom severity, vital signs, comorbidities. Returns triage level (1-5) with reasoning. Clinicians can review and override if needed.
Smart Routing: Routes to Emergency, Cardiology, Orthopedics, General OPD based on chief complaint and triage level. Guardrails: chest pain always goes to Emergency. Altered consciousness always goes to Emergency. No exceptions.
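In code, those guardrails sit as deterministic overrides above the model's triage level. The red-flag list and the simplified department mapping below are illustrative, not the clinical protocol itself.

```python
# Sketch of hard routing guardrails layered over the model's triage score.
# Deterministic red-flag rules always win over the model's own level.
RED_FLAGS_TO_EMERGENCY = {"chest pain", "altered consciousness"}   # illustrative list

def route_patient(chief_complaint: str, model_triage_level: int) -> tuple[str, int]:
    """Return (department, final triage level 1-5, where 1 is most urgent)."""
    complaint = chief_complaint.lower()
    if any(flag in complaint for flag in RED_FLAGS_TO_EMERGENCY):
        return "Emergency", 1                     # guardrail: never downgrade these, no exceptions
    if model_triage_level <= 2:
        return "Emergency", model_triage_level
    # Non-urgent cases go to the specialty suggested by the complaint (mapping omitted here).
    return "General OPD", model_triage_level
```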
Security: End-to-end encryption. No PII in logs. Full audit trail for every clinical decision. Integrates with their EHR (Epic, Cerner) so doctors see everything seamlessly.
The Stack: Bulbul 3.0. Claude. LangChain. FastAPI. React. HL7 FHIR for EHR integration.
The Results
"Patients actually prefer the conversational bot to paper forms. And clinicians love it because triage is now consistent and accurate. Nothing important gets missed. We cut ED wait times significantly. Registration staff went from form-filling to actually helping patients. That's what good automation looks like."
Chief Medical Information Officer, 12-Hospital Health System
The Situation
Top-3 insurance company. 50,000+ claims monthly. Motor and health. Settlement was taking 14 days. Claims assessors were spending 4-6 hours per claim. Photos, PDFs, handwritten forms, prescriptions. Checking coverage. Assessing damage. Screening for fraud. All manually.
IRDA says settle claims fast. Customers expect fast. But fraud was 2.1% of claim value. Manual fraud detection wasn't working. They needed automation that could look at photos, read handwritten notes, understand policies, spot fraud patterns, and approve the obvious ones automatically. With humans handling exceptions.
What We Engineered
End-to-end claims system. Multi-modal. Fraud-aware. Here's the architecture:
Multi-Modal Ingestion: Claims come in any format. PDFs. Images. Damage photos. Medical reports. Handwritten notes. Video links. Claude's vision can analyze all of it. Assess vehicle damage. Evaluate injury severity. The system sees what humans see.
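Sending a damage photo to Claude goes through the standard Anthropic Messages API with an image content block. In the sketch below, the model name and the prompt are placeholders, and the expectation of a structured JSON answer is an assumption.

```python
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def assess_damage_photo(jpeg_path: str) -> str:
    with open(jpeg_path, "rb") as f:
        photo_b64 = base64.standard_b64encode(f.read()).decode("utf-8")
    message = client.messages.create(
        model="claude-sonnet-4-20250514",   # placeholder: whichever vision-capable Claude model is in use
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/jpeg", "data": photo_b64}},
                {"type": "text",
                 "text": "Describe the visible vehicle damage and rate repair severity "
                         "(minor / moderate / severe). Answer as JSON."},
            ],
        }],
    )
    return message.content[0].text
```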
Analysis Agent: Looks at damage photos. Estimates repair costs against internal benchmarks. Cross-checks against claim amount. For health claims, reviews medical documentation. Verifies covered procedures and policy limits.
Policy Matching: Pulls the policy from the database. Verifies: is this event covered? Any exclusions? Within limits? Extracts deductibles and co-insurance. Shows exactly what should be paid.
Fraud Detection: Pattern matching across historical claims. Same claimant with multiple claims? Similar photos? Suspicious frequency? Claude reasons about it: "Studio lighting in accident scene photos (unusual) + claim 2.3x typical repair costs for this damage = fraud signal." Risk score from 0-100.
Payout Calculation: Takes damage assessment, policy terms, and fraud risk. Calculates recommended payout. Applies deductibles. Applies co-insurance. Within policy limits. For routine cases (low fraud, clear coverage), recommends auto-approval. For edge cases, routes to a senior assessor with full context.
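The arithmetic itself is simple. A worked sketch follows; the deductible-before-co-insurance ordering is an assumption, since real policies define their own sequence.

```python
# Worked sketch of the payout arithmetic. Ordering and rounding are assumptions.
def recommended_payout(assessed_amount: float, deductible: float,
                       coinsurance_rate: float, policy_limit: float) -> float:
    """coinsurance_rate is the insurer's share, e.g. 0.8 means the insurer pays 80%."""
    after_deductible = max(assessed_amount - deductible, 0.0)
    insurer_share = after_deductible * coinsurance_rate
    return round(min(insurer_share, policy_limit), 2)

# e.g. assessed 120,000 with a 10,000 deductible, 80% co-insurance, 500,000 limit:
# (120000 - 10000) * 0.8 = 88,000, under the limit, so recommend 88,000.
```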
Workflow: Routine claims approve automatically. Complex ones get human review with all the analysis, risk flags, policy details, and the recommended amount. Guardrails keep it honest: never pay over policy limits, never auto-approve a claim with open fraud flags.
The Stack: Claude API with vision. Python orchestration. Custom fraud models. Insurance core systems integration. React dashboard.
The Results
"We went from 14 days to 18 hours average settlement. Customers love it. Our claims team isn't drowning in paperwork anymore. They're handling exceptions and building relationships. Fraud losses dropped. Compliance is tight. And we're processing way more volume without hiring proportionally. This is what we needed."
Head of Claims Operations, Top-3 Insurance Company