Case Study
AI Document Automation for Mid-Market Insurance Broker
An LLM-powered document processing pipeline that reduced policy document handling time by 78% and cut data entry errors to near-zero for a 120-staff insurance brokerage.
- Client: Harrington & Cole Insurance Brokers
- Industry: Financial Services
- Duration: 11 weeks
- Delivered: February 2026
Key Results
- Document processing time reduced by 78%
- Data entry errors reduced by 94%
- £210,000 projected annual staff cost saving
- Processing capacity: 3,000 documents/day, up from ~400 manual
- 6-week payback period on project cost
The Challenge
Harrington & Cole Insurance Brokers handle commercial and professional indemnity insurance for over 4,000 SME clients. At the centre of their renewal and new business process is a documentation-heavy workflow: insurers supply policy documents, endorsements, and schedules in inconsistent formats — PDF, Word, and occasionally scanned paper — that must be read, validated, and re-keyed into their broker management system (Acturis).
The manual process consumed approximately 1.8 full-time equivalent staff roles and created a bottleneck that slowed renewal turnaround times and introduced transcription errors that occasionally led to client disputes.
The specific problems:
- A team of three junior staff spent 60–70% of their time on manual document reading and data entry
- Error rate on transcription was estimated at 3.2% — causing downstream issues when policy terms were incorrectly recorded
- Peak renewal periods (January and June) created processing backlogs that delayed client-facing documentation by 2–5 days
- The brokerage could not scale without hiring more document handlers
The brief: automate the extraction, validation, and ingestion of insurance documents into Acturis — without replacing the human review step for high-value or complex policies.
Our Approach
We began with a two-week discovery sprint to understand the document taxonomy: 14 distinct document types across 8 insurance categories, each with varying layouts from different insurer templates.
Week 1–2: Discovery & scoping
We audited 1,200 historical documents to understand layout variation, field distribution, and extraction complexity. We identified three document tiers by complexity:
- Tier 1 (68% of volume): Standardised formats with consistent field positions — high automation confidence
- Tier 2 (24% of volume): Semi-structured with variable layout — AI extraction with human validation flag
- Tier 3 (8% of volume): Highly bespoke or handwritten — routed directly to human handler
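The tier split above translates into a simple routing rule. A minimal sketch — the document-type names, tier membership, and the 0.85 confidence threshold are illustrative assumptions, not the production configuration:

```python
# Illustrative routing for the three document tiers. Tier membership
# and the confidence threshold are assumptions for illustration.

def route_document(doc_type: str, classifier_confidence: float) -> str:
    """Return the processing route for an incoming document."""
    TIER_1 = {"policy_schedule", "renewal_notice"}      # standardised layouts
    TIER_2 = {"endorsement", "broker_correspondence"}   # variable layouts
    # Anything else (bespoke or handwritten) is treated as Tier 3.

    if classifier_confidence < 0.85:
        return "human_handler"          # uncertain classification: send to a person
    if doc_type in TIER_1:
        return "auto_extract"           # high automation confidence
    if doc_type in TIER_2:
        return "extract_with_review"    # AI extraction + human validation flag
    return "human_handler"              # Tier 3: routed directly to a handler
```

In practice the routing decision also benefits Tier 3: even documents sent straight to a handler can carry the partial extraction along, which is what produced the Tier 3 speed-up noted in the results.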
Week 3–6: Pipeline development
We built a multi-stage extraction pipeline:
- Ingestion layer: Email attachment monitoring and SharePoint folder polling that captured incoming documents automatically — no manual upload required
- Document classification: A fine-tuned classification model (built on a base vision transformer) that identified document type with 97.3% accuracy across the 14 categories
- Field extraction: GPT-4 Vision with structured output schemas for each document type — extracting policyholder name, policy number, coverage limits, excess, premium, inception and expiry dates, endorsements, and special conditions
- Validation layer: Business rule validation against known insurer formats, cross-field consistency checks, and confidence scoring. Documents below a configurable threshold were automatically flagged for human review rather than auto-submitted
- Acturis integration: A REST API integration that pushed extracted data into the correct Acturis record via their documented import API — with idempotency checks to prevent duplicate entries
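The validation layer's cross-field checks and confidence gating can be sketched as follows — the field names, rules, and the 0.9 threshold are illustrative, not the per-document-type schemas used in production:

```python
from datetime import date

# Illustrative validation pass: business rules, cross-field consistency
# checks, and a configurable confidence threshold. Field names and the
# threshold value are assumptions for illustration.

CONFIDENCE_THRESHOLD = 0.9

def validate_extraction(fields: dict, confidence: float) -> tuple[bool, list[str]]:
    """Return (auto_submit, issues). Any failed rule, or low extraction
    confidence, flags the document for human review instead of auto-submission."""
    issues = []
    if fields["inception_date"] >= fields["expiry_date"]:
        issues.append("inception date must precede expiry date")
    if fields["premium"] <= 0:
        issues.append("premium must be positive")
    if fields["excess"] < 0:
        issues.append("excess cannot be negative")
    if not fields["policy_number"].strip():
        issues.append("policy number is missing")
    if confidence < CONFIDENCE_THRESHOLD:
        issues.append(f"extraction confidence {confidence:.2f} below threshold")
    return (not issues, issues)
```

Keeping the rules as plain data-driven checks is what makes the threshold "configurable": tightening or loosening it shifts volume between auto-submission and the review queue without touching the extraction step.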
Week 7–9: Human-in-the-loop interface
For Tier 2 documents and low-confidence extractions, we built a lightweight review UI that showed the extracted fields alongside the source document — allowing staff to verify and correct values before submission with a single click. The correction data was logged for model fine-tuning.
Week 10–11: Testing, training & deployment
End-to-end testing with 500 live documents, staff training on the review interface, and a phased rollout starting with Tier 1 documents only.
The Results
The system went live processing Tier 1 documents in week 11, with Tier 2 enabled two weeks post-launch after staff were comfortable with the review workflow.
Operational outcomes at 60 days:
- Average document processing time reduced from 8.4 minutes to 1.9 minutes (78% reduction)
- Data entry error rate fell from 3.2% to 0.18% — a 94% reduction
- Daily throughput capacity increased from ~400 documents to 3,000+ without additional staff
- Renewal turnaround time reduced by an average of 1.8 days during the January peak
Business outcomes:
- The three staff previously dedicated to document handling were redeployed to client relationship management and complex case handling — higher-value work
- Projected annual staff cost saving of £210,000 based on prevented future hiring
- Project cost recovered in approximately 6 weeks at current processing volumes
The Tier 3 finding: The 8% of documents routed to human handlers were processed 40% faster because staff now had the AI's partial extraction as a starting point — reducing the time even on documents the system could not fully automate.
Technical Stack
- Document ingestion: Python, Microsoft Graph API (SharePoint/Outlook), AWS S3
- Classification model: Fine-tuned ViT on custom labelled dataset (1,200 examples)
- Extraction: GPT-4 Vision with Pydantic-validated structured outputs, prompt versioning via LangSmith
- Validation engine: Python rule engine with configurable per-document-type schemas
- Review UI: Next.js with react-pdf for document rendering
- Acturis integration: REST API with exponential backoff and dead-letter queue
- Monitoring: Datadog for pipeline health; custom extraction accuracy dashboard
- Infrastructure: AWS Lambda + SQS for async processing; RDS PostgreSQL for audit log
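The integration's retry pattern — exponential backoff with a dead-letter hand-off after the final attempt — can be sketched generically. The attempt count, base delay, and the `push`/`dead_letter` callables are assumptions for illustration; the production version targets the Acturis import API and an SQS dead-letter queue:

```python
import time

# Illustrative retry wrapper: exponential backoff, then a dead-letter
# hand-off once the final attempt fails. Attempt count, base delay,
# and the callables are assumptions for illustration.

def push_with_retry(push, payload, dead_letter, attempts=4, base_delay=0.5):
    """Call push(payload); back off exponentially on failure, and hand
    the payload to dead_letter if every attempt fails."""
    for attempt in range(attempts):
        try:
            return push(payload)
        except Exception:
            if attempt == attempts - 1:
                dead_letter(payload)    # e.g. enqueue to an SQS dead-letter queue
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

Combined with the idempotency checks mentioned above, this makes retries safe: a payload that succeeded on a previous attempt but timed out in transit will not create a duplicate Acturis record when replayed.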
