Case Study

AI Document Automation for Mid-Market Insurance Broker

An LLM-powered document processing pipeline that reduced policy document handling time by 78% and cut data entry errors to near-zero for a 120-staff insurance brokerage.

Client
Harrington & Cole Insurance Brokers
Industry
Financial Services
Duration
11 weeks
Delivered
February 2026
AI Automation · Machine Learning Solutions · AI Integration

Key Results

  • Document processing time reduced by 78%

  • Data entry errors reduced by 94%

  • £210,000 annual staff cost saving projected

  • Processing capacity: 3,000 documents/day vs 400 manual

  • 6-week payback period on project cost

The Challenge

Harrington & Cole Insurance Brokers handle commercial and professional indemnity insurance for over 4,000 SME clients. At the centre of their renewal and new business process is a documentation-heavy workflow: insurers supply policy documents, endorsements, and schedules in inconsistent formats — PDF, Word, and occasionally scanned paper — that must be read, validated, and re-keyed into their broker management system (Acturis).

The manual process consumed approximately 1.8 full-time equivalent staff roles and created a bottleneck that slowed renewal turnaround times and introduced transcription errors that occasionally led to client disputes.

The specific problems:

  • A team of three junior staff spent 60–70% of their time on manual document reading and data entry
  • Error rate on transcription was estimated at 3.2% — causing downstream issues when policy terms were incorrectly recorded
  • Peak renewal periods (January and June) created processing backlogs that delayed client-facing documentation by 2–5 days
  • The brokerage could not scale without hiring more document handlers

The brief: automate the extraction, validation, and ingestion of insurance documents into Acturis — without replacing the human review step for high-value or complex policies.

Our Approach

We began with a two-week discovery sprint to understand the document taxonomy: 14 distinct document types across 8 insurance categories, each with varying layouts from different insurer templates.

Week 1–2: Discovery & scoping

We audited 1,200 historical documents to understand layout variation, field distribution, and extraction complexity. We identified three document tiers by complexity:

  • Tier 1 (68% of volume): Standardised formats with consistent field positions — high automation confidence
  • Tier 2 (24% of volume): Semi-structured with variable layout — AI extraction with human validation flag
  • Tier 3 (8% of volume): Highly bespoke or handwritten — routed directly to human handler
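The tiering above, combined with the confidence threshold described later in the validation layer, amounts to a simple routing rule. A minimal sketch (the route names and the 0.9 threshold are illustrative, not the client's actual configuration):

```python
from enum import Enum

class Route(Enum):
    AUTO_SUBMIT = "auto_submit"        # Tier 1: straight to Acturis import
    HUMAN_VALIDATE = "human_validate"  # Tier 2 or low confidence: staff confirm
    HUMAN_HANDLE = "human_handle"      # Tier 3: manual processing, AI assists

def route_document(tier: int, confidence: float, threshold: float = 0.9) -> Route:
    """Map a document's complexity tier and extraction confidence to a route."""
    if tier == 3:
        return Route.HUMAN_HANDLE
    if tier == 2 or confidence < threshold:
        return Route.HUMAN_VALIDATE
    return Route.AUTO_SUBMIT
```

Note that even a confidently classified Tier 1 document is flagged for review if the downstream extraction confidence dips below the threshold.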

Week 3–6: Pipeline development

We built a multi-stage extraction pipeline:

  1. Ingestion layer: Email attachment monitoring and SharePoint folder polling that captured incoming documents automatically — no manual upload required
  2. Document classification: A fine-tuned classification model (built on a base vision transformer) that identified document type with 97.3% accuracy across the 14 categories
  3. Field extraction: GPT-4 Vision with structured output schemas for each document type — extracting policyholder name, policy number, coverage limits, excess, premium, inception and expiry dates, endorsements, and special conditions
  4. Validation layer: Business rule validation against known insurer formats, cross-field consistency checks, and confidence scoring. Documents below a configurable threshold were automatically flagged for human review rather than auto-submitted
  5. Acturis integration: A REST API integration that pushed extracted data into the correct Acturis record via their documented import API — with idempotency checks to prevent duplicate entries
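The field set in stage 3 and the cross-field checks in stage 4 can be sketched together. The production pipeline used Pydantic schemas per document type; this stdlib-only sketch uses a dataclass, and the specific checks and threshold are illustrative assumptions:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class PolicyExtraction:
    """Fields extracted from a single policy document (illustrative subset)."""
    policyholder_name: str
    policy_number: str
    coverage_limit: float
    excess: float
    premium: float
    inception_date: date
    expiry_date: date
    endorsements: list[str] = field(default_factory=list)
    special_conditions: list[str] = field(default_factory=list)
    confidence: float = 0.0  # aggregate extraction confidence from the model

def validate(doc: PolicyExtraction, threshold: float = 0.9) -> list[str]:
    """Cross-field consistency checks; any issue flags the document for review."""
    issues = []
    if doc.expiry_date <= doc.inception_date:
        issues.append("expiry date must fall after inception date")
    if doc.excess > doc.coverage_limit:
        issues.append("excess exceeds coverage limit")
    if doc.premium <= 0:
        issues.append("premium must be positive")
    if doc.confidence < threshold:
        issues.append(f"confidence {doc.confidence:.2f} below threshold")
    return issues
```

A document that returns an empty issue list proceeds to the Acturis push; anything else lands in the review queue.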

Week 7–9: Human-in-the-loop interface

For Tier 2 documents and low-confidence extractions, we built a lightweight review UI that showed the extracted fields alongside the source document — allowing staff to verify and correct values before submission with a single click. The correction data was logged for model fine-tuning.
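One way to capture the correction signal described above is an append-only log of field-level diffs per reviewed document. The JSONL shape here is an assumption for illustration, not the client's actual format:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def log_correction(log_path: Path, doc_id: str,
                   extracted: dict, corrected: dict) -> dict:
    """Append one correction record per reviewed document.

    Only fields the reviewer actually changed are recorded, so the log
    doubles as a per-field error-rate signal for later fine-tuning.
    """
    changed = {k: {"extracted": extracted.get(k), "corrected": v}
               for k, v in corrected.items() if extracted.get(k) != v}
    record = {
        "doc_id": doc_id,
        "reviewed_at": datetime.now(timezone.utc).isoformat(),
        "changes": changed,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```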

Week 10–11: Testing, training & deployment

End-to-end testing with 500 live documents, staff training on the review interface, and a phased rollout starting with Tier 1 documents only.

The Results

The system went live processing Tier 1 documents in week 11, with Tier 2 enabled two weeks post-launch after staff were comfortable with the review workflow.

Operational outcomes at 60 days:

  • Average document processing time reduced from 8.4 minutes to 1.9 minutes (78% reduction)
  • Data entry error rate fell from 3.2% to 0.18% — a 94% reduction
  • Daily throughput capacity increased from ~400 documents to 3,000+ without additional staff
  • Renewal turnaround time reduced by an average of 1.8 days during the January peak

Business outcomes:

  • The three staff previously dedicated to document handling were redeployed to client relationship management and complex case handling — higher-value work
  • Projected annual staff cost saving of £210,000 based on prevented future hiring
  • Project cost recovered in approximately 6 weeks at current processing volumes

The Tier 3 finding: The 8% of documents routed to human handlers were processed 40% faster because staff now had the AI's partial extraction as a starting point — reducing the time even on documents the system could not fully automate.

Technical Stack

  • Document ingestion: Python, Microsoft Graph API (SharePoint/Outlook), AWS S3
  • Classification model: Fine-tuned ViT on custom labelled dataset (1,200 examples)
  • Extraction: GPT-4 Vision with Pydantic-validated structured outputs, prompt versioning via LangSmith
  • Validation engine: Python rule engine with configurable per-document-type schemas
  • Review UI: Next.js with react-pdf for document rendering
  • Acturis integration: REST API with exponential backoff and dead-letter queue
  • Monitoring: Datadog for pipeline health; custom extraction accuracy dashboard
  • Infrastructure: AWS Lambda + SQS for async processing; RDS PostgreSQL for audit log
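The retry and idempotency behaviour listed for the Acturis integration can be sketched as follows. The `submit` callable, payload shape, and retry parameters are hypothetical stand-ins; Acturis's actual import API and the production queue wiring (SQS plus a dead-letter queue) differ:

```python
import hashlib
import json
import time
from typing import Callable

def idempotency_key(payload: dict) -> str:
    """Stable hash of the extracted record, used to detect duplicate pushes."""
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def push_with_retry(submit: Callable[[dict], bool], payload: dict,
                    seen_keys: set, max_attempts: int = 5,
                    base_delay: float = 1.0) -> str:
    """Submit once per unique record; retry transient failures with
    exponential backoff, then hand off to a dead-letter queue."""
    key = idempotency_key(payload)
    if key in seen_keys:
        return "duplicate_skipped"
    for attempt in range(max_attempts):
        try:
            if submit(payload):
                seen_keys.add(key)
                return "submitted"
        except ConnectionError:
            pass  # transient failure: fall through to backoff
        time.sleep(base_delay * 2 ** attempt)
    return "dead_letter"  # retries exhausted: queue for manual replay
```

In production the `seen_keys` set would live in a durable store (the audit database, not process memory), so restarts cannot cause duplicate Acturis records.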