Multi-Model Invoice OCR Pipeline
Build an end-to-end invoice processing system using LayoutLMv3, TrOCR, and fine-tuned BERT for entity extraction
Problem Statement
Processing invoices from multiple vendors is challenging due to variations in layout, language, and format.
Task Goals:
- Extract key entities: vendor name, total amount, line items, invoice dates
- Handle multiple invoice formats and languages
- Automate end-to-end invoice processing
Solution Overview
NEO built a multi-stage pipeline combining:
- LayoutLMv3: Detects the structure and layout of invoices
- TrOCR: Performs high-accuracy text recognition
- Fine-tuned BERT: Extracts relevant entities from recognized text
Splitting the work across specialized models lets the pipeline handle diverse invoice formats and scale to larger volumes.
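The staged design above can be sketched as three interchangeable callables wired together by one function. This is a minimal illustration, not the repository's code: `Region`, `run_pipeline`, and the `fake_*` stubs are hypothetical names, and the stubs stand in for LayoutLMv3, TrOCR, and the BERT extractor so the example runs without model weights.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    """Hypothetical container for one segmented part of an invoice page."""
    label: str      # e.g. "header", "line_items", "totals"
    image: bytes    # placeholder for the cropped pixels
    text: str = ""  # filled in by the OCR stage


# Each stage is a plain callable, so a model can be swapped without touching the others.
LayoutFn = Callable[[bytes], List[Region]]
OcrFn = Callable[[Region], str]
ExtractFn = Callable[[List[Region]], Dict[str, str]]


def run_pipeline(page: bytes, detect_layout: LayoutFn,
                 recognize: OcrFn, extract: ExtractFn) -> Dict[str, str]:
    """Run layout detection, then OCR per region, then entity extraction."""
    regions = detect_layout(page)
    for region in regions:
        region.text = recognize(region)
    return extract(regions)


# Stub stages (assumptions for the demo only; real stages would call the models).
def fake_layout(page: bytes) -> List[Region]:
    return [Region("header", page[:4]), Region("totals", page[4:])]


def fake_ocr(region: Region) -> str:
    return region.image.decode("ascii")


def fake_extract(regions: List[Region]) -> Dict[str, str]:
    return {r.label: r.text for r in regions}


result = run_pipeline(b"ACMEEUR 10", fake_layout, fake_ocr, fake_extract)
print(result)
```

Because the stages only share the `Region` type, each one can be tested or replaced in isolation.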
Workflow / Pipeline
| Step | Description |
|---|---|
| 1. Data Ingestion | Collect invoices from multiple vendors in PDF or image format |
| 2. Layout Detection | Use LayoutLMv3 to segment invoices into header, line items, totals, and metadata regions |
| 3. OCR Text Recognition | Apply TrOCR on segmented regions for accurate text extraction |
| 4. Entity Extraction | Apply the fine-tuned BERT model to extract vendor names, totals, dates, and line items |
| 5. Post-processing | Normalize date formats, currency symbols, and line items |
| 6. Output Generation | Produce structured JSON or CSV for downstream accounting and ERP systems |
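Steps 5 and 6 might look roughly like the sketch below, which normalizes a date and an amount and serializes the record as JSON. The field names, the list of accepted date layouts, the day-first reading of ambiguous dates, and the currency symbol map are all assumptions made for illustration; the source does not specify them.

```python
import json
import re
from datetime import datetime
from decimal import Decimal
from typing import Optional

# Assumed input layouts; day-first is an assumption and would misread US-style dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")
# Assumed symbol-to-code map; extend for the currencies actually seen.
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}


def normalize_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_amount(raw: str) -> dict:
    """Split an amount like '€1.234,56' into a currency code and a decimal string."""
    currency = next((code for sym, code in CURRENCY_SYMBOLS.items() if sym in raw), None)
    digits = re.sub(r"[^\d.,]", "", raw)
    # Treat the right-most separator as the decimal mark when exactly two digits follow it.
    match = re.match(r"^(.*?)[.,](\d{2})$", digits)
    whole, frac = match.groups() if match else (digits, "00")
    whole = re.sub(r"[.,]", "", whole)  # drop thousands separators
    if not whole:
        raise ValueError(f"no digits found in amount: {raw!r}")
    return {"currency": currency, "value": str(Decimal(f"{whole}.{frac}"))}


def to_record(entities: dict) -> dict:
    """Build the structured output record from raw extracted strings."""
    return {
        "vendor_name": entities["vendor_name"].strip(),
        "invoice_date": normalize_date(entities["invoice_date"]),
        "total": normalize_amount(entities["total"]),
    }


record = to_record({
    "vendor_name": " Acme GmbH ",
    "invoice_date": "05.03.2024",
    "total": "€1.234,56",
})
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Amounts are kept as decimal strings rather than floats so that downstream accounting systems do not inherit binary rounding errors.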
Repository & Artifacts
GitHub Repository: Multi-Model Invoice OCR Pipeline
Generated Artifacts:
- Preprocessed invoice datasets
- Trained LayoutLMv3 and TrOCR models
- Fine-tuned BERT entity extraction model
- Evaluation metrics and reports
- Pipeline scripts that automate every stage from ingestion to output
Technical Details
- Preprocessing: PDF/image conversion, rotation correction, noise reduction
- Layout Detection: Fine-tuned LayoutLMv3 for region segmentation
- Text Recognition: TrOCR with multilingual support
- Entity Extraction: BERT fine-tuned on annotated invoice datasets
- Error Handling: Skip unreadable invoices and log them for manual review
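The error-handling policy in the last item could be implemented along these lines. This is a sketch under assumptions: `UnreadableInvoiceError`, `process_batch`, and `demo_process` are hypothetical names, and the real pipeline may signal unreadable input differently.

```python
import logging
from typing import Callable, Dict, Iterable, List, Tuple

logger = logging.getLogger("invoice_pipeline")


class UnreadableInvoiceError(Exception):
    """Hypothetical exception raised by a stage that cannot read its input."""


def process_batch(
    paths: Iterable[str],
    process_one: Callable[[str], dict],
) -> Tuple[Dict[str, dict], List[str]]:
    """Process each invoice; skip unreadable ones and collect them for review."""
    results: Dict[str, dict] = {}
    needs_review: List[str] = []
    for path in paths:
        try:
            results[path] = process_one(path)
        except UnreadableInvoiceError as exc:
            # One bad file should not stop the batch; record it and move on.
            logger.warning("Skipping %s for manual review: %s", path, exc)
            needs_review.append(path)
    return results, needs_review


def demo_process(path: str) -> dict:
    """Stand-in for the real pipeline: fails on files marked as corrupt."""
    if path.endswith(".corrupt"):
        raise UnreadableInvoiceError("could not decode page")
    return {"source": path}


done, review = process_batch(["a.pdf", "b.corrupt", "c.png"], demo_process)
print(sorted(done), review)
```

Only the dedicated exception is caught, so unexpected failures such as programming errors still surface instead of being silently skipped.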
Results
- OCR Accuracy: 96%+ across multiple invoice layouts
- Entity Extraction F1-Score: 93%
- Multi-language support: successfully handled invoices in English, Spanish, and French
- JSON outputs ready for ERP integration
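The source does not say how the entity extraction F1-score was computed. One common convention is micro-averaged exact-match F1 over (invoice, field, value) triples, sketched below with made-up toy data; `entity_f1` and the sample triples are illustrative assumptions, not the project's evaluation code.

```python
from typing import Iterable, Tuple

# (invoice_id, field, normalized value); an entity counts only on an exact match.
Entity = Tuple[str, str, str]


def entity_f1(predicted: Iterable[Entity], gold: Iterable[Entity]) -> float:
    """Micro-averaged F1 over exact-match entity triples."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for the example.
gold = [
    ("inv1", "vendor_name", "Acme GmbH"),
    ("inv1", "total", "1234.56"),
    ("inv1", "invoice_date", "2024-03-05"),
    ("inv2", "total", "10.00"),
]
pred = [
    ("inv1", "vendor_name", "Acme GmbH"),
    ("inv1", "total", "1234.56"),
    ("inv1", "invoice_date", "2024-03-06"),  # wrong day, so no match
    ("inv2", "total", "10.00"),
]

score = entity_f1(pred, gold)
print(f"entity F1 = {score:.2f}")
```

Scoring on normalized values means that formatting differences such as "05.03.2024" versus "2024-03-05" do not count as extraction errors.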
Best Practices & Lessons Learned
- Annotate diverse invoice layouts for robust BERT fine-tuning
- Normalize extracted entities immediately for consistent downstream processing
- Separate layout detection from OCR for modularity and easier debugging
- Log edge cases for manual verification and use them to improve the dataset iteratively
Next Steps
- Add support for more languages
- Implement real-time invoice processing for high-volume accounts
- Integrate automated anomaly detection for invoice validation