Cal Mercer
Everyone thinks parsing a bank statement should be simple. It's just a list of transactions, right?
Wrong.
After building parsers for dozens of document types, we still find bank statements among the most deceptively complex to parse. Here's what we learned handling 500+ different formats.
There are roughly 4,500 FDIC-insured banks in the US alone. Add credit unions, international banks, and neobanks, and you're looking at tens of thousands of institutions. Each one formats its statements differently.
Chase uses a clean columnar layout.
Bank of America loves multi-page summaries before showing transactions.
Wells Fargo splits deposits and withdrawals into separate sections.
Capital One sometimes puts the date first, sometimes the description.
And that's just the big guys. Regional banks and credit unions often have PDF layouts that look like they were designed in 1998 using Microsoft Publisher.
Our first approach was template matching. For each bank, we'd define:
This worked for about 6 months. Then we hit three problems:
We were building 5-10 new templates per week. It wasn't sustainable.
Raw OCR gives you text, but bank statements are fundamentally about tables. The spatial relationship between columns matters.
Consider this line:
02/15    AMAZON MARKETPLACE        -$47.99      $1,234.56
OCR sees: 02/15 AMAZON MARKETPLACE -$47.99 $1,234.56
But which number is the transaction amount and which is the running balance? In some formats, the balance comes first. In others, it's not shown at all.
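To make the ambiguity concrete, here is a minimal Python sketch of one arithmetic check a parser could run on flat OCR text. It is not the approach this article ends up using. The helper name `split_amount_and_balance` is hypothetical, and it assumes the previous running balance is already known and that debits are negative.

```python
from decimal import Decimal
from typing import Optional, Tuple


def split_amount_and_balance(
    prev_balance: Decimal, first: Decimal, second: Decimal
) -> Optional[Tuple[Decimal, Decimal]]:
    """Return (amount, balance) when exactly one ordering is arithmetically consistent.

    Hypothetical helper: assumes the previous running balance is known and
    that amounts are signed (debits negative).
    """
    candidates = [(first, second), (second, first)]
    matches = [(amt, bal) for amt, bal in candidates if prev_balance + amt == bal]
    # Zero or two consistent orderings means the line stays ambiguous.
    return matches[0] if len(matches) == 1 else None


# The two numbers from the example line, in the order OCR emitted them.
prev = Decimal("1282.55")  # assumed balance before the Amazon purchase
print(split_amount_and_balance(prev, Decimal("-47.99"), Decimal("1234.56")))
```

The check only works when a prior balance is available. On the first row of a statement, or in formats that omit the balance column, the flat text alone cannot settle which number is which.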
Modern vision LLMs don't just read text. They understand layout. They can look at a bank statement and recognize:
The architecture that works:
PDF → Image → Vision LLM → Table Extraction → Schema Validation → JSON
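As a rough sketch, the pipeline can be written as a function that chains the stages in order. Every name here (`parse_statement` and the four injected stage callables) is an assumption made for illustration, and the stages are stubbed rather than backed by a real PDF renderer or vision model.

```python
import json
from typing import Callable, Dict, List


def parse_statement(
    pdf_bytes: bytes,
    render_pages: Callable[[bytes], List[bytes]],
    read_page: Callable[[bytes], Dict],
    extract_rows: Callable[[Dict], List[Dict]],
    validate: Callable[[Dict], List[str]],
) -> str:
    """Run the stages in order; each stage is injected so it can be swapped or stubbed."""
    images = render_pages(pdf_bytes)                  # PDF -> Image
    layouts = [read_page(image) for image in images]  # Image -> Vision LLM
    # Table Extraction: flatten the rows found on every page.
    rows = [row for layout in layouts for row in extract_rows(layout)]
    statement = {"transactions": rows}
    errors = validate(statement)                      # Schema Validation
    if errors:
        raise ValueError("schema validation failed: " + "; ".join(errors))
    return json.dumps(statement)                      # JSON


# Stub stages standing in for a real PDF renderer and vision model.
result = parse_statement(
    b"%PDF-stub",
    render_pages=lambda pdf: [b"page-1", b"page-2"],
    read_page=lambda image: {"rows": [{"description": image.decode()}]},
    extract_rows=lambda layout: layout["rows"],
    validate=lambda statement: [] if statement["transactions"] else ["no transactions"],
)
print(result)
```

Passing the stages in as callables keeps each step independently testable, which matters when the vision model call is slow or costs money per page.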
The schema is critical. We define exactly what we expect:
{
  "account": {
    "holder_name": "string",
    "account_number": "string",
    "routing_number": "string",
    "account_type": "checking|savings|business"
  },
  "period": {
    "start_date": "date",
    "end_date": "date"
  },
  "transactions": [{
    "date": "date",
    "description": "string",
    "amount": "number",
    "type": "credit|debit",
    "category": "string",
    "running_balance": "number|null"
  }],
  "summary": {
    "opening_balance": "number",
    "closing_balance": "number",
    "total_credits": "number",
    "total_debits": "number"
  }
}
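A schema like this can be enforced with a small hand-written checker. The sketch below uses only the Python standard library. The function names are hypothetical, and it assumes "date" means an ISO 8601 string such as "2024-02-01" and that every field in the schema is required.

```python
from datetime import date
from typing import Any, List

ACCOUNT_TYPES = ("checking", "savings", "business")
TRANSACTION_TYPES = ("credit", "debit")


def is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_date(value: Any) -> bool:
    # Assumption: dates arrive as ISO 8601 strings such as "2024-02-01".
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def validate_statement(doc: dict) -> List[str]:
    """Return a list of human-readable schema violations (empty means valid)."""
    errors: List[str] = []

    def check(path: str, ok: bool, expected: str) -> None:
        if not ok:
            errors.append(f"{path}: expected {expected}")

    account = doc.get("account", {})
    for key in ("holder_name", "account_number", "routing_number"):
        check(f"account.{key}", isinstance(account.get(key), str), "string")
    check("account.account_type", account.get("account_type") in ACCOUNT_TYPES,
          "checking|savings|business")

    period = doc.get("period", {})
    for key in ("start_date", "end_date"):
        check(f"period.{key}", is_date(period.get(key)), "date")

    transactions = doc.get("transactions")
    check("transactions", isinstance(transactions, list), "array")
    for i, txn in enumerate(transactions if isinstance(transactions, list) else []):
        prefix = f"transactions[{i}]"
        check(f"{prefix}.date", is_date(txn.get("date")), "date")
        check(f"{prefix}.description", isinstance(txn.get("description"), str), "string")
        check(f"{prefix}.amount", is_number(txn.get("amount")), "number")
        check(f"{prefix}.type", txn.get("type") in TRANSACTION_TYPES, "credit|debit")
        check(f"{prefix}.category", isinstance(txn.get("category"), str), "string")
        balance = txn.get("running_balance")
        check(f"{prefix}.running_balance", balance is None or is_number(balance), "number|null")

    summary = doc.get("summary", {})
    for key in ("opening_balance", "closing_balance", "total_credits", "total_debits"):
        check(f"summary.{key}", is_number(summary.get(key)), "number")
    return errors
```

Returning every violation rather than failing on the first one gives a full list of what went wrong, which is more useful when deciding how to re-prompt the model.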
Even with vision models, bank statements have edge cases:
Multi-page transactions - A single transaction description can wrap across pages
Pending vs. posted - Some statements show both, with different formatting
Foreign currency - Amount in USD vs. original currency, exchange rates
Interest calculations - Daily balance tables that aren't transactions
Fees buried in descriptions - "Monthly Service Fee" as a line item vs. as a deduction footnote
We handle these with a combination of prompt engineering and post-processing validation. If the extracted transactions don't reconcile to the stated totals, we retry with more specific instructions.
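The reconcile-and-retry loop might look something like the following sketch. It checks the opening and closing balances rather than the credit and debit totals, and assumes amounts are signed (debits negative). The `extract` callable, the hint text, and the attempt limit are all hypothetical stand-ins for the real model call and its follow-up instructions.

```python
from decimal import Decimal
from typing import Callable, Optional

CENT = Decimal("0.01")


def to_money(value) -> Decimal:
    # Go through str() so 47.99 becomes Decimal("47.99"), not a long binary fraction.
    return Decimal(str(value)).quantize(CENT)


def reconciles(statement: dict) -> bool:
    """True when opening balance + signed transaction amounts == closing balance."""
    summary = statement["summary"]
    movement = sum(
        (to_money(t["amount"]) for t in statement["transactions"]), Decimal("0")
    )
    return (
        to_money(summary["opening_balance"]) + movement
        == to_money(summary["closing_balance"])
    )


def parse_with_retry(
    extract: Callable[[Optional[str]], dict], max_attempts: int = 3
) -> dict:
    """Call extract(hint) until the result reconciles or attempts run out."""
    hint: Optional[str] = None
    for _ in range(max_attempts):
        statement = extract(hint)
        if reconciles(statement):
            return statement
        # Hypothetical wording; the real follow-up instructions are not given here.
        hint = (
            "Totals did not reconcile. Re-read every row, "
            "including descriptions that wrap across pages."
        )
    raise ValueError(f"statement did not reconcile after {max_attempts} attempts")
```

Using `Decimal` instead of floats keeps the equality check exact to the cent, so a statement is never rejected because of binary rounding.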
After 8 months of iteration:
We wrapped this into an API. Upload a bank statement PDF, get structured JSON:
curl -X POST https://statementocr.com/api/parse \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@statement.pdf"
Response:
{
  "account": {
    "holder_name": "John Smith",
    "account_number": "****4567"
  },
  "transactions": [
    {
      "date": "2024-02-01",
      "description": "DIRECT DEPOSIT - ACME CORP",
      "amount": 3500.00,
      "type": "credit"
    },
    {
      "date": "2024-02-03",
      "description": "AMAZON MARKETPLACE",
      "amount": -47.99,
      "type": "debit"
    }
  ],
  "summary": {
    "opening_balance": 1234.56,
    "closing_balance": 4686.57
  }
}
Three main use cases:
The lending use case is huge. Not everyone wants to connect their bank account via OAuth. Some customers prefer uploading a PDF. And for businesses, bank statements are often the only option.
If you're building anything that needs to understand bank statements, Statement OCR has a free tier. Upload a few statements and see the output.
Works with most US banks out of the box. International support is improving.
Part 2 of a series on document parsing. Previously: EOB parsing. Next: tax documents.