The problem Finn solves is a real one: most personal finance tools require you to change your habits (log into an app, categorize manually, remember later). Finn fits into a habit you already have: checking WhatsApp.
What it can do:
- Log expenses and income via text or voice note (Whisper transcription)
- Parse credit card PDF invoices: send the PDF, review the itemized list, confirm, and it is saved
- Parse bank account statement PDFs: transactions are extracted via a saldo-diff algorithm that works around garbled PDF text
- Query spending by period, category, card, or cost center (cross-source: credit cards + bank account + manual entries combined)
- Edit or delete any saved record just by describing it
- Phone whitelist: only your number gets a response
Demo
All four scenarios below are real conversations with a running Finn instance.
Demo 1: Log a Single Expense
The most common interaction: user mentions an amount and a description. Finn classifies the category and card automatically, shows a confirmation line, and only saves after approval.
You: 189 on lunch
Finn: $189 · Food · Mastercard · Me · 2026-04-22 - confirm?
You: yeah
Finn: ✅ Saved! Anything else?
If the description is ambiguous, Finn asks exactly one clarifying question:
You: spent 80 today
Finn: What was it for?
You: pharmacy
Finn: $80 · Pharmacy · Mastercard · Me · 2026-04-22 - confirm?
You: yes
Finn: ✅ Saved!
Voice notes work the same way, and Finn echoes the transcription before acting:
You: [audio: "spent ninety at the grocery store today"]
Finn: _"spent ninety at the grocery store today"_
$90 · Groceries · Mastercard · Me · 2026-04-22 - confirm?
You: yes
Finn: ✅ Saved!
Demo 2: Change Something
You can correct any saved record in plain language. Finn finds the most recent matching transaction, shows what will change, and waits for confirmation.
Change the amount:
You: the pharmacy one was actually 95, not 80
Finn: Update $80 → $95 · Pharmacy · Mastercard · Me · 2026-04-22 - confirm?
You: yes
Finn: ✅ Updated!
Change the category:
You: that lunch from yesterday, mark it as Entertainment
Finn: Update Food → Entertainment · $189 · Mastercard · Me · 2026-04-22 - confirm?
You: yes
Finn: ✅ Updated!
Delete a record:
You: delete the $90 grocery entry
Finn: Delete $90 · Groceries · Mastercard · Me · 2026-04-22 - are you sure?
You: yes
Finn: ✅ Deleted.
Demo 3: Import a Credit Card PDF Invoice
You forward the invoice PDF to the WhatsApp chat. Finn parses it, shows you every line, resolves unknown categories, and bulk-saves on one "yes".
You: [sends Mastercard invoice PDF]
Finn: Mastercard · Due 2026-04-10 · $2,340.00 · 18 items (16 new · 2 skipped)
Invoice Summary
Purchases: $2,680.00 · Financed balance (credit): -$520.00 · Interest: +$180.00 · Total due: $2,340.00
1. ✅ UBER*TRIP
Mar 12 · $18.50 · Transport · Me
2. ✅ NETFLIX.COM
Mar 13 · $15.99 · Entertainment · Me
3. ✅ AMAZON PRIME
Mar 14 · $42.30 · Shopping · Me (intl · USD 7.99)
4. ❓ SQ*DOWNTOWN MKT
Mar 15 · $34.00 · uncategorized · Me
5. ✅ SHELL GAS STATION
Mar 17 · $95.00 · Transport · Me
6. ✅ DELTA AIRLINES 01/03
Mar 19 · $320.00 · Travel · Me
...
❓ What category for: #4 SQ*DOWNTOWN MKT ($34.00)?
You: Groceries
Finn: Got it. Save all 16 items? (2 duplicates will be skipped)
You: yes
Finn: ✅ Saved 16 transactions · Total $2,340.00
Transport: $210.50 · Shopping: $380.00 · Food: $195.00 · Entertainment: $95.99 · Travel: $320.00 · Other: $1,138.51
The "Invoice Summary" line reconciles the invoice math: Purchases + Interest + Financed balance = Total due. Finn stores the interest and financed balance as signed rows so SELECT SUM(amount) always equals the invoice total.
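The reconciliation can be checked with a few lines of arithmetic. The sketch below is illustrative only: the InvoiceRow shape and the invoiceTotalCents helper are hypothetical names, not Finn's actual schema, and amounts are held in integer cents (an assumption) to avoid floating-point drift. The figures are the ones from the invoice summary above.

```typescript
// Hypothetical row shape; Finn's real table layout is not shown in this post.
interface InvoiceRow {
  description: string
  amountCents: number // signed: charges are positive, credits are negative
}

// Equivalent of SELECT SUM(amount) over the rows of one invoice.
function invoiceTotalCents(rows: InvoiceRow[]): number {
  return rows.reduce((sum, row) => sum + row.amountCents, 0)
}

const rows: InvoiceRow[] = [
  { description: 'Purchases', amountCents: 268000 },
  { description: 'Financed balance (credit)', amountCents: -52000 },
  { description: 'Interest', amountCents: 18000 },
]

const totalDueCents = 234000
console.log(invoiceTotalCents(rows) === totalDueCents) // true
```

Because the credit is stored as a negative row rather than a separate field, a plain sum needs no special cases to land on the total due.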
For image-based (scanned) PDFs, Finn automatically falls back to GPT-4.1 Vision OCR. The output is the same, and the user takes no extra steps.
Demo 4: Query Spending
Finn always queries the database; it never answers from memory or inference.
Total this month:
You: how much did I spend this month?
Finn: All spending - last 60 days
Total: $6,234.18
Shopping: $1,402.10
Food: $534.90
Transport: $487.20
Entertainment: $211.70
Travel: $420.00
Other: $3,178.28
By category:
You: breakdown of Transport
Finn: Transport - last 60 days
$487.20 across 8 transactions
Mar 15 · Uber $34.90 (Mastercard)
Mar 20 · Posto Ipiranga $180.00 (Mastercard)
Apr 01 · Uber $28.50 (Mastercard)
Apr 05 · Shell Gas $155.00 (Nu)
Apr 10 · Parking $22.00 (Bradesco)
...
Cross-source (credit card + bank account):
You: did my card payment hit the bank account?
Finn: Yes. The Mar 31 bank debit of $4,782.24 matches your Mastercard invoice exactly.
A specific period:
You: how much did I spend on Food in March?
Finn: Food - March 2026
$312.40 across 9 transactions
Highest: $89.90 at a restaurant on Mar 22
Daily average: $10.08
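The "last 60 days" label in the first reply reflects a rolling window rather than a calendar month. A minimal sketch of such a filter is shown here, assuming dates are compared in UTC; rollingWindowStart and isInWindow are hypothetical helper names, not Finn's actual code.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Start of a rolling window that ends at `now` and spans `days` days.
function rollingWindowStart(now: Date, days: number = 60): Date {
  return new Date(now.getTime() - days * MS_PER_DAY)
}

// True when the purchase falls inside the window [now - days, now].
function isInWindow(purchase: Date, now: Date, days: number = 60): boolean {
  const t = purchase.getTime()
  return t >= rollingWindowStart(now, days).getTime() && t <= now.getTime()
}

const now = new Date('2026-04-22T00:00:00Z')
const march8 = new Date('2026-03-08T00:00:00Z')
const aprilFirst = new Date('2026-04-01T00:00:00Z')

console.log(march8.getTime() >= aprilFirst.getTime()) // false: a calendar-month filter drops it
console.log(isInWindow(march8, now))                  // true: the rolling window keeps it
```

The trade-off is that a 60-day window also includes purchases already billed on the previous invoice, which is presumably why the reply states the window explicitly instead of saying "this month".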
How I Used OpenClaw
The entire agent is an OpenClaw plugin registered as a before_dispatch hook. Every WhatsApp message (text, voice, or PDF) passes through Finn before OpenClaw does anything else.
WhatsApp → OpenClaw gateway → before_dispatch hook → Finn plugin → OpenAI gpt-4.1 → Supabase
Plugin Registration
// openclaw.plugin.json
{
  "name": "finance-agent",
  "version": "1.0.0",
  "hooks": ["before_dispatch"]
}

// plugin/src/index.ts
api.on('before_dispatch', async (event, ctx) => {
  const phone = ctx.senderId ?? event.senderId

  // Phone whitelist: only the owner gets responses
  const allowedPhones = process.env.ALLOWED_PHONES?.split(',').map(p => p.trim()) ?? []
  if (allowedPhones.length > 0 && !allowedPhones.includes(phone)) {
    return { handled: true, text: '' } // silent ignore for unknown numbers
  }

  // message and mediaType are derived from the incoming event (extraction omitted in this excerpt)
  const result = await runAgent({ phone, message, mediaType })
  return { handled: true, text: result.reply }
})
The Agent Loop
The core is a tool-use loop over gpt-4.1 with six tools and a maximum of 5 iterations:
| Tool | Purpose |
| --- | --- |
| save_transaction | Persist a confirmed expense or income entry |
| query_spending | Query totals, breakdowns, history from Supabase |
| save_bulk_transactions | Bulk-save confirmed invoice items from a PDF |
| save_bank_statement | Bulk-save confirmed bank statement rows |
| update_transaction | Edit a saved record after confirmation |
| delete_transaction | Delete a record after confirmation |
The system prompt enforces a strict state machine: the LLM never calls save_transaction without an explicit user confirmation. The confirmation always uses a canonical format:
$45 · Food · Mastercard · Me · 2026-04-22 - confirm?
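The bounded loop described above can be sketched roughly as follows. This is a simplified illustration, not Finn's implementation: the model and tool calls are synchronous stand-ins passed in as functions (real calls to the OpenAI API and Supabase would be awaited), and the ToolCall, ModelTurn, and runLoop names, as well as the fallback message, are assumptions.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> }

// A model turn is either a final text reply or a request to run tools.
type ModelTurn = { reply: string } | { toolCalls: ToolCall[] }

const MAX_ITERATIONS = 5

function runLoop(
  userText: string,
  callModel: (history: string[]) => ModelTurn, // stand-in for the LLM call
  runTool: (call: ToolCall) => string,         // stand-in for tool execution
): string {
  const history = [`user: ${userText}`]
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const turn = callModel(history)
    // A plain reply ends the loop.
    if ('reply' in turn) return turn.reply
    // Otherwise run each requested tool and feed the results back.
    for (const call of turn.toolCalls) {
      history.push(`tool ${call.name}: ${runTool(call)}`)
    }
  }
  // Iteration cap reached without a final reply.
  return 'Sorry, I could not finish that request.'
}
```

The cap keeps a confused model from looping on tool calls indefinitely; after five rounds the user gets a fallback message instead of silence.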
Preventing LLM Hallucination on Queries
Even with a well-crafted prompt saying "always call query_spending", the model would sometimes answer "you spent $X on Transport" by inferring from a recently parsed PDF in context instead of querying the database. The fix: force tool_choice on the first iteration for any spending question:
const SPENDING_Q_RE = /how much|breakdown|what did I spend/i
let toolChoice = SPENDING_Q_RE.test(userText)
  ? { type: 'function', function: { name: 'query_spending' } }
  : 'auto'
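Forcing the tool only on the first iteration matters: if query_spending stayed forced on every pass, the model would have to call it again each time and could never produce a final text answer. A small sketch of that per-iteration decision follows; pickToolChoice is a hypothetical helper, and the ToolChoice type mirrors the tool_choice shape used in the snippet above.

```typescript
type ToolChoice = 'auto' | { type: 'function'; function: { name: string } }

const SPENDING_Q_RE = /how much|breakdown|what did I spend/i

// Force query_spending on the first pass of a spending question, then hand
// control back to the model so it can turn the tool result into a reply.
function pickToolChoice(userText: string, iteration: number): ToolChoice {
  if (iteration === 0 && SPENDING_Q_RE.test(userText)) {
    return { type: 'function', function: { name: 'query_spending' } }
  }
  return 'auto'
}

console.log(pickToolChoice('how much did I spend this month?', 0)) // forced query_spending
console.log(pickToolChoice('how much did I spend this month?', 1)) // 'auto'
console.log(pickToolChoice('189 on lunch', 0))                     // 'auto'
```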
PDF Invoice Pipeline
When a PDF arrives, the plugin routes it before the LLM ever sees it:
if (pdfText.trim().length < 100) {
  // Scanned/image-based PDF: GPT-4.1 Vision OCR
  invoice = await parseInvoiceOcr(pdfToImages(pdfBuffer))
} else if (/Extrato de:.*Agência/i.test(pdfText)) {
  // Bank statement
  stmt = parseStatementBradesco(pdfText)
} else {
  // Text-based credit card invoice
  invoice = parseInvoice(pdfText)
}
What I Learned
1. PDF parsing is harder than it looks. The text extraction from pdf-parse is reliable for prose but unreliable for table columns: numbers get concatenated with adjacent reference codes. The saldo-diff approach (saldo is Portuguese for "balance") was a counterintuitive fix: instead of parsing the amount I want, compute it from the surrounding balances.
2. "This month" is not a calendar concept for credit cards. A purchase on March 8 appears on an April invoice, so a filter of date >= April 1 would miss it. Finn uses a 60-day rolling window for "this month" queries to cover the billing cycle lag.
3. Forcing tool_choice prevents silent hallucination. The model reliably answers from database queries when forced, and sometimes "just knows" from context when not forced. Both answers look correct; the second one just isn't backed by a database query.
4. WhatsApp as an interface has a real adoption advantage. The friction of opening a dedicated finance app is the #1 reason people stop using them. A chat interface that's already open all day has zero switching cost.
5. Security in layers. OpenClaw's allowFrom whitelist blocks at the gateway level. ALLOWED_PHONES adds an application-level check. Supabase rows are scoped by phone with RLS. Each layer is independent: if one fails, the others still hold.
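The saldo-diff idea from point 1 can be sketched as follows. This is an illustration under stated assumptions, not Finn's actual parser: it assumes the running balance on each statement line is extracted cleanly and that an opening balance is known. StatementLine, Transaction, amountsFromBalances, and the sample values are all hypothetical.

```typescript
interface StatementLine {
  date: string
  description: string
  balanceCents: number // running balance printed after this transaction
}

interface Transaction {
  date: string
  description: string
  amountCents: number // positive = credit, negative = debit
}

// Derive each amount as the change in running balance instead of parsing
// the (possibly garbled) amount column directly.
function amountsFromBalances(openingBalanceCents: number, lines: StatementLine[]): Transaction[] {
  let previous = openingBalanceCents
  return lines.map(line => {
    const amountCents = line.balanceCents - previous
    previous = line.balanceCents
    return { date: line.date, description: line.description, amountCents }
  })
}

// Made-up statement lines for illustration.
const txs = amountsFromBalances(100000, [
  { date: '2026-03-30', description: 'Salary', balanceCents: 600000 },
  { date: '2026-03-31', description: 'Card payment', balanceCents: 121776 },
])

console.log(txs.map(t => t.amountCents)) // [ 500000, -478224 ]
```

A side benefit of this approach is that the sign comes for free: a falling balance yields a negative amount, so debits and credits do not need separate parsing rules.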
Tech Stack
| Layer | Technology |
| --- | --- |
| Channel | WhatsApp via OpenClaw |
| Runtime | TypeScript, Node.js 20 |
| Framework | OpenClaw (before_dispatch hook) |
| LLM | OpenAI gpt-4.1 (tool-use loop) |
| Database | Supabase (PostgreSQL + Row Level Security) |
| PDF parsing | pdf-parse + custom text parsers |
| Vision OCR | GPT-4.1 Vision (scanned PDFs) |
| Audio | OpenAI Whisper |
| Deployment | pm2 on a VPS |
GitHub: github.com/vicente-r-junior/finn