Definition
Transaction categorisation turns a raw operation (technical label, amount, date) into a useful category: "Groceries › Supermarket", "Transport › Uber", "Salary".
It is an invisible but critical building block for any use case that consumes AIS (account information service) data: personal finance management (PFM), business finance management (BFM), alternative credit scoring, automated accounting, fraud detection, embedded finance.
The problem: unreadable raw labels
A typical statement contains lines such as:
CB SNCF MOBILE 24/04 0612345678
VIRT SARL XYZ COMM/AVR2025
PRLV EDF FACT N12345678
CB CARREFOUR EXP 23/04 75011 PARIS
CB AMZN MKTPL DUBLIN IE
As-is, this is unusable. You have to infer the category, the actual merchant and the nature of the spend.
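A first cleaning pass over such lines could look like the sketch below. It is a minimal illustration, not a production parser: the function name, the prefix map and the two regular expressions are assumptions derived only from the sample lines above.

```python
import re
from typing import Tuple

# Hypothetical prefix map, limited to the operation codes seen in the sample lines.
OPERATION_TYPES = {"CB": "card", "VIRT": "transfer", "PRLV": "direct_debit"}

# dd/mm dates such as "24/04".
DATE_RE = re.compile(r"\b\d{2}/\d{2}\b")
# Runs of 5+ digits (phone numbers, invoice references, postcodes),
# optionally preceded by a single letter as in "N12345678".
LONG_NUMBER_RE = re.compile(r"\b[A-Z]?\d{5,}\b")


def clean_label(raw: str) -> Tuple[str, str]:
    """Split a raw label into (operation type, cleaned merchant text)."""
    tokens = raw.strip().upper().split()
    op_type = "unknown"
    if tokens and tokens[0] in OPERATION_TYPES:
        op_type = OPERATION_TYPES[tokens[0]]
        tokens = tokens[1:]
    text = " ".join(tokens)
    text = DATE_RE.sub(" ", text)
    text = LONG_NUMBER_RE.sub(" ", text)
    # Collapse the whitespace left behind by the substitutions.
    return op_type, " ".join(text.split())


print(clean_label("CB SNCF MOBILE 24/04 0612345678"))  # ('card', 'SNCF MOBILE')
print(clean_label("PRLV EDF FACT N12345678"))          # ('direct_debit', 'EDF FACT')
```

Cleaning only removes noise. Deciding that "SNCF MOBILE" is a train ticket is still the job of the categoriser itself.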
Rules, ML, hybrid
Three approaches, all in use:
- Deterministic rules: regex and lookup tables ("CARREFOUR" → Supermarket). Precise on known merchants, but unable to handle new ones.
- Supervised machine learning: models trained on tens of millions of labelled transactions, able to generalise (for example to "CARREF EXPRESS PARIS", which has no explicit entry).
- Hybrid: rules for the clear-cut cases, ML for the unknown, and a feedback loop (manual re-categorisation → retraining).
The leaders (Bud, Yodlee, Tink, Heron Data) are all hybrid, with proprietary models trained on hundreds of millions of transactions.
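The hybrid flow can be sketched as follows. This is a toy under stated assumptions, not how any of the vendors above work: the rule table, the class and function names and the 0.2 threshold are hypothetical, and a nearest-example lookup on character trigrams stands in for a trained classifier.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical rule table: exact keyword -> category path.
RULES: Dict[str, str] = {
    "CARREFOUR": "Groceries > Supermarket",
    "SNCF": "Transport > Train",
    "EDF": "Housing > Energy",
}


def apply_rules(label: str) -> Optional[str]:
    """Return a category if a rule keyword appears as a whole token."""
    tokens = label.upper().split()
    for keyword, category in RULES.items():
        if keyword in tokens:
            return category
    return None


def trigrams(text: str) -> Set[str]:
    s = text.upper()
    return {s[i:i + 3] for i in range(len(s) - 2)}


class NearestLabelModel:
    """Stand-in for an ML model: nearest labelled example by trigram Jaccard similarity."""

    def __init__(self, examples: List[Tuple[str, str]]):
        self.examples = list(examples)

    def add_example(self, label: str, category: str) -> None:
        # Feedback loop: a manual re-categorisation becomes new training data.
        self.examples.append((label, category))

    def predict(self, label: str) -> Tuple[Optional[str], float]:
        query = trigrams(label)
        best_category, best_score = None, 0.0
        for example, category in self.examples:
            candidate = trigrams(example)
            union = query | candidate
            score = len(query & candidate) / len(union) if union else 0.0
            if score > best_score:
                best_category, best_score = category, score
        return best_category, best_score


def categorise(label: str, model: NearestLabelModel, threshold: float = 0.2) -> Tuple[str, str]:
    """Rules first, model second, 'Other' when neither is confident."""
    rule_hit = apply_rules(label)
    if rule_hit is not None:
        return rule_hit, "rule"
    category, score = model.predict(label)
    if category is not None and score >= threshold:
        return category, "model"
    return "Other", "fallback"


model = NearestLabelModel([
    ("CARREFOUR EXPRESS", "Groceries > Supermarket"),
    ("UBER TRIP", "Transport > Ride-hailing"),
])
print(categorise("CB CARREFOUR EXP PARIS", model))  # ('Groceries > Supermarket', 'rule')
print(categorise("CARREF EXPRESS PARIS", model))    # ('Groceries > Supermarket', 'model')
print(categorise("VIRT JEAN DUPONT", model))        # ('Other', 'fallback')
```

In a real system the feedback step would trigger retraining rather than a simple append, but the ordering (rules, then model, then an explicit fallback) is the point of the sketch.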
The taxonomy: no standard
There is no universal taxonomy. Each player defines its own, usually across 2 or 3 levels:
- L1 — broad families (Groceries, Transport, Housing, Salary, etc.).
- L2 — subcategories (Supermarket, Restaurant, Bakery).
- L3 — refinements (merchant chain, product type).
MCCs (Merchant Category Codes, ISO 18245, 4 digits assigned by the networks) help but are not enough: they are sometimes wrong or too generic (code 5411 "Grocery Stores" covers both Carrefour and a kebab shop).
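One way to represent such a taxonomy, and to treat the MCC as a fallback hint rather than the source of truth, is sketched below. The `Category` type, the `resolve` helper and the hint table are illustrative assumptions; only the two MCC meanings (5411 for grocery stores, 5812 for eating places) come from the ISO 18245 code list.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Category:
    """A category path with up to three levels (L1, L2, L3)."""
    l1: str
    l2: Optional[str] = None
    l3: Optional[str] = None

    def path(self) -> str:
        return " > ".join(level for level in (self.l1, self.l2, self.l3) if level)


# Hypothetical hint table: MCC -> default category when nothing better is known.
MCC_HINTS: Dict[str, Category] = {
    "5411": Category("Groceries", "Supermarket"),
    "5812": Category("Restaurants"),
}


def resolve(label_category: Optional[Category], mcc: Optional[str]) -> Category:
    """Prefer the category inferred from the label; use the MCC only as a fallback."""
    if label_category is not None:
        return label_category
    if mcc is not None and mcc in MCC_HINTS:
        return MCC_HINTS[mcc]
    return Category("Other")


# A label-derived category wins over a generic or wrong MCC.
print(resolve(Category("Restaurants", "Fast food"), "5411").path())  # Restaurants > Fast food
print(resolve(None, "5411").path())                                  # Groceries > Supermarket
```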
Quality criteria
- Coverage: the share of categorised transactions (vs "Other"). Target > 95%.
- Accuracy: the share correctly categorised. Target > 90% on L1, > 80% on L2.
- Latency: ideally < 100 ms per transaction for real time.
- Multi-language / multi-country: a pan-European player must be just as accurate in FR, DE, IT, ES, PL.
- Stability: the same transaction must receive the same category from one call to the next.
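Coverage and accuracy can be measured on a hand-labelled sample, as in the sketch below. The function names and the "L1 > L2" path format are assumptions, and accuracy is computed here over all transactions, including those left in "Other"; some teams measure it over categorised transactions only.

```python
from typing import List, Tuple


def coverage(predicted: List[str]) -> float:
    """Share of transactions that received a category other than 'Other'."""
    return sum(1 for p in predicted if p != "Other") / len(predicted)


def prefix(path: str, level: int) -> Tuple[str, ...]:
    """First `level` segments of a category path such as 'Groceries > Supermarket'."""
    return tuple(part.strip() for part in path.split(">"))[:level]


def accuracy_at_level(predicted: List[str], expected: List[str], level: int) -> float:
    """Share of transactions whose path matches the expected one down to `level`."""
    hits = sum(1 for p, e in zip(predicted, expected) if prefix(p, level) == prefix(e, level))
    return hits / len(expected)


predicted = ["Groceries > Supermarket", "Groceries > Bakery", "Transport > Train", "Other"]
expected = ["Groceries > Supermarket", "Groceries > Supermarket", "Transport > Train", "Housing > Energy"]

print(coverage(predicted))                        # 0.75
print(accuracy_at_level(predicted, expected, 1))  # 0.75
print(accuracy_at_level(predicted, expected, 2))  # 0.5
```

The second transaction shows why L1 and L2 targets differ: the family is right, the subcategory is wrong.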
Business case: accounting categorisation
For BFM and automated accounting (Pennylane, Qonto, Indy), the exercise is more complex. Each transaction must be mapped to an account in a chart of accounts (the French PCG, or a chart aligned with IFRS), VAT must be handled (deductible or not, at varying rates), entry lines must be produced, and transactions must be reconciled with invoices automatically. A "FOURNITURES BUREAU" (office supplies) line must thus become "Account 6064, 20% deductible VAT".
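For the office supplies example, the arithmetic and the resulting entry lines could be sketched as follows. Account 6064 and the 20% rate come from the text; the counterpart accounts (44566 for deductible VAT, 512 for the bank) are PCG accounts chosen here as assumptions, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Tuple


def split_vat(gross: Decimal, rate: Decimal) -> Tuple[Decimal, Decimal]:
    """Split a VAT-inclusive amount into (net, VAT), rounded to the cent."""
    net = (gross / (1 + rate)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return net, gross - net


def entry_lines(gross: Decimal,
                expense_account: str = "6064",
                rate: Decimal = Decimal("0.20")) -> List[Tuple[str, str, Decimal]]:
    """Build balanced entry lines for a fully deductible purchase paid by bank."""
    net, vat = split_vat(gross, rate)
    return [
        (expense_account, "debit", net),  # expense, net of VAT
        ("44566", "debit", vat),          # deductible VAT (assumed account)
        ("512", "credit", gross),         # bank (assumed account)
    ]


for account, side, amount in entry_lines(Decimal("120.00")):
    print(account, side, amount)
# 6064 debit 100.00
# 44566 debit 20.00
# 512 credit 120.00
```

Computing the VAT as gross minus net keeps debits and credits balanced even when rounding the net amount loses a fraction of a cent.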
What categorisation is not
- Not enrichment: categorisation infers the category; enrichment adds logo, geolocation, MCC and company profile. Two distinct layers, often combined.
- Not a regulated service: no ACPR licence is needed to categorise; an AISP that categorises its own data remains an AISP.
- Not universal: labels, MCCs and behaviours differ by country; a global model rarely matches a per-country model.
- Not fixed: every new merchant (TikTok Shop, a new operator) must be learned continuously.
In the PSD2 ecosystem
Categorisation is not part of PSD2 in the strict sense (which only defines the transport of data), but it is the main value added on top of AIS: without it, the data stays as raw labels. It is what justifies the business model of aggregators and PFM/BFM solutions.
Real-world examples
- Leaders: Bud (UK), Heron Data (UK, business focus), Yodlee (US, acquired by Envestnet), Tink (Sweden, owned by Visa), Bridge and Powens (FR), MX (US).
- Bankin' / Lydia / Linxo: often rely on the Powens or Bridge engine; Bankin' also has its own legacy engine.
- Pennylane: accounting categorisation (PCG), invoice OCR and automated reconciliation, with a claimed accuracy above 95%.
- Heron Data: positions itself on B2B scoring by qualifying an SME's flows to assess its financial health.
- A known limitation: P2P transfers are the hardest to classify — a "VIRT JEAN DUPONT" with no context stays unclassifiable; many PFMs leave them "to be classified".
- Cost: typically €0.001 to €0.01 per transaction at the leaders, which adds up across millions of transactions per day and explains why large players bring categorisation in-house.
- Outlook: LLMs used to categorise novel labels zero-shot (tested at Heron, Bud), with better accuracy on the long tail but a higher inference cost.
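To give an order of magnitude for the per-transaction cost quoted above, the sketch below multiplies the €0.001 to €0.01 range by a daily volume. The volume of 5 million transactions per day is a purely hypothetical figure, as is the function name.

```python
from decimal import Decimal
from typing import Tuple

# Price range per transaction, as quoted in the text (in euros).
LOW_PRICE = Decimal("0.001")
HIGH_PRICE = Decimal("0.01")


def daily_cost_range(transactions_per_day: int) -> Tuple[Decimal, Decimal]:
    """Return the (low, high) daily categorisation cost in euros."""
    return LOW_PRICE * transactions_per_day, HIGH_PRICE * transactions_per_day


low, high = daily_cost_range(5_000_000)  # hypothetical volume
print(f"EUR {low:,.0f} to EUR {high:,.0f} per day")  # EUR 5,000 to EUR 50,000 per day
```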