A fraud analytics dataset feeds the models a team builds to spot scams. GDPR Recital 26 treats data as anonymous only when no person can be identified, including by being singled out, using any means reasonably likely to be used. anonym.plus removes names, accounts, and contacts across the dataset on your device, so the features stay useful without the people.
When this applies
A data team prepares records to train a fraud model or hand to a vendor. You clean the set so the signal survives but no individual does.
How anonym.plus handles it
- Point anonym.plus at the dataset on your machine.
- It scans each row for names, accounts, and contacts.
- Local OCR reads any scanned source pages.
- Turn the name map OFF for true anonymity.
- Replace each identifier with a steady label (the same value always gets the same label, such as [PERSON_1]).
- Save the clean dataset locally.
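The steady-label step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not anonym.plus's implementation: the `SteadyLabeler` class, the `anonymize_rows` function, the `name` and `account` column names, and the indexed `[ACCOUNT_1]` label format are all hypothetical. It shows how one value keeps one label across rows while the lookup table lives only in memory for the run, similar in spirit to working with the name map off.

```python
from collections import defaultdict


class SteadyLabeler:
    """Gives the same value the same label within one run (hypothetical helper)."""

    def __init__(self):
        self._labels = {}                # (entity_type, value) -> label, in memory only
        self._counts = defaultdict(int)  # next index per entity type

    def label(self, entity_type, value):
        key = (entity_type, value)
        if key not in self._labels:
            self._counts[entity_type] += 1
            self._labels[key] = f"[{entity_type}_{self._counts[entity_type]}]"
        return self._labels[key]


def anonymize_rows(rows, person_field="name", account_field="account"):
    """Replace person and account fields with steady labels.

    The labeler is local to this call and is never written out, so no
    saved mapping exists afterwards that could reverse the labels.
    """
    labeler = SteadyLabeler()
    clean_rows = []
    for row in rows:
        clean = dict(row)  # copy so the input rows are left untouched
        clean[person_field] = labeler.label("PERSON", row[person_field])
        clean[account_field] = labeler.label("ACCOUNT", row[account_field])
        clean_rows.append(clean)
    return clean_rows
```

For three rows where "Ana Diaz" appears twice, the names come back as [PERSON_1], [PERSON_2], [PERSON_1]. The two Ana rows stay linked for the model, and the amounts pass through unchanged in this sketch.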
What you need to provide
- The dataset export (CSV converted to PDF, DOCX, or a mix of formats).
- An operator (Replace, with the name map off).
- Optionally, batch mode to process up to 20 files at once.
PII & financial identifiers detected
| Category | anonym.plus entity type | Example |
|---|---|---|
| Names | PERSON | row name → [PERSON_1] |
| Financial | US_BANK_NUMBER | accounts → [ACCOUNT] |
| Identifiers | US_SSN | SSNs → [SSN] |
| Contact | EMAIL_ADDRESS | emails → [EMAIL] |
| Amounts | MONEY | txn amounts → [AMOUNT] |
| Dates | DATE_TIME | timestamps → [DATE] |
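To illustrate how an entity type in the table maps to a placeholder, here is a minimal regex sketch for two of the more structured types. The patterns and the `mask_text` name are assumptions for illustration, not anonym.plus's detectors. Names (PERSON) and amounts written in free text usually need a trained named-entity model rather than a simple pattern.

```python
import re

# Hypothetical, simplified patterns; production detectors are more robust.
PATTERNS = {
    "US_SSN": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    "EMAIL_ADDRESS": (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
}


def mask_text(text):
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in PATTERNS.values():
        text = pattern.sub(placeholder, text)
    return text
```

For example, `mask_text("Contact ana@example.com, SSN 123-45-6789.")` returns `"Contact [EMAIL], SSN [SSN]."`.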
Compliance achieved
- Aims for the anonymity bar in GDPR Recital 26.
- True anonymity needs the reversible name map turned off.
- Offline work keeps the training set off any server.
- Batch up to 20 files in one local pass.
Limitations & cautions
Under Recital 26, data stays personal as long as anyone, using means reasonably likely to be used, can still identify a person in it. Masking direct fields is not enough if rare combinations of the remaining fields single someone out. Keep the name map off and test for re-identification before you share.
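One common way to test for rare combinations is a k-anonymity style count: group rows by the quasi-identifiers left after masking and flag any group smaller than k. The sketch below is a generic illustration, not an anonym.plus feature. The `rare_combinations` name, the example field names, and the default `k=5` are assumptions; pick k and the quasi-identifier list to suit your own risk assessment.

```python
from collections import Counter


def rare_combinations(rows, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k rows.

    Any combination returned here could single a person out, so those rows
    need generalizing (for example, wider age bands) or suppressing first.
    """
    counts = Counter(tuple(row[field] for field in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}
```

With three rows of `{"zip3": "100", "age_band": "30-39"}` and one row of `{"zip3": "940", "age_band": "70-79"}`, calling `rare_combinations(rows, ["zip3", "age_band"], k=2)` returns `{("940", "70-79"): 1}`, flagging the single row that stands out.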
Frequently asked questions
When is the dataset truly anonymous under GDPR?
Recital 26 says only when no one can be identified, including by being singled out. Remove direct fields, turn off the map, and test rare combinations.
Can a model still learn from a cleaned set?
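Yes. Steady labels keep each person's rows linked, so patterns survive while identities do not. MONEY and DATE_TIME values are also replaced, so if amounts or timestamps are model features, check that the masking leaves the signal you need.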
Is the dataset uploaded?
No. The whole run is offline, so the data stays on your machine.