Trial dataset anonymization is the removal of participant identifiers before a file is shared. It supports EMA Policy 0070 on clinical data publication. anonym.plus runs offline and keeps the measured values readable.
When this applies
The EMA asks a sponsor to publish participant-level results. Names, free-text fields, and identifying dates must be hidden first.
How anonym.plus handles it
- Load the file (CSV, XLSX, PDF, or DOCX) into anonym.plus.
- The tool scans columns and free text for direct identifiers.
- Local OCR pulls text from any scanned supporting page.
- Confirm the flagged participant names, dates, and places.
- Replace each one with the same token everywhere it appears in the file.
- Save the cleaned copy on your device with no network call.
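
The replacement step above can be sketched in a few lines of Python. This is a minimal illustration, not the anonym.plus implementation: the function name `replace_names`, the `[PARTICIPANT_n]` numbering scheme, and the sample sentence are all assumptions made for the example. It shows how a confirmed list of names can be mapped to tokens so that a repeated name always receives the same token.

```python
import re

def replace_names(text, confirmed_names):
    """Swap each confirmed name for one stable token (hypothetical helper)."""
    token_map = {}
    # Number tokens in the order the names were confirmed.
    for name in confirmed_names:
        if name not in token_map:
            token_map[name] = f"[PARTICIPANT_{len(token_map) + 1}]"
    if not token_map:
        return text, token_map
    # Try longer names first so a full name wins over a shorter overlap.
    alternatives = sorted(token_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in alternatives))
    cleaned = pattern.sub(lambda m: token_map[m.group(0)], text)
    return cleaned, token_map

# Made-up sample text; no real participant data.
sample = ("James O'Connor enrolled in Cork. "
          "Follow-up call with James O'Connor and Aoife Byrne.")
cleaned, mapping = replace_names(sample, ["James O'Connor", "Aoife Byrne"])
print(cleaned)
```

Exact string matching is used here for brevity; it would miss spelling variants or initials, which is one reason the confirmation step matters.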
What you need to provide
- The file (CSV, XLSX, PDF, DOCX, or scan).
- An operator: Replace for tokens, Redact to drop a field.
- Optional: a token map for consistent labels across exports.
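
To illustrate why a token map helps across exports, here is a rough sketch that assumes the map is a simple JSON object from name to token. That layout, and the helper name `extend_token_map`, are hypothetical; the actual format anonym.plus accepts is not described here. The idea is that names already in the map keep their token, and new names are numbered after them.

```python
import json

def extend_token_map(existing_json, names, prefix="PARTICIPANT"):
    """Keep existing tokens and assign fresh ones to unseen names (sketch)."""
    token_map = json.loads(existing_json) if existing_json else {}
    for name in names:
        if name not in token_map:
            token_map[name] = f"[{prefix}_{len(token_map) + 1}]"
    return token_map

# First export: only one participant is present.
first_export = extend_token_map("", ["James O'Connor"])
saved = json.dumps(first_export)

# Second export: the saved map is reused, so the label stays stable.
second_export = extend_token_map(saved, ["Aoife Byrne", "James O'Connor"])
print(second_export["James O'Connor"])
```

A map like this links tokens back to people, so it should be stored as carefully as the original file.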
Personal data entity types detected
| Category | anonym.plus entity type | Example |
|---|---|---|
| Names | PERSON | James O'Connor → [PARTICIPANT_1] |
| Event dates | DATE_TIME | Randomised 02/02/2026 → [DATE] |
| Location | LOCATION | Cork, Ireland → [REGION] |
| Email | EMAIL_ADDRESS | j.oconnor@example.ie → [EMAIL] |
| Free-text ID | ID | Screening SCR-0091 → [SCREEN_ID] |
| Age | AGE | Age 91 → [AGE_BAND] |
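
The last row replaces an exact age with a band. A minimal sketch of one way to do that follows. The ten-year width and the open-ended top band starting at 90 are illustrative assumptions, not values taken from anonym.plus or from Policy 0070.

```python
def age_band(age, width=10, top_code=90):
    """Map an exact age to a band; ages at or above top_code share one band."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top_code:
        # Very high ages are rare, so they are grouped into one open band.
        return f"{top_code}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

print(age_band(91))  # 90+
print(age_band(47))  # 40-49
```

Wider bands lower re-identification risk but also reduce analytical detail, so the width is a choice to record in the risk assessment.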
Compliance achieved
- Supports public release under EMA Policy 0070.
- Stays offline, so no third-party processing agreement or data-transfer step is needed.
- On-device AES-256-GCM guards the working copy.
- Aligns with GDPR Recital 26 when the result no longer identifies anyone.
Limitations & cautions
Policy 0070 expects a risk assessment, not just field removal. The tool strips direct identifiers and flags rare values like an age of 91. You must still judge whether quasi-identifiers, taken in combination, could re-identify a participant before release.
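
One common way to examine quasi-identifiers in combination is to count how many records share each combination and look at the small groups. The sketch below assumes records are plain dictionaries; the helper name `small_groups`, the column names, and the threshold `k` are placeholders chosen for illustration. The threshold is not a figure from Policy 0070, and this check is a starting point for the risk assessment, not a substitute for it.

```python
from collections import Counter

def small_groups(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}

# Made-up, already tokenized rows.
records = [
    {"age_band": "40-49", "sex": "F", "region": "[REGION_1]"},
    {"age_band": "40-49", "sex": "F", "region": "[REGION_1]"},
    {"age_band": "90+", "sex": "M", "region": "[REGION_2]"},
]
risky = small_groups(records, ["age_band", "sex", "region"], k=2)
print(risky)
```

Here the single record in the `90+` band stands alone, which is the kind of combination that may need a wider band or suppression.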
Frequently asked questions
What does EMA Policy 0070 require?
It governs the publication of clinical data submitted to the agency. Sponsors must anonymise participant-level files and justify the method in an anonymisation report. Removing direct identifiers is the first step toward that submission.
Can it process spreadsheet columns?
Yes. Load a CSV or XLSX and the tool scans both column values and free text. It applies the same token to a repeated name across every row.
Is the published file still useful for analysis?
Yes. Numeric outcomes and timing offsets remain. Only direct identifiers are swapped, so the file keeps its scientific value.
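
As an illustration of how timing can survive without calendar dates, the sketch below converts event dates into day offsets from an anchor event. The helper name `to_study_days`, the event names, and the choice of randomisation as the anchor are assumptions for the example, not a description of what anonym.plus outputs.

```python
from datetime import date

def to_study_days(events, anchor_event="randomised"):
    """Express each event as whole days since the anchor event (sketch)."""
    anchor = events[anchor_event]
    return {name: (day - anchor).days for name, day in events.items()}

# Made-up schedule for one participant.
events = {
    "randomised": date(2026, 2, 2),
    "first_dose": date(2026, 2, 4),
    "week_4_visit": date(2026, 3, 2),
}
offsets = to_study_days(events)
print(offsets)
```

The intervals between visits are preserved for analysis, while the calendar dates themselves no longer appear in the shared file.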