TAR-set anonymization removes personal data from documents before they are fed to technology-assisted review (TAR), which helps keep discovery proportional under FRCP 26(b)(1). anonym.plus runs locally, so documents are cleaned on your device and PII is removed before any of it reaches the model's training data.
When this applies
A TAR model ranks documents by likely relevance to speed up triage. Feeding it raw PII spreads that data across the model's index and training data and widens exposure; removing PII first keeps that risk low.
How anonym.plus handles it
- Point anonym.plus at the document set on your device.
- Local OCR reads any scanned items in the set.
- The tool flags names, contacts, and IDs across files.
- Choose steady labels (the same placeholder for the same value in every file) so relevance signals survive.
- Replace or mask each confirmed value.
- Save the clean set for the ranking workflow.
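The steady-label step can be sketched in Python. This is a minimal illustration under assumptions, not anonym.plus's implementation: `SteadyLabeler` is a hypothetical helper, and it assumes detection has already produced `(entity_type, value)` pairs for each file. It gives every distinct value one numbered placeholder that stays the same across files, in the style of the `[PERSON_n]` labels in the table.

```python
from __future__ import annotations

from collections import defaultdict


class SteadyLabeler:
    """Hypothetical helper (not part of anonym.plus): one placeholder per value, reused across files."""

    def __init__(self) -> None:
        # entity type -> {original value -> placeholder}
        self._maps: dict[str, dict[str, str]] = defaultdict(dict)

    def label(self, entity_type: str, value: str) -> str:
        mapping = self._maps[entity_type]
        if value not in mapping:
            # Number placeholders per type in the order values are first labeled.
            mapping[value] = f"[{entity_type}_{len(mapping) + 1}]"
        return mapping[value]

    def apply(self, text: str, entities: list[tuple[str, str]]) -> str:
        # Replace longer values first so "Ann Lee" is not partly consumed by a match on "Ann".
        for entity_type, value in sorted(entities, key=lambda e: len(e[1]), reverse=True):
            text = text.replace(value, self.label(entity_type, value))
        return text


labeler = SteadyLabeler()
doc1 = labeler.apply("Ann Lee emailed Bob.", [("PERSON", "Ann Lee"), ("PERSON", "Bob")])
doc2 = labeler.apply("Bob replied to Ann Lee.", [("PERSON", "Bob"), ("PERSON", "Ann Lee")])
print(doc1)  # [PERSON_1] emailed [PERSON_2].
print(doc2)  # [PERSON_2] replied to [PERSON_1].
```

Because `[PERSON_1]` stands for the same person in both files, a ranking model can still learn that this party co-occurs with responsive material even though the name is gone. Plain string replacement is a simplification; a real tool would work from detected character spans.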
What you need to provide
- The document set (mixed PDF, DOCX, email, or scan).
- A labeling choice: steady labels (recommended) preserve text signals for the model.
- Optionally, batches of up to 20 files per run.
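When a set exceeds the per-run limit, the runs can be planned ahead of time. A minimal sketch, assuming the extension list below approximates "PDF, DOCX, email, or scan" (it is a guess, not the tool's documented list) and that `plan_batches` is a hypothetical helper you would write yourself:

```python
from __future__ import annotations

from pathlib import Path

# Assumed extensions for PDF, DOCX, email, and scans; adjust to what your build accepts.
SUPPORTED = {".pdf", ".docx", ".eml", ".msg", ".png", ".jpg", ".tif", ".tiff"}
BATCH_LIMIT = 20  # the per-run limit described above


def plan_batches(paths: list[Path], size: int = BATCH_LIMIT) -> list[list[Path]]:
    """Hypothetical helper: split supported files into runs of at most `size` files."""
    eligible = sorted(p for p in paths if p.suffix.lower() in SUPPORTED)
    return [eligible[i:i + size] for i in range(0, len(eligible), size)]


files = [Path(f"doc_{i:03}.pdf") for i in range(45)] + [Path("notes.txt")]
runs = plan_batches(files)
print([len(r) for r in runs])  # [20, 20, 5]
```

Sorting the paths keeps the split reproducible, so a rerun sends the same files to the same batch.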
PII entity types detected
| Category | anonym.plus entity type | Example |
|---|---|---|
| Names | PERSON | custodian name → [PERSON_n] |
| Contact | EMAIL_ADDRESS | sender email → [EMAIL] |
| Dates | DATE_TIME | doc date → [DATE] |
| Identifiers | US_SSN | SSN → [SSN] |
| Location | LOCATION | address → [ADDRESS] |
| Account | US_BANK_NUMBER | account no. → [ACCOUNT] |
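For the structured types in the table, detection can be approximated with patterns. The sketch below covers only `EMAIL_ADDRESS` and `US_SSN` with deliberately simple regular expressions, an assumption for illustration; names, dates, and locations need a trained recognizer, and this is not how anonym.plus detects entities. Each match is masked with the label from the Example column.

```python
import re

# Simplified, illustrative patterns only. Real detectors add context words,
# validation, and checksums to cut false positives and misses.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Mask labels taken from the table's Example column.
LABELS = {"EMAIL_ADDRESS": "[EMAIL]", "US_SSN": "[SSN]"}


def mask(text: str) -> str:
    """Replace every pattern match with its entity label."""
    for entity_type, pattern in PATTERNS.items():
        text = pattern.sub(LABELS[entity_type], text)
    return text


print(mask("Reach jane.doe@example.com; SSN 123-45-6789."))
# Reach [EMAIL]; SSN [SSN].
```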
Compliance achieved
- Supports proportional discovery under FRCP 26(b)(1).
- Steady labels keep relevance signals for the model.
- Offline work keeps the training data inside your firm.
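Why steady labels keep signal can be shown by counting placeholder tokens. In this hypothetical three-email example, steady labels keep two parties distinct, while a single flat `[PERSON]` mask merges them into one token that says nothing about who approved what. The texts are made up for illustration.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical: the same three emails anonymized with steady labels...
steady = [
    "[PERSON_1] approved the wire",
    "[PERSON_2] asked about lunch",
    "[PERSON_1] approved the invoice",
]
# ...and with one flat mask for every name.
flat = [doc.replace("[PERSON_1]", "[PERSON]").replace("[PERSON_2]", "[PERSON]") for doc in steady]


def placeholder_counts(docs: list[str]) -> Counter:
    """Count name placeholders, the tokens a model could use as party features."""
    return Counter(tok for doc in docs for tok in doc.split() if tok.startswith("[PERSON"))


print(placeholder_counts(steady))  # Counter({'[PERSON_1]': 2, '[PERSON_2]': 1})
print(placeholder_counts(flat))    # Counter({'[PERSON]': 3})
```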
Limitations & cautions
Anonymizing before machine ranking can shift how a model reads context. Steady labels keep most signals, but test recall on a control set first. Free-text clues that survive redaction still need a human pass on responsive items.
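A control-set recall check can be as simple as the function below. It is a sketch under assumptions: `recall_at_cutoff` is a hypothetical helper, the document IDs, ranking, and target are made-up values, and "cutoff" stands for however many top-ranked documents your team plans to review.

```python
from __future__ import annotations


def recall_at_cutoff(ranked_ids: list[str], relevant_ids: set[str], cutoff: int) -> float:
    """Fraction of known-relevant control documents found in the top `cutoff` ranks."""
    if not relevant_ids:
        raise ValueError("control set must contain at least one relevant document")
    top = set(ranked_ids[:cutoff])
    return len(top & relevant_ids) / len(relevant_ids)


# Hypothetical control set: IDs a human reviewer coded as relevant.
relevant = {"c1", "c2", "c3", "c4"}
# Hypothetical ranking from a model trained on the anonymized set.
ranking = ["c1", "c5", "c2", "c3", "c6", "c7", "c4", "c8"]

TARGET = 0.75  # assumed recall target; agree on your own with the review team
for cutoff in (2, 4, 7):
    r = recall_at_cutoff(ranking, relevant, cutoff)
    print(cutoff, r, "ok" if r >= TARGET else "below target")
```

If recall at the planned cutoff falls short of the target, revisit the labeling choices before relying on the ranking.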
Frequently asked questions
What is TAR?
Technology-assisted review (TAR) uses machine learning to score documents by likely relevance, so lawyers can review the items most likely to be responsive first.
Does anonymizing first hurt accuracy?
It can shift context, but steady labels keep most text signals. Test recall on a control set before relying on the model.
Why clear PII before the model trains?
Raw PII in the training data spreads across the model's index. Removing it first keeps exposure low while discovery stays within the proportional scope of FRCP 26(b)(1).