TAR Review Set Anonymization


TAR-set anonymization is the removal of personal data from documents before they are fed to a technology-assisted review (TAR) model, which helps keep discovery proportional under FRCP 26(b)(1). anonym.plus runs locally, so PII does not enter the training data.

When this applies

A TAR model ranks documents by likely relevance to speed up triage. Raw PII fed into the model ends up in its training data and in the review index, so removing it first limits where that data is exposed.

How anonym.plus handles it

Point anonym.plus at the document set on your device.
Local OCR reads any scanned items in the set.
The tool flags names, contacts, and IDs across files.
Use steady labels (the same value always gets the same placeholder) so relevance signals survive.
Replace or mask each confirmed value.
Save the clean set for the ranking workflow.
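
The steady-label step can be illustrated with a minimal Python sketch. It is not the anonym.plus implementation or API: the LabelMap class, the anonymize function, and both regular expressions are hypothetical, and the list of confirmed names stands in for values a reviewer has already approved. It shows only the idea that one value always receives the same placeholder across files.

```python
import re
from collections import defaultdict

# Simplified patterns, assumed for illustration only
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class LabelMap:
    """Hypothetical helper: one stable numbered label per distinct value."""

    def __init__(self):
        self._labels = {}
        self._counts = defaultdict(int)

    def label_for(self, entity_type, value):
        key = (entity_type, value.lower())  # case-insensitive identity
        if key not in self._labels:
            self._counts[entity_type] += 1
            self._labels[key] = f"[{entity_type}_{self._counts[entity_type]}]"
        return self._labels[key]


def anonymize(text, confirmed_names, labels):
    # Mask emails and SSNs first so a name inside an address is not split
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = SSN_RE.sub("[SSN]", text)
    # Replace longer names before shorter ones
    for name in sorted(confirmed_names, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda m: labels.label_for("PERSON", m.group()), text)
    return text


# Share one LabelMap across the whole set so labels stay steady between files
labels = LabelMap()
doc_a = anonymize("Ann Lee emailed ann.lee@corp.com. SSN 123-45-6789.", ["Ann Lee"], labels)
doc_b = anonymize("Bob Ray called ann lee.", ["Ann Lee", "Bob Ray"], labels)
print(doc_a)  # [PERSON_1] emailed [EMAIL]. SSN [SSN].
print(doc_b)  # [PERSON_2] called [PERSON_1].
```

Because the same custodian keeps the same label in every document, co-occurrence patterns that a ranking model relies on are more likely to survive the replacement.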

What you need to provide

The document set (mixed PDF, DOCX, email, or scan).
A replacement choice; pick steady labels to preserve text signals.
Optional batch run of up to 20 files at a time.
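
For sets larger than one batch, the file list can be split ahead of time. The sketch below is an illustration only: the split_into_batches function and the BATCH_LIMIT constant are hypothetical names, and the limit of 20 is taken from the list above.

```python
BATCH_LIMIT = 20  # files per batch, per the list above


def split_into_batches(paths, size=BATCH_LIMIT):
    """Return consecutive groups of at most `size` paths, in sorted order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    ordered = sorted(paths)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


files = [f"doc{i:03}.pdf" for i in range(45)]
for number, batch in enumerate(split_into_batches(files), start=1):
    print(number, len(batch), batch[0], batch[-1])
```

Sorting before splitting keeps the batches reproducible, which makes it easier to confirm afterwards that every file was processed exactly once.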

PII entity types detected

Category	anonym.plus entity type	Example
Names	PERSON	custodian name → [PERSON_n]
Contact	EMAIL_ADDRESS	sender email → [EMAIL]
Dates	DATE_TIME	doc date → [DATE]
Identifiers	US_SSN	SSN → [SSN]
Location	LOCATION	address → [ADDRESS]
Account	US_BANK_NUMBER	account no. → [ACCOUNT]
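
To show how the table's type-to-label mapping could be applied, here is a minimal Python sketch. It assumes detection has already produced character spans as (start, end, entity_type) tuples; that span format, the LABELS dictionary, and the replace_spans function are assumptions made for the example, not the anonym.plus output format.

```python
# Fixed labels from the table; PERSON is numbered separately
LABELS = {
    "EMAIL_ADDRESS": "[EMAIL]",
    "DATE_TIME": "[DATE]",
    "US_SSN": "[SSN]",
    "LOCATION": "[ADDRESS]",
    "US_BANK_NUMBER": "[ACCOUNT]",
}


def replace_spans(text, spans):
    """Replace non-overlapping (start, end, entity_type) spans with labels."""
    # Number each distinct person in reading order
    person_ids = {}
    for start, end, etype in sorted(spans):
        if etype == "PERSON":
            person_ids.setdefault(text[start:end], len(person_ids) + 1)
    # Work right to left so earlier offsets stay valid after each replacement
    for start, end, etype in sorted(spans, reverse=True):
        if etype == "PERSON":
            label = f"[PERSON_{person_ids[text[start:end]]}]"
        else:
            label = LABELS[etype]
        text = text[:start] + label + text[end:]
    return text


text = "Ann Lee, 12 Main St, SSN 123-45-6789"
spans = [(0, 7, "PERSON"), (9, 19, "LOCATION"), (25, 36, "US_SSN")]
print(replace_spans(text, spans))  # [PERSON_1], [ADDRESS], SSN [SSN]
```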

Compliance achieved

Helps keep discovery proportional under FRCP 26(b)(1) by limiting the personal data exposed in the review set.
Steady labels keep relevance signals for the model.
Offline work keeps the training data inside your firm.


Limitations & cautions

Anonymizing before machine ranking can shift how a model reads context. Steady labels keep most signals, but test recall on a control set first. Identifying details in free text, such as nicknames, job titles, or indirect references, can survive redaction, so responsive items still need a human pass.
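
A control-set check can be as simple as comparing recall at a fixed review cutoff for the raw and anonymized versions of the same documents. The sketch below assumes you already have a ranked list of document IDs from each run and a set of IDs that reviewers coded as responsive; the recall_at_cutoff function and the sample IDs are hypothetical.

```python
def recall_at_cutoff(ranked_ids, responsive_ids, cutoff):
    """Share of known responsive documents found in the top `cutoff` results."""
    if not responsive_ids:
        raise ValueError("control set has no responsive documents")
    reviewed = set(ranked_ids[:cutoff])
    return len(reviewed & set(responsive_ids)) / len(responsive_ids)


# Made-up control set: five documents, three coded responsive
responsive = {"d1", "d4", "d5"}
ranking_raw = ["d1", "d4", "d2", "d5", "d3"]
ranking_anonymized = ["d1", "d2", "d4", "d3", "d5"]

for name, ranking in [("raw", ranking_raw), ("anonymized", ranking_anonymized)]:
    print(name, round(recall_at_cutoff(ranking, responsive, cutoff=3), 2))
```

If recall on the anonymized run drops noticeably at the cutoff you plan to use, review the replacement settings before relying on the ranking.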

Frequently asked questions

What is TAR?

Technology-assisted review (TAR) uses machine learning to score documents by likely relevance, so lawyers can focus on the most responsive items first.

Does anonymizing first hurt accuracy?

It can shift context, but steady labels keep most text signals. Test recall on a control set before relying on the model.

Why clear PII before the model trains?

Raw PII in training data spreads across the index. Clearing it first limits that exposure and helps keep the review set within a proportional scope under FRCP 26(b)(1).
