TAR Review Set Anonymization with anonym.plus

Clear PII before a machine-review model trains, all on your own device.

TAR-set anonymization is the removal of personal data from documents before they are fed to a technology-assisted review (TAR) model, which helps keep discovery proportional under FRCP 26(b)(1). anonym.plus runs locally, so the raw documents stay on your device and the model trains only on the cleaned set.

When this applies

This applies when a TAR model ranks documents by relevance to speed triage. Raw PII fed into the model ends up in its training data and index, so clearing it first limits exposure.

How anonym.plus handles it

  1. Point anonym.plus at the document set on your device.
  2. Local OCR reads any scanned items in the set.
  3. The tool flags names, contacts, and IDs across files.
  4. Use consistent labels (the same value always gets the same placeholder) so relevance signals survive.
  5. Replace or mask each confirmed value.
  6. Save the clean set for the ranking workflow.
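
Steps 4 and 5 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not anonym.plus's implementation or API: the `LabelMap` class, the `replace_confirmed` function, and the sample documents are all hypothetical, and the confirmed values are assumed to come from a reviewer's list rather than from automatic detection.

```python
import re


class LabelMap:
    """Assigns one numbered placeholder per distinct value, e.g. [PERSON_1]."""

    def __init__(self):
        self._labels = {}  # (entity_type, normalized value) -> label
        self._counts = {}  # entity_type -> number of labels issued so far

    def label_for(self, entity_type, value):
        key = (entity_type, value.strip().lower())
        if key not in self._labels:
            n = self._counts.get(entity_type, 0) + 1
            self._counts[entity_type] = n
            self._labels[key] = f"[{entity_type}_{n}]"
        return self._labels[key]


def replace_confirmed(text, confirmed, labels):
    """Replace each confirmed (entity_type, value) pair in text with its label."""
    # Longest values first, so a full name is replaced before any shorter
    # value that happens to be contained in it.
    for entity_type, value in sorted(confirmed, key=lambda p: -len(p[1])):
        pattern = re.compile(re.escape(value), re.IGNORECASE)
        # A function replacement issues a label only when the value occurs.
        text = pattern.sub(lambda m: labels.label_for(entity_type, value), text)
    return text


# Hypothetical confirmed values and documents, for illustration only.
labels = LabelMap()
confirmed = [("PERSON", "Jane Doe"), ("PERSON", "Raj Patel")]
docs = {
    "memo.txt": "Jane Doe approved the invoice.",
    "email.txt": "Raj Patel forwarded it to jane doe.",
}
clean = {name: replace_confirmed(body, confirmed, labels) for name, body in docs.items()}
```

Because one `LabelMap` is shared across every file, "Jane Doe" in the memo and "jane doe" in the email both become `[PERSON_1]`, so a ranking model can still see that the two documents mention the same person.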

What you need to provide

PII entity types detected

Category      anonym.plus entity type   Example
Names         PERSON                    custodian name → [PERSON_n]
Contact       EMAIL_ADDRESS             sender email → [EMAIL]
Dates         DATE_TIME                 doc date → [DATE]
Identifiers   US_SSN                    SSN → [SSN]
Location      LOCATION                  address → [ADDRESS]
Account       US_BANK_NUMBER            account no. → [ACCOUNT]
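
As one way to read the table above, here is a small sketch that maps each entity type to its placeholder and applies it to detected spans. It assumes detections arrive as `(start, end, entity_type)` character spans; that span format and the helper names `apply_spans` and `span_of` are illustrative assumptions, not anonym.plus's actual output or API.

```python
# Placeholders taken from the table; PERSON is numbered separately below.
PLACEHOLDERS = {
    "EMAIL_ADDRESS": "[EMAIL]",
    "DATE_TIME": "[DATE]",
    "US_SSN": "[SSN]",
    "LOCATION": "[ADDRESS]",
    "US_BANK_NUMBER": "[ACCOUNT]",
}


def apply_spans(text, spans):
    """Replace (start, end, entity_type) spans with the table's placeholders."""
    # Number distinct names in reading order so repeats share one [PERSON_n].
    person_ids = {}
    for start, end, entity_type in sorted(spans):
        if entity_type == "PERSON":
            person_ids.setdefault(text[start:end], len(person_ids) + 1)
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, entity_type in sorted(spans, reverse=True):
        if entity_type == "PERSON":
            label = f"[PERSON_{person_ids[text[start:end]]}]"
        else:
            label = PLACEHOLDERS[entity_type]
        text = text[:start] + label + text[end:]
    return text


def span_of(text, value, entity_type):
    """Hypothetical helper: locate the first occurrence of value as a span."""
    start = text.index(value)
    return (start, start + len(value), entity_type)


# Made-up sample text, for illustration only.
sample = "Jane Doe (jane@example.com), SSN 123-45-6789"
spans = [
    span_of(sample, "Jane Doe", "PERSON"),
    span_of(sample, "jane@example.com", "EMAIL_ADDRESS"),
    span_of(sample, "123-45-6789", "US_SSN"),
]
masked = apply_spans(sample, spans)
```

Replacing from the end of the text backwards is a common design choice here: each placeholder changes the text length, and working right to left means the offsets of spans not yet processed never shift.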

Compliance achieved


Limitations & cautions

Anonymizing before machine ranking can shift how a model reads context. Consistent labels preserve most signals, but test recall on a control set before relying on the results. Identifying details in free text that survive redaction still need a human pass on responsive items.
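
The recall check suggested above could look something like the following sketch. It assumes a control set where a reviewer has already marked the responsive documents, and two rankings of that set: one from a model run on raw text and one on anonymized text. The function name `recall_at_k`, the document IDs, and the rankings are made up for illustration and are not measured results.

```python
def recall_at_k(ranked_ids, responsive_ids, k):
    """Share of known-responsive documents found in the top k of a ranking."""
    if not responsive_ids:
        raise ValueError("control set has no responsive documents")
    found = set(ranked_ids[:k]) & set(responsive_ids)
    return len(found) / len(responsive_ids)


# Hypothetical control set: IDs a reviewer has marked responsive.
responsive = {"d1", "d4", "d7", "d9"}

# Hypothetical rankings of the same control set, best match first.
raw_ranking = ["d1", "d4", "d2", "d7", "d9", "d3"]
anon_ranking = ["d1", "d2", "d4", "d3", "d7", "d9"]

raw_recall = recall_at_k(raw_ranking, responsive, k=4)
anon_recall = recall_at_k(anon_ranking, responsive, k=4)
print(f"recall@4 raw={raw_recall:.2f} anonymized={anon_recall:.2f}")
```

A noticeable gap between the two numbers at the review cutoff you plan to use would suggest that anonymization removed signals the model depends on, which is worth investigating before relying on the ranking.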

Frequently asked questions

What is TAR?

Technology-assisted review (TAR) uses machine learning to score documents by likely relevance, so lawyers can focus on the most responsive items first.

Does anonymizing first hurt accuracy?

It can shift context, but consistent labels preserve most text signals. Test recall on a control set before relying on the model.

Why clear PII before the model trains?

Raw PII in training data ends up spread across the model's index. Clearing it first limits exposure and helps keep the review proportional under FRCP 26(b)(1).