TAR Review Set Anonymisation with anonym.plus

Clear PII before a machine-review model trains, all on your own device.

TAR-set anonymisation is the removal of personal data from documents fed to technology-assisted review (TAR), kept proportionate under CPR Part 31. anonym.plus runs locally, so PII is removed on your own device before it can enter the training data.

When this applies

This applies when a TAR model ranks documents by relevance to speed up triage. Raw PII fed to the model spreads across the index, so clearing it first limits exposure.

How anonym.plus handles it

  1. Point anonym.plus at the document set on your device.
  2. Local OCR reads any scanned items in the set.
  3. The tool flags names, contacts, and IDs across files.
  4. Use consistent labels (the same value always gets the same label) so relevance signals survive.
  5. Replace or mask each confirmed value.
  6. Save the clean set for the ranking workflow.
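
The consistent-label step above can be sketched in a few lines of Python. This is a minimal illustration, not anonym.plus code: the `LabelMap` class, the `mask_emails` function, the email regex, and the numbered `[EMAIL_1]` label style are all assumptions made for the example, and a real detector would cover far more cases.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately simple email pattern (assumption for this sketch).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LabelMap:
    """Gives each distinct value one stable, numbered label per entity type."""

    def __init__(self):
        self._labels = {}                  # (entity type, normalised value) -> label
        self._counters = defaultdict(int)  # entity type -> last number issued

    def label_for(self, entity_type, value):
        # Normalise case and whitespace so variants of one value share a label.
        key = (entity_type, value.strip().lower())
        if key not in self._labels:
            self._counters[entity_type] += 1
            self._labels[key] = f"[{entity_type}_{self._counters[entity_type]}]"
        return self._labels[key]


def mask_emails(text, labels):
    """Replace every email address in text with its stable label."""
    return EMAIL_RE.sub(lambda m: labels.label_for("EMAIL", m.group(0)), text)


# One shared LabelMap across the whole set keeps labels consistent between files.
labels = LabelMap()
docs = [
    "From: jane.doe@example.com to bob@example.org",
    "Reply from Jane.Doe@example.com",
]
clean = [mask_emails(d, labels) for d in docs]
# clean[0] == "From: [EMAIL_1] to [EMAIL_2]"
# clean[1] == "Reply from [EMAIL_1]"
```

Because the same sender keeps the same label in every document, a ranking model can still learn that two documents share a correspondent without seeing the address itself.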

What you need to provide

PII entity types detected

Category      anonym.plus entity type   Example
Names         PERSON                    custodian name → [PERSON_n]
Contact       EMAIL_ADDRESS             sender email → [EMAIL]
Dates         DATE_TIME                 doc date → [DATE]
Identifiers   UK_NINO                   NINO → [NINO]
Location      LOCATION                  address → [ADDRESS]
Account       UK_SORT_CODE              sort code → [SORT_CODE]
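
As a rough illustration of how pattern-based entity types such as UK_NINO and UK_SORT_CODE might be matched, the sketch below uses two simplified regular expressions. The patterns, the `PATTERNS` dictionary, and the `mask_identifiers` function are assumptions for this example and do not describe how anonym.plus detects these values. The sort-code pattern in particular would also match a date written as 12-03-21, which is one reason flagged values are confirmed before they are replaced.

```python
import re

# Simplified, hypothetical patterns (assumptions for this sketch).
PATTERNS = {
    # Two prefix letters, six digits (optionally spaced in pairs), suffix A-D.
    "NINO": re.compile(
        r"\b[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z] ?\d{2} ?\d{2} ?\d{2} ?[A-D]\b"
    ),
    # Three hyphen-separated digit pairs, e.g. 12-34-56.
    "SORT_CODE": re.compile(r"\b\d{2}-\d{2}-\d{2}\b"),
}


def mask_identifiers(text):
    """Replace each matched identifier with its bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "NI number AB 12 34 56 C, sort code 12-34-56."
masked = mask_identifiers(sample)
# masked == "NI number [NINO], sort code [SORT_CODE]."
```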

Compliance achieved

Limitations & cautions

Anonymising before machine ranking can change how a model reads context. Consistent labels preserve most signals, but test recall on a control set first. Free-text clues that survive redaction, such as a job title or nickname that identifies someone, still need a human pass on relevant items.
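
The control-set check can be as simple as comparing recall at a fixed review cutoff for the raw and the anonymised rankings. The sketch below is one way to do that. The function name `recall_at_k`, the document IDs, and both rankings are made up for illustration and are not measured results.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the known-relevant documents that appear in the top k."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("control set has no relevant documents")
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


# Hypothetical control set: documents a reviewer has already coded as relevant.
relevant = {"d1", "d4", "d7", "d9"}

# Hypothetical model rankings of the same six documents, best first.
ranked_raw = ["d1", "d4", "d7", "d2", "d9", "d3"]   # trained on raw text
ranked_anon = ["d1", "d7", "d2", "d3", "d4", "d9"]  # trained on anonymised text

raw_recall = recall_at_k(ranked_raw, relevant, k=4)    # 3 of 4 found: 0.75
anon_recall = recall_at_k(ranked_anon, relevant, k=4)  # 2 of 4 found: 0.5
recall_drop = raw_recall - anon_recall                 # 0.25
```

A drop of this size on a real control set would be a signal to revisit the labelling choices before relying on the anonymised model.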

Frequently asked questions

What is TAR?

Technology-assisted review (TAR) uses machine learning to score documents by likely relevance, so lawyers can focus on the most relevant items first.

Does anonymising first hurt accuracy?

It can change the context the model sees, but consistent labels preserve most text signals. Test recall on a control set before relying on the model.

Why clear PII before the model trains?

Raw PII in training data spreads across the index. Clearing it first limits exposure and helps keep the exercise proportionate under CPR Part 31.