TAR Review Set Anonymisation

TAR-set anonymisation is the removal of personal data from documents before they are fed to technology-assisted review (TAR), which helps keep disclosure proportionate under CPR Part 31. anonym.plus runs locally, so documents are cleaned on your device and PII does not enter the training data.

When this applies

This applies when a TAR model ranks documents by relevance to speed up triage. Raw PII fed into the model spreads across its index, so removing it first limits exposure.

How anonym.plus handles it

Point anonym.plus at the document set on your device.
Local OCR reads any scanned items in the set.
The tool flags names, contacts, and IDs across files.
Choose steady labels, where the same value always gets the same placeholder, so relevance signals survive.
Replace or mask each confirmed value.
Save the clean set for the ranking workflow.
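
The steady-label step above can be illustrated with a minimal sketch. This is not anonym.plus code or its API: the `SteadyLabeller` class, the `mask_emails` helper, the simplified email pattern, and the numbered `[EMAIL_n]` label format are all assumptions made for illustration. The point is only that one shared mapping gives the same value the same placeholder in every document.

```python
import re
from collections import defaultdict

# Simplified email pattern, for illustration only (assumption).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class SteadyLabeller:
    """Hypothetical helper: maps each distinct value to one stable label."""

    def __init__(self):
        self._labels = {}                  # (kind, normalised value) -> label
        self._counters = defaultdict(int)  # kind -> last index handed out

    def label_for(self, kind, value):
        key = (kind, value.strip().lower())
        if key not in self._labels:
            self._counters[kind] += 1
            self._labels[key] = f"[{kind}_{self._counters[kind]}]"
        return self._labels[key]


def mask_emails(text, labeller):
    """Replace every email address with its steady label."""
    return EMAIL_RE.sub(lambda m: labeller.label_for("EMAIL", m.group(0)), text)


# One labeller is shared across the whole set, so labels stay consistent.
labeller = SteadyLabeller()
doc_a = mask_emails("From: jane@example.com To: raj@example.com", labeller)
doc_b = mask_emails("Cc: Jane@Example.com", labeller)
print(doc_a)  # From: [EMAIL_1] To: [EMAIL_2]
print(doc_b)  # Cc: [EMAIL_1]
```

Because the same sender resolves to `[EMAIL_1]` in both documents, a ranking model can still see that the two items share a correspondent, even though the address itself is gone.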

What you need to provide

The document set (mixed PDF, DOCX, email, or scan).
Your replacement choice: steady labels preserve text signals.
Optional batch run of up to 20 files at a time.
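
If you prepare a larger set outside the tool, splitting it into groups of at most 20 files is straightforward. The sketch below is a generic chunking helper, not part of anonym.plus; the `batches` function and the file names are hypothetical.

```python
from itertools import islice


def batches(paths, size=20):
    """Yield successive lists of at most `size` paths."""
    it = iter(paths)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical file names, for illustration only.
files = [f"doc_{i:03}.pdf" for i in range(45)]
sizes = [len(b) for b in batches(files)]
print(sizes)  # [20, 20, 5]
```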

PII entity types detected

Category	anonym.plus entity type	Example
Names	PERSON	custodian name → [PERSON_n]
Contact	EMAIL_ADDRESS	sender email → [EMAIL]
Dates	DATE_TIME	doc date → [DATE]
Identifiers	UK_NINO	NINO → [NINO]
Location	LOCATION	address → [ADDRESS]
Account	UK_SORT_CODE	sort code → [SORT_CODE]
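
To make the identifier rows concrete, here is a rough sketch of pattern-based masking for the `[NINO]` and `[SORT_CODE]` labels in the table. The regular expressions are deliberately simplified assumptions (for example, the NINO pattern does not exclude invalid prefix letters) and do not describe how anonym.plus detects these entities; `PATTERNS` and `mask_identifiers` are hypothetical names.

```python
import re

# Simplified, illustrative patterns (assumptions, not the tool's detectors).
PATTERNS = {
    # Two letters, six digits in optional space-separated pairs, suffix A-D.
    "NINO": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"),
    # Three hyphen-separated digit pairs, e.g. 12-34-56.
    "SORT_CODE": re.compile(r"\b\d{2}-\d{2}-\d{2}\b"),
}


def mask_identifiers(text):
    """Replace each pattern match with the label used in the table."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "NI number QQ 12 34 56 C, sort code 12-34-56."
masked = mask_identifiers(sample)
print(masked)  # NI number [NINO], sort code [SORT_CODE].
```

Patterns like these over-match and under-match in real documents, which is one reason each flagged value is confirmed before it is replaced.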

Compliance achieved

Keeps disclosure proportionate under CPR Part 31.
Steady labels keep relevance signals for the model.
Offline work keeps the training data inside your firm.

Limitations & cautions

Anonymising before machine ranking can shift how a model reads context. Steady labels keep most signals, but test recall on a control set first. Free-text clues that survive redaction still need a human pass on relevant items.
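
One way to run the control-set check is to compare the model's ranking of the anonymised control set against human relevance coding. The sketch below assumes you already have a ranked list of document IDs and a set of IDs coded relevant by reviewers; `recall_at_k` and the sample IDs are hypothetical, and the cutoffs are arbitrary.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Share of known-relevant documents that appear in the top k of a ranking."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("control set has no relevant documents")
    found = set(ranked_ids[:k]) & relevant
    return len(found) / len(relevant)


# Hypothetical control set: reviewer-coded relevant IDs and a model ranking
# produced on the anonymised versions of the same documents.
relevant = {"d1", "d4", "d7", "d9"}
ranked_anon = ["d1", "d2", "d4", "d3", "d7", "d9"]

r3 = recall_at_k(ranked_anon, relevant, k=3)
r5 = recall_at_k(ranked_anon, relevant, k=5)
print(r3, r5)  # 0.5 0.75
```

If recall at your review cutoff is lower than you can accept, revisit the label choices before relying on the ranking.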

Frequently asked questions

What is TAR?

Technology-assisted review (TAR) uses machine learning to score documents by likely relevance, so lawyers can focus on the most relevant items first.

Does anonymising first hurt accuracy?

It can shift context, but steady labels keep most text signals. Test recall on a control set before relying on the model.

Why clear PII before the model trains?

Raw PII in training data spreads across the index. Clearing it first limits exposure and helps keep disclosure proportionate under CPR Part 31.
