Use Case: AI/ML Training Data

Anonymize training datasets for EU AI Act Art. 10 and GDPR compliance — entirely offline.

The Challenge

An enterprise AI team is fine-tuning a customer service LLM using 18 months of support ticket data. The dataset contains 240,000 JSON records with customer names, email addresses, account numbers, product serial numbers, and free-text descriptions that include PII. The EU AI Act (Art. 10, effective August 2026) requires data governance practices ensuring training data is free of unnecessary personal data for high-risk AI applications. Uploading the dataset to a cloud anonymization service would itself create a GDPR violation — the data must stay within the company's EU data center.

The Solution

The ML engineering team installs anonym.plus on a workstation within the EU data center. They split the 240,000-record dataset into 120 JSONL files of 2,000 records each (about 25 MB per file on average). Using Batch mode with 5 parallel workers, they process all 120 files in approximately 90 minutes. A custom preset covers six built-in entity types (PERSON, EMAIL_ADDRESS, PHONE_NUMBER, IBAN_CODE, IP_ADDRESS, CREDIT_CARD) plus a custom entity for product serial numbers (regex: SN-[A-Z0-9]{10}). The Replace operator ensures irreversible anonymization. The processing history is exported as CSV for the Art. 11 technical documentation.
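Before adding the custom serial-number pattern to a preset, it can help to check what it matches on a few sample strings. The following is a minimal sketch using Python's standard `re` module, not anonym.plus itself. The `\b` word boundaries, the `mask_serials` helper, and the `<SERIAL_NUMBER>` label are illustrative assumptions that do not come from the product.

```python
import re

# Pattern quoted in the text above. The \b anchors are an added assumption
# so the pattern does not match inside a longer alphanumeric token.
SERIAL_RE = re.compile(r"\bSN-[A-Z0-9]{10}\b")


def mask_serials(text: str, label: str = "<SERIAL_NUMBER>") -> str:
    """Replace every serial-number match with a fixed label (hypothetical label name)."""
    return SERIAL_RE.sub(label, text)


sample = "Unit SN-A1B2C3D4E5 stopped charging; the old unit was SN-12345."
# Only the first token has exactly ten characters after "SN-", so only it is masked.
print(mask_serials(sample))
```

Running the sketch on a handful of real ticket excerpts (inside the data center) is a quick way to see whether the pattern is too strict or too loose before a full batch run.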

The Results

  • 240,000 records anonymized — 6 PII categories + 1 custom entity processed in 90 minutes
  • Anonymized dataset exits GDPR scope — no lawful basis required for training, no data subject rights apply
  • EU AI Act Art. 10 data governance requirement met — documented in technical file
  • Training data never left the EU data center — full data residency maintained
  • No DPA required with the training infrastructure provider — anonymized data only
  • Processing history CSV provides audit trail for Art. 11 technical documentation

Training Data Formats Supported

For datasets larger than the per-file limits, split the data into chunks and process them with Batch mode. The Pro plan processes up to 20 files simultaneously.
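Splitting a large JSONL file into fixed-size chunks can be done with a short script before loading the files into Batch mode. The sketch below uses only the Python standard library; the function name `split_jsonl`, the output naming scheme, and the default of 2,000 records per file (taken from the example above) are assumptions for illustration.

```python
import itertools
from pathlib import Path


def split_jsonl(src: Path, out_dir: Path, records_per_file: int = 2000) -> list[Path]:
    """Write consecutive chunks of `src` into numbered JSONL files and return their paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with src.open("r", encoding="utf-8") as fh:
        # One record per line; blank lines are skipped.
        lines = (line for line in fh if line.strip())
        for index in itertools.count(1):
            chunk = list(itertools.islice(lines, records_per_file))
            if not chunk:
                break
            target = out_dir / f"{src.stem}_{index:04d}.jsonl"
            with target.open("w", encoding="utf-8") as out:
                # Ensure every record ends with a newline, including the last one.
                out.writelines(l if l.endswith("\n") else l + "\n" for l in chunk)
            written.append(target)
    return written


# Example call (paths are hypothetical):
# split_jsonl(Path("tickets.jsonl"), Path("chunks"), records_per_file=2000)
```

Splitting by record count keeps each line intact, so no JSON object is cut in half. If records vary widely in size, checking the resulting file sizes against the per-file limit is still worthwhile.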

EU AI Act Art. 10 Documentation

After anonymizing training data, document the following in the AI system's technical file (Art. 11):

Read the full EU AI Act guide: EU AI Act Art. 10 compliance.

Frequently Asked Questions

How do I remove PII from AI training data for GDPR and EU AI Act compliance?

Load the training files (JSON, CSV, TXT, XLSX) into anonym.plus. Select the GDPR Compliance preset or configure the entity types yourself. Choose the Replace operator for permanent anonymization. Process large datasets in Batch mode. The anonymized output exits GDPR scope and meets the EU AI Act Art. 10 data governance requirements.

Does anonym.plus process JSONL format training datasets?

Yes. JSON and JSONL files (up to 30 MB per file) are supported. anonym.plus parses the text fields and replaces detected PII with labels. The structure is preserved, so the JSONL file remains valid for training pipelines after anonymization.
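A training pipeline can confirm that structure was preserved with an independent check after anonymization. The following is a minimal sketch, assuming one JSON object per line and that the anonymized file keeps the same record order; `load_records` and `check_structure` are hypothetical helper names, not part of anonym.plus.

```python
import json
from pathlib import Path


def load_records(path: Path) -> list:
    """Parse a JSONL file; json.loads raises an error on any invalid line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def check_structure(original: Path, anonymized: Path) -> int:
    """Return the record count if both files have matching record counts and keys."""
    before = load_records(original)
    after = load_records(anonymized)
    if len(before) != len(after):
        raise ValueError(f"record count changed: {len(before)} -> {len(after)}")
    for n, (old, new) in enumerate(zip(before, after), start=1):
        # Only top-level keys are compared; values are expected to differ where PII was replaced.
        if isinstance(old, dict) and isinstance(new, dict) and old.keys() != new.keys():
            raise ValueError(f"record {n}: top-level keys changed")
    return len(after)


# Example call (paths are hypothetical):
# check_structure(Path("chunks/tickets_0001.jsonl"), Path("anonymized/tickets_0001.jsonl"))
```

The check loads both files into memory, which is reasonable for files of a few tens of megabytes. It verifies structure only and says nothing about whether all PII was detected.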