When does the EU AI Act apply to training data?

The EU AI Act (Regulation 2024/1689) entered into force on August 1, 2024. High-risk AI system obligations, including Article 10 data governance requirements, apply from August 2, 2026. Organizations training, fine-tuning, or deploying high-risk AI systems in the EU must have compliant data governance practices in place before this date.

Do I need to anonymize training data for EU AI Act compliance?

Anonymization is the most practical path to EU AI Act Art. 10 compliance for training data containing personal information. Properly anonymized data (GDPR Recital 26) is not personal data, so it can be used for AI training without a GDPR lawful basis, data subject rights exposure, or cross-border transfer restrictions. anonym.plus anonymizes training datasets (CSV, JSON, TXT, XLSX) entirely offline, so your training data never leaves your infrastructure.

EU AI Act Art. 10: Training Data Requirements

Q: What does EU AI Act Article 10 require for training data?

EU AI Act Article 10 requires providers of high-risk AI systems to implement data governance practices covering: the design choices for training, validation, and testing data; data collection processes; relevant preprocessing operations; the formulation of assumptions about the data; an assessment of the availability, quantity, and suitability of the data; an examination of possible biases; and the identification of relevant gaps or shortcomings. For training data containing personal data, Article 10(5) exceptionally permits processing of special categories of personal data where strictly necessary for bias detection and correction, under strict conditions.

By George Curta · Published March 17, 2026 · 8 min read · EU AI Act Compliance

Deadline: August 2, 2026. High-risk AI system obligations under the EU AI Act (Regulation 2024/1689) apply from this date. Organizations using personal data in AI training datasets must have compliant data governance practices in place by then.

The EU AI Act imposes data governance obligations on providers of high-risk AI systems under Article 10. For any training dataset that contains personal data, the fastest path to compliance is anonymization — removing PII before it ever enters the training pipeline. anonym.plus processes training datasets entirely offline, keeping your data inside your infrastructure.

Who Is Affected by EU AI Act Art. 10

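Article 10 applies to providers of high-risk AI systems: organizations that develop an AI system, or have one developed, and place it on the market or put it into service under their own name. Organizations that only deploy such systems have separate obligations under the Act. The high-risk use cases listed in Annex III of the EU AI Act include: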

AI systems for biometric identification and categorization
AI used in critical infrastructure (transport, energy, water)
Educational and vocational training AI
AI in employment decisions (hiring, HR management, worker monitoring)
Essential private and public services (credit scoring, insurance risk assessment)
Law enforcement AI
Migration, asylum, and border control AI
AI in administration of justice

Organizations that fine-tune foundation models (GPT-4, Claude, Llama) on their proprietary datasets for these purposes can also fall within scope as providers.

What Article 10 Requires for Training Data

Article 10 mandates that training, validation, and testing data must:

Be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of the intended purpose
Have appropriate statistical properties for the AI's use case, including for the persons or groups on whom the system is intended to be used
Be examined for possible biases that could lead to prohibited discrimination
Be subject to documented data governance practices covering origin, collection methods, preprocessing, and known limitations
Include special categories of personal data only where Art. 10(5) applies: processing that is strictly necessary for bias detection and correction of high-risk AI, under strict safeguards

Article 10 does not itself prohibit personal data in training datasets, but GDPR continues to apply to any personal data they contain. Organizations must therefore demonstrate a specific lawful basis and apply strict technical safeguards, or remove the personal data before training.
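The bias examination that Article 10 calls for can start with simple descriptive statistics. The sketch below is one possible starting point, not a method prescribed by the Act: it reports each group's share of the dataset and its positive-label rate. The column names `age_band` and `hired` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def group_report(rows, group_key, label_key):
    """Return each group's share of the dataset and its positive-label rate."""
    counts = Counter(row[group_key] for row in rows)
    positives = defaultdict(int)
    for row in rows:
        if row[label_key] == 1:
            positives[row[group_key]] += 1
    total = len(rows)
    return {
        group: {"share": n / total, "positive_rate": positives[group] / n}
        for group, n in counts.items()
    }


# Hypothetical sample data for illustration only.
rows = [
    {"age_band": "18-30", "hired": 1},
    {"age_band": "18-30", "hired": 1},
    {"age_band": "18-30", "hired": 0},
    {"age_band": "51-65", "hired": 0},
]
report = group_report(rows, "age_band", "hired")
for group, stats in report.items():
    print(group, stats)
```

Large gaps in share or positive rate between groups do not prove discrimination, but they indicate where the dataset deserves a closer look and a note in the governance documentation.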

Anonymization as the Compliance Path

Removing personal data from training datasets before the AI training pipeline begins is the most straightforward route to Art. 10 compliance:

Anonymized training data is not personal data (GDPR Recital 26), provided individuals cannot be re-identified by means reasonably likely to be used. No GDPR lawful basis is required for training, no data subject rights apply to the dataset, and no data processing agreement is needed for processors handling it.
The Art. 10(5) safeguards are not triggered, because the training data does not contain personal data.
Data governance documentation is simplified: you document that PII was removed, what entity types were detected, and what tool was used.
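Whether a dataset counts as anonymized depends on whether individuals can still be singled out, including through combinations of quasi-identifiers. As a rough check after PII removal, the sketch below counts how often each combination of quasi-identifier values occurs and flags combinations shared by fewer than `k_min` rows (a k-anonymity style test). The function name, column names, threshold, and sample rows are assumptions for illustration; this is not a legal test of anonymity.

```python
from collections import Counter


def rare_combinations(rows, quasi_identifiers, k_min=5):
    """Return quasi-identifier combinations that occur in fewer than k_min rows."""
    combos = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in combos.items() if n < k_min}


# Hypothetical rows that have already had direct identifiers removed.
rows = [
    {"postal_code": "10115", "birth_year": 1980, "job_title": "nurse"},
    {"postal_code": "10115", "birth_year": 1980, "job_title": "nurse"},
    {"postal_code": "80331", "birth_year": 1955, "job_title": "judge"},
]
rare = rare_combinations(rows, ["postal_code", "birth_year", "job_title"], k_min=2)
print(rare)
```

Rows that fall into a flagged combination are candidates for generalization (for example, a birth decade instead of a birth year) or removal before training.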

Training Data Formats Supported by anonym.plus

Format	Typical Use in AI Training	Max Size
CSV	Tabular datasets, labeled examples	30 MB
JSON / JSONL	Instruction tuning datasets, chat logs, annotations	30 MB
TXT	Pretraining corpora, raw text documents	50 MB
XLSX	Structured training labels, human-annotated data	20 MB / 100K rows
PDF	Document corpora, legal/medical training text	50 MB
DOCX	Annotated text documents, knowledge bases	30 MB

For large datasets above these limits, process files in batches using anonym.plus batch mode (Pro plan). All processing is 100% offline — training data never leaves your infrastructure.
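Splitting a large file so that each part stays under a size limit can be sketched as follows for JSONL, where every line is an independent record. This is a minimal sketch, not part of anonym.plus: the function name `split_jsonl`, the part naming scheme, and the reading of "30 MB" as 30,000,000 bytes are assumptions.

```python
from pathlib import Path


def split_jsonl(src, out_dir, max_bytes=30_000_000):
    """Split a JSONL file into parts of at most max_bytes, cutting only at line ends.

    A single line longer than max_bytes is written to its own oversized part.
    """
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts, buffer, size = [], [], 0

    def flush():
        nonlocal buffer, size
        if buffer:
            part = out_dir / f"{src.stem}.part{len(parts) + 1:04d}{src.suffix}"
            part.write_bytes(b"".join(buffer))
            parts.append(part)
            buffer, size = [], 0

    with src.open("rb") as fh:
        for line in fh:
            # Start a new part if this line would push the current one over the limit.
            if buffer and size + len(line) > max_bytes:
                flush()
            buffer.append(line)
            size += len(line)
    flush()
    return parts
```

Calling `split_jsonl("chat_logs.jsonl", "parts/")` (hypothetical paths) returns the list of part files in order, which can then be processed as a batch. CSV files need extra care because the header row must be repeated in every part and quoted fields may span lines.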

Which PII to Remove from Training Data

For EU AI Act compliance, prioritize removing:

Direct identifiers: names, email addresses, phone numbers, national IDs, passport numbers
Quasi-identifiers: dates of birth, job titles, postal codes, rare combinations of demographic attributes
Special categories (Art. 9 GDPR): health data, racial/ethnic origin indicators, religious beliefs, political opinions, union membership, sexual orientation
Financial data: IBANs, credit card numbers, account numbers
Location data: precise GPS coordinates, home addresses, frequently visited places

anonym.plus detects all of these through 340+ built-in entity types. The GDPR Compliance preset (confidence 0.90) is the recommended starting point for training data preparation.
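To make the idea of replacing detected entities concrete, here is a deliberately simple pattern-based sketch for two direct identifiers. It does not reflect how anonym.plus detects entities, the regular expressions are simplified assumptions that will miss or over-match some cases, and names, addresses, and special-category data generally need more than regular expressions.

```python
import re

# Simplified, illustrative patterns; not exhaustive and not validated against checksums.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"),
}


def replace_identifiers(text):
    """Replace each match with a <LABEL> placeholder and count replacements per label."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"<{label}>", text)
        counts[label] = n
    return text, counts


cleaned, counts = replace_identifiers(
    "Contact anna@example.com, IBAN DE89 3704 0044 0532 0130 00."
)
print(cleaned)
print(counts)
```

The per-label counts are the kind of figure worth keeping for the compliance record, since they show which entity types were found and replaced.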

Documenting Compliance for Art. 10

After anonymizing your training datasets, document the following in your AI system's technical documentation (required under Art. 11):

Data sources and collection methods
PII removal method: anonym.plus v[x.x], Replace operator, GDPR Compliance preset, confidence threshold 0.90
Entity types detected and replaced
Date of processing and dataset version
Any residual risks identified and mitigations applied

anonym.plus creates a processing history entry for each file, including entity counts, operator used, and timestamp — supporting this documentation requirement.
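The documentation items above can be captured as one machine-readable record per dataset. The sketch below is one possible layout, not a format defined by the EU AI Act or exported by anonym.plus: the class name, field names, and sample values are hypothetical, and the SHA-256 hash is an added assumption for pinning the exact dataset version.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnonymizationRecord:
    dataset_name: str
    dataset_version: str
    dataset_sha256: str          # assumption: hash of the anonymized file
    data_sources: list
    removal_tool: str
    operator: str
    preset: str
    confidence_threshold: float
    entity_types_replaced: dict  # entity type -> number of replacements
    processed_on: str            # ISO date of processing
    residual_risks: list = field(default_factory=list)


# Hypothetical sample content standing in for an anonymized dataset file.
sample_bytes = b"id,comment\n1,<PERSON> called about the invoice\n"

record = AnonymizationRecord(
    dataset_name="support_tickets",
    dataset_version="2026-03-01",
    dataset_sha256=hashlib.sha256(sample_bytes).hexdigest(),
    data_sources=["internal helpdesk export"],
    removal_tool="anonym.plus v[x.x]",
    operator="Replace",
    preset="GDPR Compliance",
    confidence_threshold=0.90,
    entity_types_replaced={"PERSON": 1},
    processed_on="2026-03-02",
    residual_risks=["free-text comments may contain indirect identifiers"],
)
record_json = json.dumps(asdict(record), indent=2, sort_keys=True)
print(record_json)
```

Storing one such record alongside each dataset version keeps the technical documentation traceable to the exact files used for training.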


Frequently Asked Questions

What does EU AI Act Article 10 require for training data?

Art. 10 requires high-risk AI training data to be relevant, sufficiently representative, and properly governed. Organizations must document data origin, preprocessing steps, and any biases identified. Anonymization is a practical compliance mechanism for training data containing personal information, because properly anonymized data falls outside GDPR.

When does the EU AI Act training data requirement take effect?

August 2, 2026. The EU AI Act entered into force August 1, 2024; high-risk AI system obligations apply 24 months later. Organizations should begin data governance and anonymization preparation well before this deadline.

Does anonym.plus support large training datasets for EU AI Act compliance?

Yes. Use Batch mode (Pro plan) to process up to 20 files in parallel. Supported formats include CSV, JSON, TXT, XLSX, PDF, and DOCX. All processing is 100% offline — training data never leaves your servers. For very large datasets, process in batches by splitting files.

EU AI Act Art. 10 Explained: Training Data Requirements for High-Risk AI