Glossary
Key terms for PII detection, document anonymization, and encryption.
This glossary defines the technical terms used throughout the anonym.plus documentation, security architecture, and user interface. The terms cover PII detection, NLP, cryptography, and data privacy regulations.
A
AES-256-GCM
Advanced Encryption Standard with 256-bit keys in Galois/Counter Mode. An authenticated encryption algorithm that provides both confidentiality and integrity. Used by anonym.plus for vault encryption and reversible document anonymization.
Anonymization
The process of removing, replacing, or obscuring personally identifiable information (PII) in documents so that individuals cannot be re-identified. anonym.plus offers five methods: replace, redact, mask, hash, and encrypt.
Argon2id
A memory-hard password hashing and key derivation function. Combines Argon2i (side-channel resistant) and Argon2d (GPU-resistant). anonym.plus uses Argon2id with 64 MB memory cost and 3 iterations to derive vault encryption keys from user passwords.
B
Batch Processing
Processing multiple files simultaneously through the anonymization pipeline. anonym.plus processes 1 to 5 files in parallel, with configurable error handling and an auto-approve mode. Requires a Pro license.
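As a rough illustration of bounded parallelism with per-file error handling, the sketch below runs a worker over a list of files with at most five in flight. The names process_batch, fake_anonymize, and stop_on_error are hypothetical and do not reflect the actual anonym.plus pipeline or its settings.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch(paths, worker, parallel=3, stop_on_error=False):
    """Run worker over paths with 1 to 5 files in flight; collect results and errors."""
    if not 1 <= parallel <= 5:
        raise ValueError("parallel must be between 1 and 5")
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {path: pool.submit(worker, path) for path in paths}
        # Collect in submission order so the output is deterministic.
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = str(exc)
                if stop_on_error:
                    # Cancel anything not yet started, then stop collecting.
                    for pending in futures.values():
                        pending.cancel()
                    break
    return results, errors


def fake_anonymize(path: str) -> str:
    # Stand-in for the real pipeline: reject one file type to trigger an error.
    if path.endswith(".exe"):
        raise ValueError("unsupported file type")
    return path + ".anonymized"


done, failed = process_batch(["a.txt", "b.exe", "c.pdf"], fake_anonymize, parallel=2)
print(done)
print(failed)
```

With stop_on_error left at False, one failing file does not prevent the others from completing, which is one plausible reading of "configurable error handling".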
BIP39 (Bitcoin Improvement Proposal 39)
A standard for generating mnemonic recovery phrases from random entropy. anonym.plus generates a 24-word BIP39 phrase (256 bits of entropy) during vault setup as the only recovery mechanism if the user forgets their PIN.
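The 24-word length follows from the BIP39 arithmetic: 256 bits of entropy plus an 8-bit checksum (the leading bits of the SHA-256 hash of the entropy) give 264 bits, which split into 24 groups of 11 bits, each indexing a 2048-word list. The sketch below computes only the word indices; the word list itself is omitted, and the function name is hypothetical.

```python
import hashlib
import secrets


def bip39_word_indices(entropy: bytes) -> list[int]:
    """Return the 11-bit word-list indices that BIP39 derives from the entropy."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("BIP39 entropy must be 128 to 256 bits in 32-bit steps")
    checksum_bits = ent_bits // 32  # 8 bits for 256-bit entropy
    digest = hashlib.sha256(entropy).digest()
    # Append the leading checksum bits of SHA-256(entropy) to the entropy.
    value = (int.from_bytes(entropy, "big") << checksum_bits) | (
        int.from_bytes(digest, "big") >> (256 - checksum_bits)
    )
    # Split the combined bit string into 11-bit groups, most significant first.
    count = (ent_bits + checksum_bits) // 11
    return [(value >> (11 * (count - 1 - i))) & 0x7FF for i in range(count)]


indices = bip39_word_indices(secrets.token_bytes(32))
print(len(indices))  # 24
```

Each index would then be looked up in the standard BIP39 word list for the chosen language to produce the phrase.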
C
Confidence Threshold
A score (0.50 to 1.00) that controls how certain the detection engine must be before reporting a PII entity. Lower thresholds catch more entities but increase false positives. Financial presets use 0.95; development presets use 0.70.
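A minimal sketch of how a threshold filters detections, using the two preset values mentioned above. The Detection class and apply_threshold function are hypothetical names, and the scores are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    entity_type: str
    text: str
    score: float  # detector confidence, 0.0 to 1.0


def apply_threshold(detections: list[Detection], threshold: float) -> list[Detection]:
    """Keep only detections whose score meets the threshold."""
    if not 0.50 <= threshold <= 1.00:
        raise ValueError("threshold must be between 0.50 and 1.00")
    return [d for d in detections if d.score >= threshold]


found = [
    Detection("EMAIL_ADDRESS", "john@mail.com", 0.99),
    Detection("PERSON", "Jordan", 0.72),
]
strict = apply_threshold(found, 0.95)   # financial-style threshold
lenient = apply_threshold(found, 0.70)  # development-style threshold
print([d.entity_type for d in strict])   # ['EMAIL_ADDRESS']
print([d.entity_type for d in lenient])  # ['EMAIL_ADDRESS', 'PERSON']
```

The ambiguous name "Jordan" is reported only at the lower threshold, which is the trade-off between recall and false positives described above.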
Custom Entity
A user-defined PII type using regex patterns. anonym.plus supports up to 50 custom entities with up to 10 patterns each, context words, and ReDoS-safe validation. Detected alongside the 340+ built-in types.
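To illustrate how regex patterns and context words can work together, the sketch below scores a bare match lower than a match with a context word nearby. The CustomEntity class, the scores, and the 30-character window are assumptions for illustration; it does not implement ReDoS-safe validation and is not the anonym.plus matching logic.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CustomEntity:
    name: str
    patterns: list[str]
    context_words: list[str] = field(default_factory=list)
    base_score: float = 0.6     # assumed score for a bare regex match
    context_boost: float = 0.3  # assumed boost when a context word is nearby

    def find(self, text: str, window: int = 30) -> list[tuple[str, float]]:
        """Return (matched text, score) pairs for every pattern match."""
        results = []
        for pattern in self.patterns:
            for m in re.finditer(pattern, text):
                # Look for context words in a window around the match.
                nearby = text[max(0, m.start() - window): m.end() + window].lower()
                boosted = any(w.lower() in nearby for w in self.context_words)
                score = self.base_score + (self.context_boost if boosted else 0.0)
                results.append((m.group(), round(min(score, 1.0), 2)))
        return results


employee_id = CustomEntity(
    name="EMPLOYEE_ID",
    patterns=[r"\bEMP-\d{6}\b"],
    context_words=["employee", "staff"],
)
with_context = employee_id.find("Employee badge EMP-004217 was reissued.")
without_context = employee_id.find("Reference EMP-004217.")
print(with_context)     # [('EMP-004217', 0.9)]
print(without_context)  # [('EMP-004217', 0.6)]
```

Combined with a confidence threshold, the context boost decides whether an otherwise ambiguous match is reported.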
D
Deanonymization
The reverse process of restoring original PII values in an anonymized document. Only possible when the encrypt operator was used (AES-256-GCM). Replace, redact, mask, and hash are irreversible by design. anonym.plus supports auto-matching against processing history.
Detection Preset
A saved configuration specifying which entity types to detect, the confidence threshold, and optional per-entity operators. anonym.plus includes 121 built-in presets across 7 categories: Auto, Country-specific, Regional, Technical/DevSecOps, Industry, Healthcare, and Financial.
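Conceptually, a preset bundles three things: the entity types to detect, a threshold, and optional per-entity operators. The sketch below models that as a small data class; the class, field names, and preset name are hypothetical and do not describe the anonym.plus file format.

```python
from dataclasses import dataclass, field

OPERATORS = {"replace", "redact", "mask", "hash", "encrypt"}


@dataclass
class DetectionPreset:
    name: str
    entity_types: list[str]
    confidence_threshold: float = 0.70
    operators: dict[str, str] = field(default_factory=dict)  # per-entity overrides
    default_operator: str = "replace"

    def __post_init__(self):
        if not 0.50 <= self.confidence_threshold <= 1.00:
            raise ValueError("confidence_threshold must be between 0.50 and 1.00")
        for entity_type, operator in self.operators.items():
            if entity_type not in self.entity_types:
                raise ValueError(f"{entity_type} is not part of this preset")
            if operator not in OPERATORS:
                raise ValueError(f"unknown operator: {operator}")

    def operator_for(self, entity_type: str) -> str:
        """Return the operator for an entity type, falling back to the default."""
        return self.operators.get(entity_type, self.default_operator)


preset = DetectionPreset(
    name="finance-strict",
    entity_types=["CREDIT_CARD", "US_SSN", "PERSON"],
    confidence_threshold=0.95,
    operators={"CREDIT_CARD": "mask", "US_SSN": "encrypt"},
)
print(preset.operator_for("CREDIT_CARD"))  # mask
print(preset.operator_for("PERSON"))       # replace
```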
E
E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness)
Google's quality evaluation framework for web content. It assesses whether content creators have direct experience with and expertise in their subject matter.
Encrypt Operator
An anonymization method that replaces PII with AES-256-GCM encrypted ciphertext. Unlike replace, redact, mask, or hash, encrypted entities can be decrypted later using the same key — enabling the "encrypt, share, edit, decrypt" workflow.
Entity Type
A category of PII that the detection engine can identify. Examples: PERSON, EMAIL_ADDRESS, US_SSN, CREDIT_CARD, DE_TAX_ID. anonym.plus detects 340+ entity types across 18 groups.
G
GDPR (General Data Protection Regulation)
EU regulation governing the processing of personal data. Requires data minimization, purpose limitation, and protection of personal data. anonym.plus helps organizations comply by detecting and removing PII before sharing documents.
H
Hash Operator
An anonymization method that replaces PII with a one-way cryptographic hash (SHA-256, SHA-512, or MD5). Irreversible: the hash cannot be inverted to recover the original value, although short or guessable values such as phone numbers can still be matched by hashing candidate inputs. Useful when you need consistent pseudonymization (the same input always produces the same hash).
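The consistency property is easy to see with Python's standard hashlib module. This is a minimal sketch with a hypothetical function name; whether anonym.plus salts or truncates the digest is not stated here, so the example uses the plain hex digest.

```python
import hashlib


def hash_operator(value: str, algorithm: str = "sha256") -> str:
    """Replace a PII value with the hex digest of its UTF-8 bytes."""
    if algorithm not in ("sha256", "sha512", "md5"):
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


first = hash_operator("john@mail.com")
second = hash_operator("john@mail.com")
other = hash_operator("jane@mail.com")
print(first == second)  # True: the same input always gives the same hash
print(first == other)   # False: different inputs give different hashes
print(len(first))       # 64 hex characters for SHA-256
```

Because equal inputs map to equal digests, records that refer to the same person can still be joined after anonymization.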
HIPAA (Health Insurance Portability and Accountability Act)
US federal law that protects sensitive patient health information (PHI). Requires covered entities to implement safeguards for electronic health data. anonym.plus includes HIPAA-specific detection presets.
K
Key Derivation
The process of generating cryptographic keys from a password or passphrase. anonym.plus uses Argon2id to derive a 256-bit AES key from the user's vault password, making brute-force attacks computationally expensive.
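The shape of password-based key derivation can be sketched with Python's standard library. Note the substitution: the example uses scrypt, a different memory-hard function, only because it ships with hashlib; it is not Argon2id, and the cost parameters are illustrative rather than the 64 MB and 3 iterations stated for anonym.plus. The function name is hypothetical.

```python
import hashlib
import secrets


def derive_vault_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte (256-bit) key from a password with a memory-hard KDF.

    scrypt stands in for Argon2id here; parameters are illustrative only.
    """
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14,   # CPU/memory cost (about 16 MiB with r=8)
        r=8,       # block size
        p=1,       # parallelism
        dklen=32,  # 256-bit output, the size of an AES-256 key
    )


salt = secrets.token_bytes(16)  # random per-vault salt, stored alongside the vault
key = derive_vault_key("correct horse battery staple", salt)
print(len(key) * 8)  # 256
```

The same password and salt always yield the same key, so only the salt needs to be stored; the cost parameters are what make each brute-force guess expensive.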
Key Rotation
Replacing an encryption key with a new one. In anonym.plus, rotating a key permanently replaces the old key material. Documents encrypted with the old key require the old key value for deanonymization.
L
LLM (Large Language Model)
An AI model trained on large text corpora that can generate and understand human language. Examples: ChatGPT, Claude, Gemini. anonym.plus helps users redact sensitive data before sending text to LLMs.
M
Mask Operator
An anonymization method that partially hides PII by replacing characters with a mask character (default: *). The number of masked characters is configurable. Example with X as the mask character: "4111-1111-1111" becomes "XXXX-XXXX-1111". Irreversible.
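A minimal sketch of masking that keeps separators and the trailing characters visible. The function and its keep_last parameter are hypothetical; the actual configuration options in anonym.plus may be expressed differently.

```python
def mask_operator(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Mask every letter or digit except the last keep_last, keeping separators."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - keep_last, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)  # separators and the trailing characters pass through
    return "".join(out)


print(mask_operator("4111-1111-1111"))                 # ****-****-1111
print(mask_operator("4111-1111-1111", mask_char="X"))  # XXXX-XXXX-1111
```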
MCP (Model Context Protocol)
An open standard (by Anthropic) for connecting AI tools to external data sources and services. anonym.plus includes an MCP server that automatically anonymizes text before it reaches AI tools like Cursor or Claude Desktop, and restores original values in responses.
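The anonymize-then-restore round trip can be sketched with a token mapping held in memory. This is an illustration under assumptions: the function names, the token format, and the simplified email pattern are hypothetical, and the actual MCP server may restore values by other means.

```python
import re


def anonymize(text: str, pattern: str, label: str) -> tuple[str, dict[str, str]]:
    """Swap each match for a numbered token and remember the originals."""
    mapping: dict[str, str] = {}

    def substitute(match: re.Match) -> str:
        original = match.group()
        # Reuse the token if the same value appears more than once.
        for token, value in mapping.items():
            if value == original:
                return token
        token = f"<{label}_{len(mapping) + 1}>"
        mapping[token] = original
        return token

    return re.sub(pattern, substitute, text), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a response that still contains tokens."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


EMAIL = r"[\w.+-]+@[\w-]+\.[\w.]+"  # simplified pattern for the example
prompt, mapping = anonymize("Reply to john@mail.com today.", EMAIL, "EMAIL_ADDRESS")
print(prompt)  # Reply to <EMAIL_ADDRESS_1> today.

response = "Draft sent to <EMAIL_ADDRESS_1>."  # what an AI tool might return
print(restore(response, mapping))  # Draft sent to john@mail.com.
```

The AI tool only ever sees the token; the mapping stays on the local side and is applied to the response on the way back.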
N
NER (Named Entity Recognition)
A natural language processing (NLP) technique that identifies and classifies named entities (people, places, organizations, dates) in text. anonym.plus uses spaCy NER models combined with Presidio's regex-based recognizers for hybrid detection.
NLP (Natural Language Processing)
A field of AI that deals with the interaction between computers and human language. anonym.plus uses NLP via spaCy to understand text context and detect PII entities that simple regex patterns would miss.
O
OCR (Optical Character Recognition)
Technology that extracts text from images. anonym.plus uses Tesseract OCR to extract text from PNG, JPG, BMP, and TIFF images with character-level bounding boxes, enabling redaction of PII directly on the image. Supports 38 OCR languages.
Operator
An anonymization method applied to a detected PII entity. anonym.plus supports five operators: replace, redact, mask, hash, and encrypt. Each can be configured per entity type within a detection preset.
P
PHI (Protected Health Information)
Health-related data that can identify an individual, protected under HIPAA. Includes medical records, lab results, insurance information, and any health data linked to a specific person.
PII (Personally Identifiable Information)
Any data that can be used to identify a specific individual. Includes names, email addresses, phone numbers, social security numbers, passport numbers, IP addresses, and financial account numbers. anonym.plus detects 340+ PII entity types.
Presidio
An open-source PII detection and anonymization framework by Microsoft. Combines NLP-based NER with configurable regex pattern recognizers. anonym.plus bundles Presidio as a local sidecar process — no cloud API calls are made.
R
Redact Operator
An anonymization method that completely removes PII text, replacing it with block characters (e.g., "john@mail.com" becomes "███████"). Irreversible. Leaves no trace of the original value.
Replace Operator
An anonymization method that substitutes PII with a typed placeholder. Example: "John Smith" becomes "<PERSON>". The default and most commonly used operator. Irreversible — the original value is not stored.
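A minimal sketch of placeholder substitution over detected spans. The function name and the (start, end, entity_type) span format are assumptions for illustration. The one detail worth showing is the order: replacing from the end of the text backwards keeps the earlier offsets valid as the text changes length.

```python
def replace_operator(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Substitute each (start, end, entity_type) span with a typed placeholder."""
    # Work right to left so earlier offsets stay valid as the text changes length.
    for start, end, entity_type in sorted(spans, reverse=True):
        text = text[:start] + f"<{entity_type}>" + text[end:]
    return text


text = "John Smith wrote to john@mail.com."
spans = [(0, 10, "PERSON"), (20, 33, "EMAIL_ADDRESS")]
replaced = replace_operator(text, spans)
print(replaced)  # <PERSON> wrote to <EMAIL_ADDRESS>.
```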
S
Sidecar
A companion process that runs alongside the main application. anonym.plus uses a Python sidecar process to run Presidio and spaCy for PII detection. Communication happens over a local HTTP interface with token-based authentication.
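The pattern of a loopback-only HTTP interface guarded by a per-launch token can be sketched with the standard library alone. Everything specific here is an assumption: the /analyze path, the Bearer header, and the JSON shape are hypothetical, and the handler returns a stand-in result instead of running detection.

```python
import hmac
import json
import secrets
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

TOKEN = secrets.token_urlsafe(32)  # generated per launch, shared with the sidecar


class SidecarHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        supplied = self.headers.get("Authorization", "")
        # Constant-time comparison avoids leaking the token through timing.
        if not hmac.compare_digest(supplied.encode(), f"Bearer {TOKEN}".encode()):
            self.send_error(401)
            return
        length = int(self.headers.get("Content-Length", 0))
        text = json.loads(self.rfile.read(length))["text"]
        body = json.dumps({"length": len(text)}).encode()  # stand-in for detection
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass


def call_sidecar(port: int, token: str, text: str) -> dict:
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/analyze",
        data=json.dumps({"text": text}).encode(),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    # An empty ProxyHandler makes sure the call never leaves the machine via a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.load(response)


server = HTTPServer(("127.0.0.1", 0), SidecarHandler)  # loopback only, OS-chosen port
threading.Thread(target=server.serve_forever, daemon=True).start()
result = call_sidecar(server.server_port, TOKEN, "John Smith")
print(result)  # {'length': 10}
server.shutdown()
server.server_close()
```

Binding to 127.0.0.1 keeps the interface unreachable from other machines, and the token stops other local processes from calling it.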
spaCy
An open-source NLP library for advanced natural language processing. Provides the NER (named entity recognition) models that anonym.plus uses to detect person names, locations, organizations, and dates in text. 23 language models available.
T
Tauri
A framework for building desktop applications with web technologies (HTML/CSS/JS) and a Rust backend. anonym.plus uses Tauri for its desktop app, with Rust handling encryption, file I/O, and anonymization operators.
Tesseract
An open-source OCR engine originally developed at Hewlett-Packard and later sponsored by Google. anonym.plus bundles Tesseract for extracting text from images with character-level bounding box data, enabling precise PII redaction on scanned documents and photos.
V
Vault
anonym.plus's encrypted local storage for sensitive data including encryption keys, processing history, presets, and credentials. Protected with AES-256-GCM encryption, Argon2id key derivation, and an optional PIN or 24-word BIP39 recovery phrase.
Z
Zero-Knowledge Architecture
A system design where the server cannot access user data even if compromised. In anonym.plus, passwords are hashed client-side before transmission, encryption keys never leave the local vault, and the frontend references keys only by ID — actual key material stays in the Rust backend.
References
- Microsoft Presidio — PII detection framework
- spaCy — NLP library for named entity recognition
- GDPR full text — General Data Protection Regulation
- HIPAA — U.S. Department of Health & Human Services
- Model Context Protocol — MCP specification
35 terms defined. See also: Entity Types Reference and Documentation.