CLI Reference
GoldenMatch provides 21 CLI commands via goldenmatch <command>. All commands support --help.
pip install goldenmatch
goldenmatch --version
dedupe
Deduplicate one or more files.
# Zero-config (auto-detects columns, scorers, blocking)
goldenmatch dedupe customers.csv
# With config
goldenmatch dedupe customers.csv --config config.yaml --output-all --output-dir results/
# Multiple files
goldenmatch dedupe crm.csv marketing.csv --config config.yaml
# With LLM scorer
goldenmatch dedupe products.csv --config config.yaml --llm-scorer
# With anomaly detection
goldenmatch dedupe customers.csv --anomalies
# Preview changes before writing
goldenmatch dedupe customers.csv --preview
# Generate HTML report
goldenmatch dedupe customers.csv --html-report
# Before/after dashboard
goldenmatch dedupe customers.csv --dashboard
# Diff report
goldenmatch dedupe customers.csv --diff --diff-html
# Chunked processing for large files
goldenmatch dedupe huge.csv --chunked
# Ray distributed backend
goldenmatch dedupe huge.parquet --backend ray
# Cloud storage
goldenmatch dedupe s3://bucket/customers.csv
match
Match a target file against reference files.
goldenmatch match targets.csv --against reference.csv --config config.yaml --output-all
demo
Run a built-in demo with sample data. No files needed.
goldenmatch demo
tui / interactive
Launch the interactive terminal UI.
goldenmatch interactive customers.csv
goldenmatch interactive customers.csv --config config.yaml
evaluate
Measure matching quality against ground truth pairs.
goldenmatch evaluate data.csv --config config.yaml --gt ground_truth.csv
# CI/CD quality gates
goldenmatch evaluate data.csv --config config.yaml --gt gt.csv \
--min-f1 0.90 --min-precision 0.80 --min-recall 0.70
The command exits with code 1 if any threshold is not met. The ground truth CSV must contain id_a and id_b columns (the column names are configurable).
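Because a missed threshold surfaces only as a non-zero exit status, a CI job can branch on it directly. The sketch below shows one way to do that. quality_gate is a hypothetical helper, not part of GoldenMatch: it runs whatever command it is given and reports the outcome. The sample ground truth file uses the id_a and id_b columns mentioned above, with made-up record IDs.

```shell
# Hypothetical helper, not part of GoldenMatch: run a command and report
# whether it passed, preserving the command's exit status on failure.
quality_gate() {
  if "$@"; then
    echo "quality gate passed"
  else
    status=$?
    echo "quality gate failed (exit code $status)" >&2
    return "$status"
  fi
}

# Illustrative ground truth file: one known duplicate pair per row.
# The record IDs are made up for this example.
cat > ground_truth.csv <<'EOF'
id_a,id_b
101,205
102,317
EOF

# In CI this might be invoked as (flags taken from the examples above):
#   quality_gate goldenmatch evaluate data.csv --config config.yaml \
#     --gt ground_truth.csv --min-f1 0.90 || exit 1
```

Passing the command as arguments keeps the helper independent of any particular threshold flags, so the same wrapper works whichever of --min-f1, --min-precision, or --min-recall a pipeline chooses to enforce.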
incremental
Match new CSV records against an existing base dataset.
goldenmatch incremental base.csv --new new_records.csv --config config.yaml
Exact matchkeys are handled with a Polars join; fuzzy matchkeys are handled by brute-force comparison via match_one.
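To make the exact-matchkey half of that concrete, here is a toy analogue that uses awk instead of Polars. This is a swapped-in technique for illustration only and does not show GoldenMatch's internals. It assumes hypothetical two-column CSVs keyed on an email column, and pairs each new record with the base record that shares its key, which is the effect an equality join has.

```shell
# Toy base and new files; the ids and emails are made up.
cat > base.csv <<'EOF'
id,email
1,ann@example.com
2,bob@example.com
EOF

cat > new_records.csv <<'EOF'
id,email
10,bob@example.com
11,cara@example.com
EOF

# First pass (base.csv) indexes ids by email; second pass (new_records.csv)
# emits "new_id,base_id" for every new record whose email is already known.
awk -F, '
  FNR == 1 { next }                       # skip each header row
  NR == FNR { base_id[$2] = $1; next }    # still reading base.csv
  $2 in base_id { print $1 "," base_id[$2] }
' base.csv new_records.csv > exact_matches.csv

cat exact_matches.csv
```

The only output row is 10,2: record 10 shares its email with base record 2, while record 11 has no exact counterpart in the base file.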
pprl link
Privacy-preserving record linkage between two files.
goldenmatch pprl link party_a.csv party_b.csv --security-level high
goldenmatch pprl link a.csv b.csv --fields first_name last_name dob zip --threshold 0.85
pprl auto-config
Analyze data and recommend PPRL parameters.
goldenmatch pprl auto-config data.csv
label
Build ground truth by labeling record pairs interactively. Type y (match), n (no match), or s (skip).
goldenmatch label customers.csv --config config.yaml --gt ground_truth.csv
serve
Start the REST API server for real-time matching.
goldenmatch serve --file customers.csv --config config.yaml --port 8080
See REST API for endpoint details.
mcp-serve
Start the MCP server for Claude Desktop integration.
goldenmatch mcp-serve --file customers.csv --config config.yaml
See MCP for tool details.
unmerge
Remove a record from its cluster (per-entity unmerge).
goldenmatch unmerge RECORD_ID --run-dir results/
explain
Explain why two records matched.
goldenmatch explain ID_A ID_B --run-dir results/
diff
Generate a before/after change report.
goldenmatch diff --run-dir results/ --html
rollback
Undo a previous merge run.
goldenmatch rollback RUN_ID --run-dir results/
runs
List previous runs available for rollback.
goldenmatch runs --run-dir results/
graph
Multi-table entity resolution with cross-relationship evidence propagation.
goldenmatch graph --entities people.csv companies.csv --relationships edges.csv --config config.yaml
anomaly
Detect fake emails, placeholder data, and suspicious records.
goldenmatch anomaly customers.csv
report
Generate a detailed HTML match report.
goldenmatch report --run-dir results/ --output report.html
dashboard
Generate a before/after data quality dashboard.
goldenmatch dashboard --run-dir results/ --output dashboard.html
schema-match
Auto-map columns between different schemas.
goldenmatch schema-match file_a.csv file_b.csv
watch
Watch a database table and match new records continuously.
goldenmatch watch --table customers --connection-string "$DATABASE_URL" --interval 30
# Daemon mode with health endpoint and PID file
goldenmatch watch --table customers --connection-string "$DATABASE_URL" --daemon
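Both examples read the connection string from the DATABASE_URL environment variable, so an unset variable would pass an empty string to --connection-string. A small guard can fail fast before the watcher starts. It is sketched here as a hypothetical check_database_url helper, which is not a GoldenMatch feature.

```shell
# Hypothetical helper, not part of GoldenMatch: succeed only when
# DATABASE_URL is set to a non-empty value.
check_database_url() {
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "error: DATABASE_URL is not set" >&2
    return 1
  fi
}

# Possible usage before starting the watcher (command from the example above):
#   check_database_url && goldenmatch watch --table customers \
#     --connection-string "$DATABASE_URL" --interval 30
```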
Other commands
| Command | Description |
|---|---|
| goldenmatch setup | Interactive setup wizard (GPU, API keys, database) |
| goldenmatch init | Interactive config wizard |
| goldenmatch profile FILE | Profile data quality |
| goldenmatch sync --table TABLE | Sync database table |
| goldenmatch schedule --every 1h FILE | Run on a schedule |
| goldenmatch config save/load/list/show | Manage config presets |
| goldenmatch analyze-blocking FILE -c config.yaml | Suggest blocking strategies |
Common flags
| Flag | Available On | Description |
|---|---|---|
| --config, -c | dedupe, match | Path to YAML config file |
| --output-all | dedupe, match | Write golden, dupes, unique, lineage |
| --output-dir | dedupe, match | Output directory |
| --llm-scorer | dedupe | Enable LLM scoring for borderline pairs |
| --llm-boost | dedupe | LLM-labeled training + fine-tuning |
| --backend ray | dedupe, match | Use Ray distributed backend |
| --preview | dedupe | Show merge preview before writing |
| --anomalies | dedupe | Run anomaly detection |
| --dashboard | dedupe | Generate HTML dashboard |
| --html-report | dedupe | Generate HTML match report |
| --diff | dedupe | Generate diff report |
| --chunked | dedupe | Process in chunks for large files |