GoldenMatch provides 24 CLI commands via goldenmatch <command>. All commands support --help.
pip install goldenmatch
goldenmatch --version
Run AutoConfigController and print the committed config and telemetry. This does not run the pipeline, so it is useful for piping the config into a YAML file or inspecting what auto-config would decide before committing to a full run.
# Print YAML config to stdout, telemetry panel to stderr
goldenmatch autoconfig customers.csv
# Save the config to disk; panel still goes to stderr
goldenmatch autoconfig customers.csv --out goldenmatch.yml
# Pin a domain rulebook
goldenmatch autoconfig products.csv --domain electronics
# Include indicator priors + decision trace in the panel
goldenmatch autoconfig customers.csv --verbose
# CI-friendly: swap the rich panel for a one-line status string
goldenmatch autoconfig customers.csv --hide-controller
The panel surfaces the controller’s stop_reason, health verdict, complexity profile cells, indicator column priors (with --verbose), refit decisions, and Path Y · N NE indicators on committed matchkeys. The telemetry has the same JSON shape that the web UI’s /api/v1/controller/telemetry endpoint returns.
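Because the YAML config goes to stdout and the telemetry panel goes to stderr, a script can capture the two streams separately. The following is a minimal sketch using only Python's standard `subprocess` module. The helper names `run_and_split` and `save_autoconfig` are hypothetical, and the second helper assumes the `goldenmatch` executable is on `PATH`.

```python
import subprocess
import sys


def run_and_split(cmd):
    """Run cmd and return (exit_code, stdout_text, stderr_text), captured separately."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def save_autoconfig(csv_path, out_path):
    """Write the YAML printed on stdout to out_path and echo the panel to our stderr.

    Assumption: the `goldenmatch` executable is installed and on PATH.
    """
    code, config_yaml, panel = run_and_split(["goldenmatch", "autoconfig", csv_path])
    sys.stderr.write(panel)
    if code == 0:
        with open(out_path, "w", encoding="utf-8") as fh:
            fh.write(config_yaml)
    return code
```

In most cases the built-in `--out` flag is simpler. A wrapper like this is mainly useful when the config needs to be post-processed before it is written.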
Deduplicate one or more files.
# Zero-config (auto-detects columns, scorers, blocking)
goldenmatch dedupe customers.csv
# With config
goldenmatch dedupe customers.csv --config config.yaml --output-all --output-dir results/
# Multiple files
goldenmatch dedupe crm.csv marketing.csv --config config.yaml
# With LLM scorer
goldenmatch dedupe products.csv --config config.yaml --llm-scorer
# With anomaly detection
goldenmatch dedupe customers.csv --anomalies
# Zero-config path: render the controller telemetry panel before the report
# (default ON when auto-config fires; suppressed automatically with --config)
goldenmatch dedupe customers.csv # panel surfaces stop_reason, health, decisions, Path Y NE
# Hide the controller panel (useful in CI logs)
goldenmatch dedupe customers.csv --hide-controller
# Preview changes before writing
goldenmatch dedupe customers.csv --preview
# Generate HTML report
goldenmatch dedupe customers.csv --html-report
# Before/after dashboard
goldenmatch dedupe customers.csv --dashboard
# Diff report
goldenmatch dedupe customers.csv --diff --diff-html
# Chunked processing for large files
goldenmatch dedupe huge.csv --chunked
# Ray distributed backend
goldenmatch dedupe huge.parquet --backend ray
# Cloud storage
goldenmatch dedupe s3://bucket/customers.csv
Match a target file against reference files.
goldenmatch match targets.csv --against reference.csv --config config.yaml --output-all
Run a built-in demo with sample data. No files needed.
goldenmatch demo
Launch the interactive terminal UI.
goldenmatch interactive customers.csv
goldenmatch interactive customers.csv --config config.yaml
Measure matching quality against ground truth pairs.
goldenmatch evaluate data.csv --config config.yaml --gt ground_truth.csv
# CI/CD quality gates
goldenmatch evaluate data.csv --config config.yaml --gt gt.csv \
--min-f1 0.90 --min-precision 0.80 --min-recall 0.70
The command exits with code 1 if the thresholds are not met. The ground truth CSV must have id_a and id_b columns (the column names are configurable).
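The quality gate can be pictured with the standard pairwise definitions of precision, recall, and F1. The sketch below is illustrative only and is not GoldenMatch's implementation. The function names are hypothetical, pairs are assumed to be unordered, and the gate is assumed to fail when any single metric falls below its minimum.

```python
import csv
import io


def load_pairs(csv_text, col_a="id_a", col_b="id_b"):
    """Read ground-truth pairs from CSV text as a set of unordered pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {frozenset((row[col_a], row[col_b])) for row in reader}


def pair_metrics(predicted, truth):
    """Compute pairwise precision, recall, and F1 from two sets of pairs."""
    tp = len(predicted & truth)  # pairs that are both predicted and true
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def gate_exit_code(metrics, minimums):
    """Return 1 if any metric is below its minimum (assumed gate rule), else 0."""
    failed = [name for name, floor in minimums.items() if metrics[name] < floor]
    return 1 if failed else 0
```

For example, with four true pairs and four predicted pairs of which three are correct, precision, recall, and F1 are all 0.75, so a gate with a minimum F1 of 0.90 would return 1 under these assumptions.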
Match new CSV records against an existing base dataset.
goldenmatch incremental base.csv --new new_records.csv --config config.yaml
Exact matchkeys are resolved with a Polars join; fuzzy matchkeys are resolved by brute-force comparison through match_one.
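The difference between the two strategies can be sketched with the standard library alone. This is a hedged illustration, not GoldenMatch code: a dictionary lookup stands in for the Polars join, `difflib.SequenceMatcher` stands in for whatever fuzzy scorer is configured, and the function names and record layout are hypothetical and unrelated to the real `match_one` signature.

```python
from difflib import SequenceMatcher


def exact_matches(base, new, key):
    """Join-style lookup: index base rows by key, then probe with each new row."""
    index = {}
    for row in base:
        index.setdefault(row[key], []).append(row["id"])
    return [(n["id"], b_id) for n in new for b_id in index.get(n[key], [])]


def fuzzy_matches(base, new, field, threshold=0.85):
    """Brute force: score every new record against every base record."""
    pairs = []
    for n in new:
        for b in base:
            score = SequenceMatcher(None, n[field].lower(), b[field].lower()).ratio()
            if score >= threshold:
                pairs.append((n["id"], b["id"], round(score, 3)))
    return pairs
```

The exact path costs roughly one lookup per new record, while the brute-force path compares each new record with every base record, so its cost grows with the product of the two sizes.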
Privacy-preserving record linkage between two files.
goldenmatch pprl link party_a.csv party_b.csv --security-level high
goldenmatch pprl link a.csv b.csv --fields first_name last_name dob zip --threshold 0.85
Analyze data and recommend PPRL parameters.
goldenmatch pprl auto-config data.csv
Build ground truth by labeling record pairs interactively. Type y (match), n (no match), or s (skip).
goldenmatch label customers.csv --config config.yaml --gt ground_truth.csv
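The y / n / s labeling loop can be sketched as a simple mapping from keystrokes to buckets. This is only an illustration: the function name `collect_labels` and the bucket names are hypothetical, and the sketch makes no claim about what the real command writes to the ground truth file.

```python
def collect_labels(pairs, answers):
    """Pair each candidate with a y/n/s answer and bucket the results."""
    key_to_bucket = {"y": "match", "n": "no_match", "s": "skipped"}
    buckets = {"match": [], "no_match": [], "skipped": []}
    for pair, answer in zip(pairs, answers):
        bucket = key_to_bucket.get(answer.strip().lower())
        if bucket is None:
            raise ValueError(f"expected y, n, or s, got {answer!r}")
        buckets[bucket].append(pair)
    return buckets
```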
Start the REST API server for real-time matching.
goldenmatch serve --file customers.csv --config config.yaml --port 8080
See REST API for endpoint details.
Start the MCP server for Claude Desktop integration.
goldenmatch mcp-serve --file customers.csv --config config.yaml
See MCP for tool details.
Remove a record from its cluster (per-entity unmerge).
goldenmatch unmerge RECORD_ID --run-dir results/
Explain why two records matched.
goldenmatch explain ID_A ID_B --run-dir results/
Generate a before/after change report.
goldenmatch diff --run-dir results/ --html
Undo a previous merge run.
goldenmatch rollback RUN_ID --run-dir results/
List previous runs available for rollback.
goldenmatch runs --run-dir results/
Multi-table entity resolution with cross-relationship evidence propagation.
goldenmatch graph --entities people.csv companies.csv --relationships edges.csv --config config.yaml
Detect fake emails, placeholder data, and suspicious records.
goldenmatch anomaly customers.csv
Generate a detailed HTML match report.
goldenmatch report --run-dir results/ --output report.html
Generate a before/after data quality dashboard.
goldenmatch dashboard --run-dir results/ --output dashboard.html
Auto-map columns between different schemas.
goldenmatch schema-match file_a.csv file_b.csv
Watch a database table and match new records continuously.
goldenmatch watch --table customers --connection-string "$DATABASE_URL" --interval 30
# Daemon mode with health endpoint and PID file
goldenmatch watch --table customers --connection-string "$DATABASE_URL" --daemon
Inspect, train, and move the Learning Memory store. Requires memory.enabled = true in your config (see Configuration).
# Inspect what's stored
goldenmatch memory stats --config goldenmatch.yml
goldenmatch memory show --config goldenmatch.yml --limit 50
# Force a learning pass (otherwise auto-runs at next pipeline call)
goldenmatch memory learn --config goldenmatch.yml
# Move memory between environments
goldenmatch memory export --config goldenmatch.yml --output corrections.jsonl
goldenmatch memory import --config goldenmatch.yml --input corrections.jsonl
| Subcommand | Purpose |
|---|---|
| memory stats | Counts by source / decision, learned threshold deltas, last-learned timestamp. |
| memory show | List recent corrections with reason and trust. |
| memory learn | Run the threshold learner over the current store. |
| memory export | JSONL dump of all corrections (one record per line). |
| memory import | Bulk-load corrections from JSONL with trust-based upsert. |
Full guide: Learning Memory.
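To make the export and import formats concrete, the sketch below parses JSONL (one JSON object per line) and applies a trust-based upsert. Everything about the record layout is an assumption: the field names `id_a`, `id_b`, `decision`, and `trust` are hypothetical, as is the rule that an incoming correction replaces a stored one only when its trust is at least as high. The actual schema and upsert rule are described in the Learning Memory guide.

```python
import json


def parse_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def upsert_by_trust(store, incoming):
    """Insert records, replacing an existing one only if trust is not lower.

    `store` maps an order-independent pair key to a record.
    Field names here are hypothetical, not GoldenMatch's schema.
    """
    for rec in incoming:
        key = tuple(sorted((rec["id_a"], rec["id_b"])))
        current = store.get(key)
        if current is None or rec["trust"] >= current["trust"]:
            store[key] = rec
    return store
```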
| Command | Description |
|---|---|
| goldenmatch setup | Interactive setup wizard (GPU, API keys, database) |
| goldenmatch init | Interactive config wizard |
| goldenmatch profile FILE | Profile data quality |
| goldenmatch sync --table TABLE | Sync database table |
| goldenmatch schedule --every 1h FILE | Run on a schedule |
| goldenmatch config save/load/list/show | Manage config presets |
| goldenmatch analyze-blocking FILE -c config.yaml | Suggest blocking strategies |
| goldenmatch compare-clusters A.json B.json | Compare two clustering outcomes (CCMS) |
| goldenmatch sensitivity FILE -c config.yaml --sweep threshold:0.7:0.95:0.05 | Parameter sensitivity analysis |
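The `--sweep` value in the sensitivity example reads as `name:start:stop:step`. The sketch below shows one way such a spec could expand into concrete values. It assumes the stop value is inclusive, which the table does not confirm, and `expand_sweep` is a hypothetical helper rather than part of GoldenMatch. `Decimal` is used so that repeated addition of the step does not accumulate floating-point drift.

```python
from decimal import Decimal


def expand_sweep(spec):
    """Expand 'name:start:stop:step' into (name, values), stop inclusive (assumed)."""
    name, start, stop, step = spec.split(":")
    start, stop, step = Decimal(start), Decimal(stop), Decimal(step)
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    current = start
    while current <= stop:
        values.append(float(current))
        current += step
    return name, values
```

Under these assumptions, `expand_sweep("threshold:0.7:0.95:0.05")` yields six threshold values from 0.7 to 0.95.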
| Flag | Available On | Description |
|---|---|---|
| --config, -c | dedupe, match | Path to YAML config file |
| --output-all | dedupe, match | Write golden, dupes, unique, lineage |
| --output-dir | dedupe, match | Output directory |
| --llm-scorer | dedupe | Enable LLM scoring for borderline pairs |
| --llm-boost | dedupe | LLM-labeled training + fine-tuning |
| --backend ray | dedupe, match | Use Ray distributed backend |
| --preview | dedupe | Show merge preview before writing |
| --anomalies | dedupe | Run anomaly detection |
| --dashboard | dedupe | Generate HTML dashboard |
| --html-report | dedupe | Generate HTML match report |
| --diff | dedupe | Generate diff report |
| --chunked | dedupe | Process in chunks for large files |