CLI Reference

GoldenMatch provides 24 CLI commands via goldenmatch <command>. All commands support --help.

pip install goldenmatch
goldenmatch --version

autoconfig

Run AutoConfigController and print the committed config and telemetry. This command does not run the pipeline, so it is useful for redirecting the config into a YAML file or for inspecting what auto-config would decide before committing to a full run.

# Print YAML config to stdout, telemetry panel to stderr
goldenmatch autoconfig customers.csv

# Save the config to disk; panel still goes to stderr
goldenmatch autoconfig customers.csv --out goldenmatch.yml

# Pin a domain rulebook
goldenmatch autoconfig products.csv --domain electronics

# Include indicator priors + decision trace in the panel
goldenmatch autoconfig customers.csv --verbose

# CI-friendly: swap the rich panel for a one-line status string
goldenmatch autoconfig customers.csv --hide-controller

The panel surfaces the controller’s stop_reason, health verdict, complexity profile cells, indicator column priors (with --verbose), refit decisions, and Path Y · N NE indicators on committed matchkeys. The telemetry has the same JSON shape that the web UI’s /api/v1/controller/telemetry endpoint returns.
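Because the YAML config goes to stdout and the panel goes to stderr, the two streams can be captured separately with ordinary shell redirection. The following is a minimal sketch, assuming a POSIX shell; the function name capture_autoconfig and the default file names are hypothetical, and only the goldenmatch autoconfig FILE invocation comes from this reference.

```shell
# Capture the YAML config (stdout) and the telemetry panel (stderr)
# in separate files. Function name and default file names are illustrative.
capture_autoconfig() {
    input="$1"
    config_out="${2:-goldenmatch.yml}"
    panel_out="${3:-autoconfig-panel.log}"

    # stdout -> config file, stderr -> panel log
    goldenmatch autoconfig "$input" > "$config_out" 2> "$panel_out"
}
```

Calling capture_autoconfig customers.csv leaves the config in goldenmatch.yml and the panel text in autoconfig-panel.log, so the config file is not polluted by panel output.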


dedupe

Deduplicate one or more files.

# Zero-config (auto-detects columns, scorers, blocking)
goldenmatch dedupe customers.csv

# With config
goldenmatch dedupe customers.csv --config config.yaml --output-all --output-dir results/

# Multiple files
goldenmatch dedupe crm.csv marketing.csv --config config.yaml

# With LLM scorer
goldenmatch dedupe products.csv --config config.yaml --llm-scorer

# With anomaly detection
goldenmatch dedupe customers.csv --anomalies

# Zero-config path: render the controller telemetry panel before the report
# (default ON when auto-config fires; suppressed automatically with --config)
goldenmatch dedupe customers.csv  # panel surfaces stop_reason, health, decisions, Path Y NE

# Hide the controller panel (useful in CI logs)
goldenmatch dedupe customers.csv --hide-controller

# Preview changes before writing
goldenmatch dedupe customers.csv --preview

# Generate HTML report
goldenmatch dedupe customers.csv --html-report

# Before/after dashboard
goldenmatch dedupe customers.csv --dashboard

# Diff report
goldenmatch dedupe customers.csv --diff --diff-html

# Chunked processing for large files
goldenmatch dedupe huge.csv --chunked

# Ray distributed backend
goldenmatch dedupe huge.parquet --backend ray

# Cloud storage
goldenmatch dedupe s3://bucket/customers.csv
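The --hide-controller flag shown above can be applied automatically in CI. The sketch below assumes the CI service sets a CI environment variable (a common convention, but not something GoldenMatch itself defines); the wrapper name run_dedupe is hypothetical.

```shell
# Run dedupe, adding --hide-controller only when a CI environment is detected.
# Assumption: the CI service exports a non-empty CI variable.
run_dedupe() {
    if [ -n "${CI:-}" ]; then
        goldenmatch dedupe "$@" --hide-controller
    else
        goldenmatch dedupe "$@"
    fi
}
```

For example, run_dedupe customers.csv shows the panel on a developer machine and suppresses it in a pipeline where CI is set.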

match

Match a target file against reference files.

goldenmatch match targets.csv --against reference.csv --config config.yaml --output-all

demo

Run a built-in demo with sample data. No files needed.

goldenmatch demo

tui / interactive

Launch the interactive terminal UI.

goldenmatch interactive customers.csv
goldenmatch interactive customers.csv --config config.yaml

evaluate

Measure matching quality against ground truth pairs.

goldenmatch evaluate data.csv --config config.yaml --gt ground_truth.csv

# CI/CD quality gates
goldenmatch evaluate data.csv --config config.yaml --gt gt.csv \
    --min-f1 0.90 --min-precision 0.80 --min-recall 0.70

Exits with code 1 if thresholds are not met. Ground truth CSV must have id_a and id_b columns (configurable).
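A minimal sketch of a ground truth file and a CI gate built on the exit code described above. The record IDs are placeholders rather than real data, and the function name quality_gate is hypothetical; the flags are the ones shown in this section.

```shell
# Minimal ground truth file: one known duplicate pair per row.
# The IDs below are placeholders for illustration only.
cat > ground_truth.csv <<'EOF'
id_a,id_b
101,205
102,317
EOF

# Fail the calling job when evaluate exits non-zero.
quality_gate() {
    data="$1"; config="$2"; truth="$3"
    if goldenmatch evaluate "$data" --config "$config" --gt "$truth" \
        --min-f1 0.90 --min-precision 0.80 --min-recall 0.70; then
        echo "quality gate passed"
    else
        status=$?
        echo "quality gate failed (exit code $status)" >&2
        return "$status"
    fi
}
```

In a pipeline step, quality_gate data.csv config.yaml ground_truth.csv || exit 1 stops the job when any threshold is missed.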


incremental

Match new CSV records against an existing base dataset.

goldenmatch incremental base.csv --new new_records.csv --config config.yaml

Handles exact matchkeys via Polars join and fuzzy matchkeys via match_one brute-force.


pprl link

Privacy-preserving record linkage between two files.

goldenmatch pprl link party_a.csv party_b.csv --security-level high
goldenmatch pprl link a.csv b.csv --fields first_name last_name dob zip --threshold 0.85

pprl auto-config

Analyze data and recommend PPRL parameters.

goldenmatch pprl auto-config data.csv

label

Build ground truth by labeling record pairs interactively. Type y (match), n (no match), or s (skip).

goldenmatch label customers.csv --config config.yaml --gt ground_truth.csv

serve

Start the REST API server for real-time matching.

goldenmatch serve --file customers.csv --config config.yaml --port 8080

See REST API for endpoint details.


mcp-serve

Start the MCP server for Claude Desktop integration.

goldenmatch mcp-serve --file customers.csv --config config.yaml

See MCP for tool details.


unmerge

Remove a record from its cluster (per-entity unmerge).

goldenmatch unmerge RECORD_ID --run-dir results/

explain

Explain why two records matched.

goldenmatch explain ID_A ID_B --run-dir results/

diff

Generate a before/after change report.

goldenmatch diff --run-dir results/ --html

rollback

Undo a previous merge run.

goldenmatch rollback RUN_ID --run-dir results/

runs

List previous runs available for rollback.

goldenmatch runs --run-dir results/

graph

Multi-table entity resolution with cross-relationship evidence propagation.

goldenmatch graph --entities people.csv companies.csv --relationships edges.csv --config config.yaml

anomaly

Detect fake emails, placeholder data, and suspicious records.

goldenmatch anomaly customers.csv

report

Generate a detailed HTML match report.

goldenmatch report --run-dir results/ --output report.html

dashboard

Generate a before/after data quality dashboard.

goldenmatch dashboard --run-dir results/ --output dashboard.html

schema-match

Auto-map columns between different schemas.

goldenmatch schema-match file_a.csv file_b.csv

watch

Watch a database table and match new records continuously.

goldenmatch watch --table customers --connection-string "$DATABASE_URL" --interval 30

# Daemon mode with health endpoint and PID file
goldenmatch watch --table customers --connection-string "$DATABASE_URL" --daemon

memory (v1.6.0)

Inspect, train, and move the Learning Memory store. Requires memory.enabled = true in your config (see Configuration).

# Inspect what's stored
goldenmatch memory stats --config goldenmatch.yml
goldenmatch memory show  --config goldenmatch.yml --limit 50

# Force a learning pass (otherwise auto-runs at next pipeline call)
goldenmatch memory learn --config goldenmatch.yml

# Move memory between environments
goldenmatch memory export --config goldenmatch.yml --output corrections.jsonl
goldenmatch memory import --config goldenmatch.yml --input  corrections.jsonl

| Subcommand | Purpose |
|---|---|
| memory stats | Counts by source / decision, learned threshold deltas, last-learned timestamp. |
| memory show | List recent corrections with reason and trust. |
| memory learn | Run the threshold learner over the current store. |
| memory export | JSONL dump of all corrections (one record per line). |
| memory import | Bulk-load corrections from JSONL with trust-based upsert. |
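
Moving memory between environments can be scripted by chaining export and import. The sketch below assumes two config files, one per environment, both with memory enabled; the function name promote_memory and the config file names in the usage note are hypothetical. It relies only on the export format being one correction per line.

```shell
# Export corrections from one environment and import them into another.
# Arguments: source config, destination config, optional dump path.
promote_memory() {
    src_config="$1"
    dst_config="$2"
    dump="${3:-corrections.jsonl}"

    goldenmatch memory export --config "$src_config" --output "$dump" || return 1
    [ -f "$dump" ] || return 1

    # One correction per line, so non-empty lines equal corrections.
    count=$(grep -c . "$dump" || true)
    if [ "$count" -eq 0 ]; then
        echo "no corrections to import" >&2
        return 0
    fi

    echo "importing $count corrections"
    goldenmatch memory import --config "$dst_config" --input "$dump"
}
```

For example, promote_memory staging.yml production.yml copies the staging store's corrections into production and skips the import when the dump is empty.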

Full guide: Learning Memory.


Other commands

| Command | Description |
|---|---|
| goldenmatch setup | Interactive setup wizard (GPU, API keys, database) |
| goldenmatch init | Interactive config wizard |
| goldenmatch profile FILE | Profile data quality |
| goldenmatch sync --table TABLE | Sync database table |
| goldenmatch schedule --every 1h FILE | Run on a schedule |
| goldenmatch config save/load/list/show | Manage config presets |
| goldenmatch analyze-blocking FILE -c config.yaml | Suggest blocking strategies |
| goldenmatch compare-clusters A.json B.json | Compare two clustering outcomes (CCMS) |
| goldenmatch sensitivity FILE -c config.yaml --sweep threshold:0.7:0.95:0.05 | Parameter sensitivity analysis |

Common flags

| Flag | Available On | Description |
|---|---|---|
| --config, -c | dedupe, match | Path to YAML config file |
| --output-all | dedupe, match | Write golden, dupes, unique, lineage |
| --output-dir | dedupe, match | Output directory |
| --llm-scorer | dedupe | Enable LLM scoring for borderline pairs |
| --llm-boost | dedupe | LLM-labeled training + fine-tuning |
| --backend ray | dedupe, match | Use Ray distributed backend |
| --preview | dedupe | Show merge preview before writing |
| --anomalies | dedupe | Run anomaly detection |
| --dashboard | dedupe | Generate HTML dashboard |
| --html-report | dedupe | Generate HTML match report |
| --diff | dedupe | Generate diff report |
| --chunked | dedupe | Process in chunks for large files |