Installation
Install GoldenMatch from PyPI with pip. Optional extras add embeddings, LLM scoring, database sync, and more.
pip (recommended)
pip install goldenmatch
Requires Python 3.11 or later. Core dependencies: Polars, RapidFuzz, Typer, Pydantic, Textual.
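Before installing, you can check that the interpreter meets the Python 3.11 minimum. This is a minimal sketch that uses only the standard library. The helper name python_supported is hypothetical and is not part of GoldenMatch.

```python
import sys

MINIMUM_PYTHON = (3, 11)  # minimum stated in the requirements above


def python_supported(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or running) interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch releases do not affect support.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if python_supported() else "too old"
    print(f"Python {sys.version.split()[0]}: {status}")
```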
Optional extras
pip install "goldenmatch[embeddings]" # sentence-transformers + FAISS
pip install "goldenmatch[llm]" # Claude/OpenAI for LLM scoring
pip install "goldenmatch[postgres]" # PostgreSQL database sync
pip install "goldenmatch[snowflake]" # Snowflake connector
pip install "goldenmatch[bigquery]" # BigQuery connector
pip install "goldenmatch[databricks]" # Databricks connector
pip install "goldenmatch[salesforce]" # Salesforce connector
pip install "goldenmatch[duckdb]" # DuckDB out-of-core backend
pip install "goldenmatch[quality]" # GoldenCheck data quality scanning
pip install "goldenmatch[ray]" # Ray distributed backend
Install multiple extras at once. The quotes stop shells such as zsh from treating the square brackets as a glob pattern:
pip install "goldenmatch[embeddings,llm,postgres]"
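To see which optional dependencies are importable in the current environment, you can probe for their modules without importing them. This is a sketch only. The mapping from extra names to module names is an assumption based on the descriptions above and covers just the extras whose packages are named there. The real dependency lists of each extra may differ. installed_extras is a hypothetical helper.

```python
from importlib.util import find_spec

# ASSUMPTION: extra name -> top-level modules it is expected to provide,
# inferred from the extras list above. Actual dependency lists may differ.
EXTRA_MODULES = {
    "embeddings": ["sentence_transformers", "faiss"],
    "llm": ["anthropic", "openai"],
    "duckdb": ["duckdb"],
    "ray": ["ray"],
}


def installed_extras(extra_modules=EXTRA_MODULES):
    """Return {extra: {module: importable}} without importing any module."""
    return {
        extra: {module: find_spec(module) is not None for module in modules}
        for extra, modules in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, modules in installed_extras().items():
        print(extra, modules)
```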
Docker
docker pull ghcr.io/benzsevern/goldenmatch:latest
# Run a dedupe
docker run --rm -v "$(pwd)":/data ghcr.io/benzsevern/goldenmatch:latest \
  dedupe /data/customers.csv --output-dir /data/results
# Start the REST API
docker run --rm -p 8080:8080 -v "$(pwd)":/data ghcr.io/benzsevern/goldenmatch:latest \
  serve --file /data/customers.csv --port 8080
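If you drive the container from Python, building the docker run arguments as a list avoids shell quoting problems, such as host paths that contain spaces. This is a minimal sketch. It reproduces the two commands above, and docker_run_args is a hypothetical helper, not part of GoldenMatch.

```python
import shlex
from pathlib import Path

IMAGE = "ghcr.io/benzsevern/goldenmatch:latest"


def docker_run_args(host_dir, command, publish_port=None, image=IMAGE):
    """Build an argv list for `docker run` that mounts host_dir at /data.

    Passing the list to subprocess.run bypasses the shell, so the host path
    needs no extra quoting.
    """
    args = ["docker", "run", "--rm"]
    if publish_port is not None:
        # Publish the same port on the host and in the container, as above.
        args += ["-p", f"{publish_port}:{publish_port}"]
    args += ["-v", f"{Path(host_dir).resolve()}:/data", image]
    return args + list(command)


if __name__ == "__main__":
    dedupe = docker_run_args(".", ["dedupe", "/data/customers.csv", "--output-dir", "/data/results"])
    serve = docker_run_args(".", ["serve", "--file", "/data/customers.csv", "--port", "8080"], publish_port=8080)
    # Print the commands; pass either list to subprocess.run(..., check=True) to execute it.
    print(shlex.join(dedupe))
    print(shlex.join(serve))
```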
PostgreSQL Extension
Pre-built packages for the PostgreSQL extension are distributed separately from the Python package. The files shown below are built for PostgreSQL 16 (pg16):
# Debian/Ubuntu
sudo dpkg -i goldenmatch-pg-0.1.0-pg16-amd64.deb
sudo systemctl restart postgresql
# RHEL/Fedora
sudo rpm -i goldenmatch-pg-0.1.0-pg16.x86_64.rpm
sudo systemctl restart postgresql
Download the .deb and .rpm packages from the goldenmatch-extensions releases page.
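To decide between the .deb (Debian/Ubuntu) and .rpm (RHEL/Fedora) package on a given machine, you can read /etc/os-release through the standard library. This is a sketch that assumes derivative distributions name their parent in the ID_LIKE field. Not every distribution does this. package_format is a hypothetical helper.

```python
import platform

DEBIAN_FAMILY = {"debian", "ubuntu"}
RHEL_FAMILY = {"rhel", "fedora", "centos"}


def package_format(os_release):
    """Return "deb", "rpm", or None for a parsed os-release mapping."""
    ids = {os_release.get("ID", "").lower()}
    # ID_LIKE is a space-separated list of parent distributions, e.g. "rhel centos fedora".
    ids.update(os_release.get("ID_LIKE", "").lower().split())
    if ids & DEBIAN_FAMILY:
        return "deb"
    if ids & RHEL_FAMILY:
        return "rpm"
    return None


def current_package_format():
    """Detect the package format for this machine; None if os-release is unavailable."""
    try:
        return package_format(platform.freedesktop_os_release())  # Python 3.10+
    except OSError:
        return None
```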
DuckDB UDFs
pip install goldenmatch-duckdb
import duckdb
import goldenmatch_duckdb

con = duckdb.connect()  # in-memory DuckDB database
goldenmatch_duckdb.register(con)  # register the GoldenMatch UDFs on this connection
con.sql("SELECT goldenmatch_score('John', 'Jon', 'jaro_winkler')").show()
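The 'jaro_winkler' argument names a string-similarity method. The sketch below illustrates what a standard Jaro-Winkler score measures, using a prefix scale of 0.1 and a prefix of up to 4 characters. This is an assumption about the method the UDF uses. GoldenMatch's own implementation, scaling, and edge-case handling may differ, so treat this as an illustration rather than a reference for goldenmatch_score.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match if they are equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    mismatches, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                mismatches += 1
            k += 1
    half_transpositions = mismatches / 2
    return (matches / len1 + matches / len2 + (matches - half_transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the shared prefix (up to max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)


if __name__ == "__main__":
    print(round(jaro_winkler("John", "Jon"), 4))
```

For "John" and "Jon", all three characters of "Jon" match in order, which gives a Jaro score of 11/12. The shared two-character prefix "Jo" then lifts the score to about 0.933.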
dbt Integration
pip install dbt-goldenmatch
The dbt-goldenmatch package provides macros for running entity resolution inside dbt pipelines using DuckDB.
Verify installation
goldenmatch --version
# goldenmatch 1.1.1
goldenmatch demo
# Runs a built-in demo with sample data
import goldenmatch as gm
print(gm.__version__) # 1.1.1
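If importing the package fails, you can still read the installed version from the distribution metadata. This is a minimal sketch using importlib.metadata. The helper names are hypothetical. version_tuple handles only plain X.Y.Z strings, so for pre-release or local version tags use packaging.version.Version instead.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="goldenmatch"):
    """Return the installed distribution's version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Parse a plain 'X.Y.Z' release string into a tuple of ints (no pre-release support)."""
    return tuple(int(part) for part in text.split("."))


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "goldenmatch is not installed")
```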
Environment variables
| Variable | Purpose |
|---|---|
| OPENAI_API_KEY | LLM scorer and LLM boost (OpenAI) |
| ANTHROPIC_API_KEY | LLM scorer (Claude) |
| DATABASE_URL | PostgreSQL connection string for sync / watch |
| GOOGLE_APPLICATION_CREDENTIALS | Vertex AI embeddings (GCP service account) |
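To see which of these variables are configured without printing any secret values, you can report only whether each one is set. This is a sketch. It treats an empty string as unset, which is an assumption and may not match how GoldenMatch itself reads the variables. env_report is a hypothetical helper.

```python
import os

# Variables from the table above and the feature each one enables.
GOLDENMATCH_ENV_VARS = {
    "OPENAI_API_KEY": "LLM scorer and LLM boost (OpenAI)",
    "ANTHROPIC_API_KEY": "LLM scorer (Claude)",
    "DATABASE_URL": "PostgreSQL sync / watch",
    "GOOGLE_APPLICATION_CREDENTIALS": "Vertex AI embeddings",
}


def env_report(environ=None):
    """Return {variable: is_set}; values themselves are never returned."""
    if environ is None:
        environ = os.environ
    return {name: bool(environ.get(name)) for name in GOLDENMATCH_ENV_VARS}


if __name__ == "__main__":
    for name, is_set in env_report().items():
        status = "set" if is_set else "not set"
        print(f"{name:32} {status:8} {GOLDENMATCH_ENV_VARS[name]}")
```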
Setup wizard
Run the interactive wizard to configure GPU mode, API keys, and database connections:
goldenmatch setup
The wizard guides you through:
- GPU mode selection (CPU, CUDA, MPS, Vertex AI, Colab)
- LLM API key configuration
- PostgreSQL connection setup
- Saving preferences to ~/.goldenmatch/settings.yaml
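To check from a script whether the wizard has already been run, you can look for the settings file. This is a sketch. It only tests that the file exists, because the file's schema is not documented here. It also assumes that ~ resolves to Path.home() on every platform. settings_path and has_saved_settings are hypothetical helpers.

```python
from pathlib import Path


def settings_path(home=None):
    """Return the location where `goldenmatch setup` saves preferences."""
    base = Path(home) if home is not None else Path.home()
    return base / ".goldenmatch" / "settings.yaml"


def has_saved_settings(home=None):
    """True if the setup wizard's settings file exists."""
    return settings_path(home).is_file()


if __name__ == "__main__":
    path = settings_path()
    print(f"{path}: {'found' if path.is_file() else 'not found; run goldenmatch setup'}")
```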