Entity resolution that finds duplicates in your data so you don’t have to define the rules yourself.
What It Does
GoldenMatch takes messy records and figures out which ones refer to the same entity — without requiring you to hand-write matching rules.
INGEST → STANDARDIZE → BLOCK → SCORE → CLUSTER → GOLDEN RECORD
| Step | What Happens |
|---|---|
| Ingest | Load CSV, Excel, Parquet, or a DataFrame |
| Standardize | Normalize casing, whitespace, phonetic encoding |
| Block | Group candidates to avoid N^2 comparisons |
| Score | Fuzzy match (jaro-winkler, levenshtein, token sort) |
| Cluster | Union-Find with confidence scoring |
| Golden | Merge clusters into canonical records |
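To make the stages concrete, here is a minimal standard-library sketch of the same flow on four toy records. It is not GoldenMatch's implementation: the function `dedupe_sketch`, the choice of `zip` as the blocking key, the use of `difflib.SequenceMatcher` in place of jaro-winkler, and the "longest non-empty value wins" merge rule are all assumptions made for illustration, and confidence scoring is left out.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Toy input (hypothetical data, not from any benchmark).
records = [
    {"id": 0, "name": "Jon Smith ", "zip": "10001", "email": "jon@example.com"},
    {"id": 1, "name": "jon  smith", "zip": "10001", "email": ""},
    {"id": 2, "name": "Jane Doe", "zip": "94105", "email": "jane@example.com"},
    {"id": 3, "name": "John Smith", "zip": "10001", "email": "jon@example.com"},
]


def standardize(value):
    """STANDARDIZE: lowercase and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def find(parent, x):
    """Union-Find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge two sets, keeping the smaller id as the root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def dedupe_sketch(records, threshold=0.85):
    clean = {
        r["id"]: {k: standardize(v) if isinstance(v, str) else v for k, v in r.items()}
        for r in records
    }

    # BLOCK: only records sharing a zip are compared (assumed blocking key).
    blocks = defaultdict(list)
    for rid, rec in clean.items():
        blocks[rec["zip"]].append(rid)

    # SCORE + CLUSTER: exact email match or fuzzy name match links two records.
    parent = {rid: rid for rid in clean}
    for ids in blocks.values():
        for a, b in combinations(ids, 2):
            same_email = clean[a]["email"] and clean[a]["email"] == clean[b]["email"]
            # SequenceMatcher stands in for jaro-winkler here.
            score = SequenceMatcher(None, clean[a]["name"], clean[b]["name"]).ratio()
            if same_email or score >= threshold:
                union(parent, a, b)

    clusters = defaultdict(list)
    for rid in clean:
        clusters[find(parent, rid)].append(rid)

    # GOLDEN RECORD: keep the longest non-empty value per field (assumed rule).
    golden = []
    for ids in clusters.values():
        merged = {}
        for field in ("name", "zip", "email"):
            values = [clean[i][field] for i in ids if clean[i][field]]
            merged[field] = max(values, key=len) if values else ""
        merged["members"] = sorted(ids)
        golden.append(merged)
    return golden


for row in dedupe_sketch(records):
    print(row)
```

Blocking is what keeps the pair count manageable: the three records in zip `10001` produce three comparisons, and the record in `94105` is never compared against them at all.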
Quick Install
pip install goldenmatch
import goldenmatch as gm

result = gm.dedupe("customers.csv", exact=["email"], fuzzy={"name": 0.85})
print(f"{result.total_clusters} clusters, {result.match_rate:.0%} match rate")
result.golden.write_csv("golden_records.csv")
Benchmarks
| Dataset | Records | Method | F1 | Time |
|---|---|---|---|---|
| DBLP-ACM (academic) | 4,910 | Fuzzy matching | 97.2% | 2.1s |
| Abt-Buy (electronics) | 2,162 | Domain + LLM | 72.2% | 4.2s |
| FEBRL4 (PPRL) | 10,000 | Auto-config bloom filters | 92.4% | 14s |
| Synthetic | 100K | Fuzzy (name+zip) | – | 12.8s |
| Synthetic | 1M | Exact dedupe | – | 7.8s |
Scale: 7,823 records/sec on a laptop (fuzzy + exact + golden).
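The benchmark table reports F1 but does not say how it is computed. One common convention for entity resolution is pairwise F1, where every pair of records placed in the same cluster counts as a predicted match. The sketch below assumes that convention; `pairs` and `pairwise_f1` are hypothetical helper names, not part of GoldenMatch.

```python
from itertools import combinations


def pairs(clusters):
    """Return the set of unordered record-id pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}


def pairwise_f1(predicted, truth):
    """Pairwise F1 between predicted clusters and ground-truth clusters."""
    pred, true = pairs(predicted), pairs(truth)
    tp = len(pred & true)  # pairs linked in both clusterings
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)


# Toy clusterings over record ids 0..3 (illustrative data only).
predicted = [[0, 1, 3], [2]]
truth = [[0, 1], [2, 3]]
print(f"pairwise F1 = {pairwise_f1(predicted, truth):.2f}")
```

In this toy case the prediction links three pairs, one of which is correct, and the ground truth contains two pairs, so precision is 1/3, recall is 1/2, and F1 is 0.40.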