ER Agent

GoldenMatch exposes itself as an autonomous entity resolution agent that other AI systems can discover and invoke.

An agent says “deduplicate this data” and GoldenMatch handles strategy selection, config generation, pipeline execution, and result explanation – all without human configuration.

Just want to use from Claude Desktop? See the MCP Server page instead – it’s simpler for human-in-the-loop workflows.


What is A2A?

A2A (Agent-to-Agent) is an open protocol for AI systems to discover and invoke each other. Think of it as DNS + HTTP for AI agents:

  1. An agent discovers GoldenMatch at /.well-known/agent.json (like a business card)
  2. The agent card lists skills (capabilities) with input/output schemas
  3. The calling agent sends a task, GoldenMatch processes it, returns structured results

A2A is supported by LangChain, CrewAI, AutoGen, and other agent frameworks. Use A2A when you’re building agent-to-agent workflows where no human is in the loop.


Two Protocols

Protocol Port Best For When to Use
A2A (Agent-to-Agent) 8200 AI agent frameworks (LangChain, CrewAI, AutoGen) Agent-to-agent automation, no human in the loop
MCP (Model Context Protocol) stdio Claude Desktop, Cursor, Windsurf Human-in-the-loop, interactive AI assistants

Quick Start

A2A Server

pip install goldenmatch[agent]
goldenmatch agent-serve --port 8200

Other agents discover GoldenMatch at:

GET http://localhost:8200/.well-known/agent.json

MCP (Claude Desktop)

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "goldenmatch": {
      "command": "goldenmatch",
      "args": ["mcp-serve", "--file", "customers.csv"]
    }
  }
}

Agent Capabilities (12 Skills)

Skill What It Does
analyze_data Profile columns, detect domain, recommend matching strategy
configure Generate optimal YAML config from data analysis (legacy heuristic path)
autoconfig v1.7-v1.12: run AutoConfigController; return committed config + telemetry (stop_reason, decisions, NE / Path Y)
controller_telemetry v1.7-v1.12: surface controller telemetry from the most recent call (stateless A2A dispatch → returns inline note pointing callers at autoconfig / deduplicate inline telemetry)
deduplicate Full pipeline with confidence-gated output, reasoning, and telemetry (v1.7+)
match Cross-source matching with intelligent strategy selection and telemetry (v1.7+)
explain Natural language explanation for any pair or cluster
review Present borderline matches for approval
compare_strategies Run multiple approaches, report metrics
pprl Privacy-preserving mode for sensitive data
quality Scan and fix data quality issues (encoding, Unicode, format violations) using GoldenCheck
transform Normalize data formats (phone E.164, dates ISO, categorical spelling) using GoldenFlow

How It Works

When an agent calls deduplicate, GoldenMatch:

  1. Profiles the data (column types, cardinality, null rates)
  2. Detects the domain (healthcare, financial, retail, people, etc.)
  3. Selects the best strategy:
  4. Generates a config (matchkeys, blocking, scoring)
  5. Runs the pipeline with confidence gating
  6. Returns results + reasoning

Reasoning Output

Every response includes the agent’s reasoning:

{
  "results": {
    "clusters": 42,
    "match_rate": "8.4%"
  },
  "reasoning": {
    "domain_detected": "people",
    "strategy_chosen": "exact_then_fuzzy",
    "why": "Email has 92% uniqueness -- strong exact key. Name has spelling variation -- jaro_winkler at 0.85.",
    "alternatives_considered": [
      {"strategy": "pprl", "why_not": "No sensitive fields detected."},
      {"strategy": "fellegi_sunter", "why_not": "Fuzzy gives better recall for this data."}
    ],
    "confidence_distribution": {
      "auto_merged": 38,
      "review_queue": 4,
      "auto_rejected": 0
    }
  },
  "storage": "memory"
}

Confidence-Gated Review Queue

Not all matches are equal. The agent splits results by confidence:

Confidence Action Count
> 0.95 Auto-merged into golden records High-confidence pairs
0.75 - 0.95 Held in review queue for approval Borderline pairs
< 0.75 Auto-rejected Low-confidence pairs

Storage Tiers

Tier Config Persists?
Memory Default (nothing to configure) No
SQLite Create a .goldenmatch/ directory Yes (local file)
Postgres Set DATABASE_URL env var Yes (shared DB)

The agent auto-detects which tier is available and reports it in every response.

Review Queue API

from goldenmatch import AgentSession

session = AgentSession()
result = session.deduplicate("customers.csv")

# Check what needs review
pending = session.review_queue.list_pending("customers")
for item in pending:
    print(f"Pair ({item.id_a}, {item.id_b}): score={item.score}")
    print(f"  Explanation: {item.explanation}")

# Approve or reject
session.review_queue.approve("customers", 0, 1, decided_by="human")
session.review_queue.reject("customers", 2, 3, decided_by="human", reason="Different entities")

# Stats
print(session.review_queue.stats("customers"))
# {"pending": 2, "approved": 1, "rejected": 1}

Python API

from goldenmatch import AgentSession

session = AgentSession()

# Analyze data and get strategy recommendation
analysis = session.analyze("customers.csv")
print(analysis["strategy"])  # "exact_then_fuzzy"
print(analysis["why"])

# Deduplicate with full reasoning
result = session.deduplicate("customers.csv")
print(result["results"]["clusters"])
print(result["reasoning"]["why"])

# Compare strategies
comparison = session.compare_strategies("customers.csv")
for strategy, metrics in comparison.items():
    print(f"{strategy}: {metrics['clusters']} clusters, {metrics['match_rate']:.1%} match rate")

# Match two sources
matches = session.match_sources("new_customers.csv", "master.csv")

# v1.7-v1.12: explicit AutoConfigController invocation
autoconf = session.autoconfigure("customers.csv")
print(autoconf["telemetry"]["stop_reason"])     # e.g. "green"
print(autoconf["telemetry"]["health"])          # e.g. "green"
for decision in autoconf["telemetry"]["decisions"]:
    print(f"iter {decision['iteration']}: {decision['rule_name']}")

# Telemetry is also cached on `deduplicate` / `match_sources` calls
session.deduplicate("customers.csv")
print(session.last_telemetry)                    # same shape as autoconfigure's telemetry

MCP Tools (14 Agent-Level)

Tool Description
analyze_data Profile data, detect domain, recommend strategy
auto_configure Generate optimal config
agent_deduplicate Full pipeline with reasoning
agent_match_sources Cross-source matching
agent_explain_pair Explain a pair match
agent_explain_cluster Explain a cluster
agent_review_queue Get pending reviews
agent_approve_reject Process review decisions
agent_compare_strategies Compare ER approaches
suggest_pprl Check if PPRL is needed
scan_quality Run GoldenCheck data quality scan, return issues without fixing
fix_quality Run GoldenCheck scan and apply fixes (safe or moderate mode)
run_transforms Run GoldenFlow transforms (phone E.164, dates ISO, Unicode)

These are additive – existing MCP tools (suggest_config, list_domains, etc.) continue to work.


A2A Agent Card

{
  "name": "goldenmatch-agent",
  "description": "Autonomous entity resolution agent.",
  "provider": {
    "organization": "GoldenMatch",
    "url": "https://github.com/benzsevern/goldenmatch"
  },
  "capabilities": {
    "streaming": true,
    "pushNotifications": false
  },
  "skills": [...]
}

Full card at: http://localhost:8200/.well-known/agent.json


Authentication

Set GOLDENMATCH_AGENT_TOKEN env var for bearer token auth. If not set, no auth required (suitable for local use).


See also

Topic Link
MCP Server (Claude Desktop) MCP Server
Quick start with Python/CLI Quick Start
Full Python API (101 exports) Python API
Configuration reference Configuration