GoldenCheck configuration lives in goldencheck.yml in your working directory. The file is optional — GoldenCheck works without one. You create it by pinning findings in the TUI and pressing F2, or by writing it by hand.
Full Reference
version: 1
settings:
sample_size: 100000 # rows to sample for large files
severity_threshold: warning
fail_on: error # exit code 1 on: error | warning
columns:
email:
type: string
required: true
format: email
unique: true
age:
type: integer
range: [0, 120]
outlier_stddev: 3.0
status:
type: string
enum: [active, inactive, pending, closed]
notes:
type: string
nullable: true
relations:
- type: temporal_order
columns: [start_date, end_date]
- type: null_correlation
columns: [billing_address, billing_zip]
ignore:
- column: phone
check: pattern_consistency
- column: notes
check: nullability
settings
| Field | Type | Default | Description |
|---|---|---|---|
sample_size | int | 100000 | Maximum rows used for profiling. Files larger than this are randomly sampled. |
severity_threshold | string | "warning" | Minimum severity level to include in reports. Values: info, warning, error. |
fail_on | string | "error" | Exit code 1 is returned when any finding at or above this severity is present during validate. Values: error, warning. |
columns
Each key is a column name. All column fields are optional — you only need to specify the constraints you care about.
| Field | Type | Description |
|---|---|---|
type | string | Expected data type. Values: string, integer, float, boolean, date, datetime. |
required | bool | If true, no null values are allowed. |
nullable | bool | Explicit override — if true, nulls are expected and not flagged. |
format | string | Expected string format. Values: email, phone, url. |
unique | bool | If true, all values must be unique. |
range | [min, max] | Inclusive numeric range. Values outside this range are flagged as errors. |
enum | list[string] | Allowed values. Any value not in this list is flagged as an error. |
outlier_stddev | float | Standard deviations threshold for outlier detection. Default is 3.0. |
Column rule examples
columns:
# Email column — required, unique, must be valid email format
email:
type: string
required: true
format: email
unique: true
# Age — integer, bounded range, custom outlier threshold
age:
type: integer
range: [0, 120]
outlier_stddev: 4.0
# Status — fixed set of allowed values
status:
type: string
enum: [active, inactive, pending, closed]
# Notes — explicitly nullable, no format constraint
notes:
type: string
nullable: true
relations
Cross-column rules. Each relation has a type and a columns list.
temporal_order
Validates that the first column is always less than or equal to the second column. Both columns must be date or datetime types (or parseable as %Y-%m-%d strings).
relations:
- type: temporal_order
columns: [start_date, end_date]
- type: temporal_order
columns: [created_at, updated_at]
null_correlation
Documents that these columns are expected to be null/non-null together. Violations (one null, the other non-null) are flagged as warnings.
relations:
- type: null_correlation
columns: [billing_address, billing_city, billing_zip]
ignore
The ignore list prevents specific findings from appearing in future scans. Use this to dismiss false positives permanently. Each entry requires both a column name and a check name.
ignore:
- column: phone
check: pattern_consistency # mixed formats are expected
- column: notes
check: nullability # nulls are fine in free-text fields
- column: legacy_id
check: type_inference # intentionally stored as string
Check names correspond to the check field on a Finding:
| Check name | Raised by |
|---|---|
type_inference | TypeInferenceProfiler |
nullability | NullabilityProfiler |
uniqueness | UniquenessProfiler |
format_detection | FormatDetectionProfiler |
range_distribution | RangeDistributionProfiler |
cardinality | CardinalityProfiler |
pattern_consistency | PatternConsistencyProfiler |
temporal_order | TemporalOrderProfiler |
null_correlation | NullCorrelationProfiler |
Semantic Types
GoldenCheck ships with built-in semantic type definitions (goldencheck/semantic/types.py). You can override or extend these by placing a goldencheck_types.yaml file in your project directory.
types.yaml (built-in, read-only)
Located at goldencheck/semantic/types.yaml. Contains the canonical list of built-in semantic types and their detection heuristics:
version: 1
types:
email:
keywords: [email, e_mail, mail]
format: email
phone:
keywords: [phone, mobile, cell, tel]
format: phone
name:
keywords: [name, first_name, last_name, full_name, surname]
id:
keywords: [id, uuid, guid, identifier]
unique: true
currency:
keywords: [price, amount, cost, total, revenue, fee]
numeric: true
date:
keywords: [date, created_at, updated_at, timestamp]
temporal: true
category:
keywords: [status, type, category, tier, group]
low_cardinality: true
goldencheck_types.yaml (project-level, user-defined)
Place this file in your project root to add custom semantic types or override built-in ones. Custom types are merged on top of the built-in types:
version: 1
types:
sku:
keywords: [sku, product_code, item_code]
pattern: "^[A-Z]{2,4}-\\d{4,8}$"
customer_tier:
keywords: [tier, plan, subscription_level]
enum: [free, pro, enterprise]
low_cardinality: true
Custom type definitions support:
| Field | Description |
|---|---|
keywords | Column name substrings that trigger this type (case-insensitive) |
format | Expected format: email, phone, url |
pattern | Regex pattern that values should match |
enum | List of allowed values |
unique | Whether values are expected to be unique |
numeric | Whether values are expected to be numeric |
temporal | Whether values are expected to be dates/datetimes |
low_cardinality | Whether the column is expected to be a small enum |
Layering Strategy
GoldenCheck does not auto-generate a config on scan. The file only contains what you explicitly pin. This keeps configs small and meaningful.
Recommended workflow:
goldencheck scan data.csv— explore findings, no config needed- In the TUI, pin findings that represent real business rules (press Space)
- Press F2 to export to
goldencheck.yml - Edit the YAML by hand if needed (e.g., tighten a
range, add anenum) goldencheck validate data.csvin CI — only the rules you care about are enforced
Multiple environments: Use --config to point at different rule files:
# Stricter rules for production
goldencheck validate data.csv --config configs/production.yml --fail-on warning
# Relaxed rules for development data
goldencheck validate data.csv --config configs/dev.yml