
Overview

Prizm’s data quality management system continuously monitors, enforces, and improves the quality of your data assets using a combination of AI-driven rules, statistical profiling, and exception workflows.

Quality Framework

The Prizm quality framework consists of five interconnected layers:
1. Profiling: Statistical sampling of data assets to establish baselines (row counts, null rates, distributions, top values).
2. Metric Configuration: Define what to measure (completeness, freshness, validity, uniqueness, volume, consistency, or custom SQL).
3. Execution: Scheduled or event-triggered quality runs apply metrics to assets and produce Score records.
4. Scoring & Alerting: Quality scores are computed from the metric results; scores that fall below their thresholds trigger alerts and create exception records.
5. Exception Management: Failing records are captured as exceptions, routed to workflows, and tracked through resolution.

Metric Types

Category       Metric              Description
Observability  Volume              Row count monitoring and change detection
Observability  Freshness           Data currency and last-updated tracking
Observability  Schema Drift        Detection of schema changes
Quality        Completeness        Null/missing value rate
Quality        Uniqueness          Duplicate record detection
Quality        Validity            Format, pattern, and constraint checks
Quality        Consistency         Cross-table and cross-system value checks
Custom         Custom Query (CQ)   User-defined SQL quality rules
Business       Business Metric     KPI-level business logic evaluations
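To make the Quality rows of the table concrete, here is a minimal sketch of how completeness, uniqueness, and validity could each be expressed as a percentage score over a small in-memory set of rows. The function names, the dict-based row format, and the 0 to 100 scoring convention are all assumptions for illustration. They are not Prizm's API, which evaluates metrics against the connected data source.

```python
import re
from collections import Counter

# Hypothetical helpers: each returns a score in [0, 100]. Rows are plain dicts
# here; Prizm itself evaluates metrics against the connected data source.

def completeness(rows, column):
    """Percentage of rows whose `column` value is present and not None."""
    if not rows:
        return 100.0
    filled = sum(1 for r in rows if r.get(column) is not None)
    return 100.0 * filled / len(rows)


def uniqueness(rows, column):
    """Percentage of non-null values that appear exactly once (duplicates lower the score)."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 100.0
    counts = Counter(values)
    return 100.0 * sum(1 for v in values if counts[v] == 1) / len(values)


def validity(rows, column, pattern):
    """Percentage of non-null values that fully match a regex pattern."""
    values = [r[column] for r in rows if r.get(column) is not None]
    if not values:
        return 100.0
    regex = re.compile(pattern)
    return 100.0 * sum(1 for v in values if regex.fullmatch(str(v))) / len(values)


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "not-an-email"},
    {"id": 3, "email": "b@example.com"},
]
print(completeness(rows, "email"))  # 75.0
print(uniqueness(rows, "id"))       # 50.0
```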

AI-Driven Quality

Automatic Metric Recommendation

Prizm’s Quality Metric Recommendation Agent analyzes data asset characteristics and suggests appropriate metrics based on:
  • Column data types and distributions
  • Historical anomaly patterns
  • Criticality score and downstream usage
  • Industry and domain-specific quality patterns
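The following is a rule-based sketch of how profile characteristics and criticality might map to suggested metrics. It is only an illustration: the `ColumnProfile` fields, the cutoffs (1% nulls, 99% distinct, 5% distinct), and `recommend_metrics` are hypothetical. The sketch covers only the data-type/distribution and criticality signals. It does not model historical anomaly patterns or domain-specific patterns, which the actual agent also uses.

```python
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    name: str
    dtype: str             # hypothetical type labels: "int", "string", "timestamp"
    null_rate: float       # fraction of rows that are null (0..1)
    distinct_ratio: float  # distinct count / row count (0..1)


def recommend_metrics(columns, criticality):
    """Return (metric_type, column) suggestions using illustrative heuristics."""
    recs = []
    for col in columns:
        if col.null_rate < 0.01:        # almost always populated: guard it
            recs.append(("completeness", col.name))
        if col.distinct_ratio > 0.99:   # behaves like a key
            recs.append(("uniqueness", col.name))
        if col.dtype == "timestamp":    # can indicate data currency
            recs.append(("freshness", col.name))
        if col.dtype == "string" and col.distinct_ratio < 0.05:  # code/enum column
            recs.append(("validity", col.name))
    if criticality == "high":           # critical assets also get volume monitoring
        recs.append(("volume", None))
    return recs


cols = [
    ColumnProfile("order_id", "int", 0.0, 1.0),
    ColumnProfile("status", "string", 0.0, 0.001),
    ColumnProfile("updated_at", "timestamp", 0.2, 0.8),
]
print(recommend_metrics(cols, "high"))
```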

Adaptive Rule Generation

Quality rules evolve with your data. Prizm’s AI monitors rule effectiveness and:
  • Suggests threshold adjustments when false positive rates increase
  • Recommends new rules when new data patterns emerge
  • Automatically updates baselines when data distributions shift (configurable)
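Two of these behaviors lend themselves to a small sketch. The first loosens a threshold once too many triaged alerts turn out to be false positives. The second drifts a baseline toward new observations with an exponentially weighted moving average, behind an on/off switch that mirrors the "configurable" note. The 20% false-positive ceiling, the 0.5-point step, the EWMA weighting, and all function names are assumptions. The source does not say how Prizm's AI makes these adjustments.

```python
def false_positive_rate(alert_outcomes):
    """alert_outcomes: list of bools, True when a triaged alert was a false positive."""
    return sum(alert_outcomes) / len(alert_outcomes) if alert_outcomes else 0.0


def suggest_threshold(current_min_pct, alert_outcomes, max_fp_rate=0.2, step=0.5):
    """Suggest a looser minimum-percentage threshold when false positives exceed the ceiling."""
    if false_positive_rate(alert_outcomes) > max_fp_rate:
        return max(0.0, current_min_pct - step)
    return current_min_pct


def update_baseline(baseline, observed, alpha=0.1, auto_update=True):
    """EWMA baseline update; returns the old baseline when auto-update is disabled."""
    if not auto_update:
        return baseline
    return (1 - alpha) * baseline + alpha * observed


print(suggest_threshold(99.5, [True, True, False, False]))  # 99.0
```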

Data Profiling

Profiling creates a statistical snapshot of your data at multiple granularities.

Profile Attributes

For each column, Prizm captures:
  • Row count and null count
  • Distinct value count and cardinality
  • Min, max, mean, median, standard deviation
  • Top-N frequent values
  • Data type and format distributions
  • Pattern analysis (for string columns)
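The following sketch computes the attributes listed above for a single in-memory column using only the standard library. The output keys and the `shape` pattern encoding (letters become `A`, digits become `9`) are assumptions chosen for illustration and do not describe Prizm's profile format. Here, cardinality means distinct values divided by non-null values.

```python
import re
import statistics
from collections import Counter


def shape(value):
    """Coarse pattern: letters -> 'A', digits -> '9', other characters kept (e.g. 'AB-12' -> 'AA-99')."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def profile_column(values, top_n=3):
    """Hypothetical column profile covering counts, stats, top values, types, and patterns."""
    non_null = [v for v in values if v is not None]
    profile = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "top_values": Counter(non_null).most_common(top_n),
        "types": Counter(type(v).__name__ for v in non_null),
    }
    profile["cardinality"] = profile["distinct_count"] / len(non_null) if non_null else 0.0

    numeric = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric:
        profile.update(
            min=min(numeric),
            max=max(numeric),
            mean=statistics.mean(numeric),
            median=statistics.median(numeric),
            stdev=statistics.stdev(numeric) if len(numeric) > 1 else 0.0,
        )

    strings = [v for v in non_null if isinstance(v, str)]
    if strings:  # pattern analysis applies to string values only
        profile["patterns"] = Counter(shape(s) for s in strings).most_common(top_n)
    return profile


print(profile_column([1, 2, 2, None, 5]))
```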

Profile Scheduling

Profiles can be triggered by:
  • Scheduled runs — CRON-based or intelligent frequency (see Scheduling)
  • Change-triggered — Automatic re-profile when schema or volume changes are detected
  • On-demand — Manual trigger from the UI or via API
  • Pipeline events — Triggered by upstream job completion in Airflow or dbt
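A change-triggered re-profile needs some rule for deciding when a change is significant. The sketch below assumes one possible rule: any schema change triggers a re-profile, and so does a relative row-count change above a tolerance. The `Snapshot` type, the 10% default tolerance, and `reprofile_reason` are hypothetical. The source does not specify Prizm's detection criteria.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    columns: dict   # column name -> type name
    row_count: int


def reprofile_reason(previous, current, volume_tolerance=0.1):
    """Return why a re-profile should run ("schema_change" / "volume_change"), or None."""
    if previous.columns != current.columns:
        return "schema_change"
    base = max(previous.row_count, 1)  # avoid dividing by zero for empty tables
    if abs(current.row_count - previous.row_count) / base > volume_tolerance:
        return "volume_change"
    return None


prev = Snapshot({"id": "int"}, 1000)
print(reprofile_reason(prev, Snapshot({"id": "int"}, 1200)))  # volume_change
```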

Incremental Profiling

For large tables, Prizm supports incremental profiling strategies:
Full Profile       → Complete statistical scan of all records
Incremental        → Profile only new/changed records since last run
Sampling           → Statistical sample (configurable %)
Filter             → Profile a specific partition or date range
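Each of these strategies amounts to choosing which records feed the profiler. The following sketch shows that selection over in-memory rows. It assumes each row carries an `updated_at` date, and it uses a seeded random sample for the Sampling strategy. The function and parameter names are hypothetical. Against a warehouse, the same choices would normally be pushed down into the query (for example, a `WHERE` clause or a table-sampling clause).

```python
import random
from datetime import date


def select_records(rows, strategy, *, last_run=None, sample_pct=10.0,
                   date_range=None, seed=0):
    """Pick the rows to profile for a given (hypothetical) strategy name."""
    if strategy == "full":          # complete scan of all records
        return list(rows)
    if strategy == "incremental":   # only records changed since the last run
        return [r for r in rows if r["updated_at"] > last_run]
    if strategy == "sampling":      # configurable-percentage random sample
        k = min(len(rows), max(1, round(len(rows) * sample_pct / 100)))
        return random.Random(seed).sample(rows, k)
    if strategy == "filter":        # a specific date range (partition)
        start, end = date_range
        return [r for r in rows if start <= r["updated_at"] <= end]
    raise ValueError(f"unknown strategy: {strategy}")


rows = [{"id": i, "updated_at": date(2024, 1, i)} for i in range(1, 11)]
print(len(select_records(rows, "incremental", last_run=date(2024, 1, 7))))  # 3
```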

Exception Management

When a quality check fails, Prizm creates Exception records capturing:
  • The failing asset and attribute
  • The metric that failed and the expected vs. actual value
  • Sample records that violated the rule
  • Exception severity and business impact classification
  • Workflow routing (ServiceNow, Jira, internal approval)
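The record below mirrors the fields in the list above, so an exception might be modeled roughly like this. The `QualityException` class, the severity bands (based on the gap between expected and actual), the impact mapping from asset criticality, and the five-row sample cap are all illustrative assumptions, not Prizm's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class QualityException:
    asset: str
    attribute: str
    metric: str
    expected: float
    actual: float
    sample_records: list
    severity: str
    business_impact: str
    route_to: str                      # e.g. "servicenow", "jira", "internal_approval"
    state: str = "Detected"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def create_exception(asset, column, metric, expected, actual, failing_rows,
                     criticality="medium", route_to="jira", sample_size=5):
    """Build an exception record from a failed check (hypothetical severity rules)."""
    gap = expected - actual
    severity = "high" if gap > 5 else "medium" if gap > 1 else "low"
    impact = "high" if criticality == "high" else "standard"
    return QualityException(asset, column, metric, expected, actual,
                            failing_rows[:sample_size], severity, impact, route_to)


failing = [{"order_id": i, "customer_id": None} for i in range(10)]
exc = create_exception("sales.orders", "customer_id", "completeness",
                       99.5, 97.0, failing, criticality="high")
print(exc.severity, exc.business_impact, len(exc.sample_records))
```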

Exception Workflow States

Detected → Under Review → Assigned → In Progress → Resolved
                                                  ↘ Accepted as Known Issue
                                                  ↘ Suppressed
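The workflow can be enforced as a small state machine. This sketch reads the diagram's branch arrows as leaving In Progress, so an exception in that state can end in Resolved, Accepted as Known Issue, or Suppressed. That reading is an assumption. The `TRANSITIONS` table and `advance` function are hypothetical names.

```python
# Allowed transitions, assuming the branch arrows leave "In Progress".
TRANSITIONS = {
    "Detected": {"Under Review"},
    "Under Review": {"Assigned"},
    "Assigned": {"In Progress"},
    "In Progress": {"Resolved", "Accepted as Known Issue", "Suppressed"},
}
TERMINAL = {"Resolved", "Accepted as Known Issue", "Suppressed"}


def advance(state, target):
    """Move an exception to `target`, rejecting transitions the workflow does not allow."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {target}")
    return target


state = "Detected"
for step in ["Under Review", "Assigned", "In Progress", "Suppressed"]:
    state = advance(state, step)
print(state, state in TERMINAL)  # Suppressed True
```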

YAML-Based Quality (DQaC)

Data Quality as Code (DQaC) allows engineering teams to define quality metrics in YAML, enabling version-controlled, CI/CD-integrated data quality.
```yaml
# Example DQaC metric definition
metrics:
  - name: orders_completeness
    asset: sales.orders
    type: completeness
    column: customer_id
    threshold:
      min_valid_percentage: 99.5
    schedule: "0 6 * * *"
    alert:
      channels: [slack, jira]
      severity: high
```

See YAML Metrics — DQaC for the full specification.
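The example metric can be read as a threshold check, which the following sketch illustrates. It represents the YAML as the equivalent Python dict that a YAML loader would produce, so it needs only the standard library. The `evaluate` function, the result and alert dict shapes, and the treatment of `min_valid_percentage` as an inclusive minimum are assumptions. The DQaC specification defines the actual semantics.

```python
# Dict equivalent of the example YAML above after parsing.
config = {
    "metrics": [{
        "name": "orders_completeness",
        "asset": "sales.orders",
        "type": "completeness",
        "column": "customer_id",
        "threshold": {"min_valid_percentage": 99.5},
        "schedule": "0 6 * * *",   # cron: every day at 06:00
        "alert": {"channels": ["slack", "jira"], "severity": "high"},
    }]
}


def evaluate(metric, rows):
    """Evaluate one completeness metric against in-memory rows (hypothetical runner)."""
    if metric["type"] != "completeness":
        raise NotImplementedError(metric["type"])
    col = metric["column"]
    valid = sum(r.get(col) is not None for r in rows)
    valid_pct = 100.0 * valid / len(rows) if rows else 100.0
    minimum = metric["threshold"]["min_valid_percentage"]
    passed = valid_pct >= minimum   # assumed inclusive minimum
    alert = None if passed else {
        "channels": metric["alert"]["channels"],
        "severity": metric["alert"]["severity"],
        "message": f"{metric['name']} on {metric['asset']}: {valid_pct:.2f}% < {minimum}%",
    }
    return {"metric": metric["name"], "value": valid_pct, "passed": passed, "alert": alert}


rows = [{"customer_id": i} for i in range(198)] + [{"customer_id": None}] * 2
print(evaluate(config["metrics"][0], rows)["passed"])  # False
```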

Quality Scoring

Every asset receives a composite Quality Score based on:
  • Weighted average of all active metric scores
  • Criticality weighting (high-criticality assets have tighter thresholds)
  • Historical trend (a downward-trending score triggers alerts earlier)
Scores are surfaced in:
  • Asset detail pages
  • Domain and product dashboards
  • Executive-level quality scorecards
  • Lineage views (score overlaid on lineage graph)
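The three scoring inputs described above could combine roughly as follows: a weighted average of metric scores, per-criticality alert thresholds, and an early alert when the score has declined steadily and is close to its threshold. The threshold values, the three-run trend window, the 2-point margin, and the function names are hypothetical. The source does not publish Prizm's weights or formula.

```python
# Hypothetical per-criticality alert thresholds: tighter for critical assets.
CRITICALITY_THRESHOLDS = {"high": 95.0, "medium": 90.0, "low": 80.0}


def composite_score(metric_scores, weights=None):
    """Weighted average of active metric scores (equal weights by default)."""
    weights = weights or {name: 1.0 for name in metric_scores}
    total = sum(weights[name] for name in metric_scores)
    return sum(metric_scores[name] * weights[name] for name in metric_scores) / total


def should_alert(score, history, criticality, trend_window=3, trend_margin=2.0):
    """Alert below threshold, or earlier when the score keeps falling near the threshold."""
    threshold = CRITICALITY_THRESHOLDS[criticality]
    if score < threshold:
        return True
    recent = history[-trend_window:] + [score]
    declining = len(recent) > 1 and all(b < a for a, b in zip(recent, recent[1:]))
    return declining and score < threshold + trend_margin


score = composite_score({"completeness": 100.0, "validity": 90.0},
                        {"completeness": 3.0, "validity": 1.0})
print(score)  # 97.5
```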

Related Pages

  • Score Entity: Detailed schema for quality score storage and relationships.
  • AI Metric Recommendations: How Prizm suggests the right metrics for your assets.
  • Scheduling: Configure profiling and quality run schedules.
  • Exceptions: Manage and resolve data quality exceptions.