Data Quality Overview - DQLabs PRIZM

Overview

Prizm’s data quality management system continuously monitors, enforces, and improves the quality of your data assets using a combination of AI-driven rules, statistical profiling, and exception workflows.

Quality Framework

The Prizm quality framework consists of five interconnected layers:

Profiling

Statistical sampling of data assets to establish baselines — row counts, null rates, distributions, top values.

Metric Configuration

Define what to measure: completeness, freshness, validity, uniqueness, volume, consistency, or custom SQL.

Execution

Scheduled or event-triggered quality runs apply metrics to assets and produce Score records.

Scoring & Alerting

Quality scores are computed from the results of each run. Scores that fall below their configured thresholds trigger alerts and create exception records.

Exception Management

Failing records are captured as exceptions, routed to workflows, and tracked through resolution.

Metric Types

Category	Metric	Description
Observability	Volume	Row count monitoring and change detection
Observability	Freshness	Data currency and last-updated tracking
Observability	Schema Drift	Detection of schema changes
Quality	Completeness	Null/missing value rate
Quality	Uniqueness	Duplicate record detection
Quality	Validity	Format, pattern, and constraint checks
Quality	Consistency	Cross-table and cross-system value checks
Custom	Custom Query (CQ)	User-defined SQL quality rules
Business	Business Metric	KPI-level business logic evaluations
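To make the Quality rows of the table concrete, here is a minimal Python sketch of how completeness, uniqueness, and validity could be computed over in-memory rows. It is an illustration only, not Prizm's implementation: the sample rows, column names, function names, and the email pattern are all hypothetical.

```python
import re

# Hypothetical sample rows; the column names are illustrative only.
ROWS = [
    {"order_id": 1, "customer_id": "C001", "email": "a@example.com"},
    {"order_id": 2, "customer_id": None, "email": "b@example.com"},
    {"order_id": 2, "customer_id": "C003", "email": "not-an-email"},
    {"order_id": 4, "customer_id": "C004", "email": "d@example.com"},
]


def _non_null(rows, column):
    """Return the non-null values of a column."""
    return [r[column] for r in rows if r.get(column) is not None]


def completeness(rows, column):
    """Percentage of rows where the column is not null."""
    if not rows:
        return 100.0
    return 100.0 * len(_non_null(rows, column)) / len(rows)


def uniqueness(rows, column):
    """Percentage of non-null values that are distinct."""
    values = _non_null(rows, column)
    if not values:
        return 100.0
    return 100.0 * len(set(values)) / len(values)


def validity(rows, column, pattern):
    """Percentage of non-null values that fully match a regex pattern."""
    values = _non_null(rows, column)
    if not values:
        return 100.0
    regex = re.compile(pattern)
    valid = sum(1 for v in values if regex.fullmatch(str(v)))
    return 100.0 * valid / len(values)


if __name__ == "__main__":
    print(completeness(ROWS, "customer_id"))
    print(uniqueness(ROWS, "order_id"))
    print(validity(ROWS, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+"))
```

Each function returns a percentage so that results can be compared against a threshold in the same way regardless of metric type.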

AI-Driven Quality

Automatic Metric Recommendation

Prizm’s Quality Metric Recommendation Agent analyzes data asset characteristics and suggests appropriate metrics based on:

Column data types and distributions
Historical anomaly patterns
Criticality score and downstream usage
Industry and domain-specific quality patterns

Adaptive Rule Generation

Quality rules evolve with your data. Prizm’s AI monitors rule effectiveness and:

Suggests threshold adjustments when false positive rates increase
Recommends new rules when new data patterns emerge
Automatically updates baselines when data distributions shift (configurable)
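The documentation does not describe how a distribution shift is detected, so the following is only a rough sketch of one simple heuristic: compare the mean of recent observations with the baseline mean, measured in baseline standard deviations. The function names, the `z_limit` default, and the `auto_update` flag (standing in for the "configurable" behavior) are assumptions.

```python
import statistics


def baseline_shifted(baseline, recent, z_limit=3.0):
    """Return True when the recent mean is far from the baseline mean.

    Distance is measured in baseline standard deviations. This is a
    simple illustrative heuristic, not Prizm's detection method.
    """
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # needs at least two baseline points
    recent_mu = statistics.mean(recent)
    if sigma == 0:
        return recent_mu != mu
    return abs(recent_mu - mu) / sigma > z_limit


def maybe_update_baseline(baseline, recent, auto_update=True, z_limit=3.0):
    """Adopt the recent values as the new baseline if a shift is detected."""
    if auto_update and baseline_shifted(baseline, recent, z_limit):
        return list(recent)
    return list(baseline)
```

With `auto_update=False` the baseline is never replaced, which mirrors a setup where shifts are only reported for review.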

Data Profiling

Profiling creates a statistical snapshot of your data at multiple granularities.

Profile Attributes

For each column, Prizm captures:

Row count and null count
Distinct value count and cardinality
Min, max, mean, median, standard deviation
Top-N frequent values
Data type and format distributions
Pattern analysis (for string columns)
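The attributes listed above can be sketched with the Python standard library. This is a simplified illustration for a single in-memory column; the function names, the dictionary keys, and the letter/digit pattern encoding are assumptions rather than Prizm's actual profile format.

```python
import statistics
from collections import Counter


def value_pattern(text):
    """Map letters to 'A' and digits to '9', keeping other characters."""
    return "".join(
        "A" if c.isalpha() else "9" if c.isdigit() else c for c in text
    )


def profile_column(values, top_n=3):
    """Build a small statistical profile for one column of values."""
    non_null = [v for v in values if v is not None]
    profile = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "top_values": Counter(non_null).most_common(top_n),
    }
    # Numeric statistics only apply to int/float values (bool excluded).
    numeric = [
        v for v in non_null
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric:
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
        profile["mean"] = statistics.mean(numeric)
        profile["median"] = statistics.median(numeric)
        profile["stdev"] = statistics.stdev(numeric) if len(numeric) > 1 else 0.0
    # Pattern analysis only applies to string values.
    strings = [v for v in non_null if isinstance(v, str)]
    if strings:
        profile["top_patterns"] = Counter(
            value_pattern(s) for s in strings
        ).most_common(top_n)
    return profile
```

For example, `profile_column(["AB-123", "CD-456", None])` reports one null and a single dominant pattern, `AA-999`.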

Profile Scheduling

Profiles can be triggered by:

Scheduled runs — CRON-based or intelligent frequency (see Scheduling)
Change-triggered — Automatic re-profile when schema or volume changes are detected
On-demand — Manual trigger from the UI or via API
Pipeline events — Triggered by upstream job completion in Airflow or dbt

Incremental Profiling

For large tables, Prizm supports incremental profiling strategies:

Full Profile       → Complete statistical scan of all records
Incremental        → Profile only new/changed records since last run
Sampling           → Statistical sample (configurable %)
Filter             → Profile a specific partition or date range
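The four strategies differ only in which records are handed to the profiler. The sketch below shows that selection step over in-memory rows. It assumes a hypothetical `updated_at` field on each row and hypothetical strategy names and parameters; a real deployment would push this selection down to the warehouse.

```python
import random
from datetime import date


def select_rows(rows, strategy, *, last_run=None, sample_pct=10.0,
                date_range=None, seed=0):
    """Return the subset of rows to profile for a given strategy."""
    rows = list(rows)
    if strategy == "full":
        # Complete scan of all records.
        return rows
    if strategy == "incremental":
        # Only records changed since the last run; full scan on first run.
        if last_run is None:
            return rows
        return [r for r in rows if r["updated_at"] > last_run]
    if strategy == "sampling":
        # Random sample of roughly sample_pct percent (at least one row).
        if not rows:
            return []
        k = min(len(rows), max(1, round(len(rows) * sample_pct / 100)))
        return random.Random(seed).sample(rows, k)
    if strategy == "filter":
        # Inclusive date range, for example a single partition.
        start, end = date_range
        return [r for r in rows if start <= r["updated_at"] <= end]
    raise ValueError(f"unknown profiling strategy: {strategy}")


if __name__ == "__main__":
    demo = [{"id": i, "updated_at": date(2024, 1, i)} for i in range(1, 11)]
    print(len(select_rows(demo, "sampling", sample_pct=30)))
```

A fixed `seed` keeps the sample reproducible between runs, which makes profile-to-profile comparisons less noisy.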

Exception Management

When a quality check fails, Prizm creates Exception records capturing:

The failing asset and attribute
The metric that failed and the expected vs. actual value
Sample records that violated the rule
Exception severity and business impact classification
Workflow routing (ServiceNow, Jira, internal approval)

Exception Workflow States

Detected → Under Review → Assigned → In Progress → Resolved
                                                  ↘ Accepted as Known Issue
                                                  ↘ Suppressed
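The state flow above can be modeled as a small transition table. The sketch below assumes that the three terminal states (Resolved, Accepted as Known Issue, Suppressed) all branch from In Progress, which is one reading of the diagram; the `ALLOWED` table and `advance` function are hypothetical names.

```python
# Allowed transitions, read from the workflow diagram. Assumption: the
# three terminal states are all reached from "In Progress".
ALLOWED = {
    "Detected": {"Under Review"},
    "Under Review": {"Assigned"},
    "Assigned": {"In Progress"},
    "In Progress": {"Resolved", "Accepted as Known Issue", "Suppressed"},
    "Resolved": set(),
    "Accepted as Known Issue": set(),
    "Suppressed": set(),
}


def advance(current, target):
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move exception from {current!r} to {target!r}")
    return target


if __name__ == "__main__":
    state = "Detected"
    for step in ("Under Review", "Assigned", "In Progress", "Resolved"):
        state = advance(state, step)
    print(state)
```

Keeping the transitions in one table makes it easy to audit which moves are permitted and to reject shortcuts such as jumping from Detected straight to Resolved.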

YAML-Based Quality (DQaC)

Data Quality as Code (DQaC) allows engineering teams to define quality metrics in YAML, enabling version-controlled, CI/CD-integrated data quality.

# Example DQaC metric definition
metrics:
  - name: orders_completeness
    asset: sales.orders
    type: completeness
    column: customer_id
    threshold:
      min_valid_percentage: 99.5
    schedule: "0 6 * * *"
    alert:
      channels: [slack, jira]
      severity: high

See YAML Metrics — DQaC for the full specification.
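Because DQaC definitions live in version control, a CI job can lint them before they are deployed. The sketch below validates a definition that has already been parsed into a Python dictionary mirroring the example above. The set of required fields, the accepted type names (including `custom_sql`), and the specific checks are assumptions for illustration; the authoritative rules are in the DQaC specification.

```python
# Dictionary form of the example metric definition shown above.
METRIC = {
    "name": "orders_completeness",
    "asset": "sales.orders",
    "type": "completeness",
    "column": "customer_id",
    "threshold": {"min_valid_percentage": 99.5},
    "schedule": "0 6 * * *",
    "alert": {"channels": ["slack", "jira"], "severity": "high"},
}

# Assumed for this sketch; not taken from the DQaC specification.
REQUIRED_FIELDS = ("name", "asset", "type", "schedule")
KNOWN_TYPES = {
    "completeness", "freshness", "validity", "uniqueness",
    "volume", "consistency", "custom_sql",
}


def validate_metric(metric):
    """Return a list of human-readable problems; empty means it passed."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not metric.get(field):
            errors.append(f"missing required field: {field}")
    metric_type = metric.get("type")
    if metric_type and metric_type not in KNOWN_TYPES:
        errors.append(f"unknown metric type: {metric_type}")
    schedule = metric.get("schedule")
    if schedule and len(str(schedule).split()) != 5:
        errors.append("schedule must be a five-field CRON expression")
    pct = (metric.get("threshold") or {}).get("min_valid_percentage")
    if pct is not None and not 0 <= pct <= 100:
        errors.append("min_valid_percentage must be between 0 and 100")
    return errors


if __name__ == "__main__":
    print(validate_metric(METRIC))
```

A CI step could fail the build whenever `validate_metric` returns a non-empty list, so that malformed definitions never reach a scheduled run.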

Quality Scoring

Every asset receives a composite Quality Score based on:

Weighted average of all active metric scores
Criticality weighting (high-criticality assets have tighter thresholds)
Historical trend (a downward-trending score triggers earlier alerts)
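The three factors above can be illustrated with a short sketch: a weighted average for the composite score, plus an alert threshold that tightens for critical assets and for downward trends. The exact formula Prizm uses is not documented here, so the margin values, default weights, and function names below are all assumptions.

```python
# Hypothetical margins (in score points) added to the base threshold.
CRITICALITY_MARGIN = {"low": 0.0, "medium": 2.5, "high": 5.0}
TREND_MARGIN = 2.5


def composite_score(metric_scores, weights=None):
    """Weighted average of metric scores; missing weights default to 1.0."""
    if not metric_scores:
        raise ValueError("at least one metric score is required")
    weights = weights or {}
    total_weight = sum(weights.get(name, 1.0) for name in metric_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(
        score * weights.get(name, 1.0) for name, score in metric_scores.items()
    )
    return weighted / total_weight


def effective_threshold(base, criticality="low", trending_down=False):
    """Raise the alert threshold for critical assets and falling scores."""
    threshold = base + CRITICALITY_MARGIN[criticality]
    if trending_down:
        threshold += TREND_MARGIN
    return min(threshold, 100.0)


def should_alert(score, base, criticality="low", trending_down=False):
    """Alert when the score is below the effective threshold."""
    return score < effective_threshold(base, criticality, trending_down)
```

Under these assumed margins, a score of 92 against a base threshold of 90 stays quiet for a low-criticality asset but alerts for a high-criticality one, whose effective threshold is 95.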

Scores are surfaced in:

Asset detail pages
Domain and product dashboards
Executive-level quality scorecards
Lineage views (score overlaid on lineage graph)

Related Documentation

Score Entity

Detailed schema for quality score storage and relationships.

AI Metric Recommendations

How Prizm suggests the right metrics for your assets.

Scheduling

Configure profiling and quality run schedules.

Exceptions

Manage and resolve data quality exceptions.
