Overview

Prizm’s autonomous intelligence is built on a 5-level architecture that progresses from basic metadata collection to sophisticated, self-directed action management. The system is designed to maximize automation while preserving meaningful human oversight at the right decision points.

The 5-Level Architecture

Level 1: Data Foundation Layer

The base layer stores all essential metadata components in the MetaStore:
Component            Description
T (Tables)           Core table and view metadata
O (Objects)          Database objects and schemas
L (Lineage)          Data flow and dependency tracking
U (Usage)            Query frequency and access patterns
P (Performance)      Job execution and cost metrics
C (Cost)             Compute and storage cost signals
B (Business terms)   Semantic business vocabulary

Level 2: Data Intelligence Layer

  • Profile Snapshot — Attribute-level profiling using a default 7-day window or percentage-based sampling to establish baseline data characteristics
  • Semantic Classification — Automatically identifies and assigns business terms to data elements based on content and context analysis
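
As a rough illustration of a profile snapshot, the sketch below profiles one column over a trailing 7-day window or over a percentage sample. The function name, the row layout (a list of dicts), and the statistics collected are assumptions made for this example, not Prizm's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import random


@dataclass
class ColumnProfile:
    row_count: int
    null_rate: float
    distinct_count: int
    minimum: object
    maximum: object


def profile_column(rows, column, ts_column, now,
                   window_days=7, sample_pct=None, seed=0):
    """Profile one column over a trailing window or a percentage sample.

    All names here are hypothetical; rows is a list of dicts.
    """
    if sample_pct is not None:
        # Percentage-based sampling: keep roughly sample_pct percent of rows.
        rng = random.Random(seed)
        selected = [r for r in rows if rng.random() * 100 < sample_pct]
    else:
        # Window-based selection: keep rows from the trailing window_days days.
        cutoff = now - timedelta(days=window_days)
        selected = [r for r in rows if r[ts_column] >= cutoff]

    values = [r[column] for r in selected]
    non_null = [v for v in values if v is not None]
    return ColumnProfile(
        row_count=len(values),
        null_rate=(len(values) - len(non_null)) / len(values) if values else 0.0,
        distinct_count=len(set(non_null)),
        minimum=min(non_null) if non_null else None,
        maximum=max(non_null) if non_null else None,
    )
```

Calling `profile_column(rows, "amount", "loaded_at", now)` uses the default 7-day window; passing `sample_pct=10` switches to a sample of roughly 10% of rows instead.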

Level 3: Autonomous Decision Layer

  • Criticality Scoring — Analyzes data assets to determine business importance and assigns priority levels for monitoring and governance
  • Schedule Intelligence — Optimizes profiling frequency and resource allocation based on data criticality, change patterns, and system load
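
One way to picture how criticality scoring could feed schedule intelligence is sketched below. The input signals, weights, caps, and cut-offs are illustrative assumptions rather than Prizm's actual scoring model.

```python
from dataclasses import dataclass


@dataclass
class AssetSignals:
    queries_per_day: float    # from usage metadata (U)
    downstream_assets: int    # from lineage metadata (L)
    has_business_term: bool   # from business-term metadata (B)


def criticality_score(s, max_queries=1000.0, max_downstream=50):
    """Return a score in [0, 1]; weights and caps are assumptions."""
    usage = min(s.queries_per_day / max_queries, 1.0)
    fan_out = min(s.downstream_assets / max_downstream, 1.0)
    business = 1.0 if s.has_business_term else 0.0
    return 0.5 * usage + 0.3 * fan_out + 0.2 * business


def priority_level(score):
    """Map a score to a priority level using hypothetical cut-offs."""
    if score >= 0.7:
        return "high"
    if score >= 0.3:
        return "medium"
    return "low"


def profiling_interval_hours(priority, recent_drift=False):
    """Daily, weekly, or monthly cadence; halved when drift was seen recently."""
    base = {"high": 24, "medium": 24 * 7, "low": 24 * 30}[priority]
    return max(base // 2, 1) if recent_drift else base
```

A heavily queried table with many downstream consumers lands in the "high" bucket and is profiled daily, while a rarely used table falls back to a monthly cadence unless drift shortens it.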

Level 4: Recommendation Layer

  • Metric Recommendations — Suggests appropriate quality metrics (standard and custom) based on data characteristics and usage patterns
  • Quality Engine (Q & CQ) — Powers the recommendation system for both standard quality (Q) and custom quality (CQ) metrics
  • AI Stewardship Queue — The task queue and intelligence engine that orchestrates autonomous operations across the platform

Level 5: Action Layer

Actions are categorized into three states based on confidence and risk:

AI Completed

Fully automated resolution — Prizm takes action without human intervention based on high-confidence signals.

Human Assisted

Partial automation with human guidance — Prizm surfaces a recommendation and waits for steward approval.

Action Needed

Requires manual intervention — Issue is flagged for human investigation and resolution.
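
The three action states can be thought of as the output of a routing rule over confidence and risk. The sketch below is a minimal illustration; the threshold values and the `route_action` function are hypothetical, not documented Prizm settings.

```python
from enum import Enum


class ActionState(Enum):
    AI_COMPLETED = "AI Completed"
    HUMAN_ASSISTED = "Human Assisted"
    ACTION_NEEDED = "Action Needed"


def route_action(confidence, risk,
                 auto_min_confidence=0.9,
                 assist_min_confidence=0.6,
                 auto_max_risk=0.2):
    """Pick an action state from confidence and risk, both in [0, 1].

    Thresholds are assumptions chosen for illustration.
    """
    # High confidence and low risk: resolve without human intervention.
    if confidence >= auto_min_confidence and risk <= auto_max_risk:
        return ActionState.AI_COMPLETED
    # Reasonable confidence: surface a recommendation for steward approval.
    if confidence >= assist_min_confidence:
        return ActionState.HUMAN_ASSISTED
    # Otherwise flag the issue for manual investigation.
    return ActionState.ACTION_NEEDED
```

Under these assumed thresholds, a high-confidence fix on a high-risk asset is still routed to a steward rather than applied automatically.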

Autonomous Intelligence Capabilities

Data Quality Intelligence

  • Automated profiling — Continuously scan data to identify patterns, anomalies, and statistical properties without manual intervention
  • Self-healing pipelines — Detect and correct data quality issues in real time based on predefined rules and ML models
  • Smart validation — Apply contextual rules that adapt to changing data patterns and automatically flag inconsistencies
  • Drift detection — Monitor and alert on changes in data distributions that might indicate quality issues
  • Anomaly detection — Identify outliers and unusual patterns that may represent data quality problems
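
Drift detection compares a current distribution against a baseline. The source does not say which measure Prizm uses; the sketch below uses the Population Stability Index (PSI) as one common choice, with hypothetical helper names.

```python
import math


def histogram(values, edges):
    """Count values into len(edges) - 1 bins; out-of-range values are clamped."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return counts


def psi(baseline_counts, current_counts, eps=1e-6):
    """Population Stability Index between two histograms with the same bins."""
    b_total = sum(baseline_counts)
    c_total = sum(current_counts)
    if b_total == 0 or c_total == 0:
        raise ValueError("both histograms need at least one observation")
    score = 0.0
    for b, c in zip(baseline_counts, current_counts):
        # eps keeps empty bins from producing log(0) or division by zero.
        p = max(b / b_total, eps)
        q = max(c / c_total, eps)
        score += (q - p) * math.log(q / p)
    return score
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though suitable cut-offs depend on the data.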

Data Catalog Intelligence

  • Automated metadata extraction — Extract technical metadata from data sources without human intervention
  • Business glossary suggestions — Use NLP to suggest business terms and definitions based on data context
  • Auto-classification — Categorize and tag datasets based on content analysis
  • Lineage inference — Automatically trace data flows and dependencies across systems
  • Usage analytics — Track how data assets are used and surface popular or related datasets

Data Observability Intelligence

  • Predictive monitoring — Forecast potential data pipeline failures before they occur
  • Root cause analysis — Automatically identify the source of data incidents
  • Impact assessment — Determine downstream effects of data issues without manual tracing
  • Intelligent alerting — Prioritize notifications based on business impact and urgency
  • Self-optimizing thresholds — Adjust monitoring parameters based on historical patterns and seasonality
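
Impact assessment amounts to walking the lineage graph downstream from the affected asset. The sketch below shows that traversal as a breadth-first search; the `lineage` mapping and the asset names are made up for illustration.

```python
from collections import deque


def downstream_impact(lineage, source):
    """Return every asset reachable from source with its hop distance.

    lineage maps an asset name to the assets that read from it directly.
    """
    distance = {}
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in lineage.get(node, ()):
            if child not in seen:  # guards against cycles and repeat visits
                seen.add(child)
                distance[child] = depth + 1
                queue.append((child, depth + 1))
    return distance


# Hypothetical lineage: raw table -> staging table -> marts -> dashboard.
lineage = {
    "raw.orders": ["stg.orders"],
    "stg.orders": ["mart.revenue", "mart.churn"],
    "mart.revenue": ["dash.exec"],
    "raw.users": ["stg.users"],
}
impacted = downstream_impact(lineage, "raw.orders")
```

The hop distance gives a simple way to order notifications, with the nearest consumers first.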

Semantic Intelligence

  • Relationship discovery — Identify meaningful connections between data entities across sources
  • Context enrichment — Automatically add business context to technical data elements
  • Semantic layer generation — Create business-friendly views that abstract technical complexity
  • Knowledge graph maintenance — Update entity relationships as data evolves
  • Natural language interfaces — Enable data interaction through conversational queries

Profile Scheduling Intelligence

  • Dynamic scheduling — Automatically determine optimal profiling frequency based on data change rates and business criticality
  • Resource-aware execution — Schedule profiling jobs during low-usage periods to minimize performance impact
  • Change-triggered profiling — Automatically initiate profiling when significant schema or data volume changes are detected
  • Intelligent batching — Group related tables for concurrent profiling to optimize system resources
  • Adaptive time windows — Adjust profiling schedules based on historical processing times and data volumes
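
Change-triggered profiling can be sketched as a comparison between two table snapshots. The snapshot shape and the 20% volume threshold below are assumptions for illustration; the source does not state what Prizm counts as a "significant" change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSnapshot:
    columns: tuple   # ((name, type), ...) pairs
    row_count: int


def should_profile(previous, current, volume_change_threshold=0.2):
    """Return (trigger, reason) for a pair of snapshots of the same table."""
    # Any added, removed, or retyped column counts as a schema change.
    if set(previous.columns) != set(current.columns):
        return True, "schema change"
    # Relative row-count change; max(..., 1) avoids dividing by zero.
    baseline = max(previous.row_count, 1)
    change = abs(current.row_count - previous.row_count) / baseline
    if change >= volume_change_threshold:
        return True, "volume change"
    return False, "no significant change"
```

A scheduler could run this check after each load and enqueue a profiling job only when the first element of the result is `True`.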

Further Optimization Strategies

Prizm’s autonomous intelligence engine continuously improves its efficiency through:
  • Column prioritization — high-risk columns (keys, critical business fields) profile daily; low-risk columns profile weekly or monthly. Adaptive cadence increases frequency when recent drift is detected.
  • Mergeable sketches (HyperLogLog (HLL), KLL/t-digest, Top-k) are used throughout so historical data is never rescanned. Rolling baselines use a windowed merge strategy for O(log N) complexity.
  • Robust statistics (median + MAD instead of mean + standard deviation) reduce false positives. Multi-window alerting fires only when anomalies breach both short (1h) and long (24h) windows.
  • Intra-batch duplicates use exact matching. Cross-batch duplicates use HLL overlap estimation, with targeted sample queries triggered only when suspicion thresholds are crossed.
  • Schema snapshots are recorded daily. Alerts fire on new columns, type changes, and nullability flips, and are automatically routed to upstream pipeline owners via lineage.
  • Dynamic baselining: if a drift persists beyond a configurable threshold without measured business impact, the baseline is automatically updated to prevent permanent alert states.
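
The robust-statistics and multi-window alerting ideas above can be sketched as follows. This is a minimal illustration: the function names and the 3.5 cut-off are assumptions, and the 1.4826 factor is the usual constant that makes MAD comparable to a standard deviation for normally distributed data.

```python
from statistics import median


def robust_z(value, history, eps=1e-9):
    """Robust z-score from the median and median absolute deviation (MAD)."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    # eps avoids division by zero when the history is constant.
    return (value - med) / max(1.4826 * mad, eps)


def should_alert(value, short_window, long_window, threshold=3.5):
    """Alert only when the value is anomalous against both windows.

    short_window and long_window hold recent observations, for example
    the last 1h and the last 24h; threshold is an assumed cut-off.
    """
    return (abs(robust_z(value, short_window)) > threshold
            and abs(robust_z(value, long_window)) > threshold)
```

Because the median and MAD are barely moved by a few extreme points, one bad reading in the history does not widen the band the way it would with a mean and standard deviation. Requiring both windows to agree suppresses alerts for values that look unusual over the last hour but are normal over the last day.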

Sub-Features

Quality Metric Recommendation Agent

AI-driven metric recommendations based on asset characteristics.

Business Quality Recommendations

Business-context-aware quality recommendations mapped to KPIs.

Glossary Creation

Automated business glossary generation using organizational context.

Autonomous Mode

Configure fully autonomous execution for trusted workflows.