Autonomous Intelligence

Overview

Prizm’s autonomous intelligence is built on a 5-level architecture that progresses from basic metadata collection to sophisticated, self-directed action management. The system is designed to maximize automation while preserving meaningful human oversight at the right decision points.

The 5-Level Architecture

Level 1: Data Foundation Layer

The base layer stores all essential metadata components in the MetaStore:

Component	Description
T (Tables)	Core table and view metadata
O (Objects)	Database objects and schemas
L (Lineage)	Data flow and dependency tracking
U (Usage)	Query frequency and access patterns
P (Performance)	Job execution and cost metrics
C (Cost)	Compute and storage cost signals
B (Business terms)	Semantic business vocabulary

Level 2: Data Intelligence Layer

Profile Snapshot — Attribute-level profiling using a default 7-day window or percentage-based sampling to establish baseline data characteristics
Semantic Classification — Automatically identifies and assigns business terms to data elements based on content and context analysis

Level 3: Autonomous Decision Layer

Criticality Scoring — Analyzes data assets to determine business importance and assigns priority levels for monitoring and governance
Schedule Intelligence — Optimizes profiling frequency and resource allocation based on data criticality, change patterns, and system load

Level 4: Recommendation Layer

Metric Recommendations — Suggests appropriate quality metrics (standard and custom) based on data characteristics and usage patterns
Quality Engine (Q & CQ) — Powers the recommendation system for both standard quality and custom quality metrics
AI Stewardship Queue — The task queue and intelligence engine that orchestrates autonomous operations across the platform

Level 5: Action Layer

Actions are categorized into three states based on confidence and risk:

AI Completed

Fully automated resolution — Prizm takes action without human intervention based on high-confidence signals.

Human Assisted

Partial automation with human guidance — Prizm surfaces a recommendation and waits for steward approval.

Action Needed

Requires manual intervention — Issue is flagged for human investigation and resolution.
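
The routing between the three states can be pictured as a simple threshold check. The following is a minimal sketch, not Prizm's actual logic: the function name `route_action`, the 0-to-1 scales for confidence and risk, and all three threshold defaults are assumptions made for illustration.

```python
from enum import Enum


class ActionState(Enum):
    AI_COMPLETED = "AI Completed"
    HUMAN_ASSISTED = "Human Assisted"
    ACTION_NEEDED = "Action Needed"


def route_action(confidence: float, risk: float,
                 auto_confidence: float = 0.9,
                 assist_confidence: float = 0.6,
                 max_auto_risk: float = 0.2) -> ActionState:
    """Map a (confidence, risk) pair to one of the three action states.

    All thresholds are hypothetical defaults, not Prizm settings.
    """
    if not (0.0 <= confidence <= 1.0 and 0.0 <= risk <= 1.0):
        raise ValueError("confidence and risk must be in [0, 1]")
    # High confidence and low risk: act without a human in the loop.
    if confidence >= auto_confidence and risk <= max_auto_risk:
        return ActionState.AI_COMPLETED
    # Reasonable confidence: surface a recommendation and wait for approval.
    if confidence >= assist_confidence:
        return ActionState.HUMAN_ASSISTED
    # Otherwise flag the issue for manual investigation.
    return ActionState.ACTION_NEEDED
```

Under these assumed thresholds, a high-confidence but high-risk action falls back to Human Assisted rather than running automatically, which keeps a steward in the loop for risky changes.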

Autonomous Intelligence Capabilities

Data Quality Intelligence

Automated profiling — Continuously scan data to identify patterns, anomalies, and statistical properties without manual intervention
Self-healing pipelines — Detect and correct data quality issues in real-time based on predefined rules and ML models
Smart validation — Apply contextual rules that adapt to changing data patterns and automatically flag inconsistencies
Drift detection — Monitor and alert on changes in data distributions that might indicate quality issues
Anomaly detection — Identify outliers and unusual patterns that may represent data quality problems

Data Catalog Intelligence

Automated metadata extraction — Extract technical metadata from data sources without human intervention
Business glossary suggestions — Use NLP to suggest business terms and definitions based on data context
Auto-classification — Categorize and tag datasets based on content analysis
Lineage inference — Automatically trace data flows and dependencies across systems
Usage analytics — Track how data assets are used and surface popular or related datasets

Data Observability Intelligence

Predictive monitoring — Forecast potential data pipeline failures before they occur
Root cause analysis — Automatically identify the source of data incidents
Impact assessment — Determine downstream effects of data issues without manual tracing
Intelligent alerting — Prioritize notifications based on business impact and urgency
Self-optimizing thresholds — Adjust monitoring parameters based on historical patterns and seasonality

Semantic Intelligence

Relationship discovery — Identify meaningful connections between data entities across sources
Context enrichment — Automatically add business context to technical data elements
Semantic layer generation — Create business-friendly views that abstract technical complexity
Knowledge graph maintenance — Update entity relationships as data evolves
Natural language interfaces — Enable data interaction through conversational queries

Profile Scheduling Intelligence

Dynamic scheduling — Automatically determine optimal profiling frequency based on data change rates and business criticality
Resource-aware execution — Schedule profiling jobs during system low-usage periods to minimize performance impact
Change-triggered profiling — Automatically initiate profiling when significant schema or data volume changes are detected
Intelligent batching — Group related tables for concurrent profiling to optimize system resources
Adaptive time windows — Adjust profiling schedules based on historical processing times and data volumes

Further Optimization Strategies

Prizm’s autonomous intelligence engine continuously improves its efficiency through:

Smarter Profiling

Column prioritization: high-risk columns (keys, critical business fields) are profiled daily; low-risk columns are profiled weekly or monthly. An adaptive cadence increases the profiling frequency when recent drift is detected.
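
One way to read the cadence rule above is as a function from column attributes to a profiling interval. This is a sketch under stated assumptions: the function name, the choice to halve the interval on drift, and the six-hour floor are illustrative and not taken from Prizm.

```python
from datetime import timedelta


def profiling_interval(is_key: bool, is_critical: bool, recent_drift: bool,
                       low_risk_interval: timedelta = timedelta(days=7),
                       min_interval: timedelta = timedelta(hours=6)) -> timedelta:
    """Pick how often a column is profiled.

    High-risk columns (keys, critical business fields) start at daily,
    low-risk columns at `low_risk_interval` (weekly by default).
    Halving on drift and the six-hour floor are hypothetical choices.
    """
    base = timedelta(days=1) if (is_key or is_critical) else low_risk_interval
    if recent_drift:
        # Profile more often while drift is recent, but never below the floor.
        base = max(base / 2, min_interval)
    return base
```

For example, a low-risk column with recent drift moves from a seven-day interval to three and a half days in this sketch.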

Incremental Sketches

Mergeable sketches (HLL, KLL/TDigest, Top-k) are used throughout so historical data is never rescanned. Rolling baselines use a windowed merge strategy for O(log N) complexity.
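
The property that makes rescans unnecessary is that two sketches built over separate batches can be combined into the sketch of their union. The toy HyperLogLog below illustrates this; it is a simplified teaching sketch and an assumption about the general technique, not Prizm's implementation. A production system would more likely rely on a maintained sketch library.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # Use the first 64 bits of a SHA-1 digest as the hash.
        h = int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank is the position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        # Register-wise maximum yields the sketch of the union.
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw
```

Merging the sketches for two daily batches gives exactly the same registers as sketching both batches together, so a rolling baseline can be assembled from stored per-window sketches without touching the raw rows again.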

Noise Reduction in Monitoring

Robust statistics (median + MAD instead of mean + stdev) reduce false positives. Multi-window alerting fires only when anomalies breach both short (1h) and long (24h) windows.
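
A minimal sketch of both ideas follows. It uses the modified z-score (0.6745 × deviation from the median ÷ MAD) with the commonly cited cutoff of 3.5. The function names, the cutoff default, and the way the windows are passed in are assumptions for illustration, not Prizm's API.

```python
from statistics import median


def robust_z(value: float, history: list[float]) -> float:
    """Modified z-score of `value` against `history`, using median and MAD."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # Degenerate window: any deviation from a constant series is extreme.
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad


def should_alert(value: float, short_window: list[float],
                 long_window: list[float], threshold: float = 3.5) -> bool:
    """Fire only when the value is anomalous in BOTH windows."""
    return (abs(robust_z(value, short_window)) > threshold
            and abs(robust_z(value, long_window)) > threshold)
```

A value that looks extreme against a quiet last hour but sits within the normal spread of the last 24 hours does not alert, which is the noise reduction the multi-window rule is after.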

Smarter Duplicate Detection

Intra-batch duplicates use exact matching. Cross-batch duplicates use HLL overlap estimation, with targeted sample queries triggered only when suspicion thresholds are crossed.
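
The two tiers can be sketched as follows. Cross-batch overlap is estimated by inclusion-exclusion on distinct-count estimates, |A ∩ B| ≈ |A| + |B| − |A ∪ B|, where the three cardinalities would come from HLL sketches of each batch and of their merge. The function names and the 5% suspicion ratio are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def intra_batch_duplicates(keys: Iterable[Hashable]) -> dict:
    """Exact duplicate detection within one batch: keys seen more than once."""
    counts = Counter(keys)
    return {key: n for key, n in counts.items() if n > 1}


def estimated_overlap(card_a: float, card_b: float, card_union: float) -> float:
    """Inclusion-exclusion estimate of keys shared by two batches.

    Inputs are distinct-count estimates (for example from HLL sketches),
    so the result is approximate and is clamped at zero.
    """
    return max(0.0, card_a + card_b - card_union)


def needs_sample_query(card_a: float, card_b: float, card_union: float,
                       suspicion_ratio: float = 0.05) -> bool:
    """Trigger a targeted sample query only when the estimated overlap
    exceeds a (hypothetical) fraction of the smaller batch."""
    smaller = min(card_a, card_b)
    if smaller <= 0:
        return False
    return estimated_overlap(card_a, card_b, card_union) / smaller > suspicion_ratio
```

Because the inclusion-exclusion estimate inherits the error of all three sketches, it is least reliable when the true overlap is small, which is one reason to confirm suspicions with a targeted query rather than act on the estimate alone.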

Schema & Metadata Awareness

Schema snapshots are recorded daily. Alerts fire on new columns, type changes, and nullability flips, and are automatically routed to upstream pipeline owners via lineage.
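
Comparing two daily snapshots reduces to a dictionary diff. The sketch below covers only the three change types named above. The `Column` type, the event string format, and the function name are assumptions for illustration, and routing to pipeline owners via lineage is out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    dtype: str
    nullable: bool


def diff_schema(previous: dict[str, Column], current: dict[str, Column]) -> list[str]:
    """Compare two schema snapshots and list the alert-worthy changes."""
    events = []
    for name, col in current.items():
        old = previous.get(name)
        if old is None:
            events.append(f"new_column:{name}")
            continue
        if old.dtype != col.dtype:
            events.append(f"type_change:{name}:{old.dtype}->{col.dtype}")
        if old.nullable != col.nullable:
            events.append(f"nullability_flip:{name}")
    return events
```

As a usage example, comparing a snapshot containing `id` (int, not nullable) and `email` (string, nullable) with one where `id` has become bigint, `email` is no longer nullable, and `signup_ts` has appeared yields one event of each kind.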

Self-Healing Rules

Dynamic baselining: if a drift persists beyond a configurable threshold without measured business impact, the baseline is automatically updated to prevent permanent alert states.
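
The rule can be sketched as a small state machine that counts consecutive drifted periods. Everything specific here is an assumption: the `Baseline` type, the relative tolerance, the period-based threshold, and the boolean `business_impact` flag stand in for whatever signals Prizm actually uses.

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    value: float
    drift_periods: int = 0  # consecutive periods spent outside tolerance


def observe(baseline: Baseline, observed: float, tolerance: float = 0.1,
            max_drift_periods: int = 7, business_impact: bool = False) -> bool:
    """Process one observation; return True if an alert should fire.

    If drift persists for more than `max_drift_periods` periods with no
    measured business impact, the baseline is moved to the new level.
    """
    drifted = abs(observed - baseline.value) > tolerance * abs(baseline.value)
    if not drifted:
        baseline.drift_periods = 0
        return False
    baseline.drift_periods += 1
    if baseline.drift_periods > max_drift_periods and not business_impact:
        # Accept the new normal so the alert does not stay on forever.
        baseline.value = observed
        baseline.drift_periods = 0
        return False
    return True
```

With `max_drift_periods=3`, a metric that shifts from 100 to 130 alerts for three periods, is rebaselined on the fourth, and is quiet afterwards. If business impact is reported, the baseline is left alone and the alert persists.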

Sub-Features

Quality Metric Recommendation Agent

AI-driven metric recommendations based on asset characteristics.

Business Quality Recommendations

Business-context-aware quality recommendations mapped to KPIs.

Glossary Creation

Automated business glossary generation using organizational context.

Autonomous Mode

Configure fully autonomous execution for trusted workflows.
