Overview
Prizm’s autonomous intelligence is built on a 5-level architecture that progresses from basic metadata collection to sophisticated, self-directed action management. The system is designed to maximize automation while preserving meaningful human oversight at the right decision points.
The 5-Level Architecture
Level 1: Data Foundation Layer
The base layer stores all essential metadata components in the MetaStore:

| Component | Description |
|---|---|
| T (Tables) | Core table and view metadata |
| O (Objects) | Database objects and schemas |
| L (Lineage) | Data flow and dependency tracking |
| U (Usage) | Query frequency and access patterns |
| P (Performance) | Job execution and cost metrics |
| C (Cost) | Compute and storage cost signals |
| B (Business terms) | Semantic business vocabulary |
Level 2: Data Intelligence Layer
- Profile Snapshot — Attribute-level profiling using a default 7-day window or percentage-based sampling to establish baseline data characteristics
- Semantic Classification — Automatically identifies and assigns business terms to data elements based on content and context analysis
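
The two snapshot strategies above (a trailing time window or percentage-based sampling) can be sketched as follows. This is an illustrative sketch only, not Prizm's implementation: the `loaded_at` field, the function names, and the profile statistics chosen are all assumptions made for the example.

```python
import random
from datetime import datetime, timedelta

def select_profile_rows(rows, now, window_days=7, sample_pct=None, seed=0):
    """Pick the rows to profile.

    rows: list of dicts carrying a 'loaded_at' datetime (hypothetical field).
    If sample_pct is given, draw that percentage of rows at random;
    otherwise keep rows inside the trailing window (7 days by default).
    """
    if sample_pct is not None:
        k = max(1, round(len(rows) * sample_pct / 100)) if rows else 0
        return random.Random(seed).sample(rows, k)
    cutoff = now - timedelta(days=window_days)
    return [r for r in rows if r["loaded_at"] >= cutoff]

def profile_column(rows, column):
    """Compute a few baseline characteristics for one attribute."""
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_fraction": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct_count": len(set(non_null)),
    }

# Ten daily loads, 1 to 10 January 2024; every third amount is missing.
rows = [
    {"loaded_at": datetime(2024, 1, day), "amount": None if day % 3 == 0 else day}
    for day in range(1, 11)
]
windowed = select_profile_rows(rows, now=datetime(2024, 1, 10))
sampled = select_profile_rows(rows, now=datetime(2024, 1, 10), sample_pct=50)
snapshot = profile_column(windowed, "amount")
```

A window keeps the baseline focused on recent data, while sampling bounds the scan cost on very large tables.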
Level 3: Autonomous Decision Layer
- Criticality Scoring — Analyzes data assets to determine business importance and assigns priority levels for monitoring and governance
- Schedule Intelligence — Optimizes profiling frequency and resource allocation based on data criticality, change patterns, and system load
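
One way to picture how a criticality score could drive profiling frequency is a weighted blend of usage, lineage, and semantic signals. The weights, normalization caps, thresholds, and names below are assumptions chosen for illustration; the source does not specify Prizm's scoring formula.

```python
from dataclasses import dataclass

@dataclass
class AssetSignals:
    queries_per_day: float    # usage signal (hypothetical field)
    downstream_assets: int    # lineage signal (hypothetical field)
    has_business_term: bool   # semantic signal (hypothetical field)

def criticality_score(s, max_queries=1000.0, max_downstream=50):
    """Blend normalized signals into a score in [0, 1]. Weights are illustrative."""
    usage = min(s.queries_per_day / max_queries, 1.0)
    lineage = min(s.downstream_assets / max_downstream, 1.0)
    semantic = 1.0 if s.has_business_term else 0.0
    return round(0.5 * usage + 0.3 * lineage + 0.2 * semantic, 3)

def profiling_interval_hours(score):
    """Map criticality to a profiling interval: daily, weekly, or monthly."""
    if score >= 0.7:
        return 24
    if score >= 0.3:
        return 24 * 7
    return 24 * 30

busy = AssetSignals(queries_per_day=1000, downstream_assets=50, has_business_term=True)
moderate = AssetSignals(queries_per_day=500, downstream_assets=25, has_business_term=False)
idle = AssetSignals(queries_per_day=0, downstream_assets=0, has_business_term=False)
```

A real scheduler would also factor in change patterns and system load, as described above; this sketch covers only the criticality input.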
Level 4: Recommendation Layer
- Metric Recommendations — Suggests appropriate quality metrics (standard and custom) based on data characteristics and usage patterns
- Quality Engine (Q & CQ) — Powers the recommendation system for both standard quality and custom quality metrics
- AI Stewardship Queue — The task queue and intelligence engine that orchestrates autonomous operations across the platform
Level 5: Action Layer
Actions are categorized into three states based on confidence and risk:
AI Completed
Fully automated resolution — Prizm takes action without human intervention based on high-confidence signals.
Human Assisted
Partial automation with human guidance — Prizm surfaces a recommendation and waits for steward approval.
Action Needed
Requires manual intervention — Issue is flagged for human investigation and resolution.
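
The three action states can be thought of as the output of a triage function over confidence and risk. The sketch below is a hypothetical illustration: the threshold values, the three risk labels, and the `triage` function are assumptions, not documented Prizm behavior.

```python
from enum import Enum

class ActionState(Enum):
    AI_COMPLETED = "AI Completed"
    HUMAN_ASSISTED = "Human Assisted"
    ACTION_NEEDED = "Action Needed"

def triage(confidence, risk, auto_min=0.9, assist_min=0.6):
    """Route an action by confidence (0 to 1) and risk ('low', 'medium', 'high').

    Thresholds are illustrative defaults, not product settings.
    """
    if confidence >= auto_min and risk == "low":
        return ActionState.AI_COMPLETED      # act without human intervention
    if confidence >= assist_min and risk != "high":
        return ActionState.HUMAN_ASSISTED    # recommend, then wait for approval
    return ActionState.ACTION_NEEDED         # flag for manual investigation
```

Under these assumed thresholds, a high-confidence signal on a high-risk asset still lands in Action Needed, which keeps human oversight on the riskiest decisions.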
Autonomous Intelligence Capabilities
Data Quality Intelligence
- Automated profiling — Continuously scan data to identify patterns, anomalies, and statistical properties without manual intervention
- Self-healing pipelines — Detect and correct data quality issues in real-time based on predefined rules and ML models
- Smart validation — Apply contextual rules that adapt to changing data patterns and automatically flag inconsistencies
- Drift detection — Monitor and alert on changes in data distributions that might indicate quality issues
- Anomaly detection — Identify outliers and unusual patterns that may represent data quality problems
Data Catalog Intelligence
- Automated metadata extraction — Extract technical metadata from data sources without human intervention
- Business glossary suggestions — Use NLP to suggest business terms and definitions based on data context
- Auto-classification — Categorize and tag datasets based on content analysis
- Lineage inference — Automatically trace data flows and dependencies across systems
- Usage analytics — Track how data assets are used and surface popular or related datasets
Data Observability Intelligence
- Predictive monitoring — Forecast potential data pipeline failures before they occur
- Root cause analysis — Automatically identify the source of data incidents
- Impact assessment — Determine downstream effects of data issues without manual tracing
- Intelligent alerting — Prioritize notifications based on business impact and urgency
- Self-optimizing thresholds — Adjust monitoring parameters based on historical patterns and seasonality
Semantic Intelligence
- Relationship discovery — Identify meaningful connections between data entities across sources
- Context enrichment — Automatically add business context to technical data elements
- Semantic layer generation — Create business-friendly views that abstract technical complexity
- Knowledge graph maintenance — Update entity relationships as data evolves
- Natural language interfaces — Enable data interaction through conversational queries
Profile Scheduling Intelligence
- Dynamic scheduling — Automatically determine optimal profiling frequency based on data change rates and business criticality
- Resource-aware execution — Schedule profiling jobs during system low-usage periods to minimize performance impact
- Change-triggered profiling — Automatically initiate profiling when significant schema or data volume changes are detected
- Intelligent batching — Group related tables for concurrent profiling to optimize system resources
- Adaptive time windows — Adjust profiling schedules based on historical processing times and data volumes
Further Optimization Strategies
Prizm’s autonomous intelligence engine continuously improves its efficiency through:
Smarter Profiling
Column prioritization: high-risk columns (keys, critical business fields) are profiled daily; low-risk columns are profiled weekly or monthly. Adaptive cadence increases the profiling frequency when recent drift is detected.
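As a rough sketch of this policy, the function below picks a cadence per column and steps low-risk columns up one level when drift was recently observed. The `Cadence` enum, the parameter names, and the one-level step-up rule are assumptions for illustration.

```python
from enum import Enum

class Cadence(Enum):
    DAILY = 1
    WEEKLY = 7
    MONTHLY = 30

def column_cadence(is_key, is_critical_field, recent_drift,
                   low_risk_default=Cadence.WEEKLY):
    """Choose how often a column is profiled.

    High-risk columns are always daily. Low-risk columns use their default
    cadence, stepped up one level (monthly -> weekly, weekly -> daily)
    while recent drift is flagged. The step-up rule is an assumption.
    """
    if is_key or is_critical_field:
        return Cadence.DAILY
    if recent_drift:
        return Cadence.DAILY if low_risk_default is Cadence.WEEKLY else Cadence.WEEKLY
    return low_risk_default
```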
Incremental Sketches
Mergeable sketches (HLL, KLL/TDigest, Top-k) are used throughout so historical data is never rescanned. Rolling baselines use a windowed merge strategy for O(log N) complexity.
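To illustrate why mergeable sketches avoid rescanning history, here is a deliberately minimal HyperLogLog. It is a teaching sketch under simplifying assumptions (SHA-1 hashing, 1024 registers, only the small-range correction), not the sketch library Prizm uses. The key property is that merging two sketches register by register yields exactly the sketch of the combined data.

```python
import hashlib
import math
from functools import reduce

class HyperLogLog:
    """Minimal HyperLogLog distinct-count sketch (illustrative only)."""

    def __init__(self, p=10):
        self.p = p                    # the first p hash bits select a register
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash
        bits = 64 - self.p
        idx = h >> bits                                # register index
        rest = h & ((1 << bits) - 1)                   # remaining bits
        rank = bits - rest.bit_length() + 1            # position of first 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Return the sketch of the union: the register-wise maximum."""
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)          # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:              # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw

def sketch_of(values):
    hll = HyperLogLog()
    for v in values:
        hll.add(v)
    return hll

# Two daily sketches with overlapping values, merged without rescanning either day.
day1 = sketch_of(range(0, 5000))
day2 = sketch_of(range(3000, 8000))
rolling = reduce(HyperLogLog.merge, [day1, day2])
```

A rolling baseline over a window of days can be rebuilt by merging the stored daily sketches in the same way. How the windowed merge is organized internally is not described in the source.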
Noise Reduction in Monitoring
Robust statistics (median + MAD instead of mean + stdev) reduce false positives. Multi-window alerting fires only when anomalies breach both short (1h) and long (24h) windows.
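The following sketch shows the two ideas together: a median/MAD-based score in place of a mean/stdev z-score, and an alert that fires only when both windows are breached. The 0.6745 scaling constant and the 3.5 cutoff follow the common "modified z-score" convention and are assumptions here, as are the function names.

```python
from statistics import median

def robust_z(value, history):
    """Modified z-score: distance from the median, scaled by the MAD."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:
        # No spread in the window: any deviation at all is extreme.
        return 0.0 if value == med else float("inf")
    return 0.6745 * (value - med) / mad

def should_alert(value, short_window, long_window, threshold=3.5):
    """Fire only if the value is anomalous against BOTH windows."""
    return (abs(robust_z(value, short_window)) > threshold
            and abs(robust_z(value, long_window)) > threshold)

# Hypothetical metric readings for the short (1h) and long (24h) windows.
short_calm = [100, 101, 99, 100, 102, 98]
long_calm = [100, 101, 99, 100, 102, 98, 100, 103, 97, 100, 101, 99]
long_noisy = [60, 140, 100, 80, 120, 100, 150, 50]

spike_alert = should_alert(150, short_calm, long_calm)         # breaches both windows
suppressed_alert = should_alert(130, short_calm, long_noisy)   # long window absorbs it
```

Because the median and MAD are barely moved by a single outlier, one bad reading in the history does not inflate the tolerance the way it would with mean and standard deviation.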
Smarter Duplicate Detection
Intra-batch duplicates use exact matching. Cross-batch duplicates use HLL overlap estimation, with targeted sample queries triggered only when suspicion thresholds are crossed.
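A simplified view of both checks is sketched below. Intra-batch detection counts exact key matches. Cross-batch detection applies inclusion-exclusion to distinct counts, which in practice would come from HLL estimates of each batch and of their merged sketch; here they are plain integers. The 1% suspicion ratio and all function names are assumptions for illustration.

```python
from collections import Counter

def intra_batch_duplicates(keys):
    """Exact matching: return keys that occur more than once in a batch."""
    return {k: n for k, n in Counter(keys).items() if n > 1}

def estimated_overlap(count_a, count_b, count_union):
    """Inclusion-exclusion: |A and B| is roughly |A| + |B| - |A or B|.

    The inputs would normally be distinct-count estimates from sketches,
    so the result is approximate and clamped at zero.
    """
    return max(0, count_a + count_b - count_union)

def needs_targeted_check(count_a, count_b, count_union, suspicion_ratio=0.01):
    """Trigger a targeted sample query only when overlap looks suspicious."""
    smaller = min(count_a, count_b)
    if smaller == 0:
        return False
    return estimated_overlap(count_a, count_b, count_union) / smaller > suspicion_ratio
```

Since sketch estimates carry error, the suspicion ratio should sit above the sketch's expected error rate; otherwise noise alone would trigger follow-up queries.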
Schema & Metadata Awareness
Schema snapshots are recorded daily. Alerts fire on new columns, type changes, and nullability flips, and are automatically routed to upstream pipeline owners via lineage.
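A snapshot comparison of this kind can be sketched as a diff between two dictionaries of column definitions, followed by a lineage lookup to find who should be notified. The data shapes (`Column`, the `lineage` and `owners` mappings) and the function names are hypothetical.

```python
from typing import NamedTuple

class Column(NamedTuple):
    dtype: str
    nullable: bool

def diff_schema(previous, current):
    """Compare two schema snapshots ({column_name: Column}) and list changes."""
    changes = []
    for name, col in current.items():
        old = previous.get(name)
        if old is None:
            changes.append(("new_column", name))
            continue
        if old.dtype != col.dtype:
            changes.append(("type_change", name))
        if old.nullable != col.nullable:
            changes.append(("nullability_flip", name))
    return changes

def owners_to_notify(table, lineage, owners):
    """Resolve owners of the pipelines directly upstream of a table.

    lineage: {table: [upstream tables]}; owners: {table: owner}. Both hypothetical.
    """
    return sorted({owners[up] for up in lineage.get(table, []) if up in owners})

yesterday = {"id": Column("int", False), "amount": Column("int", True)}
today = {
    "id": Column("int", True),
    "amount": Column("decimal", True),
    "currency": Column("str", True),
}
changes = diff_schema(yesterday, today)
recipients = owners_to_notify(
    "orders_clean",
    lineage={"orders_clean": ["orders_raw", "fx_rates"]},
    owners={"orders_raw": "ingest-team", "fx_rates": "finance-data"},
)
```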
Self-Healing Rules
Dynamic baselining: if a drift persists beyond a configurable threshold without measured business impact, the baseline is automatically updated to prevent permanent alert states.
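The rule can be sketched as follows: if the most recent observations have all stayed outside tolerance and no business impact has been recorded, the baseline moves to the new level. The parameter names, the use of the median of the drifted values as the new baseline, and the boolean impact flag are assumptions for illustration.

```python
from statistics import median

def update_baseline(baseline, recent_values, tolerance, persistence, business_impact):
    """Return the baseline to use going forward.

    persistence: how many consecutive recent observations must sit outside
    the tolerance band before the drift counts as the new normal
    (the configurable threshold, expressed here as an observation count).
    """
    if len(recent_values) < persistence:
        return baseline
    tail = recent_values[-persistence:]
    drift_persisted = all(abs(v - baseline) > tolerance for v in tail)
    if drift_persisted and not business_impact:
        return median(tail)   # adopt the new level so alerts can clear
    return baseline
```

When business impact has been measured, the baseline is left alone so the alert stays open until someone resolves the underlying issue.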
Sub-Features
Quality Metric Recommendation Agent
AI-driven metric recommendations based on asset characteristics.
Business Quality Recommendations
Business-context-aware quality recommendations mapped to KPIs.
Glossary Creation
Automated business glossary generation using organizational context.
Autonomous Mode
Configure fully autonomous execution for trusted workflows.