Where Your Data
Finally Comes Home
339 tools. 47 modules. 21 Oracle phases. DataBridge AI transforms legacy financial chaos into production-ready data marts with automated trust.
Six engines. One destination.
Oracle Engine
Ingests legacy SQL, Python, and Excel to extract and operationalize business logic into production Snowflake DDL.
Argos Pipelines
The master ship-builder. Constructs high-performance Snowflake Dynamic Table pipelines from complex hierarchies.
Athena Intelligence
Guided wisdom. AI-powered planning and GraphRAG grounding to ensure zero-hallucination data discovery.
Aegis Trust
Divine protection. Deterministic PII masking, trust attestations, and audit-ready lineage for absolute security.
Olympus
The highest order. Manage financial hierarchies, formula groups, and templates as the spine of your mart.
Penelope
Precise weaving. Hash-compare sources and resolve discrepancies with meticulous, audit-ready precision.
From ERP chaos to clean data in 4 weeks.
1. Assess
Connect to ERP. Run the E2E Assessment Pipeline. Catalog tables and mask PII.
2. Design
Deploy financial templates. Run Oracle to parse logic. Generate Kimball star schemas.
3. Build
Generate Argos pipelines. Deploy Dynamic Tables to Snowflake. Go live.
4. Optimize
Activate GraphRAG. Build the data catalog. Propagate the hardened Knowledge Base.
Battle-tested benchmark results.
📊 Dashboard
Welcome to Ithaca
Complete these steps to get started:
Recent Activity
| Timestamp | Tool | Status |
|---|---|---|
Sample Data Files
Available in data/ — click a file to load it in the Tool Workbench.
Quick Start
Get started with Ithaca:
🔌 Connections
Configure and test your data source connections. Connections are used by all pipeline, profiling, and AI tools.
Connection Settings
Saved Connections
No connections configured yet. Add one to get started.
Connection Health
🎯 Live Demos
🔧 Tool Workbench
Available Tools
Select a Tool
Choose a tool from the list to configure and run it.
Output
⚡ Workflow Editor
Tool Palette
Workflow Steps
Click tools to add steps to your workflow.
✈️ Wright Pipeline
Build hierarchy-driven data marts with the 4-object pipeline pattern. Configure each step and preview generated SQL.
Pipeline Configuration
VW_1: Translation View
Translates ID_SOURCE column values to physical database columns using CASE statements.
-- Click "Generate" to create VW_1 Translation View SQL
DT_2: Granularity Table
UNPIVOT operation to normalize data and apply exclusion filters.
-- Click "Generate" to create DT_2 Granularity Table SQL
DT_3A: Pre-Aggregation Fact
UNION ALL branches for different join patterns. Each branch handles different dimension combinations.
-- Click "Generate" to create DT_3A Pre-Aggregation SQL
DT_3: Final Data Mart
Final data mart with formula precedence cascade and surrogate key generation.
-- Click "Generate" to create DT_3 Data Mart SQL
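The four objects above start from a translation view that maps `ID_SOURCE` codes onto physical columns. As a rough sketch of what such a generator might emit (the table, column, and function names here are illustrative, not the DataBridge API):

```python
# Hypothetical sketch: emit a VW_1-style translation view that maps
# ID_SOURCE codes to physical columns via CASE branches.
# All identifiers below are illustrative.

def generate_translation_view(view_name, source_table, id_column, mappings):
    """Build CREATE VIEW SQL with one CASE branch per source mapping."""
    branches = "\n".join(
        f"        WHEN '{code}' THEN {column}"
        for code, column in mappings.items()
    )
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT\n"
        f"    CASE {id_column}\n{branches}\n"
        f"    END AS translated_value,\n"
        f"    *\n"
        f"FROM {source_table};"
    )

sql = generate_translation_view(
    "VW_1_TRANSLATION", "RAW_GL", "ID_SOURCE",
    {"REV": "REVENUE_AMT", "EXP": "EXPENSE_AMT"},
)
```

The real generator also handles the downstream UNPIVOT, UNION ALL, and formula-cascade objects; this only illustrates the CASE-statement pattern named in the VW_1 description.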
🔬 Data Lab
Run live demos against sample data using real MCP tools. Explore data quality, reconciliation, and schema analysis.
How the Data Lab Works
The Data Lab validates source data, compares datasets, and profiles quality — all from sample CSV files included with Ithaca.
load_csv --> profile_data (stats)
load_csv --> compare_hashes (diffs)
load_csv --> fuzzy_match (matches)
Two CSVs --> detect_schema_drift (changes)
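The compare-hashes step above can be approximated in a few lines: hash each row by key, then diff the two sides. This is an illustrative stand-in, not the `compare_hashes` tool itself:

```python
# Illustrative hash-compare diff: hash each row by its key column,
# then report rows that were added, removed, or changed.
import hashlib

def row_hash(row):
    return hashlib.sha256("|".join(map(str, row)).encode()).hexdigest()

def compare(left, right, key=0):
    lh = {r[key]: row_hash(r) for r in left}
    rh = {r[key]: row_hash(r) for r in right}
    return {
        "added":   sorted(set(rh) - set(lh)),
        "removed": sorted(set(lh) - set(rh)),
        "changed": sorted(k for k in set(lh) & set(rh) if lh[k] != rh[k]),
    }

diff = compare(
    [("A1", 100), ("A2", 200)],
    [("A1", 100), ("A2", 250), ("A3", 50)],
)
# diff == {"added": ["A3"], "removed": [], "changed": ["A2"]}
```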
Live Demos
Pro Data Lab Tools
Requires Pro License
analyze_book_with_researcher
Analyze a Book's data sources against a database connection
compare_book_to_database
Compare Book hierarchy against live database schema
profile_book_sources
Profile all data sources referenced by a Book
⚙️ Administration
Configuration
License & Tier
Tenant Information
Cost / Credit Tracker
Track LLM token usage and Snowflake credit consumption per workflow run.
| Run ID | LLM Calls | Tokens (in/out) | LLM $ | SF Credits | SF $ | Total $ |
|---|---|---|---|---|---|---|
| No cost data yet — run a workflow with CostTracker enabled. | ||||||
Token Usage Calculator
📚 Documentation
Getting Started with Ithaca
New here? Follow these steps to be productive in under 10 minutes.
Step 1: Connect Your Data
Configure your Snowflake or database connection from the Connections page. Every pipeline and AI tool needs a working connection.
Step 2: Create Your First Hierarchy
Use a template (P&L, Balance Sheet, Oil & Gas LOS) or build from scratch. Hierarchies are the backbone of every DataBridge pipeline. Expand the sample demo to see how they work.
Step 3: Generate a Pipeline
Once your hierarchy is ready, use the Wright Pipeline page to generate a full 4-object Snowflake pipeline (Translation View, Granularity Table, Pre-Aggregation Fact, Data Mart).
Step 4: Validate Your Data
Use the Data Lab to profile data quality, reconcile sources, detect schema drift, and run fuzzy matching against your datasets.
Step 5: Explore with AI
Ask the AI Planner to analyze your data and generate multi-step workflows, or chat with the Agent Console for autonomous demos.
Take the Guided Tour
Want a hands-on walkthrough of every major page? Start the interactive guided tour.
DataBridge AI v0.49.4
A headless, MCP-native data and implementation engine with 339 tools across 47 modules. Tool availability is license-dependent (Community/Pro/Enterprise).
Core Capabilities
| Capability | Description |
|---|---|
| 🔄 Data Reconciliation | Compare and validate data from CSV, SQL, PDF, JSON sources (38 tools) |
| 🏗️ Hierarchy Builder | Create and manage multi-level hierarchy projects with formulas (49 tools) |
| 🧬 BLCE Engine | Business logic extraction, Kimball modeling, DDL generation, deployment (84 tools, 21 phases) |
| 🧠 Cortex AI | Snowflake Cortex integration with natural language to SQL (26 tools) |
| 📊 Wright Module | Hierarchy-driven data mart generation with 4-object pipeline (31 tools) |
| 📚 Data Catalog | Centralized metadata registry with business glossary (19 tools) |
| 🔗 GraphRAG | Knowledge graph + vector search for explainable AI grounding (10 tools) |
| 📈 Observability | Metric recording, anomaly detection, asset health monitoring (15 tools) |
| 📦 Data Versioning | Dataset snapshots, diffs, and rollback (12 tools) |
| 🔍 Lineage Tracking | Column-level lineage and impact analysis (11 tools) |
| ✅ Data Quality | Expectation suites and data contracts (7 tools) |
| 🛡️ DataShield | Offline data masking before AI processing |
| 🔧 dbt Integration | Generate dbt projects from hierarchies (8 tools) |
Quick Start
Architecture
Flow: the MCP core (267 tools) fans out to Hierarchy Builder (49 tools), Data Reconciliation (38 tools), BLCE Engine (84 tools), Wright Module (31 tools), Cortex AI (26 tools), Data Catalog (19 tools), Observability (15 tools), and the other modules. BLCE, Wright, and Cortex AI write to Snowflake; Hierarchy Builder, BLCE, and the Data Catalog feed the GraphRAG store.
All 28 Tool Categories (267 Total)
Tool availability depends on your license tier: CE (Community), Pro, or Enterprise.
| Module | Tools | Tier | Key Tools |
|---|---|---|---|
| File Discovery | 3 | CE | find_files, stage_file |
| Data Reconciliation | 38 | CE | load_csv, profile_data, fuzzy_match_columns |
| Hierarchy Builder | 49 | CE | create_hierarchy, import_flexible_hierarchy, export_hierarchy_csv |
| Hierarchy-Graph Bridge | 5 | CE | hierarchy_graph_status, hierarchy_rag_search |
| Templates / Skills / KB | 16 | CE | list_financial_templates, get_skill_prompt |
| Git Automation | 4 | CE | commit_dbt_project, create_deployment_pr |
| SQL Discovery | 2 | CE | sql_to_hierarchy, smart_analyze_sql |
| Mapping Enrichment | 5 | CE | configure_mapping_enrichment, enrich_mapping_file |
| BLCE Engine | 84 | CE | blce_parse_sql, blce_generate_ddl, blce_execute_ddl, model_ask |
| AI Orchestrator | 16 | Pro | submit_orchestrated_task, register_agent |
| Planner Agent | 11 | Pro | plan_workflow, suggest_agents |
| Smart Recommendations | 5 | Pro | get_smart_recommendations, smart_import_csv |
| Diff Utilities | 6 | CE | diff_text, diff_dicts, explain_diff |
| Unified AI Agent | 10 | Pro | checkout_librarian_to_book, sync_book_and_librarian |
| Cortex Agent | 12 | Pro | cortex_complete, cortex_reason |
| Cortex Analyst | 14 | Pro | analyst_ask, create_semantic_model |
| Console Dashboard | 5 | CE | start_console_server, broadcast_console_message |
| dbt Integration | 8 | CE | create_dbt_project, generate_dbt_model |
| Data Quality | 7 | CE | generate_expectation_suite, run_validation |
| Wright Module | 31 | Pro | create_mart_config, generate_mart_pipeline, wright_from_hierarchy |
| Lineage & Impact | 11 | Pro | track_column_lineage, analyze_change_impact |
| Git / CI-CD | 12 | Pro | git_commit, github_create_pr |
| Data Catalog | 19 | Pro | catalog_scan_connection, catalog_search |
| Data Versioning | 12 | Pro | version_create, version_diff, version_rollback |
| GraphRAG Engine | 10 | Pro | rag_search, rag_validate_output, rag_entity_extract |
| Data Observability | 15 | Pro | obs_record_metric, obs_create_alert_rule |
| Cortex Table Understanding | 5 | Pro | generate_table_understanding, batch_table_understanding |
| AI Relationship Discovery | 8 | Pro | ai_analyze_schema, ai_detect_relationships |
| Mart Factory | 10 | Pro | create_mart_config, generate_mart_pipeline, discover_hierarchy_pattern |
| DataShield | — | CE | PII classification, trust attestations, data masking (integrated into pipeline phases) |
| Total | 267 | | |
Available Templates
Accounting Domain (10 templates)
| Template ID | Name | Industry |
|---|---|---|
| standard_pl | Standard P&L (Income Statement) | General |
| standard_bs | Standard Balance Sheet | General |
| oil_gas_los | Oil & Gas Lease Operating Statement | Oil & Gas |
| upstream_oil_gas_pl | Upstream Oil & Gas P&L | Oil & Gas - E&P |
| midstream_oil_gas_pl | Midstream Oil & Gas P&L | Oil & Gas - Midstream |
| oilfield_services_pl | Oilfield Services Company P&L | Oil & Gas - Services |
| manufacturing_pl | Industrial Manufacturing P&L | Manufacturing |
| industrial_services_pl | Industrial Services Company P&L | Industrial Services |
| saas_pl | SaaS Company P&L | SaaS |
| transportation_pl | Transportation & Logistics P&L | Transportation |
Finance Domain (2 templates)
| Template ID | Name | Industry |
|---|---|---|
| cost_center_hierarchy | Cost Center Hierarchy | General |
| profit_center_hierarchy | Profit Center Hierarchy | General |
Operations Domain (8 templates)
| Template ID | Name | Industry |
|---|---|---|
| geographic_hierarchy | Geographic Hierarchy | General |
| department_hierarchy | Organizational Department Hierarchy | General |
| asset_hierarchy | Asset Class Hierarchy | General |
| legal_entity_hierarchy | Legal Entity Hierarchy | General |
| upstream_field_hierarchy | Upstream Oil & Gas Field Hierarchy | Oil & Gas - E&P |
| midstream_asset_hierarchy | Midstream Oil & Gas Asset Hierarchy | Oil & Gas - Midstream |
| manufacturing_plant_hierarchy | Manufacturing Plant Hierarchy | Manufacturing |
| fleet_hierarchy | Fleet & Route Hierarchy | Transportation |
ERP Data Model Templates (BLCE)
Pre-built Kimball data model specs for common ERP systems. Used by the BLCE engine to generate dimension and fact tables automatically.
| ERP System | Config File | Pre-Built Dims | Pre-Built Facts |
|---|---|---|---|
| Enertia | dm_specs/enertia.json | 12 | 5 |
| WolfePak | dm_specs/wolfepak.json | 10 | 5 |
| SAP | dm_specs/sap.json | 10 | 5 |
| NetSuite | dm_specs/netsuite.json | 9 | 5 |
| QuickBooks | dm_specs/quickbooks.json | 7 | 4 |
| ProCount | dm_specs/procount.json | 12 | 7 |
Built-in Skills
| Skill ID | Name | Industries | Capabilities |
|---|---|---|---|
| financial-analyst | Financial Analyst | General | GL reconciliation, trial balance, bank rec, COA design |
| fpa-oil-gas-analyst | FP&A Oil & Gas Analyst | Oil & Gas | LOS analysis, JIB, reserves, hedge accounting |
| manufacturing-analyst | Manufacturing Analyst | Manufacturing | Standard costing, COGS, variances, inventory |
| saas-metrics-analyst | SaaS Metrics Analyst | SaaS | ARR/MRR, cohorts, CAC/LTV, unit economics |
| transportation-analyst | Transportation & Logistics Analyst | Transportation | Operating ratio, fleet, lanes, driver metrics |
| operations-analyst | Operations Analyst | General, Manufacturing, Logistics | Operational KPIs, throughput, utilization, capacity planning |
| fpa-cost-analyst | FP&A Cost Analyst | General, Manufacturing, Technology | Cost allocation, variance analysis, budget vs actual, cost centers |
| platform-workflow | Platform Workflow Orchestrator | General | E2E assessment pipeline, 15-phase orchestration, data modeling workflows |
BLCE Auto-Generated Skills
The BLCE engine automatically generates domain-specific skill prompts from each analysis run. Skills are reusable and shareable across projects.
| Skill Type | Generated From | Example |
|---|---|---|
| Domain Expert | Normalized measures + governance metadata | "Revenue analysis for Enertia upstream O&G" |
| Query Assistant | Bus matrix + model metadata | "Query the well production fact table" |
| Report Builder | Report suggestions + templates | "Build a lease operating statement" |
API Reference
MCP Configuration (Claude Desktop)
MCP Configuration (SSE Transport)
For remote/deployed servers, use the SSE transport configuration:
Deployed Endpoints
| Service | URL | Description |
|---|---|---|
| Dashboard | https://databridge.dataamplifier.io | Web UI (this dashboard) |
| MCP SSE | https://mcp.databridge.dataamplifier.io/sse | MCP server endpoint for Claude Desktop / AI clients |
Programmatic Usage
License Key System
DataBridge uses a tiered license system. Community Edition is free; Pro and Enterprise require a license key.
Environment Variables
| Variable | Description | Default |
|---|---|---|
| DATABRIDGE_LICENSE_KEY | License key for Pro/Enterprise features | - (CE mode) |
| DATABRIDGE_LICENSE_SECRET | License signing secret (admin only) | - |
| DATA_DIR | Data directory for projects | ./data |
| NESTJS_BACKEND_URL | NestJS backend URL | http://localhost:8001 |
| NESTJS_API_KEY | API key for backend | - |
| SNOWFLAKE_ACCOUNT | Snowflake account identifier | - |
| SNOWFLAKE_USER | Snowflake authentication user | - |
| DATABRIDGE_FUZZY_THRESHOLD | Fuzzy match score threshold (0–100) | 80 |
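A minimal sketch of how these variables might be consumed, using the documented defaults from the table above (the `load_config` helper is hypothetical):

```python
# Hypothetical config loader using the documented environment
# variables and defaults; missing DATABRIDGE_LICENSE_KEY means CE mode.
import os

def load_config(env=os.environ):
    return {
        "license_key": env.get("DATABRIDGE_LICENSE_KEY"),  # None -> CE mode
        "data_dir": env.get("DATA_DIR", "./data"),
        "backend_url": env.get("NESTJS_BACKEND_URL", "http://localhost:8001"),
        "fuzzy_threshold": int(env.get("DATABRIDGE_FUZZY_THRESHOLD", "80")),
    }

cfg = load_config(env={})  # no overrides: documented defaults apply
```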
Platform Architecture Diagrams
BLCE 21-Phase Pipeline
The Business Logic Comprehension Engine processes ERP data through 21 sequential phases, from intake to deployment.
Wright Pipeline Flow
The Wright module generates a 4-object Snowflake Dynamic Table pipeline from hierarchy projects.
Flow: VW_1 Translation View → DT_2 Granularity Table → DT_3A Pre-Aggregation → DT_3 Final Data Mart → Snowflake.
Cortex AI Pipeline
Snowflake Cortex integration for AI-powered analytics with natural language queries.
Data Catalog & Observability
Centralized metadata, lineage tracking, and real-time health monitoring.
Flow: the Data Catalog (19 tools) links to the Lineage Graph (11 tools) and the Business Glossary; Observability (15 tools) drives the Metrics Store, Alert Rules, and Asset Health; the Lineage Graph and the Catalog both feed GraphRAG (10 tools).
E2E Assessment Pipeline
The 15-phase orchestrated workflow for end-to-end ERP data assessment, from connection to final report.
Flow: DataShield masking → Discover Relationships → Detect Dimensions → Generate Bus Matrix → Quality Validation → Model Load (Dims + Facts) → Persist to Snowflake → Generate Report → Bundle Artifacts → Create ShieldProject → Assessment Complete.
Hierarchy-Graph Bridge
Auto-populates the GraphRAG vector store and lineage graph whenever hierarchies change. Event-driven with rich semantic embeddings.
Flow: hierarchy create/update/delete events → AutoSyncManager (event callbacks) → HierarchyGraphBridge, which writes rich embeddings (levels, mappings, properties, formulas) to the VectorStore and source-mapping edges to the LineageGraph. Both feed GraphRAG Search, which serves the PlannerAgent and the RecommendationEngine.
Gateway Mode — Dynamic Tool Exposure
Cross-LLM compatibility layer. Only ~18 gateway tools are visible; the remaining 239 are discoverable and executable via discover_tools() and run_tool(). Enable with DATABRIDGE_TOOL_MODE=dynamic (default: full).
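The discover/run pattern can be sketched as a small registry with two entry points. This is an illustrative model of the gateway idea, not the DataBridge implementation; the registered tools here are stubs:

```python
# Hypothetical gateway sketch: a small visible surface
# (discover_tools / run_tool) fronting a larger registry.

REGISTRY = {
    "load_csv":     lambda path: f"loaded {path}",
    "profile_data": lambda name: f"profiled {name}",
}

def discover_tools(prefix=""):
    """Return names of registered tools, optionally filtered by prefix."""
    return sorted(t for t in REGISTRY if t.startswith(prefix))

def run_tool(name, *args, **kwargs):
    """Dispatch a call to a registered tool by name."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return REGISTRY[name](*args, **kwargs)
```

Clients that cannot hold hundreds of tool schemas in context only ever see the two gateway functions; everything else is resolved by name at call time.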
Turbo Engine — Local Acceleration
Optional Polars + DuckDB acceleration layer. Data loads 10-100x faster locally, then persists to Snowflake via the existing bulk loader. Falls back to Pandas if not installed.
Flow: source files (CSV / Parquet / JSON) are read by Polars (fast read + profile) or queried via DuckDB (local SQL engine); both converge on a pd.DataFrame for tool compatibility, which persists to Snowflake through sf_pool and the bulk loader. If Polars is unavailable, the load falls back to Pandas (pd.read_csv).
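The try-fast-engine-then-fall-back pattern looks roughly like this. It is a sketch: the stdlib `csv` reader here stands in for the Pandas fallback path so the example has no required dependencies:

```python
# Sketch of accelerated load with graceful fallback: try Polars for a
# fast CSV read; fall back to a slower engine if it is not installed.
import csv

def load_table(path):
    try:
        import polars as pl  # fast path (optional dependency)
        return pl.read_csv(path).to_dicts()
    except ImportError:
        # Fallback path, standing in for the Pandas route.
        with open(path, newline="") as f:
            return list(csv.DictReader(f))
```

Either branch yields row dictionaries, so downstream tools see the same shape regardless of which engine did the read.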
Vanna RAG Text-to-SQL Pipeline
RAG-powered SQL generation from natural language. Trains on DDL, documentation, and query history. Falls back to deterministic QueryBuilder when confidence is low.
Flow: the Vanna RAG store (trained on DDL + docs + Q&A) grounds the Claude LLM, which generates SQL. If confidence >= 0.7 the generated query is used; otherwise the deterministic QueryBuilder produces it. The resulting query executes on DuckDB (local) or Snowflake (remote).
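The confidence gate reduces to a simple comparison. A minimal sketch, assuming the 0.7 threshold described above; the generator functions are stubs, not the Vanna or QueryBuilder APIs:

```python
# Illustrative confidence gate for text-to-SQL: accept LLM output when
# confident, otherwise fall back to a deterministic builder.

CONFIDENCE_THRESHOLD = 0.7

def deterministic_sql(table, columns):
    """Stub for the deterministic QueryBuilder path."""
    return f"SELECT {', '.join(columns)} FROM {table}"

def choose_sql(llm_sql, confidence, table, columns):
    if confidence >= CONFIDENCE_THRESHOLD:
        return llm_sql, "llm"
    return deterministic_sql(table, columns), "deterministic"

sql, source = choose_sql("SELECT sum(amt) FROM gl", 0.55, "gl", ["amt"])
```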
PydanticAI Planning Loop
Multi-step reasoning agent for workflow planning. Iteratively validates plans using tool calls before returning a type-safe result.
Flow: the planner agent (multi-turn reasoning) makes tool calls (list_available_agents, check_agent_capability, validate_step_dependency), then iteratively refines until it emits a validated PlanOutput (Pydantic model). That becomes a WorkflowPlan handed to the PlatformOrchestrator.
Deployment Architecture
Production deployment on GCE with Nginx SSL termination and systemd service management.
Flow: Nginx terminates SSL (Let's Encrypt) and routes databridge.dataamplifier.io to the Dashboard service (systemd: databridge-dashboard, port 5050, Flask UI via run_ui.py) and mcp.databridge.dataamplifier.io to the MCP SSE service (systemd: databridge-mcp, port 786, run_server.py --sse). The MCP server talks to Snowflake and the local filesystem (data/).
Commercialization Tiers
Three-tier licensing model with increasing tool counts and capabilities.
Tiers: Community Edition (~106 tools, free on PyPI) → Pro Edition (~247 tools, licensed via GitHub Packages) → Enterprise (267+ tools, custom deploy). Community Edition also ships the Pro Examples set (47 tests + 29 use cases).
Changelog
v0.49.4 - March 1, 2026
- Enterprise Intelligence Layer: Builds 1–6 complete
- Decision-making loop: VOI, Thompson Sampling, Monte Carlo rollout
- Cost optimizer, governance dashboard, rule auto-tuner
- Active learning, calibration, self-learning feedback loop
- Distributed architecture: CE / Pro / Enterprise tiers
- 5-layer IP protection: license server, source stripping, Cython, API auth, data moat
- GraphRAG: 4,571 nodes, 408K edges across all domains
- 4,363 tests passing
- Total tool count: 339 CE (393 Enterprise)
v0.45.0 - February 24, 2026
- Financial Validation Framework: ERP detection, TB validation, GL-TB reconciliation
- Evaluation & Metrics Framework: 15 CE tools, Nelder-Mead tuner
- Pattern Abstraction with federated privacy (k-anonymity, differential privacy)
- Distributed architecture groundwork: Redis, Celery, PostgreSQL, S3
- Total tool count: 339
v0.43.0 - February 20, 2026
- Wright Integration: hierarchy-driven 4-object pipeline generation
- Hierarchy-Graph Bridge: auto-sync GraphRAG on hierarchy changes
- Lineage graph with full provenance tracking
- Detection grounding: knowledge-backed anomaly rules
- Total tool count: 290
v0.42 - February 18, 2026
- BLCE P5: DDL executor + deployment phase (phase 21)
- 22 new tools added (tools 51-72), 5 new phases (17-21)
- Auto-build pipeline: schema creation, DDL execution, validation
- Swarm orchestration for parallel AI enrichment
- Artifact bundle generation with rich HTML reports
- Dashboard UI refresh with Architecture/Changelog tabs, BLCE Engine page
- Total tool count: 267
v0.41.0 - February 16, 2026
- BLCE Engine launch: Business Logic Comprehension Engine
- 50 initial tools across 16 phases
- SQL parsing, measure normalization, cross-referencing
- Evidence collection, governance metadata, model generation
- Bus matrix generation, quality validation
- 601 tests passing
v0.40.0 - January 15, 2026
- E2E Assessment Pipeline: 15-phase orchestrated workflow
- DataShield UI: offline data masking before AI processing
- Snowflake Connection Pool: singleton SSO auth for pipelines
- Bulk VARIANT loader for Snowflake persistence
- ERP config registry with auto-detect + Enertia preset
- Report generator with KPI tiles, bus matrix, timeline
v0.39.0 - December 2025
- Data Observability: metric recording, anomaly detection, asset health
- GraphRAG Engine: knowledge graph + vector search
- Data Versioning: snapshots, diffs, and rollback
- AI Relationship Discovery: schema analysis, naming patterns, FK detection
- Cortex Table Understanding: AI-generated table summaries
🧬 BLCE Engine
The Business Logic Comprehension Engine (BLCE) is Ithaca's core analytical engine. It ingests raw ERP SQL views and tables, extracts business logic, normalizes measures, discovers hierarchies, and generates a complete Kimball-style data warehouse — all through a 21-phase automated pipeline.
21-Phase Pipeline
How It Works
| Phase Group | Phases | Purpose |
|---|---|---|
| Intake & Discovery | 1-6 | Connect to ERP, catalog tables, parse SQL, identify reports |
| Analysis & Normalization | 7-9 | Normalize measures, detect hierarchies, cross-reference |
| Governance & Modeling | 10-14 | Collect evidence, apply governance, generate Kimball model, bus matrix |
| Quality & Skills | 15-16 | Validate data quality, generate domain-specific AI skills |
| Enrichment & Build | 17-21 | AI enrichment, swarm orchestration, auto-build DDL, deploy |
84 BLCE Tools by Function
Parsing & Extraction (8 tools)
Normalization & Grain (2 tools)
Cross-Reference & Comparison (4 tools)
Evidence & Governance (5 tools)
AI & Semantic (4 tools)
Skills & Generation (2 tools)
Pipeline & Orchestration (5 tools)
Parallel Engine & Agents (7 tools)
Client Interaction (6 tools)
E2E Handoff (2 tools)
Model Operations (9 tools)
Proposal & Code Generation (9 tools)
Analysis & Mapping (3 tools)
Review & Deployment (6 tools)
Graph Copilot & Excel Reconciliation (3 tools)
DataShield Classification (9 tools)
17 Pydantic Contracts
BLCE uses strongly-typed Pydantic models at every phase boundary. Each contract validates data flowing between phases.
| Contract | Prefix | Purpose |
|---|---|---|
| ParsedSQL | PSQL_ | Validated SQL parse tree with CTEs, joins, measures |
| NormalizedMeasure | NM_ | Canonical measure with aggregation type, grain, units |
| DetectedHierarchy | DH_ | Discovered hierarchy levels with parent-child links |
| CrossReference | XR_ | Cross-table relationships with confidence scores |
| EvidenceRecord | ER_ | Source evidence for each analytical decision |
| GovernanceTag | GT_ | PII/sensitivity classification, retention policy |
| DimensionSpec | DS_ | Kimball dimension definition with SCD type |
| FactSpec | FS_ | Kimball fact table with grain, measures, FK links |
| BusMatrixEntry | BM_ | Fact-dimension intersection for bus matrix |
| QualityRule | QR_ | Data quality expectation with threshold |
| SkillPrompt | SP_ | Generated AI skill with domain context |
| EnrichmentResult | ENR_ | AI-enriched metadata and descriptions |
| SwarmTask | ST_ | Parallel task definition for swarm orchestration |
| DDLStatement | DDL_ | Generated CREATE TABLE/VIEW statement |
| DeploymentPlan | DP_ | Ordered DDL execution plan with rollback |
| ArtifactBundle | AB_ | HTML report, JSON metadata, diagram outputs |
| PipelineState | PS_ | Checkpoint state for pipeline resume/rollback |
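A phase-boundary contract of this kind can be sketched with stdlib dataclasses so it runs without dependencies (the real contracts are Pydantic models). Field names follow the NormalizedMeasure row above; the validation rules here are illustrative:

```python
# Pydantic-style phase-boundary contract, sketched with stdlib
# dataclasses. Validation rules are illustrative, not BLCE's.
from dataclasses import dataclass

ALLOWED_AGGREGATIONS = {"SUM", "AVG", "MIN", "MAX", "COUNT"}

@dataclass(frozen=True)
class NormalizedMeasure:
    name: str
    aggregation: str  # e.g. SUM, AVG
    grain: str
    units: str

    def __post_init__(self):
        # Reject malformed records at the phase boundary.
        if self.aggregation not in ALLOWED_AGGREGATIONS:
            raise ValueError(f"bad aggregation: {self.aggregation}")
        if not self.name:
            raise ValueError("measure name required")

m = NormalizedMeasure("revenue", "SUM", "invoice_line", "USD")
```

The point of validating at every boundary is that a phase never has to defend against malformed input from the phase before it.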
Mart Factory (Phase 26)
The Mart Factory generates complete 4-object Snowflake Dynamic Table pipelines from hierarchy configurations. It uses heuristic discovery to auto-detect hierarchy patterns and suggest optimal mart configurations.
10 MCP Tools
| Category | Tools |
|---|---|
| Config (3) | create_mart_config, add_mart_join_pattern, export_mart_config |
| Pipeline (3) | generate_mart_pipeline, generate_mart_object, generate_mart_dbt_project |
| Discovery (2) | discover_hierarchy_pattern, suggest_mart_config |
| Validation (2) | validate_mart_config, validate_mart_pipeline |
4-Object Pipeline
Formula Engine — 5-Level Precedence Cascade
| Level | Operations | Example |
|---|---|---|
| P1 | SUM | Revenue = Sum of all revenue line items |
| P2 | SUBTRACT, ADD | Net Revenue = Revenue - Discounts |
| P3 | DIVIDE, RATIO | Gross Margin % = Gross Profit / Revenue |
| P4 | VARIANCE | Variance = Actual - Budget |
| P5 | Complex | Custom multi-step calculations |
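The cascade in the table above boils down to evaluating formulas level by level, so later levels can reference results computed at earlier ones. A minimal sketch with illustrative formulas (not the Wright formula engine):

```python
# Sketch of a precedence-cascade evaluator: formulas carry a level
# P1..P5 and are evaluated in level order, so later formulas can use
# earlier results.

def evaluate_cascade(base, formulas):
    """base: leaf values; formulas: (level, name, fn) entries."""
    values = dict(base)
    for level, name, fn in sorted(formulas, key=lambda f: f[0]):
        values[name] = fn(values)
    return values

result = evaluate_cascade(
    {"Revenue": 1000.0, "Discounts": 50.0, "COGS": 600.0},
    [
        (2, "Net Revenue",    lambda v: v["Revenue"] - v["Discounts"]),
        (2, "Gross Profit",   lambda v: v["Net Revenue"] - v["COGS"]),
        (3, "Gross Margin %", lambda v: v["Gross Profit"] / v["Net Revenue"]),
    ],
)
```

Because `sorted` is stable, formulas within a level run in declaration order, which is why "Gross Profit" can safely reference "Net Revenue" at the same level here.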
DataShield — Trust & Data Classification
DataShield provides offline data masking, PII/sensitivity classification, and trust attestation enforcement for AI-safe data processing.
Key Capabilities
| Feature | Description |
|---|---|
| PII Classification | Column-level sensitivity detection (PII, PHI, financial, confidential) using heuristic + AI models |
| Trust Attestations | Every AI phase records pass/fail attestations. Configurable enforcement: hard_fail, warn, or off |
| Data Masking | Offline masking of sensitive columns before AI/LLM processing — no PII leaves your environment |
| Audit Trail | All attestations persisted as JSON files with timestamps, event IDs, and phase metadata |
Trust Enforcement Modes
| Mode | Behavior | Use Case |
|---|---|---|
| hard_fail | Blocks phases when attestation is missing | Production deployments |
| warn | Logs warning but continues execution | Development & E2E testing |
| off | No enforcement | Local testing |
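The three modes reduce to a small dispatch. This is an illustrative sketch of the behavior described above; the function name and signature are hypothetical:

```python
# Illustrative enforcement of the three trust modes:
# hard_fail blocks, warn logs and continues, off skips the check.
import logging

def enforce_attestation(mode, attestation_present, phase):
    if attestation_present or mode == "off":
        return True
    if mode == "warn":
        logging.warning("missing attestation for phase %s", phase)
        return True
    if mode == "hard_fail":
        raise RuntimeError(f"phase {phase} blocked: attestation missing")
    raise ValueError(f"unknown mode: {mode}")
```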
Classification Output
DataShield classifies every column in your schema and outputs structured reports:
⬡ Hierarchy Builder
Sample: Investment Property Financial Analysis DEMO
Commercial real estate investment property model with income statement, balance sheet, and financial analysis hierarchies.
Click a node in the hierarchy tree to view details.
Tree: the Financial Analysis root branches into Income Statement, Balance Sheet, and a Financial Analysis Report. Income Statement → Revenue (Rental Income, CAM Reimbursements, Other Income), Operating Expenses, Net Operating Income. Balance Sheet → Assets, Liabilities, Owner Equity. Financial Analysis → NOI Analysis, Cap Rate Analysis, DCF Valuation.
Hierarchy Tree
Select a project to view its hierarchy tree.
Select a node from the tree to edit its details.
Demo Showcase
The 60-Second Wow
Upload your Chart of Accounts. Get a production-ready financial hierarchy and dbt models. Zero config.
5-phase pipeline from raw CSV to queryable Snowflake mart
CSV
Classify
Hierarchy
Mart
Snowflake
Financial Intelligence Demos
CFO-grade analytics, forensic auditing, executive dashboards, and portfolio risk assessment.
CFO Gross Margin Mismatch
Revenue = $12.4M but COGS shows $8.1M creating a 34.7% margin vs expected 42%. FixGenerator runs the full closure loop.
Month-End GL Reconciliation
GL trial balance vs sub-ledger totals with 47 mismatched accounts. GraphRAG recommends a workflow then reviews findings.
Audit Evidence Trail
External auditors need SOX compliance evidence. CFO-strict search for audit trail data plus closure metrics KPI dashboard.
Ghost Vendor Fraud
Forensic ledger-to-invoice matching detects ghost vendors with zero POs. Flags suspicious invoices and quantifies total exposure.
M&A Integration Conflict
Compare account ID formats across merging entities. Detects 847 overlapping IDs and generates unified mapping recommendations.
Portfolio Synergy Capture
Cluster redundant cost centers across PE portfolio companies. Quantifies $3.36M/yr savings pipeline across 3 opportunity clusters.
Executive Dashboard
C-suite portfolio overview: 1,945 companies tracked, trust store health, $18.2M synergy pipeline, and risk breakdown by category.
Portfolio Risk Heatmap
PE firm risk density grid: 5 firms x 4 issue types with color-coded severity. Red zones = immediate manual audit required.
Data Engineering Demos
ERP integration, grain analysis, fact harmonization, self-healing pipelines, and legacy modernization.
ERP Integration Quality Check
Migrating from legacy ERP with 200+ tables. Search for relevant tools then get a workflow recommendation.
Enterprise Grain Analysis
Two ERP systems at different granularity — daily vs monthly. Detect, compare, and recommend alignment.
Fact Harmonization
Match columns across two fact tables using exact, prefix-stripped, and fuzzy matching. Generate UNION ALL SQL.
Progressive Parity Validation
5-gate state machine: LINE_ITEM/MONTH → FULL_REPORT/YEAR. Watch gates pass, fail, fix, and certify.
Parity Certificate
Full validation cycle: compile spec, run progressive validation, classify discrepancies, generate signed certificate.
Self-Healing Pipeline
4-stage autonomous loop: Detect issues, patch in sandbox, memorize the fix, replay on new companies with zero human input.
Architect Modernizer
Legacy COBOL → star schema: AI proposal, architect correction (SK + SCD2), then self-improved replay on new files.
Advanced: Implementation Showcase (6 demos) & Platform Internals
ERP Template Auto-Select
Shows how the system resolves ERP template strategy (explicit/alias/detected).
Proposal Coverage Impact
Compares no-template baseline vs template-enriched proposal coverage.
Trust Metrics Live
Pulls real trust-policy metrics from GraphRAG runtime memory.
Policy Explainability
Drills into one discrepancy event and explains trust factors and reasoning codes.
Retention Tier Status
Shows hot/warm/cold memory tier distribution.
WolfePak Quick Start
Fast bootstrapping to at least 5 dimensions and 3 facts.
DataBridge AI v0.49.4 — Platform Recommendation Guide
Comprehensive guide covering E2E Assessment Pipeline, BLCE Business Logic Engine, and Hierarchy Financial Reporting.
| # | Phase | What It Does | Business Value | Duration |
|---|---|---|---|---|
| 1 | load_metadata | Connects to source, samples every table, detects column types, infers basic FK patterns | Baseline understanding of source schema | 5–15 min |
| 2 | ai_relationship_discovery | 6-sub-phase AI pipeline: schema inventory, naming patterns, deterministic FK scan, value overlap, Cortex semantic matching, confidence scoring | Uncovers hidden FK relationships invisible to naming heuristics | 5–10 min |
| 3 | group_tables | Clusters tables by ERP domain prefix (GL, RV, AR, etc.) | Organizes 100+ raw tables into business domains | <1 min |
| 4 | hierarchy | Builds dimension hierarchies (optional) | Pre-built rollup structures | Skipped |
| 5 | ai_classify | DataShield column classification — identifies PII (SSN, email, phone), classifies identifiers, measures, dates, codes | Security: all PII identified and masked before data leaves client environment | 10–20 min |
| 6 | detect_dimensions | Kimball heuristic classifier: dimension (referenced by 3+ tables), fact (references 2+ dims), bridge (M:M resolver) | Foundational DW design: which tables are facts vs. dimensions | 2–5 min |
| 7 | wright | Generate dbt pipelines from hierarchy projects | Automated data mart generation | Skipped |
| 8 | dbt | Deploy dbt transformations | Continuous transformation pipeline | Skipped |
| 9 | quality | Generates Great Expectations suites from dim/fact classification (NOT NULL, UNIQUE, NUMERIC) | Automated data quality monitoring — detect drift before production | 2–5 min |
| 10 | observability | Record SLA metrics and asset health | Ongoing monitoring | Skipped |
| 11 | artifact_bundle | Rich HTML report with KPI tiles, phase timeline, bus matrix, classification breakdown | Client-facing deliverable — one page that tells the whole story | 2–5 min |
| 12 | bus_matrix | Kimball fact×dimension conformance grid | Architecture scorecard: 75%+ = enterprise-ready, <40% = design gaps | <1 min |
| 13 | dm_spec_generate | Auto-generates dimensional model table specs (DIM_*, FCT_*) with SCD2 boilerplate | Automated DW design — production-ready table specs in seconds | 1–2 min |
| 14 | dm_load | Loads dimensional tables from source to warehouse (chunked INSERT, 500-row batches) | Analytics warehouse live, ready for Tableau/Power BI | 50–80 min |
| 15 | quality_from_classification | Maps DataShield per-column classifications to advanced quality expectations | Tier-2 quality rules: SSN format regex, date range validation, code set membership | 2–5 min |
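The phase table above can be sketched as a simple runner. This is an illustrative sketch only — the phase names come from the table, but the runner, callables, and context dictionary are hypothetical, not the actual DataBridge API. It shows the two behaviors the table implies: optional phases are recorded as "Skipped", and each executed phase gets a duration.

```python
import time

# Hypothetical phase implementations; each mutates a shared context dict.
def load_metadata(ctx): ctx["tables"] = 111
def ai_classify(ctx): ctx["pii_masked"] = True

# (name, callable, optional) — optional phases with no callable are skipped,
# mirroring the "Skipped" rows in the phase table.
PHASES = [
    ("load_metadata", load_metadata, False),
    ("hierarchy", None, True),
    ("ai_classify", ai_classify, False),
]

def run_pipeline(skip=()):
    ctx, results = {}, []
    for name, fn, optional in PHASES:
        if name in skip or (optional and fn is None):
            results.append((name, "skipped", 0.0))
            continue
        t0 = time.perf_counter()
        fn(ctx)
        results.append((name, "ok", time.perf_counter() - t0))
    return ctx, results

ctx, results = run_pipeline()
```

The same shape extends naturally to all 15 phases: the `skip` argument is how a run would disable `wright`, `dbt`, or `observability` for an assessment-only engagement.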
E2E Assessment: 111 Source Tables to Kimball Star Schema
The client runs Enertia ERP with 111 tables across Revenue, Production, Land, Joint Interest Billing, and GL. Table names like RVMASHDR, JIBDTLRC are cryptic — no FK metadata in the schema.
| Phase | Result | Business Impact |
|---|---|---|
| load_metadata | 111 tables sampled, 247 relationships inferred | First-ever complete schema map |
| ai_relationship_discovery | 21 naming patterns (GL, RV, AB, JIB), Cortex semantic matching | Discovered FK patterns invisible to heuristics |
| ai_classify | 2,143 columns classified, masked samples generated | Security team verified: zero PII leaked to analytics layer |
| detect_dimensions | 36 dimensions, 9 facts identified | Kimball design: GLCHART, PROPERTY, CUSTOMER as conformed dims |
| dm_load | 14 tables loaded, 5.7M rows | Production DW ready for Power BI — Revenue, Production, GL analytics |
E2E Assessment: Stripe + Salesforce + Snowflake Native
Consolidate subscription data from Stripe (payments), Salesforce (CRM), and product usage database into a unified analytics warehouse.
| Phase | Expected Result | Business Impact |
|---|---|---|
| load_metadata | ~45 tables: 15 Stripe, 20 Salesforce, 10 product | First unified view of all subscription data |
| ai_relationship_discovery | Cross-system FK: customer_id, subscription_id, invoice_id | Links Stripe charges to Salesforce opportunities to product usage |
| detect_dimensions | DIM_CUSTOMER, DIM_PLAN, DIM_DATE; FCT_SUBSCRIPTION, FCT_INVOICE | Kimball design for MRR/ARR, churn, LTV analytics |
| bus_matrix | Conformance grid showing shared CUSTOMER and DATE dims | Validates star schema supports cross-domain cohort analysis |
1. Dimensional Data Warehouse
- 14–17 production-ready tables (DIM_* + FCT_*)
- 5–10M rows of clean, conformed data
- SCD2 change tracking for dimension history
- Ready for Tableau / Power BI / Looker
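The SCD2 change tracking listed above follows the standard Type 2 pattern: when a tracked attribute changes, the current row is closed out and a new version is appended. A toy illustration — column names (`customer_id`, `region`, `from`, `to`) are hypothetical, not the generated DIM_* spec:

```python
from datetime import date

# Current dimension rows; "to" is None for the active version.
dim = [
    {"customer_id": 1, "region": "TX", "from": date(2024, 1, 1), "to": None},
]

def scd2_apply(dim, incoming, today):
    """Type 2: expire changed rows and append new versions; insert unseen keys."""
    current = {r["customer_id"]: r for r in dim if r["to"] is None}
    for row in incoming:
        cur = current.get(row["customer_id"])
        if cur is None:
            dim.append({**row, "from": today, "to": None})
        elif cur["region"] != row["region"]:
            cur["to"] = today  # close the old version
            dim.append({**row, "from": today, "to": None})
    return dim

scd2_apply(dim, [{"customer_id": 1, "region": "OK"}], date(2025, 1, 1))
# dim now holds two versions of customer 1: TX (closed) and OK (active)
```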
2. Metadata Audit Trail (6 tables)
- RUN_SUMMARY — pipeline execution proof
- TABLE_PROFILES — per-table profiling stats
- RELATIONSHIPS — FK discovery + confidence
- CLASSIFICATIONS — per-column data types
- TABLE_SUMMARY — business purpose per table
- MASKED_SAMPLES — de-identified sample data
3. Reports & Quality
- Rich HTML report (KPI tiles, phase timeline)
- Bus matrix conformance grid
- Great Expectations quality suites
- GraphRAG knowledge base entries
| Format | What Gets Extracted | Example |
|---|---|---|
| SQL | SELECT measures, WHERE filters, GROUP BY grain, FROM/JOIN dependencies | SELECT SUM(amount) AS total_revenue FROM sales WHERE is_active = true |
| Python | pandas aggregation patterns, DataFrame transformations | df.groupby('region')['amount'].sum() |
| Excel | Named ranges, SUMIF/VLOOKUP formulas, pivot table definitions | =SUMIF(A:A,"Revenue",B:B) |
| DAX | Power BI measure definitions, CALCULATE contexts | CALCULATE(SUM(Sales[Amount]), YEAR(Sales[Date])=2025) |
| MDX | SSAS cube queries, dimension hierarchies | SELECT [Measures].[Revenue] ON 0 FROM [Cube] |
| PDF | OCR-extracted tables, report structure, KPI definitions | Board report with "Net Revenue: $12.5M" |
| CSV | Column headers as grain, numeric columns as measures | Monthly budget CSV with department, amount, period |
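For the SQL row above, extraction means recognizing aggregate expressions and their aliases as measures. A deliberately minimal regex sketch — the real Oracle Engine would use a proper SQL parser, and the `extract_measures` helper is illustrative, not a DataBridge function:

```python
import re

def extract_measures(sql: str):
    """Toy extraction of aggregate measures of the form AGG(expr) AS alias."""
    pattern = re.compile(r"\b(SUM|COUNT|AVG|MIN|MAX)\s*\(([^)]*)\)\s+AS\s+(\w+)", re.I)
    return [
        {"agg": a.upper(), "expr": e.strip(), "name": n}
        for a, e, n in pattern.findall(sql)
    ]

measures = extract_measures(
    "SELECT SUM(amount) AS total_revenue FROM sales WHERE is_active = true"
)
# measures[0] -> {"agg": "SUM", "expr": "amount", "name": "total_revenue"}
```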
Phase 1: Parse Sources
Extracts LogicArtifacts from any combination of source files — each artifact captures measures, filters, joins, grain columns, and source dependencies.
Phase 2: Normalize
Deduplicates and canonicalizes measures across all sources; each additional source that confirms a measure boosts its confidence by +0.1.
Phase 3: Cross-Reference
Discovers relationships between artifacts using 3 strategies: Column Name Similarity (0.75), Grain Matching (0.85), and Measure Expression Matching.
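A sketch of the first two strategies, treating the numbers in the text as confidence scores. The Jaccard similarity metric and the `related` helper are assumptions for illustration; the actual strategy implementations are not specified here:

```python
def column_similarity(cols_a, cols_b):
    """Jaccard overlap of column-name sets — one plausible similarity metric."""
    a, b = {c.lower() for c in cols_a}, {c.lower() for c in cols_b}
    return len(a & b) / len(a | b) if a | b else 0.0

def related(artifact_a, artifact_b):
    # Thresholds/confidences taken from the text: 0.75 and 0.85.
    if column_similarity(artifact_a["columns"], artifact_b["columns"]) >= 0.75:
        return ("column_name_similarity", 0.75)
    if set(artifact_a["grain"]) == set(artifact_b["grain"]):
        return ("grain_match", 0.85)
    return None

related(
    {"columns": ["cust_id", "amount"], "grain": ["month"]},
    {"columns": ["cust_id", "amount"], "grain": ["month"]},
)  # -> ("column_name_similarity", 0.75)
```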
Phase 4: Evidence Sampling
Builds validation queries that test extracted logic against actual data — up to 5,000 rows with a 12-month lookback window; sampled results are SHA-256 hashed for integrity.
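The integrity hash can be computed over a canonical serialization of the sampled rows, so two runs producing the same data produce the same fingerprint regardless of key order. A minimal sketch (the serialization scheme is an assumption, not the documented format):

```python
import hashlib
import json

def evidence_hash(rows):
    """Stable SHA-256 fingerprint of a sampled result set (illustrative)."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

h1 = evidence_hash([{"month": "2025-01", "total_revenue": 12500}])
h2 = evidence_hash([{"total_revenue": 12500, "month": "2025-01"}])
assert h1 == h2  # key order does not affect the fingerprint
```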
Phase 5: Governance Classification
Scores each artifact on a [-1.0, 1.0] scale using five evidence-based rules, classifying it as CORE, CANDIDATE, or CUSTOM.
Phase 6: Skill & DDL Generation
Only CORE-classified artifacts generate reusable AI skill prompts and production Snowflake DDL.
| Rule | Boost | Penalty | What It Measures |
|---|---|---|---|
| Standard Aggregations (SUM, COUNT, AVG) | +0.20 | — | Uses widely-recognized patterns |
| Core Naming (total_, count_, sum_, net_) | +0.15 | — | Domain-standard naming conventions |
| Custom Naming (custom_, client_, _temp) | — | -0.30 | Client-specific, non-reusable |
| Used by 3+ clients | +0.50 | — | Proven reusability |
| Used by 1 client only | — | -0.10 | Low reusability |
CORE (score ≥ 0.4)
Standardizable, reusable across clients. Gets skill prompt + DDL.
CANDIDATE (-0.1 < score < 0.4)
Near-CORE. Needs more client evidence.
CUSTOM (score ≤ -0.1)
Client-specific. Documented but not operationalized.
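The five rules and three thresholds above combine mechanically. A sketch under the assumption that an artifact carries its name, aggregation type, and client usage count (field names hypothetical):

```python
def governance_score(artifact):
    """Score an artifact on [-1.0, 1.0] using the five evidence rules above."""
    s = 0.0
    name = artifact["name"].lower()
    if artifact["agg"] in {"SUM", "COUNT", "AVG"}:
        s += 0.20  # standard aggregation
    if name.startswith(("total_", "count_", "sum_", "net_")):
        s += 0.15  # core naming convention
    if name.startswith(("custom_", "client_")) or name.endswith("_temp"):
        s -= 0.30  # custom naming penalty
    if artifact["client_count"] >= 3:
        s += 0.50  # proven reusability
    elif artifact["client_count"] == 1:
        s -= 0.10  # single-client penalty
    return max(-1.0, min(1.0, s))

def classify(score):
    if score >= 0.4:
        return "CORE"
    if score <= -0.1:
        return "CUSTOM"
    return "CANDIDATE"

a = {"name": "total_revenue", "agg": "SUM", "client_count": 3}
classify(governance_score(a))  # 0.20 + 0.15 + 0.50 = 0.85 -> "CORE"
```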
| Template | Industry | Type | Levels | Key Features |
|---|---|---|---|---|
| Standard P&L | General | Income Statement | 3 | Revenue, COGS, Gross Profit, OpEx, Net Income |
| Standard Balance Sheet | General | Balance Sheet | 3 | Assets (Current/Non-Current), Liabilities, Equity |
| Upstream O&G P&L | E&P | Income Statement | 4 | Oil/Gas/NGL revenue, LOE breakdown, DD&A, Netback per BOE |
| Midstream O&G P&L | Midstream | Income Statement | 4 | Gathering, Processing, Transportation, Storage revenue |
| Oilfield Services P&L | OFS | Income Statement | 3 | Well services, Completion, Workover, Rig revenue |
| Oil & Gas LOS | E&P | Lease Operating | 5 | Per-property LOE: labor, chemicals, utilities, workover |
| Manufacturing P&L | Manufacturing | Income Statement | 4 | Product lines, COGS by material/labor/overhead |
| SaaS P&L | SaaS | Income Statement | 3 | Subscription/Professional/Usage revenue, CAC, LTV |
| Transportation P&L | Logistics | Income Statement | 3 | Freight revenue, fuel costs, maintenance |
| Cost Center Hierarchy | General | Cost Center | 4 | Revenue-generating, Production, Support, R&D |
| Profit Center Hierarchy | General | Profit Center | 4 | Business units, product lines, geographic regions |
VW_1 Translation View
- Dynamic column mapping via CASE
- Joins to dimension tables
- Multi-currency conversion
DT_2 Granularity Table
- UNPIVOT filter groups
- Multi-round filtering
- Exclusion logic via NOT IN
DT_3A Pre-Aggregation
- UNION ALL per join pattern
- Account segment filtering
- Sign change flag handling
DT_3 Data Mart
- 5-level formula cascade (P1–P5)
- Calculated rows via formula engine
- DENSE_RANK surrogate keys
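The DENSE_RANK surrogate-key technique assigns the same key to rows sharing a natural key, with keys dense (1, 2, 3, ...) in natural-key order. A Python equivalent of what the generated SQL would do, with hypothetical column names:

```python
rows = [
    {"account": "4000", "entity": "A"},
    {"account": "4000", "entity": "B"},
    {"account": "5000", "entity": "A"},
]

def dense_rank_keys(rows, key_cols):
    """Assign dense surrogate keys over the distinct natural-key tuples."""
    distinct = sorted({tuple(r[c] for c in key_cols) for r in rows})
    rank = {k: i + 1 for i, k in enumerate(distinct)}
    for r in rows:
        r["sk"] = rank[tuple(r[c] for c in key_cols)]
    return rows

dense_rank_keys(rows, ["account"])
# both account-4000 rows share sk=1; account 5000 gets sk=2
```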
From 10 Days to 3 Days — Automated P&L Rollup
Problem: Month-end close takes 10 days because accountants manually build P&L rollups in Excel across 5 entities.
Solution: Deploy Standard P&L hierarchy with GL account source mappings. Wright auto-generates VW_1 → DT_2 → DT_3A → DT_3.
Per-Property Cost Analysis with Netback Calculation
Problem: Operations managers need per-property LOE per BOE analysis.
Solution: Deploy O&G LOS hierarchy (5 levels). GL accounts 6100–6900 mapped to LOE categories. Wright DT_3 calculates Netback.
3-Entity GL Consolidation with GAAP-Compliant Eliminations
Problem: Holding company needs consolidated statements across 3 subsidiaries with IC elimination.
Solution: Consolidated P&L with entity-level rollup. 3 UNION ALL branches in DT_3A; elimination in DT_3 formula cascade.
| Stage | Capabilities | Metrics | Timeline |
|---|---|---|---|
| Stage 1: Assess | E2E Pipeline (15 phases) + BLCE Parse & Classify | Schema cataloged, dims identified, PII masked | Week 1 |
| Stage 2: Design | + Hierarchy templates + Bus matrix + BLCE governance | P&L hierarchy live, star schema designed | Week 2–3 |
| Stage 3: Build | + Wright pipelines + DM load + Quality expectations | Data marts live, BI connected | Week 3–4 |
| Stage 4: Optimize | + Hierarchy Intelligence + GraphRAG + BLCE skills | AI governance, automated skill library | Month 2+ |
🤖 AI Workflow Planner
📊 Workbook Analysis
What does Workbook Analysis do?
A 6-stage AI pipeline that scans Excel workbooks, classifies their purpose, extracts formula logic, links entities across sheets, and proposes fixes.
Supported Formats
Best Suited For
Multi-sheet financial workbooks with formulas — P&L, balance sheets, consolidation packs, budgets, forecasts, and financial models.
Archetype Classifications
| Archetype | Signals |
|---|---|
| Financial Report | Sheet names with P&L, balance, income, cashflow; functions like SUMIFS, VLOOKUP, IRR, NPV; currency formats |
| Data Extract | Keywords like export, dump, raw; high row-to-formula ratio; single-sheet flat tables |
| Model / Template | Template, form, forecast, scenario in filename; named ranges; data validation; complex formula chains |
| Consolidation | Cross-sheet references; multiple structurally similar sheets; intercompany, elimination keywords |
| Unknown | No strong archetype signal detected |
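The archetype table reduces to signal counting: score each archetype from sheet names, formula functions, and the filename, then fall back to Unknown when nothing fires. A toy classifier built only from the signals listed above (the function and its inputs are illustrative, not the actual pipeline API):

```python
def classify_archetype(sheet_names, functions, filename):
    """Toy signal-counting classifier mirroring the archetype table."""
    signals = {"Financial Report": 0, "Data Extract": 0,
               "Model / Template": 0, "Consolidation": 0}
    text = " ".join(sheet_names).lower()
    if any(k in text for k in ("p&l", "balance", "income", "cashflow")):
        signals["Financial Report"] += 1
    if {"SUMIFS", "VLOOKUP", "IRR", "NPV"} & set(functions):
        signals["Financial Report"] += 1
    if any(k in text for k in ("export", "dump", "raw")):
        signals["Data Extract"] += 1
    if any(k in filename.lower() for k in ("template", "form", "forecast", "scenario")):
        signals["Model / Template"] += 1
    if any(k in text for k in ("intercompany", "elimination")):
        signals["Consolidation"] += 1
    best = max(signals, key=signals.get)
    return best if signals[best] > 0 else "Unknown"

classify_archetype(["P&L 2025"], ["SUMIFS"], "model.xlsx")  # -> "Financial Report"
```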
Pipeline Stages
Each stage is fail-forward — if one fails, independent stages still run.
Select Workbook
Sample Workbooks
Pick a sample and click Analyze to see the full pipeline in action.
Options
Advanced — skip stages
Instant Workbook Intelligence
Upload any Excel workbook and get a full analysis in seconds.
Try a sample workbook from the left panel, or upload your own.