Every company sitting on a data warehouse wants the same thing: let anyone ask questions in plain English and get reliable answers. OpenAI published how they built their internal data agent, and the open-source community responded fast. Here’s a quick summary of three projects pushing this forward.
OpenAI’s In-House Data Agent: 6 Layers of Context
Source: Inside OpenAI’s In-House Data Agent by OpenAI, January 2026
OpenAI built a bespoke data agent serving 3,500+ internal users across 600 petabytes and 70k datasets. The core insight: context is everything. Without it, even strong models hallucinate column names and misinterpret business terminology.
The agent grounds itself in 6 layers of context:
| Layer | What It Provides | How |
|---|---|---|
| Metadata Grounding | Schema, columns, data types, table lineage | Warehouse metadata |
| Query Inference | Historical query patterns, common joins | Ingested past queries |
| Curated Descriptions | Business meaning, caveats, intent | Domain expert annotations |
| Code-Level Definitions | How tables are built, freshness, scope | Codex-powered code crawling |
| Institutional Knowledge | Launches, incidents, metric definitions | Slack, Docs, Notion (RAG) |
| Memory | Corrections, discovered filters, nuances | Self-learning from conversations |
Plus a runtime context layer for live schema inspection when existing info is stale.
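To make the layering concrete, here is a minimal sketch of how such context might be assembled into a single prompt prefix. All class and layer-loader names are hypothetical stand-ins, not OpenAI's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ContextBundle:
    """Accumulates layered context to prepend to the agent's prompt."""
    sections: list[str] = field(default_factory=list)

    def add(self, layer: str, content: str) -> None:
        # Each layer becomes a labeled section the model can cite.
        self.sections.append(f"## {layer}\n{content}")

    def render(self) -> str:
        return "\n\n".join(self.sections)

# Illustrative contents only; real layers would be loaded from the
# warehouse, query logs, annotations, code crawls, RAG, and memory.
ctx = ContextBundle()
ctx.add("Metadata Grounding", "orders(order_id INT, amount DECIMAL, status TEXT)")
ctx.add("Query Inference", "-- common join: orders JOIN users ON orders.user_id = users.id")
ctx.add("Curated Descriptions", "amount excludes refunds; see finance docs")
ctx.add("Memory", "always filter test accounts: is_internal = FALSE")

prompt = ctx.render()  # prepended to the user's question
```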
Key Lessons
- Tool consolidation matters - overlapping tools confuse agents. Restrict and consolidate.
- Less prescriptive prompting works better - rigid instructions pushed the agent down wrong paths. Higher-level guidance + model reasoning = more robust results.
- Code > metadata - pipeline logic captures assumptions and business intent that never surface in SQL or table schemas. Crawling the codebase with Codex was a game-changer.
- Memory is non-negotiable - stateless agents repeat the same mistakes. The self-learning memory stores corrections, non-obvious filters, and constraints critical for correctness.
Evaluation
OpenAI uses curated question-answer pairs with “golden” SQL. Generated SQL is compared both syntactically and by result set, using an LLM grader that accounts for acceptable variation. These run continuously as canaries in production.
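The result-set half of that comparison can be sketched in a few lines. This is an illustrative stand-in (using SQLite for the demo), not OpenAI's grader; in their setup an LLM grader additionally allows acceptable variation such as aliasing or equivalent formatting:

```python
import sqlite3

def result_sets_match(conn, generated_sql: str, golden_sql: str) -> bool:
    """Execute both queries and compare rows order-insensitively."""
    gen = conn.execute(generated_sql).fetchall()
    gold = conn.execute(golden_sql).fetchall()
    return sorted(gen) == sorted(gold)

# Toy fixture standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 5.0)])

golden = "SELECT SUM(amount) FROM orders"
generated = "SELECT SUM(amount) AS total FROM orders"  # alias differs, rows match
print(result_sets_match(conn, generated, golden))  # True
```

Syntactic comparison and LLM grading would sit alongside this check; result-set equality alone is the strictest and cheapest signal.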
Dash: Open-Source Self-Learning Data Agent
Source: Dash: Self-learning data agent by Ashpreet Bedi
GitHub: agno-agi/dash
Dash is an open-source implementation directly inspired by OpenAI’s architecture. It implements the same 6-layer context approach:
| Layer | Source |
|---|---|
| Table Usage | knowledge/tables/*.json |
| Human Annotations | knowledge/business/*.json |
| Query Patterns | knowledge/queries/*.sql |
| Institutional Knowledge | MCP (optional) |
| Memory | LearningMachine |
| Runtime Context | introspect_schema tool |
Self-Learning Loop
Dash learns through two systems:
- Static Knowledge - validated queries, business context, table schemas, metric definitions. Curated by your team, maintained alongside the agent.
- Continuous Learning - patterns discovered through trial and error: column mappings, team focus areas, business term disambiguation. Implemented with just ~5 lines of code via LearningMachine.
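The idea behind a learning store like this can be illustrated with a minimal stand-in. This is not Agno's actual LearningMachine API, just a sketch of the record-and-replay pattern:

```python
class SimpleLearningStore:
    """Toy stand-in for a self-learning memory (not the real LearningMachine).

    Stores corrections keyed by topic and replays them into future prompts,
    so the agent stops repeating the same mistake."""

    def __init__(self) -> None:
        self.lessons: dict[str, str] = {}

    def record(self, topic: str, lesson: str) -> None:
        self.lessons[topic] = lesson  # later corrections overwrite earlier ones

    def as_context(self) -> str:
        return "\n".join(f"- {t}: {l}" for t, l in sorted(self.lessons.items()))

store = SimpleLearningStore()
store.record("revenue", "exclude refunded orders (status != 'refunded')")
store.record("users", "active means last_login within 30 days")
memory_block = store.as_context()  # injected into the next prompt
```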
Quick Start
```shell
git clone https://github.com/agno-agi/dash && cd dash
```
Ships with F1 race data (1950-2020), a built-in UI, and an evaluation suite (string matching, LLM grading, golden SQL comparison). Built with the Agno framework.
Nao: Open-Source Analytics Agent
GitHub: getnao/nao (Y Combinator backed)
Nao takes a different approach - it’s a framework-first analytics agent focused on context building and deployment.
Two-Step Architecture
- nao-core CLI - build and manage agent context (data, metadata, modeling, rules, docs, tools, MCPs)
- nao chat UI - deploy a conversational interface for anyone to query
Key Differentiators
| Feature | Detail |
|---|---|
| Data stack agnostic | Works with any warehouse, any LLM |
| File-system context | Context organized as files - no limit on what you include |
| Agent reliability testing | Unit test agent performance before deploying |
| Version tracking | Version context and track performance over time |
| User feedback loop | Built-in thumbs up/down for continuous improvement |
| Self-hosted | Use your own LLM keys, full data privacy |
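The "file-system context" approach can be pictured as a directory tree. The category names come from the nao-core description above; the exact layout shown here is hypothetical and depends on nao-core's actual conventions:

```
context/
├── data/       # sample rows, profiling output
├── metadata/   # schemas, lineage
├── modeling/   # pipeline or dbt model code
├── rules/      # business rules, metric definitions
├── docs/       # institutional knowledge
└── tools/      # custom tools and MCP configs
```

Because context is just files, it can be versioned with git, which is what enables the version tracking and reliability testing listed above.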
Quick Start
```shell
pip install nao-core
```
Also available via Docker:
```shell
docker pull getnao/nao:latest
```
Stack: Fastify + Drizzle + tRPC (backend), React + TanStack Query + shadcn (frontend).
Comparison
| Aspect | OpenAI Agent | Dash | Nao |
|---|---|---|---|
| Open Source | No (internal) | Yes (Apache 2.0) | Yes (Apache 2.0) |
| Context Layers | 6 layers | 6 layers (same model) | File-system based |
| Self-Learning | Memory system | LearningMachine | User feedback loop |
| Evaluation | Golden SQL + LLM grader | String match + LLM grader + golden SQL | Unit testing framework |
| LLM Support | GPT-5 / Codex | OpenAI | Any LLM |
| Data Stack | OpenAI internal | Configurable | Agnostic |
| UI | Internal tool | Agno platform | Built-in chat |
| Best For | - | Teams wanting OpenAI’s architecture OSS | Teams wanting stack-agnostic framework |
Takeaway
The pattern is clear: context-rich, self-learning data agents are becoming the standard. OpenAI proved the architecture at scale, Dash made it accessible, and Nao provides a framework-first alternative. The shared insight across all three: without deep, layered context, even the best models produce unreliable results.
References
- Inside OpenAI’s In-House Data Agent - OpenAI (Jan 2026)
- Dash: Self-learning data agent - Ashpreet Bedi
- agno-agi/dash - Dash GitHub repo
- getnao/nao - Nao GitHub repo