Why raw access isn't enough
Pointing an AI agent at raw warehouse tables is the same as handing someone a database connection with no data platform in front of it. You get results, but you have no control over what gets queried, no guarantee the definitions are consistent, and no isolation between the agent's workload and everything else running on that compute. It's not a starting point — it's a known failure mode.
The design constraint was clear from the start: the agent would query through the same governed interface built for human analysts — curated models, defined metrics, explicit access controls. Anything less wasn't a simpler version — it was no data platform at all.
Isolate everything
The first decision is physical separation. The agentic layer lives in its own database, runs on its own compute, and is entirely separate from the warehouse your ETL and BI tools use. This isn't just about security — it means an expensive agent query can't saturate the compute your production pipelines depend on, and a misconfigured grant in the agentic layer can't accidentally expose something in the main analytics database.
In practice this means: dedicated database, dedicated warehouse (auto-suspending when idle), and two roles — one for building, one for querying — with nothing shared between them.
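As a rough illustration of that setup, the sketch below renders the provisioning statements from a small spec. It assumes a Snowflake-style SQL dialect (the `AUTO_SUSPEND` and `AUTO_RESUME` warehouse options), and every object name in it (`AGENT_DB`, `AGENT_WH`, `AGENT_BUILDER`, `AGENT_READER`) is hypothetical. Grants for the builder role are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentLayerSpec:
    """Hypothetical names; the dialect is assumed to be Snowflake-style SQL."""
    database: str = "AGENT_DB"
    warehouse: str = "AGENT_WH"
    builder_role: str = "AGENT_BUILDER"  # data team: creates objects, manages grants
    reader_role: str = "AGENT_READER"    # what the agent authenticates as at runtime
    auto_suspend_seconds: int = 60       # assumed idle timeout


def provisioning_statements(spec):
    """Render the isolation setup: own database, own compute, two roles."""
    return [
        f"CREATE DATABASE IF NOT EXISTS {spec.database}",
        f"CREATE WAREHOUSE IF NOT EXISTS {spec.warehouse} "
        f"WITH WAREHOUSE_SIZE = 'XSMALL' "
        f"AUTO_SUSPEND = {spec.auto_suspend_seconds} AUTO_RESUME = TRUE",
        f"CREATE ROLE IF NOT EXISTS {spec.builder_role}",
        f"CREATE ROLE IF NOT EXISTS {spec.reader_role}",
        # The reader may use the dedicated compute and database; it owns nothing.
        f"GRANT USAGE ON WAREHOUSE {spec.warehouse} TO ROLE {spec.reader_role}",
        f"GRANT USAGE ON DATABASE {spec.database} TO ROLE {spec.reader_role}",
    ]


statements = provisioning_statements(AgentLayerSpec())
for statement in statements:
    print(statement + ";")
```

Nothing here points at the main analytics warehouse or database, which is the point: the agentic layer can be created, resized, or torn down without touching what ETL and BI depend on.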
Two roles, one job each
The access model has exactly two roles. The first is for the data team — it can create objects, manage grants, and configure the agentic infrastructure. The second is what the AI agent authenticates as at runtime. It can only read, and only from an explicit allowlist.
The agent cannot query a table it was never granted access to. That boundary lives at the database level — not in a system prompt that could be overridden or ignored in a future conversation.
What the agent can and can't see
The agent role has an explicit allowlist. Every table outside that list returns an access error — not a result. The list is deliberately narrow: user behaviour data, product usage metrics, and sales call records. Revenue tables, billing data, and raw source tables are never granted.
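One way to picture the allowlist is as the only input from which grants are ever generated. The sketch below is an assumption-laden illustration: the table names, the `AGENT_READER` role, and the forbidden schema names are all hypothetical, and the `GRANT` syntax assumes a Snowflake-style dialect, where a real deployment would also need usage grants on the containing schema.

```python
# Hypothetical object names; the text gives the categories, not the real tables.
ALLOWLIST = (
    "AGENT_DB.CURATED.USER_BEHAVIOUR",
    "AGENT_DB.CURATED.PRODUCT_USAGE",
    "AGENT_DB.CURATED.SALES_CALLS",
)
# Schemas that must never appear in a grant to the agent role (assumed naming).
FORBIDDEN_SCHEMAS = frozenset({"REVENUE", "BILLING", "RAW"})


def validate_allowlist(tables):
    """Reject entries that are not fully qualified or sit in a forbidden schema."""
    for table in tables:
        parts = table.upper().split(".")
        if len(parts) != 3:
            raise ValueError(f"expected DATABASE.SCHEMA.TABLE, got {table!r}")
        if parts[1] in FORBIDDEN_SCHEMAS:
            raise ValueError(f"{table!r} is in a schema the agent must never read")


def grant_statements(role, tables):
    """Emit one SELECT grant per allowlisted table and nothing else."""
    validate_allowlist(tables)
    return [f"GRANT SELECT ON TABLE {table} TO ROLE {role}" for table in tables]


grants = grant_statements("AGENT_READER", ALLOWLIST)
print("\n".join(grants))
```

Because no grant exists for anything outside the list, the database itself rejects the query; the script only decides which grants get written.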
A curated data layer between agent and source
Even within the allowlisted tables, the agent doesn't query raw data directly. Every approved table goes through a curated layer first — a set of models built specifically for AI consumers that pre-join context, strip sensitive columns, and expose only the fields the agent genuinely needs.
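To make the shape of a curated model concrete, here is a minimal sketch using an in-memory SQLite database. The tables, columns, and view name are invented for illustration, and SQLite has no role system, so this shows only the pre-join and column-stripping, not the access control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical raw tables, including columns the agent should never see.
    CREATE TABLE raw_accounts (
        account_id INTEGER PRIMARY KEY, plan TEXT, card_last4 TEXT
    );
    CREATE TABLE raw_sales_calls (
        call_id INTEGER PRIMARY KEY, account_id INTEGER,
        customer_email TEXT, duration_seconds INTEGER
    );
    INSERT INTO raw_accounts VALUES (1, 'team', '4242');
    INSERT INTO raw_sales_calls VALUES (10, 1, 'a@example.com', 900);

    -- Curated model: pre-joins account context, omits email and card data.
    CREATE VIEW curated_sales_calls AS
    SELECT c.call_id AS call_id,
           a.plan AS account_plan,
           c.duration_seconds AS duration_seconds
    FROM raw_sales_calls AS c
    JOIN raw_accounts AS a ON a.account_id = c.account_id;
    """
)

cursor = conn.execute("SELECT * FROM curated_sales_calls")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

Even a `SELECT *` against the curated view cannot return `customer_email` or `card_last4`, because the view never selects them.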
Semantic views: one definition per metric
On top of the curated data layer sits a semantic layer — views that add business-friendly names, metric definitions, and natural language descriptions so the agent understands the data without guessing from column names or inventing its own logic.
The critical property: every metric is defined exactly once. Total consults, activation rate, average generation time — every consumer of the semantic layer gets the same definition. There's no version of "activation rate" that calculates differently depending on which tool is querying. The agent reads the semantic view; the semantic view enforces the definition.
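The "defined exactly once" property can be sketched as a registry that refuses a second definition for the same metric name. This is a Python stand-in for the idea, not the semantic views themselves: the metric expressions, the `consults` table, and its columns are hypothetical, and SQLite is used only so the generated query can be run.

```python
import sqlite3


class MetricRegistry:
    """Holds one SQL expression per metric name and refuses redefinition."""

    def __init__(self):
        self._definitions = {}

    def define(self, name, expression):
        if name in self._definitions:
            raise ValueError(f"metric {name!r} is already defined")
        self._definitions[name] = expression

    def select(self, names, source):
        """Build a query from registered definitions only."""
        columns = ", ".join(f"{self._definitions[n]} AS {n}" for n in names)
        return f"SELECT {columns} FROM {source}"


registry = MetricRegistry()
# Hypothetical definitions; the real ones live in the semantic views.
registry.define("total_consults", "COUNT(*)")
registry.define(
    "activation_rate", "AVG(CASE WHEN activated = 1 THEN 1.0 ELSE 0.0 END)"
)
registry.define("avg_generation_seconds", "AVG(generation_seconds)")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE consults "
    "(consult_id INTEGER, activated INTEGER, generation_seconds REAL)"
)
conn.executemany(
    "INSERT INTO consults VALUES (?, ?, ?)", [(1, 1, 2.0), (2, 0, 4.0)]
)

query = registry.select(
    ["total_consults", "activation_rate", "avg_generation_seconds"], "consults"
)
result = conn.execute(query).fetchone()
print(result)  # (2, 0.5, 3.0)
```

Every consumer that goes through `select` gets the same expression for `activation_rate`, and an attempt to register a competing definition fails loudly instead of silently diverging.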
What this prevents
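Three failure modes become structurally impossible with this design, not just unlikely: the agent reading a table it was never granted, a metric being calculated differently depending on which tool is querying, and an expensive agent query saturating the compute that production pipelines depend on.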
The whole pattern is version-controlled in Terraform — roles, grants, semantic view definitions, warehouse config. That means access controls are reviewable in a pull request like any other infrastructure change, not managed through a UI where history is hard to audit.
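What a reviewer gains from this is a readable diff of access. As a loose illustration in Python rather than Terraform, the sketch below compares two hypothetical sets of grants, standing in for the main branch and a pull request; the role and object names are invented.

```python
def grant_changes(before, after):
    """Compare two sets of (role, privilege, object) grants."""
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


# Hypothetical grant sets for the main branch and a proposed change.
main_branch = {("AGENT_READER", "SELECT", "AGENT_DB.CURATED.PRODUCT_USAGE")}
pull_request = main_branch | {
    ("AGENT_READER", "SELECT", "AGENT_DB.CURATED.SALES_CALLS")
}

changes = grant_changes(main_branch, pull_request)
print(changes)
```

The added grant shows up as one explicit line for a reviewer to approve or reject, which is the property a UI-managed permission change tends to lack.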
The principle that generalises
The components ended up being straightforward: a dedicated role scoped to an explicit allowlist, a curated data layer that pre-filters before the agent sees anything, a semantic layer that enforces metric definitions, and isolated compute. What took time was the design — deciding what the agent should and shouldn't see, and making that decision live in the infrastructure rather than in a conversation.
What makes it hold up over time is that none of the boundaries depend on the agent behaving correctly. They're enforced by the database, the role system, and the data model. The agent can't accidentally or deliberately step outside them — and when a new team member asks "what can the agent see?", the answer lives in version-controlled Terraform, not someone's memory.