Deploying governed infrastructure for agentic AI

Why raw access isn't enough

Pointing an AI agent at raw warehouse tables is the same as handing someone a database connection with no data platform in front of it. You get results, but you have no control over what gets queried, no guarantee the definitions are consistent, and no isolation between the agent's workload and everything else running on that compute. It's not a starting point — it's a known failure mode.

The design constraint was clear from the start: the agent would query through the same governed interface built for human analysts — curated models, defined metrics, explicit access controls. Anything less wasn't a simpler version — it was no data platform at all.

The governing principle

AI agent (asks questions via MCP) → governed → Read-only role (explicit grants, allowlist only) → curated → Semantic layer (one definition per metric) → filtered → Source data (never touched directly)

Isolation at every layer: compute · database · role · table · column

Fig 1 — The agent never touches raw data directly. Every layer between them enforces a boundary.

Isolate everything

The first decision is physical separation. The agentic layer lives in its own database, runs on its own compute, and is entirely separate from the warehouse your ETL and BI tools use. This isn't just about security — it means an expensive agent query can't saturate the compute your production pipelines depend on, and a misconfigured grant in the agentic layer can't accidentally expose something in the main analytics database.

In practice this means: dedicated database, dedicated warehouse (auto-suspending when idle), and two roles — one for building, one for querying — with nothing shared between them.
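As a rough illustration of that setup, the sketch below generates the statements for a dedicated database, an auto-suspending warehouse, and the two roles. It assumes a Snowflake-style SQL dialect, which the article does not name; every object name, the warehouse size, and the 60-second suspend window are hypothetical. Grants for the builder role are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgenticEnvironment:
    """Names for the isolated objects; all values here are hypothetical."""
    database: str = "AGENTIC_DB"
    warehouse: str = "AGENTIC_WH"
    builder_role: str = "AGENTIC_BUILDER"
    agent_role: str = "AGENTIC_AGENT"
    auto_suspend_seconds: int = 60


def isolation_statements(env: AgenticEnvironment) -> list[str]:
    """Return DDL for a dedicated database, warehouse, and two roles."""
    return [
        f"CREATE DATABASE IF NOT EXISTS {env.database};",
        (
            f"CREATE WAREHOUSE IF NOT EXISTS {env.warehouse} "
            f"WITH WAREHOUSE_SIZE = 'XSMALL' "
            f"AUTO_SUSPEND = {env.auto_suspend_seconds} "
            f"AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE;"
        ),
        f"CREATE ROLE IF NOT EXISTS {env.builder_role};",
        f"CREATE ROLE IF NOT EXISTS {env.agent_role};",
        # The agent role may use the dedicated warehouse and database only.
        f"GRANT USAGE ON WAREHOUSE {env.warehouse} TO ROLE {env.agent_role};",
        f"GRANT USAGE ON DATABASE {env.database} TO ROLE {env.agent_role};",
    ]


if __name__ == "__main__":
    for statement in isolation_statements(AgenticEnvironment()):
        print(statement)
```

Keeping the names in one small object makes it harder for an agentic object to end up in the main analytics database by accident.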

Two roles, one job each

The access model has exactly two roles. The first is for the data team — it can create objects, manage grants, and configure the agentic infrastructure. The second is what the AI agent authenticates as at runtime. It can only read, and only from an explicit allowlist.

Builder role

Used by the data team to create semantic views, configure MCP servers, and manage grants. Never used at query time.

Create objects · Manage grants · Data team only

Agent role

What the AI agent authenticates as. Read-only. Scoped to an explicit allowlist of approved tables and views.

SELECT only · Allowlisted tables · Semantic views

The agent cannot query a table it was never granted access to. That boundary lives at the database level — not in a system prompt that could be overridden or ignored in a future conversation.
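One way to keep that boundary explicit is to derive every agent grant from a single allowlist. The sketch below is a minimal illustration under assumptions: the role name and the fully qualified table names are hypothetical, and the statements use Snowflake-style syntax, which the article does not confirm. It covers tables only; views would need the equivalent grant on the view.

```python
AGENT_ROLE = "AGENTIC_AGENT"  # hypothetical role name

# Hypothetical allowlist of fully qualified curated tables.
ALLOWLIST = [
    "AGENTIC_DB.CURATED.USER_ACTIVITY",
    "AGENTIC_DB.CURATED.PRODUCT_USAGE",
]


def agent_grants(allowlist: list[str], role: str = AGENT_ROLE) -> list[str]:
    """Emit read-only grants for the allowlist, in a stable order."""
    tables = sorted(set(allowlist))
    # The role also needs USAGE on each schema that holds an allowlisted table.
    schemas = sorted({name.rsplit(".", 1)[0] for name in tables})
    statements = [f"GRANT USAGE ON SCHEMA {s} TO ROLE {role};" for s in schemas]
    statements += [f"GRANT SELECT ON TABLE {t} TO ROLE {role};" for t in tables]
    return statements


def is_granted(name: str, allowlist: list[str]) -> bool:
    """Deny by default: anything absent from the allowlist is not readable."""
    return name in set(allowlist)
```

Because the function only ever emits USAGE and SELECT, the agent role cannot pick up write privileges through this path.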

What the agent can and can't see

The agent role has an explicit allowlist. Every table outside that list returns an access error, not a result. The list is deliberately narrow: user behaviour data, product usage metrics, and sales call records, plus revenue in aggregated form only. Row-level revenue tables, billing data, and raw source tables are never granted.

Behavioural data

Curated activity events — pre-joined with context, sensitive columns stripped before the agent sees them

Granted

Business metrics

Aggregated product usage and engagement — modelled at a safe grain, not raw transactions

Granted

Operational intelligence

Sales, support, and engagement signals — for theme and pattern analysis, not individual records

Granted

Financial data

Aggregated revenue metrics only — individual billing records, subscription details, and raw financial data are blocked

Aggregated

Personal information

Contact details, identifiers, anything that could surface an individual — stripped at the data layer

Blocked

Raw source tables

Unmodelled production data — the agent queries pre-joined, aggregated fact tables, never raw sources directly

Blocked
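The categories above can also be written down as data, so that a proposed change to the allowlist can be checked before it ships. The sketch below is one possible shape, not the author's implementation: the category keys and the `violations` helper are hypothetical, and unknown categories are treated as blocked.

```python
from enum import Enum


class Access(Enum):
    GRANTED = "granted"
    AGGREGATED = "aggregated"
    BLOCKED = "blocked"


# Categories and levels as listed above; the keys are illustrative labels.
POLICY = {
    "behavioural_data": Access.GRANTED,
    "business_metrics": Access.GRANTED,
    "operational_intelligence": Access.GRANTED,
    "financial_data": Access.AGGREGATED,
    "personal_information": Access.BLOCKED,
    "raw_source_tables": Access.BLOCKED,
}


def access_for(category: str) -> Access:
    """Unknown categories are treated as blocked (deny by default)."""
    return POLICY.get(category, Access.BLOCKED)


def violations(proposed: dict[str, str]) -> list[str]:
    """Return proposed tables whose category is blocked.

    `proposed` maps a table name to its category label.
    """
    return sorted(
        table for table, category in proposed.items()
        if access_for(category) is Access.BLOCKED
    )
```

A check like this could run in CI against the allowlist, failing the build when `violations(...)` is non-empty.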

A curated data layer between agent and source

Even within the allowlisted tables, the agent doesn't query raw data directly. Every approved table goes through a curated layer first — a set of models built specifically for AI consumers that pre-join context, strip sensitive columns, and expose only the fields the agent genuinely needs.

Pre-join

The agent doesn't navigate table relationships. The curated layer does that upstream — joining user context, practice details, and event metadata into a single clean table before the agent ever sees it.

Column restriction

Only the columns the agent needs are selected. Billing fields, contact details, and internal IDs are stripped at this layer — not filtered by the agent's instructions, which can be overridden.
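To make pre-joining and column restriction concrete, here is a small self-contained sketch using an in-memory SQLite database. The table and column names are invented for illustration, and SQLite has no role system, so this shows only the shape of a curated model; enforcement still comes from the warehouse grants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT, plan TEXT);
    CREATE TABLE events (event_id INTEGER PRIMARY KEY, user_id INTEGER,
                         event_type TEXT, occurred_at TEXT);

    INSERT INTO users VALUES (1, 'a@example.com', 'pro'),
                             (2, 'b@example.com', 'free');
    INSERT INTO events VALUES (10, 1, 'consult_started', '2024-01-01'),
                              (11, 2, 'consult_started', '2024-01-02');

    -- Curated model: pre-joined, with email and internal IDs left out.
    CREATE VIEW curated_activity AS
    SELECT e.event_type, e.occurred_at, u.plan
    FROM events AS e
    JOIN users AS u ON u.user_id = e.user_id;
    """
)

cursor = conn.execute("SELECT * FROM curated_activity ORDER BY occurred_at")
columns = [d[0] for d in cursor.description]  # only the curated columns
rows = cursor.fetchall()
```

A consumer of `curated_activity` never sees `email`, `user_id`, or `event_id`, and never has to write the join itself.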

Automatic grants on deploy

Each curated model automatically grants read access to the agent role when deployed to production — and only production. In development and CI environments, no grant is issued. The access control ships with the model, not as a separate manual step that could be forgotten.
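The production-only rule reduces to a single conditional. The sketch below expresses it as a plain Python function; the role name, relation name, and target names are assumptions. In a dbt project this kind of logic would more typically live in a post-hook or grants configuration keyed on the target name, which the article does not show.

```python
AGENT_ROLE = "AGENTIC_AGENT"  # hypothetical role name


def post_deploy_statements(
    relation: str, target: str, role: str = AGENT_ROLE
) -> list[str]:
    """Return the grants to run after a curated model builds.

    Only the production target yields a grant; dev and CI yield nothing.
    """
    if target != "prod":
        return []
    return [f"GRANT SELECT ON TABLE {relation} TO ROLE {role};"]
```

For example, `post_deploy_statements("AGENTIC_DB.CURATED.USER_ACTIVITY", "ci")` returns an empty list, so a CI build never widens the agent's access.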

Semantic views: one definition per metric

On top of the curated data layer sits a semantic layer — views that add business-friendly names, metric definitions, and natural language descriptions so the agent understands the data without guessing from column names or inventing its own logic.

The critical property: every metric is defined exactly once. Total consults, activation rate, average generation time — every consumer of the semantic layer gets the same definition. There's no version of "activation rate" that calculates differently depending on which tool is querying. The agent reads the semantic view; the semantic view enforces the definition.
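The "defined exactly once" property can be illustrated with a small registry that refuses a second definition of the same metric. This is a sketch of the idea only, not how any particular semantic layer is implemented; the `Metric` and `MetricRegistry` names are hypothetical, and any SQL expression passed in is a placeholder rather than the real definition of a metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str   # SQL expression over the curated model
    description: str  # natural-language description the agent can read


class MetricRegistry:
    """Holds at most one definition per metric name."""

    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def define(self, metric: Metric) -> None:
        """Register a metric; a second definition of the same name fails."""
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def select_sql(self, name: str, source: str) -> str:
        """Build the query for a metric; unknown names raise KeyError."""
        metric = self._metrics[name]
        return f"SELECT {metric.expression} AS {metric.name} FROM {source}"
```

Every consumer that goes through `select_sql` gets the same expression for a given metric name, which is the behaviour the semantic views provide at the database level.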

The full deployment stack

AI agent: queries semantic views via MCP server · authenticates as agent role

Semantic layer: governed metric definitions · business-readable names · one source of truth

Curated data layer (dbt): pre-joined · columns restricted · auto-grants on prod deploy only

Source tables (allowlisted): 7 approved tables · revenue and PII blocked · agent role never touches raw sources

Isolated compute & database: dedicated warehouse · auto-suspends when idle · no shared resources with ETL or BI

Fig 2 — The full governance stack: isolation at compute, database, role, table, and column level

What this prevents

Three failure modes that become structurally impossible with this design — not just unlikely:

Data leakage

The agent tries to query a revenue or PII table. The database returns an access error — not a result. No prompt engineering required to hold that boundary. It exists at the role level and cannot be overridden by a conversation.

Metric hallucination

The agent invents its own definition of "activation rate." The invented definition has nowhere to run: the agent queries the semantic view, not the raw table, so the number it gets back is always computed from the view's definition. Wrong definitions can't persist.

Resource contention

An agentic query generates unexpected load. It hits an isolated warehouse that auto-suspends when idle — it can't consume compute budgeted for production ETL or BI queries, and cost stays predictable.

The whole pattern is version-controlled in Terraform — roles, grants, semantic view definitions, warehouse config. That means access controls are reviewable in a pull request like any other infrastructure change, not managed through a UI where history is hard to audit.

The principle that generalises

The components ended up being straightforward: a dedicated role scoped to an explicit allowlist, a curated data layer that pre-filters before the agent sees anything, a semantic layer that enforces metric definitions, and isolated compute. What took time was the design — deciding what the agent should and shouldn't see, and making that decision live in the infrastructure rather than in a conversation.

What makes it hold up over time is that none of the boundaries depend on the agent behaving correctly. They're enforced by the database, the role system, and the data model. The agent can't accidentally or deliberately step outside them — and when a new team member asks "what can the agent see?", the answer lives in version-controlled Terraform, not someone's memory.
