The dbt project structure that scales

Layer boundaries

The first decision on a blank project is how to organise the layers. Each layer needs a clear job — and models should only know about the layer below them. Without that constraint, models start doing too much and the project becomes hard to reason about as it grows.

Model layer contracts

Staging

One model per source table. Rename, cast, light cleaning only. No joins, no business logic. Nothing downstream should bypass this.

Intermediate

Ephemeral — compiles inline, never materialises. Where joins live. Can't be queried directly, which keeps the layer honest.

Marts

Business-entity grain. Stable, trusted. Knows nothing about BI tools or MCP — just models the business correctly.

BI

Materialised tables built for the BI tool. Optimised for query performance, not reuse. Consumers don't reach past this layer.

MCP

Curated layer for AI agents — pre-joined, columns restricted, access-controlled. Covered in Part 3.

    The rule: a model only knows about the layer below it. Nothing reaches across layers.
  

Fig 1 — Six model layers, each with a clear contract about what it's allowed to do

Schema isolation: one macro, three environments

The most useful thing in the repo is a single Jinja macro that controls schema naming across all three environments. In development, each engineer gets their own isolated namespace — nobody steps on each other's work. In CI, schemas are scoped to the PR number. In production, schemas are clean with no prefix.

Schema naming by environment

Dev

        dev_tanay_bi

        dev_tanay_staging

        dev_tanay_marts

Per-engineer namespace. Nobody shares schemas.

CI

        pr_123_bi

        pr_123_staging

        pr_123_marts

PR-scoped. Torn down after every run.

Production

        bi

        staging

        marts

Clean. No prefix. One macro controls all three.

Fig 2 — One macro derives the schema name from the target environment. No manual configuration per engineer.

The useful thing about this: A new engineer doesn't configure anything — they run dbt against their dev target and their schemas appear namespaced correctly. CI gets its own isolated space per PR, which means concurrent CI runs never conflict. And the cleanup macro drops everything scoped to a PR number after the run finishes, keeping the CI database clean.

A CI pipeline that only runs what changed

Running everything on every PR is what makes CI slow enough that engineers stop waiting for it. The first thing the pipeline does at Lyrebird is check whether any dbt files actually changed. If not, every downstream job is skipped — a Terraform-only PR doesn't wait for a dbt build, and the branch protection check still passes.

CI pipeline on every pull request

1

Change detection

Did any dbt files change? If no — skip everything. Branch protection still passes.

2

Lint changed files only

SQLFluff runs against the diff — not the whole project. Failures are always attributable to this PR.

3

Build in PR-scoped schema

dbt build --fail-fast in an isolated schema. If a model fails, CI stops immediately.

4

Drop PR schemas

A cleanup macro tears down all schemas prefixed with this PR number. Pass or fail — always runs.

    A sentinel job acts as the branch protection check — passes whether build succeeded or was skipped
  

Fig 3 — Four steps, each with a clear purpose. Non-dbt PRs skip steps 2–4 entirely.

Branch protection requires a passing check — but if dbt didn't change, that check should still pass. A sentinel job handles this: it passes whether the build ran or was skipped. Non-dbt PRs are never blocked waiting for a check that won't run.

Deployment: merge to ship

Deployment triggers automatically on merge to main when dbt files changed. There's one non-obvious design decision: full-refresh runs are controlled by the PR title. If the title contains a specific flag, the deploy runs with full-refresh. Otherwise it's incremental. No separate workflow, no manual trigger — just a naming convention the pipeline reads before deciding how to build.

This matters because full refreshes on large incremental models are expensive and occasionally necessary. Having it controlled by the PR title means the decision is visible in the git history, reviewable in the PR, and doesn't require anyone to remember to flip a configuration somewhere.

Code style: lowercase everything

The linting config enforces a small set of rules that apply to every SQL file in the repo. The most important is the simplest: everything lowercase. Keywords, identifiers, functions, literals — all lowercase. Not because it's objectively correct, but because consistency removes a whole category of review comment. A PR that touches SQL should be reviewed for its logic, not its formatting.

Rule 1

Lowercase everything

Keywords, identifiers, functions, literals. No exceptions. The linter enforces it so reviewers don't have to.

Rule 2

120 character line limit

Long enough to be practical, short enough that side-by-side diffs are readable. Enforced by SQLFluff, not convention.

Rule 3

!= not <>

One not-equal style across the project. Picked c_style. Could have been either — what matters is it's consistent.

What a PR has to prove

Every PR uses a shared template with the same sections in the same order: what changed, why, how it was tested, and a security check confirming no credentials were hardcoded. The template is short enough that it doesn't slow anyone down, but specific enough to catch what usually gets missed — particularly the testing evidence.

The combination of isolated schemas, diff-based CI, and a consistent PR template means review conversations focus on modelling decisions. Not formatting, not missing tests, not "did you check this against prod?" — those are handled structurally before the review starts.

The dbt projectstructure that scales.

Layer boundaries

Schema isolation: one macro, three environments

A CI pipeline that only runs what changed

Deployment: merge to ship

Code style: lowercase everything

What a PR has to prove

The dbt project
structure that scales.