The gap worth closing
The pipelines were running. Product data existed. But metric definitions lived in fifteen different BI workbooks, nobody agreed on what "activation" meant, and every business question went through the data team. A ticket, a wait, an answer that arrived after the decision had already been made.
The first job was to fix the foundation — a single semantic layer where every metric was defined once and owned. The second was to make the data reachable without an engineer in the loop.
I built the data platform at Lyrebird from scratch. Getting the basics working was the easy part: pipelines running, dashboards live, tickets getting closed. But it was clear that wasn't enough. For data to matter strategically it needs three things: to be accurate enough that people trust it, accessible enough that they don't need an engineer to get it, and oriented around the questions the business actually cares about. That's what was worth building.
Building the semantic layer first
Before any AI agent can query your data reliably, the data has to mean something. That sounds obvious, but most warehouses are a graveyard of inconsistently named tables, duplicated metric definitions, and logic that lives in fifteen different BI workbooks. Point an LLM at that and it will give confidently wrong answers.
The fix is a semantic layer: a governed interface where every metric is defined once, in one place, with one owner. I built this using Snowflake semantic views, with explicit metric definitions, business-friendly column names, and row-level access controls that enforce what each role is allowed to see.
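To make "defined once, with one owner" concrete, here is a minimal Python sketch of the contract a semantic layer enforces. It is an in-memory illustration, not Snowflake's semantic view syntax: `Metric`, `MetricRegistry`, the role names and the metric expression are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Metric:
    """One governed metric: a single definition with a single owner."""
    name: str
    expression: str  # illustrative SQL expression, not a real Lyrebird definition
    owner: str       # the team accountable for this definition
    allowed_roles: FrozenSet[str] = field(default_factory=frozenset)


class MetricRegistry:
    """Hypothetical in-memory stand-in for a governed semantic layer."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # "Defined once": a second definition under the same name is an error,
        # never a silent override.
        if metric.name in self._metrics:
            owner = self._metrics[metric.name].owner
            raise ValueError(f"'{metric.name}' is already defined (owner: {owner})")
        self._metrics[metric.name] = metric

    def get(self, name: str, role: str) -> Metric:
        metric = self._metrics[name]
        # Access control: roles outside the allow-list cannot read the metric.
        if role not in metric.allowed_roles:
            raise PermissionError(f"role '{role}' may not read '{name}'")
        return metric


registry = MetricRegistry()
registry.register(Metric(
    name="activation_rate",
    expression="COUNT_IF(activated) / COUNT(*)",
    owner="Data",
    allowed_roles=frozenset({"analyst", "finance", "ai_agent"}),
))
```

The design choice worth copying is that a duplicate definition fails loudly instead of quietly replacing the first one; that is what stops two teams from shipping two versions of "activation".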
The critical design decision was treating AI agents as first-class consumers of the semantic layer, not an afterthought. When I connected Claude to Snowflake via MCP, it queried the semantic views — not raw tables. That means it sees business-friendly metric names, enforced access controls, and pre-validated logic. The agent can't accidentally query a table it shouldn't see, and it can't invent a metric definition that contradicts the one Finance is using.
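In Snowflake the primary control is the agent's role: grant it access to the semantic views and nothing else. As defence in depth, the tool wrapper that sits between the agent and the warehouse can also reject SQL before it is sent. The sketch below assumes hypothetical view names and uses a deliberately simple regex, so it does not handle CTEs, quoted identifiers or subqueries; treat it as an illustration, not a SQL parser.

```python
import re
from typing import List

# Hypothetical names for the semantic views the agent's role may read.
SEMANTIC_VIEWS = {"analytics.semantic.gp_activation", "analytics.semantic.wbr_metrics"}

# Captures the object name after FROM or JOIN (unquoted identifiers only).
_OBJECT_REF = re.compile(r"\b(?:from|join)\s+([a-z_][\w.]*)", re.IGNORECASE)


def check_agent_query(sql: str) -> List[str]:
    """Allow only read queries whose FROM/JOIN targets are all semantic views."""
    if not sql.lstrip().lower().startswith("select"):
        raise PermissionError("agent queries must be plain SELECT statements")
    refs = [name.lower() for name in _OBJECT_REF.findall(sql)]
    if not refs:
        raise PermissionError("no queryable object found")
    blocked = [name for name in refs if name not in SEMANTIC_VIEWS]
    if blocked:
        raise PermissionError(f"not a semantic view: {', '.join(blocked)}")
    return refs
```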
The insight: A semantic layer built only for BI tools will fail when you add AI. Build it for the most demanding consumer first — an LLM that will query it at 3am without a human in the loop — and the BI use case becomes easy.
Agentic analytics: the WBR that runs itself
The first thing I automated once the semantic layer was in place was the Weekly Business Review. Every Monday, a senior leadership team at Lyrebird would spend hours compiling metrics across six business sections — product, marketing, SMB sales, enterprise, conversion funnel, and north star. The output was a PowerPoint. The insight arrived 48 hours after the week ended.
I replaced it with an AI agent that queries the semantic layer, calculates 13-week trend tables, scores every metric against its vPlan target with a red/amber/green (RAG) status, and generates a shareable Claude artifact and a formatted Word document. It runs automatically, triggered on Monday morning.
WBR agent pipeline: trigger fires → 13-week window → trend analysis → vPlan delta
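The trend-and-score step can be sketched as below. This is a minimal illustration under stated assumptions: the RAG thresholds (green at 100% of vPlan or better, amber from 90%) are hypothetical, every metric is assumed to be higher-is-better, and `WeekRow`, `rag_status` and `trend_table` are illustrative names rather than the agent's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional

WINDOW_WEEKS = 13

# Hypothetical RAG thresholds, as a share of the vPlan target.
GREEN_AT = 1.00
AMBER_AT = 0.90


@dataclass
class WeekRow:
    week: int                     # position in the 13-week window (1 = oldest)
    actual: float
    vplan: float
    wow_change: Optional[float]   # week-over-week change; None for the first row
    rag: str


def rag_status(actual: float, target: float) -> str:
    """Red/amber/green against target, assuming higher is better."""
    ratio = actual / target
    if ratio >= GREEN_AT:
        return "green"
    if ratio >= AMBER_AT:
        return "amber"
    return "red"


def trend_table(actuals: List[float], targets: List[float]) -> List[WeekRow]:
    """Keep the most recent 13 weeks and score each one against vPlan."""
    actuals, targets = actuals[-WINDOW_WEEKS:], targets[-WINDOW_WEEKS:]
    rows = []
    for i, (actual, target) in enumerate(zip(actuals, targets)):
        rows.append(WeekRow(
            week=i + 1,
            actual=actual,
            vplan=target,
            wow_change=actual - actuals[i - 1] if i > 0 else None,
            rag=rag_status(actual, target),
        ))
    return rows
```

Because the thresholds live in one place, every metric is scored by the same rule every week, which is where the consistency gain over manual compilation comes from.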
The human still writes the narrative — decisions, highlights, lowlights. That's appropriate. But the 3 hours of data compilation that preceded that thinking is now zero. The agent is also more consistent than a human: it never misses a metric, never miscalculates a vPlan score, and never formats a table differently from the previous week.
The intervention engine
The semantic layer and the automated WBR freed up enough time, and built enough trust in the data, to tackle the problems that actually mattered commercially. At Lyrebird there were two: activation rates that were falling, and paying GPs going quiet with nobody watching.
Problem 1 — GP activation wasn't converting
GP activation rates had been declining — from 61% in January 2026 to 44% by May. The core issue: 43–55% of new GPs never got a note saved to Best Practice on day one, which meant they never hit the "dopamine hit" moment that drives retention. No note saved to BP, no value felt, no reason to come back.
The solution was a structured day-one onboarding flow with two diverging paths based on whether the GP reached that value moment or not.
Path 1: note saved to BP on day one (dopamine hit)
→ Completes first listening session
→ Trigger: dopamine hit notification sent
→ Show to-do list for next steps
→ Social proofing content surfaced
Path 2: note never saved to BP on day one
→ Falls into EOD Day 1 comms flow
→ Research prompt triggered: "Why wasn't your note saved?"
To investigate:
• Investigate integration gaps and recording issues for non-integrated users
• Can social proofing / to-do list nudges help close the activation gap for Path 1 users?
The onboarding flow above defines the logic — who gets what, and when. What follows is the architecture that makes it run. Snowflake does the qualification, HubSpot executes the comms, and the outcome data flows back into Snowflake to close the loop.
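Intervention architecture:
Snowflake qualifies GPs on business criteria → splits into experiment & control groups
→ HubSpot sync, experiment / control flags included
→ Path 1: dopamine hit flow · Path 2: EOD rescue flow · Control: no comms
→ Outcomes re-ingested into Snowflake → experiment lift measured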
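The experiment/control split is easiest to trust when it is deterministic, so the same GP lands in the same group every time the weekly job reruns. A minimal sketch of a 70/30 hash-based assignment, assuming a stable GP id and an experiment name are available; `assign_group`, `hubspot_payload` and the payload fields are hypothetical.

```python
import hashlib
from typing import Dict

EXPERIMENT_SHARE = 0.70  # 70% receive the intervention, 30% are held out


def assign_group(gp_id: str, experiment: str, share: float = EXPERIMENT_SHARE) -> str:
    """Deterministically assign a GP to 'experiment' or 'control'."""
    # Hash the experiment name together with the GP id, then map the first
    # 8 bytes of the digest to a number in [0, 1).
    digest = hashlib.sha256(f"{experiment}:{gp_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "experiment" if bucket < share else "control"


def hubspot_payload(gp_id: str, experiment: str) -> Dict[str, str]:
    """Record shape synced to the CRM, with the group flag included (fields hypothetical)."""
    return {"gp_id": gp_id, "experiment": experiment, "group": assign_group(gp_id, experiment)}
```

Salting the hash with the experiment name means a GP held out of one experiment is not automatically held out of the next.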
The 43–55% of GPs who never saved a note to BP on day one were the problem population. For them, the flow triggered a research prompt ("Why wasn't your note saved?") and an end-of-day comms sequence segmented by integration status: integrated GPs got a follow-up survey, non-integrated GPs got a nudge toward BP setup, and the highest-touch segment got a direct call routed into the day-two re-engagement flow.
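The day-one routing reduces to a small decision function. A sketch, assuming the qualification query exposes whether a note reached BP on day one, whether the GP is integrated, and a flag for the highest-touch segment; all field and action names are hypothetical, and the sketch assumes the direct call replaces the survey or nudge rather than adding to it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NewGP:
    gp_id: str
    saved_note_to_bp_day_one: bool
    integrated: bool           # hypothetical: BP integration is set up for this GP
    high_touch: bool = False   # hypothetical flag for the highest-touch segment


def day_one_actions(gp: NewGP) -> List[str]:
    """Return the day-one actions for a GP, following the two-path flow."""
    if gp.saved_note_to_bp_day_one:
        # Path 1: the GP reached the value moment.
        return ["dopamine_hit_notification", "next_steps_todo_list", "social_proof_content"]

    # Path 2: no note reached BP, so ask why and start the end-of-day sequence.
    actions = ["research_prompt:why_wasnt_your_note_saved"]
    if gp.high_touch:
        actions += ["direct_call", "day_two_reengagement_flow"]
    elif gp.integrated:
        actions.append("eod_followup_survey")
    else:
        actions.append("eod_bp_setup_nudge")
    return actions
```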
Problem 2 — Paying GPs going quiet with no one watching
The second problem was harder to solve with comms alone. These were already-paying GPs, some with months of tenure, who had simply stopped using the product. No single trigger, no obvious day-one failure. Just a slow drift toward zero consults over several weeks, and an average window of 5.4 weeks before they cancelled.
With 674 GPs showing warning signals across three risk tiers, and each percentage point of activation worth roughly 0.71 points of paid conversion, the cost of a manual spot-check process was too high. So I built a weekly churn risk engine and wired it directly to the CS team's workflow.
Churn risk engine: 600+ paying GPs scored weekly
● High: 0 WAU for 2–3 weeks
● Medium: declining usage
● Watch: early signals
→ MRR exposure → delivered to CS lead → CS lead records outcome → signal → HubSpot
Every Monday after the Snowflake refresh, the engine classifies every non-enterprise paying GP by risk tier (High, Medium, or Watch) based on consecutive weeks of zero usage and MRR exposure. The output is a structured weekly skill delivered directly to the CS team lead: a prioritised list with user context, tenure, and risk tier. The lead then works through it, emailing and calling GPs and recording the outcome signals in HubSpot.
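A minimal sketch of the weekly classification and the CS worklist. Only the High rule (zero weekly usage for two or more consecutive weeks) comes from the text; the Medium rule (last four weeks averaging below half of the previous four) and the Watch rule (latest week below 75% of the previous four-week average) are hypothetical stand-ins for "declining" and "early signals", as are the field names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional

TIER_ORDER = {"high": 0, "medium": 1, "watch": 2}


@dataclass
class PayingGP:
    gp_id: str
    mrr: float
    tenure_weeks: int
    weekly_consults: List[int]  # oldest first, most recent week last


def zero_streak(weekly: List[int]) -> int:
    """Number of consecutive most-recent weeks with zero usage."""
    streak = 0
    for count in reversed(weekly):
        if count:
            break
        streak += 1
    return streak


def risk_tier(gp: PayingGP) -> Optional[str]:
    """Assign High / Medium / Watch, or None when no warning signal is present."""
    weeks = gp.weekly_consults
    if zero_streak(weeks) >= 2:
        return "high"                       # 0 weekly usage for 2+ consecutive weeks
    if len(weeks) < 8:
        return None                         # not enough history for the trend rules
    recent, prior = mean(weeks[-4:]), mean(weeks[-8:-4])
    if prior and recent < 0.5 * prior:
        return "medium"                     # hypothetical "declining" rule
    if prior and weeks[-1] < 0.75 * prior:
        return "watch"                      # hypothetical "early signal" rule
    return None


def cs_worklist(gps: List[PayingGP]) -> List[dict]:
    """Prioritised list for the CS lead: highest tier first, then most MRR at stake."""
    rows = []
    for gp in gps:
        tier = risk_tier(gp)
        if tier is not None:
            rows.append({"gp_id": gp.gp_id, "tier": tier, "mrr": gp.mrr,
                         "tenure_weeks": gp.tenure_weeks})
    return sorted(rows, key=lambda row: (TIER_ORDER[row["tier"]], -row["mrr"]))


def mrr_exposure(gps: List[PayingGP]) -> Dict[str, float]:
    """Total MRR sitting in each risk tier."""
    totals: Dict[str, float] = {}
    for gp in gps:
        tier = risk_tier(gp)
        if tier is not None:
            totals[tier] = totals.get(tier, 0.0) + gp.mrr
    return totals
```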
The part that closes the loop: HubSpot data flows back into Snowflake, where it's merged with product usage data. That means we can see whether a CS outreach actually moved the needle — did the GP who got a call in week one start using the product again in weeks two and three? The signal from the CS team's work becomes data in the same system that generated the risk score.
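The loop-closing join is simple once both sources sit in the same warehouse. In practice it would be SQL in Snowflake; the Python below only shows the logic, assuming outreach records carry a GP id and the week of contact, and counting a GP as recovered if they log any consults in the following two weeks (a hypothetical definition).

```python
from typing import Dict, List, Tuple


def recovered_after_outreach(
    outreach: List[dict],
    usage: Dict[Tuple[str, int], int],
    follow_up_weeks: int = 2,
) -> List[dict]:
    """Join CRM outreach records to weekly product usage and flag recoveries."""
    results = []
    for record in outreach:
        gp, week = record["gp_id"], record["week"]
        # Consults in each of the weeks after the outreach week (missing = 0).
        later = [usage.get((gp, week + k), 0) for k in range(1, follow_up_weeks + 1)]
        results.append({**record, "consults_after": sum(later), "recovered": any(later)})
    return results
```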
The business case
Activation rate had fallen 17 percentage points over five months. When I ran the correlation against paid conversion, the relationship was clear: the two moved together, with each percentage point of activation worth roughly 0.71 points of paid conversion. At the same time, the churn risk engine was flagging roughly a quarter of GP MRR as at risk. The cost of doing nothing had a number attached to it.
The conversation became straightforward. Instead of "we think this will reduce churn," it was a correlation chart and a percentage of ARR on the table. That's what gets it on the roadmap.
The design principle: Every cohort is split — 70% receive the intervention, 30% are held out as a control. Without that holdout you can't tell whether the outreach worked or whether those users would have recovered anyway. The experiment structure is what turns an intervention programme into something you can actually learn from.
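Measuring lift from the 70/30 split comes down to comparing recovery rates between the two groups. A minimal sketch using a pooled two-proportion z-test, a standard choice for this comparison though not necessarily the one used at Lyrebird; as a rule of thumb, |z| above about 1.96 corresponds to p < 0.05 two-sided. The function name and inputs are hypothetical.

```python
from math import sqrt


def lift(exp_recovered: int, exp_n: int, ctl_recovered: int, ctl_n: int) -> dict:
    """Absolute lift of the intervention group over the holdout, with a z-score."""
    p_exp, p_ctl = exp_recovered / exp_n, ctl_recovered / ctl_n
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (exp_recovered + ctl_recovered) / (exp_n + ctl_n)
    se = sqrt(pooled * (1 - pooled) * (1 / exp_n + 1 / ctl_n))
    return {
        "experiment_rate": p_exp,
        "control_rate": p_ctl,
        "absolute_lift": p_exp - p_ctl,
        "z": (p_exp - p_ctl) / se if se else float("nan"),
    }
```

The holdout is what makes `control_rate` meaningful: it is the recovery rate you would have seen without any outreach.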
What I'd build next
The four-phase roadmap I set for Lyrebird has one phase remaining. Phases one through three — command centre, agentic WBR, and the intervention engine — are complete or in active use. Phase four is the forecasting layer: user-level churn probability scores updated weekly, 30-day MRR forecasts from leading indicators, and automated anomaly detection that fires before a human notices the chart moving.
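Anomaly detection does not have to start sophisticated. A trailing z-score, flagging any week that sits far outside the spread of the preceding weeks, is one common baseline. The sketch below assumes an eight-week window and a threshold of three standard deviations, both hypothetical defaults, and skips points whose history is perfectly flat, where the z-score is undefined.

```python
from statistics import mean, stdev
from typing import List


def anomalies(series: List[float], window: int = 8, threshold: float = 3.0) -> List[int]:
    """Indices whose value is more than `threshold` standard deviations
    from the mean of the preceding `window` points (trailing z-score)."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        sd = stdev(history)
        if sd == 0:
            continue  # flat history: z-score undefined, skip this point
        if abs(series[i] - mean(history)) / sd > threshold:
            flagged.append(i)
    return flagged
```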
The pattern I'd apply to any health or SaaS company at this stage is the same: build the semantic layer first so AI has something trustworthy to consume, automate the recurring reporting workflows to free up analyst time, then turn that freed time toward building the signal detection and intervention machinery. In that order. Companies that try to skip to the intervention engine without the semantic layer build something that confidently acts on bad data.
The infrastructure I'd use wouldn't change much either — Snowflake for the warehouse and semantic layer, dbt for the transformation layer with a proper CI/CD pipeline and testing strategy, GitHub Actions for orchestration, and whatever communication tools the customer-facing team is already using for the intervention delivery layer. The stack isn't exotic. What's hard is the design: knowing which signals matter, what the intervention playbook should contain, and how to close the loop so the system gets smarter over time.