Analytics Engineering for AI Projects: Trust as Infrastructure

AI has changed what "more, faster" means. An agent can write a quarter's worth of SQL in an afternoon. The bottleneck isn't production anymore. It's verification.

David Effiong

Programs Lead, YDP

9 June 2026 · 10 min read

The Reframe

Trust used to be a feeling. Now it has to be a system property.

One of the most quietly important findings in the dbt 2026 State of Analytics Engineering report was that trust has moved from a challenge to a priority. That's a small word change with a big consequence.

The same report puts the new job description plainly: analytics engineers are now responsible for keeping AI-generated insights from amplifying the trust gap that already existed in the data landscape.

That responsibility is heavier than it sounds. AI brought speed. Whether trust scales with it is still an open question.

Every analytics team has experienced this. An agent writes SQL in seconds and returns answers a few seconds later. A stakeholder watches the demo and gets excited. Then comes the question that needs a confident answer: "How do I know this is right?"

That gap, between what AI can produce and what stakeholders can verify, is where AI analytics either matures into infrastructure or loses credibility. We've seen this story before.

The Self-Serve Cautionary Tale

A decade ago, data experts made a promise: self-serve BI was going to democratize analytics. Anyone could build a dashboard, answer their own questions, and skip the data team’s queue. The tools got better, the licenses sold, the dashboards multiplied.

However, the promise didn't land. According to Strategy Software's AI+BI Analytics 2025 Global Report, "fewer than 10% of employees in most organizations currently use tools beyond spreadsheets" and "only 8% of employees in most firms currently use advanced analytics tools…"

Seven years of better tooling did not necessarily translate into adoption.

The dashboards weren't bad. BI tools are genuinely good products. What broke was trust. Two teams would build dashboards on overlapping data, get different revenue numbers, present them in the same executive meeting, and the room would split on which one to believe. Once that happens twice, people stop opening the dashboards and start emailing the data team. The capability scaled. The trust didn't. Without trust, capability is just a faster way to be wrong.

AI analytics is on the same trajectory, only faster. The thing that killed self-serve wasn't tooling. It was access scaling ahead of trust. If we let that happen again with agents, it won't take seven years to get that outcome. It might take as little as a quarter.

What Does "Trust as Infrastructure" Actually Mean?

Trust used to be earned in small interactions. A stakeholder would ask the analyst a question. The analyst would walk over with the answer and its context: "Here's the number, but note that we changed the definition of qualified lead in Q3." The trust came from the number and the conversation about the number.

In an AI analytics workflow, there is no analyst in the middle. The system produces the number directly. If trust isn't engineered into the system, there's nowhere for it to come from.

That's what we mean by trust as infrastructure:

Trust is produced by the system, not requested from stakeholders.
Every layer of the data and AI stack emits evidence of its own reliability: tests, lineage, evaluations, audit trails, monitoring, and version history.
Trust artifacts are versioned, maintained, and owned with the same engineering rigor as the data models themselves.

The shift in the question is the easiest way to feel the difference. The old question was "is this number right?" and we answered it one number at a time. The new question is "does the system prove its answers are right, continuously, at scale?"

That's not a vibes question. And it gets answered the same way every infrastructure question is answered: by building.

The Trust Stack

Trust isn't one thing you build. It's a layered stack, with each layer answering a different question. These are the five we've found at Data Culture:

1. Data Quality: Can I trust the inputs?

The most mature layer, and the winner of the last analytics engineering decade. Tests, freshness checks, contracts, lineage. Tools like dbt made it normal to ship a data model with assertions about what's true. This is settled work, but in an AI context, the consequences of a silent quality failure are worse. A broken dashboard gets noticed. A broken column quietly feeds an agent that confidently produces a wrong answer without visibility and without traceability. The discipline doesn't change. The stakes do.
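As a minimal sketch of what "assertions about what's true" can look like, the Python below runs a not-null, a uniqueness, and a freshness check over rows from a model. This is not dbt's API. The function names, the column names (order_id, loaded_at), and the 24-hour threshold are assumptions made up for illustration; in a dbt project the same checks would normally be declared as tests on the model.

```python
from datetime import datetime, timedelta, timezone


def check_not_null(rows, column):
    """Return the rows where `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]


def check_unique(rows, column):
    """Return the non-null values of `column` that appear more than once."""
    seen, duplicates = set(), set()
    for r in rows:
        value = r.get(column)
        if value is None:
            continue  # nulls are reported by check_not_null
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def check_fresh(rows, column, now, max_age):
    """True if the newest timestamp in `column` is within `max_age` of `now`."""
    timestamps = [r[column] for r in rows if r.get(column) is not None]
    return bool(timestamps) and (now - max(timestamps)) <= max_age


def run_checks(rows, now):
    """Collect failures instead of raising, so every problem is reported."""
    failures = []
    if check_not_null(rows, "order_id"):
        failures.append("order_id has nulls")
    if check_unique(rows, "order_id"):
        failures.append("order_id has duplicates")
    if not check_fresh(rows, "loaded_at", now, timedelta(hours=24)):
        failures.append("loaded_at is stale")
    return failures


# Hypothetical rows from a model the agent reads.
now = datetime(2026, 6, 9, 12, 0, tzinfo=timezone.utc)
orders = [
    {"order_id": 1, "loaded_at": now - timedelta(hours=2)},
    {"order_id": 2, "loaded_at": now - timedelta(hours=30)},
    {"order_id": 2, "loaded_at": now - timedelta(hours=3)},
]
print(run_checks(orders, now))  # ['order_id has duplicates']
```

The useful property is that a failure shows up as a named, loggable result before the agent ever queries the model, rather than as a confidently wrong answer afterwards.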

2. Context: Can I trust that the agent understands what the data means?

Clean data misread by the agent is just as bad as dirty data. A well-named column with no description is a reasoning gap. A semantic layer that captures what a metric is, but not what it shouldn't be confused with, is a confidently wrong answer waiting to happen. I covered this layer in depth in my previous post, Context as Infrastructure. The short version is that the documentation and semantic layer you wrote for human onboarding isn't enough for an AI consumer. Context has to be engineered.
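One way to make the "well-named column with no description" gap visible is a small coverage audit. The sketch below is a rough heuristic, not a feature of any tool: it flags columns whose description is missing or merely restates the column name. The function names and the sample columns (including marketing_qualified_lead) are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def description_gaps(columns):
    """Return names of columns whose description is missing or only restates the name.

    `columns` maps column name -> description (or None).
    """
    gaps = []
    for name, description in columns.items():
        if not description or not description.strip():
            gaps.append(name)
        elif normalize(description) == normalize(name):
            gaps.append(name)
    return gaps


def coverage(columns):
    """Share of columns whose description passes the check above."""
    if not columns:
        return 0.0
    return 1 - len(description_gaps(columns)) / len(columns)


# Hypothetical columns from one model the agent can see.
model_columns = {
    "qualified_lead": "Lead that passed sales review. Definition changed in Q3; "
                      "not the same as marketing_qualified_lead.",
    "net_revenue": "Net revenue",
    "region": None,
}
print(description_gaps(model_columns))    # ['net_revenue', 'region']
print(round(coverage(model_columns), 2))  # 0.33
```

A check this simple will miss descriptions that are present but unhelpful, so it is best read as a floor: anything it flags is definitely a gap, and anything it passes still deserves a human look.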

3. Evaluations & Monitoring: Can I trust the outputs, and can I prove it?

If an agent's accuracy is in your head, it isn't real. If it isn't measured against verified answers on every iteration, you can't tell improvement from regression, and you can't show a stakeholder the trend that earns their trust. My colleague Opeyemi has written two beautiful pieces on this: The Boring Work That Makes AI Analytics Actually Work and If You Can't Measure It, Don't Ship It. They cover what to measure, how to build a golden question set, and why the evaluation methodology itself has to be calibrated before you trust its results. Evals are how trust becomes evidence.
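To make "measured against verified answers on every iteration" concrete, here is a minimal sketch of a golden-set run. It assumes the agent can be called as a function that takes a question and returns a number (or None when it declines). The questions, the figures, and every name in the code are invented for illustration and are not taken from the posts mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenQuestion:
    question: str
    verified_answer: float
    tolerance: float = 0.0  # allowed absolute difference from the verified answer


def evaluate(agent, golden_set):
    """Ask the agent every golden question and score it against the verified answer.

    `agent` is any callable that takes a question and returns a number,
    or None when it declines.
    """
    if not golden_set:
        raise ValueError("golden_set must contain at least one question")
    failed = []
    for item in golden_set:
        answer = agent(item.question)
        correct = answer is not None and abs(answer - item.verified_answer) <= item.tolerance
        if not correct:
            failed.append(item.question)
    accuracy = 1 - len(failed) / len(golden_set)
    return {"accuracy": accuracy, "failed": failed}


def is_regression(current_accuracy, previous_accuracy):
    """True when the latest run scores lower than the run before it."""
    return current_accuracy < previous_accuracy


# Hypothetical golden set with made-up verified answers.
GOLDEN_SET = [
    GoldenQuestion("Total orders in May?", 1200),
    GoldenQuestion("Net revenue in May?", 58400.0, tolerance=0.5),
    GoldenQuestion("Active customers in May?", 310),
]

# Stand-in for a real agent: canned answers, one of them wrong.
canned = {
    "Total orders in May?": 1200,
    "Net revenue in May?": 58400.25,
    "Active customers in May?": 295,
}
report = evaluate(canned.get, GOLDEN_SET)
print(report["failed"])                                          # ['Active customers in May?']
print(is_regression(report["accuracy"], previous_accuracy=1.0))  # True
```

Storing each run's accuracy alongside the agent version is what turns this from a spot check into the trend a stakeholder can be shown.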

4. Governance: Can I trust what the agent is allowed to do, and who is allowed to generate dashboards or reports from it?

This is the under-explored layer, and the one in which analytics engineers are least practiced. Governance for an AI agent is not just a checklist. It's the answer to a set of operational questions:

Which datasets are this agent allowed to query, and which is it explicitly forbidden from touching?
What does the agent do when asked a question outside its scope? Declining with an explanation is a governed answer. Making something up is the failure mode that destroys a deployment.
Can anyone just make a Claude dashboard and share it with the team?
Who owns this agent? Who reviews its prompt, its tools and its scope when something changes upstream?
What's the deployment gate? In our practice, an agent doesn't move to a new domain until it has a passing eval run, a documented stakeholder owner, and a defined decline-behavior for out-of-scope questions.
When the agent is wrong, and it will be, what's the rollback path, and how do we monitor and learn from it?
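Several of these questions can be written down as data rather than prose. The sketch below is one hypothetical shape for that: a policy object with an allowlist, an explicit denylist, an owner, and a decline message, plus a deployment gate that checks the three conditions named above. The class, the field names, the dataset names, and the 0.9 threshold are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentPolicy:
    allowed_datasets: frozenset
    forbidden_datasets: frozenset
    stakeholder_owner: Optional[str]
    decline_message: Optional[str]


def authorize(policy, requested_datasets):
    """Return (allowed, explanation). Anything not explicitly allowed is declined."""
    requested = set(requested_datasets)
    reason = policy.decline_message or "This question is outside what I am set up to answer."
    forbidden = requested & policy.forbidden_datasets
    if forbidden:
        return False, f"{reason} (explicitly off limits: {sorted(forbidden)})"
    out_of_scope = requested - policy.allowed_datasets
    if out_of_scope:
        return False, f"{reason} (out of scope: {sorted(out_of_scope)})"
    return True, "In scope."


def deployment_blockers(policy, eval_accuracy, required_accuracy):
    """List what still blocks a release; an empty list means the gate is open."""
    blockers = []
    if eval_accuracy < required_accuracy:
        blockers.append("eval run below threshold")
    if not policy.stakeholder_owner:
        blockers.append("no documented stakeholder owner")
    if not policy.decline_message:
        blockers.append("no defined decline behavior")
    return blockers


# Hypothetical policy for a sales agent that has no owner assigned yet.
policy = AgentPolicy(
    allowed_datasets=frozenset({"sales.orders", "sales.customers"}),
    forbidden_datasets=frozenset({"hr.salaries"}),
    stakeholder_owner=None,
    decline_message="I can only answer questions about sales data.",
)
print(authorize(policy, ["sales.orders"]))     # (True, 'In scope.')
print(authorize(policy, ["hr.salaries"])[0])   # False
print(deployment_blockers(policy, eval_accuracy=0.8, required_accuracy=0.9))
# ['eval run below threshold', 'no documented stakeholder owner']
```

Because the policy is plain data, it can be versioned and reviewed like any other model change, which is what gives the ownership and review questions a concrete place to live.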

Gartner now expects this layer to become a board-level concern. It predicts that by 2028, 50% of organizations will implement a zero-trust posture for data governance due to the proliferation of unverified AI-generated data. A zero-trust posture is significant because it means treating every AI output as unverified until the system proves otherwise. That's not paranoia. That's infrastructure.

5. Stakeholder Literacy: Can the consumer tell a confident-wrong answer from a right one?

This is the layer that lives outside the codebase. You can build the cleanest data, richest context, tightest evals, and most governed agent and still lose trust if the stakeholder doesn't know what they're looking at.

This is partly an interface problem. If the agent returns a number with no source, no definition, and no uncertainty signal, the stakeholder has nothing to verify against. Citations and references aren't a nice-to-have on an AI analytics output. They're the trust receipt.
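A trust receipt can be enforced at the interface rather than left to convention. As a rough sketch, the code below refuses to render an answer that lacks a source, a definition, or an uncertainty note. The AgentAnswer type, its fields, and the model name fct_orders are hypothetical, and the sample figure is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentAnswer:
    value: str
    sources: List[str] = field(default_factory=list)
    definition: Optional[str] = None
    uncertainty: Optional[str] = None


def missing_receipt_fields(answer):
    """Name the parts of the receipt the answer does not carry."""
    missing = []
    if not answer.sources:
        missing.append("sources")
    if not answer.definition:
        missing.append("definition")
    if not answer.uncertainty:
        missing.append("uncertainty")
    return missing


def render(answer):
    """Format an answer for a stakeholder, refusing to show one without a receipt."""
    missing = missing_receipt_fields(answer)
    if missing:
        raise ValueError(f"answer is missing its receipt: {', '.join(missing)}")
    lines = [
        f"Answer: {answer.value}",
        f"Definition: {answer.definition}",
        f"Sources: {', '.join(answer.sources)}",
        f"Uncertainty: {answer.uncertainty}",
    ]
    return "\n".join(lines)


# Hypothetical answer with a complete receipt.
answer = AgentAnswer(
    value="Net revenue in May was 58,400",
    sources=["fct_orders"],
    definition="Gross revenue minus refunds and discounts",
    uncertainty="May data is complete; refunds can still post for 30 days",
)
print(render(answer))
print(missing_receipt_fields(AgentAnswer(value="58,400")))
# ['sources', 'definition', 'uncertainty']
```

Raising an error is a deliberately strict choice for the sketch; a softer variant could render the answer with a visible "unverified" label instead.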

It's also partly an education problem. Stakeholders need to know what the agent can and can't do, what a decline means, what a hedged answer means, and what their role in the loop is. The teams getting this right are the ones treating stakeholder enablement as part of the product, not as something that happens after launch in a Loom video nobody watches.

The stack is the point. Five layers, one question. Each layer is necessary. None is sufficient on its own. A team with perfect evals and no governance ships a fast agent into a domain it shouldn't be in. A team with great governance and no context ships an agent that's allowed to be wrong politely. Trust as infrastructure means treating the five together as one system, with the same coordination discipline you'd give a production data pipeline.

What This Looks Like on Monday

Treating trust as infrastructure isn't a six-month project. It's a set of practices a team can start adopting now. One concrete move per layer:

Data Quality: Pick the ten most-queried models the agent will touch. Make sure each has tests for the assumptions it relies on, and that the agent is reading the tested model, not an upstream raw table.
Context: Audit those same ten models for description coverage. For every column the agent will see and every metric it might pick, is there a description that goes beyond restating the name? If not, that's a gap in reasoning. Also build a context layer that makes business logic and definitions available to the agent.
Evaluations: Build a golden question set. Start small: 10 to 15 questions with verified answers. Run it on every agent change. If you can't measure improvement, you can't claim it.
Governance: Write down, in a single document, what the agent is allowed to do, what it should decline, who owns it, and what gates a release.
Stakeholder Literacy: Tell your stakeholders what the agent can do and what it cannot. Where it matters, have agent responses cite their sources and relevant references.

None of these are revolutionary. They're the boring work. That's the point. Trust scales when the boring work scales.

The Closing Flag

Analytics engineering used to be measured by output. How many models, how fast, how reliably. The discipline grew around that. dbt, contracts, tests, lineage, all of it pointed at the same goal of producing more, faster, with fewer breaks.

AI has changed what "more, faster" means. An agent can write a quarter's worth of SQL in an afternoon. The bottleneck isn't production anymore. It's verification. The new measure of the discipline is whether trust scales alongside output. A team that doubles its analytical capacity and halves its credibility hasn't gained anything.

The teams that figure this out will treat trust the way they currently treat data quality, as something engineered into the system, owned, versioned, tested, and continuously improved. They'll have AI analytics that survives the first time an executive catches a wrong number, because the system will have already caught it.

The teams that don't will rerun the self-serve story at a higher speed. Fast tools, growing access, eroding trust, and an inevitable retreat to the data team for "real" numbers.

The question is no longer "is the answer right?"

It's "does the system prove it?"

If your team is exploring AI analytics and wants to understand whether your data foundation and evaluation approach are ready, we offer an AI Readiness Assessment that evaluates your current setup across data quality, semantic context, and organizational readiness.
