dbt – semantic layer introduction


Introduction

What is the Semantic Layer?

The semantic layer is the “translation layer” between your warehouse and your business users. Instead of every analyst or dashboard tool writing custom SQL, the semantic layer defines entities, measures, and dimensions in one central place. This ensures reporting is consistent, accurate, and reusable.

What is MetricFlow?

[MetricFlow] is the query engine that powers dbt’s semantic layer. It understands your semantic model and translates high-level metric queries into optimized SQL.

Why dbt + MetricFlow?

dbt models are the foundation: they clean and transform your raw data.
The semantic layer is the next step: it allows business users (or BI tools) to query metrics like “total revenue” or “average order value” without writing SQL.

dbt Core integrates seamlessly with MetricFlow, letting you define metrics once and query them anywhere.

---

Project Setup

I’ve published a demo project here:
👉 [dbt-semantic-models-demo]

Goal

We start with a `daily_revenue` table in our warehouse.
Our objective is to:

1. Build a dbt model `fct_revenue`
2. Define a semantic model (`fct_revenue.yml`) with measures and dimensions
3. Define metrics in `metrics.yml`
4. Run MetricFlow queries

---

Step 1: Create the dbt model

SQL model: `models/fct_revenue.sql`**

with base as (
select
order_id,
order_date,
product_id,
customer_id,
region,
revenue
from {{ source('public', 'daily_revenue') }}
)
select * from base


This creates a fact table (`fct_revenue`) that we’ll use in the semantic layer.

---

Step 2: Add a Date Spine

The semantic layer requires a **time spine** for time-based aggregations.

SQL model: `models/date_spine.sql`


{{
config(
materialized = 'table',
)
}}
with base_dates as (
{{
dbt.date_spine(
'day',
"DATE('2000-01-01')",
"DATE('2030-01-01')"
)
}}
),
final as (
select
cast(date_day as date) as date_day
from base_dates
)
select * from final


---

Step 3: Define the Semantic Model

YAML: `models/semantic_models/fct_revenue.yml`


version: 2

semantic_models:
- name: fct_revenue
model: ref('fct_revenue')
description: "Revenue fact table at order level"

entities:
- name: order
type: primary
expr: order_id
- name: product
type: foreign
expr: product_id
- name: customer
type: foreign
expr: customer_id

measures:
- name: total_revenue
agg: sum
expr: revenue
description: "Sum of revenue"
agg_time_dimension: order_date

- name: order_count
agg: count
expr: order_id
description: "Number of orders"
agg_time_dimension: order_date

dimensions:
- name: order_date
type: time
expr: order_date
type_params:
time_granularity: day

- name: region
type: categorical
expr: region

- name: order_month
type: categorical
expr: date_trunc('month', order_date)

- name: order_year
type: categorical
expr: date_trunc('year', order_date)



---

Step 4: Define Metrics

YAML: `models/semantic_models/metrics.yml`

version: 2

metrics:
- name: total_revenue_metric
label: "Total Revenue"
description: "Total revenue across all orders"
type: simple
type_params:
measure: total_revenue

- name: order_count_metric
label: "Order Count"
description: "Total number of orders"
type: simple
type_params:
measure: order_count

- name: avg_revenue_metric
label: "Average Revenue"
description: "Average revenue per order"
type: derived
type_params:
expr: "{{ total_revenue_metric }} / NULLIF({{ order_count_metric }}, 0)"
metrics:
- total_revenue_metric
- order_count_metric


---

Step 5: Run dbt + MetricFlow

Compile and run dbt to build the models:

dbt compile
dbt run


Then query your metrics with MetricFlow:


mf query --metrics total_revenue_metric --group-by order__region


Example output:


order__region total_revenue_metric
--------------- ----------------------
US-East 930150
US-West 918685




mf query --metrics avg_revenue_metric --group-by order__region


Example output:

order__region avg_revenue_metric
--------------- --------------------
US-East 253.931
US-West 251.213



---

Step 6: Inspect the Generated SQL

Use the `--explain` flag to see the SQL MetricFlow generates:


mf query --metrics avg_revenue_metric --group-by order__region --explain


Example SQL:

SELECT
order__region
, total_revenue_metric / NULLIF(order_count_metric, 0) AS avg_revenue_metric
FROM (
SELECT
order__region
, SUM(total_revenue) AS total_revenue_metric
, SUM(order_count) AS order_count_metric
FROM (
SELECT
region AS order__region
, revenue AS total_revenue
, CASE WHEN order_id IS NOT NULL THEN 1 ELSE 0 END AS order_count
FROM "home"."public"."fct_revenue" revenue_model_src_10000
) subq_2
GROUP BY
order__region
) subq_4



---

Conclusion

With dbt’s semantic layer and MetricFlow:

* dbt models remain your data transformations.
* The semantic model defines entities, measures, and dimensions.
* Metrics become business-facing KPIs you can query without writing SQL.

This approach ensures consistent reporting across all teams and tools.

Leave a comment