Data Engineering · Architecture · Medallion

Medallion Architecture: What It Is, Why It Works, and When Not to Use It

28 September 2024 · 8 min read

What Is Medallion Architecture?

Medallion architecture is a data organisation pattern that structures data across three layers — Bronze, Silver, and Gold — each representing a progressively refined state of the same underlying data.

Bronze is raw ingestion. Data arrives here exactly as it came from source systems — no transformation, no cleaning, no schema enforcement. The Bronze layer is your audit trail and your recovery point. If anything goes wrong downstream, Bronze is how you replay.
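A minimal Python sketch of that idea, assuming a hypothetical in-memory store in place of a real landing zone. `BronzeRecord`, `ingest`, and `replay` are illustrative names, not part of any library. The payload is kept as the exact bytes received, and only ingestion metadata is wrapped around it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class BronzeRecord:
    """One raw payload plus ingestion metadata (hypothetical envelope)."""
    source: str            # name of the source system
    ingested_at: datetime  # when the payload was received, in UTC
    payload: bytes         # stored exactly as received, never parsed here


# Stand-in for an append-only landing zone such as object storage.
bronze_log: List[BronzeRecord] = []


def ingest(source: str, payload: bytes,
           now: Optional[datetime] = None) -> BronzeRecord:
    """Append the payload untouched: no cleaning, typing, or validation."""
    record = BronzeRecord(source, now or datetime.now(timezone.utc), payload)
    bronze_log.append(record)
    return record


def replay(source: str, since: datetime) -> Iterator[BronzeRecord]:
    """Return the raw records again so downstream layers can be rebuilt."""
    return (r for r in bronze_log
            if r.source == source and r.ingested_at >= since)
```

Note that `ingest` accepts a payload that is not even valid JSON. Rejecting it is a Silver concern; Bronze only has to make sure nothing is lost.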

Silver is standardised and validated. Here, Bronze data is cleaned, typed, deduplicated, and conformed to a consistent schema. Validation rules that apply to every consumer belong here; metric and KPI definitions do not. Silver is where data quality problems are resolved, not where business definitions are imposed.

Gold is business-ready. This is where KPIs live, where aggregations happen, where the data that powers dashboards and reports is shaped for consumption. Gold tables are designed around how they will be used — not around the shape of their source data.

Why It Works

The architecture solves a specific and persistent problem in data engineering: the tension between auditability and usability.

Before medallion patterns became common, teams typically had one of two failure modes:

Everything is raw. Analysts get access to source data and build their own transformation logic. Every analyst's view of "revenue" is slightly different. Reports conflict. Trust erodes.

Everything is transformed. A single gold-layer model tries to serve every use case. As requirements evolve, the model becomes increasingly complex and fragile. Small changes cascade into production incidents.

Medallion architecture separates these concerns cleanly. Raw data is preserved intact (Bronze), shared transformations are applied once (Silver), and business-specific models are built from a clean foundation (Gold).

The result: a platform that's both auditable and usable.

A Real-World Example

In our KPI Engine engagement, we inherited 300+ dashboards built directly against source data — each with its own transformation logic, each subtly different.

The Bronze layer consolidated raw feeds from 20+ operational systems — telephony platforms, CRM systems, WFM tools — via Airflow DAGs into an S3 landing zone.

The Silver layer applied standardisation: consistent timestamp handling, unified agent ID logic, deduplication of call records, and data quality checks that flagged anomalies before they reached reporting.
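The kind of standardisation described here might look roughly like the sketch below. It is not the code from the engagement: the field names (`call_id`, `agent`, `started_at`, `duration_s`), the `AGT-` prefix rule, and the negative-duration check are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def to_utc(raw: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries an offset and convert to UTC."""
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no offset: {raw!r}")
    return parsed.astimezone(timezone.utc)


def canonical_agent_id(raw: str) -> str:
    """Assumed rule: trim, upper-case, and drop a legacy 'AGT-' prefix."""
    cleaned = raw.strip().upper()
    return cleaned[4:] if cleaned.startswith("AGT-") else cleaned


def standardise(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (clean, flagged). The first row seen for each call_id wins."""
    clean: List[Row] = []
    flagged: List[Row] = []
    seen = set()
    for row in rows:
        if row["call_id"] in seen:
            continue  # duplicate call record, already handled
        seen.add(row["call_id"])
        try:
            out = {
                "call_id": row["call_id"],
                "agent_id": canonical_agent_id(str(row["agent"])),
                "started_at": to_utc(str(row["started_at"])),
                "duration_s": int(row["duration_s"]),
            }
        except (KeyError, TypeError, ValueError) as exc:
            flagged.append({**row, "reason": str(exc)})
            continue
        if out["duration_s"] < 0:
            flagged.append({**row, "reason": "negative duration"})
        else:
            clean.append(out)
    return clean, flagged
```

Flagged rows are returned rather than silently dropped, so anomalies can be reviewed before anything reaches reporting.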

The Gold layer contained KPI-specific tables — SLA performance, CSAT, AHT, attrition — each with a single canonical definition, client-level partitioning, and DirectQuery optimisation for Power BI.
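To make "a single canonical definition" concrete, here is a hedged sketch for AHT. The formula used (talk time plus hold time plus after-call work, averaged over calls) is a common definition and is assumed here; the article does not state which one the engagement adopted. The column names and `build_gold_aht` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def handle_time_s(call: Mapping[str, int]) -> int:
    """Assumed canonical definition: talk + hold + after-call work, in seconds."""
    return call["talk_s"] + call["hold_s"] + call["wrap_s"]


def build_gold_aht(silver_calls: Iterable[Mapping]) -> Dict[str, float]:
    """Average handle time per client, computed from Silver call rows."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for call in silver_calls:
        totals[call["client"]] += handle_time_s(call)
        counts[call["client"]] += 1
    return {client: totals[client] / counts[client] for client in totals}
```

The point of the design is that `handle_time_s` exists in exactly one place. Every dashboard reads the Gold output instead of re-deriving the metric from source data.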

The result was 30,000+ man-hours of annual reporting effort eliminated — not because we built something cleverer, but because we built something consistent.

When Not to Use It

Medallion architecture is not always the right answer. Here's when it creates more problems than it solves:

When you have a single, simple use case. If you're building a reporting pipeline for one data source and one consumer, three layers add overhead with no benefit. Start with a direct model and refactor later if complexity grows.

When your transformation logic is highly use-case-specific. Medallion works because Silver is shared across consumers. If every consumer needs fundamentally different transformations of the same source data, you may end up building per-consumer Silver layers — at which point you've just added complexity without adding value.

When your team doesn't have the data engineering maturity to maintain it. A medallion architecture requires clear ownership, a data quality enforcement mindset, and the discipline to not bypass layers. In organisations without that culture, Bronze becomes a dumping ground and Silver becomes a staging area with no SLAs.

When latency requirements are very tight. Three layers means three hops. For near-real-time analytical use cases (sub-minute refresh), the overhead of full medallion processing may be prohibitive. Streaming architectures with Delta Lake patterns may be more appropriate.

Practical Implementation Guidance

A few things we've learned from implementing this pattern across multiple engagements:

Define Silver quality contracts early. The hardest part of medallion architecture is deciding what "clean" means. Before building your Silver layer, write down the specific quality rules for each attribute. Who owns each rule? What's the SLA for data quality remediation? These decisions are more important than the tooling.
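One way to write those decisions down is as data rather than as prose in a wiki. The sketch below assumes a hypothetical `QualityRule` structure; the attributes, owners, and SLA values are made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class QualityRule:
    attribute: str                   # column the rule applies to
    description: str                 # what "clean" means, in plain words
    owner: str                       # team accountable for the rule
    remediation_sla_hours: int       # agreed time to fix violations
    check: Callable[[object], bool]  # returns True when the value passes


# Illustrative contract; every value below is invented for the example.
CALL_CONTRACT: List[QualityRule] = [
    QualityRule("agent_id", "non-empty string", "workforce-data", 24,
                lambda v: isinstance(v, str) and v != ""),
    QualityRule("duration_s", "integer between 0 and 86400", "telephony-data", 48,
                lambda v: isinstance(v, int) and 0 <= v <= 86_400),
]


def violations(row: Dict[str, object],
               contract: List[QualityRule]) -> List[QualityRule]:
    """Return the rules this row breaks; a missing attribute counts as a failure."""
    return [rule for rule in contract if not rule.check(row.get(rule.attribute))]
```

Because each violation carries its owner and SLA, a failed check can be routed to the accountable team without a separate lookup.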

Keep Gold tables purpose-built. The temptation is to build a "universal" Gold table that serves every use case. Resist it. Gold tables should be designed around specific consumption patterns — a Power BI DirectQuery Gold table looks different from a Gold table built for Python ML pipelines.

Log quality metrics at Silver. Build observability in from day one. How many records failed quality checks at Silver today? What's the trend? This data is more valuable than you think — both for operational stability and for building trust with downstream consumers.
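A small sketch of what "log quality metrics" could mean in practice, using only the Python standard library. The metric names and the `summarise_run` function are assumptions; a real platform would probably write these records to a metrics table or monitoring system rather than a plain logger.

```python
import json
import logging
from collections import Counter
from datetime import date
from typing import Dict, Iterable

logger = logging.getLogger("silver.quality")


def summarise_run(table: str, run_date: date, total: int,
                  failure_reasons: Iterable[str]) -> Dict[str, object]:
    """Build one metrics record for a Silver run and log it as JSON."""
    by_reason = Counter(failure_reasons)
    failed = sum(by_reason.values())
    metrics = {
        "table": table,
        "run_date": run_date.isoformat(),
        "records_total": total,
        "records_failed": failed,
        # Guard against empty runs so the rate is always defined.
        "failure_rate": failed / total if total else 0.0,
        "failed_by_reason": dict(by_reason),
    }
    logger.info(json.dumps(metrics, sort_keys=True))
    return metrics
```

Emitting one record per table per run keeps the daily count and the trend cheap to query later.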


DataGravity has implemented medallion architectures on Amazon Redshift, Azure Data Lake, and Databricks. If you're designing a data platform and want a practitioner's perspective, get in touch.
