Dagster - Data Orchestration
No AI Content
The Project
This project is a simple mock-up project for a data pipeline built with Dagster. I am by no means an expert in data pipelines — or in Dagster, for that matter — but it’s a good excuse to tell you about this neat and handy piece of software, and for me to gain more hands-on experience with it.
I stumbled upon Dagster while looking for ways to manage data for a personal project I wanted to set up. The project involved crawling data, inserting it into a database, transforming it, and then analyzing it further to generate reports. A perfect use-case for Dagster, as it turns out.
The project shared here is a mock-up of that idea. Instead of crawling data from a real source, it retrieves data from a mock API. Dagster manages the crawling, inserts the data into a PostgreSQL database, and then generates an HTML plot using Plotly. If you want to try it out yourself or just look at the code, you can find it on GitHub here:
dagster-data-analysis
The Dagster project on GitHub.

The Scenario
The project runs using multiple Docker containers, one for each service: the database, Dagster, the mock API, and Nginx, which serves the final result. The Dagster web UI, Nginx, and the mock API are exposed to the user.
The mock API serves prices and trading volumes for various predefined items. The theme is a pizza restaurant: users get prices for pizzas, ingredients, and other products. Each item has its own random process determining its prices and volumes. The current version only displays the crawled prices at the end of the pipeline. A future version (or one you, the reader, could build, if you’re up for a challenge) might fit the data and generate forecasts.

Dagster
If you are curious, here are a few core concepts used in Dagster:
Assets
One of the central concepts is the asset1. An asset represents an observation of data — like a table or a file (e.g., a pandas DataFrame or an HTML file in this project). It is defined by code that generates and manages the observed data. Assets can depend on other assets. For example, a table output (the first asset) can be further transformed into a plot file (a downstream asset). Turning an asset into a concrete observation (in other words, running the code) is called materializing the asset.
In the screenshots above, all nodes represent assets, and the lines between them represent the dependencies.
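To make this concrete, here is a minimal sketch of two assets: a crawled table and an artifact derived from it. The names and data are made up for illustration and are not taken from the repository.

```python
import pandas as pd
from dagster import asset, materialize


@asset
def raw_prices() -> pd.DataFrame:
    # Stand-in for the crawling step; the real project fetches this from the mock API.
    return pd.DataFrame({"item": ["margherita", "salami"], "price": [8.5, 9.5]})


@asset
def price_plot(raw_prices: pd.DataFrame) -> None:
    # Downstream asset: listing raw_prices as a parameter declares the dependency.
    # The real project writes an HTML plot with Plotly; a CSV keeps the sketch short.
    raw_prices.to_csv("prices.csv", index=False)


if __name__ == "__main__":
    # Materialize both assets in dependency order (what "Materialize all" does in the UI).
    materialize([raw_prices, price_plot])
```

Because price_plot takes raw_prices as an input, Dagster draws exactly the kind of edge between the two nodes that you see in the asset graph.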
The neat thing about using multiple assets is that the pipeline can be executed partially. If the crawling step succeeded but the plot generation failed, we don’t need to re-run the crawling step. We can fix the problem and re-materialize just the plot asset.
Another benefit is speed: if assets don’t depend on each other, Dagster runs them in parallel. Each asset materialization has its own logs, error messages, and metadata — which is extremely helpful.
Resources, Jobs, Automation, Sensors
Resources are used when Dagster needs to interact with external systems like databases or APIs. They usually require configuration (e.g., usernames, passwords) that is shared between different assets.
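As a rough sketch, the project’s PostgreSQL connection could be modelled with Dagster’s ConfigurableResource. The class, field, and resource-key names below are my own assumptions, not the project’s.

```python
from dagster import ConfigurableResource, Definitions, EnvVar, asset


class PostgresResource(ConfigurableResource):
    # Hypothetical connection settings shared by every asset that talks to the database.
    host: str
    user: str
    password: str

    def connection_string(self) -> str:
        return f"postgresql://{self.user}:{self.password}@{self.host}:5432/pizzeria"


@asset
def prices_table(postgres: PostgresResource) -> str:
    # The resource is injected under the key it was registered with ("postgres").
    # A real asset would open a connection here and insert the crawled rows.
    return postgres.connection_string()


defs = Definitions(
    assets=[prices_table],
    resources={
        "postgres": PostgresResource(
            host="db",  # hypothetical service name of the database container
            user="dagster",
            password=EnvVar("POSTGRES_PASSWORD"),  # resolved from the environment at runtime
        )
    },
)
```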
A job2 is a collection of connected assets. It’s the main unit of execution and lets you trigger the entire pipeline with one click. Jobs can also be automated.
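A minimal sketch of a job that selects all assets, plus a cron schedule as one example of automation (the job name and the schedule are assumptions, not taken from the repository):

```python
from dagster import AssetSelection, ScheduleDefinition, define_asset_job

# Hypothetical job covering every asset in the pipeline.
pizza_pipeline_job = define_asset_job(
    name="pizza_pipeline_job",
    selection=AssetSelection.all(),
)

# One way to automate the job: run it every morning at 06:00.
daily_schedule = ScheduleDefinition(
    job=pizza_pipeline_job,
    cron_schedule="0 6 * * *",
)
```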
Automation is possible via:
- Schedules (e.g., cron expressions)
- Dependency updates (e.g., re-run if inputs have changed)
- Sensors, which monitor for events or external signals
Sensors run on a schedule, checking for certain conditions. Based on their findings, they can trigger or skip jobs. The conditions they monitor can come from within Dagster (e.g., other runs meeting criteria) or from external systems (e.g., new data in an API).
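Here is a hedged sketch of a sensor that polls for new data and launches the hypothetical pizza_pipeline_job from the previous sketch. The probe function is a stand-in, not part of the project.

```python
from dagster import RunRequest, SkipReason, sensor


def new_prices_available() -> bool:
    # Hypothetical probe; the real check would ask the mock API whether fresh data exists.
    return False


@sensor(job=pizza_pipeline_job, minimum_interval_seconds=300)
def new_data_sensor(context):
    # Evaluated on an interval: either request a run of the job or explain why it skipped.
    if new_prices_available():
        yield RunRequest(run_key=None)
    else:
        yield SkipReason("No new prices on the mock API yet.")
```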
Footnotes
1. Dagster Docs on Assets: https://docs.dagster.io/guides/build/assets/.
2. Dagster Docs on Jobs: https://docs.dagster.io/guides/build/jobs/.