Represent Airflow DAGs in Dagster

info

Airlift v2 is under active development. You may encounter feature gaps, and the APIs may change. To report issues or give feedback, please reach out to your CSM.

Once you have represented your Airflow instance in Dagster using the Airflow instance component, you may also want to represent the graph of asset dependencies that its DAGs produce. You can do this directly in your component configuration.

Manually mapping assets to Airflow tasks

You can manually define which assets a given Airflow DAG produces by editing your component's YAML configuration:

type: dagster_airlift.core.components.AirflowInstanceComponent

attributes:
  name: my_airflow
  auth:
    type: basic_auth
    webserver_url: '{{ env("AIRFLOW_WEBSERVER_URL") }}'
    username: '{{ env("AIRFLOW_USERNAME") }}'
    password: '{{ env("AIRFLOW_PASSWORD") }}'
  mappings:
    - dag_id: upload_source_data
      assets:
        - spec:
            key: order_data
        - spec:
            key: activity_data
        - spec:
            key: aggregated_user_data
            deps: [order_data, activity_data]
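
To make the resulting dependency graph concrete, the following is a minimal Python sketch, not part of Airlift itself. It assumes a hand-written, in-memory copy of the asset list for `upload_source_data` (the `asset_specs` variable is hypothetical) and uses only the standard library's `graphlib` module to show that `aggregated_user_data` sits downstream of the two upload assets.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory copy of the `assets` list from the mapping above.
# This is an illustration only; Airlift reads the YAML file itself.
asset_specs = [
    {"key": "order_data"},
    {"key": "activity_data"},
    {"key": "aggregated_user_data", "deps": ["order_data", "activity_data"]},
]

# Map each asset key to the set of asset keys it depends on.
graph = {spec["key"]: set(spec.get("deps", [])) for spec in asset_specs}

# static_order() yields upstream assets before the assets that depend on them.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The two upload assets have no dependencies, so they may appear in either order; `aggregated_user_data` always comes last.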

If a specific task within the DAG produces a particular set of assets, you can also define the mappings at the task level:

type: dagster_airlift.core.components.AirflowInstanceComponent

attributes:
  name: my_airflow
  auth:
    type: basic_auth
    webserver_url: '{{ env("AIRFLOW_WEBSERVER_URL") }}'
    username: '{{ env("AIRFLOW_USERNAME") }}'
    password: '{{ env("AIRFLOW_PASSWORD") }}'
  mappings:
    - dag_id: upload_source_data
      task_mappings:
        - task_id: upload_orders
          assets:
            - spec:
                key: order_data
        - task_id: upload_activity
          assets:
            - spec:
                key: activity_data
        - task_id: aggregate_user_data
          assets:
            - spec:
                key: aggregated_user_data
                deps: [order_data, activity_data]
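
As a rough sanity check on a task-level mapping like the one above, the sketch below works on a hypothetical in-memory copy of the `task_mappings` list (the variable and both helper functions are illustrative names, not Airlift APIs). It builds a task-to-assets lookup and reports any `deps` entry that no task in the mapping produces. Such an entry is not necessarily an error, since a dependency may refer to an asset defined elsewhere in Dagster, but it is a cheap way to catch typos in asset keys.

```python
# Hypothetical in-memory copy of the `task_mappings` list above.
task_mappings = [
    {"task_id": "upload_orders", "assets": [{"spec": {"key": "order_data"}}]},
    {"task_id": "upload_activity", "assets": [{"spec": {"key": "activity_data"}}]},
    {
        "task_id": "aggregate_user_data",
        "assets": [
            {
                "spec": {
                    "key": "aggregated_user_data",
                    "deps": ["order_data", "activity_data"],
                }
            }
        ],
    },
]


def assets_by_task(mappings):
    """Return a dict from task_id to the asset keys mapped to that task."""
    return {m["task_id"]: [a["spec"]["key"] for a in m["assets"]] for m in mappings}


def undeclared_deps(mappings):
    """Return dep keys that no task in these mappings produces."""
    specs = [a["spec"] for m in mappings for a in m["assets"]]
    produced = {spec["key"] for spec in specs}
    referenced = {dep for spec in specs for dep in spec.get("deps", [])}
    return referenced - produced


print(assets_by_task(task_mappings))
print(undeclared_deps(task_mappings))
```

For this mapping every dependency is produced by one of the listed tasks, so `undeclared_deps` returns an empty set.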

Prerequisites

Before following this guide, you will need to follow the setup guide and peer your Airflow instance to Dagster.