Represent Airflow DAGs in Dagster
Airlift v2 is under active development. You may encounter feature gaps, and the APIs may change. To report issues or give feedback, please reach out to your CSM.
Once you have represented your Airflow instance in Dagster using the AirflowInstanceComponent, you may also want to represent the assets each DAG produces and the dependencies between them. You can declare these in your component's configuration.
Manually mapping assets to Airflow tasks
To manually define which assets a given Airflow DAG produces, list them under mappings in your component's YAML configuration. Use deps on an asset spec to declare the assets it depends on:
type: dagster_airlift.core.components.AirflowInstanceComponent
attributes:
  name: my_airflow
  auth:
    type: basic_auth
    webserver_url: '{{ env("AIRFLOW_WEBSERVER_URL") }}'
    username: '{{ env("AIRFLOW_USERNAME") }}'
    password: '{{ env("AIRFLOW_PASSWORD") }}'
  mappings:
    - dag_id: upload_source_data
      assets:
        - spec:
            key: order_data
        - spec:
            key: activity_data
        - spec:
            key: aggregated_user_data
            deps: [order_data, activity_data]
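To make the graph declared by deps concrete, here is a minimal sketch in plain Python (standard library only, not a Dagster API). It mirrors the DAG-level mapping above as a dictionary and derives an order in which upstream assets come before their dependents. The MAPPINGS dictionary and the helpers asset_graph and asset_order are hypothetical; Dagster builds the real asset graph from your component configuration.

```python
from graphlib import TopologicalSorter

# Hypothetical Python mirror of the DAG-level YAML mapping above.
# This is NOT a Dagster API; it only illustrates the graph that `deps` declares.
MAPPINGS = [
    {
        "dag_id": "upload_source_data",
        "assets": [
            {"spec": {"key": "order_data"}},
            {"spec": {"key": "activity_data"}},
            {
                "spec": {
                    "key": "aggregated_user_data",
                    "deps": ["order_data", "activity_data"],
                }
            },
        ],
    }
]


def asset_graph(mappings):
    """Return {asset_key: set of upstream asset keys} for every mapped asset spec."""
    graph = {}
    for dag in mappings:
        for asset in dag.get("assets", []):
            spec = asset["spec"]
            graph[spec["key"]] = set(spec.get("deps", []))
    return graph


def asset_order(mappings):
    """Return asset keys ordered so that every upstream asset precedes its dependents."""
    return list(TopologicalSorter(asset_graph(mappings)).static_order())


if __name__ == "__main__":
    # order_data and activity_data come first (in either order), then aggregated_user_data.
    print(asset_order(MAPPINGS))
```

The relative order of order_data and activity_data is not fixed because neither depends on the other; only aggregated_user_data is guaranteed to come last.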
If you know which task within the DAG produces each asset, you can instead set these mappings at the task level with task_mappings:
type: dagster_airlift.core.components.AirflowInstanceComponent
attributes:
  name: my_airflow
  auth:
    type: basic_auth
    webserver_url: '{{ env("AIRFLOW_WEBSERVER_URL") }}'
    username: '{{ env("AIRFLOW_USERNAME") }}'
    password: '{{ env("AIRFLOW_PASSWORD") }}'
  mappings:
    - dag_id: upload_source_data
      task_mappings:
        - task_id: upload_orders
          assets:
            - spec:
                key: order_data
        - task_id: upload_activity
          assets:
            - spec:
                key: activity_data
        - task_id: aggregate_user_data
          assets:
            - spec:
                key: aggregated_user_data
                deps: [order_data, activity_data]
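With task-level mappings, each asset is tied to a specific task, so asset deps also tell you which tasks' outputs a given task consumes. The sketch below illustrates that relationship in plain Python; it is not a Dagster API. TASK_MAPPINGS mirrors the YAML above, and the helpers producing_task and upstream_tasks are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Hypothetical Python mirror of the task-level YAML mapping above (not a Dagster API).
TASK_MAPPINGS = {
    "dag_id": "upload_source_data",
    "task_mappings": [
        {"task_id": "upload_orders", "assets": [{"spec": {"key": "order_data"}}]},
        {"task_id": "upload_activity", "assets": [{"spec": {"key": "activity_data"}}]},
        {
            "task_id": "aggregate_user_data",
            "assets": [
                {
                    "spec": {
                        "key": "aggregated_user_data",
                        "deps": ["order_data", "activity_data"],
                    }
                }
            ],
        },
    ],
}


def producing_task(mapping):
    """Map each asset key to the (dag_id, task_id) pair that produces it."""
    owner = {}
    for task in mapping["task_mappings"]:
        for asset in task["assets"]:
            owner[asset["spec"]["key"]] = (mapping["dag_id"], task["task_id"])
    return owner


def upstream_tasks(mapping):
    """For each task, return the other tasks in this DAG whose assets it depends on."""
    owner = producing_task(mapping)
    upstream = defaultdict(set)
    for task in mapping["task_mappings"]:
        for asset in task["assets"]:
            for dep in asset["spec"].get("deps", []):
                # Skip deps on assets not mapped in this DAG (they may be defined elsewhere).
                if dep not in owner:
                    continue
                dep_task = owner[dep][1]
                if dep_task != task["task_id"]:
                    upstream[task["task_id"]].add(dep_task)
    return dict(upstream)


if __name__ == "__main__":
    print(producing_task(TASK_MAPPINGS))
    print(upstream_tasks(TASK_MAPPINGS))
```

Deps that name assets outside this DAG are skipped rather than treated as errors, since this sketch only sees one DAG's mapping.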
Prerequisites
Before following this guide, complete the setup guide and peer your Airflow instance with Dagster.