Migrating an Airflow BashOperator to Dagster

info

Airlift v2 is under active development. You may encounter feature gaps, and the APIs may change. To report issues or give feedback, please reach out to your CSM.

The Airflow BashOperator is used to execute bash commands as part of a data pipeline.

from airflow.operators.bash import BashOperator

execute_script = BashOperator(
    task_id="execute_script",
    bash_command="python /path/to/script.py",
)

The BashOperator is very general: it can run any bash command, and Dagster offers richer integrations for many of its common use cases. In this guide, we'll explain how to migrate a BashOperator task to execute its bash command in Dagster, and how to use the Dagster Airlift component to proxy execution of the original task to Dagster. We'll also provide a reference of richer Dagster integrations for common BashOperator use cases.

note

If you're using the BashOperator to execute dbt commands, see "Migrating an Airflow BashOperator (dbt) to Dagster".

Dagster equivalent

The direct Dagster equivalent to the BashOperator is the PipesSubprocessClient, which you can use to execute a bash command in a subprocess.

Migrating the operator

To migrate the operator, you will need to:

  1. Ensure that the resources necessary for your bash command are available to both your Airflow and Dagster deployments.
  2. Write an asset that executes the bash command using the PipesSubprocessClient.
  3. Use the Dagster Airlift component to proxy execution of the original task to Dagster.
  4. (Optional) Implement a richer integration for common BashOperator use cases.

Step 1: Ensure shared bash command access in Airflow and Dagster

First, you'll need to ensure that the bash command you're running is available for use in both your Airflow and Dagster deployments. What this entails will vary depending on the command you're running. For example, if you're running a Python script, you will need to ensure the Python script exists in a shared location accessible to both Airflow and Dagster, and all necessary environment variables are set in both environments.
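
As a rough preflight sketch, you can assert in both environments that the command's dependencies are in place. The script path and environment variable names below are hypothetical; substitute your own:

import os
import shutil
from pathlib import Path

# Run this check in both the Airflow and Dagster environments.
assert Path("/path/to/script.py").exists(), "script must be on a path both deployments can read"
assert shutil.which("python") is not None, "python must be on PATH"
for var in ("DATABASE_URL",):  # hypothetical variables your script needs
    assert os.getenv(var), f"{var} must be set in this environment"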

Step 2: Write an @asset that executes the bash command

You can write a Dagster asset-decorated function that runs your bash command. This is straightforward with the PipesSubprocessClient:

import dagster as dg


@dg.asset
def script_result(context: dg.AssetExecutionContext):
    return (
        dg.PipesSubprocessClient()
        .run(context=context, command="python /path/to/script.py")
        .get_results()
    )
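
Equivalently, you can supply the client as a resource so it's configured once and shared across assets. A minimal sketch:

import dagster as dg


@dg.asset
def script_result(
    context: dg.AssetExecutionContext,
    pipes_subprocess_client: dg.PipesSubprocessClient,
):
    return pipes_subprocess_client.run(
        context=context, command="python /path/to/script.py"
    ).get_results()


defs = dg.Definitions(
    assets=[script_result],
    resources={"pipes_subprocess_client": dg.PipesSubprocessClient()},
)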

Step 3: Use the Dagster Airlift component to proxy execution

Finally, you can use the Dagster Airlift component to proxy the execution of the original task to Dagster.
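
The exact steps live in the Airlift documentation, but as a rough sketch of how task-level proxying has worked in dagster-airlift (the v2 component APIs may change), the DAG file loads a proxied-state YAML and calls proxying_to_dagster. The dag_id, paths, and YAML contents below are placeholders:

from pathlib import Path

from airflow import DAG
from airflow.operators.bash import BashOperator
from dagster_airlift.in_airflow import proxying_to_dagster
from dagster_airlift.in_airflow.proxied_state import load_proxied_state_from_yaml

with DAG(dag_id="script_dag") as dag:
    execute_script = BashOperator(
        task_id="execute_script",
        bash_command="python /path/to/script.py",
    )

# proxied_state/script_dag.yaml marks which tasks hand off to Dagster:
#
#   tasks:
#     - id: execute_script
#       proxied: True
#
proxying_to_dagster(
    global_vars=globals(),
    proxied_state=load_proxied_state_from_yaml(Path(__file__).parent / "proxied_state"),
)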

(Optional) Step 4: Implement richer integrations

For many common BashOperator use cases, Dagster has better options. We'll detail some of them here.

Running a Python script

As mentioned above, you can use the PipesSubprocessClient to run a Python script in a subprocess. You can also modify this script to send additional information and logging back to Dagster. For more information, see the Dagster Pipes documentation.
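
For example, here is a minimal sketch of what the script itself might look like if it reports logs and metadata back to Dagster via Pipes (the metadata key is hypothetical):

# /path/to/script.py
from dagster_pipes import open_dagster_pipes

with open_dagster_pipes() as pipes:
    pipes.log.info("starting work")
    rows_processed = 42  # stand-in for your script's real work
    pipes.report_asset_materialization(metadata={"rows_processed": rows_processed})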

Running a dbt command

If you're using the BashOperator to execute dbt commands, you can follow the steps in "Migrating an Airflow BashOperator (dbt) to Dagster" to switch from the BashOperator to the dagster-dbt integration.

Running S3 Sync or other AWS CLI commands

Dagster has a rich set of integrations for AWS services. For example, you can use the s3.S3Resource to interact with S3 directly. For more information, see the AWS integration documentation.
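
For instance, here is a minimal sketch of replacing an "aws s3 cp"-style command with S3Resource; the bucket, key, and region below are hypothetical:

import dagster as dg
from dagster_aws.s3 import S3Resource


@dg.asset
def uploaded_file(s3: S3Resource):
    # get_client() returns a standard boto3 S3 client
    s3.get_client().upload_file("/path/to/local/file.csv", "my-bucket", "data/file.csv")


defs = dg.Definitions(
    assets=[uploaded_file],
    resources={"s3": S3Resource(region_name="us-east-1")},
)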