Migrating an Airflow BashOperator to Dagster
Airlift v2 is under active development. You may encounter feature gaps, and the APIs may change. To report issues or give feedback, please reach out to your CSM.
The Airflow BashOperator is used to execute bash commands as part of a data pipeline.
from airflow.operators.bash import BashOperator
execute_script = BashOperator(
    task_id="execute_script",
    bash_command="python /path/to/script.py",
)
Because the BashOperator can run any bash command, its functionality is very general, and Dagster offers richer integrations for many common BashOperator use cases. In this guide, we'll explain how to migrate a BashOperator-executed bash command to Dagster, and how to use the Dagster Airlift component to proxy execution of the original task to Dagster. We'll also provide a reference for richer Dagster integrations that cover common BashOperator use cases.
If you're using the BashOperator to execute dbt commands, see "Migrating an Airflow BashOperator (dbt) to Dagster".
Dagster equivalent
The direct Dagster equivalent to the BashOperator is the PipesSubprocessClient, which you can use to execute a bash command in a subprocess.
Migrating the operator
To migrate the operator, you will need to:
- Ensure that the resources necessary for your bash command are available to both your Airflow and Dagster deployments.
- Write an asset that executes the bash command using the PipesSubprocessClient.
- Use the Dagster Airlift component to proxy execution of the original task to Dagster.
- (Optional) Implement a richer integration for common BashOperator use cases.
Step 1: Ensure shared bash command access in Airflow and Dagster
First, you'll need to ensure that the bash command you're running is available for use in both your Airflow and Dagster deployments. What this entails will vary depending on the command you're running. For example, if you're running a Python script, you will need to ensure the Python script exists in a shared location accessible to both Airflow and Dagster, and all necessary environment variables are set in both environments.
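For example, if both deployments need to invoke the same Python script, one lightweight approach is to resolve the script location from an environment variable that is set identically in both places. A minimal sketch (the SCRIPT_PATH variable name is hypothetical):
import os

# SCRIPT_PATH is a hypothetical environment variable; point it at the same
# shared location in both the Airflow and Dagster deployments.
SCRIPT_PATH = os.environ.get("SCRIPT_PATH", "/path/to/script.py")
COMMAND = ["python", SCRIPT_PATH]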
Step 2: Write an @asset that executes the bash command
You can write a Dagster asset-decorated function that runs your bash command. This is straightforward with the PipesSubprocessClient:
import dagster as dg
@dg.asset
def script_result(context: dg.AssetExecutionContext):
    return (
        # Launch the script in a subprocess via Dagster Pipes
        dg.PipesSubprocessClient()
        # Pass the command as an argument list so it runs without a shell
        .run(context=context, command=["python", "/path/to/script.py"])
        .get_results()
    )
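To make the asset loadable by Dagster, include it in a Definitions object. A minimal sketch, assuming no additional resources are required:
import dagster as dg

# Register the asset so Dagster tooling (for example, `dagster dev`) can load it.
defs = dg.Definitions(assets=[script_result])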
Step 3: Use the Dagster Airlift component to proxy execution
Finally, you can use the Dagster Airlift component to proxy the execution of the original task to Dagster.
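The exact configuration depends on your Airlift version, but as a rough sketch, assuming the proxied-state convention from dagster-airlift (a YAML file such as proxied_state/<dag_id>.yaml that marks the execute_script task as proxied), the Airflow DAG file hands off execution like this:
from pathlib import Path

from dagster_airlift.in_airflow import proxying_to_dagster
from dagster_airlift.in_airflow.proxied_state import load_proxied_state_from_yaml

# ... existing DAG and BashOperator definitions above ...

# Tasks marked as proxied in proxied_state/<dag_id>.yaml are delegated to
# Dagster at runtime instead of executing their original Airflow logic.
proxying_to_dagster(
    global_vars=globals(),
    proxied_state=load_proxied_state_from_yaml(Path(__file__).parent / "proxied_state"),
)
Under Airlift v2, the component-based setup may expose this differently; treat the sketch above as an illustration of the handoff rather than the definitive API.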
(Optional) Step 4: Implement richer integrations
Dagster often has purpose-built options for the use cases you might be covering with the BashOperator. We'll detail some of those here.
Running a Python script
As mentioned above, you can use the PipesSubprocessClient to run a Python script in a subprocess. You can also modify this script to send additional information and logging back to Dagster. For more information, see the Dagster Pipes documentation.
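For example, the script itself can use the dagster-pipes library to stream logs and report metadata back to the orchestrating asset. A minimal sketch of what /path/to/script.py might look like (the metadata key is illustrative):
from dagster_pipes import open_dagster_pipes

with open_dagster_pipes() as pipes:
    pipes.log.info("Starting script execution")
    # ... existing script logic ...
    pipes.report_asset_materialization(metadata={"rows_processed": 42})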
Running a dbt command
If you're using the BashOperator to execute dbt commands, you can follow the steps in "Migrating an Airflow BashOperator (dbt) to Dagster" to switch from the BashOperator to the dagster-dbt integration.
Running S3 Sync or other AWS CLI commands
Dagster has a rich set of integrations for AWS services. For example, you can use the s3.S3Resource to interact with S3 directly. For more information, see the AWS integration documentation.
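As a rough sketch of replacing an aws s3 cp- or sync-style command, an asset can use the S3Resource's underlying boto3 client (the bucket, key, file path, and region below are placeholders):
import dagster as dg
from dagster_aws.s3 import S3Resource

@dg.asset
def uploaded_report(s3: S3Resource):
    # Upload a local file to S3 with the resource's boto3 client
    s3.get_client().upload_file("/path/to/report.csv", "my-bucket", "reports/report.csv")

defs = dg.Definitions(
    assets=[uploaded_report],
    resources={"s3": S3Resource(region_name="us-east-1")},
)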