Dagster provides several ways to execute jobs. This guide covers one-off execution of jobs via the Dagster UI, the command line, and Python APIs.
You can also launch jobs in other ways:

- Schedules can be used to launch runs on a fixed interval.
- Sensors allow you to launch runs based on external state changes.
Click on the Launchpad tab, then press the Launch Run button to execute the job:
By default, Dagster will run the job using the multiprocess_executor - that means each step in the job runs in its own process, and steps that don't depend on each other can run in parallel.
The Launchpad also offers a configuration editor to let you interactively build up the configuration. Refer to the Dagster UI documentation for more info.
A job's executor_def property can be set to allow for different types of isolation and parallelism, ranging from executing all the ops in the same process to executing each op in its own Kubernetes pod. See Executors for more details.
The default job executor uses multiprocess execution. It also allows you to toggle between in-process and multiprocess execution via run config.
Below is an example of run config, as YAML, that you could provide in the Dagster UI Launchpad to launch an in-process execution:

```yaml
execution:
  config:
    in_process:
```
Additional config options are available for multiprocess execution that can help with performance. This includes limiting the max concurrent subprocesses and controlling how those subprocesses are spawned.
The example below sets the run config directly on the job to explicitly set the max concurrent subprocesses to 4, and change the subprocess start method to use a forkserver.
Using a forkserver is a great way to reduce per-process overhead during multiprocess execution, but can cause issues with certain libraries. Refer to the Python documentation for more info.
In addition to the max_concurrent limit, you can use tag_concurrency_limits to specify limits on the number of ops with certain tags that can execute at once within a single run.
Limits can be specified for all ops with a certain tag key or key-value pair. If launching an op would exceed any limit, the op stays queued. For asset jobs, the op_tags field on each asset in the job is checked against the tag concurrency limits.
For example, the following job will execute at most two ops at once with the database tag equal to redshift, while also ensuring that at most four ops execute at once:
Note: These limits are applied on a per-run basis. To apply op concurrency limits across multiple runs, use the celery_executor or celery_k8s_job_executor.