Kubernetes is a container orchestration system for automating deployment, scaling, and management of containerized applications. Dagster uses Kubernetes in combination with Helm, a package manager for Kubernetes applications. Using Helm, users specify the configuration of required Kubernetes resources to deploy Dagster through a values file or command-line overrides. References to values.yaml in the following sections refer to Dagster's values.yaml.
Dagster publishes a fully-featured Helm chart to manage installing and running a production-grade Kubernetes deployment of Dagster. For each Dagster component in the chart, Dagster publishes a corresponding Docker image on Docker Hub.
The Dagster Helm chart is versioned with the same version numbers as the Dagster Python library. Ideally, the Helm chart and Dagster Python library should only be used together when their version numbers match.
In the following tutorial, we install the most recent version of the Dagster Helm chart. To use an older version of the chart, pass a --version flag to helm upgrade.
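For example, to pin the chart to a specific release, pass --version to helm upgrade. The release name (dagster), values file, and version number below are illustrative; match the chart version to your dagster Python library version:

```shell
# Pin the Dagster chart to a specific release (version shown is a placeholder).
helm upgrade --install dagster dagster/dagster --version 1.7.0 -f values.yaml
```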
A code location server runs a gRPC server and responds to the Dagster webserver's requests for information, such as:
List all of the jobs in each code location, or
What is the dependency structure of job X?
The user-provided image for the server must contain a code location definition and all of the packages needed to execute within the code location.
Users can have multiple code location servers. A common pattern is for each code location server to correspond to a different code location.
Code location servers can be updated independently from other Dagster components, including the webserver. As a result, updates to code locations can occur without causing downtime to any other code location or the webserver. After updating, if there is an error with any code location, an error is surfaced for that code location within the Dagster UI. All other code locations and the UI will still operate normally.
The Dagster webserver communicates with user code deployments via gRPC to fetch information needed to populate the Dagster UI. The webserver doesn't load or execute user-written code, which allows the UI to remain available even when user code contains errors. The webserver frequently checks whether the user code deployment has been updated and if so, fetches the new information.
The webserver can be horizontally scaled by setting the dagsterWebserver.replicaCount field in values.yaml.
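For example, a values.yaml fragment running two webserver replicas might look like this (the replica count is illustrative):

```yaml
# values.yaml -- scale the Dagster webserver horizontally
dagsterWebserver:
  replicaCount: 2
```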
By default, the webserver launches runs via the K8sRunLauncher, which creates a new Kubernetes job per run.
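In the chart's values.yaml, the run launcher selection looks roughly like the following sketch; the k8sRunLauncher sub-fields shown are assumptions, so check the chart's values for the full set of options:

```yaml
# values.yaml sketch -- run launcher configuration
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      # Optional override (illustrative): namespace for run worker jobs
      jobNamespace: default
```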
The daemon periodically checks the runs table in PostgreSQL for runs that are ready to be launched. The daemon also submits runs from schedules and sensors.
The daemon launches runs via the K8sRunLauncher, creating a run worker job with the image specified in the user code deployment.
The run worker is responsible for executing launched Dagster runs. The run worker uses the same image as the user code deployment at the time the run was submitted. The run worker uses ephemeral compute and completes once the run is finished. Events that occur during the run are written to the database and then displayed in the UI.
Run worker jobs and pods are not automatically deleted so that users are able to inspect results. It's up to the user to periodically delete old jobs and pods.
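A simple cleanup sketch, assuming run worker jobs follow the dagster-run-&lt;run id&gt; naming shown later in this guide, deletes completed jobs by name; inspect the list before deleting anything in production:

```shell
# Delete completed run worker jobs (names start with "dagster-run-").
# COMPLETIONS is the second column of `kubectl get jobs`; deleting a job
# also removes its pods.
kubectl get jobs --no-headers \
  | awk '$2 == "1/1" && $1 ~ /^dagster-run-/ {print $1}' \
  | xargs -r kubectl delete job
```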
Each Dagster job specifies an executor that determines how the run worker will execute each step of the job. Different executors offer different levels of isolation and concurrency. Common choices are:
in_process_executor - All steps run serially in a single process in a single pod
multiprocess_executor - Each step runs in its own process within a single pod
k8s_job_executor - Each step runs in its own Kubernetes job
Generally, increasing isolation incurs some additional overhead per step (e.g. starting up a new Kubernetes job vs starting a new process within a pod). The executor can be configured per-run in the execution block.
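As a sketch, run config for a job using the multiprocess executor might cap step concurrency like this; the exact schema depends on the executor you've chosen:

```yaml
# Run config sketch -- executor options live under the execution block
execution:
  config:
    multiprocess:
      max_concurrent: 4
```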
An external database (i.e. using a cloud provider's managed database service, like RDS) can be connected, or you can run PostgreSQL on Kubernetes. This database stores run event logs, other metadata, and powers much of the real-time and historical data visible in the UI. To maintain a referenceable history of events, we recommend connecting an external database for most use cases.
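Pointing the chart at an external database might look like the following values.yaml sketch. The hostname and credentials are placeholders, and the field names should be checked against the chart's values:

```yaml
# values.yaml sketch -- use an external PostgreSQL instead of the bundled one
postgresql:
  enabled: false                            # don't deploy PostgreSQL in-cluster
  postgresqlHost: "dagster-db.example.com"  # placeholder hostname
  postgresqlUsername: dagster
  postgresqlPassword: "changeme"            # placeholder; prefer a secret
  postgresqlDatabase: dagster
  service:
    port: 5432
```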
First, configure the kubectl CLI to point at a Kubernetes cluster. You can use Docker Desktop to set up a local cluster to develop against, or substitute any other Kubernetes cluster.
If you're using Docker Desktop and have a local cluster set up, point the kubectl CLI at it:
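With Docker Desktop's Kubernetes support enabled, the context is named docker-desktop:

```shell
kubectl config use-context docker-desktop
```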
In this step, you'll build a Docker image containing your Dagster definitions and any dependencies needed to execute the business logic in your code. For reference, here is an example Dockerfile and the corresponding user code directory.
Here, we install all of the Dagster-related dependencies in the Dockerfile and then copy the directory containing the Dagster definitions into the root folder. We'll need the path to this directory in a later step to set up the gRPC server as a deployment.
The example user code repository includes:
An example_job job that runs all ops in a single pod
A pod_per_op_job job that runs each op in its own pod
Several of the jobs in dagster/user-code-example use an S3 I/O Manager. To run these jobs, you'll need an available AWS S3 bucket and access to a pair of AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values, because the I/O Manager uses boto3.
This tutorial also has the option of using MinIO to mock an S3 endpoint locally in k8s. Note: this option uses host.docker.internal to access a host from within Docker. This behavior has only been tested on macOS and may need a different configuration for other platforms.
To use AWS S3, create a bucket in your AWS account. For this tutorial, we'll create a bucket called test-bucket.
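With the AWS CLI configured for your account, the bucket can be created in one command (bucket name from this tutorial):

```shell
aws s3 mb s3://test-bucket
```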
Retrieve your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY credentials.
brew install minio/stable/minio # server
brew install minio/stable/mc # client
mkdir $HOME/miniodata # Prepare a directory for data
minio server $HOME/miniodata # start a server with default user/pass and no TLS
mc --insecure alias set minio http://localhost:9000 minioadmin minioadmin
# See it work
mc ls minio
date > date1.txt # create a sample file
mc cp date1.txt minio://testbucket/date1.txt

export AWS_ACCESS_KEY_ID="minioadmin"
export AWS_SECRET_ACCESS_KEY="minioadmin"
# See the aws cli work
aws --endpoint-url http://localhost:9000 s3 mb s3://test-bucket
aws --endpoint-url http://localhost:9000 s3 cp date1.txt s3://test-bucket/
The Dagster chart repository contains the versioned charts for all Dagster releases. Add the remote URL under the namespace dagster to install the Dagster charts.
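The chart repository is published at https://dagster-io.github.io/helm, so the commands look like:

```shell
helm repo add dagster https://dagster-io.github.io/helm
helm repo update
```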
In this step, you'll update the dagster-user-deployments.deployments section of the Dagster chart's values.yaml to include your deployment.
Here, we can specify the configuration of the Kubernetes deployment that creates the gRPC server for the webserver and daemon to access the user code. The gRPC server is created through the arguments passed to dagsterApiGrpcArgs, which expects a list of arguments for dagster api grpc.
To get access to the Dagster values.yaml, run:
helm show values dagster/dagster > values.yaml
The following snippet works for Dagster's example user code image. Since the Dockerfile places the code location definition at a fixed path, we point the gRPC server at that path via the arguments in dagsterApiGrpcArgs.
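A values.yaml sketch for this deployment might look like the following. The image repository, tag, and file path assume Dagster's published user-code example image; adjust all three for your own image:

```yaml
# values.yaml sketch -- one code location server for the example image
dagster-user-deployments:
  enabled: true
  deployments:
    - name: "k8s-example-user-code-1"
      image:
        repository: "docker.io/dagster/user-code-example"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/example_project/example_repo/repo.py"
      port: 3030
```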
Note: If you haven't set up an S3 endpoint, you can only run the job called example_job.
dagsterApiGrpcArgs also supports loading code location definitions from a module name. Refer to the code location documentation for a list of arguments.
You can also specify configuration like configmaps, secrets, volumes, resource limits, and labels for individual user code deployments:
By default, this configuration will also be included in the pods for any runs that are launched for the code location server. You can disable this behavior for a code location server by setting includeConfigInLaunchedRuns.enabled to false for that server.
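For example, to keep a deployment's config out of the pods for its launched runs:

```yaml
# values.yaml fragment -- disable config propagation for one code location server
deployments:
  - name: "k8s-example-user-code-1"
    # ... image, dagsterApiGrpcArgs, port as before ...
    includeConfigInLaunchedRuns:
      enabled: false
```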
You'll need a slightly different configuration to run the pod_per_op_job. This is because pod_per_op_job uses an s3_pickle_io_manager, so you'll need to provide the user code k8s pods with AWS S3 credentials.
Refer to Step 4 for setup instructions. The below snippet works for both AWS S3 and a local S3 endpoint via minio:
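One way to provide the credentials is a Kubernetes secret referenced from the deployment. The secret name (aws-credentials) is an assumption here; create it yourself with the two keys before installing the chart:

```yaml
# values.yaml fragment -- expose AWS credentials to the user code pods.
# Assumes a secret created beforehand, e.g.:
#   kubectl create secret generic aws-credentials \
#     --from-literal=AWS_ACCESS_KEY_ID=<your key id> \
#     --from-literal=AWS_SECRET_ACCESS_KEY=<your secret key>
deployments:
  - name: "k8s-example-user-code-1"
    # ... image, dagsterApiGrpcArgs, port as before ...
    envSecrets:
      - name: aws-credentials
```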
Next, you'll install the Helm chart and create a release. Below, we've named our release dagster. We use helm upgrade --install to create the release if it doesn't exist; otherwise, the existing dagster release will be modified:
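Using the values file prepared in the previous steps:

```shell
helm upgrade --install dagster dagster/dagster -f values.yaml
```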
Helm will launch several pods including PostgreSQL. You can check the status of the installation with kubectl. Note that it might take a few minutes for the pods to move to a Running state.
If everything worked correctly, you should see output like the following:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
dagster-webserver-645b7d59f8-6lwxh 1/1 Running 0 11m
dagster-k8s-example-user-code-1-88764b4f4-ds7tn 1/1 Running 0 9m24s
dagster-postgresql-0 1/1 Running 0 17m
You can introspect the jobs that were launched with kubectl:
$ kubectl get jobs
NAME COMPLETIONS DURATION AGE
dagster-run-5ee8a0b3-7ca5-44e6-97a6-8f4bd86ee630 1/1 4s 11s
Now, you can try a run with step isolation. Switch to the pod_per_op_job job, changing the default config to point to your S3 bucket if needed, and launch the run.
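With a real AWS S3 bucket, the run config might look like this sketch; the bucket name and op config mirror this tutorial's example:

```yaml
# Run config sketch for pod_per_op_job against AWS S3
resources:
  io_manager:
    config:
      s3_bucket: "test-bucket"
ops:
  multiply_the_word:
    config:
      factor: 0
    inputs:
      word: ""
```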
If you're using MinIO, change your config to look like this:
resources:
  io_manager:
    config:
      s3_bucket: "test-bucket"
  s3:
    config:
      # This use of host.docker.internal is unique to Mac
      endpoint_url: http://host.docker.internal:9000
      region_name: us-east-1
ops:
  multiply_the_word:
    config:
      factor: 0
    inputs:
      word: ""
Again, you can view the launched jobs:
kubectl get jobs
NAME COMPLETIONS DURATION AGE
dagster-run-5ee8a0b3-7ca5-44e6-97a6-8f4bd86ee630 1/1 4s 11s
dagster-run-733baf75-fab2-4366-9542-0172fa4ebc1f 1/1 4s 100s
That's it! You deployed Dagster, configured with the default K8sRunLauncher, onto a Kubernetes cluster using Helm.