This guide is applicable only to Dagster Open Source (OSS).
Workspace files contain a collection of user-defined code locations and information about where to find them. Code locations loaded via workspace files can contain either a Definitions object or multiple repositories.
Workspace files are used by Dagster to load code locations in complex or customized environments, such as a production OSS deployment. For local development within a single Python environment, Definitions users can use the -m or -f flags with our CLI tools, or configure the pyproject.toml file to avoid command-line flags entirely.
@repository: The decorator used to define repositories. The decorator returns a RepositoryDefinition. Note: This has been replaced by Definitions, which is the recommended way to define code locations.
Each entry in a workspace file is considered a code location. A code location can contain a single Definitions object, a single repository, or multiple repositories.
To accommodate incrementally migrating from @repository to Definitions, code locations in a single workspace file can mix and match between definition approaches. For example, code-location-1 could load a single Definitions object from a file or module, and code-location-2 could load multiple repositories.
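A mixed workspace file of this kind could be sketched as follows. The file and module names (definitions.py, legacy_repositories) are hypothetical placeholders, and the sketch assumes the module exposes several @repository definitions:

```yaml
# workspace.yaml
load_from:
  # code-location-1: a single Definitions object loaded from a file
  - python_file:
      relative_path: definitions.py        # hypothetical file name
      location_name: code-location-1
  # code-location-2: a module that defines multiple repositories
  - python_module:
      module_name: legacy_repositories     # hypothetical module name
      location_name: code-location-2
```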
Each code location is loaded in its own process, and Dagster tools communicate with that process over an RPC protocol. This process separation allows multiple code locations in different environments to be loaded independently, and ensures that errors in user code can't impact Dagster system code.
To load a code location from a Python file, use the python_file key in workspace.yaml. The value of python_file should specify a path relative to workspace.yaml leading to a file that contains a code location definition.
For example, if a code location is in my_file.py and the file is in the same folder as workspace.yaml, the code location could be loaded using the following:
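A minimal sketch of such a workspace file, assuming my_file.py sits in the same folder as workspace.yaml as described above:

```yaml
# workspace.yaml
load_from:
  # The path is resolved relative to this workspace.yaml file
  - python_file: my_file.py
```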
If using @repository to define code locations, you can identify a single repository within the module using the attribute key. The value of this key must be the name of a repository or the name of a function that returns a RepositoryDefinition. For example:
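A sketch of an entry that selects one repository. The file name and repository name below are hypothetical; the attribute value would need to match a repository (or a function returning a RepositoryDefinition) that actually exists in the file:

```yaml
# workspace.yaml
load_from:
  - python_file:
      relative_path: hello_world_repository.py   # hypothetical file name
      attribute: hello_world_repository          # hypothetical repository name
```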
By default, Dagster command-line tools (like dagster dev, dagster-webserver, or dagster-daemon run) look for a workspace file named workspace.yaml in the current directory when invoked. This allows you to launch from that directory without the need for command line arguments:
dagster dev
To load the workspace.yaml file from a different folder, use the -w argument:
dagster dev -w path/to/workspace.yaml
When dagster dev is run, Dagster will load all the code locations defined by the workspace file. Refer to the CLI reference for more info and examples.
If a code location can't be loaded (for example, due to a syntax error or some other unrecoverable error), a warning message will display in the Dagster UI. You'll be directed to a status page with a descriptive error and stack trace for any locations Dagster was unable to load.
Note: If a code location is renamed or its configuration in a workspace file is modified, you'll need to stop and restart any running schedules or sensors in that code location. You can do this in the UI by navigating to the Deployment overview page and using the Schedules and Sensors tabs.
This is required because when you start a schedule or a sensor, a serialized representation of the entry in your workspace file is stored in a database. The Dagster daemon process uses this serialized representation to identify and load your schedule or sensor. If the code location is modified and its schedules and sensors aren't restarted, the Dagster daemon process will use an outdated serialized representation, resulting in issues.
By default, Dagster tools automatically create a process on your local machine for each of your code locations. However, it's also possible to run your own gRPC server that's responsible for serving information about your code locations. This can be useful in more complex system architectures that deploy user code separately from the Dagster webserver.
To initialize the Dagster gRPC server, run the dagster api grpc command and include:
A target file or module. Similar to a workspace file, the target can either be a Python file or module.
Host address
Port or socket
The following examples demonstrate some common ways to initialize a gRPC server:
Running on a port, using a Python file:
dagster api grpc --python-file /path/to/file.py --host 0.0.0.0 --port 4266
Running on a socket, using a Python file:
dagster api grpc --python-file /path/to/file.py --host 0.0.0.0 --socket /path/to/socket
Using a Python module:
dagster api grpc --module-name my_module_name --host 0.0.0.0 --port 4266
Using a specific repository (applicable only to code locations defined using @repository):
Specify an attribute within the target to load a specific repository, for example with the --attribute option of dagster api grpc. When run, the server will automatically find and load the specified repository.
When running your own gRPC server within a container, you can tell the webserver that any runs launched from a code location should be launched in a container with that same image.
To do this, set the DAGSTER_CURRENT_IMAGE environment variable to the name of the image before starting the server. After setting this environment variable for your server, the image should be listed alongside the code location on the Status page in the UI.
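For Dagster tools to use a server started with dagster api grpc, the workspace file needs an entry that points at it. A minimal sketch, assuming the server is reachable on localhost at port 4266 as in the commands above; the location name is a hypothetical placeholder:

```yaml
# workspace.yaml
load_from:
  - grpc_server:
      host: localhost                 # assumed host; use the address where the server runs
      port: 4266                      # must match the --port passed to dagster api grpc
      location_name: my_grpc_server   # hypothetical name shown in the UI
```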
By default, code is loaded with dagster-webserver's working directory as the base path to resolve any local imports in your code. Using the working_directory key, you can specify a custom working directory for relative imports. For example:
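A sketch of an entry with a custom working directory. The file and directory names are hypothetical placeholders:

```yaml
# workspace.yaml
load_from:
  - python_file:
      relative_path: hello_world_repository.py   # hypothetical file name
      # Local imports in the loaded code are resolved from this directory
      working_directory: my_working_directory/   # hypothetical directory
```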
By default, the webserver and other Dagster tools assume that code locations should be loaded using the same Python environment used to load Dagster. However, it's often useful for code locations to use independent environments. For example, a data engineering team running Spark can have dramatically different dependencies than an ML team running TensorFlow.
To enable this use case, Dagster supports customizing the Python environment for each code location by adding the executable_path key to the YAML for a location. These environments can involve distinct sets of installed dependencies, or even completely different Python versions. For example:
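A sketch of two code locations that each use their own Python environment. All paths and location names below are hypothetical; executable_path should point at the Python interpreter of the environment to use for that location:

```yaml
# workspace.yaml
load_from:
  # Data engineering team: environment with Spark dependencies
  - python_file:
      relative_path: path/to/dataengineering_spark_team.py
      location_name: dataengineering_spark_team
      executable_path: venvs/path/to/dataengineering_spark_team/bin/python
  # ML team: separate environment with TensorFlow dependencies
  - python_file:
      relative_path: path/to/ml_team.py
      location_name: ml_team
      executable_path: venvs/path/to/ml_tensorflow/bin/python
```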
The example above also illustrates the location_name key. Each code location in a workspace file has a unique name that is displayed in the UI, and is also used to disambiguate definitions with the same name across multiple code locations. Dagster will supply a default name for each location based on its workspace entry if a custom one is not supplied.