Resources are objects that are shared across the implementations of multiple software-defined assets, ops, schedules, and sensors. These resources can be plugged in after the definitions of your assets or ops, and can be easily swapped out.
Resources typically model external components that assets and ops interact with. For example, a resource might be a connection to a data warehouse like Snowflake or a service like Slack.
So, why use resources?
Plug in different implementations in different environments - If you have a heavy external dependency that you want to use in production, but avoid using in testing, you can accomplish this by providing different resources in each environment. Check out Separating Business Logic from Environments for more info about this capability.
Surface configuration in the UI - Resources and their configuration are surfaced in the Dagster UI, making it easy to see where your resources are used and how they are configured.
Share configuration across multiple ops or assets - Resources are configurable and shared, so you can supply configuration in one place instead of configuring the ops and assets individually.
Share implementations across multiple ops or assets - When multiple ops access the same external services, resources provide a standard way to structure your code to share the implementations.
ResourceDefinition is the core class for resource definitions. You almost never want to initialize this class directly. Instead, you should extend the ConfigurableResource class, which implements ResourceDefinition.
Typically, resources are defined by subclassing ConfigurableResource. Attributes on the class are used to define the resource's configuration schema. The configuration system has a few advantages over plain Python parameter passing; configured values are displayed in the Dagster UI and can be set dynamically using environment variables. Binding resource config values can also be delayed so that they can be specified at run launch time.
Assets and ops specify resource dependencies by annotating the resource as a parameter to the asset or op function.
To provide resource values to your assets and ops, attach them to your Definitions call. These resources are automatically passed to the function at runtime.
Here, we define a subclass of ConfigurableResource representing a connection to an external service. We can configure the resource by constructing it in the Definitions call.
We can define methods on the resource class which depend on config values. These methods can be used by assets and ops.
Many config types are supported when defining resources. See the advanced config types documentation for a more comprehensive overview of the available config types.
Sensors can use resources in the same way as ops and assets, which can be useful for querying external services for data.
To specify resource dependencies on a sensor, annotate the resource type as a parameter to the sensor's function. For more information and examples, refer to the Sensors documentation.
Schedules can also use resources in case your schedule logic needs to interface with an external tool, or to make your schedule logic more testable.
To specify resource dependencies on a schedule, annotate the resource type as a parameter to the schedule's function. For more information and examples, refer to the Schedules documentation.
Resources can be configured using environment variables, which is useful for secrets or other environment-specific configuration. If you're using Dagster Cloud, environment variables can be configured directly in the UI.
To use environment variables, pass an EnvVar when constructing your resource. EnvVar inherits from str and can be used to populate any string config field on a resource. The value of the environment variable will be evaluated at launch time.
In some cases, you may want to specify configuration for a resource at launch time, in the launchpad or in a RunRequest for a schedule or sensor. For example, you may want a sensor-triggered run to specify a different target table in a database resource for each run.
You can use the configure_at_launch() method to defer the construction of a configurable resource until launch time.
In some situations, you may want to define a resource which depends on other resources. This is useful for common configuration. For example, separate resources for a database and for a filestore may both depend on credentials for a particular cloud provider. Defining these credentials as a separate, nested resource allows you to specify configuration in a single place. It also makes it easier to test your resources, since you can mock the nested resource.
In this case, you can list that nested resource as an attribute of your resource class.
If we instead would like the configuration for our credentials to be provided at launch time, we can use the configure_at_launch() method to defer the construction of the CredentialsResource until launch time.
Because credentials requires launch time configuration through the launchpad, it must also be passed to the Definitions object, so that configuration can be provided at launch time. Nested resources only need to be passed to the Definitions object if they require launch time configuration.
Once a resource reaches a certain complexity, it may be desirable to manage the state of the resource over its lifetime. This is useful for resources which require special initialization or cleanup. ConfigurableResource is a data class meant to encapsulate config, but it can also be used to set up basic state.
You can mark any private state attributes using Pydantic's PrivateAttr. These attributes, which must start with an underscore, will not be included in the resource's config.
The setup_for_execution and teardown_after_execution methods can be overridden to initialize or tear down a resource before and after each run execution, and are free to modify any private state attributes.
In this instance, we set up an API token for a client resource based on the username and password provided in the config. We then use that API token to query an API in our asset body.
from dagster import ConfigurableResource, asset
import requests
from pydantic import PrivateAttr


class MyClientResource(ConfigurableResource):
    username: str
    password: str

    _api_token: str = PrivateAttr()

    def setup_for_execution(self, context) -> None:
        # Fetch and set up an API token based on the username and password
        self._api_token = requests.get(
            "https://my-api.com/token", auth=(self.username, self.password)
        ).text

    def get_all_users(self):
        return requests.get(
            "https://my-api.com/users",
            headers={"Authorization": self._api_token},
        )


@asset
def my_asset(client: MyClientResource):
    return client.get_all_users()
setup_for_execution and teardown_after_execution are each called once per run, per process. When using the in-process executor, this means that they will be called once per run. When using the multiprocess executor, each process' instance of the resource will be initialized and torn down.
For more complex use cases, you may instead override the yield_for_execution method. By default, this context manager calls setup_for_execution, yields the resource, and then calls teardown_after_execution, but you can override it to provide any custom behavior. This is useful for resources which require a context to be open for the duration of a run, such as database connections or file handles.
from dagster import ConfigurableResource, asset
from contextlib import contextmanager
from pydantic import PrivateAttr


class DBConnection:
    ...

    def query(self, body: str):
        ...


@contextmanager
def get_database_connection(username: str, password: str):
    ...


class MyClientResource(ConfigurableResource):
    username: str
    password: str

    _db_connection: DBConnection = PrivateAttr()

    @contextmanager
    def yield_for_execution(self, context):
        # keep connection open for the duration of the execution
        with get_database_connection(self.username, self.password) as conn:
            # set up the connection attribute so it can be used in the execution
            self._db_connection = conn

            # yield, allowing execution to occur
            yield self

    def query(self, body: str):
        return self._db_connection.query(body)


@asset
def my_asset(client: MyClientResource):
    client.query("SELECT * FROM my_table")
Pythonic I/O managers are defined as subclasses of ConfigurableIOManager and, like Pythonic resources, specify any configuration fields as attributes. Each subclass must implement the handle_output and load_input methods, which Dagster calls at runtime to store and load data.
When starting to build a set of assets or jobs, you may want to use a bare Python object without configuration as a resource, such as a third-party API client.
Dagster supports passing plain Python objects as resources. This follows a similar pattern to using a ConfigurableResource subclass; however, assets and ops which use these resources must annotate them with ResourceParam. This annotation lets Dagster know that the parameter is a resource and not an upstream input.
from dagster import Definitions, asset, ResourceParam

# `ResourceParam[GitHub]` is treated exactly like `GitHub` for type checking purposes,
# and the runtime type of the github parameter is `GitHub`. The purpose of the
# `ResourceParam` wrapper is to let Dagster know that `github` is a resource and not an
# upstream asset.


@asset
def public_github_repos(github: ResourceParam[GitHub]):
    return github.organization("dagster-io").repositories()


defs = Definitions(
    assets=[public_github_repos],
    resources={"github": GitHub(...)},
)
In the case that your resource makes use of the resource initialization context, you can use the build_init_resource_context utility alongside the with_init_resource_context helper on your resource class:
Resources are a powerful way to encapsulate reusable logic in your assets and ops. For more information on the supported config types for resources, see the advanced config types documentation. For information on the Dagster config system, which you can use to parameterize ops and assets, see the run configuration documentation.