Unlike jobs created using the job decorator, where you explicitly define dependencies when you create the job, the topology of an asset-based job is derived from the assets and their dependencies.
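For example, here is a minimal sketch assuming two hypothetical assets: a job built with define_asset_job infers that asset_two must run after asset_one from the asset graph alone, with no explicit dependency declarations.

from dagster import Definitions, asset, define_asset_job


@asset
def asset_one():
    return 1


@asset
def asset_two(asset_one):
    return asset_one + 1


# The job's topology is derived from the asset graph, not declared explicitly
all_assets_job = define_asset_job(name="all_assets_job", selection="*")

defs = Definitions(assets=[asset_one, asset_two], jobs=[all_assets_job])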
The simplest way to create an op-based job is to use the job decorator.
Within the decorated function body, you can use function calls to indicate the dependency structure between the ops/graphs. This allows you to explicitly define dependencies between ops when you define the job.
In this example, the add_one op depends on the return_five op's output. Because this data dependency exists, the add_one op executes after return_five runs successfully and emits the required output.
from dagster import job, op
@op
def return_five():
    return 5


@op
def add_one(arg):
    return arg + 1


@job
def do_stuff():
    add_one(return_five())
Creating jobs from a graph can be useful when you want to define inter-op dependencies before binding them to resources, configuration, executors, and other environment-specific features. This approach to job creation allows you to customize graphs for each environment by plugging in configuration and services specific to that environment.
You can model this by building multiple jobs that use the same underlying graph of ops. The graph represents the logical core of data transformation, and the configuration and resources on each job customize the behavior of that job for its environment.
To do this, you first define a graph with the @graph decorator.
from dagster import graph, op, ConfigurableResource
class Server(ConfigurableResource):
    def ping_server(self):
        ...


@op
def interact_with_server(server: Server):
    server.ping_server()


@graph
def do_stuff():
    interact_with_server()
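You can then call GraphDefinition.to_job once per environment to produce multiple jobs from the same graph. A minimal sketch, continuing the snippet above and assuming hypothetical ProdServer and LocalServer subclasses that stand in for environment-specific implementations:

class ProdServer(Server):
    def ping_server(self):
        ...  # hypothetical: ping the production server


class LocalServer(Server):
    def ping_server(self):
        ...  # hypothetical: ping a local stub


# Bind the same graph to environment-specific resources
prod_job = do_stuff.to_job(name="do_stuff_prod", resource_defs={"server": ProdServer()})
local_job = do_stuff.to_job(name="do_stuff_local", resource_defs={"server": LocalServer()})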
Ops, software-defined assets, and resources often accept configuration that determines how they behave. By default, you supply configuration for these ops and resources at the time you launch the job.
When constructing a job, you can customize how that configuration will be satisfied by passing a value to the config parameter of the GraphDefinition.to_job method or the @job decorator. The options are discussed below:
You can supply a RunConfig object or raw config dictionary. The supplied config will be used to configure the job whenever the job is launched. It will show up in the UI Launchpad and can be overridden.
from dagster import Config, RunConfig, job, op
class DoSomethingConfig(Config):
    config_param: str


@op
def do_something(context, config: DoSomethingConfig):
    context.log.info("config_param: " + config.config_param)


default_config = RunConfig(
    ops={"do_something": DoSomethingConfig(config_param="stuff")}
)


@job(config=default_config)
def do_it_all_with_default_config():
    do_something()


if __name__ == "__main__":
    # Will log "config_param: stuff"
    do_it_all_with_default_config.execute_in_process()
For op-based jobs, you can supply a PartitionedConfig to create a partitioned job. This defines a discrete set of partitions along with a function for generating config for a partition. Job runs can be configured by selecting a partition.
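For example, here is a minimal sketch using static_partitioned_config; the partition keys and op are hypothetical. Each partition key is translated into run config for the job, so launching a run for the "eu" partition configures process_region with that region.

from dagster import Config, job, op, static_partitioned_config


class ProcessRegionConfig(Config):
    region: str


@op
def process_region(context, config: ProcessRegionConfig):
    context.log.info("processing region: " + config.region)


# Each partition key maps to the run config for one partition
@static_partitioned_config(partition_keys=["us", "eu", "apac"])
def region_config(partition_key: str):
    return {"ops": {"process_region": {"config": {"region": partition_key}}}}


@job(config=region_config)
def do_it_all_partitioned():
    process_region()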
You can supply a ConfigMapping. This allows you to expose a narrower config interface to your job. Instead of needing to configure every op and resource individually when launching the job, you can supply a smaller number of values to the outer config, and the ConfigMapping can translate it into config for all the job's ops and resources.
from dagster import Config, RunConfig, config_mapping, job, op
class DoSomethingConfig(Config):
    config_param: str


@op
def do_something(context, config: DoSomethingConfig) -> None:
    context.log.info("config_param: " + config.config_param)


class SimplifiedConfig(Config):
    simplified_param: str


@config_mapping
def simplified_config(val: SimplifiedConfig) -> RunConfig:
    return RunConfig(
        ops={"do_something": DoSomethingConfig(config_param=val.simplified_param)}
    )


@job(config=simplified_config)
def do_it_all_with_simplified_config():
    do_something()


if __name__ == "__main__":
    # Will log "config_param: stuff"
    do_it_all_with_simplified_config.execute_in_process(
        run_config={"simplified_param": "stuff"}
    )
You make jobs available to the UI, GraphQL, and the command line by including them in a Definitions object at the top level of a Python module or file. The tool loads that module as a code location. If you include schedules or sensors, the code location will automatically include the jobs they target.
from dagster import Definitions, job
@job
def do_it_all():
    ...
defs = Definitions(jobs=[do_it_all])
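As a variant of the snippet above, here is a minimal sketch (the cron string is an arbitrary example): because the schedule targets do_it_all, passing only the schedule to Definitions is enough for the job to be included in the code location.

from dagster import ScheduleDefinition

# Example cron string: run do_it_all daily at midnight
do_it_all_schedule = ScheduleDefinition(job=do_it_all, cron_schedule="0 0 * * *")

# do_it_all is included automatically because the schedule targets it
defs = Definitions(schedules=[do_it_all_schedule])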
Dagster has built-in support for testing, including separating business logic from environments and setting explicit expectations on uncontrollable inputs. Refer to the Testing guide for more info and examples.
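For example, a minimal sketch of an in-process test for the do_it_all job defined above (the test function name is arbitrary):

def test_do_it_all():
    # Runs the job synchronously in the current process and checks the result
    result = do_it_all.execute_in_process()
    assert result.success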