Executes a single-threaded, in-process run which materializes provided assets.
By default, will materialize assets to the local filesystem.
assets (Sequence[Union[AssetsDefinition, SourceAsset]]) –
The assets to materialize.
Unless you’re using deps or non_argument_deps, you must also include all assets that are upstream of the assets that you want to materialize. This is because those upstream asset definitions have information that is needed to load their contents while materializing the downstream assets.
You can use the selection argument to distinguish between assets that you want to materialize and assets that are just present for loading.
resources (Optional[Mapping[str, object]]) – The resources needed for execution. Can provide resource instances directly, or resource definitions. Note that if provided resources conflict with resources directly on assets, an error will be thrown.
run_config (Optional[Any]) – The run config to use for the run that materializes the assets.
partition_key (Optional[str]) – The string partition key that specifies the run config to execute. Can only be used to select run config for assets with partitioned config.
tags (Optional[Mapping[str, str]]) – Tags for the run.
selection (Optional[Union[str, Sequence[str], Sequence[AssetKey], Sequence[Union[AssetsDefinition, SourceAsset]], AssetSelection]]) –
A sub-selection of assets to materialize.
If not provided, then all assets will be materialized.
If providing a string or sequence of strings, https://docs.dagster.io/concepts/assets/asset-selection-syntax describes the accepted syntax.
The result of the execution.
Examples
@asset
def asset1():
    ...

@asset
def asset2(asset1):
    ...

# executes a run that materializes asset1 and then asset2
materialize([asset1, asset2])

# executes a run that materializes just asset2, loading its input from asset1
materialize([asset1, asset2], selection=[asset2])
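For further illustration, a minimal sketch of passing a string selection and an explicit IO manager resource; the FilesystemIOManager choice and its base_dir are assumptions for the example, not requirements:

from dagster import FilesystemIOManager, asset, materialize

@asset
def asset1():
    ...

# Select the asset by name and supply the io_manager resource explicitly.
# FilesystemIOManager and its base_dir are illustrative choices.
result = materialize(
    [asset1],
    selection="asset1",
    resources={"io_manager": FilesystemIOManager(base_dir="/tmp/dagster_assets")},
)
assert result.success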
Executes a single-threaded, in-process run which materializes provided assets in memory.
Will explicitly use mem_io_manager() for all required io manager keys. If any io managers are directly provided using the resources argument, a DagsterInvariantViolationError will be thrown.
assets (Sequence[Union[AssetsDefinition, SourceAsset]]) – The assets to materialize. Can also provide SourceAsset objects to fill dependencies for asset defs.
run_config (Optional[Any]) – The run config to use for the run that materializes the assets.
resources (Optional[Mapping[str, object]]) – The resources needed for execution. Can provide resource instances directly, or resource definitions. If provided resources conflict with resources directly on assets, an error will be thrown.
partition_key (Optional[str]) – The string partition key that specifies the run config to execute. Can only be used to select run config for assets with partitioned config.
tags (Optional[Mapping[str, str]]) – Tags for the run.
selection (Optional[Union[str, Sequence[str], Sequence[AssetKey], Sequence[Union[AssetsDefinition, SourceAsset]], AssetSelection]]) –
A sub-selection of assets to materialize.
If not provided, then all assets will be materialized.
If providing a string or sequence of strings, https://docs.dagster.io/concepts/assets/asset-selection-syntax describes the accepted syntax.
The result of the execution.
Examples
@asset
def asset1():
    ...

@asset
def asset2(asset1):
    ...

# executes a run that materializes asset1 and then asset2
materialize_to_memory([asset1, asset2])

# executes a run that materializes just asset1
materialize_to_memory([asset1, asset2], selection=[asset1])
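A sketch of pulling a materialized value back out of the in-memory result, using the output_for_node method documented on the result object later on this page; the asset name is illustrative:

from dagster import asset, materialize_to_memory

@asset
def my_number():
    return 5

result = materialize_to_memory([my_number])
assert result.success
# Outputs never hit disk; retrieve them from the result object.
assert result.output_for_node("my_number") == 5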
Defines a Dagster job.
The config mapping for the job, if it has one.
A config mapping defines a way to map a top-level config schema to run config for the job.
Execute the Job in-process, gathering results in-memory.
The executor_def on the Job will be ignored, and replaced with the in-process executor. If using the default io_manager, it will switch from filesystem to in-memory.
run_config (Optional[Mapping[str, Any]]) – The configuration for the run.
instance (Optional[DagsterInstance]) – The instance to execute against, an ephemeral one will be used if none provided.
partition_key (Optional[str]) – The string partition key that specifies the run config to execute. Can only be used to select run config for jobs with partitioned config.
raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to True.
op_selection (Optional[Sequence[str]]) – A list of op selection queries (including single op names) to execute. For example:
* ['some_op']: selects some_op itself.
* ['*some_op']: selects some_op and all its ancestors (upstream dependencies).
* ['*some_op+++']: selects some_op, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.
* ['*some_op', 'other_op_a', 'other_op_b+']: selects some_op and all its ancestors, other_op_a itself, and other_op_b and its direct child ops.
input_values (Optional[Mapping[str, Any]]) – A dictionary that maps python objects to the top-level inputs of the job. Input values provided here will override input values that have been provided to the job directly.
resources (Optional[Mapping[str, Any]]) – The resources needed if any are required. Can provide resource instances directly, or resource definitions.
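A minimal sketch of calling execute_in_process; the op and job names are illustrative:

from dagster import job, op

@op
def return_five():
    return 5

@op
def add_one(x):
    return x + 1

@job
def my_job():
    add_one(return_five())

# Runs in-process on an ephemeral instance; raises on failure by default.
result = my_job.execute_in_process()
assert result.success
assert result.output_for_node("add_one") == 6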
Returns the default ExecutorDefinition for the job.
If the user has not specified an executor definition, then this will default to the multi_or_in_process_executor(). If a default is specified on the Definitions object the job was provided to, then that will be used instead.
Returns True if this job has explicitly specified an executor, and False if the executor was inherited through defaults or the Definitions object the job was provided to.
Returns True if the job explicitly set loggers, and False if loggers were inherited through defaults or the Definitions object the job was provided to.
Returns the set of LoggerDefinition objects specified on the job.
If the user has not specified a mapping of LoggerDefinition objects, then this will default to the colored_console_logger() under the key console. If a default is specified on the Definitions object the job was provided to, then that will be used instead.
The partitioned config for the job, if it has one.
A partitioned config defines a way to map partition keys to run config for the job.
Returns the PartitionsDefinition for the job, if it has one.
A partitions definition defines the set of partition keys the job operates on.
Returns the set of ResourceDefinition objects specified on the job.
This may not be the complete set of resources required by the job, since those can also be provided on the Definitions object the job may be provided to.
Deprecated
Creates a RunRequest object for a run that processes the given partition.
partition_key – The key of the partition to request a run for.
run_key (Optional[str]) – A string key to identify this launched run. For sensors, ensures that only one run is created per run key across all sensor evaluations. For schedules, ensures that one run is created per tick, across failure recoveries. Passing in a None value means that a run will always be launched per evaluation.
tags (Optional[Dict[str, str]]) – A dictionary of tags (string key-value pairs) to attach to the launched run.
run_config (Optional[Mapping[str, Any]]) – Configuration for the run. If the job has a PartitionedConfig, this value will replace the config provided by it.
current_time (Optional[datetime]) – Used to determine which time-partitions exist. Defaults to now.
dynamic_partitions_store (Optional[DynamicPartitionsStore]) – The DynamicPartitionsStore object that is responsible for fetching dynamic partitions. Required when the partitions definition is a DynamicPartitionsDefinition with a name defined. Users can pass the DagsterInstance fetched via context.instance to this argument.
an object that requests a run to process the given partition.
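Since this method is deprecated, a sketch of both the deprecated form and direct RunRequest construction; the schedule and job names are illustrative assumptions:

from dagster import RunRequest, schedule

# my_partitioned_job is a hypothetical job with partitioned config.
@schedule(cron_schedule="0 0 * * *", job=my_partitioned_job)
def my_schedule(context):
    # Deprecated form described above:
    # return my_partitioned_job.run_request_for_partition(partition_key="2023-01-01")
    # Preferred form: construct the RunRequest directly.
    return RunRequest(partition_key="2023-01-01", tags={"source": "my_schedule"})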
Apply a set of hooks to all op instances within the job.
Apply a set of resources to all op instances within the job.
Execute a job synchronously.
This API represents dagster’s python entrypoint for out-of-process execution. For most testing purposes, execute_in_process() will be more suitable, but when wanting to run execution using an out-of-process executor (such as dagster.multiprocess_executor), then execute_job is suitable.
execute_job expects a persistent DagsterInstance for execution, meaning the $DAGSTER_HOME environment variable must be set. It also expects a reconstructable pointer to a JobDefinition so that it can be reconstructed in separate processes. This can be done by wrapping the JobDefinition in a call to dagster.reconstructable().
from dagster import DagsterInstance, execute_job, job, reconstructable

@job
def the_job():
    ...

instance = DagsterInstance.get()
result = execute_job(reconstructable(the_job), instance=instance)
assert result.success
If using the to_job() method to construct the JobDefinition, then the invocation must be wrapped in a module-scope function, which can be passed to reconstructable.
from dagster import graph, reconstructable

@graph
def the_graph():
    ...

def define_job():
    return the_graph.to_job(...)

result = execute_job(reconstructable(define_job), ...)
Since execute_job is potentially executing outside of the current process, output objects need to be retrieved by use of the provided job’s io managers. Output objects can be retrieved by opening the result of execute_job as a context manager.
from dagster import execute_job

with execute_job(...) as result:
    output_obj = result.output_for_node("some_op")
execute_job can also be used to reexecute a run, by providing a ReexecutionOptions object.
from dagster import DagsterInstance, ReexecutionOptions, execute_job, reconstructable

instance = DagsterInstance.get()
# failed_run_id is the id of the run to reexecute
options = ReexecutionOptions.from_failure(run_id=failed_run_id, instance=instance)
execute_job(reconstructable(job), instance=instance, reexecution_options=options)
job (ReconstructableJob) – A reconstructable pointer to a JobDefinition.
instance (DagsterInstance) – The instance to execute against.
run_config (Optional[dict]) – The configuration that parametrizes this run, as a dict.
tags (Optional[Dict[str, Any]]) – Arbitrary key-value pairs that will be added to run logs.
raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to False.
op_selection (Optional[List[str]]) –
A list of op selection queries (including single op names) to execute. For example:
* ['some_op']: selects some_op itself.
* ['*some_op']: selects some_op and all its ancestors (upstream dependencies).
* ['*some_op+++']: selects some_op, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.
* ['*some_op', 'other_op_a', 'other_op_b+']: selects some_op and all its ancestors, other_op_a itself, and other_op_b and its direct child ops.
reexecution_options (Optional[ReexecutionOptions]) – Reexecution options to provide to the run, if this run is intended to be a reexecution of a previous run. Cannot be used in tandem with the op_selection argument.
The result of job execution.
Reexecution options for python-based execution in Dagster.
parent_run_id (str) – The run_id of the run to reexecute.
step_selection (Sequence[str]) –
The list of step selections to reexecute. Must be a subset or match of the set of steps executed in the original run. For example:
* ['some_op']: selects some_op itself.
* ['*some_op']: selects some_op and all its ancestors (upstream dependencies).
* ['*some_op+++']: selects some_op, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.
* ['*some_op', 'other_op_a', 'other_op_b+']: selects some_op and all its ancestors, other_op_a itself, and other_op_b and its direct child ops.
Creates a persistent DagsterInstance available within a context manager.
When the context manager is opened, if no temp_dir parameter is set, a new temporary directory will be created for the duration that the context manager is open. If the set_dagster_home parameter is set to True (the default), the $DAGSTER_HOME environment variable will be overridden to point to this directory (or the directory passed in via temp_dir) for the duration that the context manager is open.
overrides (Optional[Mapping[str, Any]]) – Config to provide to instance (config format follows that typically found in an instance.yaml file).
set_dagster_home (Optional[bool]) – If set to True, the $DAGSTER_HOME environment variable will be overridden to be the directory used by this instance for the duration that the context manager is open. Upon the context manager closing, the $DAGSTER_HOME variable will be re-set to the original value. (Defaults to True).
temp_dir (Optional[str]) – The directory to use for storing local artifacts produced by the instance. If not set, a temporary directory will be created for the duration of the context manager being open, and all artifacts will be torn down afterward.
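A sketch of wrapping a test in this context manager; my_job is a hypothetical JobDefinition defined elsewhere:

from dagster import instance_for_test

with instance_for_test() as instance:
    # Run records and event logs are written to the temporary instance
    # and torn down when the context manager exits.
    result = my_job.execute_in_process(instance=instance)
    assert result.success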
Defines a Dagster op graph.
An op graph is made up of
Nodes, which can either be an op (the functional unit of computation), or another graph.
Dependencies, which determine how the values produced by nodes as outputs flow from one node to another. This tells Dagster how to arrange nodes into a directed, acyclic graph (DAG) of compute.
End users should prefer the @graph decorator. GraphDefinition is generally intended to be used by framework authors or for programmatically generated graphs.
name (str) – The name of the graph. Must be unique within any GraphDefinition or JobDefinition containing the graph.
description (Optional[str]) – A human-readable description of the graph.
node_defs (Optional[Sequence[NodeDefinition]]) – The set of ops / graphs used in this graph.
dependencies (Optional[Dict[Union[str, NodeInvocation], Dict[str, DependencyDefinition]]]) – A structure that declares the dependencies of each op’s inputs on the outputs of other ops in the graph. Keys of the top level dict are either the string names of ops in the graph or, in the case of aliased ops, NodeInvocations. Values of the top level dict are themselves dicts, which map input names belonging to the op or aliased op to DependencyDefinitions.
input_mappings (Optional[Sequence[InputMapping]]) – Defines the inputs to the nested graph, and how they map to the inputs of its constituent ops.
output_mappings (Optional[Sequence[OutputMapping]]) – Defines the outputs of the nested graph, and how they map from the outputs of its constituent ops.
config (Optional[ConfigMapping]) – Defines the config of the graph, and how its schema maps to the config of its constituent ops.
tags (Optional[Dict[str, Any]]) – Arbitrary metadata for any execution of the graph. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value. These tag values may be overwritten by tag values provided at invocation time.
Examples
@op
def return_one():
    return 1

@op
def add_one(num):
    return num + 1

graph_def = GraphDefinition(
    name='basic',
    node_defs=[return_one, add_one],
    dependencies={'add_one': {'num': DependencyDefinition('return_one')}},
)
Aliases the graph with a new name.
Can only be used in the context of a @graph, @job, or @asset_graph decorated function.
@job
def do_it_all():
    my_graph.alias("my_graph_alias")
The config mapping for the graph, if present.
By specifying a config mapping function, you can override the configuration for the child nodes contained within a graph.
Execute this graph in-process, collecting results in-memory.
run_config (Optional[Mapping[str, Any]]) – Run config to provide to execution. The configuration for the underlying graph should exist under the “ops” key.
instance (Optional[DagsterInstance]) – The instance to execute against, an ephemeral one will be used if none provided.
resources (Optional[Mapping[str, Any]]) – The resources needed if any are required. Can provide resource instances directly, or resource definitions.
raise_on_error (Optional[bool]) – Whether or not to raise exceptions when they occur. Defaults to True.
op_selection (Optional[List[str]]) – A list of op selection queries (including single op names) to execute. For example:
* ['some_op']: selects some_op itself.
* ['*some_op']: selects some_op and all its ancestors (upstream dependencies).
* ['*some_op+++']: selects some_op, all its ancestors, and its descendants (downstream dependencies) within 3 levels down.
* ['*some_op', 'other_op_a', 'other_op_b+']: selects some_op and all its ancestors, other_op_a itself, and other_op_b and its direct child ops.
input_values (Optional[Mapping[str, Any]]) – A dictionary that maps python objects to the top-level inputs of the graph.
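A minimal sketch of executing a graph in-process without first converting it to a job; the op and graph names are illustrative:

from dagster import graph, op

@op
def emit_one():
    return 1

@graph
def my_graph():
    emit_one()

# An ephemeral instance and the in-process executor are used automatically.
result = my_graph.execute_in_process()
assert result.success
assert result.output_for_node("emit_one") == 1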
Input mappings for the graph.
An input mapping is a mapping from an input of the graph to an input of a child node.
The name of the graph.
Output mappings for the graph.
An output mapping is a mapping from an output of the graph to an output of a child node.
Attaches the provided tags to the graph immutably.
Can only be used in the context of a @graph, @job, or @asset_graph decorated function.
@job
def do_it_all():
    my_graph.tag({"my_tag": "my_value"})
The tags associated with the graph.
Make this graph into an executable Job by providing remaining components required for execution.
name (Optional[str]) – The name for the Job. Defaults to the name of this graph.
resource_defs (Optional[Mapping [str, object]]) – Resources that are required by this graph for execution. If not defined, io_manager will default to filesystem.
config –
Describes how the job is parameterized at runtime.
If no value is provided, then the schema for the job’s run config is a standard format based on its ops and resources.
If a dictionary is provided, then it must conform to the standard config schema, and it will be used as the job’s run config whenever the job is executed. The values provided will be viewable and editable in the Dagster UI, so be careful with secrets.
If a ConfigMapping object is provided, then the schema for the job’s run config is determined by the config mapping, which maps the provided top-level configuration into the standard format used to configure the job.
If a PartitionedConfig object is provided, then it defines a discrete set of config values that can parameterize the job, as well as a function for mapping those values to the base config. The values provided will be viewable and editable in the Dagster UI, so be careful with secrets.
tags (Optional[Mapping[str, Any]]) – Arbitrary information that will be attached to the execution of the Job. Values that are not strings will be json encoded and must meet the criteria that json.loads(json.dumps(value)) == value. These tag values may be overwritten by tag values provided at invocation time.
metadata (Optional[Mapping[str, RawMetadataValue]]) – Arbitrary information that will be attached to the JobDefinition and be viewable in the Dagster UI. Keys must be strings, and values must be python primitive types or one of the provided MetadataValue types
logger_defs (Optional[Mapping[str, LoggerDefinition]]) – A dictionary of string logger identifiers to their implementations.
executor_def (Optional[ExecutorDefinition]) – How this Job will be executed. Defaults to multi_or_in_process_executor, which can be switched between multi-process and in-process modes of execution. The default mode of execution is multi-process.
op_retry_policy (Optional[RetryPolicy]) – The default retry policy for all ops in this job. Only used if retry policy is not defined on the op definition or op invocation.
version_strategy (Optional[VersionStrategy]) – Defines how each op (and optionally, resource) in the job can be versioned. If provided, memoization will be enabled for this job.
partitions_def (Optional[PartitionsDefinition]) – Defines a discrete set of partition keys that can parameterize the job. If this argument is supplied, the config argument can’t also be supplied.
asset_layer (Optional[AssetLayer]) – Top level information about the assets this job will produce. Generally should not be set manually.
input_values (Optional[Mapping[str, Any]]) – A dictionary that maps python objects to the top-level inputs of a job.
JobDefinition
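A sketch of converting a graph into a job; the names and tag values are illustrative:

from dagster import graph, op

@op
def do_something():
    ...

@graph
def do_it_all():
    do_something()

# Supply the remaining execution components; unspecified resources fall back
# to defaults (io_manager defaults to the filesystem IO manager).
do_it_all_job = do_it_all.to_job(
    name="do_it_all_job",
    tags={"team": "analytics"},
)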
Result object returned by in-process testing APIs.
Users should not instantiate this object directly. Used for retrieving run success, events, and outputs from execution methods that return this object.
This object is returned by:
- dagster.GraphDefinition.execute_in_process()
- dagster.JobDefinition.execute_in_process()
- dagster.materialize_to_memory()
- dagster.materialize()
All dagster events emitted during execution.
List[DagsterEvent]
Retrieves the value of an asset that was materialized during the execution of the job.
asset_key (CoercibleToAssetKey) – The key of the asset to retrieve.
The value of the retrieved asset.
Any
The Dagster run that was executed.
The job definition that was executed.
Retrieves output value with a particular name from the in-process run of the job.
node_str (str) – Name of the op/graph whose output should be retrieved. If the intended graph/op is nested within another graph, the syntax is outer_graph.inner_node.
output_name (Optional[str]) – Name of the output on the op/graph to retrieve. Defaults to result, the default output name in dagster.
The value of the retrieved output.
Any
Retrieves output of top-level job, if an output is returned.
output_name (Optional[str]) – The name of the output to retrieve. Defaults to result, the default output name in dagster.
The value of the retrieved output.
Any
The run ID of the executed DagsterRun.
str
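A sketch of reading a materialized asset value back off this result object via asset_value; the asset name is illustrative:

from dagster import asset, materialize

@asset
def numbers():
    return [1, 2, 3]

result = materialize([numbers])
assert result.success
# asset_value retrieves the stored value by asset key.
assert result.asset_value("numbers") == [1, 2, 3]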
Result object returned by dagster.execute_job().
Used for retrieving run success, events, and outputs from execute_job. Users should not directly instantiate this class.
Events and run information can be retrieved off of the object directly. In order to access outputs, the ExecuteJobResult object needs to be opened as a context manager, which will re-initialize the resources from execution.
List of all events yielded by the job execution.
Sequence[DagsterEvent]
The Dagster run that was executed.
The job definition that was executed.
Retrieves output value with a particular name from the run of the job.
In order to use this method, the ExecuteJobResult object must be opened as a context manager. If this method is used without opening the context manager, it will result in a DagsterInvariantViolationError.
node_str (str) – Name of the op/graph whose output should be retrieved. If the intended graph/op is nested within another graph, the syntax is outer_graph.inner_node.
output_name (Optional[str]) – Name of the output on the op/graph to retrieve. Defaults to result, the default output name in dagster.
The value of the retrieved output.
Any
Retrieves output of top-level job, if an output is returned.
In order to use this method, the ExecuteJobResult object must be opened as a context manager. If this method is used without opening the context manager, it will result in a DagsterInvariantViolationError. If the top-level job has no output, calling this method will also result in a DagsterInvariantViolationError.
output_name (Optional[str]) – The name of the output to retrieve. Defaults to result, the default output name in dagster.
The value of the retrieved output.
Any
The id of the Dagster run that was executed.
str
Events yielded by op and job execution.
Users should not instantiate this class.
Value for a DagsterEventType.
str
str
NodeHandle
Value for a StepKind.
str
Dict[str, str]
Type must correspond to event_type_value.
Any
str
int
DEPRECATED
Optional[str]
For events that correspond to a specific asset_key / partition (ASSET_MATERIALIZATION, ASSET_OBSERVATION, ASSET_MATERIALIZATION_PLANNED), returns that asset key. Otherwise, returns None.
Optional[AssetKey]
The type of this event.
If this event is of type ASSET_MATERIALIZATION_PLANNED.
bool
If this event is of type ASSET_OBSERVATION.
bool
If this event is of type ENGINE_EVENT.
bool
If this event is of type STEP_EXPECTATION_RESULT.
bool
If this event represents the failure of a run or step.
bool
If this event is of type HANDLED_OUTPUT.
bool
If this event relates to the execution of a hook.
bool
If this event is of type LOADED_INPUT.
bool
If this event is of type RESOURCE_INIT_FAILURE.
bool
If this event relates to a specific step.
bool
If this event is of type STEP_FAILURE.
bool
If this event is of type ASSET_MATERIALIZATION.
bool
If this event is of type STEP_RESTARTED.
bool
If this event is of type STEP_SKIPPED.
bool
If this event is of type STEP_START.
bool
If this event is of type STEP_SUCCESS.
bool
If this event is of type STEP_UP_FOR_RETRY.
bool
If this event is of type STEP_OUTPUT.
bool
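A sketch of filtering events using the predicates above, reusing the_job and instance from the execute_job example earlier on this page:

from dagster import DagsterEventType, execute_job, reconstructable

with execute_job(reconstructable(the_job), instance=instance) as result:
    # Boolean predicates on each event:
    failures = [e for e in result.all_events if e.is_step_failure]
    # Or compare the event type directly:
    materializations = [
        e
        for e in result.all_events
        if e.event_type == DagsterEventType.ASSET_MATERIALIZATION
    ]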
Create a ReconstructableJob from a function that returns a JobDefinition, or a function decorated with @job.
When your job must cross process boundaries, e.g., for execution on multiple nodes or in different systems (like dagstermill), Dagster must know how to reconstruct the job on the other side of the process boundary.
Passing a job created with dagster.GraphDefinition.to_job to reconstructable() requires you to wrap that job’s definition in a module-scoped function, and pass that function instead:
from dagster import graph, reconstructable

@graph
def my_graph():
    ...

def define_my_job():
    return my_graph.to_job()

reconstructable(define_my_job)
This function implements a very conservative strategy for reconstruction, so that its behavior is easy to predict, but as a consequence it is not able to reconstruct certain kinds of jobs, such as those defined by lambdas, in nested scopes (e.g., dynamically within a method call), or in interactive environments such as the Python REPL or Jupyter notebooks.
If you need to reconstruct objects constructed in these ways, you should use build_reconstructable_job() instead, which allows you to specify your own reconstruction strategy.
Examples
from dagster import graph, job, reconstructable

@job
def foo_job():
    ...

reconstructable_foo_job = reconstructable(foo_job)

@graph
def foo():
    ...

def make_bar_job():
    return foo.to_job()

reconstructable_bar_job = reconstructable(make_bar_job)
The default executor for a job.
This is the executor available by default on a JobDefinition that does not provide custom executors. This executor has a multiprocessing-enabled mode, and a single-process mode. By default, multiprocessing mode is enabled. Switching between multiprocess mode and in-process mode can be achieved via config.
execution:
  config:
    multiprocess:

execution:
  config:
    in_process:
When using the multiprocess mode, max_concurrent and retries can also be configured.
execution:
  config:
    multiprocess:
      max_concurrent: 4
      retries:
        enabled:
The max_concurrent arg is optional and tells the execution engine how many processes may run concurrently. By default, or if you set max_concurrent to be 0, this is the return value of multiprocessing.cpu_count().
When using the in_process mode, only retries can be configured.
Execution priority can be configured using the dagster/priority tag via op metadata, where the higher the number the higher the priority. 0 is the default and both positive and negative numbers can be used.
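A sketch of setting this tag on an op; the tag value is illustrative:

from dagster import op

# Steps from this op are scheduled ahead of default-priority (0) steps.
@op(tags={"dagster/priority": "3"})
def important_op():
    ...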
The in-process executor executes all steps in a single process.
To select it, include the following top-level fragment in config:
execution:
  in_process:
Execution priority can be configured using the dagster/priority tag via op metadata, where the higher the number the higher the priority. 0 is the default and both positive and negative numbers can be used.
The multiprocess executor executes each step in an individual process.
Any job that does not specify custom executors will use the multiprocess_executor by default. To configure the multiprocess executor, include a fragment such as the following in your run config:
execution:
  config:
    multiprocess:
      max_concurrent: 4
The max_concurrent arg is optional and tells the execution engine how many processes may run concurrently. By default, or if you set max_concurrent to be None or 0, this is the return value of multiprocessing.cpu_count().
Execution priority can be configured using the dagster/priority tag via op metadata, where the higher the number the higher the priority. 0 is the default and both positive and negative numbers can be used.
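A sketch of attaching this executor explicitly when building a job rather than relying on the default; the names are illustrative:

from dagster import graph, multiprocess_executor, op

@op
def step_one():
    ...

@graph
def my_graph():
    step_one()

# Force multiprocess execution regardless of the default executor.
my_job = my_graph.to_job(executor_def=multiprocess_executor)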
The context object that can be made available as the first argument to the function used for computing an op or asset.
This context object provides system information such as resources, config, and logging.
To construct an execution context for testing purposes, use dagster.build_op_context().
Example
from dagster import op, OpExecutionContext

@op
def hello_world(context: OpExecutionContext):
    context.log.info("Hello, world!")
Return the AssetKey for the corresponding output.
Returns the partition key of the upstream asset corresponding to the given input.
Returns the asset partition key for the given output. Defaults to “result”, which is the name of the default output.
Deprecated
The range of partition keys for the current run.
If run is for a single partition key, return a PartitionKeyRange with the same start and end. Raises an error if the current run is not a partitioned run.
Return the PartitionKeyRange for the corresponding input. Errors if there is more or less than one.
Return the PartitionKeyRange for the corresponding output. Errors if not present.
Returns a list of the partition keys of the upstream asset corresponding to the given input.
Returns a list of the partition keys for the given output.
The PartitionsDefinition on the upstream asset corresponding to this input.
The PartitionsDefinition on the asset corresponding to this output.
The time window for the partitions of the input asset.
Raises an error if either of the following are true:
- The input asset has no partitioning.
- The input asset is not partitioned with a TimeWindowPartitionsDefinition or a MultiPartitionsDefinition with one time-partitioned dimension.
The time window for the partitions of the output asset.
Raises an error if either of the following are true:
- The output asset has no partitioning.
- The output asset is not partitioned with a TimeWindowPartitionsDefinition or a MultiPartitionsDefinition with one time-partitioned dimension.
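A sketch of reading partition information off the context inside a partitioned asset; the partitions definition and asset name are illustrative assumptions:

from dagster import DailyPartitionsDefinition, OpExecutionContext, asset

@asset(partitions_def=DailyPartitionsDefinition(start_date="2023-01-01"))
def events(context: OpExecutionContext):
    # The key of the partition this run is materializing, e.g. "2023-06-01".
    context.log.info(f"partition: {context.partition_key}")
    # The corresponding time window for this output.
    window = context.asset_partitions_time_window_for_output()
    context.log.info(f"window: {window.start} to {window.end}")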
The backing AssetsDefinition for what is currently executing, errors if not available.
Return the provenance information for the most recent materialization of an asset.
asset_key (AssetKey) – Key of the asset for which to retrieve provenance.
Provenance information for the most recent materialization of the asset. Returns None if the asset was never materialized or the materialization record is too old to contain provenance information.
Optional[DataProvenance]
Which mapping_key this execution is for if downstream of a DynamicOutput, otherwise None.
Get a logging tag.
key (str) – The tag to get.
The value of the tag, if present.
Optional[str]
If there is a backing AssetsDefinition for what is currently executing.
Whether the current run is a partitioned run.
Check if a logging tag is set.
key (str) – The tag to check.
Whether the tag is set.
bool
The current Dagster instance.
The currently executing job.
The name of the currently executing job.
str
The log manager available in the execution context.
Log an AssetMaterialization, AssetObservation, or ExpectationResult from within the body of an op.
Events logged with this method will appear in the list of DagsterEvents, as well as the event log.
event (Union[AssetMaterialization, AssetObservation, ExpectationResult]) – The event to log.
Examples:
from dagster import op, AssetMaterialization

@op
def log_materialization(context):
    context.log_event(AssetMaterialization("foo"))
The parsed config specific to this op.
Any
The current op definition.
The partition key for the current run.
Raises an error if the current run is not a partitioned run.
The range of partition keys for the current run.
If run is for a single partition key, return a PartitionKeyRange with the same start and end. Raises an error if the current run is not a partitioned run.
The partition time window for the current run.
Raises an error if the current run is not a partitioned run, or if the job’s partition definition is not a TimeWindowPartitionsDefinition.
Gives access to pdb debugging from within the op.
Example
@op
def debug(context):
    context.pdb.set_trace()
dagster.utils.forked_pdb.ForkedPdb
The currently available resources.
Resources
Which retry attempt is currently executing, i.e., 0 for the initial attempt, 1 for the first retry, etc.
The run config for the current execution.
dict
The id of the current execution’s run.
str
Get the set of AssetKeys this execution is expected to materialize.
Get the output names that correspond to the current selection of assets this execution is expected to materialize.
alias of OpExecutionContext
Builds op execution context from provided parameters.
build_op_context can be used as either a function or context manager. If there is a provided resource that is a context manager, then build_op_context must be used as a context manager. This function can be used to provide the context argument when directly invoking an op.
resources (Optional[Dict[str, Any]]) – The resources to provide to the context. These can be either values or resource definitions.
op_config (Optional[Mapping[str, Any]]) – The config to provide to the op.
resources_config (Optional[Mapping[str, Any]]) – The config to provide to the resources.
instance (Optional[DagsterInstance]) – The dagster instance configured for the context. Defaults to DagsterInstance.ephemeral().
mapping_key (Optional[str]) – A key representing the mapping key from an upstream dynamic output. Can be accessed using context.get_mapping_key().
partition_key (Optional[str]) – String value representing partition key to execute with.
partition_key_range (Optional[PartitionKeyRange]) – Partition key range to execute with.
_assets_def (Optional[AssetsDefinition]) – Internal argument that populates the op’s assets definition, not meant to be populated by users.
Examples
context = build_op_context()
op_to_invoke(context)

with build_op_context(resources={"foo": context_manager_resource}) as context:
    op_to_invoke(context)
Builds asset execution context from provided parameters.
build_asset_context can be used as either a function or context manager. If there is a provided resource that is a context manager, then build_asset_context must be used as a context manager. This function can be used to provide the context argument when directly invoking an asset.
resources (Optional[Dict[str, Any]]) – The resources to provide to the context. These can be either values or resource definitions.
resources_config (Optional[Mapping[str, Any]]) – The config to provide to the resources.
asset_config (Optional[Mapping[str, Any]]) – The config to provide to the asset.
instance (Optional[DagsterInstance]) – The dagster instance configured for the context. Defaults to DagsterInstance.ephemeral().
partition_key (Optional[str]) – String value representing partition key to execute with.
partition_key_range (Optional[PartitionKeyRange]) – Partition key range to execute with.
Examples
context = build_asset_context()
asset_to_invoke(context)

with build_asset_context(resources={"foo": context_manager_resource}) as context:
    asset_to_invoke(context)
The context object available to a type check function on a DagsterType.
Centralized log dispatch from user code.
An object whose attributes contain the resources available to this op.
The id of this job run.
Function to validate a provided run config blob against a given job.
If validation is successful, this function will return a dictionary representation of the validated config actually used during execution.
job_def (JobDefinition) – The job definition to validate run config against
run_config (Optional[Dict[str, Any]]) – The run config to validate
A dictionary representation of the validated config.
Dict[str, Any]
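A sketch of validating a run config blob against a job; the op, job, and config field names are illustrative:

from dagster import job, op, validate_run_config

@op(config_schema={"batch_size": int})
def process(context):
    context.log.info(f"batch_size: {context.op_config['batch_size']}")

@job
def my_job():
    process()

# Returns the validated config dict, or raises if validation fails.
validated = validate_run_config(
    my_job,
    {"ops": {"process": {"config": {"batch_size": 100}}}},
)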
The run_config used for jobs has the following schema:

{
  # configuration for execution, required if executors require config
  execution: {
    # the name of one, and only one available executor, typically 'in_process' or 'multiprocess'
    __executor_name__: {
      # executor-specific config, if required or permitted
      config: {
        ...
      }
    }
  },

  # configuration for loggers, required if loggers require config
  loggers: {
    # the name of an available logger
    __logger_name__: {
      # logger-specific config, if required or permitted
      config: {
        ...
      }
    },
    ...
  },

  # configuration for resources, required if resources require config
  resources: {
    # the name of a resource
    __resource_name__: {
      # resource-specific config, if required or permitted
      config: {
        ...
      }
    },
    ...
  },

  # configuration for underlying ops, required if ops require config
  ops: {
    # these keys align with the names of the ops, or their alias in this job
    __op_name__: {
      # pass any data that was defined via config_field
      config: ...,

      # configurably specify input values, keyed by input name
      inputs: {
        __input_name__: {
          # if a dagster_type_loader is specified, that schema must be satisfied here;
          # scalar, built-in types will generally allow their values to be specified directly:
          value: ...
        }
      },
    }
  },
}
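A sketch of a run_config dictionary conforming to this schema, for a hypothetical job with an op named process, a resource named warehouse, and the multiprocess executor; all names and values are illustrative:

run_config = {
    "execution": {
        # __executor_name__
        "multiprocess": {"config": {"max_concurrent": 2}},
    },
    "resources": {
        # __resource_name__
        "warehouse": {"config": {"connection_url": "..."}},
    },
    "ops": {
        # __op_name__
        "process": {
            "config": {"batch_size": 100},
            "inputs": {"table_name": {"value": "events"}},
        },
    },
}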