Sharing Context Between Tasks in Databricks Workflows

Databricks Workflows is a fully managed service on Databricks that makes it easy to build and manage complex data and ML pipelines in your lakehouse without the need to operate complex infrastructure.

Often, a task in an ETL or ML pipeline depends on the output of an upstream task. An example would be evaluating the performance of a machine learning model and then having a task decide whether to retrain the model based on the model metrics. Since these are two separate steps, it is best to have separate tasks perform the work. Previously, accessing information from an upstream task required storing that information outside of the job's context, such as in a Delta table.

Databricks Workflows is introducing a new feature called "Task Values", a simple API for setting and retrieving small values from tasks. Tasks can now output values that can be referenced in subsequent tasks, making it easier to create more expressive workflows. Looking at the history of a job run also provides more context, by showcasing the values passed between tasks at the DAG and task levels. Task values can be set and retrieved through the Databricks Utilities API.
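As a minimal sketch of how this fits the model-retraining scenario above: an "evaluate_model" task publishes a metric, and a downstream task reads it back. The task names, metric key, and threshold here are illustrative, not part of the API; `dbutils` is available by default in Databricks notebooks.

```python
# In the "evaluate_model" task: publish a small, JSON-serializable
# value so that downstream tasks in the same job run can read it.
accuracy = 0.87  # illustrative result of a model evaluation step

dbutils.jobs.taskValues.set(key="accuracy", value=accuracy)
```

A downstream task then retrieves the value by naming the task that emitted it:

```python
# In a downstream task: read the value emitted by "evaluate_model".
# debugValue is returned when the notebook runs outside of a job,
# which keeps the notebook testable interactively.
accuracy = dbutils.jobs.taskValues.get(
    taskKey="evaluate_model",
    key="accuracy",
    default=0.0,
    debugValue=0.95,
)

if accuracy < 0.9:  # illustrative retraining threshold
    print("Model metrics below threshold; proceeding to retrain.")
```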

The history of the run shows that the "evaluate_model" task has emitted a value.
When clicking on the task, you can see the values emitted by that task.

Task values are now generally available. We'd love for you to try out this new functionality and tell us how we can improve orchestration even further!
