pydatatask.pipeline module#

A pipeline is just an unordered collection of tasks. Relationships between the tasks are implicit, defined by which repositories they share.

class pydatatask.pipeline.Pipeline(tasks: Iterable[Task], session: Session, resources: Iterable[ResourceManager], priority: Callable[[str, str], int] | None = None)[source]#

Bases: object

The pipeline class.

Parameters:
  • tasks – The tasks which make up this pipeline.

  • session – The session to open while this pipeline is active.

  • resources – Any resource managers in use. You need to provide these so the pipeline can reset the rate-limiting at each update.

  • priority – Optional: A function which takes a task and job name and returns an integer priority. No jobs will be scheduled unless all higher-priority jobs (larger numbers) have already been scheduled.

settings(synchronous=False, metadata=True)[source]#

This method can be called to set properties of the current run.

Parameters:
  • synchronous – Whether jobs will be started and completed in-process, waiting for their completion before a launch phase succeeds.

  • metadata – Whether jobs will store their completion metadata.

async open()[source]#

Opens the pipeline and its associated resource session. This will be automatically when entering an async with pipeline: block.

async close()[source]#

Closes the pipeline and its associated resource session. This will be automatically called when exiting an async with pipeline: block.

async update() bool[source]#

Perform one round of pipeline maintenance, running the update phase and then the launch phase. The pipeline must be opened for this function to run.

Returns:

Whether there is any activity in the pipeline.

async update_only_update() bool[source]#

Perform one round of the update phase of pipeline maintenance. The pipeline must be opened for this function to run.

Returns:

Whether there were any live jobs.

async update_only_launch() bool[source]#

Perform one round of the launch phase of pipeline maintenance. The pipeline must be opened for this function to run.

Returns:

Whether there were any jobs launched or ready to launch.

async gather_ready_jobs(task: Task) Set[str][source]#

Collect all jobs that are ready to be launched for a given task.

graph() DiGraph[source]#

Generate the dependency graph for a pipeline. This is a directed graph containing both repositories and tasks as nodes.

dependants(node: Repository | Task, recursive: bool) Iterable[Repository | Task][source]#

Iterate the list of repositories that are dependent on the given node of the dependency graph.

Parameters:
  • node – The starting point of the graph traversal.

  • recursive – Whether to return only direct dependencies (False) or transitive dependencies (True) of node.