This doc was originally posted on HackMD at this url.
Enrico, 2022-10-10
Kudos to everyone who worked on this epic before me - I took the liberty of copy/pasting some of your thoughts here.
Bacalhau (currently v0.3.x) accepts workloads consisting of a single job. To execute a multi-step workload, users have to manually submit and track each job. This (1) places a significant burden on the user, (2) makes reproducing a pipeline difficult, and (3) means data origin can only be determined by manually logging each step and backtracking.
To offer broader support for modern multi-step workloads, this document aims to design a compelling pipelining feature that is user-friendly and allows for complex workloads.
In this context, a Pipeline is completely user-defined: the user writes a pipeline spec detailing how each step (i.e. a Bacalhau job) relates to the others.
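To make the idea of a user-defined spec concrete, here is a minimal sketch of how steps and their relationships could be modeled. This document does not fix a spec format, so everything in it is an assumption made for illustration: the `Step` class, its fields (`name`, `image`, `command`, `depends_on`), the `execution_order` helper, and the image names are all hypothetical and are not part of Bacalhau.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Step:
    """One pipeline step, i.e. a single Bacalhau job (hypothetical shape)."""
    name: str
    image: str                # container image the job would run (illustrative)
    command: list[str]
    depends_on: list[str] = field(default_factory=list)


def execution_order(steps: list[Step]) -> list[str]:
    """Return step names in an order that respects every dependency."""
    known = {s.name for s in steps}
    for s in steps:
        missing = set(s.depends_on) - known
        if missing:
            raise ValueError(
                f"step {s.name!r} depends on unknown steps: {sorted(missing)}"
            )
    # Map each step to the set of steps that must finish before it starts.
    graph = {s.name: set(s.depends_on) for s in steps}
    # static_order() raises graphlib.CycleError if the spec contains a cycle.
    return list(TopologicalSorter(graph).static_order())


# The spec can list steps in any order; dependencies decide the run order.
pipeline = [
    Step("resize", "example/resize:latest", ["resize"], depends_on=["fetch"]),
    Step("fetch", "example/fetch:latest", ["fetch"]),
    Step("report", "example/report:latest", ["report"], depends_on=["resize"]),
]
print(execution_order(pipeline))  # ['fetch', 'resize', 'report']
```

Declaring dependencies explicitly, rather than relying on submission order, is what would let a pipeline be reproduced from the spec alone and let each step's inputs be traced back without manual logging.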
This document is based on prior work:
The personas considered are:
- bacalhau docker run CLI users.
- bacalhau serve users. We consider this persona too because any new piece of infrastructure must be easily deployable.