Reviewers:

Aronchick 2023-01-16 - my biggest comment is basically just thinking about what’s close to what we already have and/or reusing our platform as much as possible to reduce the PoC nature of it. E.g. Kai’s thought about just reusing FilecoinUnsealed driver.

Background

Develop a compelling demo for Bacalhau’s use on-prem / IoT to create distributed masterless data pipes using IPFS (masterless data movement), GossipSub (masterless pubsub), WASM (for insane platform portability) & Docker (where we need heavier workloads like AI with GPUs).

Demo

Use case: When someone connects to a wifi network (detected by a WASM program parsing the logs on the access point), all the cameras in the building take a photo and rescale it with WASM. Image detection on those photos runs on an AI model on a GPU in Docker inside the building, and the resulting labels are published to an EC2 instance, which alerts us on Slack with the labelled photos if the labels match a chosen subset (e.g. person).
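
The final "alert if the labels match a subset" step can be sketched as below. This is only an illustration, not the planned implementation: the function names, the (label, confidence) tuple shape and the confidence threshold are all assumptions made for the example.

```python
# Hypothetical sketch of the label-matching step that would run before a Slack alert.
# The (label, confidence) tuple format and min_confidence are assumptions.

def matching_labels(detected, watched, min_confidence=0.5):
    """Return the watched labels that appear among sufficiently confident detections."""
    confident = {
        label.lower()
        for label, confidence in detected
        if confidence >= min_confidence
    }
    return sorted(confident & {w.lower() for w in watched})


def should_alert(detected, watched, min_confidence=0.5):
    """True if at least one watched label was detected."""
    return bool(matching_labels(detected, watched, min_confidence))


if __name__ == "__main__":
    detections = [("person", 0.91), ("chair", 0.75), ("dog", 0.30)]
    print(should_alert(detections, {"person"}))
```

Keeping this filter next to the model output means only matching photos and labels would need to leave the building.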

The alternative, streaming all the webcam and log data into the cloud, would be significantly more costly in terms of ingress, compute and storage.

We plan to prototype this demo entirely in Luke’s house, where he has a GPU machine, some old Linux laptops with webcams, and several Ubiquiti Wifi APs, which are ARM boxes running Linux that we can SSH into.

The key problem to solve with respect to the current Bacalhau architecture is being able to compose a pipeline of streams, reusing as much of the existing IPFS and GossipSub infrastructure as possible. THIS IS A PROOF OF CONCEPT; IT IS NOT INTENDED TO BE THE FINAL, PRODUCTION-READY, PERFORMANT WAY WE DO STREAMING.

Easier installation & private clusters

Would be nice if bacalhau serve just worked everywhere with no args. Then we could have a simple k3s-like installer (maybe with nicer UX 😉).

We need two modes for bacalhau serve:

  1. Joining the public Bacalhau network. This should probably be the default for bacalhau serve.
  2. Creating a new private network. How would we do this? In this case, both the Bacalhau GossipSub and the IPFS networks we create need to be isolated. Something like:
    1. bacalhau serve --private --initial starts the first node and prints: Run bacalhau serve --private --join <my-ip>
    2. bacalhau serve --private --join <ip-of-other-node> on each further node to continue creating the mesh
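
To make the two modes concrete, here is a minimal sketch of how the flag combinations could resolve to a mode. The --private, --initial and --join flags are the proposal above, not existing bacalhau serve options, and resolve_mode and the mode names are hypothetical.

```python
import argparse

# Hypothetical sketch: these flags mirror the proposal in this doc and do not
# describe the current bacalhau CLI.

def build_parser():
    parser = argparse.ArgumentParser(prog="bacalhau serve")
    parser.add_argument("--private", action="store_true",
                        help="create or join an isolated network")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--initial", action="store_true",
                       help="start the first node of a private network")
    group.add_argument("--join", metavar="IP",
                       help="join a private network via an existing node")
    return parser


def resolve_mode(argv):
    """Map command-line flags to a (mode, peer_ip) pair."""
    args = build_parser().parse_args(argv)
    if not args.private:
        if args.initial or args.join:
            raise ValueError("--initial and --join only apply with --private")
        return ("public", None)  # default: join the public network
    if args.initial:
        return ("private-initial", None)
    if args.join:
        return ("private-join", args.join)
    raise ValueError("--private needs either --initial or --join <ip>")


if __name__ == "__main__":
    print(resolve_mode(["--private", "--join", "192.168.1.10"]))
```

With no arguments this resolves to the public mode, which matches the goal of bacalhau serve working everywhere with no args.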

✅ Node selectors

We will need a way to schedule jobs to specific nodes or based on node labels. Or even just to run jobs on specific nodes (by id) for now.
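
As a rough illustration of both options (by id and by label), node selection could look like the sketch below. The node ids, label keys and the select_nodes function are made up for the example and do not reflect an existing Bacalhau API.

```python
# Hypothetical sketch: pick target nodes by explicit id and/or by required labels.
# Nodes are modelled as a dict of node id -> label dict, which is an assumption.

def select_nodes(nodes, node_ids=None, required_labels=None):
    """Return sorted ids of nodes that match the id filter and all required labels."""
    required = required_labels or {}
    selected = []
    for node_id, labels in nodes.items():
        if node_ids is not None and node_id not in node_ids:
            continue  # an id filter was given and this node is not in it
        if all(labels.get(key) == value for key, value in required.items()):
            selected.append(node_id)
    return sorted(selected)


if __name__ == "__main__":
    house = {
        "gpu-box": {"gpu": "true", "arch": "amd64"},
        "laptop-1": {"camera": "true", "arch": "amd64"},
        "ap-1": {"arch": "arm"},
    }
    print(select_nodes(house, required_labels={"camera": "true"}))
```

In the demo, the photo step would target nodes labelled as having a camera, and the detection step would target the GPU machine.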

Mounting Local Directories