Reviewers:
Aronchick 2023-01-16 - my biggest comment is basically just thinking about what’s close to what we already have and/or reusing our platform as much as possible to reduce the PoC nature of it. E.g. Kai’s thought about just reusing FilecoinUnsealed driver.
Develop a compelling demo of Bacalhau's use on-prem / in IoT to create distributed masterless data pipelines using IPFS (masterless data movement), GossipSub (masterless pubsub), WASM (for insane platform portability), and Docker (where we need heavier workloads, like AI with GPUs).
Use case: When someone connects to a wifi network (detected by a WASM program parsing the logs on the access point), all the cameras in the building take a photo and rescale it with WASM. Image detection on those photos runs on an AI model in the building, on a GPU in Docker, and the labels are published to an EC2 instance, which alerts us on Slack with labelled photos if the labels match a subset (e.g. a person).
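The EC2-side alerting step is just a subset check on the published labels. A minimal sketch, assuming hypothetical names (ALERT_LABELS, should_alert) that are illustrative only, not existing Bacalhau code:

```python
# Hypothetical sketch of the alerting step on the EC2 instance.
# ALERT_LABELS and should_alert are assumed names for illustration.
ALERT_LABELS = {"person"}  # the subset of labels we want to be alerted about

def should_alert(detected_labels):
    """Return True if any detected label falls in the alert subset."""
    return bool(ALERT_LABELS & set(detected_labels))
```

If this returns True for a photo's labels, we post the labelled photo to Slack; otherwise the event is dropped without ever leaving the building's network.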
The alternative, streaming all the webcam and log data into the cloud, would be significantly more costly in terms of ingress, compute, and storage.
We plan to prototype this demo entirely in Luke's house, where he has a GPU machine, some old Linux laptops with webcams, and several Ubiquiti Wifi APs, which are ARM boxes running Linux that we can SSH into.
The key problem to solve with respect to the current Bacalhau architecture is being able to compose a pipeline of streams, reusing as much of the existing IPFS and GossipSub infrastructure as possible. THIS IS A PROOF OF CONCEPT; IT IS NOT INTENDED TO BE THE FINAL, PRODUCTION-READY, PERFORMANT WAY WE DO STREAMING.
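The composition idea can be sketched with a tiny in-memory pubsub bus standing in for GossipSub: each pipeline stage subscribes to one topic and publishes to the next. Everything here (the Bus class, the topic names) is an illustrative assumption, not a Bacalhau API:

```python
# In-memory stand-in for composing pipeline stages over pubsub topics
# (GossipSub in the real system). All names are illustrative assumptions.
from collections import defaultdict

class Bus:
    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, topic, fn):
        self.subs[topic].append(fn)

    def publish(self, topic, msg):
        for fn in self.subs[topic]:
            fn(msg)

bus = Bus()
alerts = []
# Stage 1: wifi event on the AP -> cameras take a photo
bus.subscribe("wifi-events", lambda evt: bus.publish("photos", f"photo-of-{evt}"))
# Stage 2: photo -> GPU detection -> labels
bus.subscribe("photos", lambda p: bus.publish("labels", {"src": p, "labels": ["person"]}))
# Stage 3: labels -> alerting sink
bus.subscribe("labels", alerts.append)

bus.publish("wifi-events", "mac-aa:bb")
```

The point is only that stages are decoupled by topic names, so each stage can run on whichever node (AP, laptop, GPU box) is suited to it.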
It would be nice if bacalhau serve just worked everywhere with no args. Then we could have a simple k3s-like installer (maybe with nicer UX 😉).

We need two modes for bacalhau serve:

- bacalhau serve (the default, joining the public network)
- bacalhau serve --private --initial, which outputs "Run bacalhau serve --private --join <my-ip>"; each subsequent node then runs bacalhau serve --private --join <ip-of-other-node> to continue creating the mesh

We will need a way to schedule jobs to specific nodes, or based on node labels. Or, for now, even just a way to run jobs on specific nodes (by id).
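The node-selection logic could be as simple as matching by id or by labels. A minimal sketch, where the node structure and the select_nodes helper are assumptions for illustration, not part of Bacalhau today:

```python
# Hypothetical node inventory and selector; names are illustrative only.
NODES = [
    {"id": "gpu-1", "labels": {"gpu": "true", "room": "office"}},
    {"id": "ap-1",  "labels": {"arch": "arm", "role": "access-point"}},
]

def select_nodes(node_id=None, labels=None):
    """Pick nodes by exact id, or by matching all of the given labels."""
    if node_id is not None:
        return [n for n in NODES if n["id"] == node_id]
    labels = labels or {}
    return [
        n for n in NODES
        if all(n["labels"].get(k) == v for k, v in labels.items())
    ]
```

Scheduling by id covers the demo (run detection on the GPU box, log parsing on the AP); label matching is the more general mechanism we would grow into.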