Meeting minutes
2022-11-15
Bacalhau Pipelines (2022-11-15 15:41 GMT+1)
2022-10-25
- Enrico
- Philippe
- Unify pipelines, template level
- 1 CID in, 1 CID OUT
- it's actually 1+ CIDs in, 1 CID out
- Fan in/out
- remove string keys, use indices
- We output an array containing indices
- Flyte
- built in scheduler
- data caching
- not best rep for post-Airflow era
- Enrico
- Philippe
- Simon's img processing examples
- Demo:
- Img processing - cool, must run on laptop
- Philippe
- Docker image could be CID so
- deterministic task could be made out of a tuple: docker-CID, input-CID
- use IPFS-backed Docker registry
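
A minimal sketch of the points above (1+ CIDs in, 1 CID out; inputs addressed by index rather than string key; a deterministic task identified by the tuple docker-CID, input-CIDs). Assumptions, none of which come from the meeting: `fake_cid`, `task_key` and `TaskCache` are hypothetical names, and a SHA-256 hex digest stands in for a real IPFS CID.

```python
import hashlib
import json
from typing import Callable, Dict, List


def fake_cid(data: bytes) -> str:
    # Assumption: SHA-256 hex digest as a stand-in for a real IPFS CID.
    return hashlib.sha256(data).hexdigest()


def task_key(image_cid: str, input_cids: List[str]) -> str:
    # A deterministic task is identified by (docker-CID, ordered input CIDs).
    # Inputs are a list, so they are addressed by index, not by string key.
    payload = json.dumps([image_cid, input_cids]).encode()
    return fake_cid(payload)


class TaskCache:
    """Hypothetical cache: the same (image, inputs) tuple runs only once."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.runs = 0

    def run(
        self,
        image_cid: str,
        input_cids: List[str],
        execute: Callable[[str, List[str]], str],
    ) -> str:
        key = task_key(image_cid, input_cids)
        if key not in self._results:
            self.runs += 1
            # 1+ CIDs in, exactly 1 CID out.
            self._results[key] = execute(image_cid, input_cids)
        return self._results[key]
```

Because the input order is part of the key, swapping two inputs counts as a different task. This is one possible reading of "use indices", not a decision recorded in the meeting.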
2022-10-19
- Part 1) Alternatives to Airflow
- Philippe
- Kedro and others may not come with a scheduler attached
- Differentiators
- Airflow still dominant - network effect!
- post-Airflow mindset -> extra abstraction layer (e.g. Dagster)
- Task scheduler is key
- Prefect changed it recently
- Popularity
- Dagster (!) - focus on data integration
- Prefect - the next Airflow
- ( Metaflow (by Netflix) )
- Dagster less mature
- Github stars
- Prefect 10k stars (!) - will possibly grow
- Will share examples offline
- Luke
- put research in writing
- any front runner?
- Philippe:
- Prefect first, Argo 2nd
- Python, YAML, (visual editor?)
- Part 2) AIRFLOW research
- Kai
- CID as output is small - good news
- Bacalhau could be bacalahu
- Airflow
- XCom is cool for intermediate steps
- How do you get your end result out of your pipeline?
- Philippe:
- Airflow should output a CID, input CID as Bacalhau normally does
- Philippe: pipeline state could sit on IPFS
- Kai: pipeline export format? Philippe: look for a common interface
- Philippe: dbt operator
- Figure out how Prefect manages task comms
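
A minimal sketch of the idea discussed in Part 2: steps hand each other only a CID (small, like an XCom value), intermediate state sits in a content-addressed store, and the end result of the pipeline is a single CID. Assumptions, none of which come from the meeting: `Store` is a hypothetical in-memory stand-in for IPFS, `run_pipeline` is a hypothetical name, and a SHA-256 hex digest stands in for a real CID. No Airflow or Prefect API is used here.

```python
import hashlib
from typing import Callable, Dict, List

Step = Callable[[bytes], bytes]


class Store:
    """Hypothetical in-memory stand-in for IPFS."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Assumption: SHA-256 hex digest instead of a real CID.
        cid = hashlib.sha256(data).hexdigest()
        self._blobs[cid] = data
        return cid

    def get(self, cid: str) -> bytes:
        return self._blobs[cid]


def run_pipeline(store: Store, input_cid: str, steps: List[Step]) -> str:
    # Only the CID travels between steps; the data itself stays in the store.
    cid = input_cid
    for step in steps:
        cid = store.put(step(store.get(cid)))
    # The end result of the pipeline is one CID.
    return cid
```

Usage: `out = run_pipeline(store, store.put(b"hello"), [bytes.upper])`, then `store.get(out)` fetches the final data.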
2022-10-13
meeting goals
- Spark or not?
- Be responsible or not?
- Enrico: Elaborate on Pros and Cons
- opts
- Operator
- POC: Bacalhau can talk to operator
- embedded