Author: @Walid Baruni

Status: Discussion

Last Updated Date: 2022-12-30

Discussion of Concerns

How Bacalhau Could Be Useful for Log Vending

First, Bacalhau jobs are, and should remain, stateless, transient, and short-running, with a defined timeout, inputs, and outputs, similar to Lambda functions. It may be better to call them tasks or functions rather than jobs.

To support long-running jobs, other types of services will own triggering the execution of those tasks, tracking state and progress, delegating the required permissions, and retrying if necessary.
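As a rough illustration of that split, here is a minimal sketch in which the orchestrating service keeps the attempt history and the retry policy while the task itself stays stateless. Everything here is hypothetical: `TaskSpec`, `TaskFailed`, `Orchestrator`, and the `submit` callable are illustrative names, not part of any Bacalhau API, and `submit` stands in for whatever mechanism actually runs a task on a compute node.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical description of a short-lived, stateless task."""
    image: str
    inputs: List[str]
    timeout_seconds: int


class TaskFailed(Exception):
    """Raised by the submit function when a task errors or times out."""


@dataclass
class Orchestrator:
    """Owns state and retries so the task itself can stay stateless."""
    # Hypothetical stand-in: runs one task and returns a reference to its output.
    submit: Callable[[TaskSpec], str]
    max_attempts: int = 3
    attempts_log: List[str] = field(default_factory=list)

    def run(self, spec: TaskSpec) -> str:
        last_error = None
        for attempt in range(1, self.max_attempts + 1):
            try:
                result = self.submit(spec)
                self.attempts_log.append(f"attempt {attempt}: ok")
                return result
            except TaskFailed as err:
                # Record the failure and try again with the same spec.
                last_error = err
                self.attempts_log.append(f"attempt {attempt}: {err}")
        raise TaskFailed(f"gave up after {self.max_attempts} attempts") from last_error
```

The task receives the same spec on every attempt, so it needs no memory of earlier runs; progress tracking lives entirely in the service.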

These services will be market makers and connectors for different input sources. For example:

  1. Cron Service: A user can register a cron job with this service, which will own finding a compute node in the network and executing the periodic task on it, along with retries if necessary. In the log processing use case, a periodic job can be executed to query S3 for new log files for further processing.
  2. Streaming Service: A user can register a job to consume from Kafka/Kinesis. The service will trigger multiple tasks in the network to consume from each of the Kafka partitions or Kinesis shards, keep track of the checkpoint offsets, make sure that there is always an active and healthy consumer running, and retry if a consumer times out or becomes unhealthy. Another implementation would be for the Streaming Service itself to read and buffer from Kafka/Kinesis, store the data in IPFS/S3, and then trigger a Bacalhau task to process the uploaded file. Each approach has its own tradeoffs, and both can co-exist.
  3. Queueing Service: Similar to the previous one, but consumes from queueing services like SQS and SNS. Implementation-wise, the Queueing Service will read a few messages and trigger a Bacalhau task with the message values as input.
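The batching and checkpointing behaviour described for the Streaming and Queueing Services can be sketched as below. This is only an illustration under stated assumptions: `dispatch_batches` and `run_task` are hypothetical names, the in-memory `messages` sequence stands in for a Kafka partition, Kinesis shard, or SQS queue, and `run_task` stands in for triggering a Bacalhau task and reporting whether it succeeded.

```python
from typing import Callable, List, Sequence


def dispatch_batches(
    messages: Sequence[str],
    run_task: Callable[[List[str]], bool],
    batch_size: int = 3,
    start_offset: int = 0,
) -> int:
    """Feed messages to tasks in small batches; return the new checkpoint offset.

    The checkpoint only advances past a batch once its task reports success,
    so a failed or timed-out batch is picked up again on the next call.
    """
    offset = start_offset
    while offset < len(messages):
        batch = list(messages[offset:offset + batch_size])
        if not run_task(batch):
            break  # leave the checkpoint just before the failed batch
        offset += len(batch)
    return offset
```

Calling the function again with the returned offset as `start_offset` resumes from the last successful batch, which is the state the service, not the task, has to persist.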

These are difficult services to implement and operate, but anyone can become an orchestration service provider and participate in the network with a new connector and source type. Network access will be required, but I think we are already moving towards that.