Related to: Content-addressed, hash-linked jobs/functions
Currently (Nov. 2022), the /submit API endpoint takes a data payload whose Job model is significantly more complex than what is required: a number of its fields don’t need to be in the spec at all. This document sits in the Pipeline folder because we need to simplify the way /submit works. A more comprehensive text about how the job spec will evolve is available at Content-addressed, hash-linked jobs/functions.
The submitRequest object is bloated:
Here’s the Job model:
// Job contains data about a job request in the bacalhau network.
type Job struct {
    // (1) WHAT
    // The specification of this job.
    Spec Spec
    // The unique global ID of this job in the bacalhau network.
    ID string
    APIVersion string
    ///// (2) HOW
    // The ID of the requester node that owns this job.
    RequesterNodeID string
    // The public key of the Requester node that created this job
    RequesterPublicKey PublicKey
    // how will this job be executed by nodes on the network
    ExecutionPlan JobExecutionPlan
    // The deal the client has made, such as which job bids they have accepted.
    Deal Deal
    //// (3) STATUS
    // The current state of the job
    State JobState
    // All events associated with the job
    Events []JobEvent
    // All local events associated with the job
    LocalEvents []JobLocalEvent
    //// METADATA
    // Time the job was submitted to the bacalhau network.
    CreatedAt time.Time
    // The ID of the client that created this job.
    ClientID string
}
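The comment groupings above hint at where the bloat comes from: out of all these fields, the minimal working example further down only ever sets APIVersion, Spec and Deal; every other field is either owned by the network side (RequesterNodeID, State, Events, CreatedAt, and so on) or simply absent from the request. Purely as an illustration, and not a proposal for the final shape, the client-facing part of Job could shrink to something like the hypothetical type below, reusing the Spec and Deal types:

// Hypothetical sketch only: the subset of Job that the client meaningfully
// provides when calling /submit today, judging by the minimal example below.
type ClientJobSubmission struct {
    APIVersion string // e.g. "V1beta1"
    Spec       Spec   // (1) WHAT: engine choice, docker config, inputs, outputs, ...
    Deal       Deal   // e.g. Concurrency, the only HOW field the example sets
}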
Here’s the current Spec (nested field in Job):
// Spec is a complete specification of a job that can be run on some execution provider
type Spec struct {
    // e.g. docker or language
    Engine Engine
    Verifier Verifier
    // there can be multiple publishers for the job
    Publisher Publisher
    // executor specific data
    Docker JobSpecDocker
    Language JobSpecLanguage
    Wasm JobSpecWasm
    // the compute (cpu, ram) resources this job requires
    Resources ResourceUsageConfig
    // How long a job can run in seconds before it is killed.
    // This includes the time required to run, verify and publish results
    Timeout float64
    // the data volumes we will read in the job for example "read this ipfs cid"
    Inputs []StorageSpec
    // Input volumes that will not be sharded
    // for example to upload code into a base image
    // every shard will get the full range of context volumes
    Contexts []StorageSpec
    // the data volumes we will write in the job
    // for example "write the results to ipfs"
    Outputs []StorageSpec
    // Annotations on the job - could be user or machine assigned
    Annotations []string
    // the sharding config for this job
    // describes how the job might be split up into parallel shards
    Sharding JobShardingConfig
    // Do not track specified by the client
    DoNotTrack bool
}
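For a plain Docker job, only a handful of these Spec fields need real values and the rest can stay at their zero values, which is exactly what the example request below relies on. The following sketch mirrors that request in Go; the pkg/model import path and the enum constant names (EngineDocker, VerifierNoop, PublisherEstuary, StorageSourceIPFS) are assumptions about the current source tree and may be spelled differently:

package main

import (
    "fmt"

    "github.com/filecoin-project/bacalhau/pkg/model" // assumed import path
)

func main() {
    // Only these fields get real values; Resources, Inputs, Contexts,
    // Annotations, DoNotTrack, etc. are left at their zero values.
    spec := model.Spec{
        Engine:    model.EngineDocker,     // assumed constant name
        Verifier:  model.VerifierNoop,     // assumed constant name
        Publisher: model.PublisherEstuary, // assumed constant name
        Docker: model.JobSpecDocker{
            Image:      "ubuntu",
            Entrypoint: []string{"date"},
        },
        Timeout: 1800,
        Outputs: []model.StorageSpec{
            {StorageSource: model.StorageSourceIPFS, Name: "outputs", Path: "/outputs"},
        },
        Sharding: model.JobShardingConfig{
            BatchSize:           1,
            GlobPatternBasePath: "/inputs",
        },
    }
    fmt.Printf("%+v\n", spec)
}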
Here’s an example request with a (roughly) minimal set of working fields:
{
    "data": {
        "ClientID": "...",
        "Job": {
            "APIVersion": "V1beta1",
            "Spec": {
                "Engine": "Docker",
                "Verifier": "Noop",
                "Publisher": "Estuary",
                "Docker": {
                    "Image": "ubuntu",
                    "Entrypoint": [
                        "date"
                    ]
                },
                "Timeout": 1800,
                "outputs": [
                    {
                        "StorageSource": "IPFS",
                        "Name": "outputs",
                        "path": "/outputs"
                    }
                ],
                "Sharding": {
                    "BatchSize": 1,
                    "GlobPatternBasePath": "/inputs"
                }
            },
            "Deal": {
                "Concurrency": 1
            }
        }
    },
    "signature": "...",
    "client_public_key": "..."
}
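The envelope around the Job (data plus signature and client_public_key) is what actually travels to /submit. Below is a standalone sketch of building and sending it; it uses local stand-in structs rather than the real model types, assumes the node’s API is reachable at localhost:1234 with /submit at the root path, and skips the client-side signing that a real submission needs:

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

// Local stand-in for the submit envelope; field names follow the JSON above,
// not the real bacalhau types.
type submitEnvelope struct {
    Data            json.RawMessage `json:"data"`
    Signature       string          `json:"signature"`
    ClientPublicKey string          `json:"client_public_key"`
}

func main() {
    // The "data" document: ClientID plus the (minimal) Job shown above.
    jobDoc := map[string]interface{}{
        "ClientID": "<client id>",
        "Job": map[string]interface{}{
            "APIVersion": "V1beta1",
            "Spec": map[string]interface{}{
                "Engine":    "Docker",
                "Verifier":  "Noop",
                "Publisher": "Estuary",
                "Docker": map[string]interface{}{
                    "Image":      "ubuntu",
                    "Entrypoint": []string{"date"},
                },
                "Timeout": 1800,
            },
            "Deal": map[string]interface{}{"Concurrency": 1},
        },
    }
    data, err := json.Marshal(jobDoc)
    if err != nil {
        panic(err)
    }
    body, err := json.Marshal(submitEnvelope{
        Data:            data,
        Signature:       "<signature over data, computed with the client key>",
        ClientPublicKey: "<client public key>",
    })
    if err != nil {
        panic(err)
    }
    // Endpoint address and path are assumptions; adjust to the node's API.
    resp, err := http.Post("http://localhost:1234/submit", "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println(resp.Status)
}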
We’re in a state where even a minimal request still has to spell out fields like Deal.Concurrency, Timeout, etc.

@Simon Worthington’s thoughts - Fields in the job can be split into 3 sections: