Idea: every job has an identifier generated by hashing all of the code and input data required to run it. Two jobs doing the same work over the same data have the same hash, even if they run on different machines that have never communicated.

If job execution is also deterministic, then each job hash uniquely identifies a set of outputs as well, because running the same code over the same data should produce the same output. This makes it possible to see where data output by one job is used as input to another, and how all of the jobs link together in a call tree.
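
As a rough illustration of the idea, here is a minimal sketch of deriving a job identifier from code and inputs. The `job_id` function, the choice of SHA-256, and the length-prefixed encoding are all assumptions made for this example; they are not the format proposed in the linked spec. Input order is treated as significant here.

```python
import hashlib


def job_id(code: bytes, inputs: list[bytes]) -> str:
    """Hash the code and every input into a single hex identifier (hypothetical scheme)."""
    h = hashlib.sha256()
    for part in [code, *inputs]:
        # Length-prefix each part so that ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


# Two machines that never communicate build the same job and get the same identifier.
id_a = job_id(b"def run(x): return x + 1", [b"41"])
id_b = job_id(b"def run(x): return x + 1", [b"41"])

# Changing the input data changes the identifier.
id_c = job_id(b"def run(x): return x + 1", [b"42"])
```

Because the identifier depends only on the bytes of the code and inputs, no coordination between machines is needed to agree on it.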

Tl;dr:

Job Spec by expede · Pull Request #8 · ipvm-wg/spec

Why

Having standard hashable jobs is a precursor to some valuable qualities (discussed during CoD Summit²):

  1. Link job executions to their outputs.
    1. Outputs from previous executions of the same job can be re-used. If there’s a mechanism for previous job outputs to be discovered, those previous results can be used instead of running the same work again.
    2. Reputation. All of the results being generated become transparent, which can be used as the basis for a reputation system.
    3. Economic incentivisation of useful data. Using the call tree of jobs and their outputs, it would be possible to see how many jobs ultimately refer to a piece of data (either directly or via the output of another job).
  2. Jobs are reproducible even across networks. If someone wants to run the same job later to reproduce the result, they can be confident that the result they get will be the same as the first time it was run, even if the original network has shut down.
    1. To achieve this we need a standard way of executing jobs as well, i.e. given a job, the way in which it is run must not affect the output. Broadly, determinism. This may not be strictly necessary if we aren’t bothered about insubstantial differences in the output.
    2. ❓Is it for users? Do we expect users to use this format to submit jobs?
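
The re-use and call-tree points above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ResultCache`, `digest`, and the split between a `code` byte string (what gets hashed) and a Python callable (what gets run) are hypothetical names invented for this example, and jobs are assumed to be deterministic.

```python
import hashlib


def digest(*parts: bytes) -> str:
    """Length-prefixed SHA-256 over the given byte strings (hypothetical scheme)."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))
        h.update(p)
    return h.hexdigest()


class ResultCache:
    """Hypothetical store of previous job outputs, keyed by job identifier."""

    def __init__(self):
        self.outputs = {}    # job id -> output bytes
        self.inputs_of = {}  # job id -> hashes of the data the job consumed
        self.executions = 0  # how many times work was actually performed

    def run(self, code: bytes, func, inputs: list[bytes]) -> bytes:
        jid = digest(code, *inputs)
        if jid not in self.outputs:
            # No previous result is known for this job, so do the work once.
            self.executions += 1
            self.outputs[jid] = func(*inputs)
            self.inputs_of[jid] = [digest(i) for i in inputs]
        return self.outputs[jid]

    def dependents(self, data: bytes) -> int:
        """Count jobs that use `data` directly or via the output of another job."""
        seen = set()
        frontier = [digest(data)]
        while frontier:
            data_hash = frontier.pop()
            for jid, input_hashes in self.inputs_of.items():
                if data_hash in input_hashes and jid not in seen:
                    seen.add(jid)
                    # Follow the call tree: this job's output may feed other jobs.
                    frontier.append(digest(self.outputs[jid]))
        return len(seen)


cache = ResultCache()
double = lambda x: str(int(x) * 2).encode()
increment = lambda x: str(int(x) + 1).encode()

first = cache.run(b"double", double, [b"21"])       # executed
second = cache.run(b"double", double, [b"21"])      # re-used, not executed again
third = cache.run(b"increment", increment, [first])  # consumes the first job's output
```

Here the second call returns the stored output without running anything, and `cache.dependents(b"21")` counts both jobs, since the `increment` job refers to the original data via the output of `double`.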

Doing this in a standard cross-network way means that all these qualities work across networks as well. A network has a good incentive to buy into this standard because it gets access to all of the memoization and (maybe) reputation-building happening on other networks.

Principles

Features

Not-features