This document aims to describe the entire workflow of SPARK retrieval checks, based on the parts discussed in SPARK Content retrieval attestation, CID Sampling for SPARK, and other related documents. See also Station Module: SP Retrieval Checker (Spark).
Context
SPARK will operate within the MERidian framework. We want the system to be decentralised, with no single party in charge of any component.
Overview
At a high level, the protocol is split into the following steps:
- Tasking: Cluster the online SPARK checkers into several committees.
- CID Sampling: Choose a random (CID, SP) pair for each committee.
- Retrieval with Attestation: Retrieve CID content from SP and obtain the attestation token.
- Proof of Data Possession: Create proof that the entire CID content was retrieved.
- Measurement: Report the job outcome to MERidian.
- Evaluation & Verification: Evaluate the impact of checkers and detect fraud.
The last phase, Reward, is handled by MERidian smart contracts.
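To make the handoff between these steps concrete, here is a minimal, hypothetical TypeScript sketch of the data each phase produces and consumes. The type and field names are illustrative only; they do not come from the actual SPARK codebase.

```typescript
// Illustrative only -- these types are NOT the actual SPARK schema.
type Task = {
  roundId: number;   // tasking round this task belongs to
  seed: string;      // deterministic seed the task is derived from
};

// Produced by CID Sampling: the concrete retrieval job for a task.
type RetrievalJob = {
  cid: string;       // payload CID to retrieve
  spAddress: string; // storage provider expected to serve the CID
};

// Reported to MERidian in the Measurement step.
type Measurement = {
  job: RetrievalJob;
  checkerId: string;
  success: boolean;
  attestation?: string; // token obtained from the SP on successful retrieval
};
```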
1) Tasking
In this step, we want to match running SPARK checker instances with retrieval jobs in a way that makes fraud more difficult. For each retrieval job (a (CID, SP) pair), we want to form a small committee of peers that perform the same retrieval redundantly, so that we can arrive at an honest-majority result.
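One simple way to form such committees deterministically is to hash each checker's identity together with a shared per-round seed: every party can recompute the assignment, and a checker cannot pick its committee without grinding on its identity. This is only a sketch of the idea, not the actual SPARK tasking algorithm.

```typescript
import { createHash } from "node:crypto";

// Assign a checker to one of `committees` buckets for the given round.
// Deterministic: anyone with the round seed can verify the assignment.
function committeeFor(
  checkerId: string,
  roundSeed: string,
  committees: number,
): number {
  const digest = createHash("sha256")
    .update(`${roundSeed}:${checkerId}`)
    .digest();
  // Interpret the first 4 bytes as an unsigned integer (fine for a sketch).
  return digest.readUInt32BE(0) % committees;
}
```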
In the rest of this section, a task represents an abstract but fixed retrieval check that checker nodes will perform. The tasking algorithm does not need to concern itself with which specific (CID, SP) pair is derived from each task, as long as we have a deterministic algorithm for that derivation. The conversion from a task to a (CID, SP) job is explained in section 2) CID Sampling (per each task).
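For illustration, a deterministic task-to-job conversion could look like the following sketch, which hashes the task seed to pick an index into a published, ordered list of eligible (CID, SP) pairs. The real procedure is the subject of section 2) CID Sampling; this only demonstrates the determinism requirement.

```typescript
import { createHash } from "node:crypto";

type Job = { cid: string; spAddress: string };

// Hypothetical derivation: every honest node computing jobForTask() with the
// same task seed and the same eligible-job list arrives at the same (CID, SP).
function jobForTask(taskSeed: string, eligibleJobs: Job[]): Job {
  const digest = createHash("sha256").update(taskSeed).digest();
  const index = digest.readUInt32BE(0) % eligibleJobs.length;
  return eligibleJobs[index];
}
```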
Requirements
- Rate Limiting
We want to rate-limit how many jobs each checker performs; one possible enforcement mechanism is sketched after the bullet points below.
- This is especially important for the LabWeek23 release, when we won't have retrieval fraud detection implemented yet. Without fraud detection, a fraudulent node can short-circuit job execution by skipping the retrieval and reporting fake results instead. This allows nodes to cheaply report many completed jobs, which later translates into a disproportionately large impact & reward.
- Even with fraud detection in place, if we don't rate-limit the checks, then nodes on fast, unmetered internet connections can tweak the SPARK checker code to perform more checks than we designed for. This can put too much pressure on SPs. Plus, such an operator receives a larger portion of the rewards, which is unfair to honest nodes.
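As a sketch of one possible enforcement mechanism (an assumption for illustration, not the actual SPARK implementation), a per-round counter could cap how many tasks each checker is allowed to take:

```typescript
// Hypothetical per-round rate limiter. The limit value and where it would be
// enforced (checker vs. tasking service) are assumptions for illustration.
class RoundRateLimiter {
  private counts = new Map<string, number>();

  constructor(private readonly maxTasksPerRound: number) {}

  // Returns true if the checker may take one more task in this round.
  tryAcquire(checkerId: string): boolean {
    const used = this.counts.get(checkerId) ?? 0;
    if (used >= this.maxTasksPerRound) return false;
    this.counts.set(checkerId, used + 1);
    return true;
  }

  // Reset all counters at the start of each new round.
  reset(): void {
    this.counts.clear();
  }
}
```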
<aside>
🤔 Miroslav
After describing this requirement in detail, I am no longer sure if it’s really required in the longer term. What can go wrong if we allow nodes to make many honest retrieval checks?
- Node operator creates more impact and thus receives more rewards. That’s the core of the MERidian scheme, right?
- Node operators download much more data from SPs. If we sample CIDs uniformly, the load should be spread across SPs in proportion to how much data they store. Any single SP can be overloaded only if it provides a large fraction of FIL capacity, and in that case it must be prepared to handle more retrievals than SPs providing less capacity.
</aside>
- Make Sybil attacks difficult/expensive
We want to limit how many checker instances a single party operates by disincentivizing running too many instances.
- The primary goal is to make Sybil attacks more difficult/expensive.
- As a side effect, this also makes it more challenging to avoid rate limiting by running many SPARK instances on the same machine.