In SPARK, we want to reward Station instances for periodically making retrieval requests that check the availability of content stored by Storage Providers (SPs). The reward function is based on the number of checks performed. There are various attack vectors we need to prevent to avoid abuse. One of them is a cheating client that does not make any retrieval requests but simply reports fake retrieval metrics. In this document, we design a solution based on signature chains that allows third parties (e.g. the MERidian measurement service) to verify that the SPARK client attempted a retrieval from the given SP.

See also Threat Model for SPARK Frauds

Workflow of a single retrieval check performed by SPARK

The current version of SPARK (Storage Provider Retrieval Checker) performs the following steps for each retrieval check:

  1. The SPARK orchestrator defines a new retrieval checking job. The job record contains, among other fields:

    1. unique job_id
    2. cid of the content to retrieve (bafy...)
    3. address to retrieve the content from (/ip4/211.254.148.138/tcp/8180/p2p/12D3KooWHeLUGxJsnsCsHnNW7CpvzumuDVq6vt9NWinUAXtFyD6H)

    In the future, we want to replace the orchestrator with a smart-contract-driven solution. The important part is that the network assigns the (cid, address) pair to the checker in a random and uniformly distributed way: the checker does not have any control over that selection, and SPs cannot predict which CIDs will be checked (e.g. by reading the state of the scheduling smart contract).

    Different design options are discussed here: CID Sampling for SPARK

  2. The SPARK module running inside Filecoin Station (the SPARK checker) retrieves the given CID from the given address over HTTP, using the Lassie HTTP interface under the hood.

  3. The SPARK checker reports the retrieval results to the SPARK orchestrator; the MERidian measurement service later evaluates these reports.
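The job record handed to the checker in step 1 can be sketched as a small data type. This is an illustrative sketch only: the class name, helper function, and JSON shape are assumptions, not part of the actual SPARK codebase; the field names (job_id, cid, address) follow the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalJob:
    """Hypothetical model of one retrieval-checking job."""
    job_id: str    # unique identifier assigned by the orchestrator
    cid: str       # CID of the content to retrieve, e.g. "bafy..."
    address: str   # multiaddr of the SP to retrieve the content from

def parse_job(payload: dict) -> RetrievalJob:
    """Build a job record from an (assumed) orchestrator JSON response."""
    return RetrievalJob(
        job_id=payload["job_id"],
        cid=payload["cid"],
        address=payload["address"],
    )
```

Keeping the record immutable (frozen dataclass) reflects the design intent that the checker has no control over the (cid, address) selection.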

sequenceDiagram
  participant SparkNode as SPARK Checker
  participant SP as Storage Provider
  box Cyan Private & centralised services operated by SPARK
    participant Orchestrator as SPARK Orchestrator
    participant SparkDB as SPARK DB
  end

  loop every 10 seconds
    SparkNode ->> Orchestrator: give me a new job
    Orchestrator ->> SparkDB: create a new job from a random template
    SparkDB -->> SparkDB: choose a random (cid, address) template
    SparkDB -->> SparkDB: create a new job record with a unique job_id
    SparkDB ->> Orchestrator: (job_id, cid, address)
    Orchestrator ->> SparkNode: (job_id, cid, address)
    SparkNode ->> SP: retrieve CID
    SP ->> SparkNode: (CAR stream, retrieval attestation)
    SparkNode ->> Orchestrator: (job_id, retrieval metrics, attestation)
    Orchestrator ->> SparkDB: update the job record
  end
  

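The checker's side of the loop in the diagram above can be sketched as follows. The function names (fetch_job, retrieve, report) and the metrics shape are stand-ins for the real orchestrator API, the Lassie HTTP retrieval, and the measurement submission; none of them come from the actual SPARK client.

```python
import time

def run_checker(fetch_job, retrieve, report, rounds=1, delay=10):
    """Illustrative sketch of the checker loop: ask for a job,
    retrieve the CID from the SP, report metrics and attestation."""
    for _ in range(rounds):
        job = fetch_job()  # returns {"job_id": ..., "cid": ..., "address": ...}
        metrics, attestation = retrieve(job["cid"], job["address"])
        report(job["job_id"], metrics, attestation)
        if rounds > 1:
            time.sleep(delay)  # the document describes a 10-second cadence
```

Passing the three operations in as callables keeps the sketch testable without any network access; the real client would talk to the orchestrator and the SP directly.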
Fraud detection

MERidian measurement & evaluation service periodically processes retrieval reports to calculate the impact of each Station and assign rewards. As part of the evaluation step, we want to detect fraudulent behaviour.

See Threat Model for SPARK Frauds

sequenceDiagram
  participant FraudDetection as SPARK/MERidian Fraud Detection
  participant SparkDB as SPARK DB

  loop every MERidian Evaluation epoch
    FraudDetection ->> SparkDB: get job details
    SparkDB ->> FraudDetection: (job_id, cid, address, metrics, attestation)
    FraudDetection -->> FraudDetection: validate retrieval attestation
    FraudDetection ->> SparkDB: flag fraudulent jobs
  end
  
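The "validate retrieval attestation" step above can be illustrated with a toy verifier. A real deployment would verify an asymmetric signature (e.g. Ed25519) made with the SP's key over the job-specific data; here an HMAC with a shared key stands in so the sketch stays runnable with the standard library alone. All names and the message format are assumptions for illustration.

```python
import hashlib
import hmac

def make_attestation(sp_key: bytes, job_id: str, cid: str) -> str:
    """Toy attestation: a MAC over the job id and CID.
    Binding the attestation to the job_id prevents replaying
    an old attestation for a different check."""
    msg = f"{job_id}:{cid}".encode()
    return hmac.new(sp_key, msg, hashlib.sha256).hexdigest()

def validate_attestation(sp_key: bytes, job_id: str, cid: str,
                         attestation: str) -> bool:
    """Recompute the expected value and compare in constant time."""
    expected = make_attestation(sp_key, job_id, cid)
    return hmac.compare_digest(expected, attestation)
```

The essential property the fraud detection service needs is the same in both the toy and the real scheme: the attestation must be verifiable by a third party and bound to the specific (job_id, cid) pair, so a cheating client cannot fabricate it without contacting the SP.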

Attestation verification

The SPARK Fraud Detection service has the following data fields available for each job (retrieval check):