The first step in the SPARK retrieval check workflow is selecting the (cid, address) pair that a given checker should test.
We have the following requirements:
(cid, address) is chosen
checker will be assigned to test the given (cid, address) job.
cid from the same address in a single measurement epoch. The sampling algorithm must be able to account for this.
(cid, address, checker) triple is sampled.
Additional thoughts:
❌ A hard-coded list of (CID, address) pairs to pick from. This list must be private to the SPARK Orchestrator (SPs must not be able to access it).
A random walk of IPNI advertisements, using DRAND as a source of randomness.
What would our IPNI query look like?
Can the IPNI team build & ship this API in time for us?
There is no verification that SPs are submitting all CIDs to IPNI, and such verification won’t be done by LabWeek.
→ Propose the new API - open a new GH issue in https://github.com/ipni/storetheindex/issues
A random walk of Filecoin storage deals
Algo:
The output of the StateMarketDeals method is over 3GB compressed and over 23GB decompressed. SPARK nodes cannot work with a dataset this large.
❌ For each SPARK deal, the party paying for the retrieval provides a public list of CIDs & addresses to check. (This will presumably be based on Filecoin storage deals made by the paying party.)
We don’t know that the advertisement is honest
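To make the "DRAND as a source of randomness" idea above more concrete, here is a minimal sketch of how a public randomness value could be turned into a deterministic pick from a list of candidates. It is an illustration only, not the SPARK design: the function name `pick_index`, the `checker_id` parameter, and the use of SHA-256 are assumptions, and the randomness is assumed to arrive as a hex string. It does not address how the candidate list itself is obtained.

```python
import hashlib


def pick_index(randomness_hex: str, checker_id: str, n: int) -> int:
    """Map (round randomness, checker id) to an index in [0, n).

    Hypothetical helper: anyone who knows the same inputs can
    recompute the result, which makes the selection auditable.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Hash the public randomness together with the checker identity so
    # that different checkers are steered to different candidates.
    seed = bytes.fromhex(randomness_hex) + checker_id.encode("utf-8")
    digest = hashlib.sha256(seed).digest()
    # A 256-bit digest makes the modulo bias negligible for realistic n.
    return int.from_bytes(digest, "big") % n


# Usage with placeholder values (not real DRAND output or real CIDs).
candidates = [("cid-a", "addr-1"), ("cid-b", "addr-2"), ("cid-c", "addr-3")]
randomness = "ab" * 32
job = candidates[pick_index(randomness, "checker-1", len(candidates))]
```

One caveat under these assumptions: if checkers can choose their own identifier freely, they could try many identifiers until they land on a preferred job, so the identifier would need to be assigned or otherwise constrained.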