Rationale: This began with my review of the Fil+ governance discussion about starting to require retrievability for Fil+ data:

Retrievability of Open data stored through Fil+ · filecoin-project/notary-governance · Discussion #883

A common theme is that retrievals carry a large maintenance and cost burden, aren’t reliable, and offer little reward.

We’ve worked on this a lot, but I think we can make things easier for SPs in various ways, potentially dramatically easier. Ultimately, the cost of retrieval should reduce to bandwidth usage and disk access, which can be offset through incentives. Right now, the largest cost is neither of these, but rather the setup and maintenance effort required to serve high-performance retrievals.

The ideas I think are most promising are highlighted in bold.

Pain points and potential solutions

HTTP

Traditionally, we’ve had problems with the reliability of data transfer over libp2p, and we’ve settled on HTTP as a protocol we can count on to be reliable and fast. However, while we’ve implemented booster-http (and it will soon be serving retrievals that are compatible with Lassie), I think more work is needed to get to mass adoption.

Indexing Pieces

Even with the LID deployment, indexing pieces remains a large maintenance burden for SPs. Scanning CARs to find blocks is a sizable processing operation that is highly error-prone. Storing and maintaining the indexes is also a cost and a source of errors.

Some approaches we could take to removing or limiting the indexing burden:

Storage And Caching

Currently, all data lives behind the lotus-miner barrier as whole pieces, with additional bytes interleaved on disk due to the way Filecoin pieces are constructed. This means there is a long code and data path to retrieve the bytes needed to serve any retrieval.
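The interleaved bytes come from Fr32 padding: every 127 bytes of payload (four 254-bit values) are stored as 128 bytes (four 32-byte field elements), with two extra bits inserted after each 254 payload bits. One consequence is that an arbitrary payload byte range cannot be read straight off disk; the read has to be widened to 128-byte chunk boundaries and then unpadded. The sketch shows only that offset arithmetic, assuming an unsealed copy of the piece is what gets read; the function and constant names are hypothetical, and the bit-level unpadding step itself is not shown.

```go
package main

import "fmt"

const (
	unpaddedChunk = 127 // payload bytes per Fr32 chunk (4 x 254 bits)
	paddedChunk   = 128 // on-disk bytes per Fr32 chunk (4 x 256 bits)
)

// paddedRange (hypothetical helper) returns the chunk-aligned [start, end)
// range of the padded piece that must be read, and then unpadded, to recover
// payload bytes [off, off+n).
func paddedRange(off, n uint64) (start, end uint64) {
	start = off / unpaddedChunk * paddedChunk
	if n == 0 {
		return start, start
	}
	lastChunk := (off + n + unpaddedChunk - 1) / unpaddedChunk // ceiling division
	return start, lastChunk * paddedChunk
}

// unpaddedSize converts a padded piece size to the payload capacity it holds.
func unpaddedSize(padded uint64) uint64 {
	return padded / paddedChunk * unpaddedChunk
}

func main() {
	// A 1 MiB read starting at payload offset 10,000,000.
	start, end := paddedRange(10_000_000, 1<<20)
	fmt.Printf("read padded bytes [%d, %d): %d bytes for a %d-byte request\n",
		start, end, end-start, 1<<20)
	fmt.Println("32 GiB piece payload capacity:", unpaddedSize(32<<30))
}
```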