Rationale: This began with my review of the Fil+ governance discussion about starting to require retrievability for Fil+ data:
Retrievability of Open data stored through Fil+ · filecoin-project/notary-governance · Discussion #883
A common theme is that retrievals carry a large maintenance and cost burden, aren’t reliable, and offer little reward.
We’ve worked on this a lot, but I think we can make things easier, potentially dramatically easier, for SPs in various ways. Ultimately, the fixed costs of retrieval should reduce to bandwidth usage and disk access, which can be offset through incentives. Right now, the largest cost is neither of these, but rather the setup and maintenance effort required to serve high-performance retrievals.
The ideas I think are most promising are highlighted in bold.
Pain points and potential solutions
HTTP
Traditionally we’ve had problems with the reliability of data transfer over libp2p. We’ve settled on HTTP as a protocol we can count on to be reliable and fast. However, while we’ve implemented booster-http (and it will soon be serving retrievals that are compatible with Lassie), I think more work is needed to get to mass adoption.
- We need HTTP retrieval monitoring in the Boost UI that is on par with or better than GraphSync monitoring
- We need a simpler, ideally “on by default” setup process for booster-http. This could include:
- A built-in Nginx reverse proxy configuration file, included in the boost repo, to put in front of booster-http
- An easy and clear path to get an SSL cert (see Let's Encrypt, ideally automated; a sketch of this setup follows this list)
- Possible alternative: provide a mechanism for running booster-http via libp2p
- Booster-http on by default when you make a new install of boost (not sure we can force it in as an upgrade, but IMHO we should get there) OR
- Docker/deployment setup to auto start a production deployment of Booster-http (also see Saturn idea under Incentives)
- Could also use the Saturn L1 Node nginx.conf as a base to build out a boost standard nginx.conf
- Difficulty: Easy
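To make the "on by default" goal concrete, here is a minimal sketch in Go of the glue the nginx config above would provide: a TLS-terminating reverse proxy that obtains Let's Encrypt certificates automatically and forwards everything to booster-http. The booster-http listen address, the domain name, and the certificate cache path are assumptions to be replaced with real deployment values; a production setup would more likely use the nginx config described above, but the point is how little is actually needed.

```go
package main

// Minimal sketch of a TLS-terminating reverse proxy in front of booster-http,
// standing in for the nginx + Let's Encrypt setup described above. The
// booster-http address, domain name, and cert cache path are assumptions.

import (
	"net/http"
	"net/http/httputil"
	"net/url"

	"golang.org/x/crypto/acme/autocert"
)

func main() {
	// Assumed local booster-http listen address; match it to your deployment.
	backend, err := url.Parse("http://127.0.0.1:7777")
	if err != nil {
		panic(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(backend)

	// autocert obtains and renews Let's Encrypt certificates automatically.
	m := &autocert.Manager{
		Prompt:     autocert.AcceptTOS,
		HostPolicy: autocert.HostWhitelist("retrievals.example-sp.net"), // hypothetical domain
		Cache:      autocert.DirCache("/var/lib/booster-proxy/certs"),
	}

	// Port 80 answers ACME HTTP-01 challenges and redirects everything else to HTTPS.
	go http.ListenAndServe(":80", m.HTTPHandler(nil))

	srv := &http.Server{
		Addr:      ":443",
		Handler:   proxy,
		TLSConfig: m.TLSConfig(),
	}
	// Certificates come from autocert, so no cert/key files are passed here.
	if err := srv.ListenAndServeTLS("", ""); err != nil {
		panic(err)
	}
}
```

Packaged as a container or systemd unit next to boost, something of this shape is essentially the auto-start production deployment the Docker bullet above describes.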
Indexing Pieces
Even with the LID deployment, maintaining indexes of pieces is a large maintenance burden for SPs. Scanning CARs to find blocks is a sizable processing operation that is highly error-prone. Holding and maintaining indexes is also a cost and a source of errors.
Some approaches we could take to removing or limiting the indexing burden:
- Index CAR files before publish?
- A simple trick that would enable SPs to reject badly packed CAR files (a sketch of such a pre-publish scan follows this list)
- Probably requires a deal protocol change so maybe not so great
- Difficulty: Medium
- Make indexes sharable or transferrable
- Often a data aggregator already has indexing information, and other SPs have it for highly replicated pieces
- A client could request an index from a different party, then use it to plan multiple requests across a multitude of SPs
- This is even more powerful if indexes are verifiable in some fashion
- DAG House has expressed interest in an indexing + CARs protocol
- Difficulty: Medium
- Reduce the number of CIDs we have to index
- Our current algorithms for chunking files into IPLD DAGs impose a fixed ~1MB block size
- This is a long-standing problem across the network, and the Iroh team's use of BLAKE3 hashes presents fairly well-developed solutions to this issue.
- It would allow a single CID to represent a multi-gigabyte video, resulting in roughly 1000x smaller indexes.
- It also gets us much closer to flat-file storage while still allowing arbitrary byte-range requests that are verifiable (we would still need to hold parts of the merkle tree)
- We may need to store some intermediate hashing data, but the process for collecting this data is NOT error prone in the way CAR scanning is.
- Also a big win in scalability for IPNI
- Difficulty: Large
- Explore sub-piece retrieval more in depth
- The most reliable form of retrieval we can build is piece retrieval over HTTP, ideally with verifiable range requests through sub-piece inclusion proofs
- This is the only form of retrieval that is truly resilient regardless of how the client packs the data, because we have already verified PieceCID
- Rather than holding CAR indexes, maybe we should simply hold enough of the piece tree to serve arbitrary range requests: store only enough layers of the piece tree to make calculating the lowest layers efficient at runtime (for example, store the tree down to 1MB nodes and compute the remaining hash subtrees on the fly; see the storage math sketched after this list)
- Might be more powerful with newer, faster proofs (particularly vector commitments, if that's something we're going to do, though that moves us much farther out)
- Difficulty: Medium/Large
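To illustrate the "index CAR files before publish" idea above, here is a minimal Go sketch that walks a CAR, recomputes each block's CID from its bytes, and builds a simple CID-to-size index, rejecting the file on the first mismatch. It assumes the go-car v2 and go-cid libraries; where this runs (client side before the deal, or SP side on ingest) and whether it requires a deal protocol change is the open question, not the scan itself.

```go
package main

// Sketch of a pre-publish CAR scan: verify every block's CID against its
// bytes and build an in-memory CID -> size index, so a badly packed CAR can
// be rejected before a deal is published rather than at retrieval time.

import (
	"fmt"
	"io"
	"os"

	"github.com/ipfs/go-cid"
	carv2 "github.com/ipld/go-car/v2"
)

func scanCAR(path string) (map[cid.Cid]int, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	br, err := carv2.NewBlockReader(f)
	if err != nil {
		return nil, fmt.Errorf("not a readable CAR: %w", err)
	}

	index := make(map[cid.Cid]int)
	for {
		blk, err := br.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, fmt.Errorf("truncated or corrupt CAR section: %w", err)
		}
		// Recompute the CID from the block bytes; a mismatch means the CAR
		// was badly packed and should be rejected.
		chk, err := blk.Cid().Prefix().Sum(blk.RawData())
		if err != nil {
			return nil, err
		}
		if !chk.Equals(blk.Cid()) {
			return nil, fmt.Errorf("block %s does not match its data", blk.Cid())
		}
		index[blk.Cid()] = len(blk.RawData())
	}
	return index, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: carscan <file.car>")
		os.Exit(1)
	}
	idx, err := scanCAR(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "reject:", err)
		os.Exit(1)
	}
	fmt.Printf("ok: %d indexed blocks\n", len(idx))
}
```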
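To put numbers on the "hold enough of the piece tree" idea, here is a back-of-the-envelope Go sketch of how much hash data must be kept if the tree is truncated at nodes covering a given span of the padded piece, with the layers below rebuilt on the fly per range request. The 32-byte node size matches Filecoin's piece commitment trees (truncated SHA-256); the cutoff values are illustrative.

```go
package main

// Back-of-the-envelope cost of truncating a piece merkle tree: store every
// node from the root down to the layer whose nodes each cover `cutoff` bytes,
// and rebuild the layers below at request time from at most `cutoff` bytes of
// sector data. Assumes 32-byte nodes and a power-of-two padded piece size.

import "fmt"

const nodeSize = 32 // bytes per merkle node in Filecoin piece trees

// storedTreeBytes returns how many bytes of hashes must be kept if the tree
// is truncated at nodes covering `cutoff` bytes of the padded piece.
func storedTreeBytes(pieceSize, cutoff uint64) uint64 {
	total := uint64(0)
	// Walk from the cutoff layer up to the root; each layer halves in size.
	for nodes := pieceSize / cutoff; nodes >= 1; nodes /= 2 {
		total += nodes * nodeSize
		if nodes == 1 {
			break
		}
	}
	return total
}

func main() {
	piece := uint64(32) << 30 // a 32GiB padded piece
	for _, cutoff := range []uint64{1 << 20, 128 << 10} {
		fmt.Printf("cutoff %8d bytes -> %8d bytes of stored tree per 32GiB piece\n",
			cutoff, storedTreeBytes(piece, cutoff))
	}
	// With a 1MiB cutoff this is roughly 2MiB of stored hashes per 32GiB
	// piece, versus roughly 64GiB for the full tree down to 32-byte leaves.
}
```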
Storage And Caching
Currently, all data lives behind the lotus-miner barrier, in whole pieces, with additional bytes interleaved on disk due to the way Filecoin pieces are constructed. This means there is a long code and data path to retrieve any bytes needed to serve a retrieval.
- Use an Nginx cache
- This folds into the default nginx setup work mentioned above
- For /ipfs/ payload retrievals, a small-to-medium-sized cache can probably serve a high percentage of requests, based on existing experience with Saturn and Lassie
- Need to validate with concrete data
- This would remove the entire code and data-fetching path through boost and lotus-miner for cached requests (see the sketch after this list)
- Difficulty: Easy
- Make data more accessible during the course of a lotus-miner refactor
- Not clear where the refactor currently stands
- Seems like a secondary concern with existing Lotus miner work
- Difficulty: Hard
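As a sketch of the cache idea above (in practice this would more likely be nginx's proxy_cache wired into the default config discussed under HTTP), here is a small Go front end that serves repeat /ipfs/ requests from a local disk cache and only falls through to booster-http on a miss, so cache hits never touch the boost or lotus-miner data path. The backend address, listen port, and cache directory are assumptions, and there is no eviction or header forwarding; a real deployment would add size limits and TTLs.

```go
package main

// Sketch of the caching layer: serve repeat /ipfs/ requests from a local
// disk cache; on a miss, fetch from booster-http, stream to the client, and
// commit the body to the cache. Headers and eviction are omitted for brevity.

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

type diskCache struct {
	dir     string
	backend string // base URL of booster-http (assumed)
}

func (c *diskCache) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Only GET requests for immutable /ipfs/ payloads are cacheable here.
	if r.Method != http.MethodGet || !strings.HasPrefix(r.URL.Path, "/ipfs/") {
		http.NotFound(w, r)
		return
	}
	key := sha256.Sum256([]byte(r.URL.RequestURI()))
	path := filepath.Join(c.dir, hex.EncodeToString(key[:]))

	if f, err := os.Open(path); err == nil { // cache hit: boost is never involved
		defer f.Close()
		io.Copy(w, f)
		return
	}

	resp, err := http.Get(c.backend + r.URL.RequestURI()) // cache miss: ask booster-http
	if err != nil {
		http.Error(w, "upstream fetch failed", http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		http.Error(w, "upstream error", http.StatusBadGateway)
		return
	}

	tmp, err := os.CreateTemp(c.dir, "partial-*")
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	// Stream the body to the client and the cache file at the same time,
	// committing the cache entry only if the transfer completed.
	if _, err := io.Copy(io.MultiWriter(w, tmp), resp.Body); err == nil {
		tmp.Close()
		os.Rename(tmp.Name(), path)
	} else {
		tmp.Close()
		os.Remove(tmp.Name())
	}
}

func main() {
	cache := &diskCache{dir: "/var/cache/booster-http", backend: "http://127.0.0.1:7777"}
	os.MkdirAll(cache.dir, 0o755)
	http.ListenAndServe(":8080", cache)
}
```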