Why Not Both? Packing Content for IPLD vs Piece (IntactPack)

TL;DR

Instead of packing and storing file / blob data like this:

[Figure]

Pack and store it like this:

[Figure]

Background

There currently exist two views of, and two separate uses for, content stored on Filecoin: an IPLD-block-focused view, and a view of Filecoin as an opaque block device. These act as lenses on the nature of data stored on Filecoin, and they also shape the technology choices made further up the stack.

Vision 1: IPLD block focused

Data preparation: content is packed into CARs, blocks are kept at or below the typical ~1MB maximum for compatibility and incremental verifiability, and file data is encoded as UnixFS (or similar)
Retrievals: Bitswap (per-block) or HTTP Trustless (per-block or a broad selector description for an IPLD DAG)
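
As a rough illustration of the block-focused preparation step, the sketch below splits a blob into chunks no larger than 1 MiB and pairs each chunk with its SHA-256 digest. The names `Block`, `chunk_blob` and `verify_block` are hypothetical, the digest only stands in for a real CID, and no UnixFS or CAR encoding is attempted.

```python
import hashlib
from dataclasses import dataclass
from typing import List

MAX_BLOCK_SIZE = 1 << 20  # 1 MiB, taken as the "typical ~1MB maximum"


@dataclass(frozen=True)
class Block:
    digest: bytes   # SHA-256 of the payload; a stand-in for a real CID
    payload: bytes


def chunk_blob(data: bytes, max_size: int = MAX_BLOCK_SIZE) -> List[Block]:
    """Split data into consecutive chunks of at most max_size bytes."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    blocks = []
    for start in range(0, len(data), max_size):
        payload = data[start:start + max_size]
        blocks.append(Block(hashlib.sha256(payload).digest(), payload))
    return blocks


def verify_block(block: Block) -> bool:
    """Check one block in isolation, without needing the rest of the blob."""
    return hashlib.sha256(block.payload).digest() == block.digest


# Usage: a 2.5 MiB blob becomes three independently verifiable blocks.
blob = bytes(2 * MAX_BLOCK_SIZE + MAX_BLOCK_SIZE // 2)
blocks = chunk_blob(blob)
```

Because each block carries its own digest, a receiver can check blocks one at a time as they arrive, which is the incremental verifiability the size limit is meant to preserve.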

Vision 2: Filecoin is an opaque block-device

Data preparation: however you like, although it is currently typical to still pack into standard CARs. It is very important for clients to store metadata about the layout of their pieces so they can fetch the blocks or ranges they care about straight out of pieces.
Retrievals: HTTP Piece retrieval with range requests to slice and dice
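
A minimal sketch of the client-side layout metadata described above, assuming the client records a byte offset and length for each item it packs into a piece. `Extent`, `range_header` and `slice_piece` are hypothetical names, and the in-memory `piece` stands in for bytes held by a storage provider.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Extent:
    offset: int  # byte offset of the item within the piece
    length: int  # number of bytes the item occupies


def range_header(extent: Extent) -> str:
    """Build an HTTP Range header value; both ends of the range are inclusive."""
    if extent.offset < 0 or extent.length <= 0:
        raise ValueError("extent must have a non-negative offset and positive length")
    return f"bytes={extent.offset}-{extent.offset + extent.length - 1}"


def slice_piece(piece: bytes, extent: Extent) -> bytes:
    """Local equivalent of what a range request would return."""
    return piece[extent.offset:extent.offset + extent.length]


# Usage: the client keeps this mapping alongside the piece it stored.
piece = b"HEADER" + b"hello world" + b"PADDING"
layout: Dict[str, Extent] = {"greeting.txt": Extent(offset=6, length=11)}

header_value = range_header(layout["greeting.txt"])
content = slice_piece(piece, layout["greeting.txt"])
```

With this kind of index the provider needs no knowledge of what is inside the piece; all of the structure lives in metadata the client holds.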

The competing visions of Filecoin as an IPLD-block storage layer versus an opaque byte store are older than the network. Opaque bytes make everything on the Storage Provider side significantly simpler and cheaper. Yet deep IPLD integration provides significant value to the network (making this case is beyond the scope of this document, but it is a strongly held conviction for some).

Planning for rearchitecting miner and markets software takes the cost-and-complexity view that the IPLD layer is a value-add and should be treated as an optional extra, to be added if and when it is needed or demanded. Storing and proving pieces is the critical activity. Retrieving pieces is significantly simpler and more efficient (in many definitions of “efficient”), while indexing at the IPLD-block level is complexity that has been persistently difficult to solve and is one of the largest costs and risks for scalability.

Finding “Why Not Both?” Approaches

The ideal for storage providers is a choose-your-complexity model, where the simplest form of the software stack has the lowest complexity and cost, but complexity can be added to increase the value-add to a storage provider’s customers where it is needed.

CommPact is one attempt at describing a world where piece data can also be viewed through an IPLD lens, by re-purposing the piece commitment proving tree over the raw piece data. This path is one form of in-between (“why not both?”) approach. In this form the data is piece-first: it is prepared with the perspective of being stored in a piece, and the IPLD lens can be applied after the fact as required (i.e. the IPLD lens is optional).
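
To make the idea of a proving tree over raw piece data more concrete, here is a deliberately simplified sketch of a binary Merkle tree over 32-byte leaves. It is not the real piece commitment calculation: it skips Fr32 padding, and clearing the top two bits of each digest is only modeled on the 254-bit truncation Filecoin uses. `hash_pair` and `build_tree` are hypothetical names.

```python
import hashlib
from typing import List

NODE_SIZE = 32  # every leaf and every interior node is 32 bytes


def hash_pair(left: bytes, right: bytes) -> bytes:
    """Hash two child nodes into their parent node."""
    digest = bytearray(hashlib.sha256(left + right).digest())
    digest[31] &= 0x3F  # assumption: clear the top two bits, leaving 254 bits
    return bytes(digest)


def build_tree(data: bytes) -> List[List[bytes]]:
    """Return every level of the tree, leaves first and root last."""
    if not data:
        raise ValueError("data must not be empty")
    leaves = [
        data[i:i + NODE_SIZE].ljust(NODE_SIZE, b"\x00")
        for i in range(0, len(data), NODE_SIZE)
    ]
    # Pad with zero leaves until the leaf count is a power of two (at least 2).
    while len(leaves) < 2 or len(leaves) & (len(leaves) - 1):
        leaves.append(bytes(NODE_SIZE))
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hash_pair(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


# Usage: 96 bytes give 3 data leaves, padded to 4, then 2 parents, then 1 root.
levels = build_tree(bytes(range(96)))
root = levels[-1][0]
```

One way to read the IPLD lens here is that each interior node is a small block whose content is its two child hashes, so the tree that already exists for proving could also be walked as a DAG down to the raw bytes, without preparing the data differently.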