Maybe the transport protocol of our dreams was right there all along: the IPFS gateway.

StarGate is a specification that extends the IPFS gateway to support trustless, multi-peer data transfer for fairly complex queries.

  1. It defines a new response format, identified by the MIME type application/vnd.ipld.car+stargate, which is passed in the Accept header to indicate a StarGate request.
  2. It adds query parameters that configure trustless verification and multi-peer retrievals.
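
As a rough illustration of the first point, here is a minimal sketch of how a client might signal a StarGate request. The gateway host, CID, and helper name are placeholders made up for this example. No query parameters are shown, since the text above does not name them.

```python
from urllib.request import Request

# MIME type taken from the spec text above.
STARGATE_MIME = "application/vnd.ipld.car+stargate"


def build_stargate_request(gateway: str, cid: str, path: str = "") -> Request:
    """Build (but do not send) a GET request that asks for a StarGate response.

    `gateway`, `cid`, and `path` are caller-supplied; nothing here is
    validated against a real gateway implementation.
    """
    url = f"{gateway.rstrip('/')}/ipfs/{cid}"
    if path:
        url += "/" + path.strip("/")
    # The Accept header is what marks this as a StarGate request.
    return Request(url, headers={"Accept": STARGATE_MIME}, method="GET")


# Hypothetical host and CID, for illustration only.
req = build_stargate_request("https://gateway.example", "bafyExampleCid", "docs/readme.md")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

A gateway that does not understand the MIME type would presumably ignore it or answer with a different content type, so a real client would also want to check the response's Content-Type before parsing.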

Why HTTP?

@Marten Seemann already wrote an awesome answer: https://www.notion.so/Transferring-Content-Addressed-Data-over-HTTP-e0cb05500e8446519f58fdcc35b88b1b

Why Based On The Gateway?

I started designing an HTTP protocol based on certain assumptions:

  1. Having worked with selectors for three years, I think querying by path and “the whole dag” is relatively straightforward, but general-case selectors are very complicated. Moreover, we use path and “whole dag” for 99% of cases.
  2. I want reasonable support for HTTP caching.
  3. Verification and multi-peer optimization are easier if you limit the format of your data. General-case “IPLD dag” verification and multi-peer retrieval are complicated. Working only with UnixFS is more manageable.

As I started working through this, I realized “what fits in a URL” is an excellent constraint to achieve all of these, and then I realized people have been working on URL-ifying IPFS for years, in the form of gateways.

More importantly, gateways are the most widely deployed mechanism for transferring IPFS data (bigger than Bitswap?). Having a gateway implementation is often table stakes for a new IPFS language implementation. If the Gateway API can be extended to work over libp2p connections (already possible because of the libp2p + HTTP spec) AND be an effective mechanism for trustless data transfer, that feels like an ideal path to wide deployment and usage.

StarGate CAR Format

The core of StarGate is a new specification for a specialized CAR file, which is also a valid general CARv1 file.

GET /ipfs/{cid}[/{path}][?{params}]

A StarGate CAR should specify {cid} as the root CID in the CAR header.
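
To make the request shape concrete, here is a minimal sketch of how a server might split an incoming URL into the root CID, the path segments, and the query parameters. The function name is hypothetical, and the `x=1` parameter in the usage line is a stand-in rather than a real StarGate parameter.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_gateway_url(url: str):
    """Split /ipfs/{cid}[/{path}][?{params}] into (cid, segments, params)."""
    parts = urlsplit(url)
    pieces = [p for p in parts.path.split("/") if p]
    if len(pieces) < 2 or pieces[0] != "ipfs":
        raise ValueError("expected /ipfs/{cid}[/{path}]")
    cid = pieces[1]                                # root CID for the CAR header
    segments = [unquote(p) for p in pieces[2:]]    # path segments to resolve, in order
    params = parse_qs(parts.query)                 # each value is a list of strings
    return cid, segments, params


# "x=1" is a made-up parameter, used only to show the parsing.
cid, segments, params = parse_gateway_url("/ipfs/bafyRoot/a/b.txt?x=1")
print(cid, segments, params)
```

The `cid` returned here is the value that would go into the CAR header as the root, and `segments` is what the Path messages described below have to cover.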

Following the CAR header, the first block is a StarGate message.

The StarGate message format is as follows:


type StarGateMessage struct {
  Kind Kind (rename "knd")
  Path nullable Path (rename "pth")
  DAG nullable DAG (rename "dag")
} representation map

type Kind enum {
  # Path indicates a pathing sequence
  | Path ("p")
  # DAG indicates a DAG block
  | DAG ("d")
} representation string

type Path struct {
  # names of the URL path segments covered by this message
  Segments [String] (rename "seg")
  # CIDs required, in order, to verify this segment of the path
  Blocks BlockMetadata (rename "blks")
} representation map

type DAG struct {
  Ordering Ordering (rename "ord")
  Blocks BlockMetadata (rename "blks")
} representation map

type Ordering enum {
  # Depthfirst indicates blocks will be transmitted depth first
  | DepthFirst ("d")
  # BreadthFirst indicates blocks will be transmitted breadth first
  | BreadthFirst ("b")
} representation string

# Metadata for each "link" in the DAG being communicated, each block gets one of
# these and missing blocks also get one
type BlockMetadatum struct {
  Link Link
  Status BlockStatus
} representation tuple

type BlockMetadata [BlockMetadatum]

type BlockStatus enum {
   # Present means the linked block was present on this machine, and is included
   # in this message
   | Present             ("p")
   # NotSent means the linked block was present on this machine, but not sent
   # - it needs to be fetched elsewhere
   | NotSent             ("n")
   # Missing means I did not have the linked block, so I skipped over this part
   # of the traversal
   | Missing             ("m")
   # Duplicate means the linked block was encountered, but we already have traversed it
   # so we're not traversing it again -- the block has likely already been transmitted
   | Duplicate           ("d")
} representation string
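
To show what the representation strategies in the schema imply, here is a minimal sketch that builds both kinds of message in their representation form: renamed map keys, string enums, and BlockMetadatum as a two-element tuple. It rests on several assumptions. Links are shown as plain placeholder strings rather than real CIDs, JSON is used only for display, and the helper names are invented for this example.

```python
import json

# Enum representations copied from the schema above.
STATUSES = {"p", "n", "m", "d"}   # Present, NotSent, Missing, Duplicate
ORDERINGS = {"d", "b"}            # DepthFirst, BreadthFirst


def block_metadatum(link: str, status: str) -> list:
    """BlockMetadatum uses tuple representation: [Link, Status]."""
    if status not in STATUSES:
        raise ValueError(f"unknown block status: {status!r}")
    return [link, status]


def path_message(segments, blocks) -> dict:
    """A StarGateMessage of kind Path; the nullable DAG field is null."""
    return {"knd": "p", "pth": {"seg": list(segments), "blks": list(blocks)}, "dag": None}


def dag_message(ordering: str, blocks) -> dict:
    """A StarGateMessage of kind DAG; the nullable Path field is null."""
    if ordering not in ORDERINGS:
        raise ValueError(f"unknown ordering: {ordering!r}")
    return {"knd": "d", "pth": None, "dag": {"ord": ordering, "blks": list(blocks)}}


# Placeholder link names, not real CIDs.
msg_path = path_message(["someFile"], [block_metadatum("someCid", "p")])
msg_dag = dag_message("d", [block_metadatum("fileRootCid", "p"),
                            block_metadatum("chunkCid", "n")])
print(json.dumps(msg_path))
print(json.dumps(msg_dag))
```

Because Path and DAG are declared nullable rather than optional, both keys appear in every message, with null for the one that does not apply.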

If there are URL path segments after the CID, the first message will be a Path StarGate message. Following the path message there will be a data block for every block in the BlockMetadata that has a status of Present.
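
A reader therefore needs to know how many data blocks to expect after each message. Here is a minimal sketch, assuming the message has already been decoded into a plain dict keyed by the renamed fields from the schema ("knd", "pth", "dag", "blks"). The function name and link strings are placeholders.

```python
def present_links(message: dict) -> list:
    """Return the links whose blocks should follow this message in the CAR.

    Only entries with status "p" (Present) are followed by a data block;
    NotSent, Missing, and Duplicate entries are not.
    """
    body = message["pth"] if message["knd"] == "p" else message["dag"]
    return [link for link, status in body["blks"] if status == "p"]


# Hypothetical decoded Path message; links are placeholder strings.
example = {
    "knd": "p",
    "pth": {
        "seg": ["someFile"],
        "blks": [["cidA", "p"], ["cidB", "n"], ["cidC", "p"], ["cidA", "d"]],
    },
    "dag": None,
}
expected = present_links(example)
print(expected)  # → ['cidA', 'cidC']
```

The reader would then consume exactly `len(expected)` blocks, checking each one against the corresponding link, before treating the next block as a StarGate message.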

If the Segments part of Path message contained the last elements of the URL path, the following message will be a StarGate DAG message.

Otherwise, there will be additional Path messages until the end of the URL path.

After the path, there will be one or more StarGate DAG messages.

A StarGate DAG message simply gives a traversal order and BlockMetadata, followed by all present blocks, similar to the path.

The splitting of Paths and DAGs into one or more messages is defined by the application protocol (i.e. /ipfs, in the parlance of UnixFS).

Important note: CIDs in the CID chain are those required to VERIFY the path, but they do not include the CID or block specified at that path itself. So for /ipfs/someCid/someFile, where someCid points to a UnixFS HAMT directory, the path segment header for someFile would contain someCid as well as the CIDs for deeper levels of the HAMT, up to the leaf node that points to the CID for someFile. However, it would not contain the actual CID for someFile itself.