Repo: https://github.com/guillaumemichel/kubo-routing-measurements → measurements folder

DISCLAIMER

This experiment isn’t representative and shouldn’t be taken as a reference. Only 30% of the requested CIDs are root CIDs; the remaining 70% are child CIDs, which introduces a large bias in the measurements. Plots that filter out non-root CID requests are available in the Python notebook in the GitHub repository https://github.com/guillaumemichel/kubo-routing-measurements.

The problem with this measurement is that when a requested CID is found, kubo follows up by requesting its child CIDs. The child CIDs are (almost) always stored on the same host that provided the root CID, so Bitswap succeeds easily on them. This is where the bias comes from, and it is the main reason why we are performing a new measurement. We expect the actual success rate to be lower than the one shown in this document.

Methodology

Codebase

We merged the [github.com/ipfs/go-bitswap](<http://github.com/ipfs/go-bitswap>) repo into [github.com/ipfs/kubo](<http://github.com/ipfs/kubo>) to have all the code in one place. This fork of kubo lives at https://github.com/guillaumemichel/kubo-routing-measurements. kubo had to be slightly modified to set a custom DefaultProviderSearchDelay in core/node/bitswap.go. We also modified go-bitswap (now kubo/bitswap) to emit more logs at the DEBUG level.

Experiment

We set a DefaultProviderSearchDelay of 5 seconds. This means that Bitswap has 5 seconds to discover the CIDs it is looking for before the node asks the DHT. We picked a high delay to show the time distribution of Bitswap requests between 0 and 5 seconds.

We have a very large list of CIDs gathered on 2022-08-10 by listening for WANT-HAVE messages from Bitswap. The list of CIDs has been randomized to avoid sequentially requesting CIDs that are stored on the same host. We run a slightly modified kubo node and request these CIDs one after another, with a total timeout of 10 seconds per CID (5 for Bitswap, 5 for the DHT).
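The randomization step and the timeout budget described above can be sketched as follows. This is only an illustration: the function name `randomized_order`, the fixed `seed`, and the constant names are assumptions made here, not identifiers from the measurement repository.

```python
import random
from typing import List, Sequence

# Timeout budget from the experiment description (seconds).
BITSWAP_BUDGET_S = 5   # Bitswap-only discovery window
DHT_BUDGET_S = 5       # additional time once the DHT is queried
TOTAL_TIMEOUT_S = BITSWAP_BUDGET_S + DHT_BUDGET_S


def randomized_order(cids: Sequence[str], seed: int = 0) -> List[str]:
    """Return a shuffled copy of the CID list.

    Shuffling breaks up runs of CIDs that were observed together and are
    therefore likely to be stored on the same host. The seed is a
    hypothetical parameter, used here only to make the order reproducible.
    """
    shuffled = list(cids)                  # leave the input untouched
    random.Random(seed).shuffle(shuffled)  # in-place shuffle of the copy
    return shuffled


if __name__ == "__main__":
    example = ["cid-a", "cid-b", "cid-c", "cid-d"]  # placeholder strings, not real CIDs
    print(randomized_order(example), TOTAL_TIMEOUT_S)
```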

If Bitswap is able to discover the content, we differentiate between a remote peer sending a HAVE message (replying to a WANT-HAVE) and a peer sending the BLOCK directly (replying to a WANT-BLOCK) without having sent a HAVE first. We record how long it takes for a peer to provide either a HAVE or a BLOCK message, as well as the peer ID of the peer providing it.

If Bitswap cannot find where the content is stored within 5 seconds, we request the content from the DHT. If we find a provider record in the DHT, we consider that Bitswap failed to discover the content. If there is no provider record matching the CID, the content is treated as unavailable and the request is not counted as a Bitswap failure.
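The decision rules of the experiment can be summarized in a small classification function. This is a minimal sketch under stated assumptions: the `Outcome` enum, the `classify` function, and its parameters are hypothetical names introduced for illustration, and the real measurement derives this information from kubo DEBUG logs rather than from function arguments.

```python
from enum import Enum
from typing import Optional

BITSWAP_WINDOW_S = 5.0  # DefaultProviderSearchDelay used in the experiment


class Outcome(Enum):
    BITSWAP_HAVE = "bitswap_have"            # first answer was a HAVE
    BITSWAP_BLOCK = "bitswap_block"          # first answer was the BLOCK itself
    DHT = "dht"                              # provider record found via the DHT
    NO_PROVIDER_RECORD = "no_provider_record"


def classify(first_msg: Optional[str],
             elapsed_s: Optional[float],
             dht_record_found: bool) -> Outcome:
    """Classify one CID request.

    first_msg: "HAVE" or "BLOCK" if a peer answered over Bitswap, else None.
    elapsed_s: seconds until that first answer, else None.
    dht_record_found: whether the DHT lookup returned a provider record.
    """
    answered = first_msg is not None and elapsed_s is not None
    if answered and elapsed_s < BITSWAP_WINDOW_S:
        if first_msg == "HAVE":
            return Outcome.BITSWAP_HAVE
        if first_msg == "BLOCK":
            return Outcome.BITSWAP_BLOCK
        raise ValueError(f"unexpected Bitswap message type: {first_msg!r}")
    # Bitswap did not answer within the window: fall back to the DHT result.
    return Outcome.DHT if dht_record_found else Outcome.NO_PROVIDER_RECORD


if __name__ == "__main__":
    print(classify("HAVE", 0.8, False))
    print(classify(None, None, True))
```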

Results

Run 1

The experiment was run on my local machine, connected over WiFi to my home network, all traffic going through a commercial VPN (ProtonVPN), on 2022-09-01.

Bitswap success rate: 97.499% (25023 blocks directly provided by Bitswap, 642 by the DHT, 25647 CIDs fetched in total, requested CIDs: 31502)
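The reported rate appears consistent with dividing the blocks provided by Bitswap by the sum of the blocks provided by Bitswap and by the DHT. The sketch below reproduces that arithmetic; the function name is hypothetical and the formula is inferred from the numbers above, not taken from the notebook. Note that this denominator (25023 + 642 = 25665) is slightly different from the 25647 total quoted in the same line.

```python
def bitswap_success_rate(bitswap_blocks: int, dht_blocks: int) -> float:
    """Share of resolved requests that Bitswap served without the DHT.

    Assumed formula: bitswap / (bitswap + dht). Requests that neither
    Bitswap nor the DHT resolved are left out of the denominator.
    """
    resolved = bitswap_blocks + dht_blocks
    if resolved == 0:
        raise ValueError("no resolved requests")
    return bitswap_blocks / resolved


if __name__ == "__main__":
    rate = bitswap_success_rate(25023, 642)
    print(f"{rate:.3%}")  # prints 97.499%
```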

464 peers provided 25023 blocks directly through Bitswap

p50:  7 peers provide 50% of the blocks
p90:  79 peers provide 90% of the blocks
p99:  284 peers provide 99% of the blocks
p100: 464 peers provide 100% of the blocks
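One way to obtain figures of this kind is to sort peers by the number of blocks they provided and count how many of the top peers are needed to reach a given share of all blocks. The following is a sketch of that computation on made-up toy data; `peers_for_share` is a hypothetical helper, and the notebook may compute these values differently.

```python
from typing import Iterable


def peers_for_share(blocks_per_peer: Iterable[int], share_percent: int) -> int:
    """Smallest number of top providers that together cover share_percent
    of all blocks. Integer arithmetic avoids floating-point edge cases."""
    counts = sorted(blocks_per_peer, reverse=True)  # largest providers first
    total = sum(counts)
    covered = 0
    for n_peers, count in enumerate(counts, start=1):
        covered += count
        if covered * 100 >= share_percent * total:
            return n_peers
    return len(counts)  # only reached for an empty input


if __name__ == "__main__":
    toy = [50, 30, 10, 5, 5]  # toy data, not the measured distribution
    for p in (50, 90, 99, 100):
        print(f"p{p}: {peers_for_share(toy, p)} peers provide {p}% of the blocks")
```

With the measured per-peer block counts as input, the same function would be called with 50, 90, 99 and 100 to produce the four lines above.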

81 peers found by the DHT provided 642 blocks

1 peer was looked up 137 times by the DHT