Hi everyone,

I’m Dennis from ProbeLab, the network measurement and protocol benchmarking team that spun out of Protocol Labs. So far, the team has focused on developing metrics for IPFS (see https://probelab.io), but we recently started looking into other libp2p-based networks. We extended our DHT crawler, which has powered our IPFS metrics for over a year, to also support Ethereum’s DiscV5 DHT. In this post I want to share some findings and gather feedback. You can find the source code here:

https://github.com/dennis-tra/nebula

Related Work

There are already other great crawlers out there. A non-comprehensive list:

What’s new?

The methodology Nebula uses to discover peers is different from, and (we believe) more accurate than, the one used by the above crawlers, which all share the same approach: they periodically generate a random key, do a DHT lookup, and consequently come across random peers in the network.

Nebula, on the other hand, employs a structured approach to peer discovery by enumerating the DHT. It starts by asking the DiscV5 bootstrap peers for all of the peers in their routing tables. To do that, Nebula issues multiple FindNode queries with decreasing distance to the bootstrapper’s peer ID. This effectively drains all peers a bootstrapper has in its routing table. Nebula then recursively continues with a random peer that was returned from the bootstrapper and employs the same technique to get its routing table entries. Some peers may return peers that Nebula has already contacted, so these won’t be contacted again. The crawl stops when Nebula has tried to contact all peers it has seen.
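To make the enumeration idea more concrete, here is a minimal Python sketch of it against a simulated network. This is not Nebula's actual code (Nebula is written in Go), and every name in it (`SimulatedPeer`, `drain_routing_table`, `crawl`, `build_network`) is hypothetical. It assumes 16-bit node IDs instead of the 256-bit IDs used by DiscV5, models each routing table as buckets keyed by logarithmic XOR distance, and processes discovered peers in FIFO order rather than picking a random one; the order does not change which peers end up being visited.

```python
import random
from collections import deque

ID_BITS = 16  # assumption: shrunk ID space; real DiscV5 node IDs are 256 bits


def log_distance(a: int, b: int) -> int:
    """Logarithmic XOR distance: bit length of a XOR b (0 for identical IDs)."""
    return (a ^ b).bit_length()


class SimulatedPeer:
    """Stand-in for a remote node that answers FindNode-style queries."""

    def __init__(self, node_id, known_ids):
        self.node_id = node_id
        self.buckets = {}  # log distance -> list of known peer IDs
        for other in known_ids:
            if other != node_id:
                bucket = self.buckets.setdefault(log_distance(node_id, other), [])
                bucket.append(other)

    def find_node(self, distance):
        """Return the peers this node knows at exactly the given log distance."""
        return list(self.buckets.get(distance, []))


def drain_routing_table(peer):
    """Query every distance from farthest to closest to collect all entries."""
    found = set()
    for distance in range(ID_BITS, 0, -1):
        found.update(peer.find_node(distance))
    return found


def crawl(network, bootstrap_ids):
    """Enumerate the network; returns peer ID -> neighbours (None if unreachable)."""
    seen = set(bootstrap_ids)
    queue = deque(bootstrap_ids)
    crawled = {}
    while queue:  # stops once every peer we have seen has been tried
        peer_id = queue.popleft()
        peer = network.get(peer_id)
        if peer is None:  # peer did not respond
            crawled[peer_id] = None
            continue
        neighbours = drain_routing_table(peer)
        crawled[peer_id] = neighbours
        for neighbour in neighbours - seen:  # skip peers we already know about
            seen.add(neighbour)
            queue.append(neighbour)
    return crawled


def build_network(size, table_size, seed):
    """Build a connected toy network (a ring link per peer plus random entries)."""
    rng = random.Random(seed)
    ids = rng.sample(range(1, 2**ID_BITS), size)
    network = {}
    for i, node_id in enumerate(ids):
        known = {ids[(i + 1) % size]} | set(rng.sample(ids, table_size))
        network[node_id] = SimulatedPeer(node_id, known)
    return network
```

Calling `crawl(build_network(200, 8, 1), [some_bootstrap_id])` visits every peer in the toy network, because each peer's routing table is drained completely before the crawler moves on.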

Besides FindNode queries, Nebula also tries to establish a libp2p connection to a remote peer it has found. This allows Nebula to gather information like agent version and supported protocols. Further, Nebula parses the ENR and extracts fork and attnets information and more.
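As a rough illustration of the ENR parsing step, the sketch below decodes the two consensus-layer fields mentioned above from their raw byte values. It is not taken from Nebula; the function names are hypothetical. It assumes the layout from the consensus-layer networking spec: `eth2` is an SSZ-encoded `ENRForkID` (4-byte fork digest, 4-byte next fork version, little-endian 8-byte next fork epoch), and `attnets` is a 64-bit SSZ bitvector in which bit `i` lives in byte `i // 8` at bit position `i % 8`. Extracting those values from the RLP-encoded record itself is left out.

```python
import struct
from typing import List, NamedTuple


class ENRForkID(NamedTuple):
    fork_digest: bytes
    next_fork_version: bytes
    next_fork_epoch: int


def parse_eth2(value: bytes) -> ENRForkID:
    """Decode the 16-byte SSZ ENRForkID container stored under the `eth2` key."""
    if len(value) != 16:
        raise ValueError(f"expected 16 bytes, got {len(value)}")
    # "<" = little-endian without padding: two 4-byte strings and one uint64
    digest, next_version, next_epoch = struct.unpack("<4s4sQ", value)
    return ENRForkID(digest, next_version, next_epoch)


def parse_attnets(value: bytes) -> List[int]:
    """Return the attestation subnet indices whose bit is set in `attnets`."""
    if len(value) != 8:
        raise ValueError(f"expected 8 bytes, got {len(value)}")
    return [i for i in range(64) if (value[i // 8] >> (i % 8)) & 1]
```

For example, an `attnets` value whose first byte is `0x05` and whose remaining bytes are zero decodes to subnets 0 and 2.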

To show Nebula in action, I gave a demo at libp2p day in Istanbul last week. You can find the recording here: https://www.youtube.com/watch?v=QDgvCBDqNMc

Findings

Disclaimer

Let me preface this with a disclaimer. I’m not as familiar with the Ethereum network as with, e.g., the IPFS or Filecoin networks, and I only implemented Ethereum support a few weeks ago. So if the numbers don’t make sense, it could very well be due to a misunderstanding of the networking stack on my end or a simple programming error. Most importantly, the numbers do not match the ones that MigaLabs publishes on http://monitoreth.io. However, after talking to them, I think this could very well stem from the different measurement methodologies. Nevertheless, I think these are promising results that could complement existing data with a different perspective.

General