As a reminder: Thunderdome is a way to compare the performance and behaviour of different versions of IPFS gateways using real traffic in a controlled environment. It’s being developed by the Production Engineering team, currently Ian Davis (@iand) and Tommy Hall (@thattommyhall).

In our last update Tommy demonstrated how we could define an experiment, stand it up and have traffic streamed to it within minutes. Since then we have been refining that process and improving the metrics we report and the dashboards we produce.

We’re now in a position to run several experiments concurrently. We have set up and run four different experiments for Kubo, streaming live Gateway traffic to each of them for over 24 hours. This update lists some key findings and dives a bit deeper into the experiments we are running.

Key Findings

We have only been running these experiments for 1-2 days, so it is still early to be drawing conclusions. But each of the three significant experiments we have run has uncovered some interesting data or behaviour, so we’re really excited that Thunderdome is proving to be a useful tool to make available for everyone to use.

Each experiment is discussed in detail below but if you just want to dive into the metrics, these are the main charts we use on Grafana:

Experiment One: tweedles

tweedles is our null experiment. We run two identical instances of Kubo (dee and dum) and send the same traffic to each of them. We do this to give us more confidence that dealgood, the component that sends the stream of requests and measures response times, is behaving well. Every instance in an experiment gets sent the same requests by dealgood so they should respond similarly.
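
To make that fan-out idea concrete, here is a minimal Go sketch of sending the same gateway request to every target and recording how long each takes. The target names and URLs are placeholders and this is not dealgood's actual code, just an illustration of the principle under those assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Hypothetical targets standing in for the two identical Kubo instances;
// in the real experiment the instances are provisioned by Thunderdome.
var targets = map[string]string{
	"dee": "http://dee.example.internal:8080",
	"dum": "http://dum.example.internal:8080",
}

// sendToAll issues the same gateway request path to every target and
// records how long each response takes.
func sendToAll(path string) {
	for name, base := range targets {
		start := time.Now()
		resp, err := http.Get(base + path)
		if err != nil {
			fmt.Printf("%s: error: %v\n", name, err)
			continue
		}
		resp.Body.Close()
		fmt.Printf("%s: %s in %v\n", name, resp.Status, time.Since(start))
	}
}

func main() {
	sendToAll("/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi")
}
```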

In the tweedles experiment the TTFB metric shows almost no difference at any of the quantiles, which indicates that we're running a fair test.
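
As a rough illustration of how per-target latency samples can be reduced to quantiles, here is a small Go sketch using a simple sorted-index method with made-up numbers. The real charts are built from the experiment's metrics in Grafana, not from code like this.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// quantile returns an approximate q-th quantile (0 <= q <= 1) of a set of
// durations by indexing into the sorted samples.
func quantile(samples []time.Duration, q float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	idx := int(q * float64(len(sorted)-1))
	return sorted[idx]
}

func main() {
	// Hypothetical TTFB samples collected from one target.
	samples := []time.Duration{
		30 * time.Millisecond, 45 * time.Millisecond, 50 * time.Millisecond,
		80 * time.Millisecond, 120 * time.Millisecond, 900 * time.Millisecond,
	}
	for _, q := range []float64{0.5, 0.9, 0.99} {
		fmt.Printf("p%.0f TTFB: %v\n", q*100, quantile(samples, q))
	}
}
```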

target metrics (tweedles experiment)

Resource utilization is similar too, with both instances averaging about 27k goroutines and using 5-6 GiB of heap.
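
For context, goroutine count and heap usage are values the Go runtime exposes directly. The sketch below shows one way to read them from inside a process; the experiments gather these figures from the instances' metrics rather than with code like this.

```go
package main

import (
	"fmt"
	"runtime"
)

// reportResources prints the current goroutine count and heap-in-use size
// for this process, the same kinds of numbers shown on the dashboards.
func reportResources() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("goroutines: %d\n", runtime.NumGoroutine())
	fmt.Printf("heap in use: %.2f GiB\n", float64(m.HeapInuse)/(1<<30))
}

func main() {
	reportResources()
}
```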

In terms of request handling, both instances are receiving about 15 requests per second and there are almost zero drops. Drops indicate that an instance can't keep up with the number of requests that dealgood is sending: a request is dropped if there are too many in flight at any one time. The number of concurrent requests and the maximum rate at which they are sent are configured per experiment. In the future we plan to adapt these dynamically as part of warming up the instances, so we can make sure the instances are fully utilized.
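
The dropping behaviour can be sketched with a simple in-flight cap: if no slot is free when a request arrives, it is counted as dropped rather than queued. This is only an illustration of the mechanism described above under that assumption, not dealgood's actual implementation; the cap corresponds to the per-experiment concurrency setting.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Loader runs work against a target while capping the number of requests
// in flight. When the cap is reached, new requests are dropped and counted.
type Loader struct {
	sem     chan struct{}
	dropped int64
}

func NewLoader(maxInFlight int) *Loader {
	return &Loader{sem: make(chan struct{}, maxInFlight)}
}

// Do runs fn if an in-flight slot is free, otherwise records a drop.
func (l *Loader) Do(fn func()) {
	select {
	case l.sem <- struct{}{}:
		defer func() { <-l.sem }()
		fn()
	default:
		atomic.AddInt64(&l.dropped, 1)
	}
}

func main() {
	l := NewLoader(2) // allow at most 2 requests in flight

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Do(func() { time.Sleep(50 * time.Millisecond) }) // simulated request
		}()
	}
	wg.Wait()
	fmt.Println("dropped:", atomic.LoadInt64(&l.dropped))
}
```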