This is a summary of an experiment run in Thunderdome, our tool for comparing the performance of different versions, configurations, and deployments of IPFS gateways. More information here.
This experiment was designed to compare the performance of various values of the bitswap provider search delay setting, to help identify an appropriate value. The original discussion that led to this experiment is on GitHub.
Key Findings
See the Results section below for more detail.
- P99 time to first byte is significantly worse in v0.18 using dht routing when the provider delay is reduced. Comparing v0.18 and v0.17 at a 500ms delay, v0.18 has a P99 of 22.9s compared to 16.9s for v0.17 (+35%), and at a 0ms delay v0.18 has a P99 of 21.5s compared to 9.2s for v0.17 (+133%). The median values show a similar decrease in performance.
- P99 time to first byte is generally better in v0.18 using auto routing than dht routing at all provider delay settings. Comparing dht and auto routing in v0.18, the P99 time to first byte is 57% better with a 1000ms delay, 9.9% better at 500ms, and 5% worse at 0 delay. However, the median time is generally worse for auto routing (24% worse at 1000ms, 7% better at 500ms, and 38% worse at 0 delay). This suggests that auto routing adds a constant startup overhead to each request but reduces tail latency overall.
- At all provider delay settings v0.18 maintained smaller peersets than v0.17, with auto routing generally 27-45% higher than dht routing. The highwater setting seemed not to influence this in v0.18, but the peerset was noticeably greater in v0.17 with a highwater of 900.
- At all provider delay settings v0.18 with dht routing had smaller wantlists than v0.17, between 11% and 37% smaller.
- At all provider delay settings v0.18 with auto routing had larger wantlists than dht routing, ranging from 26 to 58% larger.
- At all provider delay settings v0.18 received around 20-30% fewer bytes over bitswap than v0.17; auto routing was generally comparable to dht routing.
- CPU utilization is generally around 8% better on v0.18 than v0.17 when using a delay of 1000ms, and roughly the same at lower delays. Utilization is slightly better (5%) with auto routing at a delay of 1000ms but somewhat worse (9%) at a delay of 500ms. All configurations had similar CPU utilization when the provider delay was 0, higher than at the other settings.
- v0.18 uses significantly more heap than v0.17 at all provider delay settings, ranging from 350% to 865% higher. The highest, kubo18rc2-dht-900, averaged a 14.1GB heap.
- All instances of v0.18 had fewer timeouts than v0.17, up to 21% fewer. However, v0.18 also produced 34% more 404 responses. This suggests that v0.18 responds sooner when it cannot find blocks, giving the client an actionable response; clients are less likely to retry a request when a negative response is received.
- All versions under test had comparable response success rates: 96-97% of all requests resulted in actionable responses (500 errors and timeouts are not actionable by the client).
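For clarity, the relative differences quoted above are assumed to be simple percentage changes against the baseline measurement. A minimal sketch of that calculation, using the 500ms P99 figures from the first finding:

```go
package main

import "fmt"

// percentChange returns how much value differs from baseline, in percent.
// This is assumed to be how the deltas in the findings above were derived.
func percentChange(baseline, value float64) float64 {
	return (value - baseline) / baseline * 100
}

func main() {
	// P99 TTFB at a 500ms provider delay: 16.9s on v0.17, 22.9s on v0.18 (dht).
	fmt.Printf("v0.18 vs v0.17 P99 at 500ms: %+.1f%%\n", percentChange(16.9, 22.9))
	// Output: v0.18 vs v0.17 P99 at 500ms: +35.5%
}
```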
Setup
Experiments used the Thunderdome Standard Setup using the io_medium server type and kubo-default configuration profile.
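Each variant below overrides Kubo's `Internal.Bitswap.ProviderSearchDelay` setting. As a sketch (assuming the standard Kubo JSON config format; the actual Thunderdome profile mechanism is not shown here), the 1000ms override corresponds to:

```json
{
  "Internal": {
    "Bitswap": {
      "ProviderSearchDelay": "1000ms"
    }
  }
}
```

The same value can be set on an existing node with `ipfs config --json Internal.Bitswap.ProviderSearchDelay '"1000ms"'`.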
Three variants of the experiment were run:
- provdelayrouting1000
  - `Internal.Bitswap.ProviderSearchDelay=1000ms`
- provdelayrouting500
  - `Internal.Bitswap.ProviderSearchDelay=500ms`