We want to build a Public Spark Dashboard showing aggregated data about retrieval measurements performed by Spark checker nodes.

We agreed to aggregate the data with one-day granularity.

What data do we want to show?

  1. Step 1: overall retrieval success rate (RSR), a single percentage
    1. One number per day
  2. Step 2: RSR per Storage Provider (minerId)
    1. N numbers per day, where N is ~600. We expect this number to grow with no upper bound.
  3. Later: the sky is the limit. See Retrieval Bot Dashboard.
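
To make the two steps concrete, here is a minimal sketch of the aggregation, not the actual implementation. It assumes each measurement can be reduced to a hypothetical `(finished_at, miner_id, succeeded)` tuple, that days are bucketed in UTC, and that RSR is simply successful retrievals divided by all retrievals. The miner IDs and timestamps are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical measurement shape: (finished_at, miner_id, retrieval_succeeded).
measurements = [
    (datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc), "f01", True),
    (datetime(2024, 1, 1, 17, 5, tzinfo=timezone.utc), "f01", False),
    (datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc), "f02", True),
    (datetime(2024, 1, 2, 0, 1, tzinfo=timezone.utc), "f02", True),
]


def daily_rsr(measurements):
    """Return ({day: rsr}, {(day, miner_id): rsr}) with one-day granularity."""
    overall = defaultdict(lambda: [0, 0])    # day -> [successful, total]
    per_miner = defaultdict(lambda: [0, 0])  # (day, miner_id) -> [successful, total]
    for finished_at, miner_id, succeeded in measurements:
        # Assumption: the day boundary is midnight UTC.
        day = finished_at.astimezone(timezone.utc).date()
        for counts in (overall[day], per_miner[(day, miner_id)]):
            counts[0] += int(succeeded)
            counts[1] += 1

    def to_rate(counts):
        return counts[0] / counts[1]

    return (
        {key: to_rate(counts) for key, counts in overall.items()},
        {key: to_rate(counts) for key, counts in per_miner.items()},
    )


overall, per_miner = daily_rsr(measurements)
```

Step 1 corresponds to the first dictionary (one number per day) and Step 2 to the second (one number per day and miner).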

What database to use

Prior art:

Known issues:

Proposal

I propose the following architecture, which reuses building blocks we are already familiar with.

  1. Store the public data in a Postgres database.
    1. This allows us to store rich objects with many properties and define indexes to get performant queries.
  2. Modify the existing spark-evaluate service to update the public data every round, similarly to how it publishes aggregated per-round metrics to InfluxDB.
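
As a rough illustration of the per-round update, the sketch below keeps running daily counts per miner and adds each round's numbers with an upsert. It is a sketch under several assumptions, not the proposed implementation: the table name `daily_miner_rsr`, its columns, and the `publish_round` helper are hypothetical, and Python's built-in `sqlite3` stands in for Postgres so the example runs without a server. In Postgres the `day` column would more likely be a `DATE`, and the placeholder syntax depends on the client library, but the `INSERT ... ON CONFLICT ... DO UPDATE` statement has the same form.

```python
import sqlite3

# Stand-in for the Postgres database; table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE daily_miner_rsr (
        day TEXT NOT NULL,
        miner_id TEXT NOT NULL,
        successful INTEGER NOT NULL,
        total INTEGER NOT NULL,
        PRIMARY KEY (day, miner_id)
    )
    """
)


def publish_round(db, day, round_stats):
    """Add one round's per-miner counts to the running daily totals."""
    db.executemany(
        """
        INSERT INTO daily_miner_rsr (day, miner_id, successful, total)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (day, miner_id) DO UPDATE SET
            successful = daily_miner_rsr.successful + excluded.successful,
            total = daily_miner_rsr.total + excluded.total
        """,
        [(day, miner_id, ok, total) for miner_id, (ok, total) in round_stats.items()],
    )
    db.commit()


# Two rounds on the same day: {miner_id: (successful, total)}.
publish_round(db, "2024-01-01", {"f01": (8, 10), "f02": (5, 5)})
publish_round(db, "2024-01-01", {"f01": (1, 10)})

rows = db.execute(
    "SELECT miner_id, successful, total FROM daily_miner_rsr ORDER BY miner_id"
).fetchall()
```

Storing counts rather than a precomputed percentage is one possible choice: it lets each round be added incrementally, and the daily RSR can be derived at query time as `successful / total`.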