Links
Goal
- Create a relational data store for retrieval metrics gathered from various retrieval clients, which any user can view and query
Requirements
Permissions
- Write access: new entities can be added to the list of entities permitted to write to the database. The initial set of entities includes:
  - Autoretrieve - Bedrock
  - Autoretrieve - Estuary
  - Validation bot instances
  - (future) Entities running Lassie
- Read access:
  - Any user can have read-only access to the data
  - Any user can query the data
  - Query computation is performed on the database / Starboard side
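The access rules above can be sketched as a small allowlist model. This is only an illustration under stated assumptions: the class name `AccessPolicy`, its methods, and the entity identifiers are hypothetical. In a PostgreSQL deployment the same rules would more likely be expressed with database roles and `GRANT` statements.

```python
# Hypothetical identifiers for the initial set of writers; the real names
# used by the deployment are not specified in this document.
INITIAL_WRITERS = {
    "autoretrieve-bedrock",
    "autoretrieve-estuary",
    "validation-bot",
}


class AccessPolicy:
    """Allowlist for writers; every user may read."""

    def __init__(self, writers):
        # Copy so later additions do not mutate the caller's set.
        self._writers = set(writers)

    def add_writer(self, entity: str) -> None:
        """Permit a new entity (for example, a future Lassie node) to write."""
        self._writers.add(entity)

    def can_write(self, entity: str) -> bool:
        return entity in self._writers

    def can_read(self, entity: str) -> bool:
        # Read-only access is open to any user.
        return True
```

Keeping the writer list as data rather than hard-coding checks makes it straightforward to add entities later, which is what the write-access requirement asks for.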
Data user requirements
- As a data user, I can query the data via a SQL interface (e.g., pgAdmin)
Option 1: Store detailed data and allow users to perform arbitrary queries
- As a data user, I am able to run arbitrary computations on the data, for example:
  - Calculate aggregate metrics per storage provider for a specified time period:
    - Average retrieval success rate (from query to retrieval success)
    - Average TTFB (for successful retrievals)
    - Average download speed (from TTFB to completion, for successful retrievals)
    - Total data transferred per week
    - Total number of queries received per week
    - Total number of non-error query responses received per week
    - Total number of retrievals requested per week
    - Total number of retrievals completed successfully per week
  - Calculate network-wide retrieval success rate for a specified period of time
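To make Option 1 concrete, here is a minimal sketch of the kind of ad hoc query a data user might run against detailed data. It uses Python's built-in `sqlite3` as a stand-in for the real database. The table `retrieval_attempts` and all of its columns are assumptions made for illustration and are not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per retrieval attempt.
SCHEMA = """
CREATE TABLE retrieval_attempts (
    storage_provider_id TEXT NOT NULL,
    started_at          TEXT NOT NULL,    -- ISO 8601 timestamp, UTC
    success             INTEGER NOT NULL, -- 1 if the retrieval completed
    ttfb_ms             REAL,             -- NULL for failed retrievals
    transfer_ms         REAL,             -- TTFB to completion, NULL on failure
    bytes_transferred   INTEGER           -- NULL for failed retrievals
);
"""

# Per-provider aggregates over a half-open time window [start, end).
PER_PROVIDER_SQL = """
SELECT
    storage_provider_id,
    AVG(success) AS success_rate,
    AVG(CASE WHEN success = 1 THEN ttfb_ms END) AS avg_ttfb_ms,
    AVG(CASE WHEN success = 1
             THEN bytes_transferred * 1000.0 / transfer_ms END) AS avg_bytes_per_s
FROM retrieval_attempts
WHERE started_at >= ? AND started_at < ?
GROUP BY storage_provider_id
ORDER BY storage_provider_id;
"""

NETWORK_SQL = """
SELECT AVG(success) FROM retrieval_attempts
WHERE started_at >= ? AND started_at < ?;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO retrieval_attempts VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("f01", "2023-01-02T10:00:00", 1, 100.0, 1000.0, 1_000_000),
            ("f01", "2023-01-03T10:00:00", 0, None, None, None),
            ("f02", "2023-01-04T10:00:00", 1, 300.0, 2000.0, 4_000_000),
            ("f02", "2023-02-01T10:00:00", 1, 50.0, 500.0, 500_000),
        ],
    )
    return conn


def per_provider_metrics(conn, start, end):
    """Success rate, average TTFB and average speed per storage provider."""
    return conn.execute(PER_PROVIDER_SQL, (start, end)).fetchall()


def network_success_rate(conn, start, end):
    """Network-wide success rate across all providers in the window."""
    return conn.execute(NETWORK_SQL, (start, end)).fetchone()[0]
```

For example, `per_provider_metrics(build_demo_db(), "2023-01-01T00:00:00", "2023-01-08T00:00:00")` returns one row per provider for the first week of January. The timestamps are compared as strings, which only works because they all share the same fixed-width ISO 8601 format.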
Option 2: Users can only request precomputed aggregate metrics
- As a data user, I am able to request weekly aggregated metrics for each storage provider:
  - Weekly average retrieval success rate
  - Number of successful retrievals
  - Total data transferred
  - Average TTFB
  - Average download speed (MB/s)
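Under Option 2, the aggregates listed above would be computed ahead of time and users would only read the result. The sketch below shows one possible weekly rollup, again using `sqlite3` as a stand-in. The source table `retrieval_attempts`, the rollup table `weekly_provider_metrics`, and every column name are hypothetical, and weeks are assumed to start on Monday (UTC).

```python
import sqlite3

SCHEMA = """
CREATE TABLE retrieval_attempts (
    storage_provider_id TEXT NOT NULL,
    started_at          TEXT NOT NULL,    -- ISO 8601 timestamp, UTC
    success             INTEGER NOT NULL, -- 1 if the retrieval completed
    ttfb_ms             REAL,
    transfer_ms         REAL,
    bytes_transferred   INTEGER
);
"""

# 'weekday 0' moves forward to Sunday (or stays put if already Sunday);
# subtracting 6 days then yields the Monday that starts that week.
ROLLUP_SQL = """
CREATE TABLE weekly_provider_metrics AS
SELECT
    storage_provider_id,
    date(started_at, 'weekday 0', '-6 days') AS week_start,
    AVG(success) AS success_rate,
    SUM(success) AS successful_retrievals,
    SUM(CASE WHEN success = 1 THEN bytes_transferred ELSE 0 END) AS total_bytes,
    AVG(CASE WHEN success = 1 THEN ttfb_ms END) AS avg_ttfb_ms,
    AVG(CASE WHEN success = 1
             THEN bytes_transferred / 1e6 / (transfer_ms / 1000.0) END) AS avg_speed_mb_s
FROM retrieval_attempts
GROUP BY storage_provider_id, week_start;
"""


def build_rollup(rows) -> sqlite3.Connection:
    """Load raw attempts and precompute the weekly per-provider table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO retrieval_attempts VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.executescript(ROLLUP_SQL)
    return conn


def weekly_metrics(conn, provider_id):
    """Read the precomputed weekly rows for one storage provider."""
    return conn.execute(
        "SELECT week_start, success_rate, successful_retrievals, total_bytes,"
        " avg_ttfb_ms, avg_speed_mb_s"
        " FROM weekly_provider_metrics"
        " WHERE storage_provider_id = ? ORDER BY week_start",
        (provider_id,),
    ).fetchall()


# Made-up sample data: two attempts in one week, one in the next.
DEMO_ROWS = [
    ("f01", "2023-01-02T10:00:00", 1, 100.0, 1000.0, 2_000_000),
    ("f01", "2023-01-08T23:00:00", 0, None, None, None),
    ("f01", "2023-01-09T09:00:00", 1, 200.0, 2000.0, 8_000_000),
]
```

The trade-off relative to Option 1 is that users cannot change the time bucket or add new metrics without a change to the rollup itself.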
Nice-to-haves
- As a data user, I want to be able to:
  - See a list of data providers (retrieval clients that generate retrieval performance metrics, such as Autoretrieve, Estuary, etc.)
  - Select the providers I want to include in my queries
Data contributor requirements
- As a data contributor, I want to be able to write event-based data (schema here) to the reputation data warehouse
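As an illustration of what writing event-based data could look like from a contributor's side, the sketch below appends events to a table. The event schema referenced above is not reproduced in this document, so the `RetrievalEvent` fields, the `retrieval_events` table, and the example event names are all hypothetical placeholders. `sqlite3` again stands in for the real warehouse.

```python
import json
import sqlite3
from dataclasses import dataclass, field

# Hypothetical event table; the real schema is defined elsewhere.
EVENTS_SCHEMA = """
CREATE TABLE retrieval_events (
    retrieval_id        TEXT NOT NULL,
    contributor         TEXT NOT NULL,  -- the writing entity, e.g. an Autoretrieve instance
    storage_provider_id TEXT NOT NULL,
    event_name          TEXT NOT NULL,  -- placeholder names such as 'query-asked'
    event_time          TEXT NOT NULL,  -- ISO 8601 timestamp, UTC
    details             TEXT NOT NULL   -- free-form JSON payload
);
"""


@dataclass
class RetrievalEvent:
    retrieval_id: str
    contributor: str
    storage_provider_id: str
    event_name: str
    event_time: str
    details: dict = field(default_factory=dict)


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with the hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(EVENTS_SCHEMA)
    return conn


def write_events(conn, events) -> int:
    """Append a batch of events and return how many rows were written."""
    rows = [
        (
            e.retrieval_id,
            e.contributor,
            e.storage_provider_id,
            e.event_name,
            e.event_time,
            json.dumps(e.details),
        )
        for e in events
    ]
    conn.executemany("INSERT INTO retrieval_events VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Storing one row per event, rather than one row per finished retrieval, keeps the raw timeline available so that metrics such as TTFB can be derived later from the event timestamps.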