(Keeping the changes section at the top; jump to how we are measuring for the current setup.)
We measure at the nginx load balancers. This monitoring data goes into Prometheus and can be visualized in Grafana. It gives us a (mostly) apples-to-apples view of full request execution for both the Rhea and old gateway implementations.
This shows us the metrics at the point in our service closest to the user, and it is currently the data source for our primary Project Rhea metrics.
However, because the measurement happens inside the infrastructure itself, we run into issues using this data, particularly in our test configuration. For instance, because of how traffic mirroring is configured, we can't rely on the response size data from this source.
This is a Postgres database that the Saturn team maintains, and it is the primary place where all logs of requests to L1s go. There’s also a table with log lines written by Caboose. In addition, Saturn strips headers from Lassie requests and stores them in a table called “lassie_logs”. Each request now carries a consistent ID across these tables, and we’re building code to make it easy to pull together the full view of all logs for a single request. We’re documenting the details of this data store here.
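A minimal sketch of the “full view of a request” idea, assuming rows from each log table have already been fetched and each row exposes the shared per-request ID plus a timestamp. Only “lassie_logs” comes from these notes; the other table names (“l1_logs”, “caboose_logs”) and the LogRow fields are hypothetical stand-ins, not the actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class LogRow:
    source: str       # table the row came from, e.g. "lassie_logs"
    trace_id: str     # consistent per-request ID shared across tables (assumed column)
    timestamp: float  # seconds since epoch (assumed column)
    message: str      # free-form log payload (assumed column)


def request_view(rows: Iterable[LogRow]) -> Dict[str, List[LogRow]]:
    """Group rows from all log tables by trace ID, oldest first within each request."""
    grouped: Dict[str, List[LogRow]] = defaultdict(list)
    for row in rows:
        grouped[row.trace_id].append(row)
    # Order each request's rows chronologically so the hops read top to bottom.
    for trace_rows in grouped.values():
        trace_rows.sort(key=lambda r: r.timestamp)
    return dict(grouped)
```

For example, a Lassie row and an L1 row with the same trace ID end up in one list, with the earlier Lassie fetch first. In practice the grouping could just as well be pushed into a SQL query keyed on the shared ID; the Python version only illustrates the shape of the result.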
The Arc network can be configured to send different sets of requests to different services. It is currently running a single “experiment” in which a set of requests (the top requests to ipfs.io) is sent to ipfs.io, Rhea, and Saturn L1s directly (via DNS binding). Currently all of these requests are for the CAR format; that needs to be fixed.
Database: Arc results are stored in a table called “race_requests” in the Saturn Prod DB. Arc now carries the same consistent tracing ID (“traceparent”) as the other tables in saturn_prod_db.
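The “traceparent” name suggests the W3C Trace Context header format (version-traceid-parentid-flags). Assuming that is what these tables store, a join between “race_requests” and the other tables should probably key on the trace-id field rather than the whole string, because the parent-id changes at each hop. A minimal sketch of extracting it, handling version 00 only; `parse_traceparent` is a hypothetical helper, not existing Saturn or Arc code:

```python
import re
from typing import NamedTuple, Optional

# version-traceid-parentid-flags, all lowercase hex (W3C Trace Context).
_TRACEPARENT_RE = re.compile(
    r"(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str   # identifies the whole request across every hop
    parent_id: str  # identifies the calling span; differs per hop
    flags: str


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a version-00 traceparent value; return None if it is malformed."""
    m = _TRACEPARENT_RE.fullmatch(header.strip())
    if m is None or m["version"] != "00":
        return None
    # All-zero trace and parent IDs are invalid per the spec.
    if set(m["trace_id"]) == {"0"} or set(m["parent_id"]) == {"0"}:
        return None
    return TraceParent(m["version"], m["trace_id"], m["parent_id"], m["flags"])
```

For example, `parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01").trace_id` is `"4bf92f3577b34da6a3ce929d0e0e4736"`, which is the value that should match across tables for the same request.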
Prometheus: Arc data also goes into Prometheus and can be viewed in Grafana.