What we would need, in order of priority:

  1. Metrics collection with a dashboard of graphs
    1. Prometheus + Grafana?
    2. Example metrics:
      1. Number of requests received
      2. Number of responses received
      3. Number of libp2p messages sent/received
  2. Fix Helm deployment
    1. We need to add a dependency so the nodes deploy after the Rendezvous Server
    2. Automated deployments after merges to the production/arbitrum-goerli and production/hyperspace branches
  3. Error reporting
    1. Reports to a Discord channel
    2. Which service do we use?
  4. Durable, searchable logs
    1. CloudWatch Logs? Elasticsearch? Papertrail?
  5. End-to-end liveness test for Medusa (running every 5 minutes)
    1. We can run a script in GitHub Actions that checks the most recent request and triggers an error alert if that request has gone 5 minutes without a response.
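
A minimal sketch of the decision logic for the liveness check in item 5, in Python. How the most recent request is fetched is not decided here, so the `LatestRequest` record and its field names are hypothetical; only the 5-minute threshold comes from the plan above.

```python
from dataclasses import dataclass
from typing import Optional

# From the plan: alert when a request has gone 5 minutes without a response.
RESPONSE_TIMEOUT_SECONDS = 5 * 60


@dataclass
class LatestRequest:
    """Hypothetical record describing the most recent Medusa request."""
    request_id: str
    requested_at: float            # Unix timestamp when the request was made
    responded_at: Optional[float]  # Unix timestamp of the response, None if pending


def is_overdue(latest: LatestRequest, now: float,
               timeout: float = RESPONSE_TIMEOUT_SECONDS) -> bool:
    """Return True if the request is still pending and older than `timeout`."""
    if latest.responded_at is not None:
        return False
    return now - latest.requested_at > timeout


def check_liveness(latest: LatestRequest, now: float) -> int:
    """Return a process exit code: 0 if healthy, 1 if an alert should fire."""
    if is_overdue(latest, now):
        waited = int(now - latest.requested_at)
        print(f"ALERT: request {latest.request_id} has had no response for {waited} seconds")
        return 1
    print("OK")
    return 0
```

A scheduled workflow could pass the result to `sys.exit(check_liveness(latest, time.time()))`; a non-zero exit code fails the job, which could then drive whichever alerting service is chosen in item 3.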