Goals
- Storage providers (SPs) serve retrievals of unrestricted data as well as they can
- Clients know each SP's track record of serving retrievals
Hypotheses
Some of the systems that we believe can encourage storage providers to serve retrievals:
Soft types:
- Earning client business: clients can see who the “good” SPs are and choose to do business with them (either actively, by hand-picking SPs, or passively, through an aggregator that uses reputation). SPs care about client business, so they behave well
- The deal engine for Slingshot can exclude badly behaving SPs
- Punishment: SPs have something at stake and if they fail to provide reasonable retrieval quality, they get punished (insurance model)
- Earning FIL: serving retrievals can earn FIL and SPs are economically incentivized to dedicate capacity to retrievals as it is profitable (incl. opportunity cost)
Generating retrieval metrics
Validation bot for Slingshot
- Team: Xin An (on leave starting December)
- Design doc
- Validation bot design
- Requirements for the original retrieval bot (by Marina)
- Notes
- How different from the old deal bot?
- Dealbot is too large, hard to modify or improve
- Purpose is different: the validation bot covers all validation purposes (e.g., whether the service is online, whether an index is provided, whether the SP is behind a VPN)
- What is the purpose of the Validation bot?
- Slingshot now, potentially Fil+ in the future, in the longer term can extend to any party
- Where is the data stored?
- Raw data stored in web3.storage
- All metrics are observable. Participants can publish their own data
- Consumers can decide which publishers to trust
- Observer - download web3.storage raw data, make available to query locally
- Why web3.storage vs. Pando?
- Pando is centralized
- Interface is not convenient
- web3.storage gives more guarantees than Pando
- Open to Pando in the future
- What retrieval metrics are gathered?
- Only for graphsync data
- Bitswap once implemented
- When HTTP retrievals are available, will be able to do that as well
- Will also check whether the SP publishes data as an index provider
- Retrieval metrics
- Download speed: average, plus raw bytes downloaded in each second
- Latency - TTFB (time to first byte)
- Unsealing required (graphsync only)
- Index ingestion
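As an illustration of how the download metrics listed above could be derived from a single retrieval attempt, here is a minimal Python sketch. It is not the validation bot's code: the event format (seconds since the request, bytes received), the `RetrievalMetrics` type, and the `summarize_retrieval` function are all hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class RetrievalMetrics:
    ttfb_seconds: Optional[float]    # time to first byte; None if no data arrived
    total_bytes: int                 # total bytes received
    average_bytes_per_second: float  # total bytes divided by total duration
    bytes_per_second: List[int]      # raw bytes received in each one-second bucket


def summarize_retrieval(
    events: Iterable[Tuple[float, int]], duration_seconds: float
) -> RetrievalMetrics:
    """Summarize (seconds_since_request, bytes_received) events.

    Assumption: `duration_seconds` is the time from sending the request
    until the transfer finished or was abandoned.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")

    buckets = [0] * max(1, math.ceil(duration_seconds))
    ttfb: Optional[float] = None
    total = 0
    for offset, size in events:
        if size <= 0 or offset < 0:
            continue  # ignore empty or malformed events
        if ttfb is None or offset < ttfb:
            ttfb = offset
        # Clamp late events into the final bucket.
        index = min(int(offset), len(buckets) - 1)
        buckets[index] += size
        total += size

    return RetrievalMetrics(
        ttfb_seconds=ttfb,
        total_bytes=total,
        average_bytes_per_second=total / duration_seconds,
        bytes_per_second=buckets,
    )
```

For example, `summarize_retrieval([(0.4, 100), (0.9, 200), (1.5, 300), (2.2, 400)], 2.5)` reports a TTFB of 0.4 seconds and per-second buckets of 300, 300, and 400 bytes.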
- Validation for Slingshot
- Task: known CIDs, validation bot will receive request and check providers
- Can set restrictions to not request more than X attempts per hour
- Distribute between workers
- Each worker attempts retrieval
- Workers deployed in AWS
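The task flow above (receive known CIDs, cap attempts per hour, distribute across workers) could be sketched roughly as follows. This is only an illustration under stated assumptions: the notes do not say whether the cap applies per provider or how work is split, so the per-provider sliding window, the round-robin assignment, and the names `HourlyRateLimiter` and `assign_tasks` are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Task = Tuple[str, str]  # (provider_id, cid); both values are placeholders


class HourlyRateLimiter:
    """Allow at most `max_attempts` per provider within a sliding window."""

    def __init__(self, max_attempts: int, window_seconds: float = 3600.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, provider_id: str, now: float) -> bool:
        attempts = self._attempts[provider_id]
        # Drop attempts that have fallen out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


def assign_tasks(
    tasks: List[Task],
    workers: List[str],
    limiter: HourlyRateLimiter,
    now: float,
) -> Dict[str, List[Task]]:
    """Round-robin the tasks that pass the rate limit across workers."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[Task]] = {worker: [] for worker in workers}
    accepted = 0
    for provider_id, cid in tasks:
        if not limiter.allow(provider_id, now):
            continue  # over the cap for this provider; skip for now
        worker = workers[accepted % len(workers)]
        assignment[worker].append((provider_id, cid))
        accepted += 1
    return assignment
```

With a cap of one attempt per hour, a second task for the same provider in the same hour is skipped, and the remaining tasks alternate between the available workers.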
- Status
- beta, dummy test
- Plan to fully launch: aim to launch in Nov
- Announce to Slingshot participants; will not enforce any rules
- Slingshot v3
- PL sends a deal. Data preparers come from the ecosystem, but PL proposes the deal
- Validation bot will also check data preparers: it downloads individual CIDs and checks that they match the original data
- Deals are made via deal engine (Evergreen)
- Would it retry retrievals if the first attempt failed? It will not. If it managed to download part of the file, the attempt is recorded as a partial success. Otherwise, a retrieval test on a different file will be attempted in another hour
- Do you track per-SP retrieval success rates, including retrieval acceptance rate and overall success rate (content delivered)? All the raw test results are published on web3.storage. The observer role of the software reads those test results and loads them into a SQL database. It is then up to the user to interpret those results (they can write their own SQL aggregation queries); anyone can set up their own SQL database and explore the data
- Which of the retrieval metrics are you able to track? All of them
- TTFB (current metric in dealbot is not accurate)
- Total transfer speed
- Total time taken (tracking but not publishing)
- How many retrieval attempts per day is the bot able to support now and in the future? We are thinking of 1 per hour for each storage provider. We're designing the solution to scale with the demand for retrieval tests
- Can the bot request partial retrievals? Yes, it will support retrieving a sub-file or sub-folder inside each deal
- Can the bot request both free and paid retrievals? Only free retrievals for now
- Does the bot also attempt to store data and track storage metrics?
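The observer workflow described in the answers above (raw test results loaded into a SQL database, with aggregation left to the user) might look roughly like the sketch below. Everything specific here is an assumption: the notes do not name a SQL engine or a schema, so SQLite, the `retrieval_results` table, its columns, and the `success`/`partial`/`failure` status values are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

# Hypothetical schema: one row per retrieval test result.
SCHEMA = """
CREATE TABLE retrieval_results (
    provider_id TEXT NOT NULL,
    cid         TEXT NOT NULL,
    status      TEXT NOT NULL
                CHECK (status IN ('success', 'partial', 'failure')),
    ttfb_ms     REAL
);
"""

# Example of a user-written aggregation: per-SP success rate.
SUCCESS_RATE_QUERY = """
SELECT provider_id,
       COUNT(*)                  AS attempts,
       SUM(status = 'success')   AS successes,
       SUM(status = 'partial')   AS partials,
       ROUND(1.0 * SUM(status = 'success') / COUNT(*), 3) AS success_rate
FROM retrieval_results
GROUP BY provider_id
ORDER BY provider_id;
"""

Row = Tuple[str, str, str, Optional[float]]


def success_rates(rows: Iterable[Row]) -> List[tuple]:
    """Load raw results into an in-memory database and aggregate per SP."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany(
            "INSERT INTO retrieval_results VALUES (?, ?, ?, ?)", rows
        )
        return conn.execute(SUCCESS_RATE_QUERY).fetchall()
    finally:
        conn.close()
```

Whether a partial success should count toward the success rate is a choice for whoever writes the query; this sketch counts it separately.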
Dealbot
- Currently still used for Slingshot
- Workflow
Pando