In SPARK protocol and SPARK tasking v2, we are discussing a fully decentralised design for SPARK. However, such a design cannot be implemented by LabWeek’23 (mid-Nov), where we want to present concrete SPARK improvements and future plans (see Pre-LabWeek S/M/S Outcomes + Roadmap).
This page outlines a set of incremental improvements we can deliver in the next ~10 weeks.
At the moment, we have a centralised SPARK orchestrator service hosted at Fly.io with a hard-coded list of 200 (CID, SP, proto) job templates using bitswap or graphsync protocols. SPARK’s Filecoin Station module (checker node) is running on 40+ Station instances. It periodically asks the orchestrator for a random job, performs the retrieval and reports the stats to the orchestrator.
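The current flow can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual orchestrator or module code: `JobTemplate`, `Orchestrator`, `run_check` and the `retrieve` callback are hypothetical names, and the real service talks HTTP rather than calling methods directly.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class JobTemplate:
    """One (CID, SP, proto) entry from the hard-coded list."""
    cid: str
    sp: str
    protocol: str  # "bitswap" or "graphsync"


class Orchestrator:
    """Hypothetical stand-in for the centralised orchestrator service."""

    def __init__(self, templates: List[JobTemplate],
                 rng: Optional[random.Random] = None) -> None:
        self.templates = templates
        self.rng = rng or random.Random()
        self.reports: List[Tuple[JobTemplate, Dict]] = []

    def next_job(self) -> JobTemplate:
        # Every request gets a random template from the fixed list.
        return self.rng.choice(self.templates)

    def report(self, job: JobTemplate, stats: Dict) -> None:
        # Stats are stored as reported; nothing is verified at this stage.
        self.reports.append((job, stats))


def run_check(orchestrator: Orchestrator,
              retrieve: Callable[[JobTemplate], Dict]) -> Tuple[JobTemplate, Dict]:
    """One iteration of the checker loop: get a job, retrieve, report."""
    job = orchestrator.next_job()
    stats = retrieve(job)  # assumption: the retrieval returns a dict of stats
    orchestrator.report(job, stats)
    return job, stats
```

Because `report` accepts whatever the checker sends, a dishonest `retrieve` callback could return made-up stats without being noticed.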
There is no verification or fraud detection. Checker nodes can easily cheat, and we won’t know.
The sections below outline the next few features to implement by LabWeek. These features will give us minimal fraud detection that we can incrementally improve later.
I am versioning the milestones as 2.X since this is the second iteration of the SPARK roadmap. We already released SPARK 1.0 earlier this year.
While Station Desktop provides a super easy way to update to a newer version, it still requires manual intervention from the user - they have to restart the app.
As a result, fewer than 50% of users are running the latest Station version, and about 25% of users are running an older SPARK module version. This makes it difficult for us to quickly iterate on the SPARK protocol because we have to support older server versions for a long time.
I am proposing two small changes:
Introduce a new Orchestrator API error response with HTTP status code 400 and the response body equal to OUTDATED CLIENT.
When the SPARK module receives this new error for the first time, it will log an error activity to the Station UI and stop doing any work.
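The client side of this proposal could look roughly like the sketch below. It only models the decision logic: `SparkModule`, `handle_response` and the `activities` list (standing in for the Station UI activity log) are hypothetical names, and the wording of the error message is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OUTDATED_CLIENT_BODY = "OUTDATED CLIENT"  # response body proposed above


@dataclass
class SparkModule:
    """Hypothetical stand-in for the SPARK checker module state."""
    stopped: bool = False
    # Stands in for the Station UI activity log.
    activities: List[Tuple[str, str]] = field(default_factory=list)

    @staticmethod
    def is_outdated_response(status: int, body: str) -> bool:
        # Status 400 alone is not enough; the body must match as well.
        return status == 400 and body.strip() == OUTDATED_CLIENT_BODY

    def handle_response(self, status: int, body: str) -> bool:
        """Return True if the module may keep working, False if it must stop."""
        if self.is_outdated_response(status, body):
            if not self.stopped:
                # Log the error activity only the first time we see it.
                self.activities.append(
                    ("error", "SPARK module is outdated, please update Station")
                )
                self.stopped = True
            return False
        # Once stopped, stay stopped, whatever later responses say.
        return not self.stopped
```

Checking the body in addition to the status code keeps ordinary 400 responses (for example, a malformed request) from shutting the module down.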
Additionally, should Station Desktop automatically restart after it downloads the installer for the new version? To avoid interfering with the user’s actions, we can trigger the restart only when the Station window is not shown (we are running in the tray).
EDIT: This is already happening → https://github.com/filecoin-station/desktop/pull/931
Auto-restart does not solve the problem for Station Core (headless), so we should probably still implement some way for the server to tell the SPARK module that it should not do any more work because it’s outdated.
→ https://github.com/filecoin-station/spark/issues/13 and https://github.com/filecoin-station/roadmap/issues/41
To verify that checker nodes retrieved all data, we need several checkers to redundantly perform the same retrieval. We can then use the “honest majority” approach to determine the expected values, such as the Blake3 hash of the retrieved CAR file.
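As a rough illustration of the “honest majority” idea, the sketch below picks the value reported by more than half of the checkers for one retrieval. The function name and the strict more-than-half threshold are assumptions made for this example; the text does not specify how the majority is defined.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_value(reported: Sequence[str]) -> Optional[str]:
    """Return the value reported by a strict majority of checkers, if any.

    `reported` holds one value per checker, for example the hex-encoded
    Blake3 hash of the CAR file each checker retrieved.
    """
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    # Assumption: require strictly more than half of all reports to agree.
    return value if count * 2 > len(reported) else None
```

With this definition, three reports of `"a"`, `"a"` and `"b"` yield `"a"`, while a two-way tie yields no expected value, so the retrieval cannot be judged.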
To enable the formation of majorities, we need a concept of a time-limited period - a round - during which we collect redundant measurements and then compare the results after the round is over. So, the first logical step is to introduce the concept of a SPARK round.
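One possible way to model rounds is to derive a round index from the measurement timestamp and group the reported values per round and job. This is only a sketch under assumptions: the round length, the clock-based round index and all names below are hypothetical, since the text does not say how rounds are delimited.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: the text does not fix a round length; one hour is a placeholder.
ROUND_LENGTH_SECONDS = 60 * 60


def round_index(timestamp: float) -> int:
    """Map a Unix timestamp (seconds) to the round it falls into."""
    return int(timestamp // ROUND_LENGTH_SECONDS)


def is_round_over(index: int, now: float) -> bool:
    """A round can be evaluated only once its time window has fully elapsed."""
    return now >= (index + 1) * ROUND_LENGTH_SECONDS


def group_by_round(
    measurements: Iterable[Tuple[float, str, str]],
) -> Dict[Tuple[int, str], List[str]]:
    """Group reported values by (round index, job id).

    Each measurement is a (timestamp, job_id, reported_value) tuple.
    """
    groups: Dict[Tuple[int, str], List[str]] = defaultdict(list)
    for timestamp, job_id, value in measurements:
        groups[(round_index(timestamp), job_id)].append(value)
    return dict(groups)
```

Once `is_round_over` returns true for a round, each group holds the redundant measurements that can be compared against each other.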