Hey Orchestrators, it’s time for another guide! Don’t worry, this one’s way simpler than the Portainer/Docker guide
The Problem:
We’ve all dealt with the headache that comes with using RPC endpoints, whether we’re being rate limited by Alchemy, Infura, or the public Arbitrum RPC, or even running into issues with private Arb nodes - outages and rate limits happen.
The Solution:
This guide aims to mitigate those issues as much as possible by using a Cloudflare Worker to easily manage and deploy multiple RPC endpoints with ZERO downtime. It uses totally open source code to distribute requests to a pool of endpoints of your choice and even removes the bad ones automatically when major errors are detected.
The community node created by Ftkuhnsman has alleviated a lot of the RPC pain points for Orchestrators, but outages can still occur. The beautiful thing about using the Cloudflare Worker approach is that it can be used on top of the community node, or any node for that matter.
Thanks, @MikeZupper for making this happen!
Let’s begin.
Important Pricing Note: The Cloudflare Worker's cost depends on usage, and the bulk of it comes from KV storage, which is how Cloudflare automatically removes bad endpoints. If Cloudflare is constantly removing and re-adding (reading/writing) endpoints, the cost for KV storage can be significant (over $100). This is not ideal, so please choose the RPCs you'll be using in the pool carefully, or use the CF Worker code without KV, which will be made available on the GitHub repo.
When using the code without KV, automatic failover is disabled, which means you'll have to manually remove endpoints if/when they go down. This can still be done with no downtime, but the benefit of KV is that it happens automatically.
We're currently working on new code that should bring these costs down while still incorporating failover. The code will also be made available on the guide's GitHub repo when it passes the necessary tests.
Step 1: Cloudflare Basics
- Create an account on cloudflare.com
- Add a payment method
- Go to the plans page
- Subscribe to the Pay-as-you-go plan
Step 2: Set up Cloudflare Test Worker
Creating a test Worker is not required; however, it's highly recommended, as it allows us to deploy experimental code changes without disrupting our production environment.
The steps we’re going to go through here will be almost identical to setting up a production Worker.
- On the left-hand-side menu click Workers
- Create a domain (we can call this whatever we'd like)
- Click Set up
- Click the dropdown arrow next to Workers on the left-hand-side menu
- Click Overview
- Click Create Service
- In the Service name box, enter test-rpc
- Click Create Service
- On the next page click the Settings tab
- Click Variables
- Click Add variable
- For the Variable name put ARB_RPC_LIST. The Value field is where our RPC URLs will go.
It's important that these are entered correctly. The URLs should be separated by commas with NO spaces. For testing purposes, we'll use 2 public RPC URLs: https://arb1.arbitrum.io/rpc,https://arb1.arbitrum.io/rpc
- Add another variable
- For the Variable name put FAILURE_ERROR_CODE_LIST
- For the Value field put 429,400,403,500
- Add another variable
- For the Variable name put FALLBACK_ARB_RPC_URL
- For the Value field put another RPC URL. This can be any endpoint you have access to, but since we're just testing this, we can simply add https://arb1.arbitrum.io/rpc. Note that only one RPC URL should be entered here.
- Add another variable
- For the Variable name put LOGGING_ENABLED
- For the Value field put true
- Click the dropdown arrow next to Workers on the left-hand-side menu
- Click on KV
- Click Create namespace
- In the Namespace Name field enter PROD_RPC_ERROR_LIST
- Click Create namespace again and in the Namespace Name field enter TEST_RPC_ERROR_LIST
When an RPC URL fails, it will be added to the corresponding KV namespace, where you'll be able to manually delete it by clicking the 3 little dots on the right-hand side. Once deleted, endpoints are automatically added back into the rotation.
- Click on Workers on the left-hand-side menu
- Click on test-rpc
- Click the Settings tab
- Click Variables
- Scroll down until you see KV Namespace Bindings
- For the Variable name put RPC_ERROR_LIST
- In the KV Namespace dropdown select TEST_RPC_ERROR_LIST
- Scroll up and click on the Resources tab
- On the right-hand-side click Quick edit
This is where the magic happens!
- Delete the default code.
- Copy this code and paste it in place of the code we just deleted. Alternatively, the code will be posted on the guide's GitHub repo. Keep this handy, as any upgrades will be posted there as well.
/**
 * This worker will proxy incoming requests and submit a POST
 * request to a given list of backend servers. Backend servers are configured
 * via Cloudflare's ENVIRONMENT variables. The variable name is ARB_RPC_LIST
 *
 */
const pruneRPCFailedNodes = (logger, rpc_array, bad_rpc_array) => {
  logger(`[pruneRPCFailedNodes] - input = ${rpc_array}, ${bad_rpc_array}`);
  let map = new Map();
  rpc_array.forEach((val, idx) => map.set(val, idx));
  if (bad_rpc_array) {
    bad_rpc_array.forEach((br) => {
      rpc_array.forEach((rpc) => {
        if (rpc == br.name) {
          map.delete(rpc);
        }
      });
    });
  }
  const output = rpc_array.filter((rpc) => {
    return map.has(rpc);
  });
  return output;
};

const fetchRPCUrlWinner = async (env, logger) => {
  logger(`[fetchRPCUrlWinner] ARB_RPC_LIST = ${env.ARB_RPC_LIST}`);
  //parse the list of configured RPC endpoints (https://rpc1,https://rpc2,etc..)
  const bad_rpc_array = await env.RPC_ERROR_LIST.list();
  logger(
    `[fetchRPCUrlWinner] RPC_ERROR_LIST = ${JSON.stringify(bad_rpc_array)}`
  );
  const rpc_urls = pruneRPCFailedNodes(
    logger,
    env.ARB_RPC_LIST.split(","),
    bad_rpc_array.keys
  );
  // choose a random winner to "balance" the load across many RPC providers
  const winner = rpc_urls[Math.floor(Math.random() * rpc_urls.length)];
  logger(`[fetchRPCUrlWinner] winner = ${winner}`);
  return winner;
};

const buildResponse = async (env, logger, winner, rawResponse) => {
  let respBody = await rawResponse.json();
  //construct final response back to consumer
  logger(`[buildResponse] The response body is ${JSON.stringify(respBody)}`);
  let hasError = false;
  if (respBody.error) {
    const { code, message } = respBody.error;
    // these codes cause the node to be removed from the list of available nodes
    if (code) {
      const msg = `Internal Server Error error code [${code}] and message [${message}]`;
      logger(`[buildResponse] error code found in response: ${msg}`);
      const fatalErrorCodes = env.FAILURE_ERROR_CODE_LIST;
      fatalErrorCodes.split(",").forEach((fatalErrorCode) => {
        const fc = parseInt(fatalErrorCode);
        logger(`${code == fc} ${code} == ${fc} ???`);
        if (code == fc) {
          hasError = true;
        }
      });
      logger(`[buildResponse] has error = ${hasError}`);
      if (hasError == true) {
        // KV values must be strings, so store the failure timestamp as text
        await env.RPC_ERROR_LIST.put(winner, Date.now().toString());
        return new Response(msg, { status: 500 });
      }
    }
  }
  return new Response(JSON.stringify(respBody), rawResponse);
};

export default {
  /**
   * Entry point: pick a healthy RPC endpoint, forward the JSON-RPC request to it,
   * and flag the endpoint in KV if it fails or returns a fatal error code.
   */
  async fetch(request, env, context) {
    console.log("[config] LOGGING_ENABLED = ", env.LOGGING_ENABLED);
    console.log("[config] ARB_RPC_LIST = ", env.ARB_RPC_LIST);
    console.log("[config] FAILURE_ERROR_CODE_LIST = ", env.FAILURE_ERROR_CODE_LIST);
    console.log("[config] FALLBACK_ARB_RPC_URL = ", env.FALLBACK_ARB_RPC_URL);
    const logger = (logMsg) => {
      if (env.LOGGING_ENABLED && env.LOGGING_ENABLED == "true") {
        console.log(logMsg);
      }
    };
    //pull the existing JSON text from the incoming request.
    const body = await request.text();
    logger(`[fetch] incoming request: ${body}`);
    if (!body) {
      return new Response("Bad Request: No Input", { status: 400 });
    }
    let winner = await fetchRPCUrlWinner(env, logger);
    if (!winner) {
      winner = env.FALLBACK_ARB_RPC_URL;
    }
    //submit new POST request to the backend RPC winner
    let rawResponse;
    try {
      rawResponse = await fetch(winner, {
        method: "POST",
        headers: {
          Accept: "application/json",
          "Content-Type": "application/json",
        },
        body,
      });
      return buildResponse(env, logger, winner, rawResponse);
    } catch (e) {
      const msg = "Internal Server Error - [fetch] fetching response";
      console.error(msg, JSON.stringify(e));
      if (winner != env.FALLBACK_ARB_RPC_URL) {
        // KV values must be strings, so store the failure timestamp as text
        await env.RPC_ERROR_LIST.put(winner, Date.now().toString());
        return new Response(msg, { status: 500 });
      } else {
        return new Response("Internal Server Error - [fetch] fallback failure", { status: 500 });
      }
    }
  },
};
- Click Save and Deploy
Let’s test out the code.
- In the middle section change GET to POST and in the Body put {"jsonrpc":"2.0","method":"net_version","params":[],"id":67}
Note that the URL next to GET/POST is what we use as our ethUrl.
- Click send
We’ll see some interesting stats pop up toward the bottom of the page.
For example, [fetchRPCUrlWinner] winner = <RPC URL> tells us which endpoint was used for that request.
If you click send and nothing happens, it’s possible an anti-virus, firewall, or browser setting is preventing the request from executing. If that occurs, you can simply run the test on your local machine or a server using:
export ARB_URL='https://<your URL>.workers.dev/'
echo ""
echo "net_version"
echo ""
curl $ARB_URL \
-X POST \
-H "Content-Type: application/json" \
--data '{"jsonrpc":"2.0","method":"net_version","params":[],"id":67}'
Just replace the URL in ARB_URL='https://<your URL>.workers.dev/' with your own (found on the Quick Edit page). As long as the request returns a valid response like the example below, and not a long error, we're good to go.
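A healthy reply should look something like this (net_version returns the network ID, which is 42161 for Arbitrum One):
{"jsonrpc":"2.0","id":67,"result":"42161"}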
This is a good curl command to keep handy, as it's very useful for troubleshooting Arbitrum RPCs.
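If you want to compare endpoints side by side (for example, to spot a node that has drifted out of sync), the same pattern works with eth_blockNumber. This is only a sketch; the endpoint list below is a placeholder, so substitute the URLs from your own pool:
# placeholder endpoints: replace with the RPC URLs in your own pool
for rpc in https://arb1.arbitrum.io/rpc https://<your other endpoint>; do
  echo -n "$rpc -> "
  curl -s "$rpc" \
    -X POST \
    -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
  echo ""
done
If two endpoints report block heights that are consistently far apart, the lagging one is a good candidate to pull from the list.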
Last thing before we move on to the final steps…
This only applies to those who use Cloudflare as their primary DNS.
Back on the main Workers page, click Triggers and enter a custom domain like test-rpc.com. This is totally optional and doesn't change the performance of the Worker; it just allows us to use a custom domain as the ethUrl instead of the one Cloudflare provides us.
Step 3: Creating our Production Environment
At this point, we could add the URL from our Worker to the ethUrl flag for Livepeer and be up and running, but remember, we want to separate our test environment from production.
Doing this is extremely simple. We're going to duplicate the steps from Step 2 with a few minor changes:
- Click the dropdown arrow next to Workers on the left-hand-side menu
- Click Overview
- Click Create Service
- In the Service name box, enter prod-rpc
- Click Create Service
- On the next page click the Settings tab
- Click Variables
- Click Add variable
- For the Variable name put ARB_RPC_LIST. The Value field is where our RPC URLs will go.
Since this is our production environment, we'll want to add the RPC URLs we plan to actively use.
Note: Be careful mixing/matching custom Arb nodes with public endpoints like Infura, Alchemy, and Offchain. Custom Arb nodes like the community node tend to drift out of sync and can potentially cause issues. Livepeer version 5.34 may help with sync problems, but further testing is needed. Feel free to experiment however you'd like, but keep an eye on your logs. If they're being spammed with Block not Found errors, adjust your RPC list. A few of those errors every now and then are normal; log spam is not.
- Add another variable
- For the Variable name put FAILURE_ERROR_CODE_LIST
- For the Value field put 429,400,403,500
- Add another variable
- For the Variable name put FALLBACK_ARB_RPC_URL
- For the Value field put another RPC URL. For our production environment, we'll want to put a node we trust as a fallback. For example, I use the community node here, which I trust will be able to handle all my O nodes if for some reason the main RPC pool fails.
- Add another variable
- For the Variable name put LOGGING_ENABLED
- For the Value field put true
- Scroll down until you see KV Namespace Bindings
- For the Variable name put RPC_ERROR_LIST
- In the KV Namespace dropdown select PROD_RPC_ERROR_LIST
- Scroll up and click on the Resources tab
- On the right-hand-side click Quick edit
- Delete the default code.
- Copy the same code listed in Step 2 and paste it in place of the code you just deleted. (GitHub link)
- Click Save and Deploy
- Run the same tests listed in Step 2
That’s it, we can now use the production URL as our ethUrl and let Cloudflare do all the work. You can also add or remove RPC URLs on the fly through the Cloudflare UI.
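For reference, here's roughly what that looks like on the Livepeer side. This is just a sketch; the Worker URL below is a placeholder (use your own workers.dev URL or the custom domain from the Triggers step), and the rest of your flags stay whatever you already run with:
# placeholder URL: replace with your own Worker URL or custom domain
livepeer \
  -ethUrl https://prod-rpc.<your-subdomain>.workers.dev/ \
  <your usual orchestrator flags>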
Combined with this guide, we can take some significant steps to operate a secure, highly available orchestrator!
Another great thing about using Cloudflare is that they provide some pretty neat metrics for us to look at. Check them out on the Worker page under the Resources tab.
Feel free to ask any questions in this thread or reach out on Discord, and feel even freer to tweak the code and customize it to your heart's content! As briefly mentioned, @MikeZupper is working on further code optimizations, so keep an eye on the GitHub repo for those.
Any significant updates to the CF Worker will be announced in Livepeer's official Discord and in this thread, which can be found under the #optimize-your-node channel.