Guide: Simple, Scalable, Serverless Arbitrum RPC


Hey Orchestrators, it’s time for another guide! Don’t worry, this one’s way simpler than the Portainer/Docker guide :slight_smile:

The Problem:

We’ve all dealt with the headache that comes with using RPC endpoints, whether we’re being rate limited by Alchemy, Infura, or the public Arbitrum RPC, or even running into issues with private Arb nodes - outages and rate limits happen.

The Solution:

This guide aims to mitigate those issues as much as possible by using a Cloudflare Worker to easily manage and deploy multiple RPC endpoints with ZERO downtime. It uses totally open source code to distribute requests to a pool of endpoints of your choice and even removes the bad ones automatically when major errors are detected.

The community node created by Ftkuhnsman has alleviated a lot of the RPC pain points for Orchestrators, but outages can still occur. The beautiful thing about using the Cloudflare Worker approach is that it can be used on top of the community node, or any node for that matter.

Thanks, @MikeZupper, for making this happen!

Let’s begin.

Important Pricing Note: The Cloudflare Worker’s cost depends on usage, and the bulk of it comes from KV storage, which is what the Worker uses to automatically remove bad endpoints. If endpoints are constantly being removed and re-added (read/written), the KV storage cost can be significant (over $100). This is not ideal, so please choose the RPCs you’ll be using in the pool carefully, or use the CF Worker code without KV, which will be made available on the GitHub repo.

When using the code without KV, automatic failover is disabled, which means you’ll have to manually remove endpoints if/when they go down. This can still be done with no downtime, but the benefit of KV is that it happens automatically.

We’re currently working on new code that should bring these costs down while still incorporating failover. That code will also be made available on the guide’s GitHub repo once it passes the necessary tests.
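For reference, the no-KV approach boils down to something like the sketch below. This is just a minimal illustration of the idea, not the exact code that will be published on the repo: pick a random endpoint from ARB_RPC_LIST for each request, and retry once against FALLBACK_ARB_RPC_URL if that endpoint can’t be reached.

// Minimal no-KV sketch - illustrative only, not the code from the guide's repo.
// Uses the same ARB_RPC_LIST and FALLBACK_ARB_RPC_URL variables described below.
export default {
  async fetch(request, env) {
    const body = await request.text();
    if (!body) {
      return new Response("Bad Request: No Input", { status: 400 });
    }

    // Pick a random endpoint from the comma-separated list.
    const urls = env.ARB_RPC_LIST.split(",");
    const winner = urls[Math.floor(Math.random() * urls.length)];

    const proxy = (url) =>
      fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body,
      });

    try {
      return await proxy(winner);
    } catch (e) {
      // No KV means no automatic blacklisting - the only failover here is a
      // one-shot retry against the fallback endpoint.
      return proxy(env.FALLBACK_ARB_RPC_URL);
    }
  },
};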

Step 1: Cloudflare Basics

  • Create an account on cloudflare.com

  • Add a payment method

  • Go to plans page

  • Subscribe to Pay-as-you-go plan

Step 2: Set up Cloudflare Test Worker

Creating a test Worker is not required; however, it’s highly recommended, as it allows us to deploy experimental code changes without disrupting our production environment.
The steps we’re going to go through here will be almost identical to setting up a production Worker.

  • On the left-hand-side menu click Workers

  • Create a workers.dev subdomain (we can call this whatever we’d like)

  • Click Set up

  • Click the dropdown arrow next to Workers on the left-hand-side menu

  • Click Overview

  • Click Create Service

  • In the Service name box, enter test-rpc

  • Click Create Service

  • On the next page click the Settings tab

  • Click Variables

  • Click Add variable

  • For the Variable name put ARB_RPC_LIST. The Value field is where our RPC URLs will go.

It’s important that these are entered correctly. The URLs should be separated by commas with NO spaces. For testing purposes, we’ll simply enter the public Arbitrum RPC URL twice: https://arb1.arbitrum.io/rpc,https://arb1.arbitrum.io/rpc
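Under the hood, the Worker simply splits this value on commas, so whatever sits between the commas is used verbatim as the endpoint URL. A quick illustration of that split (matching the ARB_RPC_LIST.split(",") call in the Worker code further down):

const urls = "https://arb1.arbitrum.io/rpc,https://arb1.arbitrum.io/rpc".split(",");
// urls -> ["https://arb1.arbitrum.io/rpc", "https://arb1.arbitrum.io/rpc"]
// A space after a comma would end up inside the next entry, which is why the
// list should be entered with no spaces.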

  • Add another variable

  • For the Variable name put FAILURE_ERROR_CODE_LIST

  • For the Value field put 429,400,403,500

  • Add another variable

  • For the Variable name put FALLBACK_ARB_RPC_URL

  • For the Value field put another RPC URL. This can be any endpoint you have access to, but since we’re just testing this, we can simply add https://arb1.arbitrum.io/rpc. Note that only one RPC URL should be entered here.

  • Add another variable

  • For the Variable name put LOGGING_ENABLED

  • For the Value field put true

  • Click the dropdown arrow next to Workers on the left-hand-side menu

  • Click on KV

  • Click Create namespace

  • In the Namespace Name field enter PROD_RPC_ERROR_LIST

  • Click Create namespace again and in the Namespace Name field enter TEST_RPC_ERROR_LIST

When an RPC URL fails, it will be added to the corresponding KV namespace, where you’ll be able to manually delete it by clicking the 3 little dots on the right-hand side. Once deleted, it’ll automatically be added back into the rotation.
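For the curious, the Worker talks to this namespace through the standard Workers KV binding API (these calls run inside the Worker’s fetch handler). Roughly, the lifecycle looks like the sketch below, using the RPC_ERROR_LIST binding name we set up in the next few steps; deleting the key through the dashboard has the same effect as the delete() call:

// Flag an endpoint as bad (what the Worker does when it sees a fatal error code):
await env.RPC_ERROR_LIST.put("https://arb1.arbitrum.io/rpc", Date.now().toString());

// Read back the list of flagged endpoints (used when picking the next winner):
const badEndpoints = await env.RPC_ERROR_LIST.list();
console.log(badEndpoints.keys.map((k) => k.name));

// Removing the key puts the endpoint back into the rotation:
await env.RPC_ERROR_LIST.delete("https://arb1.arbitrum.io/rpc");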

  • Click on Workers on the left-hand-side menu

  • Click on test-rpc

  • Click Settings tab

  • Click Variables

  • Scroll down until you see KV Namespace Bindings

  • For the Variable name put RPC_ERROR_LIST

  • In the KV Namespace dropdown, select TEST_RPC_ERROR_LIST

  • Scroll up and click on the Resources tab

  • On the right-hand-side click Quick edit

This is where the magic happens!

  • Delete the default code.
  • Copy this code (shown below) and paste it in place of the code we just deleted. Alternatively, the code will be posted on the guide’s GitHub repo. Keep that handy, as any upgrades will be posted there as well.
/**
 * This Worker proxies incoming requests and submits POST
 * requests to a given list of backend servers. Backend servers are configured
 * via Cloudflare's ENVIRONMENT variables. The variable name is ARB_RPC_LIST.
 *
 */

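 // Remove any endpoint that currently appears in the KV error list from the
 // configured pool, so new requests only go to endpoints without recent failures.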
 const pruneRPCFailedNodes = (logger, rpc_array, bad_rpc_array) => {
  logger(`[pruneRPCFailedNodes] - input =  ${rpc_array}, ${bad_rpc_array}`);
  let map = new Map();

  rpc_array.forEach((val, idx) => map.set(val, idx));

  if (bad_rpc_array) {
    bad_rpc_array.forEach((br) => {
      rpc_array.forEach((rpc) => {
        if (rpc == br.name) {
          map.delete(rpc);
        }
      });
    });
  }

  const output = rpc_array.filter((rpc) => {
    return map.has(rpc);
  });
  return output;
};

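// Pick a random endpoint from ARB_RPC_LIST, excluding anything currently
// flagged in the RPC_ERROR_LIST KV namespace.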
const fetchRPCUrlWinner = async (env, logger) => {
  logger(`[fetchRPCUrlWinner] ARB_RPC_LIST = ${env.ARB_RPC_LIST}`);

  //parse the list of configured RPC endpoints (https://rpc1,https://rpc2,etc..)
  const bad_rpc_array = await env.RPC_ERROR_LIST.list();
  logger(
    `[fetchRPCUrlWinner] RPC_ERROR_LIST = ${JSON.stringify(bad_rpc_array)}`
  );
  const rpc_urls = pruneRPCFailedNodes(
    logger,
    env.ARB_RPC_LIST.split(","),
    bad_rpc_array.keys
  );

  // choose a random winner to "balance" the load across many RPC providers
  const winner = rpc_urls[Math.floor(Math.random() * rpc_urls.length)];
  logger(`[fetchRPCUrlWinner] winner = ${winner}`);
  return winner;
};

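// Inspect the backend's JSON-RPC response body; if it contains one of the
// configured fatal error codes, flag that endpoint in KV and return a 500.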
const buildResponse = async (env, logger, winner, rawResponse) => {
  let respBody = await rawResponse.json();

  //construct final response back to consumer
  logger(`[buildResponse] The response body is ${JSON.stringify(respBody)}`);
  let hasError = false;

  if (respBody.error) {
    const { code, message } = respBody.error;

    // these codes cause the node to be removed from the list of available nodes
    if (code) {
      const msg = `Internal Server Error - error code [${code}] and message [${message}]`;
      logger(`[buildResponse] error code found in response: ${msg}`);
      const fatalErrorCodes = env.FAILURE_ERROR_CODE_LIST;

      fatalErrorCodes.split(",").forEach((fatalErrorCode) => {
        const fc = parseInt(fatalErrorCode);
        logger(`${code == fc} ${code} == ${fc} ???`);
        if (code == fc) {
          hasError = true;
        }
      });
      logger(`[buildResponse] has error = ${hasError}`);

      if (hasError == true) {
        await env.RPC_ERROR_LIST.put(winner, Date.now().toString());
        return new Response(msg, { status: 500 });
      }
    }
  }

  return new Response(JSON.stringify(respBody), rawResponse);
};

export default {
  /**
   * Proxy the incoming JSON-RPC request to a randomly chosen backend RPC
   * endpoint, falling back to FALLBACK_ARB_RPC_URL when no healthy endpoint
   * is available.
   */
  async fetch(request, env, context) {
    console.log("[config] LOGGING_ENABLED = ", env.LOGGING_ENABLED);
    console.log("[config] ARB_RPC_LIST = ", env.ARB_RPC_LIST);
    console.log("[config] FAILURE_ERROR_CODE_LIST = ",  env.FAILURE_ERROR_CODE_LIST);
    console.log("[config] FALLBACK_ARB_RPC_URL = ",  env.FALLBACK_ARB_RPC_URL);

    const logger = (logMsg) => {
      if (env.LOGGING_ENABLED && env.LOGGING_ENABLED == "true") {
        console.log(logMsg);
      }
    };

    //pull the existing JSON text from the incoming request.
    const body = await request.text();
    logger(`[fetch] incoming request: ${body}`);
    if (!body) {
      return new Response("Bad Request: No Input", { status: 400 });
    }

    let winner = await fetchRPCUrlWinner(env, logger);

    if (!winner) {
        winner = env.FALLBACK_ARB_RPC_URL;
    }

    //submit new POST request to the backend RPC winner
    let rawResponse;
    try {
      rawResponse = await fetch(winner, {
        method: "POST",
        headers: {
          Accept: "application/json",
          "Content-Type": "application/json",
        },
        body,
      });
      return buildResponse(env, logger, winner, rawResponse);
    } catch (e) {
      const msg = "Internal Server Error - [fetch] fetching response";
      console.error(msg, JSON.stringify(e));
      if(winner != env.FALLBACK_ARB_RPC_URL){
        await env.RPC_ERROR_LIST.put(winner, Date.now().toString());
        return new Response(msg, { status: 500 });
      }
      else{
          return new Response("Internal Server Error - [fetch] fallback failure", { status: 500 });
      }
    }
  },
};
  • Click Save and Deploy

Let’s test out the code.

  • In the middle section change GET to POST and in the Body put {"jsonrpc":"2.0","method":"net_version","params":[],"id":67}

Note that the URL next to GET/POST is what we use as our ethUrl.

  • Click send

We’ll see some interesting output pop up toward the bottom of the page.
For example, [fetchRPCUrlWinner] winner = <RPC URL> tells us which endpoint was used for that request.

If you click send and nothing happens, it’s possible an anti-virus, firewall, or browser setting is preventing the request from executing. If that occurs, you can simply run the test on your local machine or a server using:

export ARB_URL='https://<your URL>.workers.dev/'
echo ""
echo "net_version"
echo ""
curl $ARB_URL \
  -X POST \
  -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"net_version","params":[],"id":67}'

Just replace the URL in ARB_URL='https://<your URL>.workers.dev/' with your own (found on the Quick Edit page). As long as the request returns a valid JSON-RPC response and not a long error, we’re good to go.
This is a good curl command to keep handy, as it’s very useful for troubleshooting Arbitrum RPCs.
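For reference, a successful net_version call through the Worker should return the Arbitrum One network ID and look something like this (the id field simply echoes the one we sent):

{"jsonrpc":"2.0","id":67,"result":"42161"}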

Last thing before we move on to the final steps…
This only applies to those who use Cloudflare as their primary DNS.
Back on the main Workers page click Triggers and enter a custom domain like test-rpc.com. This is totally optional and doesn’t change the performance of the Worker; it just allows us to use a custom domain as the ethUrl instead of the one Cloudflare provides us.

Step 3: Creating our Production Environment

At this point, we could add the URL from our Worker to the ethUrl flag for Livepeer and be up and running, but remember, we want to separate our test environment from production.

Doing this is extremely simple. We’re going to duplicate the steps from Step 2 with a few minor changes:

  • Click the dropdown arrow next to Workers on the left-hand-side menu

  • Click Overview

  • Click Create Service

  • In the Service name box, enter prod-rpc

  • Click Create Service

  • On the next page click the Settings tab

  • Click Variables

  • Click Add variable

  • For the Variable name put ARB_RPC_LIST. The Value field is where our RPC URLs will go.
    Since this is our production environment, we’ll want to add the RPC URLs we plan to actively use.

Note: Be careful mixing and matching custom Arb nodes with public endpoints like Infura, Alchemy, and Offchain Labs. Custom Arb nodes like the community node tend to drift out of sync and can potentially cause issues. Livepeer version 5.34 may help with sync problems, but further testing is needed. Feel free to experiment however you’d like, but keep an eye on your logs. If they’re being spammed with Block not Found errors, adjust your RPC list. A few of those errors every now and then are normal; log spam is not.
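Purely as an illustration, a production ARB_RPC_LIST value might look like the line below. The provider URL formats and node names here are placeholders, not something this guide prescribes - copy the exact URLs from your own provider dashboards and your own node setup:

https://arb-mainnet.g.alchemy.com/v2/<your-alchemy-key>,https://arbitrum-mainnet.infura.io/v3/<your-infura-key>,https://<your-private-arb-node-url>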

  • Add another variable

  • For the Variable name put FAILURE_ERROR_CODE_LIST

  • For the Value field put 429,400,403,500

  • Add another variable

  • For the Variable name put FALLBACK_ARB_RPC_URL

  • For the Value field put another RPC URL.
    For our production environment, we’ll want to put a node we trust as a fallback. For example, I use the community node here, which I trust will be able to handle all my O nodes if for some reason the main RPC pool fails.

  • Add another variable

  • For the Variable name put LOGGING_ENABLED

  • For the Value field put true

  • Scroll down until you see KV Namespace Bindings

  • For the Variable name put RPC_ERROR_LIST

  • In the KV Namespace dropdown, select PROD_RPC_ERROR_LIST

  • Scroll up and click on the Resources tab

  • On the right-hand-side click Quick edit

  • Delete the default code.

  • Copy the same code listed in step 2 and paste it in place of the code you just deleted.
    GitHub link

  • Click Save and Deploy

  • Run the same tests listed in step 2

That’s it - we can now use the production Worker URL (https://prod-rpc.<your subdomain>.workers.dev/, or your custom domain if you set one up) as our ethUrl and let Cloudflare do all the work. You can also add or remove RPC URLs on the fly through the Cloudflare UI.

Combined with this guide, we can take some significant steps to operate a secure, highly available orchestrator!

Another great thing about using Cloudflare is that they provide some pretty neat metrics for us to look at. Check them out on the Worker page under the Resources tab.

Feel free to ask any questions in this thread or reach out on Discord, and feel even freer to tweak the code and customize it to your heart’s content! As briefly mentioned, @MikeZupper is working on further code optimizations, so keep an eye on the GitHub repo for those.

Any significant updates to the CF Worker will be announced in Livepeer’s official Discord, using this thread, which can be found under the #optimize-your-node channel.


Just set up my 9 nodes using this method. Hopefully I’ll see some improved stability :slight_smile:

Great tutorial! Very easy to follow and detailed.

Thank you @Authority_Null and @MikeZupper
:partying_face:


The blocks don’t seem to be lining up well with multiple endpoints in the pool.

I’m going to try the community arb node as the main and the Offchain Labs as the fallback.

I think just that double redundancy is enough for me right now :slight_smile:

I have not had any significant issues mixing Alchemy and Arb1, and when I had Infura in the mix, I didn’t notice anything problematic. I would try removing Infura before making the community node your primary, as when that inevitably fails, you’ll have 9 nodes stressing arb1, which may rate limit you.

Totally up to you how you want to run it. That’s just my 2 cents :+1:

Is there a way to reverse the order of the fallback?
Have the primary node as the community node and have a pool of endpoints as the fallback?

That sounds like a question for @MikeZupper ! (probably faster to hit him up in discord)

For updates/announcements on the CF Worker, please watch this Discord thread:

Been reading into the runtime environment that CF Workers use. It should be possible to run this script on your own infrastructure for free, maybe with a few modifications. Will give it a try if I have some time.

That’s really cool! I’d love to know more about this. For example, could this be as flexible as CF, where we only need to make a change in a single UI to affect all nodes, and can we do it on the fly with no downtime? Also, what would failover look like if we were running this on our own infra?

Video guide now available :slight_smile:

Lens:
https://lenstube.xyz/watch/0x8a69-0x0b

Youtube: