Integration Testing AA with Bundlers in Hardhat

Unlike traditional software, where bugs typically cause crashes or system downtime, bugs in blockchain applications can mean direct financial losses, unauthorized access to funds, or unintended distribution of tokens. Because blockchains are decentralized and have no central authority, there is no “undo” button and no direct way to intervene should something go awry.

Smart contract testing ensures that the code behaves exactly as intended before it’s cemented on the blockchain. It checks for logical flaws and security vulnerabilities, and ensures the contract meets its specifications. In essence, smart contract testing isn’t just a best practice; it’s a crucial measure to protect stakeholders, uphold the project’s integrity, and ensure trust in the decentralized ecosystem.

Introduction

With the advent of ERC4337 and modular smart contracts, tests must take into account the nuances specific to smart contract wallets and account abstraction. In particular, any module executing in the validation phase of the handleOps transaction must satisfy the constraints implemented by the bundlers for the public UserOp mempool. There is therefore a need for a testing environment that can verify compliance with these rules.

Restrictions in the Validation Phase

At a high level, these rules restrict which opcodes can be invoked in the validation phase of the transaction and which storage can be accessed. A full list of these rules can be found in the ERC4337 specification.
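To make the opcode restriction concrete, here is a minimal sketch of the kind of check that can be run over the opcodes observed during validation. The opcode list reflects our reading of the ERC4337 specification at the time of writing and may be incomplete or outdated. The helper name findBannedOpcodes is hypothetical, and the sketch ignores the storage access rules as well as the exceptions the specification makes (for example, GAS is permitted when immediately followed by a call opcode, which is why it is left out here).

```typescript
// Opcodes forbidden during validation, as we read the ERC4337 spec (assumed, may be outdated).
// GAS is omitted because the spec permits it when immediately followed by a *CALL.
const BANNED_OPCODES: ReadonlySet<string> = new Set([
  "GASPRICE", "GASLIMIT", "DIFFICULTY", "TIMESTAMP", "BASEFEE", "BLOCKHASH",
  "NUMBER", "SELFBALANCE", "BALANCE", "ORIGIN", "CREATE", "COINBASE", "SELFDESTRUCT",
]);

// Hypothetical helper: returns the distinct banned opcodes found in a validation
// trace, in order of first appearance.
const findBannedOpcodes = (tracedOpcodes: string[]): string[] =>
  tracedOpcodes.filter(
    (op, i) => BANNED_OPCODES.has(op) && tracedOpcodes.indexOf(op) === i
  );
```

A trace containing TIMESTAMP, such as the one produced by the rule-violating module discussed later in this article, would be flagged by this check.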

The rationale behind these restrictions is to minimize (or in some cases eliminate) the dependence of the validity of a User Operation on non-account storage, thus preventing User Operations in a mempool from being invalidated en masse with a constant cost to the invalidator.

Importance of Compliance Testing

If a smart contract executing the validation phase (a validation module, paymaster or smart contract factory) does not respect these rules, bundlers servicing the public mempool may drop any UserOperations interacting with these contracts. In this situation, nobody would be able to use your smart contract unless these issues are fixed, or another mempool with relaxed restrictions is created.

While the latter is possible, it would require convincing bundlers to “trust” that your smart contract is safe and violates the restrictions for a good reason, or alternatively running your own bundler. Neither option is ideal, so it is best to design your smart contracts to comply with these restrictions.

Some of these rules are not obvious, and they may change with updates to the ERC. There is therefore a need for a testing setup that validates smart contract execution against these rules and can be updated to newer versions of the ERC with minimal effort.

The rest of the article describes a testing setup that integrates the Infinitism Reference Bundler into Hardhat integration tests.

Integrating the Bundler in the Testing Environment

Testing Smart Contracts that interact with 4337 Smart Contract Wallets generally involves creating UserOperations that call the contract being tested. While writing such tests in hardhat, the general way to execute the User Operation is to directly call Entrypoint’s handleOps from the test.
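As a rough illustration of this direct approach, the sketch below calls handleOps with a single UserOperation and waits for the transaction to be mined. The types EntryPointLike and UserOperationLike and the helper executeUserOpDirectly are hypothetical stand-ins so that the snippet is self-contained; in a real test they would be the EntryPoint contract instance and the project's own UserOperation type.

```typescript
// Minimal structural types so the sketch is self-contained (assumptions, not the
// actual typechain types used in a real project).
type UserOperationLike = Record<string, unknown>;
interface EntryPointLike {
  handleOps(
    ops: UserOperationLike[],
    beneficiary: string
  ): Promise<{ wait(): Promise<unknown> }>;
}

// Hypothetical helper: executes a single UserOperation by calling handleOps directly.
// No bundler is involved, so none of the mempool validation rules are checked.
const executeUserOpDirectly = async (
  entryPoint: EntryPointLike,
  userOp: UserOperationLike,
  beneficiary: string
): Promise<unknown> => {
  const tx = await entryPoint.handleOps([userOp], beneficiary);
  return tx.wait(); // resolves once the transaction is mined
};
```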

Normal Tests

This test would capture all on-chain details including User Operation reverts, and works well for testing the business logic in the smart contract being tested. However, as noted earlier these tests fail to account for the restrictions placed by bundlers in the validation phase of the transactions.

Therefore, to test against these restrictions we can launch an instance of the bundler during test initialization, and submit all UserOperations to the entrypoint through the bundler. Any violations of the bundler restrictions would be returned as an error from the eth_sendUserOperation RPC call.

Which bundler should we use? The reference bundler (eth-infinitism/bundler on GitHub), maintained by the authors of ERC4337, is ideal because it’s a minimal implementation without any external dependencies and can be expected to keep up with any changes to the rules in the ERC.

Tests with Bundler Integration

Running Hardhat tests on an external network is nothing new: simply include --network local in the command that starts the tests, where local is configured in the Hardhat config file to point to the node. Seems pretty simple, right?
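For reference, a network entry of this kind might look like the sketch below. The URL, the chain ID and the choice of accounts are assumptions for illustration: they presume a node listening on localhost:8545 with chain ID 1337 and reuse Hardhat's well-known default test mnemonic so that the signers match those of the built-in Hardhat network.

```typescript
import { HardhatUserConfig } from "hardhat/config";

const config: HardhatUserConfig = {
  networks: {
    // "local" is the name passed to `--network`; URL and chain ID are assumptions
    local: {
      url: "http://localhost:8545",
      chainId: 1337,
      // Hardhat's default test mnemonic, so the same accounts are used as on the
      // built-in Hardhat network (assumption for this sketch)
      accounts: {
        mnemonic: "test test test test test test test test test test test junk",
      },
    },
  },
};

export default config;
```

With such an entry in place, the tests can be started with npx hardhat test --network local.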

There is but one problem: to perform these checks and enforce these restrictions, the bundler uses the debug_traceCall RPC with the JavaScript tracer to learn which opcodes have been called and which storage has been accessed in the validation phase of the transaction. It turns out that Hardhat’s node does not support this! So even if we used the bundler to submit these user operations, it would just skip these checks.
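To give a sense of what such a request looks like, here is a minimal sketch that builds a debug_traceCall JSON-RPC body with a tiny JavaScript tracer that records every executed opcode. This is not the bundler's actual tracer, which also has to track storage access and more; the names OPCODE_TRACER, CallObject and buildTraceCallRequest are hypothetical.

```typescript
// A tiny JavaScript tracer, passed to Geth as a string, that records every opcode executed.
// Geth calls step() for each opcode, fault() on errors, and result() to produce the output.
const OPCODE_TRACER = `{
  ops: [],
  step: function (log) { this.ops.push(log.op.toString()); },
  fault: function () {},
  result: function () { return this.ops; }
}`;

// The call to simulate, in the same shape as an eth_call request
interface CallObject {
  from?: string;
  to: string;
  data: string;
}

// Hypothetical helper: builds the JSON-RPC body for debug_traceCall with the tracer above.
const buildTraceCallRequest = (call: CallObject, blockTag = "latest") => ({
  jsonrpc: "2.0",
  id: 1,
  method: "debug_traceCall",
  params: [call, blockTag, { tracer: OPCODE_TRACER }],
});
```

A node that supports the JavaScript tracer would answer such a request with the list of opcode names, which is the raw material for the opcode checks described above.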

So how do we solve this? The README.md of the eth-infinitism/bundler repository on GitHub suggests that Geth supports this RPC, so we should be able to run a single instance of it as a substitute for Hardhat’s node.

Building the Test Environment

Challenges with Geth

Replacing Hardhat’s node with Geth solves the compatibility issue with bundlers; however, it introduces its own set of issues. Remember that Hardhat’s node is optimized for testing: it implements a variety of features and quality-of-life enhancements that make testing easy. Some of these features are:

  1. A default set of pre-funded addresses, each with 10000 ETH.
  2. console.log() support
  3. RPCs for account impersonation, rewinding and fast forwarding block.number and block.timestamp.
  4. Chain Snapshots

And more. A full list of custom behaviours implemented to enhance testing can be found here: Reference | Ethereum development environment for professionals by Nomic Foundation

An ideal testing environment with geth would replicate much of the same functionalities with a similar API to ensure that the testing experience is as close to vanilla hardhat testing as possible.

For this article, we focus on replicating the following minimal set of functionalities:

  1. Identify the default addresses used by hardhat and ensure they are funded before the tests are executed.
  2. Chain Snapshots

The first is a non-negotiable requirement for testing - funds are needed to execute transactions. I’d argue that the second is also quite important, as snapshots can be utilized to ensure that every test in a suite starts from a known clean blockchain state. This is important for tests to be independent and deterministic. With these two available in the testing environment, most existing hardhat tests should work with this setup with minimal to no changes.

Based on all the information above, we can identify the following steps to create the bundler-geth testing environment:

  1. Set up Geth and obtain a local RPC endpoint to which transactions can be sent.
  2. Fund the default addresses used by Hardhat.
  3. Deploy the entrypoint on the Geth node.
  4. Launch the bundler and wait for it to start successfully.
  5. Execute the tests on the local Geth node.

The 3rd step is needed because the Infinitism bundler expects a valid RPC and a pre-deployed ERC4337 Entrypoint to be available during initialization. This also means that the Entrypoint address must remain the same across all hardhat tests, the alternative being restarting the bundler with a new entrypoint address in each test which would make test execution extremely slow.
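Step 4 above involves waiting for a service to come up before moving on. One simple way to do this from a setup script is to poll a readiness check, as in the sketch below. The helper waitUntilReady and its default values are hypothetical, and how readiness is determined is left to the caller.

```typescript
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls `isReady` until it returns true or the attempts run out.
const waitUntilReady = async (
  isReady: () => Promise<boolean>,
  maxAttempts = 30,
  delayMs = 1000
): Promise<void> => {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Treat a thrown error (e.g. connection refused) the same as "not ready yet"
    const ready = await isReady().catch(() => false);
    if (ready) return;
    if (attempt < maxAttempts) await sleep(delayMs);
  }
  throw new Error(`Service not ready after ${maxAttempts} attempts`);
};
```

For the bundler, the readiness check could, for instance, send an eth_supportedEntryPoints request and report success once a valid response comes back.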

Setting Up Geth

We use Docker for setting up the Geth Node, with the following Dockerfile based on the one found in the bundler repository.

FROM ethereum/client-go:latest
ENTRYPOINT geth \
      --http.vhosts '*,localhost,host.docker.internal' \
      --http \
      --http.api personal,eth,net,web3,debug \
      --http.corsdomain '*' \
      --http.addr "0.0.0.0" \
      --nodiscover --maxpeers 0 --mine \
      --networkid 1337 \
      --dev \
      --allow-insecure-unlock \
      --rpc.allow-unprotected-txs \
      --dev.gaslimit 200000000

This is straightforward, but there are a few things we can talk about:

  1. We recommend using the latest release of Geth, as we recently found a bug in the implementation of the debug_setHead RPC which caused Geth to crash when the RPC was called. Later sections of this article describe how this RPC is used to implement snapshots for the test environment. The bug has since been fixed by the Geth team; more details can be found here: Calling debug_setHead crashes geth with SIGSEGV in dev mode. · Issue #27990 · ethereum/go-ethereum · GitHub
  2. Normally, Geth must be paired with a consensus client. However, to keep things simple we run Geth in developer mode, which launches it as a single-node Ethereum test network with no connection to any external peers. This makes it ideal for local testnets. More details on this mode can be found here: Developer mode | go-ethereum.

Funding the Default Hardhat Accounts

We use the default account managed by the geth node as the funding address. This can be done as follows:

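import { ethers } from "hardhat";
import { BigNumberish } from "ethers";
import { parseEther } from "ethers/lib/utils";

// Assume that the geth node is running at localhost:8545
const provider = new ethers.providers.JsonRpcProvider("http://localhost:8545");

// Using the ethers instance from hardhat
const accountsToFund = (await ethers.getSigners()).map(
  (signer) => signer.address
);

// Fund every account with 10000 ETH, matching Hardhat's default balance
const fundingAmount = accountsToFund.map(() => parseEther("10000"));

const fundAccounts = async (
  accountsToFund: string[],
  fundingAmount: BigNumberish[]
) => {
  // Get the JsonRpcSigner which is managed by this Geth node
  const signer = provider.getSigner();
  const nonce = await signer.getTransactionCount();

  // Send all transfers concurrently, assigning consecutive nonces explicitly
  const transactions = await Promise.all(
    accountsToFund.map((account, i) =>
      signer.sendTransaction({
        to: account,
        value: fundingAmount[i],
        nonce: nonce + i,
      })
    )
  );

  // Wait until every transfer has been mined before the tests start
  await Promise.all(transactions.map((tx) => tx.wait()));
};

await fundAccounts(accountsToFund, fundingAmount);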

Chain Snapshots

We define a snapshot to simply be the state of the blockchain at a particular block. Therefore, a snapshot can be represented as:

type Snapshot = {
  blockNumber: number;
};

We can use the debug_setHead RPC provided by Geth to roll back the chain from any later block B > b to block b.

import { BigNumber, utils } from "ethers";

// `provider` is the JsonRpcProvider connected to the Geth node
const snapshot = async (): Promise<Snapshot> => ({
  blockNumber: await provider.getBlockNumber(),
});

const revert = async (snapshot: Snapshot) => {
  // Roll the chain back to the block recorded in the snapshot
  await provider.send("debug_setHead", [
    utils.hexValue(BigNumber.from(snapshot.blockNumber)),
  ]);

  // getBlockNumber() caches the result, so we directly call the rpc method instead
  const currentBlockNumber = BigNumber.from(
    await provider.send("eth_blockNumber", [])
  );
  if (!BigNumber.from(snapshot.blockNumber).eq(currentBlockNumber)) {
    throw new Error(
      `Failed to revert to block ${snapshot.blockNumber}. Current block number is ${currentBlockNumber}`
    );
  }
};

Resetting the Bundler

It is a good idea to reset the bundler between tests to get rid of any leftover state such as ops in the mempool, counters based on SCW address etc. Conveniently, the bundler provides the debug_bundler_clearState RPC for this exact purpose.

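import axios from "axios";

// Assume that the bundler is reachable at localhost:3000,
// the port exposed in the docker-compose file
const bundlerUrl = "http://localhost:3000";

const apiClient = axios.create({
  baseURL: bundlerUrl,
});

// Clears the bundler's mempool and other internal state
const resetBundler = async () => {
  await apiClient.post("/rpc", {
    jsonrpc: "2.0",
    method: "debug_bundler_clearState",
    params: [],
  });
};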

Launching the Bundler

Official docker images for the infinitism bundler can be found here: Docker. We use a simple docker-compose file to manage the bundler and geth instances:

version: "2"

services:
  bundler:
    ports: ["3000:3000"]
    image: ankurdubeybiconomy/bundler:latest # Image based off accountabstraction/bundler:0.6.1 with fixes for debug_bundler_clearState
    command: --network http://geth-dev:8545 --entryPoint ${ENTRYPOINT} --show-stack-traces
    volumes:
      - ./workdir:/app/workdir:ro

    mem_limit: 1000M
    logging:
      driver: "json-file"
      options:
        max-size: 10m
        max-file: "10"

  geth-dev:
    build: geth-dev
    ports: ["8545:8545"]

We recently contributed a fix (fix: debug_bundler_clearState clears MempoolManager.entryCount by ankurdubey521 · Pull Request #134 · eth-infinitism/bundler · GitHub) to the implementation of the debug_bundler_clearState RPC, which is crucial for the snapshot functionality. At the time of writing this article, the fix has been merged to the main branch but has not been published to the Docker registry. It is therefore advisable to either build manually from the main branch or use the image mentioned in the docker-compose file, which is a fork of the official image.

Writing tests with Bundler Integration

Once the environment has been set up, a few things need to be kept in mind while writing tests that are compatible with it:

  1. All the normal tests and assertions from Chai still work as usual, so the core patterns for writing tests remain the same.

  2. To submit a user operation to the entrypoint, call the eth_sendUserOperation RPC of the bundler.

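    import { hexValue } from "ethers/lib/utils";

    // UserOperation and UserOperationSubmissionError are defined in the test codebase
    const sendUserOperation = async (
      userOperation: UserOperation,
      entrypointAddress: string
    ): Promise<string> => {
      const result = await apiClient.post("/rpc", {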
        jsonrpc: "2.0",
        method: "eth_sendUserOperation",
        params: [serializeUserOp(userOperation), entrypointAddress],
      });
      if (result.status !== 200) {
        throw new Error(
          `Failed to send user operation: ${JSON.stringify(
            result.data.error.message
          )}`
        );
      }
      if (result.data.error) {
        throw new UserOperationSubmissionError(JSON.stringify(result.data.error));
      }
    
      return result.data;
    };
    
    const serializeUserOp = (op: UserOperation) => {
      return {
        sender: op.sender,
        nonce: hexValue(op.nonce),
        initCode: op.initCode,
        callData: op.callData,
        callGasLimit: hexValue(op.callGasLimit),
        verificationGasLimit: hexValue(op.verificationGasLimit),
        preVerificationGas: hexValue(op.preVerificationGas),
        maxFeePerGas: hexValue(op.maxFeePerGas),
        maxPriorityFeePerGas: hexValue(op.maxPriorityFeePerGas),
        paymasterAndData: op.paymasterAndData,
        signature: op.signature,
      };
    }
    
  3. To reset the chain to a specific snapshot (block) after each test, include the revert() call in the afterEach hook.

  4. To reset the bundler after each test, include the resetBundler() call in the afterEach hook.

  5. Ensure that the address of the Entrypoint contract does not change between tests and matches the address provided when launching the bundler. This can be done by deploying the entrypoint between launching Geth and the bundler, and then instantiating the Entrypoint contract with the same address in the tests.
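A cheap way to catch a mismatched or missing entrypoint early is to check that contract code actually exists at the expected address before any test runs. The sketch below is one hypothetical way to do so; CodeReader is a minimal stand-in type that an ethers provider satisfies through its getCode method, and assertEntryPointDeployed is not part of the environment described in this article.

```typescript
// Minimal structural type; an ethers provider satisfies it via provider.getCode()
interface CodeReader {
  getCode(address: string): Promise<string>;
}

// Hypothetical helper: fails fast if no contract is deployed at the address
// the bundler was launched with.
const assertEntryPointDeployed = async (
  provider: CodeReader,
  entryPointAddress: string
): Promise<void> => {
  const code = await provider.getCode(entryPointAddress);
  // An address without contract code returns the empty byte string "0x"
  if (code === "0x") {
    throw new Error(
      `No contract code found at EntryPoint address ${entryPointAddress}`
    );
  }
};
```

Calling this once in a before hook turns a confusing bundler error into an explicit failure message.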

Example

We created a class called BundlerTestEnvironment responsible for exposing all the functions related to funding, snapshots, bundler reset and user operation submission. An implementation of this class can be found here: https://github.com/bcnmy/scw-contracts/blob/SCW-V2-Modular-SA/test/bundler-integration/environment/bundlerEnvironment.ts.

The following is an excerpt from one of our tests, where we test that a rule-violating validation module should have its UserOperation rejected by the bundler. The validation module is programmed to call the TIMESTAMP opcode in its validation logic which is forbidden.

describe("Bundler Environment", async () => {
  let signers: SignerWithAddress[];
  let deployer: SignerWithAddress,
    alice: SignerWithAddress,
    bob: SignerWithAddress,
    charlie: SignerWithAddress,
    smartAccountOwner: SignerWithAddress;
  let environment: BundlerTestEnvironment;

  const setupTests = deployments.createFixture(async ({ deployments }) => {
    // ... deploy all contracts except entrypoint

    return {
      entryPoint,
      smartAccountImplementation: await getSmartAccountImplementation(),
      smartAccountFactory: await getSmartAccountFactory(),
      mockToken: mockToken,
      ecdsaModule: ecdsaModule,
      userSA: userSA,
    };
  });

  before(async function () {
    // Skip the test if it's not expected on the local geth chain.
    const chainId = (await ethers.provider.getNetwork()).chainId;
    if (chainId !== BundlerTestEnvironment.BUNDLER_ENVIRONMENT_CHAIN_ID) {
      this.skip();
    }

    environment = await BundlerTestEnvironment.getDefaultInstance();
  });

  beforeEach(async () => {
    signers = await ethers.getSigners();
    [deployer, alice, bob, charlie, smartAccountOwner] = signers;
  });

  afterEach(async function () {
    const chainId = (await ethers.provider.getNetwork()).chainId;
    if (chainId !== BundlerTestEnvironment.BUNDLER_ENVIRONMENT_CHAIN_ID) {
      this.skip();
    }

    // Reset the chain and the bundler after each test
    await Promise.all([
      environment.revert(environment.defaultSnapshot!),
      environment.resetBundler(),
    ]);
  });

  it("Should not submit user operation that calls TIMESTAMP in the validation phase", async () => {
    const { entryPoint, mockToken, userSA, ecdsaModule } =
      await setupTestsWithRecusantValidationModule();
    const tokenAmountToTransfer = ethers.utils.parseEther("0.5345");

    const userOp = await makeEcdsaModuleUserOp(
      "execute_ncC",
      [
        mockToken.address,
        ethers.utils.parseEther("0"),
        encodeTransfer(charlie.address, tokenAmountToTransfer.toString()),
      ],
      userSA.address,
      smartAccountOwner,
      entryPoint,
      ecdsaModule.address,
      {
        preVerificationGas: 50000,
      }
    );

    const expectedError = new UserOperationSubmissionError(
      '{"message":"account uses banned opcode: TIMESTAMP","code":-32502}'
    );
    let thrownError: Error | null = null;

    try {
      await environment.sendUserOperation(userOp, entryPoint.address);
    } catch (e) {
      thrownError = e as Error;
    }

    expect(thrownError).to.deep.equal(expectedError);
  });
});

Notice that we expect the bundler to complain with the error {"message": "account uses banned opcode: TIMESTAMP", "code": -32502}, which satisfies our goal: this is an error that would not have been caught in testing if we had submitted the user operation directly to the entrypoint.
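Comparing whole error objects ties the test to the exact serialization of the error. A possible alternative, sketched below, is to parse the JSON carried in the error message and assert on individual fields such as the code. This assumes, as in the sendUserOperation helper shown earlier, that the error message is the JSON-serialized RPC error; the names BundlerRpcError and parseBundlerError are hypothetical.

```typescript
interface BundlerRpcError {
  message: string;
  code: number;
}

// Hypothetical helper: recovers the JSON-RPC error that was serialized into Error.message.
// Returns null when the message is not JSON of the expected shape.
const parseBundlerError = (error: Error): BundlerRpcError | null => {
  try {
    const parsed = JSON.parse(error.message);
    if (
      parsed !== null &&
      typeof parsed === "object" &&
      typeof parsed.message === "string" &&
      typeof parsed.code === "number"
    ) {
      return { message: parsed.message, code: parsed.code };
    }
    return null;
  } catch {
    return null;
  }
};
```

In a test, the parsed result can then be checked field by field, for example by expecting the code to equal -32502 and the message to mention TIMESTAMP.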

The implementation of the Validation Module for those curious:

// SPDX-License-Identifier: MIT
pragma solidity 0.8.17;

import {EcdsaOwnershipRegistryModule} from "../modules/EcdsaOwnershipRegistryModule.sol";
import {UserOperation} from "../modules/BaseAuthorizationModule.sol";

contract ForbiddenOpcodeInvokingAuthModule is EcdsaOwnershipRegistryModule {
    function validateUserOp(
        UserOperation calldata userOp,
        bytes32 userOpHash
    ) external view virtual override returns (uint256) {
        // Access the forbidden opcode
        require(block.timestamp > 0);

        // Usual Stuff
        (bytes memory cleanEcdsaSignature, ) = abi.decode(
            userOp.signature,
            (bytes, address)
        );
        if (_verifySignature(userOpHash, cleanEcdsaSignature, userOp.sender)) {
            return VALIDATION_SUCCESS;
        }
        return SIG_VALIDATION_FAILED;
    }
}

Challenges and Testing Strategy

Since these tests depend on an external geth and bundler instance, and perform expensive operations like debug_setHead and debug_bundler_clearState after each test, the tests run slower than normal Hardhat Unit Tests. Also, it remains to be seen how tests that depend on complex timing logic can be written in this environment.

Therefore, we take the following approach to testing our Smart Contracts:

  1. Write all happy flow tests in the bundler environment.
  2. Write all negative flow tests (where UserOperations are expected to revert) in the normal hardhat testing environment.

The rationale behind this is that it does not make a lot of sense to test the validation rules on User Operations that are expected to fail because of violation of conditions in the business logic. Examples of such tests could be violating onlyOwner restrictions etc.

Also considering the fact that most tests typically test for negative cases, it makes sense to keep them on the faster, stable hardhat testing environment and only keep the happy path tests on the slower but more rigorous bundler environment.

For our codebase we orchestrate the whole process of setting up geth, bundler, deploying the entrypoint and running the tests via a simple bash script, whose implementation can be found here: https://github.com/bcnmy/scw-contracts/blob/SCW-V2-Modular-SA/scripts/bundler-tests.sh. The full suite of our bundler tests can be found here: https://github.com/bcnmy/scw-contracts/tree/SCW-V2-Modular-SA/test/bundler-integration

Conclusion

This article has explored the testing challenges brought by standards like ERC4337. It stressed the importance of testing against the restrictions that bundlers enforce in the validation phase and detailed the steps for setting up an environment to write such tests. As the technology changes, the difficulties we face will change too.

Wait, what about Foundry?

While Anvil supports the debug_traceCall RPC, it currently does not support the JS Tracer. Also, based on our research so far, there is no way to run forge tests on an external node like we do here with hardhat.

Therefore, it doesn’t seem like this is possible in Foundry at the moment (please prove me wrong). An alternate way of doing this could be to set up a hybrid foundry-hardhat repository and re-write the happy path tests in Hardhat. Steps for setting up a hybrid repository can be found here:

  1. Integrating with Hardhat - Foundry Book
  2. Integrating with Foundry | Ethereum development environment for professionals by Nomic Foundation

References

  1. debug Namespace | go-ethereum
  2. Reference | Ethereum development environment for professionals by Nomic Foundation
  3. ERC-4337: Account Abstraction Using Alt Mempool
  4. GitHub - bcnmy/scw-contracts: SCW contracts for Biconomy Smart Account