FireSim + AWS

We’re excited to release our first public demo of FireSim, which you can deploy now on Amazon EC2 F1 instances by following the instructions on this page. Also, stay tuned for an upcoming AWS Compute Blog post about FireSim, which covers more details about FireSim and this demo. You can track FireSim development by following @firesimproject on Twitter.

By the end of this demo, you’ll have an FPGA-accelerated cycle-accurate simulation of 1 or 8 RISC-V Rocket Chips, each with a NIC and block device, and interconnected by a functional network simulation. These simulated processors will boot a pre-built Linux distro included in the FireSim Demo AMI. At the end, we’ll run memcached on the simulated nodes and run YCSB on the EC2 instance to generate load on our simulated cluster of Rocket Chips and demonstrate the ability to run real workloads on FireSim. Running this demo does not require any knowledge about FPGAs or RISC-V.

There are two ways to try the demo, either simulating a single node or simulating an 8-node cluster, which require either an f1.2xlarge or an f1.16xlarge instance respectively. The instructions will guide you to the appropriate section based on your choice.

1. Starting Your F1 Instances

We provide a pre-built solution on the AWS Marketplace that includes an AMI/AFI combo to deploy the simulation. You can find it here. To simulate a single node, start an f1.2xlarge instance using our AMI. To simulate an 8-node cluster, start an f1.16xlarge instance using our AMI. Below are detailed instructions for setting up an instance for new users. If you’ve used F1 instances before, you can skip to the next section after starting an instance.

If you are a new EC2 F1 user, open the AWS EC2 Management Console. In the top right corner, make sure that your region is set to N. Virginia (a.k.a. us-east-1). This is required to be able to use F1 instances. If you are a new EC2 user, it is also likely that your service limit for F1 instances will be set to zero. You can check your limit at this link. Confirm that the limit for f1.2xlarge instances is greater than zero. If it isn’t, follow the instructions to submit a service limit increase here to request access to F1 instances. Set an initial request size of 1 for either f1.2xlarge or f1.16xlarge instances in N. Virginia. You can put “FireSim HW Simulation” for the use case. This request will need to be approved before you can proceed.

Next, you can launch an instance using the FireSim Demo AMI. To launch, click “Continue” then “Launch with 1-click” on that page. The default options are sufficient to get started, but you may configure them as you wish. One option you may wish to change is to use an f1.16xlarge instance if you want to run an 8-node cluster of Rocket Chips.

Once your instance has booted, log in with username centos and the key you supplied at instance creation time, just like the FPGA Dev AMI. The first time you log in, you should see the regular FPGA Dev AMI login message, in addition to a message that says “FireSim network config completed.” That message confirms that the tap interfaces and bridge needed to communicate with the simulated nodes from the EC2 instance have been set up.

2. AMI Contents

The AMI includes a variety of tools to help you run simulations and build software for RISC-V systems:

  • riscv64-unknown-* toolchain: The riscv-tools toolchain is pre-installed, including gcc and binutils. You can see all of the included tools by typing riscv64-unknown- and hitting tab twice.
  • ~/firesim-target-software: A custom-built Linux Distribution that runs on the simulated nodes. This directory contains a file called bbl-vmlinux, which contains the bbl bootloader and a Linux kernel. This directory also contains 8 root filesystem images, rootfs[0-7].ext4, one for each simulated node. If these images become damaged or you want to add additional software to the image, see the Extras section for how to re-build them.
  • FireSim-f1: This program controls simulation and communicates with the FPGA. We will not invoke this directly in this demo, instead opting to use convenience scripts that automatically pass command line arguments to this program.
  • boot-firesim-singlenode and boot-firesim-cluster: These scripts automatically invoke FireSim-f1 with the appropriate arguments to boot Linux with the appropriate root filesystem images. The singlenode script runs a single simulation, while the cluster script runs an 8-node simulation. The cluster script can be run only on an f1.16xlarge instance.
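
As a non-interactive alternative to tab completion, the sketch below lists every executable on your PATH whose name starts with a given prefix. The helper name list_tools is hypothetical and not part of the AMI.

```shell
# List every executable on PATH whose name starts with the given prefix.
# list_tools is a hypothetical helper, not something shipped with the AMI.
list_tools() {
    prefix=$1
    # Walk each PATH directory and print matching executable files.
    printf '%s\n' "$PATH" | tr ':' '\n' | while IFS= read -r dir; do
        for f in "$dir/$prefix"*; do
            if [ -f "$f" ] && [ -x "$f" ]; then
                basename "$f"
            fi
        done
    done | sort -u
}
```

For example, list_tools riscv64-unknown- should print the same tool names that tab completion shows.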

To familiarize ourselves with the environment, we will first simulate a single node. The single node simulation will work on both f1.2xlarge instances and f1.16xlarge instances.

3. Single-Node Demo

First, you will need to flash the FPGA with the FireSim AFI. To do so, run:

[centos@IP_ADDR ~]$ sudo fpga-load-local-image -S 0 -I agfi-00a74c2d615134b21
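
If you want to confirm that the image was loaded before starting a simulation, a minimal sketch follows. It assumes that fpga-describe-local-image, the companion command to fpga-load-local-image in the AWS FPGA management tools, is also installed on the AMI. The helper name describe_slots is hypothetical.

```shell
# Print the status of the first N FPGA slots (default: 1).
# Assumes fpga-describe-local-image is installed alongside fpga-load-local-image.
describe_slots() {
    count=${1:-1}
    slot=0
    while [ "$slot" -lt "$count" ]; do
        # -S selects the slot, -H adds column headers to the output.
        sudo fpga-describe-local-image -S "$slot" -H
        slot=$((slot + 1))
    done
}
```

Run describe_slots on an f1.2xlarge, or describe_slots 8 on an f1.16xlarge after flashing all slots. Each slot should list the AGFI ID you loaded. The status column normally reads loaded once flashing has finished.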

Now, to start a simulation, simply run:

[centos@IP_ADDR ~]$ boot-firesim-singlenode

This will automatically call FireSim-f1, passing it bbl-vmlinux as the bootloader/kernel and rootfs0.ext4 as the root filesystem. This command will produce output in the following format:

Simulations Started. You can use the UART console of each simulated node by attaching to the following screens:
There is a screen on:
        2492.fsim0      (Detached)
1 Socket in /var/run/screen/S-centos.

We can use the UART console by connecting to this screen, but we will opt to use ssh to access the node instead. First, ping the node to make sure it has come online. This is currently required because nodes may get stuck at Linux boot if the NIC does not receive any network traffic (see item 1 under Troubleshooting / Errata).

[centos@IP_ADDR ~]$ ping 192.168.1.10   # this IP is fixed

This should produce output like the following, with ping responses appearing once the node has booted:

PING 192.168.1.10 (192.168.1.10) 56(84) bytes of data.
From 192.168.1.1 icmp_seq=1 Destination Host Unreachable
...
64 bytes from 192.168.1.10: icmp_seq=1 ttl=64 time=2017 ms
64 bytes from 192.168.1.10: icmp_seq=2 ttl=64 time=1018 ms
64 bytes from 192.168.1.10: icmp_seq=3 ttl=64 time=19.0 ms
...
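
If you would rather not watch the ping output by hand, the sketch below keeps pinging until the node answers or a retry limit is reached. The helper name wait_for_node and the default limit of 120 attempts are assumptions, and the -W timeout flag is the one from the Linux iputils ping.

```shell
# Ping a node once per second until it replies, or give up after max_tries.
# wait_for_node and the default limit of 120 attempts are illustrative choices.
wait_for_node() {
    ip=$1
    max_tries=${2:-120}
    tries=0
    # -c 1 sends a single probe; -W 1 waits at most one second for the reply.
    until ping -c 1 -W 1 "$ip" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "node $ip did not respond after $tries attempts" >&2
            return 1
        fi
        sleep 1
    done
    echo "node $ip is up"
}
```

For the single-node demo, wait_for_node 192.168.1.10 returns once the node replies. The repeated pings also supply the network traffic that the boot process needs.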

At this point, we know the simulated node is online. We can ssh into it using the username root and password firesim. It is also convenient to make sure that your TERM variable is set correctly. In this case, the simulation expects TERM=linux, so we will provide that.

[centos@IP_ADDR ~]$ TERM=linux ssh root@192.168.1.10
The authenticity of host '192.168.1.10 (192.168.1.10)' can't be established.
ECDSA key fingerprint is 63:e9:66:d0:5c:06:2c:1d:5c:95:33:c8:36:92:30:49.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.10' (ECDSA) to the list of known hosts.
root@192.168.1.10's password:
#

At this point, you’re ssh-ed into the simulated node. Run uname -a as an example. You should see the following output:

# uname -a
Linux buildroot 4.12.0-rc2 #1 Fri Aug 4 03:44:55 UTC 2017 riscv64 GNU/Linux

At this point, you can run programs on the simulated node, as you would with a real machine. For example, you can jump down to section 5 below to run YCSB against memcached on the simulated node.

When you’re done, run the following to shut down the simulated node:

# poweroff

You can confirm that the simulation has ended by running screen -ls, which should now contain no detached screens.

4. Eight-Node Cluster Demo

If you are running on an f1.16xlarge instance, you can continue on to simulate a cluster of 8 Rocket Chips. If you followed the previous steps to run a single node, make sure you have powered off the old simulated node first. If you are running an f1.2xlarge instance, skip this step and jump to step 5.

First, you will need to flash all 8 FPGAs with the FireSim AFI. To do so, run:

[centos@IP_ADDR ~]$ for i in {0..7}; do sudo fpga-load-local-image -S $i -I agfi-00a74c2d615134b21; done

To start the 8-node cluster simulation, run the following.

[centos@IP_ADDR ~]$ boot-firesim-cluster

This will produce similar output to the single node case, but it will have started 8 screens, one for each of the 8 nodes in the cluster:

Simulations Started. You can use the UART console of each simulated node by attaching to the following screens:
There is a screen on:
        2492.fsim0      (Detached)
        2493.fsim1      (Detached)
        2494.fsim2      (Detached)
        2495.fsim3      (Detached)
        2496.fsim4      (Detached)
        2497.fsim5      (Detached)
        2498.fsim6      (Detached)
        2499.fsim7      (Detached)

8 Sockets in /var/run/screen/S-centos.

Just like before, we will use ssh to access the simulated nodes, rather than relying on the UART. The nodes have fixed, sequential IP addresses, from 192.168.1.10 to 192.168.1.17. Again, you must ping each of the nodes first to check for liveness before you try to ssh into them (see item 1 under Troubleshooting / Errata). Once a node is booted up, you can ssh into it as before, providing username root and password firesim:

[centos@IP_ADDR ~]$ TERM=linux ssh root@192.168.1.10   # replace 10 with 11 through 17 for the other nodes
The authenticity of host '192.168.1.10 (192.168.1.10)' can't be established.
ECDSA key fingerprint is 63:e9:66:d0:5c:06:2c:1d:5c:95:33:c8:36:92:30:49.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.10' (ECDSA) to the list of known hosts.
root@192.168.1.10's password:
#
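
Because the node addresses are sequential, the liveness check for the whole cluster can be sketched as a small loop run on the EC2 instance. The helper name ping_cluster is hypothetical, and the -W timeout flag assumes the Linux iputils ping.

```shell
# Ping each simulated node once and report whether it answered.
# Argument: number of nodes (default 8); addresses start at 192.168.1.10.
ping_cluster() {
    count=${1:-8}
    i=0
    while [ "$i" -lt "$count" ]; do
        ip="192.168.1.$((10 + i))"
        if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
            echo "$ip up"
        else
            echo "$ip not responding yet"
        fi
        i=$((i + 1))
    done
}
```

A node that is still booting will report "not responding yet", so rerun ping_cluster until all eight addresses report "up".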

Once all the nodes have booted, you should be able to ping any node from any other node. As before, you can shut down individual simulated nodes with poweroff. You can check which nodes are still running by checking for entries in screen -ls.
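
To pull just the session names out of the screen -ls listing, a minimal sketch follows. It assumes the sessions keep the fsimN naming shown in the output above. The helper name running_nodes is hypothetical.

```shell
# Print the names of FireSim screen sessions that are still running.
# Assumes sessions are named fsim0, fsim1, ... as in the output shown above.
running_nodes() {
    # Keep only lines of the form "<pid>.fsimN ..." and print the fsimN part.
    screen -ls | sed -n 's/.*\.\(fsim[0-9][0-9]*\).*/\1/p'
}
```

An empty result means every simulation has exited. To reach the UART console of a node that is still running, attach with screen -r fsim0 and detach again with Ctrl-a d.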

5. Run YCSB Against memcached Running on Your Simulated Cluster

The Linux image for the simulated nodes includes a copy of the memcached key-value store. Once you’re ssh-ed into a simulated node, simply run memcached -u root to start the server.

As a sample workload, we’ll now run the Yahoo! Cloud Serving Benchmark (YCSB) against our cluster of one or 8 Rocket Chips. Be sure to start memcached as shown above on each simulated node.
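
If you prefer not to log in to every node by hand, the sketch below starts memcached over ssh from the EC2 instance. It assumes the memcached build in the image accepts the standard -d (daemonize) flag, which lets each ssh command return right away. The helper name start_memcached is hypothetical.

```shell
# Start memcached as a daemon on the first N simulated nodes (default 8).
# ssh prompts for the root password (firesim) once per node.
# Assumption: the image's memcached supports the standard -d daemon flag.
start_memcached() {
    count=${1:-8}
    i=0
    while [ "$i" -lt "$count" ]; do
        TERM=linux ssh "root@192.168.1.$((10 + i))" 'memcached -u root -d'
        i=$((i + 1))
    done
}
```

Use start_memcached 1 for the single-node demo, or start_memcached with no argument for the 8-node cluster.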

To install YCSB, run the following on the EC2 instance (not the simulated nodes). Note that we must check out a specific commit of YCSB, because the version of Maven available through yum is out of date:

[centos@IP_ADDR ~]$ sudo yum -y install maven
[centos@IP_ADDR ~]$ git clone https://github.com/brianfrankcooper/YCSB.git
[centos@IP_ADDR ~]$ cd YCSB
[centos@IP_ADDR ~]$ git checkout a9f5c0453dd90cbba595b581c620473cf3ab9bbd
[centos@IP_ADDR ~]$ mvn -pl com.yahoo.ycsb:memcached-binding -am clean package

To load the data needed for YCSB, run the following command from the YCSB directory. If you’re running a single simulated node, remove the IP addresses for nodes 1 to 7 (IPs 192.168.1.11 to 192.168.1.17) from the commands.

[centos@IP_ADDR ~]$ ./bin/ycsb load memcached -s -P workloads/workloada -p "memcached.hosts=192.168.1.10,192.168.1.11,192.168.1.12,192.168.1.13,192.168.1.14,192.168.1.15,192.168.1.16,192.168.1.17" > load_log

Next, we can run the actual workload (again, remove the unused IPs if you’re only simulating a single node):

[centos@IP_ADDR ~]$ ./bin/ycsb run memcached -s -P workloads/workloada -p "memcached.hosts=192.168.1.10,192.168.1.11,192.168.1.12,192.168.1.13,192.168.1.14,192.168.1.15,192.168.1.16,192.168.1.17" > run_log
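
Rather than editing the host list by hand for the single-node and 8-node cases, you can generate the memcached.hosts value from the node count. This is a sketch, and the helper name memcached_hosts is hypothetical.

```shell
# Print a comma-separated memcached.hosts value for the first N nodes
# (default 8), starting at the fixed address 192.168.1.10.
memcached_hosts() {
    count=${1:-8}
    hosts=""
    i=0
    while [ "$i" -lt "$count" ]; do
        # Add a comma only when the list already has an entry.
        hosts="${hosts:+$hosts,}192.168.1.$((10 + i))"
        i=$((i + 1))
    done
    echo "$hosts"
}
```

For a single node, the run command then becomes ./bin/ycsb run memcached -s -P workloads/workloada -p "memcached.hosts=$(memcached_hosts 1)" > run_log.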

This will produce output showing you the real-time performance of the simulated cluster in serving memcached requests to the outside world. For a more visual output, you can install iftop and run sudo iftop -n -i br0 to see a visualization of traffic between the EC2 instance and the simulated nodes.
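
Once a run has finished, the summary redirected into run_log can be reduced to a single number. The sketch below assumes YCSB's usual text summary, in which overall throughput appears on a line of the form "[OVERALL], Throughput(ops/sec), <value>". The helper name ycsb_throughput is hypothetical.

```shell
# Print the overall throughput (ops/sec) recorded in a YCSB log file.
# Assumes a summary line of the form "[OVERALL], Throughput(ops/sec), <value>".
ycsb_throughput() {
    awk -F', ' '$1 == "[OVERALL]" && $2 == "Throughput(ops/sec)" { print $3 }' "$1"
}
```

For example, ycsb_throughput run_log prints the throughput of the run phase, and ycsb_throughput load_log does the same for the load phase.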

Extras

You can easily add packages to the Linux Distro that boots on the simulated nodes. First, make sure that all of your simulations are powered off. Then run the following:

[centos@IP_ADDR ~]$ cd ~/firesim-target-software/buildroot
[centos@IP_ADDR ~]$ make menuconfig
[ use menuconfig interface to select packages ]
[centos@IP_ADDR ~]$ cp .config ../buildroot-config
[centos@IP_ADDR ~]$ cd ../
[centos@IP_ADDR ~]$ ./build.sh && ./cluster-rootfs.sh

This will rebuild all of the rootfs[0-7].ext4 images to include your new packages.
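
After a rebuild, you may want to confirm that all eight images are present before booting a simulation. The sketch below only checks that each file exists and is non-empty. It does not validate the filesystem contents. The helper name check_rootfs is hypothetical.

```shell
# Report any of the 8 expected root filesystem images that are missing or
# empty. Argument: directory holding the images (default: the AMI location).
check_rootfs() {
    dir=${1:-"$HOME/firesim-target-software"}
    missing=0
    for i in 0 1 2 3 4 5 6 7; do
        if [ ! -s "$dir/rootfs$i.ext4" ]; then
            echo "missing or empty: $dir/rootfs$i.ext4" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every image was found.
    [ "$missing" -eq 0 ]
}
```

Running check_rootfs && echo "all images present" prints the confirmation only when every image was found.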

Troubleshooting / Errata

1. Linux boot gets stuck at network initialization

Currently, nodes may get stuck at Linux boot if the NIC does not receive any network traffic. If this is the case, you will see the Linux boot stalled after the message:

Starting network: ifup: can't move '/var/run/ifstate.new' to '/var/run/ifstate': Function not implemented

To resolve this, simply ping the node from the EC2 instance as the guide indicates. Linux boot will then continue.

2. Other

We’ll add more troubleshooting steps here as we receive feedback about particular issues.

Still stuck?

Post on the FireSim Google Group and we’ll help you fix your problem.
