DIMM-connected Flash: The next evolution in flash acceleration

Well, here we go again. But this time, we’re not talking about some new ultra-responsive flash technology that comes from a new chip fabrication technique; instead, we’re focusing on narrowing the distance between the flash memory and the CPU. That distance is the focus of this article. You may ask, “Why is that important?” (I’m glad you asked.)

While it is true that not all flash memory is the same (SLC vs. MLC vs. TLC), that is only half the story. The rest of the story is how, and more specifically where, the flash memory is physically connected to the underlying system. There now exist four different ways to connect flash to a system:

The first method is direct-attached. This involves connecting the flash memory to an internal storage port within the system, usually through some type of storage I/O controller.

The second method is array-based. This involves placing the flash memory in an external storage array, which has controllers of its own. These controllers are then connected to the system via a Host Bus Adapter (HBA).

The third method is PCIe-based. This involves placing the flash memory on a PCIe card and connecting it via a PCIe slot within the system.

The fourth method is DIMM-based. This involves placing the flash memory on a DIMM memory board and connecting it via the DIMM slots within the system.

The illustration below shows the various relationships described above:

Figure: General System Board Layout

Looking at the diagram above, you should immediately notice that the DIMM-connected option places the flash closest to the CPU in terms of its electrical path. To communicate with devices using the other three connection methods, I/O to and from the CPU needs to traverse additional electrical circuits (the on-board southbridge controller and additional controllers downstream from the CPU), which increases latency for each operation.

So what is the bottom line? The bottom line is context switching, and more specifically its effect on Round Trip Time (RTT). A context switch saves the state of a process so the CPU can do other work while that process waits for a slower component of the system (such as a disk) to respond. Context switching is generally considered expensive because each switch takes time to complete (typically between 2 and 100 microseconds). The more context switches an operation requires, the longer its round trip takes and the slower the entire process runs. Pretty simple, right?

Context switching is necessary to avoid holding up the entire system while it waits for each process to complete. Without context switching, systems would not be able to multitask, which would make the user experience extremely frustrating.
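To make the idea concrete, here is a minimal sketch that counts the context switches a process incurs while it blocks. It assumes a Unix-like system where Python's standard `resource` module is available, and it uses `time.sleep` as a stand-in for waiting on a slow device. The helper names (`context_switches`, `count_switches_during`, `blocking_waits`) are made up for illustration.

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switches for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def count_switches_during(fn):
    """Run fn() and return how many context switches happened while it ran."""
    v_before, i_before = context_switches()
    fn()
    v_after, i_after = context_switches()
    return v_after - v_before, i_after - i_before


def blocking_waits(n=50, delay=0.001):
    """Stand-in for slow I/O (an assumption): each sleep gives up the CPU."""
    for _ in range(n):
        time.sleep(delay)


if __name__ == "__main__":
    voluntary, involuntary = count_switches_during(blocking_waits)
    print(f"voluntary: {voluntary}, involuntary: {involuntary}")
```

A voluntary switch is one where the process gives up the CPU because it has to wait, which is the case discussed above. You would generally expect that count to grow with the number of waits, though the exact figures depend on the operating system and on whatever else is running.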

The motivation behind connecting flash at the DIMM slot is to reduce the number of context switches needed for each operation performed against the storage device. Below is a comparison of the number of context switches that are typically needed to move I/O through each connection method to the downstream storage device (these are general numbers and will vary based on manufacturer):

Connection Method    Context Switches
Array Based          12-16
Direct-Attached      8-10
PCIe Based           2-4
DIMM Based           1-2
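For a rough sense of what those numbers mean in time, the sketch below multiplies the context switch counts from the table by the 2 to 100 microsecond cost per switch mentioned earlier. It assumes the overhead per I/O is simply the number of switches times the cost of one switch, which is a simplification; real systems will not behave this neatly. The names `SWITCHES`, `SWITCH_COST_US`, and `overhead_range_us` are made up for illustration.

```python
# Context switches per I/O (low, high), taken from the table above.
SWITCHES = {
    "Array Based": (12, 16),
    "Direct-Attached": (8, 10),
    "PCIe Based": (2, 4),
    "DIMM Based": (1, 2),
}

# Cost of a single context switch in microseconds (low, high), from the article.
SWITCH_COST_US = (2, 100)


def overhead_range_us(method):
    """Best-case and worst-case context switch overhead per I/O, in microseconds.

    Assumes overhead = number of switches * cost per switch.
    """
    low_count, high_count = SWITCHES[method]
    low_cost, high_cost = SWITCH_COST_US
    return low_count * low_cost, high_count * high_cost


if __name__ == "__main__":
    for method in SWITCHES:
        best, worst = overhead_range_us(method)
        print(f"{method:<16} {best:>4} to {worst:>5} microseconds per I/O")
```

Under these assumptions, array-based flash spends somewhere between 24 and 1,600 microseconds per I/O on context switching alone, while DIMM-based flash spends between 2 and 200 microseconds.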

It is still very early in the DIMM-connected flash arena, but it offers some very exciting and affordable options for those looking to introduce flash into their environment. With all of these different flash options available in the industry, it will take an intelligent storage management system to efficiently coordinate access and unify all of these various storage devices (hint hint).