Real-World Bifurcated (2x) Gen4x8 PCIe Performance in Excess of 22GB/s Using ADM-PCIE-9H7/ASUS Pro WS X570-ACE/AMD Ryzen

Alpha Data have demonstrated real-world bifurcated (2x) Gen4x8 PCIe performance between an AMD Ryzen 5 3600 CPU and a Xilinx Virtex UltraScale+™ VU37P HBM FPGA, using the ADM-PCIE-9H7 FPGA Accelerator Card on the ASUS Pro WS X570-ACE Motherboard.

Processor: AMD Ryzen 5 3600 6-Core
Motherboard: ASUS Pro WS X570-ACE (BIOS ver.9904)
RAM: 16GB DDR4 @ 3000MHz (dual bank)
VGA: NVIDIA GeForce 210
OS: Windows 10 (x64)
Accelerator: Alpha Data ADM-PCIE-9H7


The Xilinx Virtex UltraScale+™ HBM FPGAs support PCIe Gen4 and contain multiple endpoints, each capable of 8 lanes. The ADM-PCIE-9H7 board connects 2 of these endpoints to its 16-lane edge connector. The AMD Ryzen CPU and the ASUS Pro WS X570-ACE Motherboard also now support PCIe Gen4, and the 9904 BIOS update allows a Gen4x16 slot to be split into two Gen4x8 devices. Alpha Data have now verified the compatibility and performance of this combination at their labs in Edinburgh.

Reference Design

A bifurcated reference design has been created to demonstrate these features and is available on request to Alpha Data customers. The design implements 2 instances of the Xilinx XDMA core, each connected to high-bandwidth on-chip BlockRAM and UltraRAM. The Alpha Data ADXDMA driver and API allow multi-threaded host access to these endpoints and their DMA engines, so that the maximum practical transfer performance can be achieved. The reference design is available for the ADM-PCIE-9H7, a full-height, double-width VU37P-based accelerator, as well as the low-profile ADM-PCIE-9H3 VU33P-based accelerator.
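The multi-threaded access pattern described above can be sketched in outline as one host thread per endpoint, with the aggregate rate taken over the shared wall-clock time. This is only an illustration under stated assumptions: it does not use the ADXDMA API, stand_in_transfer is a hypothetical placeholder that copies host memory, and the buffer size and repeat count are arbitrary. The figure it prints reflects host memory copies, not PCIe throughput.

```python
import threading
import time

BUFFER_BYTES = 4 * 1024 * 1024   # assumed transfer size per call (hypothetical)
TRANSFERS_PER_ENDPOINT = 8       # assumed repeat count (hypothetical)


def stand_in_transfer(endpoint_id, src, dst):
    """Hypothetical placeholder for a real DMA call; it only copies host memory."""
    dst[:] = src


def worker(endpoint_id, totals):
    """Issue repeated transfers for one endpoint and record the bytes moved."""
    src = bytearray(BUFFER_BYTES)
    dst = bytearray(BUFFER_BYTES)
    moved = 0
    for _ in range(TRANSFERS_PER_ENDPOINT):
        stand_in_transfer(endpoint_id, src, dst)
        moved += BUFFER_BYTES
    totals[endpoint_id] = moved


def run(endpoints=2):
    """Run one thread per endpoint and return (bytes, seconds, GB/s aggregate)."""
    totals = {}
    threads = [
        threading.Thread(target=worker, args=(i, totals))
        for i in range(endpoints)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    total_bytes = sum(totals.values())
    return total_bytes, elapsed, total_bytes / elapsed / 1e9


total_bytes, elapsed, gb_per_s = run()
print(f"moved {total_bytes} bytes in {elapsed:.4f} s ({gb_per_s:.2f} GB/s aggregate)")
```

Timing the whole group of threads, rather than summing per-thread rates, avoids overstating the aggregate when the transfers do not overlap perfectly.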

Performance Results

Testing was carried out under the Windows 10 Operating System. Against a raw line rate of 32GB/s for the two Gen4x8 links combined, the software and FPGA design achieved aggregate rates in a single direction of more than 22GB/s, and in some cases as high as 25GB/s.
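The 32GB/s figure follows from the PCIe Gen4 signalling rate of 16 GT/s per lane across 16 lanes. The short calculation below reproduces it and expresses the measured rates as a fraction of it. It also shows the slightly lower ceiling left after 128b/130b line encoding. Packet and protocol overheads are not modelled, so this is a rough sketch rather than a full link budget.

```python
# Raw and encoded PCIe Gen4 bandwidth for two bifurcated x8 links.
GT_PER_S_PER_LANE = 16.0   # PCIe Gen4 signalling rate per lane, in GT/s
LANES_PER_LINK = 8
LINKS = 2
ENCODING = 128 / 130       # 128b/130b line encoding (PCIe Gen3 and later)


def raw_gbytes_per_s(lanes, gt_per_s=GT_PER_S_PER_LANE):
    """Raw line rate in GB/s: one bit per transfer per lane, 8 bits per byte."""
    return lanes * gt_per_s / 8.0


def encoded_gbytes_per_s(lanes):
    """Line rate after 128b/130b encoding; packet overheads are not included."""
    return raw_gbytes_per_s(lanes) * ENCODING


total_lanes = LANES_PER_LINK * LINKS
raw = raw_gbytes_per_s(total_lanes)
encoded = encoded_gbytes_per_s(total_lanes)

print(f"raw line rate:      {raw:.2f} GB/s")
print(f"after 128b/130b:    {encoded:.2f} GB/s")
for measured in (22.0, 25.0):   # aggregate rates quoted in the text
    print(f"{measured:.0f} GB/s is {measured / raw:.1%} of the raw line rate")
```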

Using UltraRAM instead of BRAM

When the design was changed to implement 4MB of UltraRAM (instead of 1MB of BlockRAM), faster transfer speeds were achieved:

  • average of 10 runs: 26.21 GB/s
  • peak: 26.65 GB/s
  • valley: 25.76 GB/s
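To put these figures in per-link terms, the sketch below divides each aggregate rate across the two Gen4x8 links and compares it with the 16GB/s raw rate of a single x8 link. The even split between links is an assumption made for illustration; the per-link rates were not reported.

```python
# Per-link view of the UltraRAM results, assuming an even split across links.
LINKS = 2
RAW_PER_LINK_GB_S = 16.0   # Gen4 x8: 8 lanes * 16 GT/s / 8 bits per byte

results_gb_s = {"average": 26.21, "peak": 26.65, "valley": 25.76}

per_link = {name: rate / LINKS for name, rate in results_gb_s.items()}
utilisation = {name: rate / RAW_PER_LINK_GB_S for name, rate in per_link.items()}
spread = results_gb_s["peak"] - results_gb_s["valley"]

for name in results_gb_s:
    print(f"{name:8s} {per_link[name]:.3f} GB/s per link "
          f"({utilisation[name]:.1%} of raw x8 rate)")
print(f"peak-to-valley spread: {spread:.2f} GB/s aggregate")
```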


Using UltraRAM simplifies the clocking scheme to a single clock for both memory ports, which removes the need for clock-domain-crossing logic, reduces latency and increases the maximum throughput. This is achieved by setting the Optimization strategy to “Maximize Performance”.



More information on the Alpha Data FPGA Acceleration Platforms can be found online at:

About Alpha Data

Established in 1993, Alpha Data is a world leader in Xilinx FPGA based plug-in acceleration boards for Data Center, high-performance computing, and rugged embedded computing applications. Alpha Data’s high-reliability hardware platforms are ideal for development as well as full-scale production deployment – shaping the future in video processing, machine learning, and network acceleration.