Real-World Bifurcated (2x) Gen4x8 PCIe Performance in Excess of 22GB/s Using ADM-PCIE-9H7/ASUS Pro WS X570/AMD Ryzen
Alpha Data have demonstrated real-world bifurcated (2x) Gen4x8 PCIe performance between an AMD Ryzen 5 3600 CPU and a Xilinx Virtex UltraScale+™ VU37P HBM FPGA, using the ADM-PCIE-9H7 FPGA Accelerator Card on the ASUS Pro WS X570-ACE motherboard.
Processor: AMD Ryzen 5 3600 6-Core
Motherboard: ASUS Pro WS X570-ACE (BIOS ver. 9904)
RAM: 16GB DDR4 @ 3000MHz (dual bank)
VGA: NVIDIA GeForce 210
OS: Windows 10 (x64)
Accelerator: Alpha Data ADM-PCIE-9H7
Xilinx Virtex UltraScale+™ HBM FPGAs support PCIe Gen4 and contain multiple x8-capable PCIe endpoints. The ADM-PCIE-9H7 board connects two of these endpoints to its 16-lane edge connector. The AMD Ryzen CPU and the ASUS Pro WS X570-ACE motherboard also now support PCIe Gen4, and the 9904 BIOS update allows a Gen4x16 slot to be split (bifurcated) into two Gen4x8 links. Alpha Data have now verified the compatibility and performance of this combination at their labs in Edinburgh.
Reference Design
A bifurcated reference design has been created to demonstrate these features and is available on request to Alpha Data customers. The design implements two instances of the Xilinx XDMA core, which connect to high-bandwidth on-chip BlockRAM and UltraRAM. The Alpha Data ADXDMA driver and API allow multi-threaded host access to these endpoints and their DMA engines, so that the maximum practical transfer performance can be achieved. The reference design is available for the full-height, double-width ADM-PCIE-9H7 VU37P-based accelerator as well as the low-profile ADM-PCIE-9H3 VU33P-based accelerator.
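The ADXDMA API calls are not described here, so the following is only a minimal sketch of the multi-threaded pattern: one host thread per endpoint, all released together, with the aggregate rate taken as total bytes moved divided by wall-clock time. The `transfer_fns` callables are hypothetical stand-ins for a blocking per-endpoint DMA transfer; they are not part of the Alpha Data API.

```python
import threading
import time
from typing import Callable, Sequence


def aggregate_throughput_gbs(
    transfer_fns: Sequence[Callable[[int], None]],
    bytes_per_endpoint: int,
) -> float:
    """Run one blocking transfer per endpoint concurrently; return GB/s (1 GB = 1e9 bytes).

    Each callable in transfer_fns is a HYPOTHETICAL placeholder for a real
    per-endpoint DMA call that moves bytes_per_endpoint bytes.
    """
    if not transfer_fns:
        raise ValueError("at least one endpoint is required")

    # Workers plus the timing (main) thread all meet at the barrier,
    # so every endpoint starts at (nearly) the same moment.
    barrier = threading.Barrier(len(transfer_fns) + 1)

    def worker(fn: Callable[[int], None]) -> None:
        barrier.wait()
        fn(bytes_per_endpoint)

    threads = [threading.Thread(target=worker, args=(fn,)) for fn in transfer_fns]
    for t in threads:
        t.start()
    barrier.wait()                 # release all workers, then start the clock
    start = time.perf_counter()
    for t in threads:
        t.join()                   # stop the clock when the slowest endpoint finishes
    elapsed = time.perf_counter() - start
    return bytes_per_endpoint * len(transfer_fns) / elapsed / 1e9


if __name__ == "__main__":
    # Simulated endpoints only: each pretends to move its data in about 0.1 s.
    def fake_endpoint(nbytes: int) -> None:
        time.sleep(0.1)

    rate = aggregate_throughput_gbs([fake_endpoint, fake_endpoint], 10**9)
    print(f"aggregate: {rate:.1f} GB/s")
```

Timing the whole group, rather than each thread separately, is what makes the figure an aggregate: two x8 endpoints running concurrently can approach the sum of their individual rates.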
Performance Results
Tested under Windows 10, the software and FPGA design achieved aggregate single-direction transfer rates of more than 22GB/s, and in some cases as high as 25GB/s, against a line bit rate of 32GB/s.
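The 32GB/s figure is the raw line rate of a Gen4 x16 connection, i.e. two x8 links at 16GB/s each. The short calculation below is a sketch based on the PCIe Gen4 signalling rate of 16 GT/s per lane and its 128b/130b line encoding; it shows where the figure comes from and expresses measured rates as a fraction of it. It ignores packet (TLP/DLLP) and flow-control overhead, which further reduce the payload throughput that can actually be achieved.

```python
# Sketch: PCIe Gen4 line-rate arithmetic (ignores TLP/DLLP protocol overhead).
GEN4_TRANSFERS_PER_LANE = 16e9   # PCIe Gen4: 16 GT/s per lane, 1 bit per transfer
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def raw_line_rate_gbs(lanes: int) -> float:
    """Raw line rate in GB/s (1 GB = 1e9 bytes), before encoding overhead."""
    return GEN4_TRANSFERS_PER_LANE * lanes / 8 / 1e9


def encoded_rate_gbs(lanes: int) -> float:
    """Rate remaining after 128b/130b encoding, before packet overhead."""
    return raw_line_rate_gbs(lanes) * ENCODING_EFFICIENCY


def fraction_of_line_rate(measured_gbs: float, lanes: int = 16) -> float:
    """Measured rate as a fraction of the raw line rate."""
    return measured_gbs / raw_line_rate_gbs(lanes)


if __name__ == "__main__":
    print(f"one x8 link: {raw_line_rate_gbs(8):.1f} GB/s raw")
    print(f"two x8 links: {raw_line_rate_gbs(16):.1f} GB/s raw, "
          f"{encoded_rate_gbs(16):.2f} GB/s after encoding")
    for rate in (22.0, 25.0, 26.21):
        print(f"{rate} GB/s measured = {fraction_of_line_rate(rate):.1%} of raw line rate")
```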
Using UltraRAM instead of BRAM
When the design was changed to implement 4MB of UltraRAM instead of 1MB of BlockRAM, faster transfer speeds were achieved:
- Average (10 runs): 26.21 GB/s
- Peak: 26.65 GB/s
- Valley: 25.76 GB/s
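Assuming the average, peak and valley figures are the mean, maximum and minimum of ten repeated transfer runs, they could be derived as in the sketch below. The individual per-run rates are not published, so the function takes them as input rather than reproducing them.

```python
from statistics import mean
from typing import Sequence, Tuple


def summarize_runs(rates_gbs: Sequence[float]) -> Tuple[float, float, float]:
    """Return (average, peak, valley) of per-run transfer rates in GB/s."""
    if not rates_gbs:
        raise ValueError("at least one run is required")
    # Average = arithmetic mean, peak = fastest run, valley = slowest run.
    return mean(rates_gbs), max(rates_gbs), min(rates_gbs)
```

Applied to the published figures, the peak-to-valley spread is 0.89 GB/s, about 3.4% of the average, so the ten runs were fairly consistent.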
Using UltraRAM simplifies the clocking scheme to a single clock for both memory ports, removing the need for clock-domain-crossing logic, which reduces latency and increases the maximum throughput. These results were achieved with the optimization strategy set to “Maximize Performance”.
ADM-PCIE-9H7 and ADM-PCIE-9H3
More information on the Alpha Data FPGA Acceleration Platforms can be found online at:
About Alpha Data
Established in 1993, Alpha Data is a world leader in Xilinx FPGA-based plug-in acceleration boards for data center, high-performance computing, and rugged embedded computing applications. Alpha Data’s high-reliability hardware platforms are ideal for development as well as full-scale production deployment – shaping the future in video processing, machine learning, and network acceleration.