How to Implement a DRAM Buffer for Data Acquisition in LabVIEW FPGA

Updated Jun 7, 2023

Environment

Hardware

  • PXIe-7965
  • NI-5734

Software

  • LabVIEW FPGA Module

This document guides you through the step-by-step process of implementing a DRAM buffer on a supported NI FPGA device. The steps are demonstrated on a FlexRIO 7965R with a 5734 digitizer adapter module. They cover how to acquire software-triggered data into DRAM on all four analog input channels at a user-defined 120 MHz sample rate, and how to transfer this data to the host afterwards. The steps follow the exact process for the devices listed above, but you can apply the same principles to any supported device with a different DRAM size or channel count if you make the modifications your application requires. The article follows the best practices recommended by NI and is intended for readers with experience in LabVIEW FPGA programming who have little or no experience with DRAM buffering. It makes two simplifying assumptions: the host never requests data while an acquisition is in progress, and you do not need to "wrap" address pointers when the maximum DRAM address is reached.
 

  1. Read the following guide first to understand the basic principles of DRAM
  2. Create your project and add your FPGA target and adapter module if you have one
  3. Create the required ADC clock. For the example 5734 and 7965R, this is a 120 MHz base clock that uses IO Module Clock 0
  4. Create a single cycle timed loop clocked by the ADC Clock
  5. Put your I/O nodes, a Fetch Length (U32) and a Software Trigger (BOOL) Control into the loop
1_Controls_Ionodes.jpg
  6. Check the datasheet of your I/O module for the resolution and voltage range of its ADCs. The 5734 has a 16-bit ADC on each channel and a ±10 V voltage range.
  7. Create an indicator for a channel on your I/O node and check the data type. The 5734 returns U16 data, as shown in the screenshot above. In this case it is recommended to convert the U16 input to I16 using XOR gates and a U16 constant whose most significant bit is 1 and whose remaining 15 bits are 0 (to display the constant in binary, right-click the U16 numeric constant >> Display Format >> select Binary). The principle is based on two's complement: flipping the most significant bit maps the range of measurement values from [0 … 65535] to [-32768 … +32767], which better represents the ±10 V input range.
 
2_Xor gates.jpg
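The arithmetic behind the XOR step can be checked off-target with a few lines of Python. This is only a sketch of the bit manipulation, not FPGA code; the function name `offset_binary_to_i16` is made up for illustration, and the signed reinterpretation is included here only to show the resulting range.

```python
def offset_binary_to_i16(raw_u16: int) -> int:
    """Flip the MSB of a 16-bit code and read the result as a signed value."""
    if not 0 <= raw_u16 <= 0xFFFF:
        raise ValueError("expected a 16-bit unsigned value")
    # Same constant as in the step above: MSB set, remaining 15 bits clear.
    flipped = raw_u16 ^ 0b1000_0000_0000_0000
    # Reinterpret the 16-bit pattern as two's complement.
    return flipped - 0x1_0000 if flipped & 0x8000 else flipped


if __name__ == "__main__":
    for code in (0, 32767, 32768, 65535):
        print(code, "->", offset_binary_to_i16(code))
```

The lowest code maps to the most negative value, mid-scale maps to zero, and the highest code maps to the most positive value.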
 
  8. Read this guide carefully to understand why the appropriate access size and optimal clock rate are so important for DRAM. Find your device in the provided table. Whatever device you use, you should always try to use its DRAM with these parameters if possible. The 7965R has
 
  • a 128-bit access size
  • a 100 MHz optimal clock rate
 
Currently we have
  • 4 × 16 = 64 bits of data per iteration
  • in a 120 MHz loop
 
Therefore we need to
  • bit-pack the data into 128-bit chunks
  • transfer the measurement data to a 100 MHz loop, which is optimal for the DRAM
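The numbers above can be worked through as a quick sanity check. The sketch below uses only the figures quoted in this step; the DRAM figure is the peak rate of one 128-bit access per 100 MHz cycle, which is an idealization and not a measured sustained throughput.

```python
ADC_CHANNELS = 4
BITS_PER_SAMPLE = 16
ADC_CLOCK_HZ = 120_000_000
DRAM_ACCESS_BITS = 128
DRAM_CLOCK_HZ = 100_000_000

# Data produced by one iteration of the 120 MHz ADC loop.
bits_per_iteration = ADC_CHANNELS * BITS_PER_SAMPLE

# Iterations needed to fill one DRAM access.
iterations_per_word = DRAM_ACCESS_BITS // bits_per_iteration

# Rates on each side of the clock-domain crossing.
adc_bytes_per_s = bits_per_iteration * ADC_CLOCK_HZ // 8
packed_word_rate_hz = ADC_CLOCK_HZ // iterations_per_word
dram_peak_bytes_per_s = DRAM_ACCESS_BITS * DRAM_CLOCK_HZ // 8

if __name__ == "__main__":
    print(f"ADC data rate:       {adc_bytes_per_s / 1e6:.0f} MB/s")
    print(f"128-bit word rate:   {packed_word_rate_hz / 1e6:.0f} M words/s")
    print(f"DRAM interface peak: {dram_peak_bytes_per_s / 1e6:.0f} MB/s")
```

After packing, 128-bit words arrive at half the ADC clock rate, which is below the 100 MHz rate of the DRAM loop, so the DRAM loop can keep up with the acquisition.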
  9. Create a custom control containing a cluster whose width equals the access size of your DRAM. In our example this is two U64s. Name the members Upper and Lower. This cluster lets you use all 128 bits of the DRAM interface.
  10. Create a target-scoped FIFO implemented in block memory. It transfers data to another clock domain for writing to DRAM. Change the data type of the FIFO to the custom control.
 
FIFO w cluster.jpg
 
  11. You can use the Join Numbers node to bit-pack the data to your access size. With the 7965R and 5734, it takes two iterations' worth of data (2 iterations × 4 channels × 16 bits) to fill the 128-bit FIFO created in the previous step, so you write to it only every other clock cycle after the trigger condition. Use a Feedback Node to retain the prior iteration's ADC data and put it in the Lower member of the 128-bit cluster. Put the current iteration's ADC data in the Upper member.
 
3_every other iteration.jpg
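The packing described in this step can be modeled in Python as follows. This is a rough sketch, not a translation of the block diagram: the names `join_channels`, `Cluster128` and `make_cluster` are hypothetical, and placing AI0 in the most significant 16 bits is an assumption, since the channel order inside the U64 depends on how you wire the Join Numbers nodes.

```python
from typing import NamedTuple

MASK16 = 0xFFFF


class Cluster128(NamedTuple):
    """Stand-in for the custom control: two U64 members."""
    Upper: int
    Lower: int


def join_channels(ai0: int, ai1: int, ai2: int, ai3: int) -> int:
    """Pack four 16-bit samples into one U64 (AI0 in the top bits: assumption)."""
    word = 0
    for sample in (ai0, ai1, ai2, ai3):
        word = (word << 16) | (sample & MASK16)
    return word


def make_cluster(previous_u64: int, current_u64: int) -> Cluster128:
    """Older iteration goes to Lower, newer iteration goes to Upper."""
    return Cluster128(Upper=current_u64, Lower=previous_u64)


if __name__ == "__main__":
    first = join_channels(1, 2, 3, 4)    # iteration n-1 (from the Feedback Node)
    second = join_channels(5, 6, 7, 8)   # iteration n
    cluster = make_cluster(first, second)
    print(f"Lower = {cluster.Lower:#018x}, Upper = {cluster.Upper:#018x}")
```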
     
    12. You also need to control the cycles on which the FIFO write is enabled. On the first cycle in which the trigger is received, do not write to the FIFO, because only 64 bits of data have been acquired since the trigger. Instead, write on the following cycle and on every other cycle after that until the Fetch Length is reached. Implement a counter with suitable initial conditions that provides this behavior in addition to counting the iterations for the Fetch Length. Here is one possible implementation:
     
    4_counter and logic.jpg
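The write-enable behavior can also be expressed as a small cycle-by-cycle model. The sketch below is an illustration under stated assumptions: the function `simulate_adc_loop` and its parameters are invented for this example, the Fetch Length is taken to count loop iterations and to be even, and the trigger is modeled as a single cycle index.

```python
def simulate_adc_loop(samples, trigger_cycle, fetch_length):
    """Return the (Lower, Upper) pairs that would be written to the FIFO.

    samples       -- one packed 64-bit word per clock cycle
    trigger_cycle -- index of the cycle on which the trigger is seen
    fetch_length  -- number of iterations to acquire (assumed even)
    """
    writes = []
    previous = 0        # Feedback Node: packed word from the prior cycle
    count = 0           # iterations acquired since the trigger
    acquiring = False
    for cycle, current in enumerate(samples):
        if cycle == trigger_cycle:
            acquiring = True
        if acquiring:
            count += 1
            # count == 1 is the first triggered cycle: only 64 bits exist,
            # so nothing is written. Counts 2, 4, 6, ... have a full pair.
            if count % 2 == 0:
                writes.append((previous, current))
            if count >= fetch_length:
                acquiring = False
        previous = current
    return writes


if __name__ == "__main__":
    words = list(range(100, 110))
    print(simulate_adc_loop(words, trigger_cycle=3, fetch_length=4))
```

With an odd Fetch Length this model silently drops the last iteration, which is one reason to constrain the control to even values or to round it in your own design.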


     
    13. For the 7965R and 5734, your code should look like this:
    5_whole first loop.jpg

     
    14. Build the DRAM loop:
    • Create a clock equal to the optimal rate of the DRAM on your device. In our example setup this is a base clock based on the default 100 MHz clock.
    • Create a single-cycle Timed Loop using this clock.
    15. Add a new memory named "DRAM 0" to your FPGA target in the Project Explorer with the 128-bit custom cluster data type, and configure it to the size your application needs.
     
    6_DRAM props.jpg

     
    16. Build code that reads the ADC FIFO and writes to sequential addresses in DRAM. This requires a counter that advances the address, and performs the read and write, only when new, valid data is available. Here is one possible implementation:
    7_handshake.jpg
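As a conceptual model of this handshake, the sketch below treats the ADC FIFO as a queue and the DRAM as a list indexed by address. The function `dram_write_loop` and the `ready_pattern` argument are hypothetical; they stand in for the FIFO read and the memory interface's ready signal and do not correspond to actual LabVIEW node names.

```python
from collections import deque


def dram_write_loop(fifo, dram, ready_pattern):
    """Run one model iteration per entry in ready_pattern.

    fifo          -- deque of 128-bit words coming from the ADC loop
    dram          -- list standing in for DRAM, indexed by address
    ready_pattern -- booleans: can the memory accept a write this cycle?
    Returns the next free write address.
    """
    write_address = 0
    for ready in ready_pattern:
        # Pull from the FIFO only when the word can be written in the same
        # cycle, so no data is lost while the memory is busy.
        if ready and fifo:
            dram[write_address] = fifo.popleft()
            write_address += 1   # count only on a successful write
    return write_address


if __name__ == "__main__":
    adc_fifo = deque([10, 11, 12])
    memory = [None] * 8
    next_address = dram_write_loop(adc_fifo, memory, [True, False, True, True, True])
    print(next_address, memory)
```

The returned address doubles as the count of valid words in DRAM, which the read side needs later to avoid requesting unwritten addresses.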
     
    17. Create a Target to Host DMA FIFO for transferring data to the host. The widest data type this FIFO supports is U64, so use that; the data will have to be de-interleaved in a subsequent step.
    18. Because we assume that the host does not request data from DRAM until the acquisition completes, we do not have to implement DRAM arbitration. We must, however, signal when the acquisition completes. The Acquiring indicator in the ADC loop at Step 13 serves this purpose.
    19. Four conditions must be met before we can request data from DRAM for transfer to the host:
        1. Acquiring must be false.
        2. There must be room in the Target to Host DMA FIFO to receive the data returning from DRAM. To be safe, the free space should be larger than the "Maximum outstanding requests for data" value.
        3. The DRAM Request Data node must be ready for input.
        4. We must not attempt to read from an address that has not yet been written to.
    The following code is one way to implement this:
    8_request Data.jpg
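The four conditions combine into a single Boolean enable. The sketch below restates them in Python; the function `may_request_data` and all of its parameter names are made up for illustration, and `write_address` is assumed to be the next free address left behind by the write side (that is, the number of words already written).

```python
def may_request_data(acquiring: bool,
                     fifo_free_elements: int,
                     max_outstanding_requests: int,
                     request_ready: bool,
                     read_address: int,
                     write_address: int) -> bool:
    """Return True only when all four request conditions hold."""
    return (
        not acquiring                                       # condition 1
        and fifo_free_elements > max_outstanding_requests   # condition 2
        and request_ready                                   # condition 3
        and read_address < write_address                    # condition 4
    )


if __name__ == "__main__":
    print(may_request_data(False, 1000, 32, True, 0, 512))
    print(may_request_data(True, 1000, 32, True, 0, 512))
```

On the FPGA the same expression is a chain of AND gates feeding the enable of the request, with the read address counter incrementing only when the request is actually issued.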
    20. Add the Target to Host DMA FIFO, configured as U64, to the block diagram.
    21. The DRAM Retrieve Data node can execute on every clock cycle except the one immediately following a successful retrieval. This is because the DRAM returns 128 bits, and it takes two 64-bit writes to the Target to Host DMA FIFO to pass the data on. Delay the MSBs until the second write so that the ADC data is returned in order. "Stretch" the output valid pulse from DRAM Retrieve Data so that it writes to the FIFO twice, and ensure that you do not retrieve data on the subsequent clock cycle. Here is one possible implementation:
    9_fromDMAtoHost.jpg
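The two-cycle hand-off from a 128-bit DRAM word to two U64 FIFO elements can be modeled as below. This is a simplified sketch: `retrieve_loop` is a hypothetical name, each DRAM word is represented as a `(lower, upper)` tuple, and an exhausted source stands in for cycles where Retrieve Data reports no valid data.

```python
def retrieve_loop(dram_words, cycles):
    """Return the U64 elements in the order they reach the host FIFO."""
    host_fifo = []
    pending_upper = None      # register that delays the MSBs by one cycle
    source = iter(dram_words)
    for _ in range(cycles):
        if pending_upper is not None:
            # Second half of the stretched valid pulse: write the delayed
            # Upper half and skip retrieval on this cycle.
            host_fifo.append(pending_upper)
            pending_upper = None
            continue
        word = next(source, None)   # None models "no valid data this cycle"
        if word is not None:
            lower, upper = word
            host_fifo.append(lower)  # older iteration goes out first
            pending_upper = upper
    return host_fifo


if __name__ == "__main__":
    print(retrieve_loop([(1, 2), (3, 4)], cycles=6))
```

Writing Lower first matches the packing order used in the ADC loop, where Lower holds the older iteration.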
    22. Build your Host VI using FPGA best practices. Note that the conversion with the "To I16" node happens on the host after the Split Number nodes. This conversion must not replace the XOR operation on the FPGA described in step 7; it follows it.
    Host_Snippet.png
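For readers who post-process the data outside LabVIEW, the host-side de-interleaving can be sketched in Python. The helper names are hypothetical, and the channel order (first channel in the most significant 16 bits of each U64) is an assumption that has to match however the data was packed on the FPGA.

```python
def split_u64(element: int) -> list:
    """Split one U64 into four U16 values, most significant first."""
    return [(element >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def to_i16(value_u16: int) -> int:
    """Reinterpret a 16-bit pattern as a signed two's-complement value."""
    return value_u16 - 0x1_0000 if value_u16 & 0x8000 else value_u16


def deinterleave(elements) -> list:
    """Turn a stream of U64 FIFO elements into four per-channel sample lists."""
    channels = [[], [], [], []]
    for element in elements:
        for channel, raw in zip(channels, split_u64(element)):
            channel.append(to_i16(raw))
    return channels


if __name__ == "__main__":
    stream = [0x0001_FFFF_8000_7FFF, 0x0002_0000_0000_0001]
    for index, data in enumerate(deinterleave(stream)):
        print(f"channel {index}: {data}")
```

Here `to_i16` only reinterprets the bits; it relies on the most significant bit already having been flipped on the FPGA.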

    Next Steps

    Refer to the LabVIEW High-Performance FPGA Developer's Guide for optimization techniques and best practices for high-throughput FPGA applications. The DRAM buffer implemented in this guide can help handle the transient issues described in the Data Transfer Mechanisms chapter, starting on page 73. However, because DRAM can reach very high data rates (800 MB/s to 10.5 GB/s), you can easily run into situations where you exceed the PCIe lane bandwidth. The same chapter also contains guidance on how to avoid these situations.