Peer to Peer (P2P) Streaming FAQ

Updated Sep 9, 2019

Reported In

Hardware

  • PXI Chassis
  • PXI Controller

Issue Details

  • I'm not getting the P2P transfer rate I expect. What do I configure?
  • My P2P application works fine with a PXIe-8135 controller, but suffers reduced throughput with another controller such as PXIe-8133, even though all devices are behind the same backplane switch. Why does this problem happen?
  • Are P2P streams possible between devices that are in two different chassis? If so, what is the best hardware setup?
  • Does P2P depend on any PXIe features, like PXI trigger lines?
  • For P2P between two slots that sit behind different PXI backplane switches, does the PCIe traffic/bandwidth affect the upstream link on the Slot 1 MXI remote controller?
  • Why does my max latency for a continuous P2P stream seem to grow over time?
  • What's the best slot placement in a PXIe chassis for P2P? What about station placement within a PCIe switch?

Solution

  • I'm not getting the P2P transfer rate I expect. What do I configure?
 In order to keep data flowing continuously, the P2P Reader needs to grant the P2P Writer enough credits to cover the data the Writer produces during one full flow-control round trip.  That round trip is the sum of the one-way P2P latency between the Writer and Reader, the time it takes the Reader to read out a number of samples equal to 1/4 of its FIFO depth, and the time it takes the flow control packet (4 bytes) to make it from the Reader back to the Writer.  The easiest way to ensure success is to size the P2P Reader's FIFO large enough and to avoid slamming the bus with more traffic than the PCIe links are rated for, which causes congestion and delays the arrival of flow control packets.
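
 The following Python sketch turns that sizing rule into numbers.  Every rate, latency, and FIFO depth below is an illustrative assumption, not a measured value for any particular device:

    # Back-of-the-envelope sizing of the credits the P2P Reader must grant
    # so the Writer never stalls.  Every number here is an assumption.
    BYTES_PER_SAMPLE = 2            # assumed 16-bit samples
    stream_rate = 800e6             # bytes/s the Writer produces (assumed)
    one_way_latency = 2e-6          # s, Writer -> Reader (assumed)
    flow_ctrl_latency = 2e-6        # s, Reader -> Writer credit packet (assumed)
    reader_fifo_depth = 64 * 1024   # samples (assumed)

    # Time for the Reader to drain 1/4 of its FIFO at the stream rate.
    quarter_fifo_time = (reader_fifo_depth / 4) * BYTES_PER_SAMPLE / stream_rate

    # Data the Writer emits during the full round trip; the Reader must be
    # able to grant at least this many credits to keep the stream moving.
    round_trip = one_way_latency + quarter_fifo_time + flow_ctrl_latency
    min_credit_bytes = stream_rate * round_trip
    print(f"minimum credits: {min_credit_bytes:.0f} bytes "
          f"({min_credit_bytes / BYTES_PER_SAMPLE:.0f} samples)")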
 
  • My P2P application works fine with a PXIe-8135 controller but suffers reduced throughput with another controller such as PXIe-8133, even though all devices are behind the same backplane switch. Why does this problem happen?
 
 While P2P will work between two devices behind a single backplane switch regardless of the controller in slot 1 (even if the controller itself does not support P2P), the controller does affect P2P in one way: the max payload size.  The BIOS programs every device in the system with a max payload size, and that size also determines the max throughput for P2P.  For Gen1 x4 devices, the max P2P throughput is 838 MB/s with 128-byte payloads and 906 MB/s with 256-byte payloads.  While the PXIe-8135 and PXIe-8880 controllers ship with a 256-byte max payload size by default, the PXIe-8133 defaults to 128-byte max payloads.  However, the PXIe-8133 BIOS was updated shortly after shipping to include an option for a 256-byte max payload size (the option doesn't say 256, but is labeled "automatic").
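
 As a sanity check, here is a rough Python model of why the payload size sets the throughput ceiling.  The per-packet overhead value is an assumption, so the results only approximate the measured figures above:

    # Rough PCIe throughput model for a Gen1 x4 link.  The 24 bytes of
    # per-TLP overhead is an assumption (header, framing, CRC); it ignores
    # DLLP traffic, so it only approximates the measured numbers.
    GEN1_X4_RAW = 1000e6  # bytes/s: 4 lanes * 2.5 GT/s with 8b/10b encoding
    TLP_OVERHEAD = 24     # assumed overhead bytes per packet

    for payload in (128, 256):
        efficiency = payload / (payload + TLP_OVERHEAD)
        print(f"{payload}-byte payload: ~{GEN1_X4_RAW * efficiency / 1e6:.0f} MB/s")
    # Prints ~842 and ~914 MB/s -- in the neighborhood of the measured
    # 838 and 906 MB/s quoted above.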
 
  • Are P2P streams possible between devices that are in two different chassis? If so, what is the best hardware setup?
 P2P between devices in different chassis works.  Pretty much all variations with embedded or remote controllers work with all of our "modern" chassis like the PXIe-1075, PXIe-1085, and PXIe-1082, with "modern" embedded controllers like the PXIe-8133 and the PXIe-8135, and with all of our remote controllers.  If you have an embedded controller in the first chassis, you will likely use a MXI peripheral card in that chassis cabled to slot 1 of the second chassis.  Ideally, place the MXI peripheral card and one of the devices behind the same switch (see the chassis manual for a diagram of which slots are behind which backplane switch) to keep the traffic local to that switch and to reduce latency a bit.  The other device can be anywhere in the second chassis, as its traffic will need to go through the backplane switch to the Slot 1 MXI controller to be sent back to the first chassis via the MXI cable.
 
  • Does P2P depend on any PXIe features, like PXI trigger lines?
 P2P does not rely on or need any PXIe features.  It is strictly a PCI Express capability that is built into all PCI Express switches and most PCIe chipsets (like the ones Intel and AMD produce as part of the CPU/PCH chipset combination).  All data travels over the PCIe lanes.  The only difference is that when the data hits a switch, the switch decides whether the destination address is host memory or the memory space of another piece of hardware.  If the destination address in a PCIe packet maps to another piece of hardware, the switch routes the data there.  The destination device must be capable of accepting large packets (e.g. 128 bytes or 256 bytes) at the destination address being written to, and accepting them efficiently; our PCIe interface (the CHInCh) does both.  Between the P2P writer and the P2P reader we also have a flow control mechanism, and its flow control updates travel the same way as the main data stream.
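
 Below is a toy Python sketch of that routing decision.  The port names and address ranges are invented for illustration; a real switch makes this decision in hardware for every packet:

    # Toy model of the routing decision a PCIe switch makes per packet.
    # The address ranges below are made-up examples, not real BARs.
    downstream_ports = {
        "slot_2_device": range(0xA000_0000, 0xA001_0000),  # hypothetical BAR
        "slot_3_device": range(0xB000_0000, 0xB001_0000),  # hypothetical BAR
    }

    def route(dest_addr):
        # If the destination address maps to a downstream device's memory
        # space, forward the packet straight there (peer-to-peer)...
        for port, bar in downstream_ports.items():
            if dest_addr in bar:
                return port
        # ...otherwise send it upstream toward the root complex / host memory.
        return "upstream (host memory)"

    print(route(0xA000_0040))  # slot_2_device -- the P2P path
    print(route(0x1000_0000))  # upstream (host memory)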
 
  • For P2P between two slots that sit behind different PXI backplane switches, does the PCIe traffic/bandwidth affect the upstream link on the Slot 1 MXI remote controller?
 No, the traffic stays local to the two downstream ports of the Slot 1 PCIe switch that serve the two PXI backplane switches.  In the diagram below, which shows a chassis with a MXI card in slot 1, the red route does NOT consume any of the Gen2 x8 bandwidth from the chassis in the diagram to the master chassis.  It only consumes bandwidth local to the switch, and of course the two ports that serve the two backplane switches.  Since we do not have a P2P storage solution, streaming to disk from slot 11 (NI device) to slot 2 (RAID storage) would take the green path up to the host controller and back down again.

 [Diagram: chassis with a MXI card in Slot 1, showing the local P2P route in red and the disk-streaming route through the host in green]

The following diagram is the NI PXIe-8381 (153097x-01L) Block Diagram. It shows the ports coming in and out of the PCIe switch in the Slot 1 MXI card.  For a dual-link chassis like the PXIe-1085, the switch will merge the 4 Gen2 x4 links into 2 Gen2 x8 links that connect to the switches on the chassis backplane.

 [Diagram: NI PXIe-8381 block diagram, showing the ports into and out of the PCIe switch on the Slot 1 MXI card]

  • Why does my max latency for a continuous P2P stream seem to grow over time?
 Your writer rate and reader rate are probably matched.  If the P2P reader is not reading samples faster than the P2P writer is writing them, any hiccup in the system (such as other streams affecting the P2P stream, or host reads/writes of indicators and controls affecting the P2P FIFOs at either end) can have a permanent effect on the latency of the stream.  If the reader ever has a cycle in which it cannot read data (after it starts reading the first sample/packet), those missed cycles are permanently "lost" and data starts to pile up in the reader FIFO.  The connection between the writer FIFO and the reader FIFO is usually faster than the stream rate, in which case any backed-up data in the writer FIFO will eventually make it to the reader FIFO; if the reader FIFO fills up, it stops granting credits to the writer FIFO.  One way to manage a reader FIFO that must continuously provide data to a DAC, for example, is to not start reading from the FIFO until it reaches a certain fill level, such as half full.  Ideally, make sure the P2P reader can read data out faster than the P2P writer can push it in.  Only match the data rates between them if their clocks are locked and you start reading from the P2P Reader FIFO only after it reaches a fill level that temporary delays in arriving traffic will never fully drain.
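
 The following minimal Python simulation shows that failure mode.  The rates and hiccup schedule are arbitrary assumptions; the point is that with matched rates, every missed read cycle permanently raises the FIFO occupancy, and with it the stream latency:

    # With matched writer/reader rates, a single missed read cycle leaves
    # samples permanently queued in the reader FIFO.  Numbers are arbitrary.
    fifo = 0
    WRITE_PER_CYCLE = 4          # samples written each cycle
    READ_PER_CYCLE = 4           # matched: the reader never catches back up
    hiccups = {100, 2500, 7000}  # cycles where the reader is stalled (assumed)

    for cycle in range(10_000):
        fifo += WRITE_PER_CYCLE
        if cycle not in hiccups:
            fifo -= min(fifo, READ_PER_CYCLE)

    # Each hiccup strands 4 more samples in the FIFO forever.
    print(f"steady-state backlog: {fifo} samples")  # prints 12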
 
  • What's the best slot placement in a PXIe chassis for P2P? What about station placement within a PCIe switch?
 Most importantly, if you can keep peers behind the same PCIe switch, you'll get the lowest latency and the P2P traffic will have the least impact on the rest of the system. Traffic between 2 devices hanging off of different switches will need to travel through a 3rd PCIe switch or the root complex (CPU), resulting in 2 extra hops through PCIe switches. In this case, the P2P traffic will be sharing the same PCIe bandwidth as other data streams.
 Peers placed behind the same "station" on a PCIe switch may have a slightly lower latency and may have a lower impact on other traffic within the switch. A station in a PLX PCIe switch is a block of 16 lanes, which can be divided up into 2 x8 ports or 4 x4 ports.