Hi all,
I have several ConnectX-3 VPI adapters (model MCX354A-FCBT) and an SX1024 switch. The adapters are all in Ethernet mode, and RoCE appears to be working (based on, e.g., ib_send_bw and some MPI testing). I am using MLNX_OFED version 2.2-1.0.1 on the hosts and MLNX-OS image 3.3.5006 on the switch. Rather than configuring Priority Flow Control (PFC) and VLANs, I have just enabled global pause support on the adapters (via ethtool -A) and on the associated switch ports (by setting flowcontrol receive and send to on).
While this appears to be working, I am not clear on how strong a guarantee this provides for lossless Ethernet (at layer 2). Does enabling pause / flow control guarantee that there will never be a buffer overflow on the adapter or switch, which would manifest as dropped packets? I was under the impression that a pause frame requests a halt in transmission for a specified amount of time, but I could imagine instances where, e.g., that delay turns out to be too short to clear a buffer elsewhere. If lossless transfer is guaranteed, which portion of the stack (switch and adapter firmware, switch software, adapter driver) enforces that guarantee?
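To make the timing concern concrete, here is a rough sketch of the arithmetic I have in mind. It assumes the IEEE 802.3x definitions as I understand them: the pause time is a 16-bit count of quanta, and one quantum is 512 bit times at the link rate. The function names and the 10 and 40 Gb/s link rates are only illustrative assumptions, not anything taken from the Mellanox tools.

```python
# Sketch of IEEE 802.3x pause timing arithmetic (illustrative only).
# Assumptions: one pause quantum = 512 bit times; pause_time is a
# 16-bit field, so at most 0xFFFF quanta per pause frame.

PAUSE_QUANTUM_BITS = 512
MAX_PAUSE_QUANTA = 0xFFFF


def pause_duration_seconds(quanta, link_rate_bps):
    """Time a link partner is asked to stay silent for `quanta` quanta."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause_time must fit in 16 bits")
    return quanta * PAUSE_QUANTUM_BITS / link_rate_bps


def bytes_arriving_during(delay_seconds, link_rate_bps):
    """Bytes a sender can still put on the wire during `delay_seconds`,
    e.g. the time it takes a pause frame to reach it and take effect."""
    return delay_seconds * link_rate_bps / 8


if __name__ == "__main__":
    for rate in (10e9, 40e9):  # hypothetical link rates
        longest = pause_duration_seconds(MAX_PAUSE_QUANTA, rate)
        print(f"{rate / 1e9:.0f} Gb/s: longest single pause = {longest * 1e6:.1f} us")
```

By this reckoning a single pause frame can ask for at most roughly 839 microseconds of silence at 40 Gb/s, so a receiver that stays congested longer than that would have to keep sending pause frames, and it would need enough buffer headroom to absorb whatever arrives before each pause takes effect.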
Regards,
Thomas Benson