Hello Vorak,
Thank you for the output. I went over the files, and it seems the MTU on ib0 is mismatched: on one server it is 2044 and on the other it is 4092.
So first I would make sure both servers are running in the same IPoIB mode (connected/datagram):
The IPoIB driver supports two modes of operation: datagram and connected. The mode is set and read through an interface's
/sys/class/net/<intf name>/mode file.
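
To compare the two servers quickly, the mode and the MTU can both be read from sysfs. Below is a minimal Python sketch, assuming the interface is named ib0 as in your output. The helper name read_ipoib_settings and its sysfs_root parameter are only for illustration and are not part of any driver tooling.

```python
from pathlib import Path


def read_ipoib_settings(intf="ib0", sysfs_root="/sys/class/net"):
    """Return (mode, mtu) for an IPoIB interface as reported by sysfs."""
    base = Path(sysfs_root) / intf
    # The mode file holds "datagram" or "connected".
    mode = (base / "mode").read_text().strip()
    # The mtu file holds the current interface MTU in bytes.
    mtu = int((base / "mtu").read_text().strip())
    return mode, mtu


if __name__ == "__main__":
    try:
        print(read_ipoib_settings())
    except OSError as err:
        # Raised when the interface or the sysfs files are not present.
        print(f"could not read IPoIB settings: {err}")
```

Running this on each server and comparing the two results should show whether the modes differ or only the MTU does.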
In datagram mode, the IB UD (Unreliable Datagram) transport is used, so the interface MTU is equal to the IB L2 MTU minus the
IPoIB encapsulation header (4 bytes). For example, in a typical IB fabric with a 2K MTU, the IPoIB MTU will be 2048 - 4 = 2044 bytes.
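
The same arithmetic can be applied to both values you reported. This is only a sketch of the datagram-mode formula above; the function name is hypothetical. Note that 4092 = 4096 - 4, which would be consistent with a 4K IB MTU in datagram mode on the second server, although the mode file is what confirms which mode is actually in use.

```python
IPOIB_HEADER_BYTES = 4  # IPoIB encapsulation header size, as stated above


def datagram_ipoib_mtu(ib_mtu):
    """Datagram-mode IPoIB MTU: IB L2 MTU minus the encapsulation header."""
    return ib_mtu - IPOIB_HEADER_BYTES


for ib_mtu in (2048, 4096):
    print(f"IB MTU {ib_mtu} -> IPoIB MTU {datagram_ipoib_mtu(ib_mtu)}")
```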
In connected mode, the IB RC (Reliable Connected) transport is used.
Let me know if this helps address the issue.
BR,
Einav