Hi Everyone,
I have a new setup using 3 Windows Server hosts running WinOF with dual-port ConnectX-2 cards. These connect to two 4036 switches (one port to each switch, and two links between the switches). Here is the ibnetdiscover output:
#
# Topology file: generated on Thu Jun 30 10:32:28 2016
#
# Initiated from node 0008f10500203b28 port 0008f10500203b28
vendid=0x8f1
devid=0x5a5a
sysimgguid=0x8f10500109553
switchguid=0x8f10500109552(8f10500109552)
Switch 36 "S-0008f10500109552" # "Mellanox 4036 # 4036-SW2" enhanced port 0 lid 6 lmc 0
[1] "S-0008f10500203b28"[1] # "Mellanox 4036 # 4036-SW1" lid 1 4xQDR
[2] "S-0008f10500203b28"[2] # "Mellanox 4036 # 4036-SW1" lid 1 4xQDR
[34] "H-0002c903004e445a"[1](2c903004e445b) # "IGA-S2D1" lid 2 4xQDR
[35] "H-0008f104039a3c1c"[2](8f104039a3c1e) # "IGA-S2D2" lid 5 4xQDR
[36] "H-0008f104039a4e3c"[2](8f104039a4e3e) # "IGA-S2D3" lid 8 4xQDR
vendid=0x8f1
devid=0x5a5a
sysimgguid=0x8f10500203b29
switchguid=0x8f10500203b28(8f10500203b28)
Switch 36 "S-0008f10500203b28" # "Mellanox 4036 # 4036-SW1" enhanced port 0 lid 1 lmc 0
[1] "S-0008f10500109552"[1] # "Mellanox 4036 # 4036-SW2" lid 6 4xQDR
[2] "S-0008f10500109552"[2] # "Mellanox 4036 # 4036-SW2" lid 6 4xQDR
[34] "H-0002c903004e445a"[2](2c903004e445c) # "IGA-S2D1" lid 3 4xQDR
[35] "H-0008f104039a3c1c"[1](8f104039a3c1d) # "IGA-S2D2" lid 4 4xQDR
[36] "H-0008f104039a4e3c"[1](8f104039a4e3d) # "IGA-S2D3" lid 7 4xQDR
vendid=0x2c9
devid=0x673c
sysimgguid=0x8f104039a4e3f
caguid=0x8f104039a4e3c
Ca 2 "H-0008f104039a4e3c" # "IGA-S2D3"
[1](8f104039a4e3d) "S-0008f10500203b28"[36] # lid 7 lmc 0 "Mellanox 4036 # 4036-SW1" lid 1 4xQDR
[2](8f104039a4e3e) "S-0008f10500109552"[36] # lid 8 lmc 0 "Mellanox 4036 # 4036-SW2" lid 6 4xQDR
vendid=0x2c9
devid=0x673c
sysimgguid=0x8f104039a3c1f
caguid=0x8f104039a3c1c
Ca 2 "H-0008f104039a3c1c" # "IGA-S2D2"
[1](8f104039a3c1d) "S-0008f10500203b28"[35] # lid 4 lmc 0 "Mellanox 4036 # 4036-SW1" lid 1 4xQDR
[2](8f104039a3c1e) "S-0008f10500109552"[35] # lid 5 lmc 0 "Mellanox 4036 # 4036-SW2" lid 6 4xQDR
vendid=0x2c9
devid=0x673c
sysimgguid=0x2c903004e445d
caguid=0x2c903004e445a
Ca 2 "H-0002c903004e445a" # "IGA-S2D1"
[1](2c903004e445b) "S-0008f10500109552"[34] # lid 2 lmc 0 "Mellanox 4036 # 4036-SW2" lid 6 4xQDR
[2](2c903004e445c) "S-0008f10500203b28"[34] # lid 3 lmc 0 "Mellanox 4036 # 4036-SW1" lid 1 4xQDR
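One way to sanity-check the cabling described above against this dump is a small Python script that reads only the Switch blocks. This is a rough sketch that assumes the exact text layout shown here. The regexes and the summarize() helper are hypothetical, not part of WinOF or the OFED diagnostics. It counts inter-switch ports (each cable shows up once from each end), records which switches each host lands on, and collects the width/speed strings so that a link which trained below 4xQDR would stand out.

```python
import re
from collections import defaultdict

# 'Switch 36 "S-0008f10500109552" # ...' starts a switch block
SWITCH_RE = re.compile(r'^Switch\s+\d+\s+"S-([0-9a-f]+)"')
# '[34] "H-0002c903004e445a"[1](2c903004e445b) # ...' is one cabled switch port
PORT_RE = re.compile(r'^\[(\d+)\]\s+"([SH])-([0-9a-f]+)"\[(\d+)\]')
# Width/speed at the end of a port line, e.g. "4xQDR"
RATE_RE = re.compile(r'(\d+x[A-Z]+)\s*$')


def summarize(topology):
    """Return (inter-switch ports per switch, switches per host, rates seen)."""
    isl = defaultdict(int)            # switch GUID -> ports cabled to another switch
    host_switches = defaultdict(set)  # host node GUID -> switch GUIDs it reaches
    rates = set()
    current = None                    # switch whose block we are currently inside
    for raw in topology.splitlines():
        line = raw.strip()
        switch = SWITCH_RE.match(line)
        if switch:
            current = switch.group(1)
            continue
        if line.startswith("Ca"):
            current = None            # host blocks repeat the same links; skip them
            continue
        port = PORT_RE.match(line)
        if current is None or port is None:
            continue
        _local_port, kind, peer_guid, _peer_port = port.groups()
        if kind == "S":
            isl[current] += 1
        else:
            host_switches[peer_guid].add(current)
        rate = RATE_RE.search(line)
        if rate:
            rates.add(rate.group(1))
    return dict(isl), dict(host_switches), rates


# The two Switch blocks copied from the dump above
TOPOLOGY = """\
Switch 36 "S-0008f10500109552" # "Mellanox 4036 # 4036-SW2" enhanced port 0 lid 6 lmc 0
[1] "S-0008f10500203b28"[1] # "Mellanox 4036 # 4036-SW1" lid 1 4xQDR
[2] "S-0008f10500203b28"[2] # "Mellanox 4036 # 4036-SW1" lid 1 4xQDR
[34] "H-0002c903004e445a"[1](2c903004e445b) # "IGA-S2D1" lid 2 4xQDR
[35] "H-0008f104039a3c1c"[2](8f104039a3c1e) # "IGA-S2D2" lid 5 4xQDR
[36] "H-0008f104039a4e3c"[2](8f104039a4e3e) # "IGA-S2D3" lid 8 4xQDR
Switch 36 "S-0008f10500203b28" # "Mellanox 4036 # 4036-SW1" enhanced port 0 lid 1 lmc 0
[1] "S-0008f10500109552"[1] # "Mellanox 4036 # 4036-SW2" lid 6 4xQDR
[2] "S-0008f10500109552"[2] # "Mellanox 4036 # 4036-SW2" lid 6 4xQDR
[34] "H-0002c903004e445a"[2](2c903004e445c) # "IGA-S2D1" lid 3 4xQDR
[35] "H-0008f104039a3c1c"[1](8f104039a3c1d) # "IGA-S2D2" lid 4 4xQDR
[36] "H-0008f104039a4e3c"[1](8f104039a4e3d) # "IGA-S2D3" lid 7 4xQDR
"""

isl, hosts, rates = summarize(TOPOLOGY)
print("inter-switch ports:", isl)
for host, switches in sorted(hosts.items()):
    print(f"host {host}: cabled to {len(switches)} switch(es)")
print("link rates seen:", rates)
```

Counting the switch side only avoids double-counting, since the Ca blocks describe the same cables from the host end.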
Everything is up and transmitting data, but I have two concerns. First, ntttcp tests only reach 1500-1800 MB/s, and I would expect a bit more from QDR even with overhead. Can anyone tell me whether this is normal, or what could be tuned to improve it? Second, ibqueryerrors shows an increasing PortXmitWait count on whichever interface is sending while I test. From what I can see some increase here is normal, but these numbers seem high.
PS C:\> ibqueryerrors
Errors for "IGA-S2D3"
GUID 0x8f104039a4e3d port 1: [PortXmitWait == 1]
GUID 0x8f104039a4e3e port 2: [PortXmitWait == 541647202]
Errors for 0x8f10500203b28 "Mellanox 4036 # 4036-SW1"
GUID 0x8f10500203b28 port ALL: [PortXmitWait == 86266712]
GUID 0x8f10500203b28 port 1: [PortXmitWait == 33430055]
GUID 0x8f10500203b28 port 34: [PortXmitWait == 52836657]
Errors for 0x8f10500109552 "Mellanox 4036 # 4036-SW2"
GUID 0x8f10500109552 port ALL: [PortXmitWait == 2344169726]
GUID 0x8f10500109552 port 0: [PortXmitWait == 261]
GUID 0x8f10500109552 port 34: [PortXmitWait == 2344169465]
Errors for "IGA-S2D1"
GUID 0x2c903004e445b port 1: [PortXmitWait == 59]
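To tell whether PortXmitWait is really climbing during a test, rather than just being large from earlier traffic, it may help to capture ibqueryerrors output before and after an ntttcp run and diff the two. Below is a rough Python sketch of that, assuming the output format shown above; parse_snapshot() and counter_deltas() are hypothetical helpers. Here the posted output is compared against an empty snapshot, which is what you would get if the counters had been cleared beforehand. Note that the "port ALL" rows are switch-wide totals (for SW2, 261 + 2344169465 = 2344169726), so they repeat the per-port numbers.

```python
import re

# One line of ibqueryerrors output, e.g.
#   GUID 0x8f104039a4e3e port 2: [PortXmitWait == 541647202]
LINE_RE = re.compile(r"GUID\s+(0x[0-9a-fA-F]+)\s+port\s+(\w+):(.*)")
# Each "[Name == value]" counter on such a line
COUNTER_RE = re.compile(r"\[(\w+)\s*==\s*(\d+)\]")


def parse_snapshot(text):
    """Map (guid, port, counter_name) -> value for one ibqueryerrors run."""
    counters = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # "Errors for ..." headers, prompts, blank lines
        guid, port, rest = match.groups()
        for name, value in COUNTER_RE.findall(rest):
            counters[(guid, port, name)] = int(value)
    return counters


def counter_deltas(before, after, name="PortXmitWait"):
    """Increase of one counter per (guid, port), largest first.
    A port missing from `before` is treated as starting at 0."""
    deltas = {
        (guid, port): value - before.get((guid, port, counter), 0)
        for (guid, port, counter), value in after.items()
        if counter == name
    }
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# The ibqueryerrors output copied from above
SNAPSHOT = """\
Errors for "IGA-S2D3"
GUID 0x8f104039a4e3d port 1: [PortXmitWait == 1]
GUID 0x8f104039a4e3e port 2: [PortXmitWait == 541647202]
Errors for 0x8f10500203b28 "Mellanox 4036 # 4036-SW1"
GUID 0x8f10500203b28 port ALL: [PortXmitWait == 86266712]
GUID 0x8f10500203b28 port 1: [PortXmitWait == 33430055]
GUID 0x8f10500203b28 port 34: [PortXmitWait == 52836657]
Errors for 0x8f10500109552 "Mellanox 4036 # 4036-SW2"
GUID 0x8f10500109552 port ALL: [PortXmitWait == 2344169726]
GUID 0x8f10500109552 port 0: [PortXmitWait == 261]
GUID 0x8f10500109552 port 34: [PortXmitWait == 2344169465]
Errors for "IGA-S2D1"
GUID 0x2c903004e445b port 1: [PortXmitWait == 59]
"""

for (guid, port), delta in counter_deltas(parse_snapshot(""), parse_snapshot(SNAPSHOT)):
    print(f"{guid} port {port}: +{delta}")
```

Sorting by the increase puts SW2 port 34 (the link to IGA-S2D1) and IGA-S2D3 port 2 at the top, which is where most of the waiting in this snapshot is concentrated.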
1500 MB/s is not terrible, but 1500-1800 MB/s works out to roughly 12-14 Gb/s, closer to a 15 Gb/s connection than the 32 Gb/s data rate that 4x QDR should provide. Anyone have any hints?
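For reference, here is the arithmetic behind the 32 Gb/s figure and the measured numbers as a short Python sketch. It assumes 4x QDR means 4 lanes at 10 Gb/s signalling with 8b/10b encoding, and that ntttcp's MB is 10^6 bytes (if it is 2^20 bytes, the results shift by about 5%). The constant and function names are just illustrative.

```python
# 4x QDR InfiniBand: 4 lanes, 10 Gb/s signalling per lane, 8b/10b line coding
LANES = 4
SIGNAL_GBPS_PER_LANE = 10.0
ENCODING_EFFICIENCY = 8 / 10  # 8 data bits carried per 10 bits on the wire


def qdr_data_rate_gbps(lanes=LANES):
    """Data rate of a QDR link after line coding, before protocol overhead."""
    return lanes * SIGNAL_GBPS_PER_LANE * ENCODING_EFFICIENCY


def mbytes_to_gbits(mb_per_s):
    """Convert MB/s to Gb/s, assuming 1 MB = 10**6 bytes."""
    return mb_per_s * 8 / 1000


link = qdr_data_rate_gbps()
for measured in (1500, 1800):  # the ntttcp range reported above
    gbps = mbytes_to_gbits(measured)
    print(f"{measured} MB/s = {gbps:.1f} Gb/s = {100 * gbps / link:.1f}% of {link:.0f} Gb/s")
```

Under these assumptions 1500 MB/s is 12.0 Gb/s and 1800 MB/s is 14.4 Gb/s, i.e. roughly 38-45% of the link's data rate.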
Thanks!