I have been trying to set up a new cluster using an 18-port MSX6015 switch and 9 systems with ConnectX-3 cards or built-in IB. One system shows:
08:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
and the other 8 show:
02:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
All cables are new FDR cables.
They are running a newish (3.17.3) kernel, although I have 4.0.3 ready to go when I can reboot them.
They are running Debian with the current OFED, and even OpenSM 3.3.18.
All firmware has been updated to the most recent posted version.
I have set
# Force PortInfo:LinkSpeedExtEnabled on ports
# If 0, don't modify PortInfo:LinkSpeedExtEnabled on port
# Otherwise, use value for PortInfo:LinkSpeedExtEnabled on port
# Values are (MgtWG RefID #4722)
# 1: 14.0625 Gbps
# 2: 25.78125 Gbps
# 3: 14.0625 Gbps or 25.78125 Gbps
# 30: Disable extended link speeds
# Default 31: set to PortInfo:LinkSpeedExtSupported
#force_link_speed_ext 31
force_link_speed_ext 1
(I have also tried 31)
# FDR10 on ports on devices that support FDR10
# Values are:
# 0: don't use fdr10 (no MLNX ExtendedPortInfo MADs)
# Default 1: enable fdr10 when supported
# 2: disable fdr10 when supported
fdr10 1
in opensm.conf.
I still get a 40 Gbps link, and iblinkinfo says:
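For reference, here is a minimal Python sketch of how the force_link_speed_ext values read, using only the meanings given in the opensm.conf comments quoted above. The names FORCE_LINK_SPEED_EXT, describe_force_link_speed_ext and aggregate_gbps are hypothetical helpers made up for illustration, not part of OpenSM, and the FDR/EDR labels are added for orientation.

```python
# Meanings copied from the opensm.conf comments quoted above.
# The FDR/EDR labels are an added annotation: 14.0625 and 25.78125 Gbps
# are the per-lane signalling rates of those InfiniBand generations.
FORCE_LINK_SPEED_EXT = {
    0: "leave PortInfo:LinkSpeedExtEnabled unmodified",
    1: "14.0625 Gbps per lane (FDR)",
    2: "25.78125 Gbps per lane (EDR)",
    3: "14.0625 or 25.78125 Gbps per lane (FDR or EDR)",
    30: "extended link speeds disabled",
    31: "use PortInfo:LinkSpeedExtSupported (default)",
}


def describe_force_link_speed_ext(value):
    """Return the documented meaning of a force_link_speed_ext value."""
    return FORCE_LINK_SPEED_EXT.get(
        value, "not listed in the opensm.conf comments"
    )


def aggregate_gbps(width, lane_gbps):
    """Aggregate signalling rate of a link: lane count times per-lane rate."""
    return width * lane_gbps


if __name__ == "__main__":
    for value in (1, 31):
        print(value, "->", describe_force_link_speed_ext(value))
    print("4X at 14.0625 Gbps:", aggregate_gbps(4, 14.0625), "Gbps")
    print("4X at 10.0 Gbps:", aggregate_gbps(4, 10.0), "Gbps")
```

By this reading, a 4X link at the extended speed selected by value 1 would signal at 56.25 Gbps, while a 4X link at 10.0 Gbps per lane comes to 40 Gbps.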
4 10[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 3 1[ ] "MT25408 ConnectX Mellanox Technologies" ( Could be FDR10)
or
CA: MT25408 ConnectX Mellanox Technologies:
0x002590fffff7b3c5 33 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 4 13[ ] "SwitchX - Mellanox Technologies" ( Could be FDR10)
for all live ports.
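To check every port without reading the output by eye, a small Python sketch like the following could pull the width, per-lane speed, and the "Could be" hint out of each iblinkinfo line. It assumes the line layout shown above; the regular expressions and the parse_link helper are my own illustration, not part of infiniband-diags.

```python
import re

# Matches the "==( 4X 10.0 Gbps Active/ LinkUp)==>" part of a line.
LINK_RE = re.compile(
    r"==\(\s*(\d+)X\s+([\d.]+)\s+Gbps\s+(\w+)/\s*(\w+)\s*\)==>"
)
# Matches the trailing "( Could be FDR10)" hint, when present.
HINT_RE = re.compile(r"\(\s*Could be\s+([^)]+?)\s*\)")


def parse_link(line):
    """Parse one iblinkinfo link line; return None if it is not a link line."""
    match = LINK_RE.search(line)
    if match is None:
        return None
    width = int(match.group(1))
    lane_gbps = float(match.group(2))
    hint = HINT_RE.search(line)
    return {
        "width": width,
        "lane_gbps": lane_gbps,
        "total_gbps": width * lane_gbps,
        "state": match.group(3),
        "phys_state": match.group(4),
        "could_be": hint.group(1) if hint else None,
    }


if __name__ == "__main__":
    sample = (
        '4 10[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 3 1[ ] '
        '"MT25408 ConnectX Mellanox Technologies" ( Could be FDR10)'
    )
    info = parse_link(sample)
    print(info["total_gbps"], info["state"], info["could_be"])
```

Feeding it the saved output of iblinkinfo line by line and keeping the entries where could_be is not None would list every link that is running below what the tool thinks it could reach.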
Does anyone have any idea what I could be doing wrong?
Thanks.