Channel: Mellanox Interconnect Community: Message List

Troubleshooting Mellanox InfiniBand card


 

We have a Mellanox MT27500 Family (ConnectX-3) FDR InfiniBand card that we purchased and set up in our Mechanical Engineering department cluster. Everything was working fine until a week ago, when InfiniBand suddenly stopped working for no apparent reason. I have been trying to troubleshoot the issue without success and am in need of some help from the experts.

 

 

When I try to start the subnet manager on the master node using the command

 

 

[user@server ~]# /etc/init.d/opensm start

 

 

I get an error saying it failed to start, and the following messages are written to the log file:

 

 

Sep 30 10:36:58 137756 [DE707700] 0x80 -> OpenSM 3.3.15
Entering DISCOVERING state

Sep 30 10:36:58 144767 [DE707700] 0x02 -> osm_vendor_init: 1000 pending umads specified
Sep 30 10:36:58 148482 [DE707700] 0x80 -> Entering DISCOVERING state

No local ports detected!
Sep 30 10:36:58 148959 [DE707700] 0x01 -> perfmgr_mad_unbind: ERR 5405: No previous bind
Sep 30 10:36:58 148969 [DE707700] 0x01 -> osm_congestion_control_shutdown: ERR C108: No previous bind
Sep 30 10:36:58 149163 [DE707700] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
Exiting SM

 

 

The most curious thing is that the command ibstat returns nothing, which makes it really hard for me to troubleshoot this issue. Running it in debug mode, however, gives the following output:

 

 

[user@server ~]# ibstat -dd

 

ibwarn: [29989] umad_init: umad_init
ibwarn: [29989] umad_get_cas_names: max 32
ibwarn: [29989] umad_get_cas_names: return 0 cas
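The "return 0 cas" line suggests that the userspace MAD library finds no channel adapters at all. Below is a minimal sketch of the checks that seem relevant to that, not a definitive diagnosis. It assumes the ConnectX-3 is driven by the mlx4 stack (mlx4_core and mlx4_ib), that ib_umad provides the userspace MAD interface, and that adapters are listed under /sys/class/infiniband. The helper name count_cas is hypothetical and only used here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count the channel adapters (CAs) listed in sysfs.
# Assumption: libibumad enumerates CAs from /sys/class/infiniband, so an
# empty or missing directory would be consistent with "return 0 cas".
count_cas() {
    dir="${1:-/sys/class/infiniband}"
    n=0
    for ca in "$dir"/*; do
        # An unmatched glob stays literal, so test that the entry exists.
        [ -e "$ca" ] && n=$((n + 1))
    done
    echo "$n"
}

echo "CAs visible in sysfs: $(count_cas)"

# Is the card still present on the PCI bus? (15b3 is the Mellanox vendor ID.)
lspci -d 15b3: 2>/dev/null || echo "lspci not available"

# Are the ConnectX-3 drivers and the userspace MAD module loaded?
lsmod 2>/dev/null | grep -E '^(mlx4_core|mlx4_ib|ib_umad) ' \
    || echo "mlx4_core / mlx4_ib / ib_umad not loaded"
```

If the card shows up in lspci but no modules are loaded and sysfs is empty, the problem is more likely on the driver side than in OpenSM itself, which would also fit the "No local ports detected!" message.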

 

 

I am more than willing to provide any other information you need to get to the bottom of it.

 

 

Any help is greatly appreciated!

 

