Channel: Mellanox Interconnect Community: Message List

Re: MT25418 shows as mt401


MFE_NO_FLASH_DETECTED indicates possible flash corruption.

First, check whether the adapter's flash is in recovery state by running: # lspci -vvvxxx | grep Mellanox

If it is, try burning the original / earlier firmware again over mt25418_pciconf0, make sure the flash is detected correctly, and then try changing the card ID to MT401_pciconf0 again.

If it still fails at that point, you are most likely dealing with a faulty adapter.
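
For reference, the check-and-reburn sequence with the Mellanox firmware tools usually looks like this (a sketch only; the firmware image file name below is a placeholder, not a file from this thread):

    # start the Mellanox software tools service and list MST devices
    mst start
    mst status

    # query the flash / currently programmed firmware
    flint -d /dev/mst/mt25418_pciconf0 query

    # re-burn the original firmware image (file name is illustrative)
    flint -d /dev/mst/mt25418_pciconf0 -i fw-25408-original.bin burn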


Advice on partitioning an IB network


Hello,

 

I would appreciate some advice in partitioning an IB network, please.

 

We have quite a large IB network -- there are almost 800 hosts on it, all part of a single computation cluster. Recently, a researcher at the university bought a new system consisting of 4 hosts. This new system is completely separate from the cluster in that it does not share the same private Ethernet. On the other hand, to make the new system affordable, we decided to allow the owner to take 4 of our spare/free IB ports.

 

Currently, the new and old systems share the same OpenSM partition. That is, ibhosts lists both the new and the old hosts:

New hosts...

Ca      : 0xe41d2d0300e16190 ports 2 "srv01935 mlx4_0"

Ca      : 0x248a070300f052f0 ports 2 "srv01934 mlx4_0"

Ca      : 0xe41d2d0300e166d0 ports 2 "srv01933 mlx4_0"

Ca      : 0xe41d2d0300e16350 ports 2 "srv01932 mlx4_0"

Old hosts...

Ca      : 0xf452140300225f20 ports 1 "orange02 HCA-1"

Ca      : 0xf452140300225ec0 ports 1 "orange03 HCA-1"

etc, etc...

 

I'm wondering whether it is best to place the old and new hosts in separate partitions. Does that make sense? If so, how do I best construct the partitions.conf file to do this? That is, placing the new (srv..) hosts in a partition is easy, but how do I ensure that the default partition contains just the old hosts?

 

Best regards,

David

Mellanox card disappeared from PCI bus


Hello,

 

I have two computers with Mellanox ConnectX-3 InfiniBand cards connected to each other directly. I configured several VMs on each node with SR-IOV passthrough of the InfiniBand cards. When I was mostly done, I also tried to configure IB so that it would be usable on the host. I rebooted the hosts and saw that the IB cards had completely disappeared from the PCI bus. I rebooted the systems several more times and one of the IB cards reappeared, but the other one is still missing. I completely disconnected the host from all cables and even unplugged and re-seated the card, but this had no effect.

 

An important detail: when I boot either of the nodes, one of the first screens during the boot process shows a message from the IB firmware. There I can enter a menu and enable or disable SR-IOV, set the maximum number of physical functions, and change a few other things. When the IB card is gone from lspci, this firmware boot screen does not appear.

 

Let me describe my system and outline the actions I took when I configured IB passthrough. The host runs Debian 9 with the IB drivers installed from the Debian repository. The guests run CentOS 7.3, where I installed the Mellanox OFED distribution for CentOS 7.3. For virtualization I use QEMU/KVM with libvirt.
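
For reference, on ConnectX-3 the virtual functions exposed by the driver are usually controlled through mlx4_core module options, in addition to the SR-IOV setting in the firmware. A sketch of such a configuration (the values are illustrative, not taken from this setup):

    # /etc/modprobe.d/mlx4_core.conf -- illustrative values only
    # num_vfs:  number of virtual functions the driver creates
    # probe_vf: how many of those VFs the host itself probes
    #           (0 leaves them all free for passthrough to guests)
    options mlx4_core num_vfs=4 probe_vf=0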

 

My card shows on the host as:

05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

05:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.2 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.3 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.4 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

 

Both the host and the guests use the mlx4_core driver; here is a list of some of the modules loaded on the host system:

Module                  Size  Used by

mlx4_ib               163840  0

mlx4_en               114688  0

mlx4_core             303104  2 mlx4_en,mlx4_ib

kvm_intel             192512  0

kvm                   589824  1 kvm_intel

irqbypass              16384  1 kvm

ib_umad                24576  0

ib_core               208896  2 ib_umad,mlx4_ib

I was also loading ib_ipoib on the host as well as on the guests, but on the guests it was crashing the kernel.

 

Another suspicious thing happened when I was attaching the virtual functions to the guest systems (sudo virsh attach-device ...). The following messages appeared in the kernel log:

 

Jul  6 16:07:04 ib1 kernel: [  281.707448] vfio-pci 0000:05:00.4: enabling device (0000 -> 0002)

Jul  6 16:07:06 ib1 kernel: [  283.475412] virbr1: port 5(vnet3) entered learning state

Jul  6 16:07:08 ib1 kernel: [  285.491419] virbr1: port 5(vnet3) entered forwarding state

Jul  6 16:07:08 ib1 kernel: [  285.491424] virbr1: topology change detected, propagating

Jul  6 16:07:13 ib1 kernel: [  290.895918] kvm [2264]: vcpu0, guest rIP: 0xffffffff81060d78 disabled perfctr wrmsr: 0xc2 data 0xffff

Jul  6 16:07:13 ib1 kernel: [  290.933587] kvm: zapping shadow pages for mmio generation wraparound

Jul  6 16:07:13 ib1 kernel: [  290.939149] kvm: zapping shadow pages for mmio generation wraparound

Jul  6 16:07:14 ib1 kernel: [  291.721929] mlx4_core 0000:05:00.0: Received reset from slave:4

Jul  6 16:07:14 ib1 kernel: [  291.767436] mlx4_core 0000:05:00.0: Unknown command:0x55 accepted from slave:4

Jul  7 07:52:13 ib1 kernel: [56990.799006] mlx4_core 0000:05:00.0: mlx4_eq_int: slave:2, srq_no:0x41, event: 14(00)

Jul  7 07:52:13 ib1 kernel: [56990.799009] mlx4_core 0000:05:00.0: mlx4_eq_int: sending event 14(00) to slave:2

Jul  7 08:39:31 ib1 kernel: [59828.975516] mlx4_core 0000:05:00.0: Received reset from slave:4

Jul  7 08:39:31 ib1 kernel: [59829.044683] virbr1: port 5(vnet3) entered disabled state

Jul  7 08:39:31 ib1 kernel: [59829.044752] device vnet3 left promiscuous mode

 

Note the line with "Unknown command".

 

I did not update the firmware, at least not recently.

 

ibstat on the working system reports the following:

 

CA 'mlx4_0'

CA type: MT4099

Number of ports: 2

Firmware version: 2.34.5000

Hardware version: 0

Node GUID: 0xf45214030010a4a0

System image GUID: 0xf45214030010a4a3

Port 1:

     State: Down

     Physical state: Polling

     Rate: 10

     Base lid: 0

     LMC: 0

     SM lid: 0

     Capability mask: 0x0250486a

     Port GUID: 0xf45214030010a4a1

     Link layer: InfiniBand

Port 2:

     State: Down

     Physical state: Polling

     Rate: 10

     Base lid: 0

     LMC: 0

     SM lid: 0

     Capability mask: 0x0250486a

     Port GUID: 0xf45214030010a4a2

     Link layer: InfiniBand

 

Could you help me to get my card back?

Re: link down and ip address lost with mellanox ofed 100G card


Hi,

 

Check it with ip addr show instead of ifconfig -a.

Marc

Re: random write failing with 100G connect 4x card


Hi,

 

What test are you using? fio?

 

Can you provide the command line?

 

Thanks

Marc

Question about inner tcp csum calculation


I am using a ConnectX-3 Pro (model no. CX312B) to send Geneve packets.

 

When I send a Geneve packet with no options (the Geneve header length is then the same as VXLAN's), the NIC calculates the inner TCP checksum. But when I send a Geneve packet with options (header length longer than VXLAN's), the NIC does not calculate the inner TCP checksum.

 

I think the NIC does not know the offset of the inner TCP header.

 

How does the NIC get the inner TCP header offset? Is it set by the driver, or is it a fixed value in hardware?

 

If the value can be set, how can I set it?

 

Thanks for all the help

Re: link down and ip address lost with mellanox ofed 100G card


Hi,

 

I suggest you get further support on this issue by contacting Mellanox support via www.mellanox.com.

 

Marc

Re: Advice on partitioning an IB network


Hi David,

 

It does make sense to partition your subnet, but it depends on how much disruption you're willing to tolerate and whether your apps are partition-aware. What are your apps?

 

Assuming you are running without a partitions.conf file now, all hosts are full members of the default partition. In order to separate the 4 new hosts into their own partition, all the existing hosts will also need to be placed in their own partition, so that no communication is possible between the two groups of hosts beyond the default-partition membership required for SA communication.
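
For illustration, a partitions.conf along those lines could look like this (a sketch only: the pkey values are arbitrary and the <...> placeholders stand for the port GUIDs of each host, which can differ from the node GUIDs that ibhosts prints):

    # /etc/opensm/partitions.conf -- sketch only
    Default=0x7fff, ipoib, defmember=full :
        ALL_SWITCHES=full, SELF=full,
        <port GUID of orange02>=full, <port GUID of orange03>=full;   # ...and the rest of the old hosts

    NewSystem=0x0002, ipoib, defmember=full :
        ALL_SWITCHES=full, SELF=full,
        <port GUID of srv01932>=full, <port GUID of srv01933>=full,
        <port GUID of srv01934>=full, <port GUID of srv01935>=full;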

 

Also, where does the SM run? Does it run on one of the existing hosts? Is it a dedicated node? Or does it run somewhere else (embedded in a switch)?

 

-- Hal


Re: Need help to recover switch Silverstorm 9024-CU24-ST2-DDR admin password

Re: Does anyone know what the Max Junction temperature is for the MT27508 IC on a ConnectX-3


Hi Viki,

 

Where can I get a Mellanox document including this thermal specification?

 

Thanks,

Kevin

MLNX-OS latest version for MSX6012F


Where can I download the latest version of MLNX-OS for the SX6012F InfiniBand switch?

Packet loss with multi-frame payloads


Hello,

  I am having a problem with packet loss in my DPDK application and I hope you can help me out. Below is a description of the application and of the problem.

It is a little long, but I really hope somebody out there can help me, because this is driving me crazy.

 

Application

 

I have a client-server application; single server, multiple clients.

The machines have 8 active cores which poll 8 distinct RX queues to receive packets and use 8 distinct TX queues to burst out packets (i.e., run-to-completion model).

 

Workload

 

The workload consists mostly of single-frame packets, but occasionally clients send multi-frame packets to the server, and occasionally the server sends multi-frame replies back to the clients.

Packets are fragmented at the UDP level (i.e., there is no IP fragmentation; every frame of the same request has frag_id == 0, even though the frames share the same packet_id).

 

Problem

 

I experience huge packet loss on the server when the occasional multi-frame requests from the clients carry a big payload (> 300 Kb).

The eth stats that I gather on the server report no errors and no packet loss (q_errors, imissed, ierrors, oerrors, rx_nombuf are all equal to 0). Yet the application does not see some of the packets belonging to the big requests that the clients send.
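
For reference, those counters can be read per port with the standard ethdev API; a minimal sketch (not the poster's code, and the exact signatures differ slightly between DPDK 2.2 and current releases):

    /* Print the RX/TX drop and error counters mentioned above for one port. */
    #include <inttypes.h>
    #include <stdio.h>
    #include <string.h>
    #include <rte_ethdev.h>

    static void log_drop_counters(uint8_t port_id)
    {
        struct rte_eth_stats stats;

        memset(&stats, 0, sizeof(stats));
        rte_eth_stats_get(port_id, &stats);

        /* imissed: packets dropped because the HW RX queues were full;
         * rx_nombuf: RX mbuf allocation failures; ierrors/oerrors: RX/TX errors. */
        printf("port %u: imissed=%" PRIu64 " ierrors=%" PRIu64
               " oerrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
               port_id, stats.imissed, stats.ierrors,
               stats.oerrors, stats.rx_nombuf);
    }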

 

I have noted some interesting facts:

1) The clients do not experience such packet loss, although they also receive packets with an aggregate payload of the same size as the packets received by the server. The only differences w.r.t. the server are that a client machine of course has a lower RX load (it only gets the replies to its own requests) and that a client thread only receives packets from a single machine (the server).

2) This behavior does not arise as long as the biggest payload exchanged between clients and server is < 200 Kb. This leads me to conclude that fragmentation is not the issue (also, if I implement a stubborn retransmission, eventually all packets are received even with bigger payloads). Also, I reserve plenty of memory for my mempool, so I don't think the server runs out of mbufs (and if that were the case, I guess I would see it in the dropped packets count, right?).

3) If I switch to the pipeline model (on the server only) this problem basically disappears. By pipeline model I mean something like the load-balancing app, where a single core on the server receives client packets on a single RX queue (worker cores reply back to the client using their own TX queue). This leads me to think that the problem is on the server, and not on the clients.

4) It doesn't seem to be a "load" problem. If I run the same tests multiple times, in some "lucky" runs the run-to-completion model outperforms the pipeline one. Also, with single-frame packets only, the run-to-completion model can handle many more packets per second than the number of frames per second generated by the workload that includes some big packets.

 

 

Question

 

Do you have any idea why I am witnessing this behavior? I know that having fewer queues can help performance by relieving contention on the NIC, but is it possible that the contention is actually causing packets to get dropped?

 

Platform

 

DPDK: v2.2-0 (I know this is an old version, but I am dealing with legacy code I cannot change)

MLNX_OFED_LINUX-3.1-1.0.3-ubuntu14.04-x86_64  

My NIC : Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

My machine runs a 4.4.0-72-generic kernel on Ubuntu 16.04.2

CPU is Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz  2x8 cores

 

 

Thank you a lot, especially if you went through the whole email

Regards,

   Harold

NIC not working correctly after wakeup from suspend


Software:

driver source code downloaded from http://www.mellanox.com/page/products_dyn?product_family=27, version 4.0.0

kernel: Ubuntu linux-4.4.0-generic

 

Hardware:

mainboard: MSI C236A

Mellanox 0x6750 Ethernet card

 

Normally everything runs fine, but when I call pm-suspend --no-quirks (or, equivalently, echo -n "mem" > /sys/power/state) and the machine is then woken up by the keyboard, the mlx4 0x6750 card no longer works.

 

 

 

While debugging this problem, I found that the difference shows up in the code that reads MLX4_OWNER_BASE:

 

 

 

static int mlx4_get_ownership(struct mlx4_dev *dev)
{
        void __iomem *owner;
        u32 ret;

        if (pci_channel_offline(dev->persist->pdev)) {
                debug_info_mlx(" ");
                return -EIO;
        }

        owner = ioremap(pci_resource_start(dev->persist->pdev, 0) +
                        MLX4_OWNER_BASE,
                        MLX4_OWNER_SIZE);
        if (!owner) {
                debug_info_mlx(" ");
                mlx4_err(dev, "Failed to obtain ownership bit\n");
                return -ENOMEM;
        }

        ret = readl(owner);
        iounmap(owner);
        debug_info_mlx("ret %d", ret);
        return (int) !!ret;
}

 

 

Normally the last debug print shows ret 0, but after wakeup, when the module is loaded again, this read always returns 16777216 (0x01000000).

 

I wonder how to reset the NIC so that this read returns 0 again.
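
For reference, one generic way to force a full PCI-level re-probe of the adapter after resume is via sysfs; a sketch (the PCI address 0000:01:00.0 is a placeholder, check yours with lspci -D | grep Mellanox):

    # detach the driver and remove the device from the PCI bus
    echo 0000:01:00.0 > /sys/bus/pci/drivers/mlx4_core/unbind
    echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove

    # rescan the bus so the kernel rediscovers and re-probes the card
    echo 1 > /sys/bus/pci/rescan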

 

thank you

Waiting for link-up on net0...


I've seen several of my InfiniBand cards hang at this error. Can you tell me whether this is a bug or a configuration issue?

The card is installed in an HP ProLiant G9 server. I'm attempting to PXE boot off of net1.

 

Attempting Boot From NIC

MLNX FlexBoot 3.4.225 (PCI 04:00.0) starting execution...ok

MLNX FlexBoot 3.4.225 initializing devices...

Initialization complete

 

 

 

 

Mellanox ConnectX FlexBoot v3.4.225

iPXE 1.0.0+ -- Open Source Network Boot Firmware --

Waiting for link-up on net0...

 

[Screenshot: 2017-07-20 15_18_46 console capture]

Re: Waiting for link-up on net0...


Hello David,

 

Correct me if I'm wrong: your issue is that the PXE boot process is never attempted on port net1 because it is stuck at port net0?

 

Can you go into the FlexBoot menu of both the good and the bad server after reboot by pressing Ctrl-B and share a comparison?

 

 

Cheers,

~Rage


Where's the procedure of packing network protocol header in RoCE v2?


Hi, lately I began to study the driver's source code of MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64.

When I got to the network protocol stack, I ran into a problem, and I am hoping for some guidance from friends in the community.

Let me state the question:

  I'm using the verbs API (the black line in picture 2 below) and I am familiar with its whole flow (from reading the source code and the manual), but I am not familiar with the abstraction layer below it.

  So I want to know:

How does RoCE v2 pack the UDP and IP headers into the packet? (I mean: where is this done? I haven't found the relevant code, but I do have some clues, see below.) I'm also not sure whether this step is performed by the driver or by the system network stack. Does anybody know? I would be very pleased to learn from you!

1. Source code from MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64/MLNX_OFED_SRC-4.0-2.0.0.1/SRPMS/libmlx5-1.2.1mlnx1/src/mlx5.c:

    

    

2. Some explanation of RoCE v2:

[Image: RoCE protocol stack]

3. RoCEv2 packet format

[Image: RoCE v2 frame format]

Re: Waiting for link-up on net0...


You are correct. I'm attempting to PXE boot off of net1, but not because it's stuck at net0 -- I'm required to use net1. However, I cannot get past net0.

 

I'm unable to get into the Mellanox configuration screen on the bad server since I'm going through the HP iLO console. I will update the thread if I can.

 

Here is a screenshot of the good server:

 

[Screenshot: 2017-07-21 13_40_17 console capture]

[Screenshot: 2017-07-21 13_40_47 console capture]

Configuring Cisco 6513 switch and Mellanox MLAG


The MLAG configuration is between the Cisco 6513 switch and the SN2100 switch.

The SN2100 is configured according to the guide in the document below.

HowTo Configure MLAG on Mellanox Switches

A Cisco engineer told me that my Cisco 6513 switch runs a VSS configuration, but I have not been able to apply the corresponding configuration on the Cisco switch, because I only received a VRRP configuration and I do not know exactly what it is.

I tried both LACP and static mode on the MLAG interface, but it failed.

All status indicators are normal, but in the MLAG Ports Status Summary the ports show as Inactive.

 

It seems that some settings beyond the default MLAG configuration are needed between the Cisco 6513 switch and the SN2100 switch.

I would be grateful to anyone who can share their experience with this part.

RDMA problem FreeBSD 11.0


Hello everyone, I have a problem enabling RDMA on FreeBSD 11. The configuration is as follows: KVM host, ConnectX-4 card, and a FreeBSD guest with one VF allocated to it. I compiled the mlx5 and mlx5en modules and loaded them into the kernel as follows:

1   30 0xffffffff80200000 1fa88f8  kernel

2    2 0xffffffff82219000 1c9ab    mlx5.ko

3    5 0xffffffff82236000 fcf6     linuxkpi.ko

4    1 0xffffffff82246000 152b8    mlx5en.ko

5    1 0xffffffff8225c000 11f0a    krping.ko

6    2 0xffffffff8226e000 5be1a    ibcore.ko

7    1 0xffffffff822ca000 f728     iser.ko

8    1 0xffffffff822da000 114b8    iscsi.ko

9    1 0xffffffff822ec000 3de40    linux.ko

10    2 0xffffffff8232a000 7b08     linux_common.ko

11    1 0xffffffff82332000 389f4    linux64.ko

 

I can ping my second host, but I can't connect using rping or udaddy. I get errors like this:

udaddy: starting client

udaddy: connecting

udaddy: event: RDMA_CM_EVENT_ADDR_ERROR, error: -19

test complete

return status -19

 

rping gives me a similar error:

 

cma event RDMA_CM_EVENT_ADDR_ERROR, error -19

 

What is interesting is that an RDMA connection from the KVM host to the second (Linux) test machine works.

I did not rebuild the sources with WITH_OFED='YES'; I only built the modules from the FreeBSD sources. Maybe somebody can help me with this?
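
For reference, the kind of test being attempted normally looks like this (a sketch only; 10.0.0.2 is a placeholder for the remote host's IP on the RDMA-capable interface):

    # on the remote host (server side)
    rping -s -a 10.0.0.2 -v -C 10

    # on the FreeBSD guest (client side)
    rping -c -a 10.0.0.2 -v -C 10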

 

BR

Adam

Breakout vs Single Cables


Hi,

 

If ports and power are not an issue, is it better or is there any advantage to using breakout cables (1 x 40Gb to 4 x 10Gb), or is it preferable to use single cables? Will one be better latency-wise? This is to connect to 4 Nutanix nodes.


