Channel: Mellanox Interconnect Community: Message List

Re: MT25418 shows as mt401


MFE_NO_FLASH_DETECTED indicates possible flash corruption.

First, check whether the adapter's flash is in recovery state by running: # lspci -vvvxxx | grep Mellanox

If it is, try burning the original / earlier firmware again over mt25418_pciconf0, make sure the flash is detected correctly, and then try changing the card ID to MT401_pciconf0 again.

If it still fails at that point, you are most likely dealing with a faulty adapter.
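
For reference, the check-and-reburn sequence with the Mellanox firmware tools usually looks like this (a sketch only; the firmware image file name below is a placeholder, not a file from this thread):

    # start the Mellanox software tools service and list MST devices
    mst start
    mst status

    # query the flash / currently programmed firmware
    flint -d /dev/mst/mt25418_pciconf0 query

    # re-burn the original firmware image (file name is illustrative)
    flint -d /dev/mst/mt25418_pciconf0 -i fw-25408-original.bin burn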


Advice on partitioning an IB network


Hello,

 

I would appreciate some advice in partitioning an IB network, please.

 

We have quite a large IB network -- there are almost 800 hosts on it, all part of a single computation cluster. Recently, a researcher at the university bought a new system consisting of 4 hosts. This new system is completely separate from the cluster in that it does not share the same private Ethernet. On the other hand, to make the new system affordable, we decided to allow the owner to take 4 of our spare/free IB ports.

 

Currently, the new and old systems share the same OpenSM partition. That is, ibhosts lists both the new and the old hosts:

New hosts...

Ca      : 0xe41d2d0300e16190 ports 2 "srv01935 mlx4_0"

Ca      : 0x248a070300f052f0 ports 2 "srv01934 mlx4_0"

Ca      : 0xe41d2d0300e166d0 ports 2 "srv01933 mlx4_0"

Ca      : 0xe41d2d0300e16350 ports 2 "srv01932 mlx4_0"

Old hosts...

Ca      : 0xf452140300225f20 ports 1 "orange02 HCA-1"

Ca      : 0xf452140300225ec0 ports 1 "orange03 HCA-1"

etc, etc...

 

I'm wondering whether it is best to place the old and new hosts in separate partitions. Does that make sense? If so, how do I best construct the partitions.conf file to do this? That is, placing the new (srv..) hosts in a partition is easy, but how do I ensure that the default partition contains just the old hosts?

 

Best regards,

David

Mellanox card disappeared from PCI bus


Hello,

 

I have two computers with Mellanox ConnectX-3 InfiniBand cards connected to each other directly. I configured several VMs on each node with SR-IOV passthrough of the InfiniBand cards. When I was mostly done, I also tried to configure IB so that it would be usable on the host. I rebooted the hosts and saw that the IB cards had completely disappeared from the PCI bus. I rebooted the systems several more times and one of the IB cards reappeared, but the other one is still missing. I completely disconnected the host from all cables and even unplugged and re-seated the card, but this had no effect.

 

An important detail: when I boot either of the nodes, one of the first screens during the boot process shows a message from the IB firmware. There I can enter a menu and enable or disable SR-IOV, set the maximum number of physical functions, and change a few other things. When the IB card is gone from lspci, this firmware boot screen does not appear.

 

Let me describe my system and outline the actions I took when I configured IB passthrough. The host runs Debian 9 with the IB drivers installed from the Debian repository. The guests run CentOS 7.3, where I installed the Mellanox OFED distribution for CentOS 7.3. For virtualization I use QEMU/KVM with libvirt.
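
For reference, on ConnectX-3 the virtual functions exposed by the driver are usually controlled through mlx4_core module options, in addition to the SR-IOV setting in the firmware. A sketch of such a configuration (the values are illustrative, not taken from this setup):

    # /etc/modprobe.d/mlx4_core.conf -- illustrative values only
    # num_vfs:  number of virtual functions the driver creates
    # probe_vf: how many of those VFs the host itself probes
    #           (0 leaves them all free for passthrough to guests)
    options mlx4_core num_vfs=4 probe_vf=0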

 

My card shows on the host as:

05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

05:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.2 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.3 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.4 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

 

Both the host and the guests use the mlx4_core driver; here is a list of some of the modules loaded on the host system:

Module                  Size  Used by

mlx4_ib               163840  0

mlx4_en               114688  0

mlx4_core             303104  2 mlx4_en,mlx4_ib

kvm_intel             192512  0

kvm                   589824  1 kvm_intel

irqbypass              16384  1 kvm

ib_umad                24576  0

ib_core               208896  2 ib_umad,mlx4_ib

I was also loading ib_ipoib on the host as well as on the guests, but on the guests it was crashing the kernel.

 

Another suspicious thing happened when I was attaching the virtual functions to the guest systems (sudo virsh attach-device ...). The following messages appeared in the kernel log:

 

Jul  6 16:07:04 ib1 kernel: [  281.707448] vfio-pci 0000:05:00.4: enabling device (0000 -> 0002)

Jul  6 16:07:06 ib1 kernel: [  283.475412] virbr1: port 5(vnet3) entered learning state

Jul  6 16:07:08 ib1 kernel: [  285.491419] virbr1: port 5(vnet3) entered forwarding state

Jul  6 16:07:08 ib1 kernel: [  285.491424] virbr1: topology change detected, propagating

Jul  6 16:07:13 ib1 kernel: [  290.895918] kvm [2264]: vcpu0, guest rIP: 0xffffffff81060d78 disabled perfctr wrmsr: 0xc2 data 0xffff

Jul  6 16:07:13 ib1 kernel: [  290.933587] kvm: zapping shadow pages for mmio generation wraparound

Jul  6 16:07:13 ib1 kernel: [  290.939149] kvm: zapping shadow pages for mmio generation wraparound

Jul  6 16:07:14 ib1 kernel: [  291.721929] mlx4_core 0000:05:00.0: Received reset from slave:4

Jul  6 16:07:14 ib1 kernel: [  291.767436] mlx4_core 0000:05:00.0: Unknown command:0x55 accepted from slave:4

Jul  7 07:52:13 ib1 kernel: [56990.799006] mlx4_core 0000:05:00.0: mlx4_eq_int: slave:2, srq_no:0x41, event: 14(00)

Jul  7 07:52:13 ib1 kernel: [56990.799009] mlx4_core 0000:05:00.0: mlx4_eq_int: sending event 14(00) to slave:2

Jul  7 08:39:31 ib1 kernel: [59828.975516] mlx4_core 0000:05:00.0: Received reset from slave:4

Jul  7 08:39:31 ib1 kernel: [59829.044683] virbr1: port 5(vnet3) entered disabled state

Jul  7 08:39:31 ib1 kernel: [59829.044752] device vnet3 left promiscuous mode

 

Note the line with "Unknown command".

 

I did not update the firmware, at least not recently.

 

ibstat on the working system reports the following:

 

CA 'mlx4_0'

CA type: MT4099

Number of ports: 2

Firmware version: 2.34.5000

Hardware version: 0

Node GUID: 0xf45214030010a4a0

System image GUID: 0xf45214030010a4a3

Port 1:

     State: Down

     Physical state: Polling

     Rate: 10

     Base lid: 0

     LMC: 0

     SM lid: 0

     Capability mask: 0x0250486a

     Port GUID: 0xf45214030010a4a1

     Link layer: InfiniBand

Port 2:

     State: Down

     Physical state: Polling

     Rate: 10

     Base lid: 0

     LMC: 0

     SM lid: 0

     Capability mask: 0x0250486a

     Port GUID: 0xf45214030010a4a2

     Link layer: InfiniBand

 

Could you help me to get my card back?

Re: link down and ip address lost with mellanox ofed 100G card


Hi,

 

Check it with ip addr show instead of ifconfig -a.

Marc

Re: random write failing with 100G connect 4x card


Hi,

 

What test are you using? fio?

 

Can you provide the command line?

 

Thanks

Marc

Question about inner tcp csum calculation


I am using a ConnectX-3 Pro (model no. CX312B) to send Geneve packets.

 

When I send a Geneve packet with no options (the Geneve header length is then the same as VXLAN's), the NIC calculates the inner TCP checksum. But when I send a Geneve packet with options (header length longer than VXLAN's), the NIC does not calculate the inner TCP checksum.

 

I think the NIC does not know the offset of the inner TCP header.

 

How does the NIC get the inner TCP header offset? Is it set by the driver, or is it a fixed value in hardware?

 

If the value can be set, how can I set it?

 

Thanks for all the help

Re: link down and ip address lost with mellanox ofed 100G card


Hi,

 

I suggest you get further support on this issue by contacting Mellanox support via www.mellanox.com.

 

Marc

Re: Advice on partitioning an IB network


Hi David,

 

It does make sense to partition your subnet, but it depends on how much disruption you're willing to tolerate and whether your apps are partition-aware. What are your apps?

 

Assuming you are running without a partitions.conf file now, all hosts are full members of the default partition. In order to separate the 4 new hosts into their own partition, all the existing hosts will also need to be placed in their own partition, so that no communication is possible between the two groups of hosts beyond the default-partition membership required for SA communication.
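
For illustration, a partitions.conf along those lines could look like this (a sketch only: the pkey values are arbitrary and the <...> placeholders stand for the port GUIDs of each host, which can differ from the node GUIDs that ibhosts prints):

    # /etc/opensm/partitions.conf -- sketch only
    Default=0x7fff, ipoib, defmember=full :
        ALL_SWITCHES=full, SELF=full,
        <port GUID of orange02>=full, <port GUID of orange03>=full;   # ...and the rest of the old hosts

    NewSystem=0x0002, ipoib, defmember=full :
        ALL_SWITCHES=full, SELF=full,
        <port GUID of srv01932>=full, <port GUID of srv01933>=full,
        <port GUID of srv01934>=full, <port GUID of srv01935>=full;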

 

Also, where does the SM run? Does it run on one of the existing hosts? Is it a dedicated node? Or does it run somewhere else (embedded in a switch)?

 

-- Hal


Re: Need help to recover switch Silverstorm 9024-CU24-ST2-DDR admin password

Re: Does anyone know what the Max Junction temperature is for the MT27508 IC on a ConnectX-3


Hi Viki,

 

Where can I get a Mellanox document including this thermal specification?

 

Thanks,

Kevin

MLNX-OS latest version for MSX6012F


Where can I download the latest version of MLNX-OS for the SX6012F InfiniBand switch?

Packet loss with multi-frame payloads


Hello,

  I am having a problem with packet loss in my DPDK application and I hope you can help me out. Below is a description of the application and of the problem.

It is a little long, but I really hope somebody out there can help me, because this is driving me crazy.

 

Application

 

I have a client-server application; single server, multiple clients.

The machines have 8 active cores which poll 8 distinct RX queues to receive packets and use 8 distinct TX queues to burst out packets (i.e., run-to-completion model).

 

Workload

 

The workload consists mostly of single-frame packets, but occasionally clients send multi-frame packets to the server, and occasionally the server sends multi-frame replies back to the clients.

Packets are fragmented at the UDP level (i.e., there is no IP fragmentation; every frame of the same request has frag_id == 0, even though the frames share the same packet_id).

 

Problem

 

I experience huge packet loss on the server when the occasional multi-frame requests from the clients carry a big payload (> 300 Kb).

The eth stats that I gather on the server report no errors and no packet loss (q_errors, imissed, ierrors, oerrors, rx_nombuf are all equal to 0). Yet the application does not see some of the packets belonging to the big requests that the clients send.
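
For reference, those counters can be read per port with the standard ethdev API; a minimal sketch (not the poster's code, and the exact signatures differ slightly between DPDK 2.2 and current releases):

    /* Print the RX/TX drop and error counters mentioned above for one port. */
    #include <inttypes.h>
    #include <stdio.h>
    #include <string.h>
    #include <rte_ethdev.h>

    static void log_drop_counters(uint8_t port_id)
    {
        struct rte_eth_stats stats;

        memset(&stats, 0, sizeof(stats));
        rte_eth_stats_get(port_id, &stats);

        /* imissed: packets dropped because the HW RX queues were full;
         * rx_nombuf: RX mbuf allocation failures; ierrors/oerrors: RX/TX errors. */
        printf("port %u: imissed=%" PRIu64 " ierrors=%" PRIu64
               " oerrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
               port_id, stats.imissed, stats.ierrors,
               stats.oerrors, stats.rx_nombuf);
    }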

 

I have noted some interesting facts:

1) The clients do not experience such packet loss, although they also receive packets with an aggregate payload of the same size as the packets received by the server. The only differences w.r.t. the server are that a client machine of course has a lower RX load (it only gets the replies to its own requests) and that a client thread only receives packets from a single machine (the server).

2) This behavior does not arise as long as the biggest payload exchanged between clients and server is < 200 Kb. This leads me to conclude that fragmentation is not the issue (also, if I implement a stubborn retransmission, eventually all packets are received even with bigger payloads). Also, I reserve plenty of memory for my mempool, so I don't think the server runs out of mbufs (and if that were the case, I guess I would see it in the dropped packets count, right?).

3) If I switch to the pipeline model (on the server only) this problem basically disappears. By pipeline model I mean something like the load-balancing app, where a single core on the server receives client packets on a single RX queue (worker cores reply back to the client using their own TX queue). This leads me to think that the problem is on the server, and not on the clients.

4) It doesn't seem to be a "load" problem. If I run the same tests multiple times, in some "lucky" runs the run-to-completion model outperforms the pipeline one. Also, with single-frame packets only, the run-to-completion model can handle many more packets per second than the number of frames per second generated by the workload that includes some big packets.

 

 

Question

 

Do you have any idea why I am witnessing this behavior? I know that having fewer queues can help performance by relieving contention on the NIC, but is it possible that the contention is actually causing packets to get dropped?

 

Platform

 

DPDK: v2.2-0 (I know this is an old version, but I am dealing with legacy code I cannot change)

MLNX_OFED_LINUX-3.1-1.0.3-ubuntu14.04-x86_64  

My NIC : Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

My machine runs a 4.4.0-72-generic kernel on Ubuntu 16.04.2

CPU is Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz  2x8 cores

 

 

Thank you a lot, especially if you went through the whole email

Regards,

   Harold

NIC not working correctly after wakeup from suspend


Software:

driver source code downloaded from http://www.mellanox.com/page/products_dyn?product_family=27, version 4.0.0

kernel: Ubuntu linux-4.4.0-generic

 

Hardware:

mainboard: MSI C236A

Mellanox 0x6750 Ethernet card

 

Normally everything runs fine, but when I call pm-suspend --no-quirks (or, equivalently, echo -n "mem" > /sys/power/state) and the machine is then woken up by the keyboard, the mlx4 0x6750 card no longer works.

 

 

 

While debugging this problem, I found that the difference shows up in the code that reads MLX4_OWNER_BASE:

 

 

 

static int mlx4_get_ownership(struct mlx4_dev *dev)
{
        void __iomem *owner;
        u32 ret;

        if (pci_channel_offline(dev->persist->pdev)) {
                debug_info_mlx(" ");
                return -EIO;
        }

        owner = ioremap(pci_resource_start(dev->persist->pdev, 0) +
                        MLX4_OWNER_BASE,
                        MLX4_OWNER_SIZE);
        if (!owner) {
                debug_info_mlx(" ");
                mlx4_err(dev, "Failed to obtain ownership bit\n");
                return -ENOMEM;
        }

        ret = readl(owner);
        iounmap(owner);
        debug_info_mlx("ret %d", ret);
        return (int) !!ret;
}

 

 

Normally the last debug print shows ret 0, but after wakeup, when the module is loaded again, this read always returns 16777216 (0x01000000).

 

I wonder how to reset the NIC so that this read returns 0 again.
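
For reference, one generic way to force a full PCI-level re-probe of the adapter after resume is via sysfs; a sketch (the PCI address 0000:01:00.0 is a placeholder, check yours with lspci -D | grep Mellanox):

    # detach the driver and remove the device from the PCI bus
    echo 0000:01:00.0 > /sys/bus/pci/drivers/mlx4_core/unbind
    echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove

    # rescan the bus so the kernel rediscovers and re-probes the card
    echo 1 > /sys/bus/pci/rescan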

 

thank you

Waiting for link-up on net0...


I've seen several of my InfiniBand cards hang at this error. Can you tell me whether this is a bug or a configuration issue?

The card is installed in an HP ProLiant G9 server. I'm attempting to PXE boot off of net1.

 

Attempting Boot From NIC

MLNX FlexBoot 3.4.225 (PCI 04:00.0) starting execution...ok

MLNX FlexBoot 3.4.225 initializing devices...

Initialization complete

 

 

 

 

Mellanox ConnectX FlexBoot v3.4.225

iPXE 1.0.0+ -- Open Source Network Boot Firmware --

Waiting for link-up on net0...

 

[Screenshot: 2017-07-20 15_18_46 console capture]

Re: Waiting for link-up on net0...


Hello David,

 

Correct me if I'm wrong: your issue is that the PXE boot process is never attempted on port net1 because it is stuck at port net0?

 

Can you go into the FlexBoot menu of both the good and the bad server after reboot by pressing Ctrl-B and share a comparison?

 

 

Cheers,

~Rage


Where's the procedure of packing network protocol header in RoCE v2?


Hi, lately I began to study the driver's source code of MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64.

When I got to the network protocol stack, I ran into a problem, and I am hoping for some guidance from friends in the community.

Let me state the question:

  I'm using the verbs API (the black line in picture 2 below) and I am familiar with its whole flow (from reading the source code and the manual), but I am not familiar with the abstraction layer below it.

  So I want to know:

How does RoCE v2 pack the UDP and IP headers into the packet? (I mean: where is this done? I haven't found the relevant code, but I do have some clues, see below.) I'm also not sure whether this step is performed by the driver or by the system network stack. Does anybody know? I would be very pleased to learn from you!

1. Source code from MLNX_OFED_LINUX-4.0-2.0.0.1-rhel7.3-x86_64/MLNX_OFED_SRC-4.0-2.0.0.1/SRPMS/libmlx5-1.2.1mlnx1/src/mlx5.c:

    

    

2. Some explanation of RoCE v2:

[Image: RoCE protocol stack]

3. RoCEv2 packet format

[Image: RoCE v2 frame format]

Re: Waiting for link-up on net0...


You are correct. I'm attempting to PXE boot off of net1, but not because it's stuck at net0 -- I'm required to use net1. However, I cannot get past net0.

 

I'm unable to get into the Mellanox configuration screen on the bad server since I'm going through the HP iLO console. I will update the thread if I can.

 

Here is a screenshot of the good server:

 

[Screenshot: 2017-07-21 13_40_17 console capture]

[Screenshot: 2017-07-21 13_40_47 console capture]

Configuring Cisco 6513 switch and Mellanox MLAG


The MLAG configuration is between the Cisco 6513 switch and the SN2100 switch.

The SN2100 is configured according to the guide in the document below.

HowTo Configure MLAG on Mellanox Switches

A Cisco engineer told me that my Cisco 6513 switch runs a VSS configuration, but I have not been able to apply the corresponding configuration on the Cisco switch, because I only received a VRRP configuration and I do not know exactly what it is.

I tried both LACP and static mode on the MLAG interface, but it failed.

All status indicators are normal, but in the MLAG Ports Status Summary the ports show as Inactive.

 

It seems that some settings beyond the default MLAG configuration are needed between the Cisco 6513 switch and the SN2100 switch.

I would be grateful to anyone who can share their experience with this part.

RDMA problem FreeBSD 11.0


Hello everyone, I have a problem enabling RDMA on FreeBSD 11. The configuration is as follows: KVM host, ConnectX-4 card, and a FreeBSD guest with one VF allocated to it. I compiled the mlx5 and mlx5en modules and loaded them into the kernel as follows:

1   30 0xffffffff80200000 1fa88f8  kernel

2    2 0xffffffff82219000 1c9ab    mlx5.ko

3    5 0xffffffff82236000 fcf6     linuxkpi.ko

4    1 0xffffffff82246000 152b8    mlx5en.ko

5    1 0xffffffff8225c000 11f0a    krping.ko

6    2 0xffffffff8226e000 5be1a    ibcore.ko

7    1 0xffffffff822ca000 f728     iser.ko

8    1 0xffffffff822da000 114b8    iscsi.ko

9    1 0xffffffff822ec000 3de40    linux.ko

10    2 0xffffffff8232a000 7b08     linux_common.ko

11    1 0xffffffff82332000 389f4    linux64.ko

 

I can ping my second host, but I can't connect using rping or udaddy. I get errors like this:

udaddy: starting client

udaddy: connecting

udaddy: event: RDMA_CM_EVENT_ADDR_ERROR, error: -19

test complete

return status -19

 

rping gives me a similar error:

 

cma event RDMA_CM_EVENT_ADDR_ERROR, error -19

 

What is interesting is that an RDMA connection from the KVM host to the second (Linux) test machine works.

I did not rebuild the sources with WITH_OFED='YES'; I only built the modules from the FreeBSD sources. Maybe somebody can help me with this?
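
For reference, the kind of test being attempted normally looks like this (a sketch only; 10.0.0.2 is a placeholder for the remote host's IP on the RDMA-capable interface):

    # on the remote host (server side)
    rping -s -a 10.0.0.2 -v -C 10

    # on the FreeBSD guest (client side)
    rping -c -a 10.0.0.2 -v -C 10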

 

BR

Adam

Breakout vs Single Cables


Hi,

 

If ports and power are not an issue, is it better or is there any advantage to using breakout cables (1 x 40Gb to 4 x 10Gb), or is it preferable to use single cables? Will one be better latency-wise? This is to connect to 4 Nutanix nodes.


