Re: mst start fails with ConnectX-4 on ppc64le

July 24, 2018, 6:22 am

Hi Sood,

Do you have the Advance Toolchain runtime installed on the machine?

If so, please note the known issue and workaround on page 16 of the MFT 4.10.0 release notes:

http://www.mellanox.com/related-docs/MFT/MFT_4_10_0_Release_Notes.pdf

Regards,

Karen.

↧

Re: Yocto embedded build of rdma-core

July 24, 2018, 7:38 am

The solution was to use the recipes that ship with the updated OpenEmbedded build. About a month ago, rdma-core was added to the mainline tree. We had been trying to get this working by writing our own recipes; now that the code is integrated, it just builds.

↧

rxe driver does not support kernel ABI

July 24, 2018, 7:42 am

I'm getting a small error when I try to run an rping test. I'm building rxe into kernel 4.16, and rdma-core with Yocto, on an Arria10 socfpga containing a dual-core A53 ARM processor. The kernel modules and userland load:

root@arria10:~# lsmod | grep rxe
rdma_rxe 102400 0
ib_core 192512 6 rdma_rxe,ib_cm,rdma_cm,ib_uverbs,iw_cm,rdma_ucm

I can configure the rxe0 device but rxe_cfg is giving a strange error:

root@arria10:~# rxe_cfg
libibverbs: Warning: Driver rxe does not support the kernel ABI of 1 (supports 2 to 2) for device /sys/class/infiniband/rxe0
IB device 'rxe0' wasn't found
Name  Link  Driver   Speed  NMTU  IPv4_addr  RDEV  RMTU
eth0  yes   st_gmac         1500  10.0.1.24  rxe0  (?)

Any hints on what the kernel ABI error means would be appreciated!

Thanks,

↧

Re: rxe driver does not support kernel ABI

July 24, 2018, 12:39 pm

After setting up the Yocto build to include the various rdma-core modules according to Yocto practices, this error went away.

↧

Re: rxe driver does not support kernel ABI

July 24, 2018, 2:55 pm

It's back. For some reason I keep getting this warning:

libibverbs: Warning: Driver rxe does not support the kernel ABI of 1 (supports 2 to 2) for device /sys/class/infiniband/rxe0

↧

Re: ConnectX-3 RoCE mode

July 24, 2018, 5:15 pm

Karen,

Thanks for the reply and the reference document.

↧

Re: sr-iov and vxlan used

July 24, 2018, 6:54 pm

Thank you for the reply.

↧

Re: mst start fails with ConnectX-4 on ppc64le

July 25, 2018, 3:18 am

Hi Karen,

Thanks for your response. I do have the Advance Toolchain runtime installed:

$ sudo apt list --installed | grep advance-toolchain
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
advance-toolchain-at10.0-devel/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at10.0-mcore-libs/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at10.0-perf/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at10.0-runtime/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at7.1-devel/trusty,now 7.1-5 ppc64el [installed]
advance-toolchain-at7.1-mcore-libs/trusty,now 7.1-5 ppc64el [installed]
advance-toolchain-at7.1-perf/trusty,now 7.1-5 ppc64el [installed]
advance-toolchain-at7.1-runtime/trusty,now 7.1-5 ppc64el [installed]

I did the export as described in the workaround (libc.so.6 exists on my system):

$ echo $LD_PRELOAD
/lib/powerpc64le-linux-gnu/libc.so.6

However, I still see the error.

${mbindir}/minit, called from /usr/bin/mst, gives a segmentation fault (as seen in the logs from my previous message). I'm not sure why that happens.

↧

Re: mst start fails with ConnectX-4 on ppc64le

July 25, 2018, 3:22 am

Thank you Sood,

Please open a support ticket with the details so we can further investigate.

You can open a ticket by emailing support@mellanox.com.

Regards,

Karen.

↧

Re: "Priority trust-mode is not supported on your system"?

July 25, 2018, 3:39 am

Hi,

Can you give more details on what you tried and what you used?

Thanks

Marc

↧

Web interface error on SX6036

July 23, 2018, 2:45 am

I am trying to set up an SX6036 VPI switch that was previously used at another institute. I've configured the mgmt interface and can connect to the web UI; however, it immediately gives the following error:

Internal Error

An internal error has occurred.

Your options from this point are:

See the logs for more details.

Return to the home page.

Retry the bad page which gave the error.

When I enable logging monitor and try to log in I see the following on the terminal:

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_include_template(), web_template.c:364, build 1: can't use empty string as operand of "!"
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Error in template "status-logs" at line 545 of the generated TCL code
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_render_template(), web_template.c:226, build 1: Error code 14002 (assertion failed) returned
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: main(), rh_main.c:337, build 1: Error code 14002 (assertion failed) returned
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Request handler failed with error code 14002: assertion failed
Jul 23 11:34:29 ib-switch httpd[4535]: [Mon Jul 23 11:34:29 2018] [error] [client ipremvd] Exited with error code 14002: assertion failed, referer: http://ip.removed./admin/launch?script=rh&template=failure&badpage=%2Fadmin%2Flaunch%3Fscript%3Drh%26template%3Dstatus-logs

Any idea how to check what may have failed and how to fix it?

regards

Andrew

↧

Re: rxe driver does not support kernel ABI

July 25, 2018, 9:20 am

I traced this to the function match_device() in libibverbs/init.c.

There is a check for ABI versions:

    if (sysfs_dev->abi_ver < ops->match_min_abi_version ||
        sysfs_dev->abi_ver > ops->match_max_abi_version) {
        fprintf(stderr, PFX
            "Warning: Driver %s does not support the kernel ABI of %u (supports %u to %u) for device %s\n",

The variable sysfs_dev is passed into this call by try_driver(), which is called by try_drivers(), which is called by try_all_drivers(), which in turn appears to be called by ibverbs_get_device_list().

Does this help?
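
To make the check concrete, here is a minimal standalone sketch of the same range comparison. The field names match_min_abi_version, match_max_abi_version and abi_ver come from the snippet quoted above; the struct names, the function name, and the argument order in the fprintf call are assumptions made for illustration and are not the actual libibverbs definitions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins holding only the fields the check reads.
 * The real libibverbs structures have many more members. */
struct sketch_driver_ops {
	const char *name;
	unsigned int match_min_abi_version;
	unsigned int match_max_abi_version;
};

struct sketch_sysfs_dev {
	const char *ibdev_path;
	unsigned int abi_ver;
};

/* Return true when the ABI version reported by the kernel lies inside
 * the range the userspace driver supports. Otherwise print a warning
 * shaped like the one in the post and return false. */
static bool sketch_abi_supported(const struct sketch_driver_ops *ops,
				 const struct sketch_sysfs_dev *dev)
{
	if (dev->abi_ver < ops->match_min_abi_version ||
	    dev->abi_ver > ops->match_max_abi_version) {
		fprintf(stderr,
			"libibverbs: Warning: Driver %s does not support the kernel ABI of %u (supports %u to %u) for device %s\n",
			ops->name, dev->abi_ver,
			ops->match_min_abi_version,
			ops->match_max_abi_version,
			dev->ibdev_path);
		return false;
	}
	return true;
}
```

With the values from the warning (kernel ABI 1, driver range 2 to 2), this comparison fails, so the userspace side declines to bind to rxe0. That would also account for the "IB device 'rxe0' wasn't found" line printed by rxe_cfg.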

↧

Re: rxe driver does not support kernel ABI

July 25, 2018, 9:33 am

It appears that the ABI version is stored here:

root@arria10:/sys/class/infiniband# cat rxe0/device/infiniband_verbs/uverbs0/abi_version

According to the code, this needs to be 2.
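
For checking this from a small test program rather than by hand, a rough sketch of reading such a sysfs attribute might look like the following. The helper name sketch_read_abi_version is hypothetical, and the code assumes only that the file holds a single unsigned decimal number; it is not how libibverbs itself reads the value.

```c
#include <stdio.h>

/* Read one unsigned decimal value from a sysfs-style attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int sketch_read_abi_version(const char *path, unsigned int *out)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;
	parsed = (fscanf(f, "%u", out) == 1);
	fclose(f);
	return parsed ? 0 : -1;
}
```

Called with the full path to the abi_version file shown above, the result can then be compared against the range printed in the warning (2 to 2 here).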

↧

How do I configure teaming in Server 2008 R2?

July 25, 2018, 10:19 am

Hi All,

I have a couple of older Server 2008 R2 boxes that have ConnectX-3 Pro dual port cards in them. I need to build LACP teams for my new network, but it doesn't appear that teaming exists within the Mellanox WinOF driver. In Server 2008 R2 Microsoft Teaming didn't exist yet.

How am I supposed to configure these cards in LACP Teams?

Thanks

↧

Re: rxe driver does not support kernel ABI

July 25, 2018, 11:26 am

I moved to kernel 4.17 and the warning went away.

↧

Various ping programs segfaulting

July 25, 2018, 3:08 pm

I have a build of rdma-core on kernel 4.17 using Yocto for an Altera Arria10 with a dual-core A53 ARM processor. The system builds and rxe configures correctly, i.e. I can run rxe_cfg start and rxe_cfg add eth0, and ibv_devices looks good:

root@arria10:~# rxe_cfg status
Name  Link  Driver   Speed  NMTU  IPv4_addr  RDEV  RMTU
eth0  yes   st_gmac         1500  10.0.1.28  rxe0  1024 (3)

root@arria10:~# ibv_devices
    device                 node GUID
    ------              ----------------
    rxe0                085697fffec1059b

root@arria10:~# ibv_devinfo rxe0
hca_id: rxe0
        transport:                      InfiniBand (0)
        fw_ver:                         0.0.0
        node_guid:                      0856:97ff:fec1:059b
        sys_image_guid:                 0000:0000:0000:0000
        vendor_id:                      0x0000
        vendor_part_id:                 0
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

This all looks good. However, when I try to ping this machine against a PC running rdma-core, I'm getting some strange errors including a segfault when the Arria10 acts as server for udaddy.

root@arria10:~# udaddy -s 10.0.1.16
udaddy: starting client
[ 1883.526301] rdma_rxe: null vaddr
udaddy: connecting
failed to reg MR
udaddy: failed to create messages: -1
test complete
Segmentation fault

I traced the first error, "rdma_rxe: null vaddr", to rxe_mem_init_user() in <kernel>/drivers/infiniband/sw/rxe/rxe_mr.c. It appears that a page address lookup, perhaps a virtual-to-physical translation, is failing. Any thoughts on how to solve this?

Thanks,

↧