Hi Sood,
Do you have Advance Toolchain Runtime installed on the machine?
If yes, please note the following known issue and workaround on page 16:
http://www.mellanox.com/related-docs/MFT/MFT_4_10_0_Release_Notes.pdf
Regards,
Karen.
The solution to this problem was to use the recipes incorporated in the updated openembedded build. About a month ago, rdma-core was added to the mainline tree. We had been trying to get this to work ourselves by writing our own recipes. Now that the code is integrated, it just builds.
I'm getting a small error when I try to run an rping test. I'm building rxe into kernel 4.16 and rdma-core using yocto on an Arria10 socfpga containing a dual core A53 ARM processor. I get the kernel modules and userland loaded:
root@arria10:~# lsmod | grep rxe
rdma_rxe 102400 0
ib_core 192512 6 rdma_rxe,ib_cm,rdma_cm,ib_uverbs,iw_cm,rdma_ucm
I can configure the rxe0 device but rxe_cfg is giving a strange error:
root@arria10:~# rxe_cfg
libibverbs: Warning: Driver rxe does not support the kernel ABI of 1 (supports 2 to 2) for device /sys/class/infiniband/rxe0
IB device 'rxe0' wasn't found
Name Link Driver Speed NMTU IPv4_addr RDEV RMTU
eth0 yes st_gmac 1500 10.0.1.24 rxe0 (?)
Any hints on what this means (i.e. the kernel ABI error) would be appreciated!
Thanks,
FM
After setting up the yocto build to include the various rdma-core modules according to yocto practices, this error went away.
It's back. For some reason I keep getting this warning:
libibverbs: Warning: Driver rxe does not support the kernel ABI of 1 (supports 2 to 2) for device /sys/class/infiniband/rxe0
Karen,
Thanks for replying and for the reference document.
Thank you for the reply.
Hi Karen,
Thanks for your response. I do have the Advance Toolchain Runtime installed.
$ sudo apt list --installed | grep advance-toolchain
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
advance-toolchain-at10.0-devel/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at10.0-mcore-libs/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at10.0-perf/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at10.0-runtime/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at7.1-devel/trusty,now 7.1-5 ppc64el [installed]
advance-toolchain-at7.1-mcore-libs/trusty,now 7.1-5 ppc64el [installed]
advance-toolchain-at7.1-perf/trusty,now 7.1-5 ppc64el [installed]
advance-toolchain-at7.1-runtime/trusty,now 7.1-5 ppc64el [installed]
I did the export as mentioned (libc.so.6 exists on my system):
$ echo $LD_PRELOAD
/lib/powerpc64le-linux-gnu/libc.so.6
I still see the error however.
${mbindir}/minit from /usr/bin/mst gives a segmentation fault for some reason (as seen in the logs from my previous message). I'm not sure why that happens.
Thank you Sood,
Please open a support ticket with the details so we can further investigate.
You can open a ticket by sending us an email to support@mellanox.com
Regards,
Karen.
Hi,
Can you give more details on what you tried and what you used?
Thanks
Marc
I am trying to set up an SX6036 VPI switch that was previously used at another institute. I've configured the mgmt interface and can connect to the web UI, but it immediately gives the following error:
Internal Error
An internal error has occurred.
Your options from this point are:
See the logs for more details.
Return to the home page.
Retry the bad page which gave the error.
When I enable logging monitor and try to log in I see the following on the terminal:
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_include_template(), web_template.c:364, build 1: can't use empty string as operand of "!"
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Error in template "status-logs" at line 545 of the generated TCL code
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_render_template(), web_template.c:226, build 1: Error code 14002 (assertion failed) returned
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: main(), rh_main.c:337, build 1: Error code 14002 (assertion failed) returned
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Request handler failed with error code 14002: assertion failed
Jul 23 11:34:29 ib-switch httpd[4535]: [Mon Jul 23 11:34:29 2018] [error] [client ipremvd] Exited with error code 14002: assertion failed, referer: http://ip.removed./admin/launch?script=rh&template=failure&badpage=%2Fadmin%2Flaunch%3Fscript%3Drh%26template%3Dstatus-logs
Any idea how to check what may have failed and how to fix it?
regards
Andrew
I traced this to the function match_device() in libibverbs/init.c
There is a check for ABI versions:
if (sysfs_dev->abi_ver < ops->match_min_abi_version ||
sysfs_dev->abi_ver > ops->match_max_abi_version) {
fprintf(stderr, PFX
"Warning: Driver %s does not support the kernel ABI of %u (supports %u to %u) for device %s\n",
The variable sysfs_dev is passed into this call by try_driver(), which is called by try_drivers(), which is called by try_all_drivers(), which in turn appears to be called by ibverbs_get_device_list().
Does this help?
It appears that the abi version is stored here:
root@arria10:/sys/class/infiniband# cat rxe0/device/infiniband_verbs/uverbs0/abi_version
1
And this needs to be 2 according to the code...
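To make the comparison above concrete, here is a minimal sketch of the same range test, applied to a value read from a sysfs-style file. The function names abi_in_range() and read_abi_version() are hypothetical helpers written for illustration, not part of libibverbs; only the comparison itself is taken from the snippet quoted from init.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper mirroring the quoted check: returns 1 when
 * abi_ver lies inside [min_abi, max_abi], 0 otherwise. */
int abi_in_range(unsigned abi_ver, unsigned min_abi, unsigned max_abi)
{
    return abi_ver >= min_abi && abi_ver <= max_abi;
}

/* Hypothetical helper: read one unsigned integer from a text file such
 * as .../uverbs0/abi_version. Returns 0 on success, -1 on failure. */
int read_abi_version(const char *path, unsigned *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int ok = (fscanf(f, "%u", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

With the values from the warning (kernel ABI 1, driver supports 2 to 2), abi_in_range(1, 2, 2) is false, which is the condition under which the warning is printed.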
Hi All,
I have a couple of older Server 2008 R2 boxes that have ConnectX-3 Pro dual port cards in them. I need to build LACP teams for my new network, but it doesn't appear that teaming exists within the Mellanox WinOF driver. In Server 2008 R2 Microsoft Teaming didn't exist yet.
How am I supposed to configure these cards in LACP Teams?
Thanks
C
I went to kernel 4.17 and this went away.
I have a build of rdma-core on kernel 4.17 using yocto for an Altera Arria10 with a dual-core A53 ARM processor. The system is built and rxe configures correctly, i.e. I can run rxe_cfg start and rxe_cfg add eth0, and ibv_devices looks good:
root@arria10:~# rxe_cfg status
Name Link Driver Speed NMTU IPv4_addr RDEV RMTU
eth0 yes st_gmac 1500 10.0.1.28 rxe0 1024 (3)
root@arria10:~# ibv_devices
device node GUID
------ ----------------
rxe0 085697fffec1059b
root@arria10:~# ibv_devinfo rxe0
hca_id: rxe0
transport: InfiniBand (0)
fw_ver: 0.0.0
node_guid: 0856:97ff:fec1:059b
sys_image_guid: 0000:0000:0000:0000
vendor_id: 0x0000
vendor_part_id: 0
hw_ver: 0x0
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
This all looks good. However, when I try to ping this machine against a PC running rdma-core, I'm getting some strange errors including a segfault when the Arria10 acts as server for udaddy.
root@arria10:~# udaddy -s 10.0.1.16
udaddy: starting client
[ 1883.526301] rdma_rxe: null vaddr
udaddy: connecting
failed to reg MR
udaddy: failed to create messages: -1
test complete
Segmentation fault
I traced the first error, "rdma_rxe: null vaddr", to rxe_mem_init_user() in <kernel>/drivers/infiniband/sw/rxe/rxe_mr.c. It appears that a page address lookup, perhaps a virtual-to-physical translation, is failing. Any thoughts on how to solve this?
Thanks,
FM
My Soft-RoCE setup runs on "Red Hat Enterprise Linux Server release 7.4 (Maipo)".
When I post a write opcode with length=1024, it works, but with length=1025 in the same code, it fails.
When the same code runs with length=1024 or 1025 on a Mellanox CX4 card, it works in both cases.
Can anyone give me some suggestions? Or is this a bug in Soft-RoCE?
Thank you!
My code looks like this:
ctx->send_flags = 0; // IBV_SEND_SIGNALED;
//ctx->send_flags = IBV_SEND_SIGNALED;
struct ibv_sge wr_list = {
.addr = (uintptr_t) ctx->mr_rd->addr,
.length = 1024, // with 1025 the write fails; with 1024 it succeeds
.lkey = ctx->mr_rd->lkey
};
struct ibv_send_wr wr_wr ;
bzero(&wr_wr, sizeof(wr_wr));
wr_wr.wr_id = PINGPONG_WR_WRID ;
wr_wr.sg_list = &wr_list ;
wr_wr.num_sge = 1 ;
wr_wr.opcode = IBV_WR_RDMA_WRITE ;
wr_wr.send_flags = ctx->send_flags ;
wr_wr.next = NULL ;
wr_wr.wr.rdma.rkey = rkey;
wr_wr.wr.rdma.remote_addr = remote_addr;
//wr_wr.wr.rdma.rkey = ctx->mr_wr->rkey;
//wr_wr.wr.rdma.remote_addr = ctx->mr_wr->addr;
struct ibv_send_wr *bad_wr_wr;
int ret = -1;
ret = ibv_post_send(ctx->qp, &wr_wr, &bad_wr_wr);
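One thing that changes between the two lengths is the number of packets the message occupies, if the path MTU is 1024 bytes. That MTU is an assumption here (the post does not state it), and this is not a diagnosis of the failure. The sketch below only shows the arithmetic; packets_for_message() is a hypothetical helper, not a libibverbs call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of packets a message of `len` bytes
 * occupies when each packet carries at most `pmtu` payload bytes.
 * A zero-length message still takes one packet. */
size_t packets_for_message(size_t len, size_t pmtu)
{
    assert(pmtu > 0);

    if (len == 0)
        return 1;
    return (len + pmtu - 1) / pmtu; /* ceiling division */
}
```

Under that assumption, a 1024-byte write fits in a single packet, while a 1025-byte write is split into two, so the failing case is the first one that exercises the multi-packet path.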
Hi, Marc.
I installed MLNX_OFED_LINUX-4.1-1.0.2.0 on my server and used the provided tool "mlnx_qos" to set the trust mode for the ConnectX-3 Pro.
The command is "mlnx_qos -i p4p1 --trust=dscp".
Then the result is "Priority trust mode is not supported on your system".
Thanks
Hello, everybody!
I am seeing errors on the physical interfaces between Mellanox switches that are connected by MLAGs.
The switches are connected by a Mellanox Active Cable (XLPPI). The errors appear once every few days, with a count of about 1000.
You can see the interface statistics in the attached file.
What could be the reason for these errors?
Could it be a queue problem?
Hi,
What is the idea? Why do you need it that way?