Channel: Mellanox Interconnect Community: Message List
Viewing all 6278 articles

Re: mst start fails with ConnectX-4 on ppc64le


Re: Yocto embedded build of rdma-core

The solution to this problem was to use the recipes included in the updated openembedded build. About a month ago, rdma-core was added to the mainline tree. We had been trying to get this to work by writing our own recipes. Now that the code is integrated, it just builds.

rxe driver does not support kernel ABI

Getting a small error when I try to do an rping test. I'm building rxe into kernel 4.16 and rdma-core using yocto on an Arria10 socfpga containing a dual core A53 ARM processor. I get the kernel modules and userland loaded:

 

root@arria10:~# lsmod | grep rxe
rdma_rxe 102400 0
ib_core 192512 6 rdma_rxe,ib_cm,rdma_cm,ib_uverbs,iw_cm,rdma_ucm

 

I can configure the rxe0 device but rxe_cfg is giving a strange error:

 

root@arria10:~# rxe_cfg
libibverbs: Warning: Driver rxe does not support the kernel ABI of 1 (supports 2 to 2) for device /sys/class/infiniband/rxe0
IB device 'rxe0' wasn't found
Name Link Driver Speed NMTU IPv4_addr RDEV RMTU
eth0 yes  st_gmac      1500 10.0.1.24 rxe0 (?)

 

Any hints on what this means, i.e. the kernel ABI error, would be appreciated!

 

Thanks,

FM

Re: rxe driver does not support kernel ABI

After setting up the yocto build to include the various rdma-core modules according to yocto practices, this error went away.

Re: rxe driver does not support kernel ABI

It's back. For some reason I keep getting this warning:

libibverbs: Warning: Driver rxe does not support the kernel ABI of 1 (supports 2 to 2) for device /sys/class/infiniband/rxe0

Re: ConnectX-3 RoCE mode

Karen,

 

Thanks for replying and for the reference doc.

Re: sr-iov and vxlan used

Re: mst start fails with ConnectX-4 on ppc64le

Hi Karen,

 

Thanks for your response. I do have the Advanced Toolchain Runtime installed.

 

$ sudo apt list --installed | grep advance-toolchain
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
advance-toolchain-at10.0-devel/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at10.0-mcore-libs/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at10.0-perf/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at10.0-runtime/now 10.0-3 ppc64el [installed,local]
advance-toolchain-at7.1-devel/trusty,now 7.1-5 ppc64el [installed]
advance-toolchain-at7.1-mcore-libs/trusty,now 7.1-5 ppc64el [installed]
advance-toolchain-at7.1-perf/trusty,now 7.1-5 ppc64el [installed]
advance-toolchain-at7.1-runtime/trusty,now 7.1-5 ppc64el [installed]

 

I did the export as mentioned (libc.so.6 exists on my system):

 

$ echo $LD_PRELOAD

/lib/powerpc64le-linux-gnu/libc.so.6

 

I still see the error however.

 

${mbindir}/minit, called from /usr/bin/mst, gives a segmentation fault (as seen in the logs from my previous message). I am not sure why that happens.


Re: mst start fails with ConnectX-4 on ppc64le

Thank you Sood,

Please open a support ticket with the details so we can further investigate.

You can open a ticket by sending us an email to support@mellanox.com

 

Regards,

Karen.

Re: "Priority trust-mode is not supported on your system"?

Hi,

 

Can you give more details on what you tried and what you used?

 

Thanks

Marc

Web interface error on SX6036

I am trying to set up an SX6036 VPI switch that was previously used at another institute. I've configured the mgmt interface and can connect to the web UI, but it immediately gives the following error:

 

Internal Error

An internal error has occurred.

Your options from this point are:

See the logs for more details.

Return to the home page.

Retry the bad page which gave the error.

 

 

When I enable logging monitor and try to log in I see the following on the terminal:

 

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_include_template(), web_template.c:364, build 1: can't use empty string as operand of "!"
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Error in template "status-logs" at line 545 of the generated TCL code
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_render_template(), web_template.c:226, build 1: Error code 14002 (assertion failed) returned
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: main(), rh_main.c:337, build 1: Error code 14002 (assertion failed) returned
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Request handler failed with error code 14002: assertion failed
Jul 23 11:34:29 ib-switch httpd[4535]: [Mon Jul 23 11:34:29 2018] [error] [client ipremvd] Exited with error code 14002: assertion failed, referer: http://ip.removed./admin/launch?script=rh&template=failure&badpage=%2Fadmin%2Flaunch%3Fscript%3Drh%26template%3Dstatus-logs

 

 

Any idea how to check what may have failed and how to fix it?

 

regards

Andrew

Re: rxe driver does not support kernel ABI

I traced this to the function match_device() in libibverbs/init.c

 

There is a check for ABI versions:

 

if (sysfs_dev->abi_ver < ops->match_min_abi_version ||
    sysfs_dev->abi_ver > ops->match_max_abi_version) {
        fprintf(stderr, PFX
                "Warning: Driver %s does not support the kernel ABI of %u (supports %u to %u) for device %s\n",

 

The variable sysfs_dev is passed into this check by try_driver(), which is called by try_drivers(), which is called by try_all_drivers(), which in turn appears to be called by ibverbs_get_device_list().

 

Does this help?

Re: rxe driver does not support kernel ABI

It appears that the abi version is stored here:

root@arria10:/sys/class/infiniband# cat rxe0/device/infiniband_verbs/uverbs0/abi_version
1

And this needs to be 2 according to the code...
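
Putting the last two posts together: the value in abi_version (1 here) is compared against the range the rxe userspace provider accepts (2 to 2, per the warning). Below is a minimal sketch of that comparison in C, assuming the sysfs file holds a single decimal number. The names parse_abi_version() and abi_in_range() are hypothetical helpers for illustration, not libibverbs functions.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse the text of an abi_version sysfs file, e.g. "1\n".
 * Returns 0 and stores the value in *out on success, -1 on malformed input.
 * Hypothetical helper, not part of libibverbs. */
int parse_abi_version(const char *text, unsigned int *out)
{
    char *end = NULL;
    unsigned long value;

    assert(text != NULL && out != NULL);

    /* Require a leading digit so signs and leading spaces are rejected. */
    if (text[0] < '0' || text[0] > '9')
        return -1;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno != 0 || value > UINT_MAX)
        return -1;

    /* Only trailing whitespace may follow the number. */
    while (*end == '\n' || *end == ' ' || *end == '\t')
        end++;
    if (*end != '\0')
        return -1;

    *out = (unsigned int)value;
    return 0;
}

/* Same range test as the check quoted from match_device():
 * returns 1 when the kernel ABI lies inside [min_abi, max_abi], else 0. */
int abi_in_range(unsigned int abi_ver, unsigned int min_abi,
                 unsigned int max_abi)
{
    return abi_ver >= min_abi && abi_ver <= max_abi;
}
```

With the values from this thread, parse_abi_version("1\n", &v) stores 1 and abi_in_range(1, 2, 2) returns 0, which is the case where the warning is printed.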

How do I configure teaming in Server 2008 R2?

Hi All,

 

I have a couple of older Server 2008 R2 boxes that have ConnectX-3 Pro dual port cards in them.   I need to build LACP teams for my new network, but it doesn't appear that teaming exists within the Mellanox WinOF driver.  In Server 2008 R2 Microsoft Teaming didn't exist yet.

 

How am I supposed to configure these cards in LACP Teams?

 

Thanks

 

C

Re: rxe driver does not support kernel ABI

I went to kernel 4.17 and this went away.


Various ping programs segfaulting

I have a build of rdma-core in kernel 4.17 using yocto for an Altera Arria10 with a dual-core A53 ARM processor. The system is built and rxe configures correctly, i.e. I can run rxe_cfg start and rxe_cfg add eth0, and ibv_devices looks good:

 

root@arria10:~# rxe_cfg status
  Name  Link  Driver   Speed  NMTU  IPv4_addr  RDEV  RMTU
  eth0  yes   st_gmac         1500  10.0.1.28  rxe0  1024  (3)
root@arria10:~# ibv_devices
    device                 node GUID
    ------              ----------------
    rxe0                085697fffec1059b
root@arria10:~# ibv_devinfo rxe0
hca_id: rxe0
        transport:                      InfiniBand (0)
        fw_ver:                         0.0.0
        node_guid:                      0856:97ff:fec1:059b
        sys_image_guid:                 0000:0000:0000:0000
        vendor_id:                      0x0000
        vendor_part_id:                 0
        hw_ver:                         0x0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

 

This all looks good.  However, when I try to ping this machine against a PC running rdma-core, I'm getting some strange errors including a segfault when the Arria10 acts as server for udaddy.

 

root@arria10:~# udaddy -s 10.0.1.16
udaddy: starting client
[ 1883.526301] rdma_rxe: null vaddr
udaddy: connecting
failed to reg MR
udaddy: failed to create messages: -1
test complete
Segmentation fault

 

I traced the first error, "rdma_rxe: null vaddr", to rxe_mem_init_user() in <kernel>/drivers/infiniband/sw/rxe/rxe_mr.c. It appears that a page address lookup, perhaps a virtual-to-physical translation, is failing. Any thoughts on how to solve this?

 

Thanks,

FM
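
A side note on the output above: the numbers in parentheses after the MTU values (1024 (3), 4096 (5)) are the verbs MTU enum codes, where code 1 means 256 bytes and each step doubles the size, up to code 5 for 4096 bytes. Below is a small sketch of that mapping in C. The name mtu_code_to_bytes() is a hypothetical helper for illustration, not a libibverbs call.

```c
/* Convert a verbs MTU enum code (1..5) to its size in bytes.
 * Codes: 1 = 256, 2 = 512, 3 = 1024, 4 = 2048, 5 = 4096.
 * Returns 0 for any code outside that range.
 * Hypothetical helper, not part of libibverbs. */
unsigned int mtu_code_to_bytes(int code)
{
    if (code < 1 || code > 5)
        return 0;
    return 128u << code;
}
```

For the port shown in this post, mtu_code_to_bytes(3) gives 1024 for active_mtu and mtu_code_to_bytes(5) gives 4096 for max_mtu.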

When using a write op with more than 1024 B (MTU) in Soft-RoCE mode, the operation fails

I am running Soft-RoCE on "Red Hat Enterprise Linux Server release 7.4 (Maipo)".

When my RDMA write has length=1024, it works, but with length=1025 in the same code it fails.

The same code works with either length (1024 or 1025) on a Mellanox CX4 card.

Can anyone give me a suggestion? Or is this a bug in Soft-RoCE?

 

Thank you!

 

 

My code is like this:

        ctx->send_flags = 0;    // unsignaled; previously IBV_SEND_SIGNALED

        // Single scatter/gather entry describing the local source buffer
        struct ibv_sge wr_list = {
                .addr   = (uintptr_t) ctx->mr_rd->addr,
                .length = 1024,   // with 1025 the write fails; with 1024 it succeeds
                .lkey   = ctx->mr_rd->lkey
        };

        // RDMA write work request targeting the remote buffer
        struct ibv_send_wr wr_wr;
        bzero(&wr_wr, sizeof(wr_wr));
        wr_wr.wr_id      = PINGPONG_WR_WRID;
        wr_wr.sg_list    = &wr_list;
        wr_wr.num_sge    = 1;
        wr_wr.opcode     = IBV_WR_RDMA_WRITE;
        wr_wr.send_flags = ctx->send_flags;
        wr_wr.next       = NULL;
        wr_wr.wr.rdma.rkey        = rkey;
        wr_wr.wr.rdma.remote_addr = remote_addr;
        //wr_wr.wr.rdma.rkey        = ctx->mr_wr->rkey;
        //wr_wr.wr.rdma.remote_addr = ctx->mr_wr->addr;

        struct ibv_send_wr *bad_wr_wr;
        int ret = ibv_post_send(ctx->qp, &wr_wr, &bad_wr_wr);
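
One thing that separates 1024 from 1025 here: on a connected QP, a message larger than the path MTU is segmented into several packets. With a 1024-byte path MTU, a 1024-byte write fits in one packet, while 1025 bytes needs two (a first and a last packet). That does not explain the failure by itself, but it suggests the multi-packet path is the place to look. Below is a minimal sketch of the arithmetic in C, assuming the path MTU is 1024 as in the thread title. The name packets_for_message() is a hypothetical helper, not a libibverbs function.

```c
#include <assert.h>
#include <stddef.h>

/* Number of packets needed to carry a message of `length` payload bytes
 * when each packet carries at most `path_mtu` payload bytes.
 * Hypothetical helper, not part of libibverbs. */
size_t packets_for_message(size_t length, size_t path_mtu)
{
    assert(path_mtu > 0);

    /* A zero-length message still takes one packet on the wire. */
    if (length == 0)
        return 1;

    /* Ceiling division: round up to cover a partial last packet. */
    return (length + path_mtu - 1) / path_mtu;
}
```

Here packets_for_message(1024, 1024) is 1 and packets_for_message(1025, 1024) is 2, so the failing case is the first length that crosses the single-packet boundary.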

Re: "Priority trust-mode is not supported on your system"?

Hi, Marc.

 

I installed MLNX_OFED_LINUX-4.1-1.0.2.0 on my server and used the provided tool "mlnx_qos" to set the trust mode for a ConnectX-3 Pro.

The command is "mlnx_qos -i p4p1 --trust=dscp".

Then the result is "Priority trust mode is not supported on your system".

 

Thanks

error packets

Hello, everybody!

I have errors on the physical interfaces between Mellanox switches that are connected by MLAGs.

The switches are connected by Mellanox active cables (XLPPI). The errors appear once every few days, with a count of about 1000.

You can see the interface statistics in the attached file.

What could be the reason for these errors?

Could it be a problem with the queues?

 

Re: Assign a MAC to a VLAN

Hi,

What is the idea? Why do you need it that way?
