ibping is not really sensitive to latency; it is basically used to test basic in-band connectivity. That's not to say you can't get better results, but for benchmarking I would test with ib_write_lat and bind it to specific CPUs/NUMA nodes.
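As a rough sketch of the binding suggestion, the snippet below builds a numactl-wrapped ib_write_lat command line pinned to the NUMA node the HCA sits on. The helper names (numa_node_of, build_lat_cmd), the device name mlx5_0 and the server address are placeholders I made up for illustration, and it assumes numactl and the perftest tools are installed; adjust to your setup.

```python
from pathlib import Path


def numa_node_of(device, sysfs_root="/sys/class/infiniband"):
    """Return the NUMA node the HCA reports in sysfs, or 0 if it is unknown.

    Hypothetical helper: falls back to node 0 when the file is missing
    or the kernel reports -1 (no NUMA information).
    """
    path = Path(sysfs_root) / device / "device" / "numa_node"
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return 0
    return node if node >= 0 else 0


def build_lat_cmd(device, numa_node, server=None):
    """Build the argv for ib_write_lat bound to one NUMA node.

    Without `server` this is the server-side command; with `server`
    it is the client-side command that connects to that host.
    """
    cmd = [
        "numactl",
        f"--cpunodebind={numa_node}",  # run only on CPUs of this node
        f"--membind={numa_node}",      # allocate memory only from this node
        "ib_write_lat",
        "-d", device,                  # IB device to use
    ]
    if server is not None:
        cmd.append(server)
    return cmd
```

Run the server-side command on one host first, then the client-side command (with the server's address) on the other, for example via subprocess.run(build_lat_cmd("mlx5_0", numa_node_of("mlx5_0"), server="192.0.2.10")).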
Starting from there, you can go over the performance tuning guide and see if you've missed something along the way.