IB Verbs

  • Header file: infiniband/verbs.h
  • API prefix: ibv_
  • Package: libibverbs-dev

Example

See RDMA_RC_example.c for the code.

Setup phase:

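  • resource_create(): creates the resources, including the PD, MR, QP, and CQ.
  • connect_qp(): the two sides exchange information (LID, QP_NUM, RKEY, and so on), then move the QP through the INIT, RTR, and RTS states in turn.
    • sock_sync_data(): exchanges the information over TCP.
    • modify_qp_to_init()
    • post_receive(): pre-posts a receive WR to the receive queue; this can also be done in the communication phase.
    • modify_qp_to_rtr()
    • modify_qp_to_rts()
    • Synchronization point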
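
The information exchanged in connect_qp() can be pictured as a small fixed-layout record. The sketch below mirrors the fields named above (buffer address, RKEY, QP number, LID, GID). The field widths, the 34-byte wire layout, and the helper names con_data_pack() and con_data_unpack() are assumptions made for illustration, not taken from RDMA_RC_example.c. Multi-byte fields are written in network byte order so that hosts of different endianness can exchange them over TCP.

```c
#include <stdint.h>
#include <string.h>

#define CON_DATA_WIRE_SIZE 34 /* 8 + 4 + 4 + 2 + 16 bytes */

/* Hypothetical layout of the data each side sends over TCP. */
struct cm_con_data_t {
    uint64_t addr;    /* address of the registered buffer */
    uint32_t rkey;    /* remote key of the memory region */
    uint32_t qp_num;  /* queue pair number */
    uint16_t lid;     /* local identifier of the port */
    uint8_t  gid[16]; /* global identifier, already a byte string */
};

/* Write the low `nbytes` bytes of `v` into `out`, most significant first. */
void put_be(uint8_t *out, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        out[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Read `nbytes` big-endian bytes from `in`. */
uint64_t get_be(const uint8_t *in, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | in[i];
    return v;
}

/* Serialize the record into its wire format. */
void con_data_pack(const struct cm_con_data_t *d, uint8_t out[CON_DATA_WIRE_SIZE])
{
    put_be(out, d->addr, 8);
    put_be(out + 8, d->rkey, 4);
    put_be(out + 12, d->qp_num, 4);
    put_be(out + 16, d->lid, 2);
    memcpy(out + 18, d->gid, 16);
}

/* Rebuild the record from bytes received from the peer. */
void con_data_unpack(const uint8_t in[CON_DATA_WIRE_SIZE], struct cm_con_data_t *d)
{
    d->addr = get_be(in, 8);
    d->rkey = (uint32_t)get_be(in + 8, 4);
    d->qp_num = (uint32_t)get_be(in + 12, 4);
    d->lid = (uint16_t)get_be(in + 16, 2);
    memcpy(d->gid, in + 18, 16);
}
```

Each side would pack its own record, send the 34 bytes over the TCP socket, and unpack what it receives from the peer into its remote properties.
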
flowchart TD
 subgraph s1["resource_create()"]
  n27@{ shape: "rounded", label: "ibv_query_port()" }
  n26@{ shape: "hex", label: "ibv_port_attr" }
  n25@{ shape: "hex", label: "ibv_mr" }
  n17@{ shape: "hex", label: "char *" }
  n16@{ shape: "rounded", label: "ibv_get_device_name()" }
  n15@{ shape: "rounded", label: "ibv_get_device_list()" }
  n14@{ shape: "hex", label: "ibv_device" }
 n1@{ shape: "hex", label: "ibv_pd" }
 n2@{ shape: "rounded", label: "ibv_alloc_pd()" }
 n3@{ shape: "hex", label: "ibv_context" }
 n4@{ shape: "hex", label: "buf" }
 n3 --- n2
 n2 --- n1
 n5@{ shape: "rounded", label: "ibv_open_device()" }
 n5 --- n3
 n6@{ shape: "rounded", label: "ibv_create_cq()" }
 n3 --- n6
 n7@{ shape: "hex", label: "mr_flags" }
 n8@{ shape: "rounded", label: "ibv_reg_mr" }
 n4 --- n8
 n7@{ shape: "fr-rect", label: "mr_flags<br/>=IBV_ACCESS_REMOTE_READ|..." } --- n8
 n1 --- n8@{ shape: "rounded", label: "ibv_reg_mr()" }
 n9@{ shape: "hex", label: "ibv_qp_init_attr" }
 n10@{ shape: "hex", label: "ibv_cq" }
 n6 --- n10
 n9@{ shape: "hex", label: "ibv_qp_init_attr" }
 n10 ---|"send_cq, recv_cq"| n9
 n11@{ shape: "fr-rect", label: "qp_type<br/>=IBV_QPT_RC" }
 n11 --- n9
 n12@{ shape: "rounded", label: "ibv_create_qp()" }
 n9 --- n12
 n13@{ shape: "hex", label: "ibv_qp" }
 n12 --- n13
 end
 n15 --- n14@{ shape: "hex", label: "ibv_device **" }
 n14 --- n16
 n16 --- n17
 n14 --- n5
 subgraph s2["connect_qp()"]
  n36@{ shape: "rounded", label: "sock_sync_data()" }
  subgraph s3["cm_con_data_t"]
   n24@{ shape: "hex", label: "buf" }
   n23@{ shape: "hex", label: "lid" }
   n22@{ shape: "hex", label: "qp_num" }
   n20@{ shape: "hex", label: "rkey" }
   n21@{ shape: "hex", label: "gid" }
  end
  n18@{ shape: "hex", label: "ibv_gid" }
  n19@{ shape: "rounded", label: "ibv_query_gid()" }
 end
 n3 --- n19
 n19 --- n18
 n18 --- n21
 n8 --- n25
 n25 --- n20
 n4 --- n24
 n13 --- n22
 n27 --- n26
 n3 --- n27
 n26 --- n23
 n29 --- n30
 n13 --- n30
 n28 --- n29
 subgraph s5["post_receive()"]
  n33@{ shape: "rounded", label: "ibv_post_recv()" }
  n32@{ shape: "hex", label: "ibv_recv_wr" }
  n31@{ shape: "hex", label: "ibv_sge" }
 end
 n25 ---|"lkey"| n31
 n4 --- n31
 subgraph s4["modify_qp_to_init(), modify_qp_to_rts()"]
  n28@{ shape: "fr-rect", label: "qp_state<br/>=IBV_QPS_INIT" }
  n30@{ shape: "rounded", label: "ibv_modify_qp()" }
  n29@{ shape: "hex", label: "ibv_qp_attr" }
 end
 n31 --- n32
 n32 --- n33
 subgraph s6["modify_qp_to_rtr()"]
  n34@{ shape: "rounded", label: "ibv_modify_qp()" }
  n35@{ shape: "hex", label: "ibv_qp_attr" }
 end
 n22 --- n35
 n23 --- n35
 n35 --- n34@{ shape: "rounded", label: "ibv_modify_qp()" }
 s3 --- n36

Communication phase:

  • post_send(): builds and posts a send WR; the opcode determines the kind of operation.
  • poll_completion(): polls the CQ to obtain a WC.
flowchart TD
 subgraph s1["post_send()"]
  n12@{ shape: "rounded", label: "ibv_post_send()" }
  n7@{ shape: "fr-rect", label: ".opcode<br/>IBV_WR_SEND<br/>IBV_WR_RDMA_READ<br/>IBV_WR_RDMA_WRITE" }
  n1@{ shape: "hex", label: "ibv_sge" }
  n2@{ shape: "hex", label: "ibv_send_wr" }
 end
 n3@{ shape: "hex", label: "ibv_mr" }
 n3 ---|".lkey"| n1
 n4@{ shape: "hex", label: "buf" }
 n4 --- n1
 n1 --- n2
 subgraph s2["IBV_WR_RDMA_READ / IBV_WR_RDMA_WRITE only"]
  n5@{ shape: "hex", label: ".rkey" }
  n6@{ shape: "hex", label: ".remote_addr" }
 end
 s2 ---|".wr.rdma"| n2
 n7 --- n2
 subgraph s3["poll_completion()"]
  n11@{ shape: "rounded", label: "assert()" }
  n10@{ shape: "rounded", label: "ibv_poll_cq()" }
  n9@{ shape: "hex", label: "ibv_wc" }
 end
 n8@{ shape: "hex", label: "ibv_cq" }
 n9 --- n10
 n8 --- n10
 n10 ---|".status == IBV_WC_SUCCESS"| n11
 n2 --- n12

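The program demonstrates the following sequence:

  • resource_create(): the Server places the string "SEND operation" in its buffer res->buf.
  • connect_qp(): the two sides exchange resource information, and each stores the remote side's data in res->remote_props. The exchanged data includes the address of res->buf. The Client posts a Receive to its own receive queue so that it can accept the Server's Send.
  • post_send(): the Server posts a Send. The WR is built as follows:
    • .sg_list->addr = res->buf, the Server's buffer address.
    • .wr.rdma.remote_addr = res->remote_props.addr, the Client's buffer address. This field is only used by RDMA read and write; a Send does not use it.
  • poll_completion(): the Client receives the message and prints it: "SEND operation".
  • The Server then changes its buffer contents to "RDMA read operation".
  • post_send(): the Client posts an RDMA read and reads back "RDMA read operation". Because this is a one-sided operation, the Server is not notified.
  • The Client changes its buffer contents to "RDMA write operation".
  • post_send(): the Client posts an RDMA write, which writes into the Server's buffer.
  • The Server prints its buffer contents, which are now "RDMA write operation".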

SRQ

An SRQ (shared receive queue) is attached to a QP through srq, an optional field of struct ibv_qp_init_attr.