Packet transmit path on the physical host

vhost_worker-->handle_tx_kick-->handle_tx(sock->ops->sendmsg)-->tun_sendmsg-->tun_get_user-->netif_rx_ni-->do_softirq-->call_softirq-->__do_softirq-->net_rx_action-->process_backlog-->__netif_receive_skb-->__netif_receive_skb_core-->netdev_frame_hook-->netdev_port_receive-->ovs_vport_receive-->ovs_dp_process_packet-->ovs_execute_actions-->do_execute_actions-->do_output-->ovs_vport_send-->vxlan_tnl_send-->vxlan_xmit_skb-->udp_tunnel_xmit_skb-->iptunnel_xmit-->ip_local_out_sk-->__ip_local_out-->__ip_local_out_sk-->dst_output_sk-->ip_output-->ip_finish_output-->ip_finish_output2-->neigh_hh_output-->dev_queue_xmit-->__dev_xmit_skb-->sch_direct_xmit-->dev_hard_start_xmit-->xmit_one-->netdev_start_xmit-->__netdev_start_xmit-->bond_start_xmit-->__bond_start_xmit-->bond_3ad_xor_xmit-->bond_dev_queue_xmit-->dev_queue_xmit-->__dev_xmit_skb-->sch_direct_xmit-->dev_hard_start_xmit-->xmit_one-->netdev_start_xmit-->__netdev_start_xmit-->ixgbe_xmit_frame

The above is the complete call path in the host kernel. To locate a bottleneck, timestamps can be recorded at the following points:

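  • netif_rx_ni: first record point, the first function called after vhost obtains the skb.
  • ip_local_out_sk: second record point. The time since the first point covers the path from vhost through the OVS flow lookup to the end of VXLAN encapsulation, when the packet is ready to be sent.
  • dev_queue_xmit (first call): third record point, reached after the packet has passed through netfilter, routing, and the layer-2 encapsulation done by the neighbour subsystem.
  • dev_queue_xmit (second call): fourth record point. The time since the third point is the time the packet spends in bond.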
[365665.609813] netif_rx_ni:        skb ffff881f7b117700 len 65212
[365665.609818] ip_local_out_sk:    skb ffff881f7b117700 len 65262
[365665.609825] dev_queue_xmit:        skb ffff881f7b117700 len 65276
[365665.609831] dev_queue_xmit:        skb ffff881f7b117700 len 65276

[365665.610047] netif_rx_ni:        skb ffff881fcf9ad000 len 65212
[365665.610051] ip_local_out_sk:    skb ffff881fcf9ad000 len 65262
[365665.610058] dev_queue_xmit:        skb ffff881fcf9ad000 len 65276
[365665.610062] dev_queue_xmit:        skb ffff881fcf9ad000 len 65276

Packet length 65212 bytes, inter-packet gap 0.234 ms, path latency 0.018 ms.
(65212 - 40) * 8 / (365665.610047 - 365665.609813) = 2228102564.102564 bps = 2.07 Gbps (taking 1 Gbps as 2^30 bps)
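
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the same arithmetic: the function name `estimate_bps` is made up for illustration, and the 40 bytes subtracted are assumed to be IP and TCP headers (20 + 20), which the original calculation does not spell out. `Decimal` is used so that subtracting two large, nearly equal timestamps does not lose precision.

```python
from decimal import Decimal

def estimate_bps(skb_len, t_first, t_next, header_bytes=40):
    """Payload bits per second implied by two consecutive netif_rx_ni timestamps.

    header_bytes=40 mirrors the "- 40" in the calculation above (assumed
    to be IP + TCP headers). Timestamps are passed as strings, in seconds.
    """
    gap = Decimal(t_next) - Decimal(t_first)      # inter-packet gap in seconds
    payload_bits = (skb_len - header_bytes) * 8
    return payload_bits / gap

bps = estimate_bps(65212, "365665.609813", "365665.610047")
gbps = bps / 2**30                                # binary prefix, as in the figures above
print(f"{bps:.0f} bps, {gbps:.3f} Gbps")
```

Note that the result is an upper bound derived from a single gap; it says nothing about gaps between later packets.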

[367138.736316] netif_rx_ni:        skb ffff881f85e1e800 len 1500
[367138.736319] ip_local_out_sk:    skb ffff881f85e1e800 len 1550
[367138.736320] dev_queue_xmit:        skb ffff881f85e1e800 len 1564
[367138.736321] dev_queue_xmit:        skb ffff881f85e1e800 len 1564

[367138.736325] netif_rx_ni:        skb ffff881f85e1ee00 len 1500
[367138.736327] ip_local_out_sk:    skb ffff881f85e1ee00 len 1550
[367138.736330] dev_queue_xmit:        skb ffff881f85e1ee00 len 1564
[367138.736332] dev_queue_xmit:        skb ffff881f85e1ee00 len 1564

Packet length 1500 bytes, inter-packet gap 0.009 ms, path latency 0.005 ms.
(1500 - 40) * 8 / (367138.736325 - 367138.736316) = 1297777777.777778 bps = 1.2 Gbps

[433171.804667] netif_rx_ni:        skb ffff881f896e2e00 len 8868
[433171.804673] ip_local_out_sk:    skb ffff881f896e2e00 len 8918
[433171.804676] dev_queue_xmit:        skb ffff881f896e2e00 len 8932
[433171.804678] dev_queue_xmit:        skb ffff881f896e2e00 len 8932

[433171.804765] netif_rx_ni:        skb ffff881f88948200 len 8868
[433171.804768] ip_local_out_sk:    skb ffff881f88948200 len 8918
[433171.804770] dev_queue_xmit:        skb ffff881f88948200 len 8932
[433171.804772] dev_queue_xmit:        skb ffff881f88948200 len 8932

[433171.805772] netif_rx_ni:        skb ffff881f852a2f00 len 8868
[433171.805775] ip_local_out_sk:    skb ffff881f852a2f00 len 8918
[433171.805778] dev_queue_xmit:        skb ffff881f852a2f00 len 8932
[433171.805780] dev_queue_xmit:        skb ffff881f852a2f00 len 8932

Packet length 8868 bytes, inter-packet gap 0.098 ms, path latency 0.011 ms.
(8868 - 40) * 8 / (433171.804765 - 433171.804667) = 720653061.2244898 bps = 687 Mbps (measured: 120 Mbps)
(8868 - 40) * 2 * 8 / (433171.805772 - 433171.804667) = 127826244.3438914 bps = 122 Mbps

[433389.227051] netif_rx_ni:        skb ffff881fcee54500 len 128
[433389.227053] ip_local_out_sk:    skb ffff881fcee54500 len 178
[433389.227055] dev_queue_xmit:        skb ffff881fcee54500 len 192
[433389.227056] dev_queue_xmit:        skb ffff881fcee54500 len 192

[433389.227059] netif_rx_ni:        skb ffff881fcee54400 len 128
[433389.227061] ip_local_out_sk:    skb ffff881fcee54400 len 178
[433389.227063] dev_queue_xmit:        skb ffff881fcee54400 len 192
[433389.227064] dev_queue_xmit:        skb ffff881fcee54400 len 192

[433389.227067] netif_rx_ni:        skb ffff881fcee55a00 len 128
[433389.227069] ip_local_out_sk:    skb ffff881fcee55a00 len 178
[433389.227071] dev_queue_xmit:        skb ffff881fcee55a00 len 192
[433389.227072] dev_queue_xmit:        skb ffff881fcee55a00 len 192

[433389.227076] netif_rx_ni:        skb ffff881fcee54700 len 128
[433389.227078] ip_local_out_sk:    skb ffff881fcee54700 len 178
[433389.227079] dev_queue_xmit:        skb ffff881fcee54700 len 192
[433389.227081] dev_queue_xmit:        skb ffff881fcee54700 len 192

[433389.227084] netif_rx_ni:        skb ffff881fcee54300 len 128
[433389.227085] ip_local_out_sk:    skb ffff881fcee54300 len 178
[433389.227087] dev_queue_xmit:        skb ffff881fcee54300 len 192
[433389.227088] dev_queue_xmit:        skb ffff881fcee54300 len 192

[433389.227091] netif_rx_ni:        skb ffff881fcee54a00 len 128
[433389.227093] ip_local_out_sk:    skb ffff881fcee54a00 len 178
[433389.227095] dev_queue_xmit:        skb ffff881fcee54a00 len 192
[433389.227096] dev_queue_xmit:        skb ffff881fcee54a00 len 192

[433389.227098] netif_rx_ni:        skb ffff881fcee54c00 len 128
[433389.227100] ip_local_out_sk:    skb ffff881fcee54c00 len 178
[433389.227102] dev_queue_xmit:        skb ffff881fcee54c00 len 192
[433389.227103] dev_queue_xmit:        skb ffff881fcee54c00 len 192

[433389.227106] netif_rx_ni:        skb ffff881fd191c300 len 128
[433389.227108] ip_local_out_sk:    skb ffff881fd191c300 len 178
[433389.227110] dev_queue_xmit:        skb ffff881fd191c300 len 192
[433389.227111] dev_queue_xmit:        skb ffff881fd191c300 len 192

[433389.227114] netif_rx_ni:        skb ffff881fd191dd00 len 128
[433389.227116] ip_local_out_sk:    skb ffff881fd191dd00 len 178
[433389.227118] dev_queue_xmit:        skb ffff881fd191dd00 len 192
[433389.227119] dev_queue_xmit:        skb ffff881fd191dd00 len 192

[433389.227121] netif_rx_ni:        skb ffff881fd191d000 len 128
[433389.227124] ip_local_out_sk:    skb ffff881fd191d000 len 178
[433389.227127] dev_queue_xmit:        skb ffff881fd191d000 len 192
[433389.227129] dev_queue_xmit:        skb ffff881fd191d000 len 192

[433389.227147] netif_rx_ni:        skb ffff881fcedbc200 len 128
[433389.227149] ip_local_out_sk:    skb ffff881fcedbc200 len 178
[433389.227151] dev_queue_xmit:        skb ffff881fcedbc200 len 192
[433389.227152] dev_queue_xmit:        skb ffff881fcedbc200 len 192

Second round

[433389.227259] netif_rx_ni:        skb ffff881fcedbc500 len 128
[433389.227261] ip_local_out_sk:    skb ffff881fcedbc500 len 178
[433389.227263] dev_queue_xmit:        skb ffff881fcedbc500 len 192
[433389.227264] dev_queue_xmit:        skb ffff881fcedbc500 len 192

Third round

[433389.227329] netif_rx_ni:        skb ffff881fcedbc700 len 128
[433389.227331] ip_local_out_sk:    skb ffff881fcedbc700 len 178
[433389.227332] dev_queue_xmit:        skb ffff881fcedbc700 len 192
[433389.227334] dev_queue_xmit:        skb ffff881fcedbc700 len 192

Fourth round

[433389.227448] netif_rx_ni:        skb ffff881fcedbd400 len 128
[433389.227450] ip_local_out_sk:    skb ffff881fcedbd400 len 178
[433389.227452] dev_queue_xmit:        skb ffff881fcedbd400 len 192
[433389.227454] dev_queue_xmit:        skb ffff881fcedbd400 len 192

Packet length 128 bytes, inter-packet gap 0.008 ms, path latency 0.005 ms.
(128 - 40) * 8 / (433389.227059 - 433389.227051) = 88000000 bps = 84 Mbps (measured: slightly above 51 Mbps)

First round
(128 - 40) * 10 * 8 / (433389.227147 - 433389.227051) = 73333333.33333333 bps = 70 Mbps

Fourth round
(128 - 40) * 10 * 8 / (433389.227448 - 433389.227329) = 59159663.86554622 bps = 56 Mbps
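
The three 128-byte estimates differ only in how many packets and how long a window they average over. A small sketch, assuming the same 40-byte header deduction and the 2^20 divisor that the Mbps figures above imply (the helper name `mbps` is hypothetical), makes the comparison explicit:

```python
from decimal import Decimal

MBIT = 2**20  # the Mbps figures above divide by 2**20, not 10**6

def mbps(payload_bytes, packets, t_start, t_end):
    """Average payload throughput for `packets` packets sent in [t_start, t_end]."""
    seconds = Decimal(t_end) - Decimal(t_start)
    return payload_bytes * packets * 8 / seconds / MBIT

payload = 128 - 40  # 40 bytes of headers, as assumed in the formulas above

single = mbps(payload, 1, "433389.227051", "433389.227059")    # one gap only
round1 = mbps(payload, 10, "433389.227051", "433389.227147")   # first round
round4 = mbps(payload, 10, "433389.227329", "433389.227448")   # fourth round

for name, value in (("single gap", single), ("round 1", round1), ("round 4", round4)):
    print(f"{name}: {value:.1f} Mbps")
```

The single-gap figure is the most optimistic because it ignores the pause at the end of each round; the longer the window, the closer the estimate gets to the measured rate.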

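The data above shows the following.
For large 64 KB packets the path latency is longer (0.018 ms), but the inter-packet gap is much longer still (0.234 ms). That time is spent mainly in the guest sending the packet, in vhost, and in the physical NIC transmitting it.
Of the path latency, OVS and VXLAN encapsulation take 0.005 ms; netfilter, fragmentation, routing, and the neighbour subsystem take 0.007 ms; passing through bond takes 0.006 ms.
A single sample is not reliable, though. The two samples for these three stages (in ms) are:

0.005 0.007 0.006
0.004 0.007 0.004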
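
The per-stage figures are simply the differences between consecutive record points of the same skb. A minimal sketch of extracting them from the log follows; the regular expression and the function name `stage_latencies_us` are my own, written against the line format shown above. It groups lines by skb pointer, which assumes a pointer is not reused within the excerpt being parsed.

```python
import re
from decimal import Decimal

# Matches lines such as:
# [365665.609813] netif_rx_ni:        skb ffff881f7b117700 len 65212
LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(\w+):\s+skb\s+([0-9a-f]+)\s+len\s+(\d+)")

def stage_latencies_us(log_text):
    """Map each skb pointer to the deltas (in microseconds) between its record points."""
    stamps = {}
    for match in LINE.finditer(log_text):
        timestamp, _func, skb, _length = match.groups()
        stamps.setdefault(skb, []).append(Decimal(timestamp))
    return {
        skb: [int((later - earlier) * 1_000_000) for earlier, later in zip(ts, ts[1:])]
        for skb, ts in stamps.items()
    }

sample = """
[365665.609813] netif_rx_ni:        skb ffff881f7b117700 len 65212
[365665.609818] ip_local_out_sk:    skb ffff881f7b117700 len 65262
[365665.609825] dev_queue_xmit:        skb ffff881f7b117700 len 65276
[365665.609831] dev_queue_xmit:        skb ffff881f7b117700 len 65276
[365665.610047] netif_rx_ni:        skb ffff881fcf9ad000 len 65212
[365665.610051] ip_local_out_sk:    skb ffff881fcf9ad000 len 65262
[365665.610058] dev_queue_xmit:        skb ffff881fcf9ad000 len 65276
[365665.610062] dev_queue_xmit:        skb ffff881fcf9ad000 len 65276
"""

result = stage_latencies_us(sample)
for skb, deltas in result.items():
    print(skb, deltas)  # [OVS + VXLAN, netfilter/route/neighbour, bond]
```

For a long capture it would be safer to start a new group each time a `netif_rx_ni` line appears, since the kernel can hand out the same skb address again.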

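For 1500-byte packets the path latency is shorter (0.005 ms), and the inter-packet gap is also reasonable (0.009 ms).
OVS and VXLAN encapsulation take 0.003 ms; netfilter, fragmentation, routing, and the neighbour subsystem take 0.001 ms; passing through bond takes 0.001 ms.
Compared with the 64 KB case, the large packets spend somewhat more time in OVS and VXLAN, and more time in the netfilter stage because there are more fragments. Why they also spend more time in bond is still unknown.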

0.003 0.001 0.001
0.002 0.003 0.002

These figures vary considerably between samples and are also affected by receive traffic.

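For 8 KB packets the path latency is 0.011 ms; the inter-packet gap is 0.098 ms when short and 1.007 ms when long.
OVS and VXLAN encapsulation take 0.006 ms; netfilter, fragmentation, routing, and the neighbour subsystem take 0.003 ms; passing through bond takes 0.002 ms.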

0.006 0.003 0.002
0.003 0.002 0.002
0.003 0.003 0.002

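The inter-packet gap also depends on receive traffic: with no packets being received the gap is 0.098 ms, and while packets are being received it is 1.007 ms.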
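
The two gap values come straight from the `netif_rx_ni` timestamps of the three 8868-byte packets in the log. A short sketch of that subtraction (the names `rx_stamps` and `gaps_ms` are illustrative only):

```python
from decimal import Decimal

# netif_rx_ni timestamps (seconds) of the three 8868-byte packets logged above
rx_stamps = ["433171.804667", "433171.804765", "433171.805772"]

def gaps_ms(stamps):
    """Gaps between consecutive timestamps, in milliseconds."""
    ts = [Decimal(s) for s in stamps]
    return [(later - earlier) * 1000 for earlier, later in zip(ts, ts[1:])]

gaps = gaps_ms(rx_stamps)
slowdown = gaps[1] / gaps[0]   # how much longer the gap is while receiving
print([f"{g:.3f} ms" for g in gaps], f"ratio {slowdown:.1f}x")
```

With only two gaps this is an illustration, not a statistic; a longer capture would be needed to see how the gaps are actually distributed.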

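For 128-byte packets the path latency is 0.005 ms; the inter-packet gap is 0.008 ms when short and 0.026 ms when long.
The short and long gaps differ for the same reason as above.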

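So far no single stage stands out as offering a significant optimization gain. Because packets are processed serially, throughput is bounded by vhost and the chain of processing that follows it.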

Last modified: August 18, 2021