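The Southbound DB is mainly generated by ovn-northd, which translates the Northbound DB. ovn-controller then acts as the local controller: it translates the SbDB into actual flows and OVS operations and pushes them down to OVS.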

Topology


We introduced the topology whose databases we are inspecting earlier; here it is again:

[Figure: topology diagram]

Southbound DB


Contents

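The Southbound DB lives on the control node and holds the information that ovn-northd produces by translating the Northbound DB. It is not as readable as the Northbound DB, but it is still considerably more readable than what OVS shows: it is an intermediate state between the logical topology and the actual low-level flows. Let's look at what it contains. For details, see the schema file ovn-sb.ovsschema or ovn-sb.5.txt: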

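  • SB_Global stores global information
  • Chassis represents a hypervisor or gateway (a chassis) in the physical network

    • name comes from external_ids:system-id
    • hostname is the hostname of the physical host
    • nb_cfg is copied from SB_Global above
    • external_ids: ovn-bridge-mappings maps physical networks to bridges, e.g. physnet1:br-eth0
    • external_ids: datapath-type is the datapath type; the default is system, but it must be set to netdev when using DPDK
    • external_ids: iface-types lists the interface types supported by the chassis
    • encaps lists the configured tunnel encapsulations
  • Encap describes a tunnel encapsulation

    • type is the encapsulation type
    • options can carry additional settings
    • ip is the source IP used for the tunnel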
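  • Address_Set has the same content as the table of the same name in the NbDB
  • Logical_Flow holds the logical flows

    • logical_datapath: the Datapath_Binding the logical flow belongs to
    • pipeline: ingress or egress; a packet normally goes through ingress and then egress
    • table_id: 0-15, similar to an OpenFlow table id, but with fewer tables
    • priority: 0-65535, like OpenFlow priorities; a higher value takes precedence
    • match: the match expression, considerably easier to read than OpenFlow matches
    • actions: likewise easier to read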
  • Multicast_Group

    • datapath: Datapath_Binding
    • tunnel_key
    • name
    • ports
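  • Datapath_Binding: each row is a logical datapath, which implements the logical pipeline for the Port_Binding ports that belong to it

    • tunnel_key: 1-16777215, unique to each logical datapath
    • external_ids: logical-switch: set when the logical datapath represents a logical switch (UUID of the NbDB Logical_Switch)
    • external_ids: logical-router: set when the logical datapath represents a logical router (UUID of the NbDB Logical_Router)
    • external_ids: name: copied by ovn-northd from the name of the Logical_Switch or Logical_Router in the NbDB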
  • Port_Binding

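    • datapath: the logical datapath the logical port belongs to
    • logical_port: taken from Logical_Switch_Port in the NbDB (router ports come from Logical_Router_Port)
    • chassis: the chassis the port is bound to; what it refers to depends on the port type

      • default: an ordinary logical port
      • vtep: the hardware_vtep gateway
      • localnet
      • l3gateway
      • l2gateway
    • tunnel_key: 1-32767, the key identifying the logical port
    • mac: the MAC address(es) of the logical port
    • type: the port type

      • default: VM port
      • patch
      • l3gateway: one of a pair of ports that act like a patch connection spanning multiple chassis
      • localnet
      • l2gateway
      • vtep
      • chassisredirect: redirects traffic to another chassis (an instance of a distributed port bound to one specific chassis)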
    • patch options

      • peer
    • L3 gateway options

      • peer
      • l3gateway-chassis
      • nat-addresses
    • localnet options

      • network_name
    • L2 gateway options

      • network_name
      • l2gateway-chassis
    • VTEP options

      • vtep-physical-switch
      • vtep-logical-switch
    • VMI (VIF) options

      • qos_max_rate
      • qos_burst
      • qdisc_queue_id
    • Chassis Redirect options

      • distributed-port
      • redirect-chassis
    • Nested Containers

      • parent_port
      • tag
  • MAC_Binding: the MAC learned for a given logical_port and IP

    • logical_port, ip, mac, datapath
  • DHCP_Options: the supported DHCP options, each with name, code and type
  • DHCPv6_Options
  • Connection
  • SSL
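The tunnel_key ranges above are not arbitrary. According to the OVN architecture documentation (ovn-architecture(7)), with Geneve the 24-bit VNI carries the Datapath_Binding tunnel_key, and a 32-bit Geneve option carries one reserved bit, a 15-bit logical ingress port (a Port_Binding tunnel_key) and a 16-bit logical egress port, whose upper half 32768-65535 is used by Multicast_Group keys. The Python sketch below only illustrates that bit layout; the function names are hypothetical and this is not OVN code.

```python
# Hypothetical helpers illustrating how Southbound tunnel keys fit into
# Geneve metadata, as described in ovn-architecture(7). Not OVN source code.

DATAPATH_KEY_MAX = (1 << 24) - 1                      # 16777215: 24-bit VNI
PORT_KEY_MAX = (1 << 15) - 1                          # 32767: 15-bit ingress field
MCAST_KEY_MIN, MCAST_KEY_MAX = 1 << 15, (1 << 16) - 1  # 32768..65535


def pack_metadata(dp_key: int, in_port: int, out_port: int) -> tuple:
    """Return (vni, option) for a logical packet crossing a tunnel."""
    if not 1 <= dp_key <= DATAPATH_KEY_MAX:
        raise ValueError(f"datapath tunnel_key {dp_key} out of range")
    if not 1 <= in_port <= PORT_KEY_MAX:
        raise ValueError(f"ingress key {in_port} must be a Port_Binding key")
    if not 1 <= out_port <= MCAST_KEY_MAX:
        raise ValueError(f"egress key {out_port} out of range")
    # Bit 31 reserved (0), bits 30..16 ingress port, bits 15..0 egress port.
    option = (in_port << 16) | out_port
    return dp_key, option


def unpack_metadata(vni: int, option: int) -> tuple:
    """Inverse of pack_metadata: (datapath key, ingress key, egress key)."""
    return vni, (option >> 16) & PORT_KEY_MAX, option & 0xFFFF


def is_multicast_key(key: int) -> bool:
    """Egress keys in the upper half of the 16-bit field are multicast groups."""
    return MCAST_KEY_MIN <= key <= MCAST_KEY_MAX


if __name__ == "__main__":
    vni, opt = pack_metadata(1, 3, 32768)  # port 3 sending to a multicast group
    print(vni, hex(opt), unpack_metadata(vni, opt), is_multicast_key(32768))
    # prints: 1 0x38000 (1, 3, 32768) True
```

Splitting the option 15/16 bits explains why Port_Binding keys stop at 32767 while egress values can go up to 65535.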

Chassis information

ovn-sbctl show displays the chassis information; the meaning of each field was explained above.

//The chassis name is the OVS system-id, which can be checked with ovs-vsctl get open . external-ids:system-id
//ovn-controller on each hypervisor uses this id to determine which chassis it is
Chassis "0d8eee77-63c1-4250-a9f0-c242360525d5"
    //hostname of the host
    hostname: "compute1"
    //encapsulation type
    Encap geneve
        //local IP used for encapsulation
        ip: "1.1.1.201"
        //options
        options: {csum="true"}
    //ports bound to this chassis
    Port_Binding "ls1-vm1"
    Port_Binding "ls2-vm3"
Chassis "e748fbdb-60bc-4707-87e6-66823a343990"
    hostname: "compute2"
    Encap geneve
        ip: "1.1.1.202"
        options: {csum="true"}
    Port_Binding "ls1-vm2"
    Port_Binding "ls2-vm4"
Chassis "5fa32c09-4b0c-4f35-bdfc-9a6675bf1b95"
    hostname: "network"
    Encap geneve
        ip: "1.1.1.200"
        options: {csum="true"}
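With many hypervisors it helps to turn this output into a port-to-host map. Below is a minimal Python sketch that parses text shaped like the ovn-sbctl show output above (the // annotations are not part of the real output and are skipped). The function names are hypothetical, and the parser assumes exactly the layout shown; other OVN versions may print slightly different fields.

```python
import re

# "Chassis <name>", with or without quotes around the name.
CHASSIS_RE = re.compile(r'^Chassis\s+"?([^"]+?)"?$')


def _value(line: str) -> str:
    """Return the text after the first ':' with surrounding quotes removed."""
    return line.split(":", 1)[1].strip().strip('"')


def parse_sbctl_show(text: str) -> dict:
    """Map chassis name -> {"hostname", "encaps", "ports"}."""
    chassis = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = CHASSIS_RE.match(line)
        if m:
            current = {"hostname": None, "encaps": [], "ports": []}
            chassis[m.group(1)] = current
        elif current is None or not line or line.startswith("//"):
            continue  # text before the first chassis, blanks, annotations
        elif line.startswith("hostname:"):
            current["hostname"] = _value(line)
        elif line.startswith("Encap "):
            current["encaps"].append({"type": line.split()[1]})
        elif line.startswith("ip:") and current["encaps"]:
            current["encaps"][-1]["ip"] = _value(line)
        elif line.startswith("Port_Binding"):
            current["ports"].append(line.split(None, 1)[1].strip('"'))
    return chassis


def port_locations(chassis: dict) -> dict:
    """Map logical port -> hostname of the chassis it is bound to."""
    return {port: info["hostname"]
            for info in chassis.values() for port in info["ports"]}
```

On a live system the input could come from subprocess.run(["ovn-sbctl", "show"], capture_output=True, text=True).stdout.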

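Logical flows

ovn-sbctl dump-flows prints the logical flows. The full output is shown further below, so here we only walk through the pipelines of the logical flows. For details, see the ovn-northd documentation (ovn-northd(8)).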
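Every line of ovn-sbctl dump-flows output has the same shape: table number, stage name, priority, match and action. The Python sketch below splits one such line into fields, which makes it easy to filter or sort flows by table and priority. The regular expression is an assumption based on the output format shown in this article, and the LogicalFlow type is hypothetical.

```python
import re
from typing import NamedTuple, Optional

FLOW_RE = re.compile(
    r"table=\s*(?P<table>\d+)\s*\((?P<stage>[^)\s]+)\s*\),\s*"
    r"priority=(?P<priority>\d+)\s*,\s*"
    r"match=\((?P<match>.*)\),\s*action=\((?P<action>.*)\)\s*$"
)


class LogicalFlow(NamedTuple):
    table: int
    stage: str
    priority: int
    match: str
    action: str


def parse_flow(line: str) -> Optional[LogicalFlow]:
    """Parse one dump-flows line; return None for headers and comments."""
    m = FLOW_RE.search(line)
    if m is None:
        return None
    return LogicalFlow(int(m["table"]), m["stage"], int(m["priority"]),
                       m["match"].strip(), m["action"].strip())
```

For example, sorted(filter(None, map(parse_flow, lines)), key=lambda f: (f.table, -f.priority)) lists the flows of one pipeline in the order they are evaluated within each table.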

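LS ingress pipeline

  • table=0 ls_in_port_sec_l2: L2 port security

    • Priority 100 drops packets that carry a VLAN tag or have a multicast source MAC
    • Priority 50: if port security is enabled, packets matching the inport and an allowed eth.src go to the next table; if it is not enabled, all packets from the port go to the next table
    • There are no other flows; anything unmatched is dropped by default
  • table=1 ls_in_port_sec_ip: IP port security

    • Priority 90 allows packets matching the inport, an allowed eth.src and an allowed ip4.src
    • Priority 90 allows IPv4 DHCP discover traffic from an allowed eth.src, because the source IP is 0.0.0.0 before an address has been assigned
    • IPv6 has the same two flows at the same priority; from here on, IPv6 flows have the same priority as their IPv4 counterparts unless stated otherwise
    • Priority 80 drops the remaining IPv4 and IPv6 traffic matching the inport and an allowed eth.src
    • Priority 0 matches every packet and passes it to the next table; every table below has this flow, so it is not repeated
  • table=2 ls_in_port_sec_nd: neighbor discovery port security

    • Priority 90 allows packets matching the inport, an allowed eth.src and arp.sha
    • Same for IPv6
    • Priority 80 drops any other ARP or IPv6 ND packets from that inport
  • table=3 ls_in_pre_acl: from-lport pre-ACLs

    • If ACLs are configured, a priority-100 flow executes reg0[0] = 1; next; to prepare the packet for conntrack
  • table=4 ls_in_pre_lb

    • If a load balancer is configured, priority 100 adds a match=(ip && ip4.dst == VIP) for each VIP with action=(reg0[0] = 1; next;)
  • table=5 ls_in_pre_stateful

    • Priority 100 sends packets with reg0[0] == 1 to conntrack with action=(ct_next;)
  • table=6 ls_in_acl: from-lport ACLs

    • Priority 65535 lets reply-direction packets through to the next table, i.e. action=(next;); allow ACLs are likewise translated into next;
    • Priority 65535 lets already committed traffic through to the next table
    • Priority 65535 drops all ct.inv packets
    • Priority 65535 drops all packets with ct_label.blocked == 1
    • Priority 1 gives packets that are not established action=(reg0[1] = 1; next;); this comes from allow-related ACLs
  • table=7 ls_in_qos: from-lport QoS marking

    • Flows for the configured QoS rules
  • table=8 ls_in_lb: load balancing

    • Priority 100 match=(ct.est && !ct.rel && !ct.new && !ct.inv) executes action=(reg0[2] = 1; next;), meaning the packet is sent through conntrack to the NAT module
  • table=9 ls_in_stateful

    • Priority 120, for protocol P, IP address VIP and port PORT: match=(ct.new && ip && ip4.dst == VIP && P && P.dst == PORT), action=(ct_lb(args)), which load-balances to one of the backend IPs in args
    • Priority 110 matches the IP address only: match=(ct.new && ip && ip4.dst == VIP), action=(ct_lb(args))
    • Priority 100 match=(reg0[1] == 1), action=(ct_commit(ct_label=0/1); next;)
    • Priority 100 match=(reg0[2] == 1), action=(ct_lb;)
  • table=10 ls_in_arp_rsp: ARP responder

    • Priority 100 passes ARP requests arriving from localnet and vtep ports straight to the next table
    • Priority 100: compared with the flow below, this one also matches the inport. An ARP request that a port sends for its own IP is passed to the next table instead of being answered, so that IP conflicts within the subnet can be detected
    • Priority 50 matches ARP requests for each IP of a logical port, rewrites the packet into a reply (swapping the source and target IPs and MACs in the ARP fields), sets the loopback flag and sends it back; there is one such flow per logical port IP
    • Same for IPv6
  • table=11 ls_in_dhcp_options: DHCP options processing

    • Priority 100 identifies a valid DHCP request by source MAC, source IP, inport, UDP source port 68 and destination port 67; put_dhcp_opts fills in the offered IP and the other options and stores its result in reg0[3], and the packet moves to the next table; there can be several such flows
    • Same for IPv6
  • table=12 ls_in_dhcp_response

    • Priority 100 turns a DHCP request matched above into a reply, much like the ARP reply, sets the loopback flag and sends it; again there are several flows
    • Same for DHCPv6
  • table=13 ls_in_l2_lkup

    • Priority 100 sends multicast and broadcast packets (eth.mcast) with outport set to _MC_flood
    • Priority 50 matches the destination MAC against each logical port's MAC and sets outport to that logical port; there is one flow per port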
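The first stages of this pipeline can be mimicked in a few lines. The Python sketch below models ls_in_port_sec_l2 (table 0) and the ARP part of ls_in_port_sec_nd (table 2) for ls1, using the MAC addresses from the flow dump later in this article. It relies on OVN's subfield numbering, where bit 0 is the least significant bit, so eth.src[40] is the lowest bit of the first octet, i.e. the Ethernet multicast (group) bit. The packet dictionaries and function names are hypothetical, and the IPv6 ND checks are left out.

```python
# Hypothetical model of ls1's L2 and ARP port security (tables 0 and 2).
PORT_MACS = {"ls1-vm1": {"52:54:00:c1:68:70"}, "ls1-vm2": {"52:54:00:c1:68:72"}}
NO_PORT_SECURITY = {"ls1-lr1"}  # router port: its priority-50 flow checks inport only


def mac_bit(mac: str, bit: int) -> int:
    """Bit `bit` of a 48-bit MAC, counting from the least significant bit."""
    return (int(mac.replace(":", ""), 16) >> bit) & 1


def port_sec_l2(pkt: dict) -> str:
    if pkt.get("vlan") or mac_bit(pkt["eth_src"], 40):
        return "drop"                    # priority 100: VLAN tag or eth.src[40]
    if pkt["inport"] in NO_PORT_SECURITY:
        return "next"                    # priority 50, no MAC check
    if pkt["eth_src"] in PORT_MACS.get(pkt["inport"], set()):
        return "next"                    # priority 50, allowed source MAC
    return "drop"                        # no flow matched: default drop


def port_sec_nd(pkt: dict) -> str:
    inport = pkt["inport"]
    if inport not in PORT_MACS or not pkt.get("arp"):
        return "next"                    # priority 0 (non-ARP traffic)
    mac = pkt["eth_src"]
    if mac in PORT_MACS[inport] and pkt.get("arp_sha") == mac:
        return "next"                    # priority 90: eth.src and arp.sha agree
    return "drop"                        # priority 80: any other ARP from the port
```

A VM that spoofs another VM's MAC is stopped in table 0, while one that keeps its own eth.src but lies in arp.sha is stopped in table 2.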

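LS egress pipeline

  • table=0 ls_out_pre_lb: similar to ls_in_pre_lb above
  • table=1 ls_out_pre_acl: similar to ls_in_pre_acl above, but for to-lport ACLs
  • table=2 ls_out_pre_stateful: similar to ls_in_pre_stateful above
  • table=3 ls_out_lb: similar to ls_in_lb above
  • table=4 ls_out_acl: similar to ls_in_acl above, also for to-lport ACLs

    • Priority 34000 allows the replies generated by ls_in_dhcp_response through
  • table=5 ls_out_qos: similar to ls_in_qos above, also to-lport
  • table=6 ls_out_stateful: similar to ls_in_stateful above, but without the per-VIP load-balancing flows (priorities 120 and 110)
  • table=7 ls_out_port_sec_ip: similar to ls_in_port_sec_ip above, except that inport, eth.src, ip4.src and ip6.src are replaced by outport and the corresponding destination addresses
  • table=8 ls_out_port_sec_l2: similar to ls_in_port_sec_l2, with inport/outport and source/destination addresses swapped

    • Priority 150 drops broadcast and multicast packets destined to disabled logical ports
    • Priority 100 lets broadcast and multicast packets through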

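LR ingress pipeline

  • table=0 lr_in_admission

    • Priority 100 drops packets with a VLAN tag or a multicast source MAC
    • Priority 50 passes packets to the next table when the destination MAC is the MAC of the inport, or is multicast
    • Priority 50: if the inport is the gateway port (GW), packets whose destination MAC matches an external MAC are also passed on; this is used for distributed routers with NAT rules
    • There are no other flows; anything unmatched is dropped by default
  • table=1 lr_in_ip_input

    • Priority 100 drops packets that meet any of the following conditions

      • multicast source IP
      • broadcast source IP
      • localhost source or destination IP
      • zero-network source or destination IP
      • source IP owned by the router (usually the gateway IP) with reg9[1] == 0, which presumably marks packets that are not looped back
      • source IP equal to the broadcast address of a subnet the router knows about, which is never a valid source (for example 192.168.1.255 in the dump below)
    • Priority 90 generates ICMP echo replies for the router's IPv4 and IPv6 gateway addresses
    • Priority 90 answers neighbor resolution (ARP and ND) for the router's IPv4 and IPv6 gateway addresses
    • Similar to the previous item, but answering for NAT and load-balancer virtual IPs
    • Priority 90 processes ARP replies to maintain the ARP table
    • Priority 80 sends ICMP port unreachable for UDP, since the logical router does not accept UDP packets
    • Priority 80 sends a TCP reset, since the logical router does not accept TCP packets
    • Priority 70 handles protocol unreachable, covering TCP, UDP and ICMP
    • Priority 60 drops packets destined to the router's gateway IPs
    • Priority 50 drops Ethernet local broadcast packets
    • Priority 40 replies with ICMP time exceeded for TTL-expired packets whose inport is a router port
    • Priority 30 drops TTL-expired packets
    • Priority 0 matches every packet and passes it to the next table; every table below has this flow, so it is not repeated
  • table=2 lr_in_defrag

    • Priority 100 matches IP packets whose destination is a load-balancer VIP and executes ct_next to send them through conntrack; this exists for load balancing
  • table=3 lr_in_unsnat

    • Gateway router

      • Priority 110: when DNATed traffic is force-SNATed to B, matches destination IP B and executes ct_snat; next;
      • Priority 100: when load-balanced traffic is force-SNATed to B, same as above
      • Priority 90: for each rule that changes source IP A to B, same as above
    • Distributed router (this part deserves a closer look)

      • Priority 100: for each rule that changes source IP A to B, matches destination IP B and inport == GW, and executes ct_snat; next;
      • Priority 50: same match without the inport check, but the action becomes REGBIT_NAT_REDIRECT = 1; next;, apparently because packets received on other router ports have to be redirected to the distributed gateway port for NAT
  • table=4 lr_in_dnat

    • Gateway router

      • Priority 120 match=(ct.new && ip && ip4.dst == VIP && P && P.dst == PORT), action=(flags.force_snat_for_lb = 1; ct_lb(args);)
      • Priority 120 match=(ct.est && ip && ip4.dst == VIP && P && P.dst == PORT), action=(flags.force_snat_for_lb = 1; ct_dnat;)
      • Priority 110 match=(ct.new && ip && ip4.dst == VIP), action=(flags.force_snat_for_lb = 1; ct_lb(args);)
      • Priority 110 match=(ct.est && ip && ip4.dst == VIP), action=(flags.force_snat_for_lb = 1; ct_dnat;)
      • Priority 100 changes destination IP A to B: match=(ip && ip4.dst == A), action=(flags.force_snat_for_dnat = 1; flags.loopback = 1; ct_dnat(B);)
      • Priority 50 applies action=(flags.loopback = 1; ct_dnat;) to all IP packets of the gateway router
    • Distributed router

      • Priority 100 changes destination IP A to B: match=(ip && ip4.dst == B && inport == GW), action=(ct_dnat(B);)
      • Priority 50 changes destination IP A to B: match=(ip && ip4.dst == B), action=(REGBIT_NAT_REDIRECT = 1; next;)
  • table=5 lr_in_ip_routing

    • Priority 300 match=(REGBIT_NAT_REDIRECT == 1), action=(ip.ttl--; next;)
    • The documentation does not state the priority; in this dump IPv6 routes have priority 129 and IPv4 routes priority 49. They match on the destination prefix, and the actions decrement the TTL, store the destination IP in reg0 and the router port (gateway) IP in reg1, rewrite the source MAC and set the outport, then pass the packet to the next table
    • The corresponding flow from our example: match=(ip4.dst == 192.168.1.0/24), action=(ip.ttl--; reg0 = ip4.dst; reg1 = 192.168.1.1; eth.src = 52:54:00:c1:68:50; outport = "lr1-ls1"; flags.loopback = 1; next;)
  • table=6 lr_in_arp_resolve

    • Priority 200 match=(REGBIT_NAT_REDIRECT == 1), action=(eth.dst = E; next;)
    • Priority 100 handles static MAC bindings: IPv4 and IPv6 packets whose outport is a router port and whose reg0 is a known IP in that subnet get the destination MAC set to that IP's MAC
    • Priority 0 handles dynamic MAC bindings: it matches any IPv4 or IPv6 packet, with action=(get_arp(outport, reg0); next;) or action=(get_nd(outport, xxreg0); next;)
  • table=7 lr_in_gw_redirect

    • Priority 200 match=(REGBIT_NAT_REDIRECT == 1), action=(outport = CR; next;)
    • Priority 150 match=(outport == GW && eth.dst == 00:00:00:00:00:00), action=(outport = CR; next;)
    • Priority 100, when NAT rules exist: match=(ip4.src == B && outport == GW), action=(next;)
    • Priority 50 match=(outport == GW), action=(outport = CR; next;)
  • table=8 lr_in_arp_request

    • Priority 100: when the destination MAC is all zeros, an ARP request is built: the destination MAC becomes broadcast, the sender IP is taken from reg1, the target IP from reg0, the opcode is set to request, and the request is sent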
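In the lr1 dump below, the IPv4 /24 routes have priority 49 and the IPv6 fe80::/64 routes have priority 129. Both values are consistent with priority = 2 × prefix length + 1, so a longer prefix always gets a higher priority and the priority order implements longest-prefix match. That formula is an inference from these two data points, not something the documentation states, and other OVN releases may compute it differently. The Python sketch below chooses a route the same way; the 0.0.0.0/0 default route, its next hop, its router port IP and the lr1-ext port are hypothetical additions for illustration.

```python
import ipaddress

# (prefix, next hop or None if directly connected, outport, router port IP)
ROUTES = [
    ("192.168.1.0/24", None, "lr1-ls1", "192.168.1.1"),
    ("192.168.2.0/24", None, "lr1-ls2", "192.168.2.1"),
    ("0.0.0.0/0", "10.0.0.1", "lr1-ext", "10.0.0.2"),  # hypothetical default route
]


def route_priority(prefix: str) -> int:
    """Inferred rule: 2 * prefix length + 1 (49 for /24, 129 for /64)."""
    return 2 * ipaddress.ip_network(prefix).prefixlen + 1


def ip_routing(dst: str, ttl: int):
    """Model of lr_in_ip_routing: the highest-priority matching route wins."""
    addr = ipaddress.ip_address(dst)
    hits = [r for r in ROUTES if addr in ipaddress.ip_network(r[0])]
    if not hits:
        return None  # no route and no priority-0 flow here: dropped
    prefix, nexthop, outport, port_ip = max(hits, key=lambda r: route_priority(r[0]))
    # ip.ttl--; reg0 = <next hop, or ip4.dst if connected>; reg1 = <port IP>; outport = ...
    return {"ttl": ttl - 1, "reg0": nexthop or dst, "reg1": port_ip,
            "outport": outport}
```

For 192.168.2.10 both the /24 (priority 49) and the hypothetical /0 (priority 1) match, and the /24 wins, which is exactly what the priorities are for.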

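LR egress pipeline

  • table=0 lr_out_undnat

    • Priority 100 match=(ip && ip4.src == B && outport == GW), action=(ct_dnat;)
  • table=1 lr_out_snat

    • Gateway router

      • Priority 100 force-SNATs previously DNATed packets to B: match=(flags.force_snat_for_dnat == 1 && ip), action=(ct_snat(B);)
      • Priority 100 force-SNATs previously load-balanced packets to B: match=(flags.force_snat_for_lb == 1 && ip), action=(ct_snat(B);)
      • Variable priority, for SNAT A → B, with the priority computed from A's netmask (longer masks get higher priority): match=(ip && ip4.src == A), action=(ct_snat(B);)
    • Distributed router

      • Variable priority, for SNAT A → B, computed from A as above: match=(ip && ip4.src == A && outport == GW), action=(ct_snat(B);)
  • table=2 lr_out_egr_loop

    • Priority 100, used when NAT rules exist: if the destination IP is a NAT-specified IP and the outport is the GW port, the packet is re-injected into ingress table 0
  • table=3 lr_out_delivery

    • Priority 100 matches each Logical Router Port as the outport and outputs the packet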

Logical flow dump

First, the logical flows of the logical router:

Datapath: "lr1" (f93e843a-0b9c-4c50-8ef2-d3a338661759)  Pipeline: ingress

//table 0 is simple: packets from the logical switches are admitted if they are addressed to the router port's MAC or are multicast/broadcast
table=0 (lr_in_admission    ), priority=100  , match=(vlan.present || eth.src[40]), action=(drop;)
table=0 (lr_in_admission    ), priority=50   , match=(eth.dst == 52:54:00:c1:68:50 && inport == "lr1-ls1"), action=(next;)
table=0 (lr_in_admission    ), priority=50   , match=(eth.dst == 52:54:00:c1:68:60 && inport == "lr1-ls2"), action=(next;)
table=0 (lr_in_admission    ), priority=50   , match=(eth.mcast && inport == "lr1-ls1"), action=(next;)
table=0 (lr_in_admission    ), priority=50   , match=(eth.mcast && inport == "lr1-ls2"), action=(next;)

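//table 1 first drops invalid packets, which we can ignore here; then come MAC learning (put_arp/put_nd) and the ARP, ND and ICMP echo responders for the gateway IPs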
table=1 (lr_in_ip_input     ), priority=100  , match=(ip4.mcast || ip4.src == 255.255.255.255 || ip4.src == 127.0.0.0/8 || ip4.dst == 127.0.0.0/8 || ip4.src == 0.0.0.0/8 || ip4.dst == 0.0.0.0/8), action=(drop;)
table=1 (lr_in_ip_input     ), priority=100  , match=(ip4.src == {192.168.1.1, 192.168.1.255} && reg9[1] == 0), action=(drop;)
table=1 (lr_in_ip_input     ), priority=100  , match=(ip4.src == {192.168.2.1, 192.168.2.255} && reg9[1] == 0), action=(drop;)
table=1 (lr_in_ip_input     ), priority=100  , match=(ip6.src == fe80::5054:ff:fec1:6850), action=(drop;)
table=1 (lr_in_ip_input     ), priority=100  , match=(ip6.src == fe80::5054:ff:fec1:6860), action=(drop;)
table=1 (lr_in_ip_input     ), priority=90   , match=(arp.op == 2), action=(put_arp(inport, arp.spa, arp.sha);)
table=1 (lr_in_ip_input     ), priority=90   , match=(inport == "lr1-ls1" && arp.tpa == 192.168.1.1 && arp.op == 1), action=(eth.dst = eth.src; eth.src = 52:54:00:c1:68:50; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = 52:54:00:c1:68:50; arp.tpa = arp.spa; arp.spa = 192.168.1.1; outport = "lr1-ls1"; flags.loopback = 1; output;)
table=1 (lr_in_ip_input     ), priority=90   , match=(inport == "lr1-ls1" && nd_ns && ip6.dst == {fe80::5054:ff:fec1:6850, ff02::1:ffc1:6850} && nd.target == fe80::5054:ff:fec1:6850), action=(put_nd(inport, ip6.src, nd.sll); nd_na { eth.src = 52:54:00:c1:68:50; ip6.src = fe80::5054:ff:fec1:6850; nd.target = fe80::5054:ff:fec1:6850; nd.tll = 52:54:00:c1:68:50; outport = inport; flags.loopback = 1; output; };)
table=1 (lr_in_ip_input     ), priority=90   , match=(inport == "lr1-ls2" && arp.tpa == 192.168.2.1 && arp.op == 1), action=(eth.dst = eth.src; eth.src = 52:54:00:c1:68:60; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = 52:54:00:c1:68:60; arp.tpa = arp.spa; arp.spa = 192.168.2.1; outport = "lr1-ls2"; flags.loopback = 1; output;)
table=1 (lr_in_ip_input     ), priority=90   , match=(inport == "lr1-ls2" && nd_ns && ip6.dst == {fe80::5054:ff:fec1:6860, ff02::1:ffc1:6860} && nd.target == fe80::5054:ff:fec1:6860), action=(put_nd(inport, ip6.src, nd.sll); nd_na { eth.src = 52:54:00:c1:68:60; ip6.src = fe80::5054:ff:fec1:6860; nd.target = fe80::5054:ff:fec1:6860; nd.tll = 52:54:00:c1:68:60; outport = inport; flags.loopback = 1; output; };)
table=1 (lr_in_ip_input     ), priority=90   , match=(ip4.dst == 192.168.1.1 && icmp4.type == 8 && icmp4.code == 0), action=(ip4.dst <-> ip4.src; ip.ttl = 255; icmp4.type = 0; flags.loopback = 1; next; )
table=1 (lr_in_ip_input     ), priority=90   , match=(ip4.dst == 192.168.2.1 && icmp4.type == 8 && icmp4.code == 0), action=(ip4.dst <-> ip4.src; ip.ttl = 255; icmp4.type = 0; flags.loopback = 1; next; )
table=1 (lr_in_ip_input     ), priority=90   , match=(ip6.dst == fe80::5054:ff:fec1:6850 && icmp6.type == 128 && icmp6.code == 0), action=(ip6.dst <-> ip6.src; ip.ttl = 255; icmp6.type = 129; flags.loopback = 1; next; )
table=1 (lr_in_ip_input     ), priority=90   , match=(ip6.dst == fe80::5054:ff:fec1:6860 && icmp6.type == 128 && icmp6.code == 0), action=(ip6.dst <-> ip6.src; ip.ttl = 255; icmp6.type = 129; flags.loopback = 1; next; )
table=1 (lr_in_ip_input     ), priority=90   , match=(nd_na), action=(put_nd(inport, nd.target, nd.tll);)
table=1 (lr_in_ip_input     ), priority=80   , match=(nd_ns), action=(put_nd(inport, ip6.src, nd.sll);)

//table 1, continued: the gateway IPs are not otherwise reachable, so packets to them are dropped unless handled above; all other packets pass
table=1 (lr_in_ip_input     ), priority=60   , match=(ip4.dst == {192.168.1.1}), action=(drop;)
table=1 (lr_in_ip_input     ), priority=60   , match=(ip4.dst == {192.168.2.1}), action=(drop;)
table=1 (lr_in_ip_input     ), priority=60   , match=(ip6.dst == fe80::5054:ff:fec1:6850), action=(drop;)
table=1 (lr_in_ip_input     ), priority=60   , match=(ip6.dst == fe80::5054:ff:fec1:6860), action=(drop;)
table=1 (lr_in_ip_input     ), priority=50   , match=(eth.bcast), action=(drop;)
table=1 (lr_in_ip_input     ), priority=30   , match=(ip4 && ip.ttl == {0, 1}), action=(drop;)
table=1 (lr_in_ip_input     ), priority=0    , match=(1), action=(next;)

//tables 2, 3 and 4 only have the default flow, because no load balancer or NAT is configured
table=2 (lr_in_defrag       ), priority=0    , match=(1), action=(next;)
table=3 (lr_in_unsnat       ), priority=0    , match=(1), action=(next;)
table=4 (lr_in_dnat         ), priority=0    , match=(1), action=(next;)

//table 5 routes the packet according to its destination IP
table=5 (lr_in_ip_routing   ), priority=129  , match=(inport == "lr1-ls1" && ip6.dst == fe80::/64), action=(ip.ttl--; xxreg0 = ip6.dst; xxreg1 = fe80::5054:ff:fec1:6850; eth.src = 52:54:00:c1:68:50; outport = "lr1-ls1"; flags.loopback = 1; next;)
table=5 (lr_in_ip_routing   ), priority=129  , match=(inport == "lr1-ls2" && ip6.dst == fe80::/64), action=(ip.ttl--; xxreg0 = ip6.dst; xxreg1 = fe80::5054:ff:fec1:6860; eth.src = 52:54:00:c1:68:60; outport = "lr1-ls2"; flags.loopback = 1; next;)
table=5 (lr_in_ip_routing   ), priority=49   , match=(ip4.dst == 192.168.1.0/24), action=(ip.ttl--; reg0 = ip4.dst; reg1 = 192.168.1.1; eth.src = 52:54:00:c1:68:50; outport = "lr1-ls1"; flags.loopback = 1; next;)
table=5 (lr_in_ip_routing   ), priority=49   , match=(ip4.dst == 192.168.2.0/24), action=(ip.ttl--; reg0 = ip4.dst; reg1 = 192.168.2.1; eth.src = 52:54:00:c1:68:60; outport = "lr1-ls2"; flags.loopback = 1; next;)

//table 6 looks up the MAC address for the destination IP stored in reg0 (xxreg0 for IPv6) on the outport
table=6 (lr_in_arp_resolve  ), priority=0    , match=(ip4), action=(get_arp(outport, reg0); next;)
table=6 (lr_in_arp_resolve  ), priority=0    , match=(ip6), action=(get_nd(outport, xxreg0); next;)

//table 7 only has the default flow, since no distributed gateway port is configured
table=7 (lr_in_gw_redirect  ), priority=0    , match=(1), action=(next;)

//table 8: if no MAC was found above (eth.dst is still all zeros), send an ARP request; otherwise output the packet
table=8 (lr_in_arp_request  ), priority=100  , match=(eth.dst == 00:00:00:00:00:00), action=(arp { eth.dst = ff:ff:ff:ff:ff:ff; arp.spa = reg1; arp.tpa = reg0; arp.op = 1; output; };)
table=8 (lr_in_arp_request  ), priority=0    , match=(1), action=(output;)
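The priority-90 ARP flows in table 1 turn a request for the router's own IP into a reply in place. The Python sketch below applies the same field assignments as the lr1-ls1 flow above to a hypothetical dictionary representation of the packet. Reading from the original request and writing to a new dictionary sidesteps assignment order; in the real action the order matters, which is why eth.dst = eth.src comes before eth.src is overwritten. flags.loopback = 1 is needed because the reply leaves through the port it arrived on (outport equals inport).

```python
# Model of lr1's table=1 priority-90 ARP responder for router port lr1-ls1.
ROUTER_PORT = {"name": "lr1-ls1", "ip": "192.168.1.1", "mac": "52:54:00:c1:68:50"}


def arp_responder(req: dict, port: dict = ROUTER_PORT):
    """Return the ARP reply for a matching request, else None (flow misses)."""
    if not (req["inport"] == port["name"] and req["arp_op"] == 1
            and req["arp_tpa"] == port["ip"]):
        return None
    return {
        "eth_dst": req["eth_src"],   # eth.dst = eth.src
        "eth_src": port["mac"],      # eth.src = <router port MAC>
        "arp_op": 2,                 # arp.op = 2 (reply)
        "arp_tha": req["arp_sha"],   # arp.tha = arp.sha
        "arp_sha": port["mac"],      # arp.sha = <router port MAC>
        "arp_tpa": req["arp_spa"],   # arp.tpa = arp.spa
        "arp_spa": port["ip"],       # arp.spa = <router port IP>
        "outport": port["name"],     # outport = "lr1-ls1"
        "loopback": True,            # flags.loopback = 1
    }
```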

Datapath: "lr1" (f93e843a-0b9c-4c50-8ef2-d3a338661759)  Pipeline: egress

//tables 0, 1 and 2 only have the default flow (no NAT configured)
table=0 (lr_out_undnat      ), priority=0    , match=(1), action=(next;)
table=1 (lr_out_snat        ), priority=0    , match=(1), action=(next;)
table=2 (lr_out_egr_loop    ), priority=0    , match=(1), action=(next;)

//table 3 outputs the packet on its outport
table=3 (lr_out_delivery    ), priority=100  , match=(outport == "lr1-ls1"), action=(output;)
table=3 (lr_out_delivery    ), priority=100  , match=(outport == "lr1-ls2"), action=(output;)
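Tables 6 and 8 of lr1's ingress pipeline above work as a pair: get_arp(outport, reg0) looks up the IP in the MAC bindings learned by put_arp (the MAC_Binding table) and writes the MAC to eth.dst, or 00:00:00:00:00:00 if nothing is known; table 8 then either emits an ARP request built from reg1 and reg0 or outputs the packet. The Python sketch below models that decision with a plain dictionary standing in for MAC_Binding; the function name and the binding entries used with it are hypothetical.

```python
UNKNOWN_MAC = "00:00:00:00:00:00"


def resolve_and_send(outport: str, reg0: str, reg1: str, mac_bindings: dict) -> dict:
    """Model of lr_in_arp_resolve (priority 0) followed by lr_in_arp_request."""
    # table 6: get_arp(outport, reg0); eth.dst becomes all zeros on a miss
    eth_dst = mac_bindings.get((outport, reg0), UNKNOWN_MAC)
    # table 8, priority 100: unknown MAC -> send an ARP request instead
    if eth_dst == UNKNOWN_MAC:
        return {"kind": "arp_request", "eth_dst": "ff:ff:ff:ff:ff:ff",
                "arp_spa": reg1, "arp_tpa": reg0, "arp_op": 1,
                "outport": outport}
    # table 8, priority 0: MAC known -> output the routed packet
    return {"kind": "forward", "eth_dst": eth_dst, "outport": outport}
```

The first packet to an unknown host therefore triggers an ARP request; once the reply is learned by put_arp in table 1, later packets take the forward branch.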

Next, the logical flows of logical switch 1:

Datapath: "ls1" (94b0f9b2-a91e-435b-8db2-01dec1acbf3d)  Pipeline: ingress

//table 0: packets from the logical router port pass, and so do packets from the VMs as long as their source MAC is correct
table=0 (ls_in_port_sec_l2  ), priority=100  , match=(eth.src[40]), action=(drop;)
table=0 (ls_in_port_sec_l2  ), priority=100  , match=(vlan.present), action=(drop;)
table=0 (ls_in_port_sec_l2  ), priority=50   , match=(inport == "ls1-lr1"), action=(next;)
table=0 (ls_in_port_sec_l2  ), priority=50   , match=(inport == "ls1-vm1" && eth.src == {52:54:00:c1:68:70}), action=(next;)
table=0 (ls_in_port_sec_l2  ), priority=50   , match=(inport == "ls1-vm2" && eth.src == {52:54:00:c1:68:72}), action=(next;)

//table 1: no IP port security is configured
table=1 (ls_in_port_sec_ip  ), priority=0    , match=(1), action=(next;)

//table 2: ARP/ND port security for the VMs
table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "ls1-vm1" && eth.src == 52:54:00:c1:68:70 && arp.sha == 52:54:00:c1:68:70), action=(next;)
table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "ls1-vm1" && eth.src == 52:54:00:c1:68:70 && ip6 && nd && ((nd.sll == 00:00:00:00:00:00 || nd.sll == 52:54:00:c1:68:70) || ((nd.tll == 00:00:00:00:00:00 || nd.tll == 52:54:00:c1:68:70)))), action=(next;)
table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "ls1-vm2" && eth.src == 52:54:00:c1:68:72 && arp.sha == 52:54:00:c1:68:72), action=(next;)
table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "ls1-vm2" && eth.src == 52:54:00:c1:68:72 && ip6 && nd && ((nd.sll == 00:00:00:00:00:00 || nd.sll == 52:54:00:c1:68:72) || ((nd.tll == 00:00:00:00:00:00 || nd.tll == 52:54:00:c1:68:72)))), action=(next;)
table=2 (ls_in_port_sec_nd  ), priority=80   , match=(inport == "ls1-vm1" && (arp || nd)), action=(drop;)
table=2 (ls_in_port_sec_nd  ), priority=80   , match=(inport == "ls1-vm2" && (arp || nd)), action=(drop;)
table=2 (ls_in_port_sec_nd  ), priority=0    , match=(1), action=(next;)

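//tables 3-12: nothing configured, so everything passes (the priority-100 flows in tables 5 and 9 never match, because no earlier flow sets the reg0 bits)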
table=3 (ls_in_pre_acl      ), priority=0    , match=(1), action=(next;)
table=4 (ls_in_pre_lb       ), priority=0    , match=(1), action=(next;)
table=5 (ls_in_pre_stateful ), priority=100  , match=(reg0[0] == 1), action=(ct_next;)
table=5 (ls_in_pre_stateful ), priority=0    , match=(1), action=(next;)
table=6 (ls_in_acl          ), priority=0    , match=(1), action=(next;)
table=7 (ls_in_qos_mark     ), priority=0    , match=(1), action=(next;)
table=8 (ls_in_lb           ), priority=0    , match=(1), action=(next;)
table=9 (ls_in_stateful     ), priority=100  , match=(reg0[1] == 1), action=(ct_commit(ct_label=0/1); next;)
table=9 (ls_in_stateful     ), priority=100  , match=(reg0[2] == 1), action=(ct_lb;)
table=9 (ls_in_stateful     ), priority=0    , match=(1), action=(next;)
table=10(ls_in_arp_rsp      ), priority=0    , match=(1), action=(next;)
table=11(ls_in_dhcp_options ), priority=0    , match=(1), action=(next;)
table=12(ls_in_dhcp_response), priority=0    , match=(1), action=(next;)

//table 13 looks up the destination MAC to decide where to send the packet
table=13(ls_in_l2_lkup      ), priority=100  , match=(eth.mcast), action=(outport = "_MC_flood"; output;)
table=13(ls_in_l2_lkup      ), priority=50   , match=(eth.dst == 52:54:00:c1:68:50), action=(outport = "ls1-lr1"; output;)
table=13(ls_in_l2_lkup      ), priority=50   , match=(eth.dst == 52:54:00:c1:68:70), action=(outport = "ls1-vm1"; output;)
table=13(ls_in_l2_lkup      ), priority=50   , match=(eth.dst == 52:54:00:c1:68:72), action=(outport = "ls1-vm2"; output;)
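Within a table, the flow with the highest priority whose match is true wins. The Python sketch below evaluates ls1's table 13 (ls_in_l2_lkup) that way, encoding each flow from the dump above as a (priority, match, outport) entry. eth.mcast is true when the group bit of the destination MAC is set, which also covers broadcast. The dump has no priority-0 flow in this table, so the sketch treats a miss as a drop; the encoding and names are hypothetical.

```python
def is_mcast(mac: str) -> bool:
    """eth.mcast: lowest bit of the first octet (true for ff:ff:ff:ff:ff:ff too)."""
    return (int(mac.split(":")[0], 16) & 1) == 1


# ls1 ingress table 13, one entry per logical flow in the dump.
LS1_L2_LKUP = [
    (100, lambda p: is_mcast(p["eth_dst"]), "_MC_flood"),
    (50, lambda p: p["eth_dst"] == "52:54:00:c1:68:50", "ls1-lr1"),
    (50, lambda p: p["eth_dst"] == "52:54:00:c1:68:70", "ls1-vm1"),
    (50, lambda p: p["eth_dst"] == "52:54:00:c1:68:72", "ls1-vm2"),
]


def l2_lookup(pkt: dict, flows=LS1_L2_LKUP):
    """Return the outport chosen by the highest-priority matching flow."""
    for _prio, match, outport in sorted(flows, key=lambda f: f[0], reverse=True):
        if match(pkt):
            return outport
    return None  # no flow matched: the packet is dropped
```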

Datapath: "ls1" (94b0f9b2-a91e-435b-8db2-01dec1acbf3d)  Pipeline: egress

//tables 0-7: nothing configured
table=0 (ls_out_pre_lb      ), priority=0    , match=(1), action=(next;)
table=1 (ls_out_pre_acl     ), priority=0    , match=(1), action=(next;)
table=2 (ls_out_pre_stateful), priority=100  , match=(reg0[0] == 1), action=(ct_next;)
table=2 (ls_out_pre_stateful), priority=0    , match=(1), action=(next;)
table=3 (ls_out_lb          ), priority=0    , match=(1), action=(next;)
table=4 (ls_out_acl         ), priority=0    , match=(1), action=(next;)
table=5 (ls_out_qos_mark    ), priority=0    , match=(1), action=(next;)
table=6 (ls_out_stateful    ), priority=100  , match=(reg0[1] == 1), action=(ct_commit(ct_label=0/1); next;)
table=6 (ls_out_stateful    ), priority=100  , match=(reg0[2] == 1), action=(ct_lb;)
table=6 (ls_out_stateful    ), priority=0    , match=(1), action=(next;)
table=7 (ls_out_port_sec_ip ), priority=0    , match=(1), action=(next;)

//table 8 performs L2 port security on the destination MAC and outputs the packet
table=8 (ls_out_port_sec_l2 ), priority=100  , match=(eth.mcast), action=(output;)
table=8 (ls_out_port_sec_l2 ), priority=50   , match=(outport == "ls1-lr1"), action=(output;)
table=8 (ls_out_port_sec_l2 ), priority=50   , match=(outport == "ls1-vm1" && eth.dst == {52:54:00:c1:68:70}), action=(output;)
table=8 (ls_out_port_sec_l2 ), priority=50   , match=(outport == "ls1-vm2" && eth.dst == {52:54:00:c1:68:72}), action=(output;)
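A quick way to see which tables of a datapath actually do something is to ignore the default flows whose match is 1 and whose action is next;. The Python sketch below does that for a list of dump-flows lines. On ls1's egress dump above it reports tables 2, 6 and 8; the priority-100 flows in tables 2 and 6 exist but never fire in this setup, because no earlier stage sets reg0[0], reg0[1] or reg0[2]. The function name and regular expression are assumptions based on the output format in this article.

```python
import re
from collections import defaultdict

FLOW_RE = re.compile(r"table=\s*(\d+)\s*\(([^)\s]+)\s*\),\s*priority=(\d+)\s*,"
                     r"\s*match=\((.*)\),\s*action=\((.*)\)\s*$")


def active_tables(lines):
    """Return {(table, stage): [(priority, match, action), ...]} for every
    table that has at least one flow besides the default pass-through."""
    tables = defaultdict(list)
    for line in lines:
        m = FLOW_RE.search(line)
        if m is None:
            continue  # datapath headers, comments, blank lines
        table, stage, prio, match, action = m.groups()
        match, action = match.strip(), action.strip()
        if match == "1" and action == "next;":
            continue  # default flow: matches everything and just continues
        tables[(int(table), stage)].append((int(prio), match, action))
    return dict(tables)
```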

Finally, the logical flows of logical switch 2, which are analogous to the above and not annotated:

Datapath: "ls2" (85cd1eaf-5f4c-45e2-882b-11d0f96aba9e)  Pipeline: ingress

table=0 (ls_in_port_sec_l2  ), priority=100  , match=(eth.src[40]), action=(drop;)
table=0 (ls_in_port_sec_l2  ), priority=100  , match=(vlan.present), action=(drop;)
table=0 (ls_in_port_sec_l2  ), priority=50   , match=(inport == "ls2-lr1"), action=(next;)
table=0 (ls_in_port_sec_l2  ), priority=50   , match=(inport == "ls2-vm3" && eth.src == {52:54:00:c1:68:71}), action=(next;)
table=0 (ls_in_port_sec_l2  ), priority=50   , match=(inport == "ls2-vm4" && eth.src == {52:54:00:c1:68:73}), action=(next;)
table=1 (ls_in_port_sec_ip  ), priority=0    , match=(1), action=(next;)
table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "ls2-vm3" && eth.src == 52:54:00:c1:68:71 && arp.sha == 52:54:00:c1:68:71), action=(next;)
table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "ls2-vm3" && eth.src == 52:54:00:c1:68:71 && ip6 && nd && ((nd.sll == 00:00:00:00:00:00 || nd.sll == 52:54:00:c1:68:71) || ((nd.tll == 00:00:00:00:00:00 || nd.tll == 52:54:00:c1:68:71)))), action=(next;)
table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "ls2-vm4" && eth.src == 52:54:00:c1:68:73 && arp.sha == 52:54:00:c1:68:73), action=(next;)
table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "ls2-vm4" && eth.src == 52:54:00:c1:68:73 && ip6 && nd && ((nd.sll == 00:00:00:00:00:00 || nd.sll == 52:54:00:c1:68:73) || ((nd.tll == 00:00:00:00:00:00 || nd.tll == 52:54:00:c1:68:73)))), action=(next;)
table=2 (ls_in_port_sec_nd  ), priority=80   , match=(inport == "ls2-vm3" && (arp || nd)), action=(drop;)
table=2 (ls_in_port_sec_nd  ), priority=80   , match=(inport == "ls2-vm4" && (arp || nd)), action=(drop;)
table=2 (ls_in_port_sec_nd  ), priority=0    , match=(1), action=(next;)
table=3 (ls_in_pre_acl      ), priority=0    , match=(1), action=(next;)
table=4 (ls_in_pre_lb       ), priority=0    , match=(1), action=(next;)
table=5 (ls_in_pre_stateful ), priority=100  , match=(reg0[0] == 1), action=(ct_next;)
table=5 (ls_in_pre_stateful ), priority=0    , match=(1), action=(next;)
table=6 (ls_in_acl          ), priority=0    , match=(1), action=(next;)
table=7 (ls_in_qos_mark     ), priority=0    , match=(1), action=(next;)
table=8 (ls_in_lb           ), priority=0    , match=(1), action=(next;)
table=9 (ls_in_stateful     ), priority=100  , match=(reg0[1] == 1), action=(ct_commit(ct_label=0/1); next;)
table=9 (ls_in_stateful     ), priority=100  , match=(reg0[2] == 1), action=(ct_lb;)
table=9 (ls_in_stateful     ), priority=0    , match=(1), action=(next;)
table=10(ls_in_arp_rsp      ), priority=0    , match=(1), action=(next;)
table=11(ls_in_dhcp_options ), priority=0    , match=(1), action=(next;)
table=12(ls_in_dhcp_response), priority=0    , match=(1), action=(next;)
table=13(ls_in_l2_lkup      ), priority=100  , match=(eth.mcast), action=(outport = "_MC_flood"; output;)
table=13(ls_in_l2_lkup      ), priority=50   , match=(eth.dst == 52:54:00:c1:68:60), action=(outport = "ls2-lr1"; output;)
table=13(ls_in_l2_lkup      ), priority=50   , match=(eth.dst == 52:54:00:c1:68:71), action=(outport = "ls2-vm3"; output;)
table=13(ls_in_l2_lkup      ), priority=50   , match=(eth.dst == 52:54:00:c1:68:73), action=(outport = "ls2-vm4"; output;)

Datapath: "ls2" (85cd1eaf-5f4c-45e2-882b-11d0f96aba9e)  Pipeline: egress

table=0 (ls_out_pre_lb      ), priority=0    , match=(1), action=(next;)
table=1 (ls_out_pre_acl     ), priority=0    , match=(1), action=(next;)
table=2 (ls_out_pre_stateful), priority=100  , match=(reg0[0] == 1), action=(ct_next;)
table=2 (ls_out_pre_stateful), priority=0    , match=(1), action=(next;)
table=3 (ls_out_lb          ), priority=0    , match=(1), action=(next;)
table=4 (ls_out_acl         ), priority=0    , match=(1), action=(next;)
table=5 (ls_out_qos_mark    ), priority=0    , match=(1), action=(next;)
table=6 (ls_out_stateful    ), priority=100  , match=(reg0[1] == 1), action=(ct_commit(ct_label=0/1); next;)
table=6 (ls_out_stateful    ), priority=100  , match=(reg0[2] == 1), action=(ct_lb;)
table=6 (ls_out_stateful    ), priority=0    , match=(1), action=(next;)
table=7 (ls_out_port_sec_ip ), priority=0    , match=(1), action=(next;)
table=8 (ls_out_port_sec_l2 ), priority=100  , match=(eth.mcast), action=(output;)
table=8 (ls_out_port_sec_l2 ), priority=50   , match=(outport == "ls2-lr1"), action=(output;)
table=8 (ls_out_port_sec_l2 ), priority=50   , match=(outport == "ls2-vm3" && eth.dst == {52:54:00:c1:68:71}), action=(output;)
table=8 (ls_out_port_sec_l2 ), priority=50   , match=(outport == "ls2-vm4" && eth.dst == {52:54:00:c1:68:73}), action=(output;)

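These logical flows are centralized: they are derived from the logical topology. We have not examined the details of that translation, but the logic is clear: based on our configuration of the logical topology, the flows decide where each packet is sent. The harder part is how these logical flows are turned into flows on each hypervisor, since every host ends up with different flows.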
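Part of the answer is that each ovn-controller only needs the logical flows of the datapaths that matter to its chassis, roughly: those with a port bound to the chassis, plus those reachable from them through patch ports (here from a logical switch through its router port to lr1 and on to the other switch). The Python sketch below computes such a set from Port_Binding-like records for the topology in this article. The compute1/compute2 bindings come from the ovn-sbctl show output earlier; the patch peer pairs are assumed from the port names in the flow dumps. It is a simplified illustration of the idea, not ovn-controller's actual algorithm, and it ignores port types such as l3gateway, localnet and chassisredirect.

```python
# Port_Binding-like rows for this article's topology (chassis = hostname).
PORT_BINDINGS = [
    {"logical_port": "ls1-vm1", "datapath": "ls1", "chassis": "compute1"},
    {"logical_port": "ls1-vm2", "datapath": "ls1", "chassis": "compute2"},
    {"logical_port": "ls2-vm3", "datapath": "ls2", "chassis": "compute1"},
    {"logical_port": "ls2-vm4", "datapath": "ls2", "chassis": "compute2"},
    {"logical_port": "ls1-lr1", "datapath": "ls1", "type": "patch", "peer": "lr1-ls1"},
    {"logical_port": "lr1-ls1", "datapath": "lr1", "type": "patch", "peer": "ls1-lr1"},
    {"logical_port": "ls2-lr1", "datapath": "ls2", "type": "patch", "peer": "lr1-ls2"},
    {"logical_port": "lr1-ls2", "datapath": "lr1", "type": "patch", "peer": "ls2-lr1"},
]


def local_datapaths(chassis: str, bindings=PORT_BINDINGS) -> set:
    """Datapaths with a port bound to `chassis`, closed over patch-port peers."""
    by_name = {pb["logical_port"]: pb for pb in bindings}
    local = {pb["datapath"] for pb in bindings if pb.get("chassis") == chassis}
    pending = list(local)
    while pending:
        dp = pending.pop()
        for pb in bindings:
            if pb["datapath"] != dp or pb.get("type") != "patch":
                continue
            peer = by_name.get(pb.get("peer"))
            if peer and peer["datapath"] not in local:
                local.add(peer["datapath"])
                pending.append(peer["datapath"])
    return local
```

In this small topology compute1 and compute2 each need all three datapaths, while the network chassis, which has no port bindings in the show output, needs none under this simplified model.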

Last modified: August 20, 2021