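1 Overview
Tap devices are commonly used in virtualization. Consider the following scenario:
The figure marks the key functions and the direction of data flow.
tun exposes two data interfaces:
- file, for user space;
- socket, for kernel-space users such as vhost.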
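2 Asynchronous processing
In the figure, the blue line marks network traffic sent out by the VM. On the tap side this path involves no asynchronous processing; see the code:
tun_sendmsg() / tun_chr_write_iter()
  -> tun_get_user()
    -> tun_rx_batched()
      -> netif_receive_skb()
The red line marks network traffic flowing into the VM. On the tap side this path is asynchronous and relies on wait and wakeup; see the code: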
tun_recvmsg() / tun_chr_read_iter()
  -> tun_do_read()
    -> tun_ring_recv()
---
    ptr = ptr_ring_consume(&tfile->tx_ring);
    if (ptr)
        goto out;
    if (noblock) {
        error = -EAGAIN;
        goto out;
    }

    add_wait_queue(&tfile->socket.wq.wait, &wait);

    while (1) {
        set_current_state(TASK_INTERRUPTIBLE);
        ptr = ptr_ring_consume(&tfile->tx_ring);
        if (ptr)
            break;
        ...
        schedule();
    }

    __set_current_state(TASK_RUNNING);
    remove_wait_queue(&tfile->socket.wq.wait, &wait);
---
tun_net_xmit()
---
    if (ptr_ring_produce(&tfile->tx_ring, skb))
        goto drop;

    /* NETIF_F_LLTX requires to do our own update of trans_start */
    queue = netdev_get_tx_queue(dev, txq);
    queue->trans_start = jiffies;

    /* Notify and wake up reader process */
    if (tfile->flags & TUN_FASYNC)
        kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
    tfile->socket.sk->sk_data_ready(tfile->socket.sk);
---
sock_def_readable()
---
    rcu_read_lock();
    wq = rcu_dereference(sk->sk_wq);
    if (skwq_has_sleeper(wq))
        wake_up_interruptible_sync_poll(&wq->wait, EPOLLIN | EPOLLPRI |
                                        EPOLLRDNORM | EPOLLRDBAND);
    sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
    rcu_read_unlock();
---
By default, sk->sk_wq points to socket->wq; see sock_init_data().
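The wait and wakeup pairing above can be modeled in user space. The sketch below is an analogy only, not kernel code: a pthread mutex and condition variable stand in for the wait queue, and every name (demo_ring, demo_ring_produce(), demo_ring_consume(), RING_SIZE) is hypothetical. It keeps the three behaviors visible in the excerpts: the consumer returns immediately when data is ready, returns -EAGAIN in non-blocking mode, and otherwise sleeps until the producer wakes it; the producer reports an error on a full ring so the caller can drop.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define RING_SIZE 4  /* hypothetical capacity, chosen for illustration */

/* User-space stand-in for tfile->tx_ring plus its wait queue. */
struct demo_ring {
    void *slot[RING_SIZE];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t readable;   /* plays the role of socket.wq.wait */
};

void demo_ring_init(struct demo_ring *r)
{
    r->head = r->tail = r->count = 0;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->readable, NULL);
}

/* Producer side, in the spirit of tun_net_xmit():
 * enqueue or report a full ring, then wake a sleeping reader. */
int demo_ring_produce(struct demo_ring *r, void *item)
{
    int ret = 0;

    pthread_mutex_lock(&r->lock);
    if (r->count == RING_SIZE) {
        ret = -ENOSPC;               /* ring full: caller drops the item */
    } else {
        r->slot[r->tail] = item;
        r->tail = (r->tail + 1) % RING_SIZE;
        r->count++;
        pthread_cond_signal(&r->readable);
    }
    pthread_mutex_unlock(&r->lock);
    return ret;
}

/* Consumer side, in the spirit of tun_ring_recv():
 * take an item if one is ready, fail with -EAGAIN when noblock is set,
 * otherwise sleep until the producer signals. */
int demo_ring_consume(struct demo_ring *r, int noblock, void **out)
{
    int ret = 0;

    pthread_mutex_lock(&r->lock);
    while (r->count == 0) {
        if (noblock) {
            ret = -EAGAIN;
            goto out;
        }
        pthread_cond_wait(&r->readable, &r->lock);
    }
    *out = r->slot[r->head];
    r->head = (r->head + 1) % RING_SIZE;
    r->count--;
out:
    pthread_mutex_unlock(&r->lock);
    return ret;
}
```

One difference worth noting: the kernel loop re-checks the ring after set_current_state() to avoid a lost wakeup, whereas here the mutex held across the check and pthread_cond_wait() gives the same guarantee.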
vhost uses the socket interface only to move the data itself. To wait for data from tap, it uses poll:
vhost_net_enable_vq()
---
    sock = vhost_vq_get_backend(vq);
    if (!sock)
        return 0;

    return vhost_poll_start(poll, sock->file);
---
tun_chr_poll()
---
    sk = tfile->socket.sk;
    poll_wait(file, sk_sleep(sk), wait);
    ...
---
vhost_poll_init()
---
    init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
---
sk_sleep() returns the wait queue held in sk->sk_wq. sock_def_readable() wakes that queue, which invokes vhost_poll_wakeup(); that function in turn queues a vhost work item, which runs handle_rx.
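The key point is that vhost registers a callback on the wait queue instead of putting a task to sleep on it. A minimal user-space sketch of that pattern follows. All names (demo_wait_entry, demo_wait_queue, demo_poll, DEMO_POLLIN and the functions) are hypothetical, and the event-mask filter is a simplification of what the real callback does; a counter stands in for queuing a vhost work item.

```c
#include <stddef.h>

#define DEMO_POLLIN 0x1u  /* hypothetical event bit for this sketch */

struct demo_wait_entry;
typedef void (*demo_wake_fn)(struct demo_wait_entry *entry, unsigned events);

/* One waiter: a callback rather than a sleeping task. */
struct demo_wait_entry {
    demo_wake_fn func;
    struct demo_wait_entry *next;
};

/* Stand-in for the wait queue head that sk_sleep() hands out. */
struct demo_wait_queue {
    struct demo_wait_entry *first;
};

/* Stand-in for vhost_poll: embeds the wait entry and counts queued work. */
struct demo_poll {
    struct demo_wait_entry wait;
    unsigned mask;        /* events this poller cares about */
    int queued_work;      /* how many times rx work was queued */
};

/* In the spirit of vhost_poll_wakeup(): queue work only for
 * events of interest. Here "queue work" is just a counter. */
void demo_poll_wakeup(struct demo_wait_entry *entry, unsigned events)
{
    /* Recover the enclosing demo_poll from its embedded wait entry. */
    struct demo_poll *poll =
        (struct demo_poll *)((char *)entry - offsetof(struct demo_poll, wait));

    if (events & poll->mask)
        poll->queued_work++;
}

/* In the spirit of vhost_poll_init() plus poll_wait():
 * set the callback and hook the entry onto the wait queue. */
void demo_poll_start(struct demo_poll *poll, struct demo_wait_queue *wq,
                     unsigned mask)
{
    poll->mask = mask;
    poll->queued_work = 0;
    poll->wait.func = demo_poll_wakeup;
    poll->wait.next = wq->first;
    wq->first = &poll->wait;
}

/* In the spirit of the wake-up in sock_def_readable():
 * run the callback of every registered entry. */
void demo_wake_up(struct demo_wait_queue *wq, unsigned events)
{
    for (struct demo_wait_entry *e = wq->first; e != NULL; e = e->next)
        e->func(e, events);
}
```

With this shape, the producer never needs to know who is waiting: a blocked reader and a vhost-style poller are both just entries on the same queue.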