How the Order of epoll_create and fork Matters

CIC Oct 25, 2018

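## epoll_create first, then fork

epoll_create first creates an epoll instance in kernel memory, and the descriptors registered with it are kept in a red-black tree. When the process then forks, the parent and the child each hold a file descriptor that refers to the same epoll instance. Once a connection on the listening socket monitored by that epoll completes its three-way handshake, both processes are notified and both try to accept; naturally only one of them succeeds. Up to this point everything still works. The trouble starts when the new fd is registered with epoll after the accept. Because the red-black tree is the same one, suppose the parent wins the accept, obtains a new fd, and adds it to the tree with epoll_ctl. When that fd becomes readable, the kernel again wakes both processes. The child, however, never accepted this connection: the fd number carried in the event is either not open in the child's descriptor table or refers to some other file, so the child fails when it tries to handle the event.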
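The failure described above can be reproduced without any sockets. The following is a minimal sketch, not taken from the original post: it uses a pipe in place of an accepted connection, and the names `shared_epoll_demo`, `go`, and `data` are made up for illustration. The parent registers a descriptor it opened after the fork, and the child, waiting on the shared epoll instance, receives an event for a descriptor number it never opened.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child got an event for an fd it does not own
 * (read() fails with EBADF), 0 if it did not, -1 on setup failure. */
int shared_epoll_demo(void)
{
    int go[2];                      /* parent -> child "start" signal */
    int data[2] = { -1, -1 };       /* opened by the parent only, after fork() */
    int epfd = epoll_create1(0);    /* created BEFORE fork(): shared by both */
    if (epfd < 0 || pipe(go) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {                 /* child */
        char c;
        struct epoll_event got;
        close(go[1]);
        if (read(go[0], &c, 1) != 1)        /* wait until the parent has registered */
            _exit(2);
        /* The shared instance reports the parent's descriptor number. */
        if (epoll_wait(epfd, &got, 1, 1000) == 1 &&
            read(got.data.fd, &c, 1) < 0 && errno == EBADF)
            _exit(0);
        _exit(1);
    }

    /* parent: stands in for "accept succeeded, register the new fd" */
    int ok = 0, status = 0;
    struct epoll_event ev = { .events = EPOLLIN };
    if (pipe(data) == 0) {
        ev.data.fd = data[0];
        ok = epoll_ctl(epfd, EPOLL_CTL_ADD, data[0], &ev) == 0 &&
             write(data[1], "x", 1) == 1 && /* make data[0] readable */
             write(go[1], "g", 1) == 1;     /* release the child */
    }
    close(go[1]);                   /* child sees EOF if setup failed */
    waitpid(pid, &status, 0);
    close(go[0]);
    close(data[0]);
    close(data[1]);
    close(epfd);
    if (!ok)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

In a real server the descriptor number might happen to be open in the child and refer to an unrelated file, in which case the read would succeed on the wrong object instead of failing, which is arguably worse.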

## fork first, then epoll_create

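When fork comes before epoll_create, each process has its own epoll instance, so there are several red-black trees in the kernel. Without the SO_REUSEPORT option this leads to a thundering herd: when a connection arrives, every process has registered interest in events on the same listening socket, so every process is woken up. (If connections are handled very quickly the herd may not appear, because the connection has already been accepted by the time the other processes would be notified.) There are two ways to address the accept thundering herd under epoll: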

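1. Use SO_REUSEPORT, so that each process binds its own listening socket to the same port.
2. Use the EPOLLEXCLUSIVE flag, which was introduced in Linux 4.5. When adding the server's listening fd with epoll_ctl, OR EPOLLEXCLUSIVE into the events field of the epoll_event.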

The libev documentation has a passage that touches on this:
> The epoll mechanism deserves honorable mention as the most misdesigned of the more advanced event mechanisms: mere annoyances include silently dropping file descriptors, requiring a system call per change per file descriptor (and unnecessary guessing of parameters), problems with dup, returning before the timeout value, resulting in additional iterations (and only giving 5ms accuracy while select on the same platform gives 0.1ms) and so on. The biggest issue is fork races, however - if a program forks then both parent and child process have to recreate the epoll set, which can take considerable time (one syscall per file descriptor) and is of course hard to detect.

Therefore, call epoll_create after fork whenever possible. If epoll_create has to come first, the child process should close the epoll descriptor it inherited.
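For the fallback case, a child could do something along these lines right after fork. This is a hedged sketch, and `reset_epoll_after_fork` is a made-up helper name, not an API from any library.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Call in the child immediately after fork(): drop the inherited epoll
 * descriptor and start over with a private, empty instance.
 * Returns the new epoll fd, or -1 on error. */
int reset_epoll_after_fork(int inherited_epfd)
{
    close(inherited_epfd);                 /* the parent's instance is untouched */
    return epoll_create1(EPOLL_CLOEXEC);   /* nothing is registered here yet */
}
```

Closing the descriptor in the child only drops the child's reference; the parent's registrations stay in place. The child then has to register every descriptor it cares about with the new instance, one epoll_ctl call each, which is the cost the libev passage refers to.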