It is no secret that much of the work on the in-kernel BPF virtual machine and associated user-space support code is being done at Facebook. But less is known about how Facebook is actually using BPF. At Kernel Recipes 2019, BPF developer Alexei Starovoitov described a bit of that work, though even he admitted that he didn't know what most of the BPF programs running there were doing. He also summarized recent developments with BPF and some near-future work.
Kernels at Facebook
Facebook, he began, has an upstream-first philosophy, taken to an extreme; the company tries not to carry any out-of-tree patches at all. All work done at Facebook is meant to go upstream as soon as it practically can. The company also runs recent kernels, upgrading whenever possible. The company can move to a new kernel in a matter of days; this process could be faster, he said, except that it still takes some time to reboot thousands of servers. As of just before the talk, most of the Facebook fleet was running 4.16, with a few 4.11 machines hanging around and some at 5.2.
He pointed out the lack of long-term-support kernels in the above list. Facebook does not plan to stay with any given kernel for a long time, so the company doesn't care about long-term support. Instead, machines are simply upgraded to whichever kernel is available. Within a given version, though, there can be a fair amount of variation across the fleet; the kernel team evidently backports features into older kernels when the need arises. That can create challenges for applications and, especially, BPF-based applications.