The previous post mentioned that February was still full of various “maintenance” tasks, mainly around the backports, and the preparation of the future Linux 6.9. The beginning of March was similar to that, then more time was finally available to look at fixing issues, and preparing new features. Read on to find out more about what happened in March!

The future v6.9 and backports

Linux v6.8 was released on March 10th. As mentioned in my previous post, we had up to this date to suggest new features, and refactoring to be included in net-next tree before being closed for new submissions. We took this opportunity to send a last feature for the future v6.9 (TCP_NOTSENT_LOWAT socket option support from Paolo) one week before, and a bunch of refactoring in the selftests initiated by Geliang, a few days before the limit. We usually don’t like to rush things just before the closure, but it generally helps to reduce the maintenance cost to send big refactoring early, than having to carry it only in our tree for a bit of time.

This has been done while in parallel, I was also helping the stable team backporting even more patches which could not be applied without conflicts in stable versions. Pretty much the same as what was done in February, indeed, not that interesting then :)

CI: a big step forward

With more available time, this allows me to work on the long awaited tasks linked to the CI:

GitHub Actions and KVM support

Back in December, when the switch to GitHub Actions started, it was not possible to enable KVM support with public runners. That was the main reason behind choosing Cirrus CI a few years ago, and keeping it for the tests with the debug kernel config a few months ago. As described in the previous post, our workflow was impacted by Cirrus CI’s monthly limit, and it was the reason behind this partial switch to GitHub Actions. Moving only the tests with a non-debug kernel config was not enough, we were still impacted by that: the monthly limit was reached on the 31st of January, and on the 16th of February. Another solution was then required.

I was then looking at adding a self-hosted runner. I managed to successfully execute the tests on a self-hosted runner which was a refurbished mini PC at home. I then realised that was not enough: KVM was still not used, because the docker image is not executed with enough permissions (--privileged, or --cap-add + mount).

I knew from a GitHub blog post from last year that it was possible to have KVM support, so I tried to find a way to use it with our “Docker container actions”, like they do in reactivecircus/android-emulator-runner. Then I found out that since January this year, it is possible to have KVM support with the Linux public GitHub runners! So no need to host and maintain that at home with a limited Internet connection! Plus it means there is no need to restrict these tests to patches sent on our mailing list, people can have results from the CI simply by sending code to their GitHub fork repo!

So I:

With KVM support, the CPU usage is reduced and no longer near the 100% limit, so our tests are more stable. Dropping Cirrus-CI support with a bunch of pretty much duplicated code is helpful for the maintenance in the long term.

BPF Tests

MPTCP BPF tests are present in the Linux kernel since 2022 (they were already in our tree in August 2020, but the development got interrupted). Back then, the tests were limited to the available features: being able to read fields from an MPTCP socket and checking if a TCP socket is an MPTCP subflow. With this, it is possible to monitor MPTCP connections, and even interact with them, e.g. by changing socket options per subflow. Later, mptcpify BPF program has been added to force the creation of MPTCP sockets instead of TCP ones.

Until recently, these tests – and the ones for the work-in-progress MPTCP BPF packet schedulers – were not validated by our CI. We didn’t track regressions in this area. With the help of Geliang, our CI scripts have been adapted to run these tests. Recently, I added a “matrix” support on GitHub Action to be able to run these tests requiring more kernel config options in a dedicated runner.

Virtme NG

Virtme is very useful to quickly run a VM with a custom kernel, and using the file system of the host (or in our case, the one of a container containing all required dependences). We have been using it since 2019, and we were happy with it.

In 2020, it looks like this Virtme project started to get unmaintained. In December 2022, we had to patch it to support kernels >= 6.2. More recently, another patch was required to support QEmu >= 7.2. Andrea Righi started to gather different fixes on his side, before creating the virtme-ng project in 2023.

virtme-ng brings interesting features introduced in this nice LWN article. Switching to it would reduce the boot time, and reduce a lot the I/O thanks to virtiofs. So that’s what we did recently. It should also help us for the long term maintenance.

Tracking regressions

Since we use a public CI, results are simply published on an IRC channel (#mptcp-ci). This is not really easy to track regressions.

Publish Test Results GitHub Action has been added, but it doesn’t keep a long history of results.

A new “Flakes” has then been created to help us to track unstable tests. It is similar to Netdev’s Flakes page (with dark scheme support :) ).

It is a shame such service is not better integrated in GitHub Actions. In a perfect world where tests are all stable, it should not be needed. But here, when hosts need to talk to each other, packets can be delayed for some reason, causing retransmissions, etc. It is not easy to predict everything. The cURL project is using TestClutch, but it is an external service to deploy, and it doesn’t support the TAP format yet.

What’s next?

Big work has been started to rewrite mptcp.dev website. When working on adding native MPTCP support to apps like lighttpd and curl, it was clear that a website gathering all required info to know about MPTCP to set it up, and to add its support in apps were missing. (Note: our website was updated on the 18th of April, it was looking like this before.)

Publishing a doc in the kernel official documentation will also help end-users and app developers.

In terms of developments, the next priorities are adding missing features to have MPTCP enabled by default in Go.

Team work

As always, it is important to note that what I presented here so far is mostly what I was working on. But I’m not alone in this project. For example, Geliang continued to do some clean-ups in the KSelfTests, looked at the MPTCP support in IPerf3, and started to look at adding “last time” counters in MPTCP_INFO. Mat and Paolo helped with the reviews, and Christoph looked at running fuzzing tests on top of the last RHEL kernel.

A great community!