Summer update and MPTCP features in Linux v6.18
Long time no see (or read?) as we could say! The last update was in January. Since then, we have been very busy! Read on to find out what happened around MPTCP during the last few months, and which new features will be present in the future v6.18.
Activities
In March, I was at Netdev 0x19 in Zagreb, where I held a BoF session: MPTCP: present, future, and its development workflow (CI). Do not hesitate to check the video or the slides. The session covered different aspects of MPTCP: what it is, its use-cases, and its different components, then how easy it is to use MPTCP today with a recent and up-to-date Linux environment. There were some words about the current status and what was planned, followed by some discussions. A second part covered the development workflow, and how a CI with a specific setup can greatly help!
Soon after, I started a temporary part-time contract at UCLouvain, as a Research assistant in the IP Networking Lab. That was a great opportunity to work with excellent colleagues, learn more about current research in the academic world, and contribute to various scientific research projects. It was also a way to get financial support for the MPTCP maintenance, plus access to some servers to run syzkaller, an excellent kernel fuzzer, to continue finding bugs in the current implementation.
A few months ago, I got a mission to find a solution for middleboxes intercepting TCP connections, and thus forcing MPTCP to fall back to “plain” TCP. This resulted in the TCP-in-UDP eBPF program. Please check the dedicated blog post for more details about that.
In July, I presented two unrelated and independent extensions to the MPTCP protocol. The first one extends the Data-Level Length (DLL) size to allow MPTCP packets of more than 64 KB, mainly to allow internal egress packets of more than 64 KB, and to improve performance in a data centre. It can also be helpful when IPv6 jumbograms are used. See this draft for more details. The second extension suggests using application-level keys to better secure MPTCP when establishing new subflows, announcing addresses, and resetting connections. See this other draft for more explanations about this idea. If you know a company or an actor present on the Internet that is interested in these extensions and can help to get them accepted, feel free to contact me.
Finally, it is important to note that more funding around MPTCP recently got accepted! 🎉 Thanks again, NLnet, for your invaluable support!
New features
Better MPCapable’s C-flag support on the client side
The MPTCP protocol and its implementation in the Linux kernel support
deployments behind load-balancers. This is typically used by CDNs. When a
layer-4 load-balancer is in place, a connection will be handled by one server
out of many placed behind it. In other words, multiple servers are accepting
connections to the same IP address and port. An MPTCP connection can be
composed of … multiple TCP subflows (paths), and it is important to make sure
new path requests (MPJoin) reach the right end-server. If such a path request
is sent to the original IP address and port, there is a high chance the
load-balancers will route it to a different end-server. To cope with that, the
MPTCP protocol allows a host to set a flag (C-flag) in the connection request
(MPCapable) to tell the receiver it must not try to open any additional
subflows toward this address and port. Instead, the same host will announce a
unique IP address and port that can be used to reach the right end-server. For
more details about this case, please see this page: Deployment behind a load
balancer.
The implementation on the server side has been supported for a few years now on
Linux, and is already widely used. A server simply has to set the
net.mptcp.allow_join_initial_addr_port sysctl knob to 0, and add a signal MPTCP
endpoint with a dedicated IP address and an optional port.
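As a rough sketch of the two server-side steps described above, the helper below only builds the corresponding commands, without executing them. The function name, the 192.0.2.10 documentation address, and port 443 are placeholders chosen for illustration; the sysctl name comes from the text, and the endpoint command assumes the `ip mptcp endpoint add` syntax from iproute2.

```python
import shlex


def server_side_commands(address, port=None):
    """Build the commands for an MPTCP server behind a load balancer.

    Nothing is executed here: the commands are returned as argument lists.
    """
    # Ask clients (via the C-flag) not to open extra subflows to the
    # initial address and port, shared with the other servers.
    sysctl = ["sysctl", "-w", "net.mptcp.allow_join_initial_addr_port=0"]

    # Announce a dedicated address (and optionally a port) that reaches
    # this server only: a 'signal' MPTCP endpoint.
    endpoint = ["ip", "mptcp", "endpoint", "add", address]
    if port is not None:
        endpoint += ["port", str(port)]
    endpoint.append("signal")

    return [sysctl, endpoint]


if __name__ == "__main__":
    # Placeholder address and port, to be adapted to the real deployment.
    for cmd in server_side_commands("192.0.2.10", port=443):
        print(shlex.join(cmd))
```

Both commands need to be run as root on each end-server, each with its own dedicated address.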
So far, it looks like this setup was mainly used when interacting with iOS devices, so without the Linux kernel on the client side. On the Linux client side, the in-kernel path-manager respected the C-flag by not establishing new paths to the initial address, but that was it. By default, with the C-flag and the in-kernel path-manager, a client with multiple interfaces was not using the non-primary ones to establish extra paths: the extra interfaces are by default only used to create new paths to the initial address of the server, which is not allowed in this case. This behaviour was not ideal. A fix has recently been sent to improve this situation: in this particular case, the in-kernel path-manager now considers using the other MPTCP endpoints to establish new paths to the announced address.
With the userspace path-manager, the userspace daemon didn’t know when the other
peer had set this C-flag. That means it was not able to respect the protocol
when the flag was set. The kernel now announces this info, and the “official”
userspace daemon (mptcpd) will support it soon.
New laminar endpoints
Before Linux v6.18, upon the reception of an ADD_ADDR (and when the fullmesh
flag was not used), the in-kernel PM was only creating new subflows using the
local address picked by the routing configuration. That works well when the
announced addresses can be predicted, but not on the Internet with servers
controlled by someone else. Instead, it is easier to pick local addresses from a
selected list of endpoints, and use them only once, than to rely on routing
rules. laminar endpoints have been added in v6.18 for this purpose.
In other words, on the client side, it is now recommended to set both the
subflow and laminar flags by default. If both the client and the server sides
have multiple network interfaces they want to use, it might be interesting to
use only the laminar flag on all client side MPTCP endpoints, and only the
signal one on all server side MPTCP endpoints.
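The recommendation above could be sketched as follows for a client with two interfaces. This helper only builds the commands. It assumes an iproute2 version recent enough to accept the laminar keyword in `ip mptcp endpoint add`; the function name, interface names, and addresses are placeholders.

```python
def client_endpoint_commands(endpoints, laminar_only=False):
    """Build 'ip mptcp endpoint add' commands for client side endpoints.

    endpoints: iterable of (device, address) pairs.
    laminar_only: set it when the server also has multiple interfaces
                  and announces them with 'signal' endpoints.
    """
    # Default recommendation from the text: both flags on the client side.
    flags = ["laminar"] if laminar_only else ["subflow", "laminar"]
    return [
        ["ip", "mptcp", "endpoint", "add", address, "dev", device, *flags]
        for device, address in endpoints
    ]


if __name__ == "__main__":
    # Placeholder interfaces and documentation addresses.
    client = [("eth0", "192.0.2.2"), ("wlan0", "198.51.100.2")]
    for cmd in client_endpoint_commands(client):
        print(" ".join(cmd))
```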
mptcpd: security report & improvements
Thanks to the NLnet funding, Radically Open Security B.V. did a security review of mptcpd. Thank you, Tim and Marcus, for this great work! No security issues have been found 🎉
The report mentioned one attention point: the plugin directory should not be
world-writable, to prevent other pieces of code from being executed with extra
permissions (CAP_NET_ADMIN). The full report is available here.
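To check this attention point on a given system, a small script like the one below can tell whether a directory is world-writable. It is only a sketch: the plugin directory depends on how mptcpd was built and configured, so the path is taken from the command line rather than assumed, and the function name is made up for this example.

```python
import os
import stat
import sys


def is_world_writable(path):
    """Return True if 'others' have write permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWOTH)


if __name__ == "__main__":
    # Usage: python3 check_dir.py <mptcpd plugin directory>
    plugin_dir = sys.argv[1]
    if is_world_writable(plugin_dir):
        print(f"{plugin_dir} is world-writable: permissions should be fixed")
    else:
        print(f"{plugin_dir} is not world-writable")
```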
In terms of improvements, it is good to note that mptcpd is now available in
more Linux distributions: OpenWrt, Alpine Linux, NixOS, etc. A future v0.14
version is planned, and it will include some new features around mptcpize:
setting the GODEBUG=multipathtcp=1 environment variable, and also appending to
LD_PRELOAD if previously set, instead of overriding it. This version should
also support the new laminar endpoints, and the new deny_join_id0 parameter.
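The environment handling planned for mptcpize can be illustrated with the sketch below. It is not mptcpize's code: the function name and the libmptcpwrap.so library name are placeholders, the ":" separator relies on the dynamic linker accepting colon-separated LD_PRELOAD lists, and keeping existing GODEBUG settings (comma-separated) is an assumption going beyond what is described above.

```python
import os


def mptcpize_like_env(wrapper_lib, base_env=None):
    """Return a copy of the environment prepared to force MPTCP."""
    env = dict(os.environ if base_env is None else base_env)

    # Go applications: enable MPTCP via GODEBUG, keeping other settings
    # (assumption: existing GODEBUG entries should be preserved).
    godebug = env.get("GODEBUG")
    env["GODEBUG"] = f"{godebug},multipathtcp=1" if godebug else "multipathtcp=1"

    # Other applications: append the wrapper library to LD_PRELOAD instead
    # of overriding what was previously set.
    preload = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{preload}:{wrapper_lib}" if preload else wrapper_lib

    return env


if __name__ == "__main__":
    # 'libmptcpwrap.so' is a placeholder name for the wrapper library.
    env = mptcpize_like_env("libmptcpwrap.so", {"LD_PRELOAD": "libfoo.so"})
    print(env["LD_PRELOAD"], env["GODEBUG"])
```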
User applications
Quite a few new applications now have a dedicated option to enable MPTCP support: iperf3, sing-box, Valkey, FreeNginx, etc. Please also note that since Go 1.24, all applications written in Go have MPTCP enabled by default on the server side! This includes Caddy, Traefik, Shadowsocks Go, and many more!
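For applications that do not have such an option yet, adding MPTCP support is usually a small change at socket creation time. Here is a minimal sketch in Python, with a hypothetical helper name: it asks for an MPTCP socket and falls back to plain TCP if the kernel refuses. The numeric value 262 is the Linux protocol number for MPTCP, used when the socket module does not expose IPPROTO_MPTCP (older Python versions).

```python
import socket

# socket.IPPROTO_MPTCP is available on Linux with Python >= 3.10;
# 262 is the protocol number used by the Linux kernel.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def create_stream_socket(family=socket.AF_INET):
    """Create an MPTCP socket, or a TCP one if MPTCP is not available."""
    try:
        return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        # MPTCP not supported by this kernel, or disabled on this host.
        return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP)


if __name__ == "__main__":
    with create_stream_socket() as sock:
        kind = "MPTCP" if sock.proto == IPPROTO_MPTCP else "TCP"
        print(f"Got a {kind} socket")
```

The returned socket is then used like any other stream socket: the rest of the application does not need to change.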
Miscellaneous
When working on current and future features around the path-manager, a lot of clean-ups have been done by Geliang and me. Some were required to allow new features, but others have also been added to improve the code itself by renaming variables, splitting large functions, regrouping code per purpose, etc. This might require a bit more attention during backports, but it will help with the maintenance in the long term.
To help with the debugging, new MIB counters for the rejected MPJoin and for
fallbacks to TCP have been added by Paolo and me. Some of them have been
validated by Gang when working on improving the code coverage when running the
whole test suite.
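The MPTCP MIB counters are exposed in /proc/net/netstat, on the lines prefixed with MPTcpExt. As a small sketch, the hypothetical helper below extracts them from the content of that file. The sample values are made up, and the two counter names are only examples of existing counters, not the new ones mentioned above.

```python
def parse_mptcp_counters(netstat_text):
    """Extract the MPTcpExt counters from /proc/net/netstat content."""
    lines = netstat_text.splitlines()
    counters = {}
    # The file is made of pairs of lines sharing the same prefix:
    # the first one lists the names, the second one the values.
    for header, values in zip(lines[::2], lines[1::2]):
        prefix, _, names = header.partition(":")
        values_prefix, _, numbers = values.partition(":")
        if prefix != "MPTcpExt" or values_prefix != prefix:
            continue
        counters.update(zip(names.split(), map(int, numbers.split())))
    return counters


# Made-up sample, following the format of /proc/net/netstat.
SAMPLE = """\
TcpExt: SyncookiesSent SyncookiesRecv
TcpExt: 0 0
MPTcpExt: MPCapableSYNRX MPJoinSynRx
MPTcpExt: 3 1
"""

if __name__ == "__main__":
    print(parse_mptcp_counters(SAMPLE))
```

On a real system, the content of /proc/net/netstat can be passed to this function instead of the sample. The nstat tool from iproute2 shows the same counters.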
The MPTCP CI was taking more and more time due to the addition of new tests. To
accelerate the whole process, more builders are used in parallel: the
mptcp_join selftest is now executed in a dedicated job for the normal and debug
modes. Results can now be shared after ~1h15 instead of 2h.
Performance is being improved thanks to the work from Paolo and Christoph! More work is still ongoing, and a proper perf regression lab should be put in place soon. More explanations will be shared in a later blog post.
Regarding the socket options, TCP_MAXSEG has been added by Geliang, and an
MPTCP version of SO_MAX_PACING_RATE from Christoph is in discussion. More work
will be done around the socket options to simplify the code and improve the
maintenance in the long term.
When an address is announced by a peer via an ADD_ADDR, the signalling packet,
carried in a TCP ACK, can be lost. Before v6.18, the retransmissions were done
after a timeout controlled by the net.mptcp.add_addr_timeout sysctl knob. The
default value is set to 2 minutes, which is a safe choice, but certainly too
high for most use-cases. Geliang changed its behaviour: the sysctl value is now
used as a maximum for the timeout, which instead depends on the connection’s
round-trip-time (RTT) to better adapt to the situation.
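To give an idea of the effect of this change, here is a toy model of an RTT-based timeout capped by the sysctl value. It is not the kernel's formula: the function name and the RTT multiplier are purely hypothetical, chosen for illustration. Only the 2 minutes maximum comes from the default mentioned above.

```python
def add_addr_retrans_timeout(rtt_s, max_timeout_s=120.0, rtt_multiplier=4.0):
    """Toy model: ADD_ADDR retransmission timeout, in seconds.

    rtt_multiplier is a hypothetical parameter, not taken from the kernel.
    """
    if rtt_s <= 0:
        # No RTT estimate available: use the configured maximum.
        return max_timeout_s
    # The timeout follows the RTT, but never exceeds the sysctl value.
    return min(rtt_s * rtt_multiplier, max_timeout_s)


if __name__ == "__main__":
    for rtt in (0.05, 1.0, 60.0):
        print(f"RTT {rtt}s -> timeout {add_addr_retrans_timeout(rtt)}s")
```

With such a model, a connection with a 50 ms RTT would retransmit a lost ADD_ADDR in well under a second instead of waiting 2 minutes, while very slow paths remain bounded by the sysctl value.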
Last but not least, thanks to Paolo for helping with some fixes, to Mat for the code review, and to everybody who has reported issues, sent fixes and promoted MPTCP! A great community!
Conclusion
Quite a lot of new features and improvements will be present in the future Linux kernel LTS version (v6.18)! Looking forward to even more of them in the coming months!
If you like my work and wish me to continue doing so, you can become a sponsor via LiberaPay, GitHub or Patreon.
Please contact me for professional collaborations, short or long missions, or for financial support for my contributions to the maintenance of MPTCP and various apps around it.