macOS has a 49.7-day networking time bomb built in that only a reboot fixes — comparison operation on unreliable time value stops machines dead in their tracks


According to Photon, the current mitigation is a reboot, although the team says it's working on an alternative solution. They also found this issue to be the source of some bugs discussed in the Apple Community forums. The long-standing RFC 7323 specifies how a TCP timestamp clock (tcp_now in Apple's kernel) should behave when it reaches its limit, but Apple's kernel implements this incorrectly. It's safe to say this issue will likely be fixed quickly, and hopefully before 49.7 days after the report.
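The class of failure described here, a comparison on a 32-bit timestamp that wraps around, can be illustrated with a minimal Python sketch. This is not XNU's code: the function names `naive_after` and `wrap_safe_after` are hypothetical, and the wrap-safe version uses the modular comparison commonly applied to 32-bit counters, which assumes the two timestamps are less than 2^31 ticks apart.

```python
MASK32 = 0xFFFFFFFF  # keep values in the range of a uint32_t


def naive_after(a: int, b: int) -> bool:
    """Plain comparison: gives the wrong answer once the counter has wrapped."""
    return a > b


def wrap_safe_after(a: int, b: int) -> bool:
    """True if a is later than b, treating both as points on a 32-bit circle.

    Only valid while the two timestamps are less than 2**31 ticks apart.
    """
    diff = (a - b) & MASK32
    return diff != 0 and diff < 0x80000000


before_wrap = 0xFFFFFFF0                  # 16 ms before the counter overflows
after_wrap = (before_wrap + 32) & MASK32  # 32 ms later, now a small number

print(naive_after(after_wrap, before_wrap))      # False
print(wrap_safe_after(after_wrap, before_wrap))  # True
```

The naive check concludes that the later timestamp is earlier, which is the kind of mistake that can leave time-based cleanup logic waiting indefinitely.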


Bruno Ferreira is a contributing writer for Tom's Hardware. He has decades of experience with PC hardware and assorted sundries, alongside a career as a developer. He's obsessed with detail and has a tendency to ramble on the topics he loves. When not doing that, he's usually playing games, or at live music shows and festivals.

JamesJones44: I'm not so sure I agree with their analysis, at least not 100%. I have a 2018 Mac Mini running as a reverse proxy for a VPN that never gets rebooted. In fact, the last time it was rebooted was December 15th, when I upgraded it to macOS 26. It has been running and serving the reverse proxy for 113 days. Perhaps this only affects a subset of macOS hardware, but Photon is claiming it affects any Mac, which my 2018 Mac Mini seems to disprove.

CelicaGT: I never reboot my Macs, excepting for updates. What is the author doing to require a reboot every few weeks? This isn't Windows '98…

Rhongomiant: I have definitely seen issues like what is described here, but unfortunately they are not limited to this 49-day issue. I haven't tracked uptime days when these issues have occurred, but currently I have an issue where the screen share session from a MBP 16" M3 to a MBA 15" M4 drops and reconnects at ~10 minutes of use or less. This started yesterday. The day before and every day before that was fine. The MBP has been up for 16.7 days and the MBA has been up for 7.5 days.

I have operated hundreds of Linux servers since 2008 and have had to reboot servers less than I have had to reboot my Macs for issues like this. I haven't used Windows regularly since 2015; I never experienced this issue there and haven't heard of Windows users having similar issues then or now. I didn't have these issues until I got an M-series Mac. My 2019 MBP didn't have them when I used it regularly, so it must be something that has been added or updated in more recent versions of macOS, maybe tied to specific hardware at either the hardware or code level (Intel code vs. Arm code).

PEnns: From the article, it seems this issue is confined to Photon and its "fleet of Macs" (somebody correct me if I am wrong). My take on this: ANY disgruntled third-rate coder / administrator can write a Date()-based routine that will cause issues, say, starting every 49 days after they have been let go… If my theory = True, they're lucky their data doesn't get erased after X days, repeating until the real reason is found!

palladin9479:
CelicaGT said: "I never reboot my Macs, excepting for updates. What is the author doing to require a reboot every few weeks? This isn't Windows '98…"

It's in the Tom's article, but it goes deeper when you read the linked Photon article: https://photon.codes/blog/we-found-a-ticking-time-bomb-in-macos-tcp-networking

"The bug we found in macOS belongs to this exact family. The XNU kernel stores its TCP timestamp as a uint32_t counting milliseconds since boot. 2³² milliseconds = 49 days, 17 hours, 2 minutes, and 47.296 seconds. After that, the counter wraps back to zero. What happens next is the subject of this post."

They wrote some test code and observed that yes, the macOS kernel, at the article's time of writing, had a bug in the networking stack that can cause TCP connection cleanup to break when the TCP timestamp value rolls over the 32-bit counter.

This is the decisive evidence. macOS TIME_WAIT timeout is 2 × MSL = 30 seconds. 84 seconds after the script stopped, all 2,828 TIME_WAIT connections should have expired to zero. Instead, not a single one was reclaimed; the count actually increased slightly as the system's own normal connections also began piling up.

You can only have so many TCP connections open at once, so once cleanup stops working it's merely a matter of time until you hit that limit and no more new TCP connections can be made. They even go into exactly where in the kernel code the bug is.
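The 49-day figure quoted from the Photon article can be checked directly. The short Python sketch below only assumes what the quote states: a 32-bit counter that ticks once per millisecond.

```python
WRAP_MS = 2**32  # a uint32_t millisecond counter wraps after this many ms

# Break the total down into days, hours, minutes, seconds and milliseconds.
total_seconds, millis = divmod(WRAP_MS, 1000)
minutes_total, seconds = divmod(total_seconds, 60)
hours_total, minutes = divmod(minutes_total, 60)
days, hours = divmod(hours_total, 24)

print(f"{days} d {hours} h {minutes} min {seconds}.{millis:03d} s")
# 49 d 17 h 2 min 47.296 s
```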

JamesJones44:
PEnns said: "From the article, it seems this issue is confined to Photon and its "fleet of Macs" (somebody correct me if I am wrong)."

I feel like it has to be. There are several Mac hosting services out there (Mac Stadium, for example), and I find it hard to believe no other service has seen this. Reading the source article, it sounds like it has more to do with the number of connections plus time than with just a raw 49.7 days. A low number of connections sounds like it wouldn't be an issue for a very long time, but rapid connections cause the issue. I could be wrong, but that is how I've read it. I still have my suspicions about the analysis as a whole (no doubt they found something, but is it really the problem, or is it the way they are interacting with the system?). It seems like we would have heard of this before now, but at least the added number-of-connections variable makes it more plausible.

palladin9479:
JamesJones44 said: "I feel like it has to be. There are several Mac hosting services out there (Mac Stadium, for example), and I find it hard to believe no other service has seen this. Reading the source article, it sounds like it has more to do with the number of connections plus time than with just a raw 49.7 days. A low number of connections sounds like it wouldn't be an issue for a very long time, but rapid connections cause the issue. I could be wrong, but that is how I've read it. I still have my suspicions about the analysis as a whole (no doubt they found something, but is it really the problem, or is it the way they are interacting with the system?). It seems like we would have heard of this before now, but at least the added number-of-connections variable makes it more plausible."

The original article lists where in the kernel code they found the bug; it absolutely exists. When your Mac starts up, the kernel begins counting the milliseconds since boot in a 32-bit integer. An unsigned 32-bit integer can hold 4,294,967,296 distinct values (0 through 4,294,967,295), so after 4,294,967,296 milliseconds the variable rolls over to 0 and starts counting up again. The bug is in the TCP connection garbage collection code of the network stack, which is where old TCP connections are closed out after a period of time. This code doesn't know how to handle that 32-bit integer rolling over and stops working once this occurs.

Once your TCP connection garbage collection stops working, you start getting zombie connections, each consuming a TCP socket, of which there are only 65,536. As connections are opened over time they continue consuming those sockets until you hit 65,536 open sockets. When that happens, your network stack can no longer open new TCP connections, resulting in a complete shutdown of TCP network capability.

Networking will not work correctly until you restart the kernel to clear those values and restart the TCP network stack. If you are doing security updates every 30~60 days, then that reboot cycle is naturally doing this for you, especially as Macs are not used as web servers that handle tens of thousands of TCP connections at once like you see with Apache / Nginx on Linux. Photon did try to use Macs for this sort of task (why, IDK) and tripped over this use case.
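The point about connection volume can be made concrete with a rough estimate. Assuming, as the comment above describes, that closed connections are never reclaimed after the wrap and that the ceiling is 65,536 sockets, the time to exhaustion depends only on how quickly new connections are opened. The function name and the sample rates below are illustrative, not measurements from Photon or anyone in this thread.

```python
SOCKET_LIMIT = 65_536  # figure quoted in the comment above


def hours_until_exhaustion(new_connections_per_minute: float,
                           limit: int = SOCKET_LIMIT) -> float:
    """Hours until `limit` unreclaimed connections pile up at a steady rate."""
    if new_connections_per_minute <= 0:
        raise ValueError("rate must be positive")
    return limit / new_connections_per_minute / 60


# Hypothetical rates: a mostly idle desktop, a light server, a busy proxy.
for rate in (1, 60, 2000):
    print(f"{rate:>5} conn/min -> {hours_until_exhaustion(rate):8.1f} hours")
```

Under these assumptions, a machine opening one connection per minute would take about 45 days after the wrap to run out, while one opening 2,000 per minute would run out in roughly half an hour. That would be consistent with lightly used Macs showing no symptoms at long uptimes.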

abufrejoval: I immediately went to 1000 Hz ticks overflowing a 32-bit integer after reading the headline: this has been an issue with just about any Unixoid since they went to 64-bit and increased the clock tick from 100 to 1000 in the process.

I faced that in a production payment system which lost Oracle database access after some time. Nothing wrong with the database, nothing wrong with any of the systems, all diagnostics came back with perfect health, yet the application had lost the database and nobody could pay with their cards. Doing the thing you do when you don't know any better, we restarted the application, or rather the application's OpenVZ container, and things went back to normal immediately… until the same thing happened again.

I was eventually able to create and analyze a crash dump of the failing application just before it got restarted. On its stack trace I noticed that it was failing within an Oracle client library and had gotten an error on a system call to some variant of time or utime that evidently wasn't expected and then also not handled correctly. That syscall returned the ticks since boot and noted that ticks were traditionally measured at 100 Hz on Unix since the PDP-11, but might be 1000 Hz on gigahertz 64-bit systems, which tended to require much higher precision because they did so much more work, while returning a 64-bit value offered a lot more headroom. A mental groan started forming in my head as I popped up my Excel, and sure enough, 32 bits would overflow at close to 50 days at 1000 Hz ticks, which matched the data from the crash dump and the error code returned (but not properly handled).

Just in case you're worried, there are quite a few new and far more precise time management libraries and tools on Unix these days…

So anyhow, I proudly opened a bug report with Oracle, and Oracle disappointingly quickly replied that it had been fixed a few months ago and that it was time I patched our systems… Of course this must have been at least 20 years ago, but I felt pretty good having found a bug in Oracle software without access to source code. And at the time my job as an operations team manager didn't even require any programming skills. I just happened to have a CS master's degree and had done years of OS kernel and library work as part of my thesis in HPC, and thus felt I should at least try before just putting a "reboot every month" into the SOPs. Of course then came PCI-DSS, and not rebooting at least once a month (after installing patches) became a security framework violation… I wonder how many more of these bugs have thus never been caught!

The reason we got into that situation in the first place was that, like everyone else at the time, we were consolidating workloads from hundreds of small 32-bit Linux servers onto far fewer 64-bit Linux servers via virtualization. Except we weren't using virtual machines, which would have avoided that issue (via a 100 Hz tick kernel), but an IaaS abstraction (pre-Docker), OpenVZ containers, which run 32-bit and 64-bit userlands on a 64-bit kernel. And we got 70:1 consolidation with only a RAM upgrade on pre-existing machines, while VMs typically only managed 4:1 on existing hardware, if and only if it had VM hardware support built in. Of course the issue wasn't OpenVZ; any 32-bit workload on a 64-bit kernel would have had the same issue. Since I had been doing that conversion as a skunkworks project without management approval (no budget, no ask), the pressure (and motivation) was on the high side.
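The spreadsheet check described above is easy to reproduce. The sketch below assumes only an unsigned tick counter of a given width incrementing at a fixed rate; `days_until_wrap` is a hypothetical helper name.

```python
def days_until_wrap(tick_hz: int, counter_bits: int = 32) -> float:
    """Days before an unsigned tick counter of the given width overflows."""
    seconds = 2**counter_bits / tick_hz
    return seconds / 86_400  # seconds per day


print(f"100 Hz, 32-bit:  {days_until_wrap(100):.1f} days")   # 497.1 days
print(f"1000 Hz, 32-bit: {days_until_wrap(1000):.1f} days")  # 49.7 days
```

Raising the tick rate from 100 Hz to 1000 Hz cuts the time to overflow by a factor of ten, from well over a year to about seven weeks, which is why the problem tends to surface only on long-running systems that are rarely rebooted.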

Pierce2623:
JamesJones44 said: "I'm not so sure I agree with their analysis, at least not 100%. I have a 2018 Mac Mini running as a reverse proxy for a VPN that never gets rebooted. In fact, the last time it was rebooted was December 15th, when I upgraded it to macOS 26. It has been running and serving the reverse proxy for 113 days. Perhaps this only affects a subset of macOS hardware, but Photon is claiming it affects any Mac, which my 2018 Mac Mini seems to disprove."

I'm guessing it's Apple Silicon Macs.

CelicaGT:
Pierce2623 said: "I'm guessing it's Apple Silicon Macs."

My M3 Air is at just over 60 days of uptime with no issues observed, and my 2020 Intel i5 Air is at 180 days (checked both in Terminal) with no issues either. The explanation above, in response to my earlier and rather glib post, makes sense, though I'm still not convinced this is present in all use cases. My machines sit mainly idle; one is a DD for me and the other is used by the family for general computing. FWIW, I am by no means any kind of networking expert. I can do many things on my own as a computer enthusiast, but networking is just not one of them (well, not without some coaching). So basically I'm not disputing the article or the researchers; it's just that my machines seem fine, so it's likely I'm missing some piece of the pie here. It's fine, it's an interesting read, and that's why I continue to come to Tom's. And now I'll be watching these things like a hawk lol…
