Linux's Bedtime Routine

206 points by JNRowe 10 months ago

nyanpasu64 10 months ago

I did some digging into Linux power management a while ago, while debugging a (not-fully-fixed) Linux AMDGPU dGPU crash on low memory (https://gitlab.freedesktop.org/drm/amd/-/issues/2362). Along the way, I discovered that you can hibernate both through /sys/power/disk and through the userland snapshot/hibernate/suspend interface (https://docs.kernel.org/power/userland-swsusp.html, snapshot_ioctl()). IIRC these two mechanisms take quite different code paths internally.
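A rough sketch of inspecting the first of those two interfaces from userspace. It assumes the usual sysfs convention that /sys/power/disk lists the supported hibernation modes on one line with the currently selected one in square brackets; the function names are made up for illustration and nothing here triggers hibernation.

```python
from pathlib import Path


def parse_bracketed_choices(text):
    """Split a line like 'a [b] c' into (['a', 'b', 'c'], 'b').

    The bracketed token is the currently selected entry; if none is
    bracketed, the second element of the result is None.
    """
    options, selected = [], None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        options.append(token)
    return options, selected


def read_hibernation_modes(path="/sys/power/disk"):
    """Return (supported modes, selected mode) from the given sysfs file."""
    return parse_bracketed_choices(Path(path).read_text())
```

On a machine with hibernation support, `read_hibernation_modes()` should return the mode list and the active mode; on one without it, reading the file may fail or return an unexpected line, which this sketch does not handle.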

The specific crash bug I encountered happens because Linux calls pm_restrict_gfp_mask() to prevent swapping to disk before it calls dpm_prepare() (the first opportunity for a GPU driver to back up VRAM to system RAM before the PCIe GPU is shut down and VRAM is lost). So if you don't have enough free system RAM to hold all VRAM, the sleep is either aborted midway through (waking the system) or produces a failed memory allocation later during sleep or resume. The latter often results in undefined system state, halted network or USB controllers, or worse yet a halted NVMe controller, leaving the system running around like a headless chicken, unable to load data from disk or even log to the journal. I'm wondering whether this was a deliberate decision or an unforeseen interaction between suspend-time GFP masks and GPU drivers.

It seems Nvidia can't reliably back up VRAM either without being informed by systemd before the kernel initiates suspend (https://download.nvidia.com/XFree86/Linux-x86_64/560.35.03/R...).

nubinetwork 10 months ago

In a way, I'm not surprised... when I reboot my systemd-based servers, it almost seems (based on the speed) as if it doesn't do anything to the running services and filesystems and just tells the kernel to reboot immediately.
- zokier 10 months ago
  
  Why guess when the shutdown(/reboot) process is explicitly documented:
  > Shortly before executing the actual system power-off/halt/reboot/kexec systemd-shutdown will run all executables in /usr/lib/systemd/system-shutdown/ and pass one arguments to them: either "poweroff", "halt", "reboot", or "kexec", depending on the chosen action. All executables in this directory are executed in parallel, and execution of the action is not continued before all executables finished. Note that these executables are run after all services have been shut down, and after most mounts have been unmounted (the root file system as well as /run/ and various API file systems are still around though).
  https://www.freedesktop.org/software/systemd/man/devel/syste...
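To make the quoted contract concrete, here is a minimal sketch of an executable that could live in /usr/lib/systemd/system-shutdown/. It relies only on what the quote states (one argument, one of four values); the `describe` helper and the message text are hypothetical, and where stderr ends up that late in shutdown is not something this sketch assumes.

```python
#!/usr/bin/env python3
import sys

# The four values named in the quoted documentation.
VALID_ACTIONS = ("poweroff", "halt", "reboot", "kexec")


def describe(argv):
    """Return a message for the action passed as the single argument."""
    if len(argv) != 2 or argv[1] not in VALID_ACTIONS:
        raise ValueError(f"expected one of {VALID_ACTIONS}, got {argv[1:]}")
    return f"shutdown hook: final action is {argv[1]}"


# Only act when invoked with exactly one argument, as systemd-shutdown does.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(describe(sys.argv), file=sys.stderr)
```

Since the quote says all such executables run in parallel and block the final action until they finish, a real hook should be quick and should not depend on other hooks having run.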
  - bbarnett 10 months ago
    
    Yup. And unlike sysvinit Debian systems I run (hundreds under bookworm), systemd init systems (I run thousands) require all sorts of workarounds for this sort of behaviour. I get VMs not rebooting due to NFS umount failures, VMs not logging shutdown info because rsyslogd is terminated too early, literally endless issues.
    Killing services without a proper TERM and wait prior to -9 is only one of the wonderful shortcomings I find with systemd.
    
    zokier 10 months ago
    
    TERM, wait, KILL is exactly what systemd does by default, and it's again configurable (and documented):
    > If no ExecStop= commands are specified, the service gets the SIGTERM immediately. This default behavior can be changed by the TimeoutStopFailureMode= option. Second, it configures the time to wait for the service itself to stop. If it doesn't terminate in the specified time, it will be forcibly terminated by SIGKILL (see KillMode= in systemd.kill(5)).
    https://www.freedesktop.org/software/systemd/man/latest/syst...
    https://www.freedesktop.org/software/systemd/man/latest/syst...
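The TERM, wait, KILL sequence described in the quote can be sketched for a single child process as below. This is an illustration of the pattern, not systemd's implementation: systemd applies it to a unit's processes according to KillMode=, whereas this hypothetical `stop_process` helper handles one `subprocess.Popen` object. The 90-second default is an assumption mirroring systemd's usual stop timeout.

```python
import subprocess
import sys


def stop_process(proc, timeout_s=90.0):
    """Ask a child to exit, then force it if it does not comply in time.

    Returns "terminated" if the process exited within the timeout after
    SIGTERM, or "killed" if SIGKILL was needed.
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=timeout_s)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

For example, `stop_process(subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"]), timeout_s=5)` should come back quickly, because a sleeping Python process does not block SIGTERM.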

amelius 10 months ago

I suspend my Linux box every night. However, I notice that after some 30-50 times the machine freezes at a random point when using the machine (even the num-lock led stops working). Curious if others have the same experience, and if it's Linux-related or a problem of my particular hardware.

sillystuff 10 months ago

I suspend my computer every time I walk away from it, as long as it is not running a long-running process. That is often 10+ times per day, and I have never experienced an issue with suspend/resume after working out these two issues:
1) I had to add a hook script to unload the module for the Intel AX210 wireless adapter on suspend and reload it on resume. Before doing that, the laptop would crash every few suspend cycles, and would crash on hibernate every time. This issue may have been addressed in later kernels/firmware, but I've never revisited it. For systemd, hook scripts go into /usr/lib/systemd/system-sleep/.
2) when experimenting with rocm (unsupported on my igpu, but gave it a try anyway), after running rocminfo, the system would resume with a black/blank display, and nothing I tried got the display back. I never got rocm fully working on my laptop, so the solution for this one was simple.
Every laptop I've owned since the mid-90s has had 100% reliable suspend/resume on Linux. Sometimes it "just worked". Sometimes it took some investigation up front to work out an issue (e.g., I used to run swsusp to work around issues with kernel suspend on ATI GPUs, before kernel mode setting (KMS)), but after this initial futzing, it was always 100% reliable.
There is debug logging you can enable to help track down suspend/resume issues and also entries in debugfs, but you may have to resort to trial and error, to track down the issue.
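A sketch of what a hook like the one in item 1 might look like. systemd-sleep calls each executable in /usr/lib/systemd/system-sleep/ with two arguments: "pre" or "post", and the sleep action (for example "suspend" or "hibernate"). The module name below is an assumption, since the comment does not say which module was unloaded; substitute whatever `lsmod` shows for your adapter.

```python
#!/usr/bin/env python3
import subprocess
import sys

# Assumption: the comment above does not name the module. "iwlmvm" is a
# guess for Intel wireless adapters; check lsmod on your own system.
MODULE = "iwlmvm"


def command_for(phase, module=MODULE):
    """Return the modprobe command for a sleep phase, or None to do nothing."""
    if phase == "pre":    # about to suspend or hibernate: unload the module
        return ["modprobe", "-r", module]
    if phase == "post":   # just resumed: load it again
        return ["modprobe", module]
    return None


# systemd-sleep passes exactly two arguments: the phase and the action.
if __name__ == "__main__" and len(sys.argv) == 3:
    cmd = command_for(sys.argv[1])
    if cmd is not None:
        subprocess.run(cmd, check=False)
```

The script has to be executable and runs as root. The second argument is ignored here, but it could be used to treat hibernate differently from suspend.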
- netcoyote 10 months ago
  
  I want to give you props for one of the most Linux guru answers I’ve seen.
  “Oh yeah, Linux is entirely reliable on suspend/resume if you simply diagnose which of the one-hundred odd system services and drivers is not working, and write a script to give it an extra kick.”
  I’m not making fun — I do the same types of stuff as a Linux-lover too!
  On Windows, you just call IT and ask for a new machine under those circumstances!
tasn 10 months ago

I have had the same thing before (but no longer) and haven't been able to find anything online about it. Glad to see I'm not the only one!
Same thing, after 30-50 suspends it freezes randomly. I'm pretty sure it's Linux related as it was fixed on a system upgrade and regressed after another. (Works now with latest Arch)
gkhartman 10 months ago

I've noticed the same thing. It's the delay between waking up and freezing that makes it so frustrating.
I'm using Pop_OS on an AMD 7950x + Nvidia RTX 4080. I'm using the proprietary Nvidia drivers, so I blamed that initially, but I see another comment mentioning that this only happens on their AMD systems, so maybe they are not the culprit this time.
The only solution I've found is to disable sleep/suspend entirely.
not_your_vase 10 months ago

I do see this on my AMD systems (4700U + 5700G) but, interestingly, not on my Intel ones.

heavyset_go 10 months ago

This is a great write up that goes deeper than I expected it to. Glad to have seen it.

pino82 10 months ago

I've only read the first few lines so far. They play around with strings a lot, considering that this is about power management and not word processing.

I'm not a developer at this system level of things. If you usually try to write 'nice' code, concerns like "convert the last space to a newline" come as something of a surprise.

Yes, I know, everything is a file, and this is just the other side of this odd ancient paradigm.

To me it looks tedious. But, well, could be that this is just for me, because I'm not used to it. Maybe it's not a problem at all once you are deeper inside it.

But even from a logical perspective, it is funny: There is a file that contains all available sleep modes. Once you write a particular one into the same file (let's say you open it in a text editor and remove all states but one and then save), the system goes into that sleep mode.
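The interface described here can be sketched in a few lines. This assumes the standard /sys/power/state behaviour: reading it returns a space-separated list of supported states (typically some of "freeze", "standby", "mem", "disk"), and writing one of them starts the transition. One difference from the text-editor picture: on real sysfs the file is not actually modified, so reading it again after resume returns the full list. The function names are hypothetical.

```python
from pathlib import Path


def available_sleep_states(path="/sys/power/state"):
    """Return the sleep states listed in the given file."""
    return Path(path).read_text().split()


def request_sleep(state, path="/sys/power/state"):
    """Write a sleep state to the file after checking that it is listed.

    On a real system this needs root and puts the machine to sleep.
    """
    states = available_sleep_states(path)
    if state not in states:
        raise ValueError(f"{state!r} is not one of {states}")
    Path(path).write_text(state)
```

Passing a different `path` makes it possible to try the logic against an ordinary file first, which is a good idea before pointing it at the real thing.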

Yes, I know, operating systems are different from a tiny web service in Python (and even there you start tricking around with weird http concepts instead)... It was just an observation.

telgareith 10 months ago

I'm not sure where you're coming from here.
"Everything is a file" is literally part of the design philosophy.
That's all it is: design philosophy. Well, besides improper string termination being the root of a staggering number of vulnerabilities.
There's nothing keeping somebody in either Windows or Linux from writing kernel code/drivers that take syscalls instead of text, or text instead of syscalls. Except Microsoft and the Linux community would both decline to include it.
- pino82 10 months ago
  
  At first glance, the Redmond philosophy looks better to me here. I know that they made a lot of marketing around it (files vs APIs). And parts of that are just marketing bs, but there is some truth imho. What is all the string overhead really for? Isn't the client side equally tedious? You write e.g. weird shell scripts that sed/awk/grep some files from procfs or sysfs, spend a lot of time on string parsing, and then there are also corner cases where it fails (sometimes it's enough to have a space in some file names). What do I actually get back from all that complexity? There is probably something; I just haven't recognized it so far.
  I'm asking that as a Linux-only-since-two-decades user btw.
  - ahartmetz 10 months ago
    
    > What do I actually get back from all that complexity?
    Very easy experimentation / exploration, extremely rapid prototyping and one-off scripts.
    I have written a GUI program to show memory pages of a process using procfs, it was fine. About two days of effort to parse and piece together data from obscure procfs files. A well-documented API with example code would have been faster I guess, but a text format with minimal documentation is OK.
    
    jdiez17 10 months ago
    
    I like Drew's racecar analogy (from https://drewdevault.com/2021/12/05/What-desktop-Linux-needs....): Linux is a high-performance F1 car intended mostly for advanced users, other operating systems are like an SUV.
    
    lproven 10 months ago
    
    That is an interesting comparison, from at least 2 different angles.
    1. As a performance car: it's not a particularly high-performance OS compared to a lot of much smaller simpler OSes, such as RTOSes, but also including some former contemporaries (e.g. RISC OS, Symbian, or late-era OS/2). Last year I installed, updated and briefly used Windows XP64 as a desktop OS on some fairly late hardware it can support: a Core 2 Duo with 8GB of RAM, a discrete GPU, and an SSD.
    https://www.theregister.com/2023/07/24/dangerous_pleasures_w...
    It is amazingly fast and responsive compared even to the lightweight end of modern Linux, such as Crunchbang++, Bodhi Linux, or Q4OS. It's also faster and more responsive than OpenBSD or NetBSD.
    The only thing that came close was Alpine Linux and XP64 still wins.
    So I think it's poor as a model of a screaming-fast performance vehicle: it's not one. As Neal Stephenson put it in _In The Beginning Was The Command Line_
    https://web.stanford.edu/class/cs81n/command.txt
    ... it's a sort of super-efficient amphibious armoured car: big, fat, ugly, but can do anything on anything, will get you there, costs nothing and runs on anything (I am thinking of "Mr Fusion" from Back to the Future here.)
    2. So how does it compare to an F1 car? Well, it's fiddly and delicate and complicated and only an expert can drive it.
    It can be easy and nigh-on foolproof. Look at Android or ChromeOS. They are barely recognisable as Linux but they're billion-selling consumer OSes.
    But it doesn't compare well to the performance aspects at all, IMHO. It's the ultimate Swiss army knife: can be used for anything but as a result it's huge and ugly and complicated and won't fit in any pocket.
- nyanpasu64 10 months ago
  
  The funny thing is that Linux does have an alternative ioctl-based suspend interface (https://docs.kernel.org/power/userland-swsusp.html)... with an incompatible API and different purpose from the string-based one...
- p_l 10 months ago
  
  That said, it could have been handled better than having from-scratch string handling in every driver.
  Compare Plan 9's getfields or approach from 9front https://man.9front.org/9/parsecmd
- LegionMammal978 10 months ago
  
  > Except Microsoft and the Linux community would both decline to include it.
  And yet the number of syscalls and ioctls expands nonetheless. E.g., in Linux, they just last year added the listmount() and statmount() syscalls, even though they return substantially the same information as /proc/self/mountinfo, since the latter simply can't be queried as flexibly.
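For comparison, this is roughly what consuming the text interface involves. The sketch parses one line of /proc/self/mountinfo using the field layout documented in proc(5); the `MountEntry` type and function name are made up for illustration, and octal escapes such as \040 for spaces in paths are left undecoded.

```python
from dataclasses import dataclass


@dataclass
class MountEntry:
    mount_id: int
    parent_id: int
    device: str            # "major:minor"
    root: str
    mount_point: str
    options: list
    optional_fields: list  # zero or more "tag[:value]" items
    fs_type: str
    source: str
    super_options: list


def parse_mountinfo_line(line):
    """Parse one mountinfo line into a MountEntry."""
    fields = line.split()
    # A lone "-" ends the variable-length optional fields; it cannot
    # appear before index 6, so start the search there.
    sep = fields.index("-", 6)
    return MountEntry(
        mount_id=int(fields[0]),
        parent_id=int(fields[1]),
        device=fields[2],
        root=fields[3],
        mount_point=fields[4],
        options=fields[5].split(","),
        optional_fields=fields[6:sep],
        fs_type=fields[sep + 1],
        source=fields[sep + 2],
        super_options=fields[sep + 3].split(","),
    )
```

Finding a single mount this way means reading and parsing the whole file, which is the kind of inflexibility the comment above refers to.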
  - pino82 10 months ago
    
    Cooool! Thx! I recently searched precisely for that, but was unable to find anything.
mastax 10 months ago

There was an article a while back, I think it was Marc talking about OpenBSD's pledge(), which argued that taking a string argument of space-separated flags is better than the traditional enum flags argument for a syscall. Sort of orthogonal, but I found it very persuasive, even as someone who also breaks out in hives when I see my kernel full of stringly typed APIs.
pino82 10 months ago

PS: Yes, I know, if it were a web API, I'd need to play the same games with strings there.
But there it's at least obvious why it is needed (and it would also be less tedious in a modern scripting language, compared to dealing with strings in raw C).
Again, I'm not really complaining. I'm just wondering whether one would still solve it the same way today in a brand new OS.
- baq 10 months ago
  
  Microsoft has powershell and it’s a properly good tool for manipulating objects.
  Strings are easiest to manipulate using string manipulation tools, of which Unix/Linux/POSIX has plenty, and there is no standard way to expose objects. Perfect is the enemy of the good here.
  - homebrewer 10 months ago
    
    I hope support for structured text output becomes more common. For example, the `ip` set of tools can output data in JSON, which is safe and easy to destructure and extract whatever fields you need. Seems like a nice middle ground.
    % ip --json link
    % ip --json addr
    etc.
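A small sketch of what that destructuring can look like. The sample below is a hand-trimmed illustration of the shape `ip --json addr` produces (real output carries many more keys), and the function name is hypothetical.

```python
import json

# Illustrative, heavily trimmed sample; not captured from a real machine.
SAMPLE = """
[
  {"ifindex": 1, "ifname": "lo",
   "addr_info": [{"family": "inet", "local": "127.0.0.1", "prefixlen": 8}]},
  {"ifindex": 2, "ifname": "eth0",
   "addr_info": [{"family": "inet", "local": "192.0.2.10", "prefixlen": 24},
                 {"family": "inet6", "local": "2001:db8::10", "prefixlen": 64}]}
]
"""


def addresses_by_interface(ip_json_text):
    """Map interface name to a list of 'address/prefixlen' strings."""
    result = {}
    for link in json.loads(ip_json_text):
        name = link.get("ifname")
        if name is None:  # skip any entry that has no interface name
            continue
        result[name] = [
            f'{addr["local"]}/{addr["prefixlen"]}'
            for addr in link.get("addr_info", [])
            if "local" in addr and "prefixlen" in addr
        ]
    return result
```

To run it against live data, the text can come from `subprocess.run(["ip", "--json", "addr"], capture_output=True, text=True, check=True).stdout`.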
jdiez17 10 months ago

Not sure why people are downvoting you. Sure, it can seem surprising at first to see string management in the PM side of the Linux kernel. But the advantages of almost-everything-is-a-file are worth it.