
There are really two problems, as I understand it:

- Overcommit. Linux will "overcommit" memory: allocations succeed even when there isn't enough memory to back them, and a physical page is only assigned when the page is first touched. If no physical page is available at that point, the process stalls (to my understanding). Windows NT doesn't do this. Not sure exactly how macOS/XNU handles it.

- The OOM killer. Because allocations don't fail, to actually recover from an OOM situation the kernel will enumerate processes and try to kill ones that are using a lot of memory, by scoring them using heuristics. The big problem? If there isn't a single process hogging the memory, this approach is likely to work very poorly. As an example, consider a highly parallel task like make -j32. An individual C++ compiler invocation is unlikely to use more than a gigabyte or two of memory, so it's more likely that things like Electron apps will get caught first. The thrashing of memory combined with the high CPU consumption of compilers that are not getting killed will grind the machine to a near-complete halt. If you are lucky, then it will finally pick a compiler to kill, and set off a chain reaction that ends your make invocation.

There are solutions. You can impose quotas with cgroups, and tools like systemd-oomd try to provide better userspace OOM killing on top of cgroups. You can disable overcommit, but some software will not function very well like this, since it likes to allocate a ton of address space ahead of time and potentially use it later. Overcommit fundamentally improves the ability to efficiently utilize all available memory. Ultimately I think overcommit is probably a bad idea... but it is hard to come up with a zero-compromise solution that keeps optimal memory/CPU utilization while avoiding pathological OOM conditions by design.



> two problems ... overcommit

Is there any other sensible way to do this, though? It would be quite inefficient to constantly call mmap for additional small(ish) pieces of memory. In effect, overcommit just means that until a page is actually written to, it hasn't really been allocated. (Aside: I believe a malloc implementation that zeroed out blocks on allocation would fail up front at allocation time rather than later, in case that happens to be what bugs you about it.)

Additionally, how do you suppose fork should be implemented efficiently? Currently it performs copy-on-write. At minimum you'd need a way to mark pages as "never going to write to these, don't reserve space for a copy." Except such an API is either very awkward to use in practice or else leaves you with some very awkward edge cases to deal with in your program logic.

> You can disable overcommit, but some software will not function very well

Yeah about that.

Chromium runs (AFAIK) one PID namespace per tab. On my machine right now it reports 1.1 TiB of virtual memory with a little over 100 MiB resident per tab. 1.1 TiB mapped PER TAB. Of the resident memory I have no idea how much is actually unique (i.e. written to after the initial fork).

Firefox is much more reasonable at a mere 18 GiB mapped per PID.


> Chromium runs (AFAIK) 1 PID namespace per tab. On my machine right now it reports 1.1 TiB virtual memory with a little over 100 MiB resident per tab. 1.1 TiB mapped PER TAB. Of the resident I have no idea how much is actually unique (ie written to following the initial fork).

This is most likely an address-space reservation trick for garbage collection, memory-bug hardening, or both. Haskell programs also map 1 TB.


A potential workaround would be to still allow giant mmaps but not hang a program when it runs out of pages and instead send a signal to it. Obviously, neither Chrome nor Firefox actually use this much memory in practice.


Rather than a workaround I think that would just be an overall better approach. Receive an actionable error when the allocation happens "for real", whether that's at an arbitrary point in user code or when malloc zeros out the block ahead of time.

However, I think you'd need per-thread signal handlers for that to work sensibly. The kernel supports this (see man 2 clone), but it would require updates to (at least) POSIX and glibc.

It would probably also be nice to have a way to allocate pages without writing to them. Currently we have mlock, but that prevents swapping, which isn't desirable in this context.




