I used to work closely with the Android team at Unity, and in my experience, shifting large native codebases to a new page size often uncovers subtle runtime assumptions beyond just replacing hardcoded constants like PAGE_SIZE. I'm optimistic Google's tooling will help a lot, but I'm curious how effectively it catches the more nuanced compatibility issues like custom allocators or memory pooling tuned for 4K boundaries.
Someone I collaborate with has been having all sorts of fun with a 4k->64k page transition for stuff running on Arm. Part of the fun has been discovering memory leaks that really weren't noticeable or a big deal at 4k, but now that each page is 16x larger, they suddenly become noticeable and can even cause problems.
Could they find those by setting page size to some absurdly large value like 1MB?
Tangent, but 1MB pages aren't that absurd really. x86-64 has hardware support for 4KB, 2MB, and 1GB page sizes (because each level of the page map cuts 9 bits from the virtual address). Luckily it supports all 3 mixed together, so normally you just keep most of your data in 4KB pages and use 2MB/1GB occasionally. But from my understanding nothing prevents you from forcing 2MB on all userspace code, even though the Linux kernel doesn't support it.
A lot of software won't work if you do that. Many JITs and memory allocators have opinions on page size. Also tagged pointers are very common.
Memory page size should be transparent to tagged pointers (any pointers, really); I don't see how they can be affected. You have an object at address 0xAB0BA: does the size of the underlying page matter?
It can be an issue of behavior; for example, Redis recommended disabling transparent huge page support in Linux because of (among other things?) copy-on-write memory page behaviors, and still does if you're going to persist data to disk.
1. You have a redis instance with e.g. 1GB of mapped memory in one 1GB huge page
2. Redis forks a copy of itself when it tries to persist data to disk so it can avoid having to lock the entire dataset for writes
3. Either process then modifies some of the data anywhere in that 1GB (in practice mostly the parent, which keeps serving writes)
4. The OS has to now allocate a new 1GB page and copy the entire data set over
5. Oops, we're under memory pressure! Better page out 1GB of data to the paging file, or flush 1GB of data from the filesystem cache, so that I can allocate this 1GB page for the next 200ms.
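A minimal sketch of that copy-on-write effect (a hypothetical standalone C program, not Redis itself; assumes a Linux system, and madvise only hints that transparent huge pages should back the region). Touching a single byte after fork() forces the kernel to duplicate the whole page backing that byte:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        size_t len = 1UL << 30;  /* 1 GB region, as in the scenario above */

        /* Anonymous private mapping; ask the kernel to back it with huge pages. */
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }
        madvise(buf, len, MADV_HUGEPAGE);  /* hint only; may be ignored */

        memset(buf, 0xAB, len);  /* fault everything in before forking */

        pid_t pid = fork();
        if (pid == 0) {
            /* Child: one tiny write triggers copy-on-write of the whole
               page backing this byte - 4 KB normally, the entire huge
               page if THP kicked in. */
            buf[12345] = 0xCD;
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        munmap(buf, len);
        return 0;
    }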
You could imagine how memory allocators that try to be intelligent about what they're allocating and how much in order to optimize performance might care; when a custom allocator is trying to allocate many small pages and keep them in a pool so it can re-use them without having to request new pages from the OS, getting 100x 2M pages instead of 100x 4k pages is a colossal waste of memory and (potentially) performance.
It's not necessarily that the allocators will break or behave in weird, incorrect ways (they may) but often that the allocators will say "I can't work under these conditions!" (or will work but sub-optimally).
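As a hedged illustration of the pooling concern (a hypothetical allocator fragment, not any particular real allocator): a pool that derives its chunk size from the runtime page size at least keeps its bookkeeping honest when the kernel's page size changes underneath it, whereas a baked-in 4096 silently hands out chunks that are really 16 KB each:

    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Hypothetical pool: grabs one page at a time from the OS and
       hands it out as a fixed-size chunk. */
    struct page_pool {
        size_t chunk_size;   /* derived from the real page size */
    };

    static void pool_init(struct page_pool *p) {
        /* Ask the OS instead of assuming 4096; on a 16 KB kernel this
           returns 16384 and the pool's accounting stays correct. */
        p->chunk_size = (size_t)sysconf(_SC_PAGESIZE);
    }

    static void *pool_grab_chunk(struct page_pool *p) {
        void *chunk = mmap(NULL, p->chunk_size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return chunk == MAP_FAILED ? NULL : chunk;
    }

    int main(void) {
        struct page_pool pool;
        pool_init(&pool);
        void *c = pool_grab_chunk(&pool);
        printf("chunk of %zu bytes at %p\n", pool.chunk_size, c);
        return 0;
    }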
Also quite a lot of kernel drivers allocate whole pages (sometimes the device being driven requires it).
This just provides yet another example of why tagged pointers are a terrible idea and shouldn't be used. Someday, more of the address space will get used and your software will break.
I can understand Google's desire for devs to recompile their apps, but I don't see the need to dump old apps from the app store... who cares if an old app that works wastes 12k if it only needs a single 4k page?
Google already dumps old apps from the store for no reason whatsoever:
https://android-developers.googleblog.com/2022/04/expanding-...
You have to update an application every year, even if it is just a meaningless version bump. Otherwise it will be removed after 2 years. Despite Google saying that this policy is required to ensure user security, several recent Android releases didn't have any corresponding major security changes.
I am not familiar with Android, but Linux ELF binaries that specify 4KB alignment will not work on systems with 16KB page sizes, since the ELF interpreter will refuse to load them. This hit me recently when trying to run a 32-bit binary on a Linux ARM system that had 16KB size pages, since the 32-bit OpenSSL libraries specified 4KB alignment. Presumably, this was done for maximizing entropy available to ASLR, but it breaks the binaries when the page size increases.
In any case, I assume that there is something similar affecting Android.
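For what it's worth, the alignment the loader objects to is visible in the ELF program headers (readelf -lW lib.so shows it as the Align column of the LOAD segments). Here is a rough, hypothetical C sketch doing the same check for a 64-bit library:

    #include <elf.h>
    #include <stdio.h>
    #include <string.h>

    /* Print the alignment of each PT_LOAD segment of a 64-bit ELF file.
       A library built for 4 KB pages shows 0x1000 here and can be
       refused by the loader on a 16 KB kernel. */
    int main(int argc, char **argv) {
        if (argc != 2) { fprintf(stderr, "usage: %s lib.so\n", argv[0]); return 1; }

        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        Elf64_Ehdr eh;
        if (fread(&eh, sizeof eh, 1, f) != 1) { perror("fread"); return 1; }
        if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
            eh.e_ident[EI_CLASS] != ELFCLASS64) {
            fprintf(stderr, "not a 64-bit ELF\n");
            return 1;
        }

        for (int i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            fseek(f, (long)(eh.e_phoff + (long)i * eh.e_phentsize), SEEK_SET);
            if (fread(&ph, sizeof ph, 1, f) != 1) { perror("fread"); return 1; }
            if (ph.p_type == PT_LOAD)
                printf("LOAD segment %d: align 0x%llx\n",
                       i, (unsigned long long)ph.p_align);
        }
        fclose(f);
        return 0;
    }

If the alignment comes out as 0x1000, the usual fix is to relink with the linker's max-page-size option (e.g. -Wl,-z,max-page-size=16384, which, as I understand it, newer Android NDKs set by default), so the same binary stays loadable on both 4 KB and 16 KB kernels.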
As a user not involved in android or linux development: I don't care. Fix it. You just don't break the entire ecosystem of unmaintained apps for a 3% performance improvement.
We maintained win32-x86 executable compatibility for decades. Keeping things working might require some sort of emulation layer, and it might impact performance substantially, and that's fine. I can accept that.
"Everything just stops working" is not an option for a real operating system. I don't expect to put my workshop tools away and wake up in the morning to find the toolchest manufacturer sent them to the landfill because they didn't efficiently fit their new drawers.
One of the areas where Android is common but where I couldn't possibly recommend it is home automation. Your light switches are 50-year purchases. Odds that app-based light switches are still working in five years are 50/50... compound odds over anything longer are minuscule.
I think most apps are written in Java, and according to the blog post will not be affected.
It's only the apps written in C++ that need to be recompiled, and those are probably large games and heavily performance-critical apps.
Is anything industrial going to be built on Android? There are no ATMs, no manufacturing CNC machines, etc. One might say everything that runs on Android is throwaway. It is only recently that Samsung and Google started to aim at 7-year life spans. At 7 years for an industrial piece of equipment, I may not have even paid it off yet; then again, is the software on these things even updated?
Point of sale systems, like Toast. Media systems on airplanes. Infotainment systems in cars.
In fact, NCR does sell an Android-based ATM solution. [1]
Android is actually used somewhat widely in embedded systems that need to provide a nice GUI to the user.
[1]: https://www.zdnet.com/article/ncr-launches-kalpana-an-androi...
Fortunately, like a truly embedded system, those are usually completely independent of Google or its app store.
Page size impacts page permissions; it's not a matter of wasting 12k, it's that with 4kb pages you're allowed to have a consecutive 8kb region with different permissions. 16kb pages can't do that without segfaulting every time memory is used "wrong", and trying to fix that up transparently would be a nightmare.
That's a valid point, but isn't memory protection the only common user-visible effect of changed page sizes? It would seem most apps which do not use write-protected memory would be unaffected.
I think the most immediate problem would be ELF segments that aren't 16kb aligned. Code will abut data, you can't add a gap without breaking offsets inside the ELF, and you'll induce a segfault on every write to a global at the start of the writable segment, or when executing code at the end of the code segment.
A less safe option would be for permissions to be a union in that region, as code rarely depends on a permission being absent. That would be quite the security hole though.
> 16kb pages can't do that without segfaulting every time memory is used "wrong", and trying to fix that up transparently would be a nightmare.
I would naively imagine the kernel could trap that and remap on the fly, at the tiny cost of murdering performance. Is that untrue, or is the perf so bad that it's not worth it?
Anything which involves the kernel tracking permissions at 4k granularity despite using larger pages is just going to be worse in every way than using 4k pages.
It depends on how much of a program actually triggers the failure case, so you can't answer in the abstract.
In the worst case, ~every memory access causes the kernel to need to fix it, causing every memory access to be several orders of magnitude worse (e.g. a hot cache hit vs trapping into kernel, wiping caches, at the very least hundreds more accesses).
EDIT: I see you suggested remapping the page permissions. Maybe that helps! But maybe it adds the cost of the remapping onto the worst case, e.g. the first 4kb are instructions that write into the second 4kb.
> I can understand the desire for google to want devs to recompile their apps, but I don't see the need to dump old apps from the app store
It could be that Google explicitly wants to dump un(der) maintained apps. Sure some might be clean and basic utilities that will work till the end of time, but many are probably abandonware or crummy demos and hello-world apps that look old and dated. They went on a whole purge recently already.
The App Store market is changing as Google/Apple grapple with efforts to end their monopolies. Maybe they're seeing that change and trying to use their dying advantage to frame themselves as the curated and reputable stores with high-quality, maintained, up-to-date apps. When distribution is available from other places, they can choose their customers.
They already force apps to update to new APIs every couple of years; it was the only way to stop developers from continuing to use deprecated stuff.
Apps that expect 4K pages will try to enforce memory protections at that granularity.
"but I don't see the need to dump old apps from the app store"
They literally removed almost 50% of all Android apps earlier this year; they're clearly devoted to quality and security.
What if you have a data structure that straddles the 4k boundary?
> This table describes who needs to transition and recompile their apps
information design is my passion
Table added as image, alt text vaguely explains table contents and has a spelling error. Great.
From my noobish standpoint, it feels like most code shouldn't care what the page size is? Why does it need to be recompiled?
What typically tends to break when changing it?
Performance, safety and IO critical code must care, because the page size affects TLB caching and is the finest granularity for security flags such as read-only, no execute, etc. which are critical for e.g. guard pages.
If your code created two guard pages sandwiching a security-critical page to make sure that under/overruns caused a page fault and crashed, and that code assumed the boundary was at 4KiB when it is really now at 16KiB, buffer overruns will no longer get caught.
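A minimal sketch of that guard-page pattern (hypothetical, assuming POSIX mmap/mprotect), written the page-size-aware way; a version that hard-codes 4096 on a 16 KiB kernel would either get EINVAL from mprotect() for the unaligned address or leave the "guard" inside the same page as the data it is meant to protect:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        /* Use the real page size; hard-coding 4096 here is exactly
           the assumption that breaks on a 16 KB kernel. */
        size_t pg = (size_t)sysconf(_SC_PAGESIZE);

        /* layout: guard | data | guard */
        char *region = mmap(NULL, 3 * pg, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (region == MAP_FAILED) { perror("mmap"); return 1; }

        /* Revoke access to the first and last page so any under/overrun
           out of the middle page faults immediately. */
        if (mprotect(region, pg, PROT_NONE) != 0 ||
            mprotect(region + 2 * pg, pg, PROT_NONE) != 0) {
            perror("mprotect");
            return 1;
        }

        char *data = region + pg;        /* the usable, guarded page */
        data[0] = 42;                    /* fine */
        /* data[pg] = 42;  <- would hit the upper guard page and crash */
        printf("guarded page at %p (page size %zu)\n", (void *)data, pg);
        return 0;
    }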
Further, code that assumed an address was on a page boundary, e.g. for performance reasons, will now have only a 25% chance of that actually being true.
It also means that MMIO physical pages that were expected to be contained within a 4KiB page such that when mapped into a sensitive user space driver context, neighboring MMIO control blocks wouldn't be touched, might be affected too since you'll get up to 3 neighboring blocks in either direction. This probably doesn't happen so often, I don't know Android internals much, but still something to consider.
This is in large part because PAGE_SIZE in a lot of C code is a macro or constant, rather than something populated at runtime depending on the system the code is running on - something I've always felt is a bit problematic.
That being said, code that hard-codes PAGE_SIZE often won't run anyway if it uses e.g. mmap(), because the kernel validates alignment against the real page size and will error on a mismatch.
This is going to wreak general havoc for a while no matter how you spin it.
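A tiny hedged sketch of that compile-time vs. runtime split (hypothetical program; the PAGE_SIZE macro here stands in for whatever header baked the constant into the build):

    #include <stdio.h>
    #include <unistd.h>

    /* If a header somewhere does this, the value is frozen at build time: */
    #ifndef PAGE_SIZE
    #define PAGE_SIZE 4096   /* assumption baked into the binary */
    #endif

    int main(void) {
        long runtime = sysconf(_SC_PAGESIZE);   /* what the kernel really uses */
        printf("compile-time PAGE_SIZE = %d, runtime page size = %ld\n",
               PAGE_SIZE, runtime);
        /* On a 16 KB kernel these disagree, and any alignment math
           done with the macro is silently wrong. */
        return 0;
    }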
Because the final ELF binary is linked to contain page-aligned segments. Segments define how the binary should be loaded into memory and what permissions they require.
If you have a 4KB segment that is marked Read-Write followed immediately by a Read-Execute one, naively loading it with a larger page size will open a can of security issues.
Moreover, many platform data structures, like the Global Offset Table (GOT) of a dynamic executable, use addresses. You cannot simply bump things around.
On top of that, libraries like the C++ standard library (or Abseil from Google) rely on the page size to optimize data structures like hash maps (e.g. unordered_map).
Typically low-level code and some manual fiddling with memory that assumes the page size.
Everything's OK until some obscure library suddenly segfaults without any useful error.
Mostly for I/O, e.g. mmap requires the file offset to be a multiple of the page size.
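A small sketch of that constraint (hypothetical; assumes a file named data.bin that is larger than the offset being mapped): the offset handed to mmap has to be rounded against the page size the kernel actually uses, not a hard-coded 4096:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("data.bin", O_RDONLY);      /* hypothetical input file */
        if (fd < 0) { perror("open"); return 1; }

        long pg = sysconf(_SC_PAGESIZE);
        off_t wanted = 20000;                     /* byte we care about */
        off_t aligned = wanted - (wanted % pg);   /* round down to a page */

        /* Map one page starting at the aligned offset, then index in. */
        char *p = mmap(NULL, (size_t)pg, PROT_READ, MAP_PRIVATE, fd, aligned);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        printf("byte at offset %lld is %d\n",
               (long long)wanted, p[wanted - aligned]);

        munmap(p, (size_t)pg);
        close(fd);
        return 0;
    }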
Off the top of my head:
If you rely on being able to do things like mark a range of memory as read-only or executable, you now have to care about page sizes. If your code is still assuming 4KB pages you may try to change the protection of a subset of a page and it will either fail to do what you want or change way too much. In both cases weird failures will result.
It also can have performance consequences. For example, if before you were making a lot of 3.5KB allocations using mmap, the wastage involved in allocating a 4KB page for each one might not have been too bad. But now those 3.5KB allocations will eat a whole 16KB page, making your app waste a lot of memory. Ideally most applications aren't using mmap directly for this sort of thing though. I could imagine it making life harder for the authors of JIT compilers.
Some algorithms also take advantage of the page size to do addressing tricks. For example, if you know your page size is 4KB, the addresses '3' and '4091' both can be known to have the same protection flags applied (R/W/X) and be the same kind of memory (mmap'd file on disk, shared memory segment, mapped memory from a GPU, etc.) This would allow any tables tracking information like that to only have 4KB granularity and make the tables much smaller. So that sort of trick needs to know the page size too.
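The addressing trick in question, sketched (assuming only that the page size is a power of two, which it is on these platforms): masking off the low bits of an address yields its page base, so any two addresses in the same page collapse to a single table key:

    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
        uintptr_t page_mask = ~(page_size - 1);   /* valid for power-of-two sizes */

        uintptr_t a = 3, b = 4091;
        /* With 4 KB pages both land in page 0; a table keyed on the page
           base needs one entry per page instead of one per address. */
        printf("page base of %#lx is %#lx\n",
               (unsigned long)a, (unsigned long)(a & page_mask));
        printf("page base of %#lx is %#lx\n",
               (unsigned long)b, (unsigned long)(b & page_mask));
        printf("same page? %s\n",
               (a & page_mask) == (b & page_mask) ? "yes" : "no");
        return 0;
    }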
Most code shouldn't, but you don't know what the library you're using is doing behind the scenes. As for the few pieces of code that do care: if a lot of people use them as a dependency, that could get real messy real fast.
Weird. AFAIK 4K and 64K were the common ARM64 page sizes, and 16K was the odd "think different" one that Apple uses. No mention of 16K in the Linux kernel docs:
https://www.kernel.org/doc/html/next/arm64/memory.html
64K starts to be a little too wasteful. It is a small performance gain as you'd expect, but less granularity means significantly more wasted memory
On a phone with limited RAM, this starts to be a bad tradeoff quickly. 16K is a reasonable jump from the venerable 4K page size.
The 64-bit kernel shipped for the Raspberry Pi 5 uses 16KB pages.
AMD also switched to 16K (4 x 4K), down from 8, in Zen 1 for their PTE Coalescing system, which is effectively run-length-style compression of page table entries with sequential addresses into one TLB slot.
16K is the weird one in practice, but ARM says they are implementation-defined.
https://developer.arm.com/documentation/101811/0104/Translat...
If you're making the migration at all, you really ought to be going for fully variable page sizes, otherwise 5 years from now there'll be a 64K page size CPU and suddenly everyone has to recompile everything again and there is another compatibility wall...
Is there such a thing? Page size gets baked into things like executable layouts, plus any place that uses the PAGE_SIZE constant (instead of sysconf(_SC_PAGESIZE)).
Indeed it would take redesigning a bunch of things to make runtime variable page size an option.
4 KiB page sizes have been used since the 1960's. More memory doesn't necessarily mean that larger pages are beneficial. Maybe 16 KiB is better for Android? Maybe. There really is no clear consensus on what the optimal page size for modern architectures should be.
> Starting November 1st, 2025, all new apps and app updates that use native C/C++ code targeting Android 15+ devices submitted to Google Play must support 16 KB page sizes.
I realize that most apps wouldn’t need to make changes and that a recompilation would suffice, but is this time frame enough for the apps that do need code changes?
They've mentioned this requirement before; the last HN post I see is from early May.
They only added support in Android 15, in August 2024. https://android-developers.googleblog.com/2024/08/adding-16-...
I don't know what "targeting Android 15+" means specifically. Does that include anything with a lower API level?
> I don't know what "targeting Android 15+" means specifically. Does that include anything with a lower API level?
- On Android, apps are built with targetSdkVersion set to the API version your app is compiled for and tested against, but you can set a lower minSdkVersion to the lowest device API version your app will run on.
- On devices with an API level newer than targetSdkVersion, the OS looks at your app's targetSdkVersion and disables behaviours newer than what your app is targeting. So the app should run well on newer devices.
- On devices with API level older than targetSdkVersion, but newer than (or same as) minSdkVersion, your own app is responsible for detecting missing APIs before trying to use them, and adapting itself to the older environment.
- On devices with API level older than minSdkVersion, your app will not be run. This ensures the user gets a clear failure, rather than unpredictable crashes or unexpected behaviour due to missing APIs the app tries to call.
So, in principle, it's possible to build an app which targets the most recent Android 15, while being capable of running on all versions of Android back to version 1. Apps linked for 16 kiB page-alignment should run on older devices that use 4 kiB pages too.
The Google Play Store enforces that targetSdkVersion is fairly close to the latest Android version. But it doesn't place requirements on minSdkVersion.
Android apps have a flag in their manifest which tells the OS "this app was built with Android X (API level X) in mind".
This allows the OS to selectively enable backwards compatibility and change certain behaviors (e.g. selectively enforce new permissions so old apps aren't broken).
Play Store requires apps to target new OSes and port APIs within certain time of an OS launching (usually ~2 years).
This prevents apps from targeting older OSes to dodge new security and privacy enhancements (e.g. asking for permission to show notifications, asking for permission to access the microphone, being allowed to show a fullscreen popup ad, etc. Those restrictions were all gated behind the target check.)
If you yourself have native code you're trying to build, it only requires bumping the NDK (which is automatically bumped when you upgrade the Android Gradle plugin), so that's mostly an automatic step (provided you're not stuck on old AGP7 build scripts).
If you depend on a package that uses a native library, you wait for them to update. Or you fork, bump AGP and rebuild.
It's a very minor change, unless you depend on unmaintained code.
It's kind of too bad Linux doesn't just support multiple base page sizes.
"offering improved performance gains" is a pleonasm. "offering performance gains" works just fine.
From the experience implementing 64K page sizes on aarch64 in Fedora & RHEL, this is not going to be a simple transition. All sorts of things will break in subtle, strange and interesting ways. Good luck to the Android team :-)
I think you meant "Android developers" that are forced to switch their apps.
This is dumb. The abstraction is at the wrong level.
Applications should assume the page size is 1 byte. One should be able to map, protect, etc memory ranges down to byte granularity - which is the granularity of everything else in computers. One fewer thing for programmers to worry about. History has shown that performance hacks with ongoing complexity tend not to survive (eg. interlaced video).
At the hardware level, rather than picking a certain number of bits of the address as the page size, you have multiple page tables, and multiple TLB caches - eg. one for 1 megabyte pages, one for 4 kilobyte pages, and one for individual byte pages. The hardware will simultaneously check all the tables (parallelism is cheap in hardware!).
The benefit of this is that, assuming the vast majority of bytes in a process address space are made of large mappings, you can fit far more mappings in the (divided up) TLB - which results in better performance too, whilst still being able to do precise byte-level protections.
The OS is the only place where there is complexity - it has to find a way to fit the mappings the application wants into what the hardware can do (i.e. 123456 bytes might become 30 four-kilobyte pages plus 576 one-byte pages).
Your response to a change that's motivated by performance improvements is to suggest switching to a scheme that'll have catastrophically worse performance?
It would likely have better performance for similar power and silicon area, because a hierarchical TLB will have a higher hit rate for the same number of transistors.
If you're going to go that far, you might as well move malloc() into hardware and start using ARM-style secure tagged pointers. Then finally C users can be free of memory allocation bugs.
Transistors aren't free (as in power consumptions, thermal etc), and wasting them on implementing 1 byte granularity TLBs would probably be a hard sell, even if assuming everything can indeed be done in parallel.
Dozens of years of kernel building, dozens of OSes, dozens of physical architectures, all having settled on minimum 4KB pages being a right balance between performance and memory usage, wiped away by a single offhand comment with no knowledge about the situation. Now that's HN.
Just the sheer TLB memory usage and performance implication of doing single byte pages would send CPU performance back to the stone age.
Completely false. The 4 KiB page size came from a machine with a total of 512 KiB (1962 Atlas, 3072B pages, 96k 48b words). It hasn’t scaled at all for inertia reasons and it has real and measurable costs. 64 KiB would have been the better choice IMO, but 16 is better than 4.
Hence the "minimum" part. The thread is literally about Android being compiled for 16KB pages, CPU support for larger pages has grown, easily up to 4MB for most consumer CPUs.
Going down _lower_ than 4KB is purely a waste of memory and performance.
My proposed design has many page sizes - nothing stops a software developer making all mappings multiples of 4kb and not using the byte sized pages.
My example was 1mb, 4kb and 1 byte pages - but a real design would probably use every power of two, or every even power of two to get best use of the TLB space.
It hasn't been done before because of a chicken and egg problem. CPU designers don't build it because no OS has the ability to use it, and no OS uses it because no CPU supports it. It would be a substantial amount of work for both parties.
> One should be able to map, protect, etc memory ranges down to byte granularity - which is the granularity of everything else in computers.
But you can do this; you simply have to pay the cost of using PAGE_SIZE of memory per byte you want to protect?
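That trade-off, made concrete in a small hedged sketch (hypothetical, POSIX mmap/mprotect; the classic "electric fence" idea): to fault on the very first byte past an object, push it against the end of one page and make the next page inaccessible, paying at least a page, 16 KB on the new kernels, per protected boundary:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* "Electric fence" style allocation: one usable page followed by an
       inaccessible one, with the object pushed against the boundary. */
    static void *alloc_with_fence(size_t n) {
        size_t pg = (size_t)sysconf(_SC_PAGESIZE);
        if (n > pg) return NULL;                     /* keep the sketch simple */

        char *base = mmap(NULL, 2 * pg, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED) return NULL;

        mprotect(base + pg, pg, PROT_NONE);          /* the fence */
        return base + pg - n;                        /* object ends at the fence */
    }

    int main(void) {
        char *obj = alloc_with_fence(1);             /* "protect" a single byte */
        obj[0] = 'x';                                /* fine */
        /* obj[1] = 'y';  <- one byte past the object: faults immediately */
        printf("1 byte guarded at the cost of one usable page plus one fence page (%ld each)\n",
               sysconf(_SC_PAGESIZE));
        return 0;
    }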