malloc cycles do not matter. $$$ spent on hardware matter.
most cpu cycles do nothing
hugepages cheapen the page table walk
hugepages make the TLB bigger!
space efficient hugepage aware allocators are hard
demand oscillates wildly
emptying density - binpacking
mistakes can live forever
tcmalloc structure
spans back everything
change nothing but the page heap
Temeraire: the design
slack and donation
how does the HugeFiller make decisions?
HugeFiller tracks metadata per hugepage
we favor fragmentation over fullness
results
staged rollout
saved -1.3% of cycles
saving memory in the process
virtuous cycles: hugepage coverage
Description:
Explore a 14-minute conference talk from OSDI '21 on Temeraire, a hugepage-aware enhancement of TCMalloc designed to optimize memory allocation at warehouse scale. Learn how this approach looks beyond the efficiency of malloc itself and improves fleet-wide productivity by maximizing hugepage coverage and minimizing fragmentation overhead. Discover the design and implementation strategies behind Temeraire, including how it reduces CPU overhead in application code. Examine the results of studies across 8 applications, which show improvements in requests per second and RAM usage. Gain insights from a large-scale experiment and longitudinal rollout in Google's warehouse-scale computers, which reveal significant reductions in TLB miss stalls and memory fragmentation. The talk concludes with a discussion of how to improve the allocator development process and of potential optimization strategies for future memory allocators.
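To make the link between hugepage coverage and TLB miss stalls concrete, here is a minimal back-of-the-envelope sketch of TLB reach, the amount of address space a TLB can map at once. It is not taken from the talk. The entry count is a hypothetical value chosen for illustration, and the sketch assumes the same number of entries is available for both page sizes, which is not true of every CPU.

```python
# Hypothetical TLB size, chosen only for illustration; real values vary by CPU,
# and many CPUs have different entry counts for different page sizes.
TLB_ENTRIES = 1024

KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space covered when every TLB entry maps one page."""
    return entries * page_size


small = tlb_reach(TLB_ENTRIES, 4 * KIB)  # 4 KiB base pages
huge = tlb_reach(TLB_ENTRIES, 2 * MIB)   # 2 MiB hugepages (x86-64)

print(f"4 KiB pages: {small // MIB} MiB of reach")
print(f"2 MiB pages: {huge // MIB} MiB of reach")
print(f"ratio: {huge // small}x")
```

Under these assumptions the same TLB covers 512 times more memory with 2 MiB hugepages than with 4 KiB pages. This is why an allocator that keeps more of the heap backed by hugepages can reduce TLB misses in application code.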
Beyond Malloc Efficiency to Fleet Efficiency - A Hugepage-Aware Memory Allocator