1
Intro
2
How does data reach the disk?
3
fsync is really important
4
It's hard to get durability correct: applications find it difficult
5
fsync can fail: durability gets harder to get right
6
Why care about fsync failures? "About a year ago the PostgreSQL community discovered that fsync (on Linux and some BSD systems) may not work the way we always thought it is [sic], with possibly disas…
7
Our work: systematically understand fsync failures
8
File System Results
9
Application Results
10
Outline
11
File System Methodology: Fault Injection
12
File System Methodology: Workloads. Common write patterns in applications, reduced to simplest form
13
File System Result #1: Clean Pages. Dirty page is marked clean after fsync failure on all three file systems
14
File System Result #2: Page Content. File systems do not handle fsync errors uniformly; page content depends on file system
15
File System Result #3: In-memory state. In-memory data structures are not entirely reverted
16
Applications Five widely used applications
17
Applications Results: Overview Ext4 Ordered Mode
18
Crash/Restart: simple strategies fail. Crash/restart is incorrect because it recovers wrong data from the page cache (example: PostgreSQL)
19
Applications Results #1: False Failures. Indicate failure but actually succeed
20
Late Error Reporting: all applications susceptible to data loss on ext4 data mode
21
Btrfs winning?
22
Applications Results Summary: simple strategies fail; applications have moved away from retries
23
Challenges and Directions
Description:
Explore the intricacies of fsync failures and their impact on file systems and data-intensive applications in this USENIX ATC '20 conference talk. Delve into a comprehensive analysis of how ext4, XFS, and Btrfs file systems react to fsync failures, uncovering commonalities and differences in their behavior. Examine the failure-handling strategies employed by popular applications like PostgreSQL, LMDB, LevelDB, SQLite, and Redis, and discover why these approaches fall short in preventing catastrophic outcomes such as data loss and corruption. Learn about the implications of these findings for designing file systems and applications that aim to provide robust durability guarantees. Gain insights into the challenges of achieving true data durability and the potential directions for improvement in this critical area of computer science.
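
To make the outline's points about failure handling concrete, here is a minimal Python sketch, not taken from the talk, of a write path that treats an fsync error as final instead of retrying. The outline notes that a dirty page is marked clean after a failed fsync, so a later fsync could plausibly report success without the data having reached the disk. The names `durable_write`, `DurabilityError`, `failing_fsync`, and the injectable `fsync` parameter are hypothetical and exist only for this illustration.

```python
import errno
import os
import tempfile


class DurabilityError(Exception):
    """Hypothetical error type: fsync failed, so the data may not be on disk."""


def durable_write(path, data, fsync=os.fsync):
    """Write `data` (bytes) to `path` and call fsync exactly once.

    The `fsync` parameter is an assumption made for this sketch so that a
    failure can be simulated; real code would call os.fsync directly.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        try:
            fsync(fd)
        except OSError as exc:
            # Deliberately no retry: after a failed fsync the page may already
            # be marked clean, so a second call could succeed misleadingly.
            raise DurabilityError(f"fsync failed for {path}") from exc
    finally:
        os.close(fd)


def failing_fsync(fd):
    """Stand-in that simulates an I/O error from the storage device."""
    raise OSError(errno.EIO, "simulated I/O error")


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "example.bin")
    durable_write(target, b"committed")  # normal path
    try:
        durable_write(target, b"lost?", fsync=failing_fsync)
    except DurabilityError as err:
        print("treating as fatal:", err)
```

Raising an error does not recover the data. It only avoids reporting a write as durable when that cannot be verified; what the caller should do next (abort, rebuild from a log, alert an operator) depends on the application.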

Can Applications Recover from fsync Failures?

USENIX