1. Intro
2. RDMA is getting rapidly adopted
3. There exist unexpected performance anomalies
4. Existing integration tests
5. Strawman solutions are not enough
6. Question: How to define performance anomaly
7. Challenge #1: Comprehensive Search Space
8. Challenge #2: Efficient Search Algorithm
9. Finding the narrow waist
10. Hardware counters as search signal
11. Minimal Feature Set (MFS)
12. Implementation
13. Evaluation Settings
14. Lessons and Future Work
15. Conclusion
Description:
Explore a 15-minute conference talk from USENIX NSDI '22 that introduces Collie, a tool designed to uncover performance anomalies in RDMA subsystems. Learn how Collie constructs a comprehensive search space for application workloads and uses simulated annealing to drive RDMA-related performance and diagnostic counters to extreme value regions. Discover the tool's effectiveness in finding 15 new performance anomalies across various RDMA NIC, CPU, and hardware component combinations. Gain insights into the challenges of defining performance anomalies, creating a comprehensive search space, and implementing efficient search algorithms. Understand the importance of hardware counters as search signals and the concept of Minimal Feature Set (MFS) in Collie's approach. Examine the evaluation settings, lessons learned, and future work directions for improving RDMA subsystem performance testing.
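
To make the search idea in the description concrete, here is a minimal sketch of simulated annealing that pushes a counter value toward an extreme region. It is not Collie's implementation: the feature names in `SEARCH_SPACE`, the `read_counter` stand-in (a made-up scoring function in place of a real hardware counter), and all parameters are assumptions chosen for illustration only.

```python
import math
import random

# Hypothetical workload search space: each feature takes one of a few values.
SEARCH_SPACE = {
    "qp_count": [1, 8, 64, 512],
    "message_bytes": [64, 4096, 65536, 1048576],
    "opcode": ["SEND", "WRITE", "READ"],
}


def read_counter(workload):
    """Made-up stand-in for a hardware counter reading.

    A real tool would run the workload and read a NIC counter instead.
    """
    score = math.log2(workload["qp_count"]) * 2.0
    score += 5.0 if workload["message_bytes"] == 64 else 0.0
    score += 3.0 if workload["opcode"] == "READ" else 0.0
    return score


def mutate(workload, rng):
    """Return a neighbor that differs in (at most) one randomly chosen feature."""
    neighbor = dict(workload)
    feature = rng.choice(sorted(SEARCH_SPACE))
    neighbor[feature] = rng.choice(SEARCH_SPACE[feature])
    return neighbor


def anneal(steps=2000, start_temp=5.0, cooling=0.995, seed=0):
    """Search for the workload that maximizes the (simulated) counter value."""
    rng = random.Random(seed)
    current = {name: rng.choice(values) for name, values in sorted(SEARCH_SPACE.items())}
    current_score = read_counter(current)
    best, best_score = current, current_score
    temp = start_temp
    for _ in range(steps):
        candidate = mutate(current, rng)
        candidate_score = read_counter(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept regressions with a probability
        # that shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp = max(temp * cooling, 1e-3)
    return best, best_score


if __name__ == "__main__":
    workload, score = anneal()
    print(workload, score)
```

Accepting occasional regressions early on lets the search escape local maxima, which is the usual reason to prefer annealing over plain hill climbing in a large, irregular search space.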

Collie - Finding Performance Anomalies in RDMA Subsystems

USENIX