Reducing Tail Latency in Interactive Services via Dynamic Parallelism in Heterogeneous Multicore Systems

Wednesday, April 22, 2015

11:30 am - 12:30 pm
Teer 115


Md E. Haque, PhD Student, Rutgers University

Interactive services, such as Web search, recommendations, games, and finance, must respond quickly to satisfy customers. Achieving this goal requires optimizing tail latency (e.g., the 99th percentile and above). In this talk, I will discuss Few-to-Many (FM) incremental parallelization, which dynamically increases the parallelism of request processing to reduce tail latency. Dynamic parallelism is necessary because blindly parallelizing all requests quickly oversubscribes hardware resources, and parallelizing the numerous short requests does not improve tail latency. However, choosing the right amount of parallelism dynamically is challenging because a request's service demand is unknown when it arrives. FM uses request service demand profiles and knowledge of hardware parallelism to compute a policy in an offline phase. This policy is represented as an interval table that specifies when to add software parallelism and how much to add. At runtime, FM increases the parallelism of each request according to the interval table, which is indexed by system load and request processing progress. The longer a request executes, the more parallelism FM adds. I will present evaluation results for homogeneous multicore systems and for multicore systems with simultaneous multithreading (SMT). These results illustrate that incremental dynamic parallelism is a powerful tool for reducing tail latency.
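
The runtime lookup described in the abstract can be illustrated with a minimal Python sketch. This is not the speaker's implementation: the table layout, the name `parallelism_for`, the load buckets, and every threshold and degree below are hypothetical values chosen only to show how a table indexed by system load and request progress might be consulted. The choice to add parallelism later under higher load is likewise an assumption of this sketch.

```python
from bisect import bisect_right

# Hypothetical interval table (illustrative values, not from the talk).
# Each row: (maximum load covered by the row, schedule), where the schedule
# is a list of (elapsed_ms at which the degree takes effect, degree).
INTERVAL_TABLE = [
    (4, [(0, 1), (10, 2), (30, 4)]),        # light load: add parallelism early
    (8, [(0, 1), (25, 2), (60, 3)]),        # moderate load: add it later
    (float("inf"), [(0, 1), (80, 2)]),      # heavy load: stay mostly sequential
]


def parallelism_for(load, elapsed_ms):
    """Return the degree of parallelism for a request that has been running
    for elapsed_ms while `load` requests are in the system."""
    if elapsed_ms < 0:
        raise ValueError("elapsed_ms must be non-negative")
    for max_load, schedule in INTERVAL_TABLE:
        if load <= max_load:
            starts = [start for start, _ in schedule]
            # Find the last interval whose start time has already passed.
            idx = bisect_right(starts, elapsed_ms) - 1
            return schedule[idx][1]
    raise ValueError("no table row covers this load")


if __name__ == "__main__":
    # A request under light load gains parallelism as it keeps running.
    for t in (0, 10, 45):
        print(t, parallelism_for(2, t))
```

In this sketch every request starts sequentially, so short requests finish without consuming extra hardware threads, and only requests that run long enough to reach a later interval are parallelized.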


Currin, Ellen