Bayes Centre

Partner event: Data-driven mixed-integer linear programming-based optimisation for efficient failure detection in large-scale distributed systems

About the Event

Btissam Er-Rahmadi will give a talk, in person and online, as part of the Coffee House Tech Talk Series.

Failure detectors (FDs) are fundamental building blocks of distributed systems. An FD decides whether a process has crashed based on the reception of heartbeat messages that the process sends over a communication channel. A key challenge is tuning FD parameters to achieve the best performance that still satisfies the desired system requirements, which is difficult given the complexity of large-scale networks. Existing FDs ignore such optimization and adopt ad-hoc parameters. In this paper, we propose a new FD algorithm based on Mixed Integer Linear Programming (MILP) optimization. We obtain the MILP formulation via piecewise linearization relaxations. Solving the MILP yields FD parameters that achieve the best trade-off between performance metric requirements, network conditions and system parameters.

The MILP maximizes the FD’s accuracy under a bounded failure detection time, with network and system conditions as constraints. Its solution gives optimized FD parameters that maximize the FD’s expected performance. To adapt to real-time network changes, our MILP-based FD fits the probability distribution of heartbeat inter-arrival times. To address scalability in large-scale systems, where the MILP model needs to compute near-optimal solutions quickly, we also propose a heuristic algorithm.
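For readers new to heartbeat-based failure detection, the trade-off described above can be illustrated with a minimal sketch. This is not the MILP formulation or the heuristic presented in the talk: it assumes normally distributed inter-arrival times, treats the timeout as the worst-case detection time, and uses a brute-force search over a hypothetical list of candidate timeouts. All function and parameter names are illustrative.

```python
from statistics import NormalDist, mean, stdev


def fit_inter_arrivals(samples):
    """Fit a normal distribution to heartbeat inter-arrival times (seconds).

    Assumption: a normal fit is used purely for illustration.
    """
    return NormalDist(mean(samples), stdev(samples))


def pick_timeout(samples, max_detection_time, candidates):
    """Return (timeout, estimated_accuracy) for the best feasible candidate.

    Accuracy is estimated as P(inter-arrival <= timeout): the chance that a
    live process's next heartbeat arrives before the process is suspected.
    A candidate is feasible if it does not exceed the detection-time bound.
    """
    dist = fit_inter_arrivals(samples)
    feasible = [t for t in candidates if t <= max_detection_time]
    if not feasible:
        raise ValueError("no candidate timeout satisfies the detection-time bound")
    best = max(feasible, key=dist.cdf)
    return best, dist.cdf(best)


def suspects(last_heartbeat_at, now, timeout):
    """Suspect the process if no heartbeat has arrived within the timeout."""
    return now - last_heartbeat_at > timeout


# Hypothetical measurements: heartbeats arriving roughly once per second.
observed = [1.0, 1.1, 0.9, 1.2, 0.8]
timeout, accuracy = pick_timeout(
    observed, max_detection_time=2.0, candidates=[0.5, 1.0, 1.5, 2.0, 3.0]
)
```

With a single timeout parameter the search is trivial, since a longer timeout always improves the estimated accuracy until the detection-time bound is reached. An optimization model becomes worthwhile when several parameters and constraints interact, which is the setting the abstract describes.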

To test our approach, we use Amazon Cloud as a realistic testing environment and develop a simulator for robustness tests. Our results show consistent improvement in overall FD performance and scalability. To the best of our knowledge, this is the first attempt to combine MILP-based optimization modelling with FDs to achieve performance guarantees.

Brief speaker bio: Btissam Er-Rahmadi is a Senior Researcher in the Knowledge Graph Lab at Huawei Edinburgh Research Centre, UK. Prior to that, she was a Research Fellow in Network Systems at the University of Southampton, UK. She received her industrial (i.e. CIFRE) Ph.D. degree in Computer Science from the University of Rennes 1, France, in 2016, having carried out her Ph.D. research at Orange Labs, Lannion, France. Her research interests include Operations Research and AI applied to performance enhancement in distributed systems and network systems, and Knowledge Graphs for (Personalised) Search and Recommendation.

Aug 02 2022

This Tech Talk lecture series is a part of the Huawei Coffee House offering.

Room G.03, Bayes Centre (47 Potterrow, Edinburgh EH8 9BT).