Network Slicing to Improve Multicasting in HPC Clusters
Document Type
Article
Publication Date
9-2018
DOI
https://doi.org/10.1007/s10586-017-1561-5
Abstract
On high performance computing (HPC) resources, extensive experiments are frequently executed. HPC resources (e.g., computing machines and switches) should be able to handle several experiments running in parallel. Typically, HPC utilizes parallelization in programs, processing, and data. The underlying network is seen as the only non-parallelized HPC component (i.e., there is no dynamic virtual slicing based on HPC jobs). In this paper, we present an approach that uses software defined networking (SDN) to parallelize HPC clusters among the different running experiments. We propose to accomplish this through two major components: a passive module (network mapper/remapper) that selects the least busy resources in the network for each experiment as soon as it starts, and an SDN-HPC active load balancer that performs more complex and intelligent operations. The active load balancer can logically divide the network based on experiments’ host files. The goal is to reduce traffic to unnecessary hosts or ports. An HPC experiment should multicast only to the cluster nodes that it uses, rather than broadcast to the whole cluster. We use the Virtual Tenant Network modules in the OpenDaylight controller to create VLANs based on HPC experiments. In each HPC host, virtual interfaces are created to isolate the traffic of the different experiments. Traffic between the different physical hosts that belong to the same experiment can be distinguished by the VLAN ID assigned to each experiment. We evaluate the new approach using several public HPC benchmarks. Results show a significant enhancement in experiments’ performance, especially when the HPC cluster is running several heavy-load experiments simultaneously. Results also show that this multicasting approach can significantly reduce the casting overhead caused by using a single cast for all resources in the HPC cluster.
In comparison with InfiniBand networks, which offer interconnect services with low latency and high bandwidth, HPC services based on SDN can achieve two distinct objectives that may not be possible with InfiniBand. The first objective is the integration of HPC with Ethernet enterprise networks, which expands HPC usage to much wider domains. The second objective is the ability to let users and their applications customize HPC services with different QoS requirements that fit the needs of those applications and optimize the usage of HPC clusters.
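The two components described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes no OpenDaylight or Virtual Tenant Network calls. All names (`pick_least_busy`, `assign_vlans`, `interfaces_per_host`), the starting VLAN ID of 100, the per-host load numbers, and the `eth0.<vlan>` interface naming are assumptions chosen for illustration. The sketch shows only the bookkeeping: choosing the least busy hosts for a new experiment, giving each experiment its own VLAN ID, and listing the virtual interfaces each host would need.

```python
from typing import Dict, List

# Assumed starting VLAN ID; 802.1Q permits IDs 1-4094.
FIRST_VLAN_ID = 100


def pick_least_busy(load_by_host: Dict[str, float], count: int) -> List[str]:
    """Return the `count` hosts with the lowest load (ties broken by name)."""
    if count > len(load_by_host):
        raise ValueError("not enough hosts for this experiment")
    ranked = sorted(load_by_host, key=lambda host: (load_by_host[host], host))
    return ranked[:count]


def assign_vlans(host_files: Dict[str, List[str]]) -> Dict[str, int]:
    """Give each experiment its own VLAN ID, in sorted experiment order."""
    return {
        experiment: FIRST_VLAN_ID + index
        for index, experiment in enumerate(sorted(host_files))
    }


def interfaces_per_host(
    host_files: Dict[str, List[str]], vlans: Dict[str, int]
) -> Dict[str, List[str]]:
    """List the virtual interfaces (hypothetical names) each host needs."""
    result: Dict[str, List[str]] = {}
    for experiment, hosts in host_files.items():
        for host in hosts:
            result.setdefault(host, []).append(f"eth0.{vlans[experiment]}")
    for names in result.values():
        names.sort()
    return result


if __name__ == "__main__":
    # Hypothetical load figures and host files.
    load = {"n1": 0.9, "n2": 0.1, "n3": 0.5, "n4": 0.1}
    print(pick_least_busy(load, 2))

    host_files = {"expA": ["n1", "n2"], "expB": ["n2", "n3"]}
    vlans = assign_vlans(host_files)
    print(vlans)
    print(interfaces_per_host(host_files, vlans))
```

A host that appears in two host files ends up with two virtual interfaces, one per VLAN, which is how traffic from the two experiments would stay separated on the same physical machine.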
Publication Information
Alsmadi, Izzat; Khreishah, Abdallah; and Xu, Dianxiang. (2018). "Network Slicing to Improve Multicasting in HPC Clusters". Cluster Computing, 21(3), 1493-1506.