Publication Date

8-2016

Date of Final Oral Examination (Defense)

6-28-2016

Type of Culminating Activity

Thesis

Degree Title

Master of Science in Computer Science

Department

Computer Science

Supervisory Committee Chair

James Buffenbarger, Ph.D.

Supervisory Committee Member

Kyle Wheeler, Ph.D.

Supervisory Committee Member

Amit Jain, Ph.D.

Abstract

Across the landscape of computing, parallelism within applications is increasingly important in order to track advances in hardware capability and meet critical performance metrics. However, writing parallel applications is difficult to do in a scalable way, which has led to the creation of tasking libraries and language extensions like OpenMP, Intel Threading Building Blocks, Qthreads, and more. These tools abstract parallel execution by expressing it in terms of work units (tasks) rather than specific hardware details. This abstraction enables scaling and allows programmers to write software solutions that can leverage whatever level of parallelism is available.
The typical task scheduler, however, is greedy and naïve. As a result, concurrent parallel processes compete for computational resources, which leads to unnecessary context switches, mis-timed synchronization, unnecessary resource contention, and the associated consequences. By providing a mechanism of communication between the task schedulers, processes can cooperate to utilize hardware more effectively and avoid the negative consequences of coarse-grained resource contention. This work uses Qthreads to demonstrate that cooperative allocation of computational resources reduces contention and decreases execution time. The overhead added for the resource allocation is shown to have minimal impact. Using the Unbalanced Tree Search (UTS) and High Performance Conjugate Gradient (HPCG) benchmarks, execution time across concurrent processes shows significant decreases across a range of machines with a variety of hardware resources and software configurations. Tests also indicate that dynamic compute-resource allocation provides a clear performance benefit even when hardware resources are oversubscribed, that is, when there are more processes than processing units.
UTS tests showed an average 4.98% reduction in execution time on Linux compared to Qthreads' yielding option, and an 89.32% reduction in execution time on Apple OS X. For HPCG, partitioning reduced execution time by an average of 22.31% compared to the default Qthreads configuration across all test platforms.
