A unified runtime system for heterogeneous multicore architectures - HPPC 2008

Abstract. Approaching the theoretical performance of heterogeneous multicore architectures, equipped with specialized accelerators, is a challenging issue. Unlike regular CPUs that can transparently access the whole global memory address range, accelerators usually embed local memory on which they perform all their computations using a specific instruction set. While many research efforts have been devoted to offloading parts of a program over such coprocessors, the real challenge is to find a programming model providing a unified view of all available computing units.
In this paper, we present an original runtime system providing a high-level, unified execution model allowing seamless execution of tasks over the underlying heterogeneous hardware. The runtime is based on a hierarchical memory management facility and on a codelet scheduler. We demonstrate the efficiency of our solution with an LU decomposition for both homogeneous (3.8 speedup on 4 cores) and heterogeneous machines (95% efficiency). We also show that a “granularity aware” scheduling can improve execution time by 35%.

Bibtex

@Inproceedings{AugNam08HPPC,
  author    = {C{\'e}dric Augonnet and Raymond Namyst},
  title     = {{A unified runtime system for heterogeneous multicore architectures}},
  booktitle = {Proceedings of the International Euro-Par Workshops 2008, HPPC'08},
  address   = {Las Palmas de Gran Canaria, Spain},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  volume    = 5415,
  pages     = {174--183},
  doi       = {10.1007/978-3-642-00955-6_22},
  isbn      = {978-3-642-00954-9},
  month     = AUG,
  year      = 2008,
  url       = {http://hal.inria.fr/inria-00326917},
  keywords  = {StarPU}
}