Dalian University of Technology Homepage Platform, Guo He, Chinese homepage: APR: A novel parallel repacking algorithm for efficient GPGPU parallel code transformation

Guo He (郭禾)

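Professor, Doctoral Supervisor, Master's Supervisor
Gender: Male
Alma mater: Dalian University of Technology
Degree: Master's
Affiliation: School of Software; International School of Information and Software
Contact: guohe@dlut.edu.cn
Email: guohe@dlut.edu.cn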


APR: A novel parallel repacking algorithm for efficient GPGPU parallel code transformation


Paper type: Conference paper

Publication date: 2014-03-01

Indexed in: EI, Scopus

Pages: 81-89

Abstract: General-purpose graphics processing units (GPGPUs) bring an opportunity to improve the performance of many applications. However, exploiting parallelism has low productivity in current programming frameworks such as CUDA and OpenCL. Programmers have to consider and deal with many GPGPU architecture details; it is therefore a challenge to balance programmability against the efficiency of performance tuning. Parallel Repacking (PR) is a popular performance tuning approach for GPGPU applications, which improves performance by changing the parallel granularity. Existing code transformation algorithms using PR increase productivity, but they do not cover enough code patterns and do not provide effective code error detection. In this paper, we propose a novel parallel repacking algorithm (APR) to cover a wide range of code patterns and improve efficiency. We develop an efficient code model that expresses a GPGPU program as a recursive statement sequence, and introduce the concept of a singular statement. APR, building upon this model, uses appropriate transformation rules for singular and non-singular statements to generate the repacked code. A recursive transformation is performed when it encounters a branching or loop singular statement. Additionally, singular statements unify the transformation for barriers and data sharing, and enable APR to detect barrier errors. The experimental results based on a prototype show that our proposed APR covers more code patterns than existing solutions such as the automatic thread coarsening in Crest, and that the repacked code produced by APR achieves effective performance gains of up to 3.28X speedup, in some cases even higher than manually tuned repacked code. Copyright 2014 ACM.
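
To make "changing the parallel granularity" concrete, the following is a minimal Python sketch of the general idea behind repacking (often called thread coarsening): the work of several fine-grained logical threads is folded into one coarser thread. It is not the APR algorithm from the paper. The names `kernel`, `repacked_kernel`, `launch`, and `factor` are hypothetical, and the sequential `launch` loop is only a stand-in for a GPU kernel launch.

```python
def kernel(tid, src, dst):
    """Original fine-grained work item: one logical thread handles one element."""
    dst[tid] = 2 * src[tid] + 1


def repacked_kernel(new_tid, src, dst, factor):
    """Coarsened work item: one thread does the work of `factor` original threads."""
    n = len(src)
    for k in range(factor):
        tid = new_tid * factor + k  # original thread id covered by this iteration
        if tid < n:  # guard the tail when n is not a multiple of factor
            kernel(tid, src, dst)


def launch(n_threads, body):
    """Sequential stand-in (an assumption) for a GPU launch: run body per thread id."""
    for tid in range(n_threads):
        body(tid)


def run_original(src):
    dst = [0] * len(src)
    launch(len(src), lambda tid: kernel(tid, src, dst))
    return dst


def run_repacked(src, factor):
    dst = [0] * len(src)
    n_threads = (len(src) + factor - 1) // factor  # ceiling division
    launch(n_threads, lambda t: repacked_kernel(t, src, dst, factor))
    return dst
```

Both launches compute the same output; only the number of threads and the work per thread differ. Kernels that contain barriers or share data between threads need more care than this sketch shows, which is the case the abstract says APR handles through singular statements.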
