He Guo

Professor, Supervisor of Doctoral Candidates, Supervisor of Master's Candidates

Gender:Male

Alma Mater:Dalian University of Technology

Degree:Master's Degree

School/Department:School of Software, International School of Information and Software

Contact Information:guohe@dlut.edu.cn


Paper Publications

A Credit-Based Load-Balance-Aware CTA Scheduling Optimization Scheme in GPGPU

Indexed by:Journal Article

Date of Publication:2016-02-01

Journal:INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING

Included Journals:SCIE, EI

Volume:44

Issue:1,SI

Page Number:109-129

ISSN No.:0885-7458

Key Words:GPGPU; CTA scheduler; Credit-based load-balance-aware scheduling scheme; Load balance

Abstract:GPGPUs improve computing performance through massive parallelism. The cooperative-thread-array (CTA) schedulers employed by current GPGPUs greedily issue CTAs to GPU cores as soon as resources become available, in order to achieve higher thread-level parallelism. Because of locality considerations in the memory controller, CTA execution time varies across cores, which leads to an imbalance in CTA issuance among the cores. This load imbalance leaves computing resources under-utilized and leaves an opportunity for further performance improvement. However, existing warp and CTA scheduling policies do not take load balance into account. We propose a credit-based load-balance-aware CTA scheduling optimization scheme (CLASO) piggybacked on a standard GPGPU scheduling system. CLASO uses credits to limit the number of CTAs issued to each core, avoiding both greedy issuance to faster-executing cores and starvation of the remaining cores. In addition, CLASO employs global credits and two tuning parameters, active levels and loose levels, to enhance load balance and robustness. Rather than being a standalone scheduling policy, CLASO is compatible with existing CTA and warp schedulers. Experiments on several representative benchmarks show that CLASO effectively improves load balance, reducing idle cycles by 52.4% on average, and achieves up to 26.6% speedup over the GPGPU baseline scheduling policy.
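The credit idea in the abstract can be illustrated with a toy sketch. This is not the CLASO implementation: the class name `CreditScheduler`, the parameter `credits_per_round`, and the refill rule (all cores get fresh credits once every core has spent its own) are assumptions made for illustration only, and the sketch omits the global credits, active levels, and loose levels mentioned in the abstract.

```python
class CreditScheduler:
    """Toy per-core credit counter (hypothetical; not the CLASO implementation)."""

    def __init__(self, num_cores, credits_per_round):
        self.credits_per_round = credits_per_round
        self.credits = [credits_per_round] * num_cores
        self.issued = [0] * num_cores  # CTAs issued to each core so far

    def try_issue(self, core):
        """Issue one CTA to `core` if it still holds a credit."""
        if self.credits[core] == 0:
            # A greedy scheduler would issue here; the credit limit makes the core wait.
            return False
        self.credits[core] -= 1
        self.issued[core] += 1
        # Assumed refill rule: start a new round once every core is out of credits.
        if all(c == 0 for c in self.credits):
            self.credits = [self.credits_per_round] * len(self.credits)
        return True


sched = CreditScheduler(num_cores=2, credits_per_round=2)
# Core 0 plays the "fast" core and asks for work five times before core 1 asks at all.
fast_results = [sched.try_issue(0) for _ in range(5)]
slow_results = [sched.try_issue(1) for _ in range(2)]
retry = sched.try_issue(0)  # core 0 asks again after the round has been refilled
```

In this sketch the fast core receives only two CTAs before it has to wait, so it cannot drain the CTA pool ahead of the slower core. Whether such a limit helps in practice depends on the workload, which is what the paper's experiments evaluate.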

Profile

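Education:

  • Bachelor's degree: Department of Computer Science, Jilin University, 1982

  • Master's degree: Department of Computer Science, Dalian University of Technology, 1989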

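Research and Work Experience:

  • October 1986 to October 1987, Progeni Company, New Zealand, Visiting Scholar

  • October 1990 to December 1992, Department of Computer Science, PDI Karlsruhe University, Germany, Visiting Scholar

  • December 1992 to December 2007, Department of Computer Science, Dalian University of Technology, Associate Professor

  • March 1995 to June 1996, Dalian Golden Card Project System, Chief Engineer

  • January 2008 to present, School of Software, Dalian University of Technology, Professor

  • April 2020, retired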

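Teaching:

  • 1992 to 2007: Introduction to Computers, Computer Organization and Structure, Computer Architecture

  • 2009 to 2019: Storage Technology, Computer Architecture, Parallel Computing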

Research:

  • Research interests: parallel and distributed computing.