    Guo He (郭禾)

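    • Professor, Doctoral Supervisor, Master's Supervisor
    • Gender: Male
    • Alma mater: Dalian University of Technology
    • Degree: Master's
    • Affiliation: School of Software; International School of Information and Software
    • Contact (email): guohe@dlut.edu.cn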

    A Stall-Aware Warp Scheduling for dynamically optimizing thread-level parallelism in GPGPUs

    Paper type: Conference paper

    Publication date: 2015-06-08

    Indexed in: EI, Scopus

    Volume: 2015-June

    Pages: 15-24

    Abstract: General-Purpose Graphic Processing Units (GPGPU) have been widely used in high performance computing as application accelerators due to their massive parallelism and high throughput. A GPGPU generally contains two layers of schedulers, a cooperative-thread-array (CTA) scheduler and a warp scheduler, which administer the thread level parallelism (TLP). Previous research shows the maximized TLP does not always deliver the optimal performance. Unfortunately, existing warp scheduling schemes do not optimize TLP at runtime, which is impossible to fit various access patterns for diverse applications. Dynamic TLP optimization in the warp scheduler remains a challenge to exploit the GPGPU highly-parallel compute power. In this paper, we comprehensively investigate the TLP performance impact in the warp scheduler. Based on our analysis of the pipeline efficiency, we propose a Stall-Aware Warp Scheduling (SAWS), which optimizes the TLP according to the pipeline stalls. SAWS adds two modules to the original scheduler to dynamically adjust TLP at runtime. A trigger-based method is employed for a fast tuning response. We simulated SAWS and conducted extensive experiments on GPGPU-Sim using 21 paradigmatic benchmarks. Our numerical results show that SAWS effectively improves the pipeline efficiency by reducing the structural hazards without causing extra data hazards. SAWS achieves an average speedup of 14.7% with a geometric mean, even higher than existing Two-Level scheduling scheme with the optimal fetch group sizes over a wide range of benchmarks. More importantly, compared with the dynamic TLP optimization in the CTA scheduling, SAWS still has 9.3% performance improvement among the benchmarks, which shows that it is a competitive choice by moving dynamic TLP optimization from the CTA to warp scheduler. © Copyright 2015 ACM.
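
The abstract reports its average speedup as a geometric mean across benchmarks. As a rough illustration of that kind of aggregate (not the authors' evaluation script), the sketch below computes a geometric-mean speedup from per-benchmark cycle counts. The function name `geometric_mean_speedup` and all cycle counts are hypothetical placeholders, not data from the paper.

```python
import math


def geometric_mean_speedup(baseline_cycles, new_cycles):
    """Geometric mean of per-benchmark speedups (baseline / new).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not baseline_cycles or len(baseline_cycles) != len(new_cycles):
        raise ValueError("need two non-empty lists of equal length")
    # Average the logs of the ratios, then exponentiate; this avoids
    # overflow when multiplying many ratios together.
    log_sum = sum(math.log(b / n) for b, n in zip(baseline_cycles, new_cycles))
    return math.exp(log_sum / len(baseline_cycles))


# Made-up cycle counts for three imaginary benchmarks (assumption).
baseline = [1000, 2000, 4000]
candidate = [800, 2000, 3200]

speedup = geometric_mean_speedup(baseline, candidate)
print(f"geometric-mean speedup: {(speedup - 1) * 100:.1f}%")
```

The geometric mean is a common choice for averaging ratios such as speedups because a single outlier benchmark influences it less than it would an arithmetic mean.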