Dalian University of Technology Faculty Homepage Platform - Wang Yuxin (Chinese homepage) - DD-L1D: Improving the Decoupled L1D Efficiency for GPU Architecture

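Wang Yuxin (王宇新)

Associate Professor, Master's Supervisor
Gender: Male
Alma mater: Dalian University of Technology
Degree: Ph.D.
Affiliation: School of Computer Science and Technology
Office: Room A0827, Innovation Park Building
Contact: 18640987378
Email: wyx@dlut.edu.cn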


DD-L1D: Improving the Decoupled L1D Efficiency for GPU Architecture

Paper type: Conference paper

Publication date: 2017-01-01

Indexed in: SCIE, EI, CPCI-S, Scopus

Pages: 1-10

Abstract: GPU L1 data cache contention, caused by a huge number of concurrent threads, leads to insufficient cache utilization and poor performance, especially for cache-unfriendly applications. Cache bypassing is a widely used method to alleviate this problem. Decoupled L1D (D-L1D) is a preventive bypassing scheme that improves performance for cache-unfriendly applications by considering the data locality of memory access streams. However, our experiments and analyses show that D-L1D attains only a limited performance gain because its locality threshold is pre-defined. To address this issue, we propose a novel bypassing scheme named Dynamic D-L1D (DD-L1D), which steers the L1 data cache toward lower contention by dynamically updating the locality threshold at runtime. We evaluate four metrics in DD-L1D as indicators of the L1 cache bypassing state, and choose the bypassing miss rate in our final configuration. The experimental results demonstrate that DD-L1D improves on the baseline performance by 1.45X on average for cache-unfriendly benchmarks. It also outperforms D-L1D and the state-of-the-art GPU cache bypassing schemes with lower hardware overhead and memory traffic.
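The abstract does not give the update rule, so the following is only a minimal sketch of the general idea of a runtime-adjusted locality threshold, not the paper's mechanism. The class name `DynamicThreshold`, the integer locality score, the set point `target_miss_rate`, the step size of 1, and the per-epoch update are all hypothetical choices made for illustration; the paper's "bypassing miss rate" metric may be defined differently from the plain miss rate used here.

```python
from dataclasses import dataclass


@dataclass
class DynamicThreshold:
    """Hypothetical controller for a locality threshold (illustrative only)."""

    threshold: int = 2              # assumed initial locality threshold
    min_threshold: int = 0          # assumed lower bound
    max_threshold: int = 8          # assumed upper bound
    target_miss_rate: float = 0.5   # assumed set point, not from the paper

    def should_bypass(self, locality_score: int) -> bool:
        # Accesses from streams with low estimated locality skip the L1 cache.
        return locality_score < self.threshold

    def update(self, misses: int, accesses: int) -> None:
        # End-of-epoch adjustment: a high observed miss rate suggests
        # contention, so bypass more; a low one allows caching more.
        if accesses == 0:
            return  # no samples this epoch, keep the current threshold
        miss_rate = misses / accesses
        if miss_rate > self.target_miss_rate:
            self.threshold = min(self.threshold + 1, self.max_threshold)
        elif miss_rate < self.target_miss_rate:
            self.threshold = max(self.threshold - 1, self.min_threshold)


ctrl = DynamicThreshold()
ctrl.update(misses=90, accesses=100)   # contended epoch raises the threshold
bypass_after_contention = ctrl.should_bypass(2)
```

A fixed-threshold scheme corresponds to never calling `update`; the sketch only shows how a feedback metric could move the threshold up or down between epochs.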
