    Yanming Shen (申彦明)

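    • Professor; doctoral supervisor; master's supervisor
    • Gender: Male
    • Alma mater: 纽约理工大学 (Polytechnic University, New York)
    • Degree: Ph.D.
    • Affiliation: School of Computer Science and Technology
    • Office: Haishan Building (海山楼), Room B0813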

    Publications

    Fast correlation coefficient estimation algorithm for HBase-based massive time series data

      Posted: 2019-07-01

      Paper type: Journal article

      Publication date: 2019-08-01

      Journal: FRONTIERS OF COMPUTER SCIENCE

      Indexed in: SCIE

      Volume: 13

      Issue: 4

      Pages: 864-878

      ISSN: 2095-2228

      Keywords: time series; HBase; correlation coefficient; fast estimation

      Abstract: In recent years, the rapid development of the Internet of Things and sensor networks has led to explosive growth in time series data. Emerging systems such as OpenTSDB have begun to use Hadoop and HBase to store massive time series data, and how to query and mine time series data on these platforms has become a research hotspot. As a typical distance measure for time series, the correlation coefficient is widely used in various applications. However, computing the correlation coefficient of long time series on HBase in real time requires a large amount of I/O and network transmission, so it cannot support interactive queries. To address this problem, we present two methods for estimating the correlation coefficient of two sequences stored in HBase. We first propose DCE, a fast algorithm for estimating the upper and lower bounds of the correlation coefficient. To further reduce I/O cost, we extend DCE into the ADCE algorithm, which estimates the correlation coefficient quickly in an iterative manner. Experiments show that the proposed algorithms can quickly compute the correlation coefficient of long time series.
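      The abstract does not describe how DCE or ADCE work, so the sketch below does not reproduce them. It only illustrates the exact baseline whose cost the abstract points to: the Pearson correlation coefficient of two aligned series, computed from per-block summaries (count, sums, sums of squares, sum of cross-products) that can be merged across blocks. Building each summary still requires reading every data point once, which is the I/O the paper aims to reduce. The `BlockStats` class, its method names, and the idea of one block per HBase region or row range are hypothetical assumptions for illustration, not part of the paper or of any HBase API.

```python
import math
from dataclasses import dataclass


@dataclass
class BlockStats:
    """Mergeable summary of one aligned block of two series (hypothetical layout).

    Each field is a running sum over the paired points (x_i, y_i) in the block.
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0

    @classmethod
    def from_pairs(cls, xs, ys):
        """Scan one block of aligned points; this full scan is where the per-point I/O happens."""
        if len(xs) != len(ys):
            raise ValueError("series blocks must be aligned and equally long")
        stats = cls()
        for x, y in zip(xs, ys):
            stats.n += 1
            stats.sx += x
            stats.sy += y
            stats.sxx += x * x
            stats.syy += y * y
            stats.sxy += x * y
        return stats

    def merge(self, other):
        """Combine two block summaries; only six numbers per block need to be transferred."""
        return BlockStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.syy + other.syy,
            self.sxy + other.sxy,
        )


def pearson(stats):
    """Exact Pearson correlation from merged sums (naive one-pass formula).

    cov, var_x and var_y below are all scaled by n; the scale cancels in the ratio.
    """
    if stats.n < 2:
        raise ValueError("need at least two points")
    cov = stats.sxy - stats.sx * stats.sy / stats.n
    var_x = stats.sxx - stats.sx ** 2 / stats.n
    var_y = stats.syy - stats.sy ** 2 / stats.n
    if var_x <= 0 or var_y <= 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [2, 4, 6, 8, 10, 12]
    # Pretend each half of the series is stored in a different region / row range.
    left = BlockStats.from_pairs(xs[:3], ys[:3])
    right = BlockStats.from_pairs(xs[3:], ys[3:])
    print(pearson(left.merge(right)))  # 1.0
```

      The raw-sums formula is used here for brevity; with large values or long series it can lose precision through cancellation, and a mean-centred merge (as in Welford's or Chan et al.'s pairwise update) is numerically safer.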