Dang Yanzhong
Personal Homepage
Paper Publications
A Text Segmentation Algorithm Based on Decreasing Length and String Frequency Statistics (基于长度递减与串频统计的文本切分算法)

Indexed by:Journal Article

Date of Publication:2006-01-01

Journal:Journal of the China Society for Scientific and Technical Information (情报学报)

Included Journals:PKU, ISTIC, CSCD, CSSCI

Volume:25

Issue:1

Page Number:74-79

ISSN No.:1000-0135

Key Words:Chinese characters; automatic segmentation; string frequency; longest-string-first matching

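Abstract:This paper proposes an automatic segmentation algorithm for Chinese text based on the frequency of Chinese character strings and on decreasing string length. Using longest-string-first matching, it requires no dictionary, no prior estimate of the collocation probabilities between characters, and no character index; string-frequency information alone is used to segment the meaningful Chinese character strings in a text. The algorithm effectively extracts newly emerging common words, technical terms, and proper nouns, and it avoids miscounting the shorter string when a longer and a shorter Chinese character string stand in a containment relation. Experiments show that, without any corpus-based learning, the algorithm quickly and accurately segments the Chinese character strings whose frequency in a Chinese document is greater than or equal to the support threshold.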
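
The abstract gives only the outline of the method, so the following is a minimal Python sketch of the general idea rather than the authors' implementation. It scans candidate lengths from long to short, keeps strings whose count reaches a support threshold, and cuts each accepted string out of the text so that the shorter strings it contains are not counted again. The function name `frequent_strings`, the parameters `min_support`, `max_len`, and `min_len`, the CJK character range, and the greedy left-to-right cutting are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: only CJK Unified Ideographs count as Chinese characters;
# punctuation and everything else act as hard boundaries.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def frequent_strings(text, min_support=2, max_len=6, min_len=2):
    """Return {string: occurrences cut out}, longest strings first."""
    segments = CJK_RUN.findall(text)
    result = {}
    # Decreasing length: longer strings get priority over shorter ones.
    for length in range(max_len, min_len - 1, -1):
        # Count every window of the current length in the remaining text.
        counts = Counter(
            seg[i:i + length]
            for seg in segments
            for i in range(len(seg) - length + 1)
        )
        frequent = {s for s, c in counts.items() if c >= min_support}
        if not frequent:
            continue
        # Cut accepted strings out (greedy, left to right) so that their
        # substrings are not counted at shorter lengths. This greedy cut is
        # a simplification: overlapping candidates may be cut fewer times
        # than their raw window count.
        remaining = []
        for seg in segments:
            start = i = 0
            while i <= len(seg) - length:
                piece = seg[i:i + length]
                if piece in frequent:
                    result[piece] = result.get(piece, 0) + 1
                    if i > start:
                        remaining.append(seg[start:i])
                    i += length
                    start = i
                else:
                    i += 1
            if start < len(seg):
                remaining.append(seg[start:])
        segments = remaining
    return result


sample = "数据挖掘很重要。数据挖掘与数据分析。数据分析很重要。"
found = frequent_strings(sample, min_support=2, max_len=6)
```

In this sample, 数据 occurs four times in the raw text, but every occurrence lies inside the longer strings 数据挖掘 or 数据分析, so the sketch does not report it separately.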

Personal information

Professor
Supervisor of Doctorate Candidates
Supervisor of Master's Candidates

Gender:Male

Alma Mater:Dalian University of Technology

Degree:Doctoral Degree

School/Department:Institute of Systems Engineering

Discipline:Management Science and Engineering; Systems Engineering


Address: No.2 Linggong Road, Ganjingzi District, Dalian City, Liaoning Province, P.R.C., 116024
