RASC863-G2 -- 863 Regional Accent Speech Corpus (Group 2, 6 regional corpora)
RASC863 is a regional-accent Mandarin (Putonghua) speech corpus built with funding from the National 863 High-Tech Program. The first group, covering four regions (Shanghai, Guangzhou, Chongqing and Xiamen), was completed and released in 2004. This corpus (RASC863-G2) is the second group of the project and covers six regions: Changsha, Luoyang, Nanjing, Nanchang, Taiyuan and Wenzhou.
The national speech corpus for ASR of 1996, also funded by the National 863 High-Tech Program, consisted mainly of read speech and was designed for phonetic balance at the segmental level. As speech recognition technology has developed, accented and spontaneous speech corpora have become increasingly important. Project RASC863 therefore places particular emphasis on spontaneity and on broader coverage of everyday speech in collecting regional-accent Mandarin data.
Phonetics Lab, Institute of Linguistics,
Chinese Academy of Social Sciences
5 Jian Guo Men Nei Da Jie, Beijing 100732
CHINA
Phone: +86-10-6523 7408 / +86-10-8519 5394
Fax: +86-10-85195396
E-mail: yinzhg@cass.org.cn
Period of development:
Jan. 2004-Sep. 2006 (per the Chinese description; the English description gives Jan. 2003-Jun. 2004)
The corpus consists of spontaneous speech, read speech and selected dialectal words. For the spontaneous speech, each speaker chose a topic, either freely or from a prepared sheet of 160 topics, and spoke on it for 3-5 minutes; each speaker also answered 23 common questions spontaneously. The read speech consists of 1895 phonetically balanced sentences selected automatically, 460 sentences frequently used in daily life, and 100 info-compound items mixing digits, letters and SMS content for information and communication applications (for details, see the specifications of text design).
RASC863-G2 recordings were collected at close range (distance from the corner of the mouth: 2
1200 speakers were recruited for the project: 200 from each region (100 male and 100 female), distributed according to a predefined design for age, sex and educational background (for details, see the specifications of speakers).
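As a quick consistency check on the speaker design described here, the following is a minimal Python sketch that builds a roster from the stated figures (six regions, 100 male and 100 female speakers each). The `Speaker` class, the `build_roster` function and the ID scheme are hypothetical and are not part of the corpus distribution; only the region names and counts come from the description.

```python
from collections import Counter
from dataclasses import dataclass

# Figures taken from the corpus description.
REGIONS = ["Changsha", "Luoyang", "Nanjing", "Nanchang", "Taiyuan", "Wenzhou"]
MALE_PER_REGION = 100
FEMALE_PER_REGION = 100


@dataclass(frozen=True)
class Speaker:
    speaker_id: str  # hypothetical ID scheme, for illustration only
    region: str
    sex: str  # "M" or "F"


def build_roster():
    """Return one Speaker record per speaker slot in the stated design."""
    roster = []
    for region in REGIONS:
        for sex, count in (("M", MALE_PER_REGION), ("F", FEMALE_PER_REGION)):
            for i in range(1, count + 1):
                # The full region name keeps IDs unique
                # (Nanjing and Nanchang share a prefix).
                roster.append(Speaker(f"{region}-{sex}{i:03d}", region, sex))
    return roster


roster = build_roster()
per_region = Counter(s.region for s in roster)
per_sex = Counter(s.sex for s in roster)
print(len(roster), dict(per_region), dict(per_sex))
```

The totals follow directly from the design: 6 regions x 200 speakers gives 1200 speakers, split evenly by sex.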
[To be determined]
A technical report on RASC863-G2 (Chinese version)
A technical report on RASC863-G2 (English version)
The specifications of recording and storage for RASC863-G2
The specifications of speakers for RASC863-G2
The specifications of text design for RASC863-G2
The specifications of annotation for RASC863-G2
Samples:
Please refer to Samples of RASC863