| 863 program in 2003 speech recognition evaluation data | 863 program in 2004 speech recognition evaluation data |
| 863 program in 2003 speech synthesis evaluation data | 863 program in 2004 speech synthesis evaluation data |
| 863 program in 2003 machine translation evaluation data | 863 program in 2004 machine translation evaluation data |
| 863 program in 2003 automatic index evaluation data | 863 program in 2004 automatic index evaluation data |
| 863 Program in 2003 Assessment and test data of text classification | 863 Program in 2004 Assessment and test data of text classification |
| 863 Program in 2003 Assessment and test data of Chinese recognition | 863 program in 2004 information index evaluation data |
| 863 program in 2003 full text retrieval evaluation data | 863 program in 2003 name entry identification evaluation data |
| 863 program in 2003 part-of-speech evaluation data | Three parallel language Chinese, English, Japanese corpus developed for Olympic (Chinese and English) |
| Three parallel language Chinese, English, Japanese corpus developed for Olympic | Chinese Lexicon |
| 863 program in 2005 machine translation evaluation data | 863 program in 2005 information index evaluation data |
| 863 program in 2005 speech recognition evaluation data | The identifiable speech database of Chinese Mandarin - extract database |
| The Grammatical Knowledge-base of Contemporary Chinese (High Frequency Words) | Chinese POS Tagged Corpus |
| Chinese-English Sentence aligned Bilingual Corpus | Tsinghua Chinese Treebank |
| Chinese-English/Chinese-Japanese parallel corpora | Chinese-English Olympic Dictionary |
| Special Scene and special domain dialogue corpus | Modern Chinese semantic Dictionary based on International Logical Model |
| CASIA98-99 speech testing library | CASIA weather forecast broadcast speech corpus |
| CASIA single syllable isolated word speech corpus | Natural Broadcasting Speech corpus |
| Chinese and English speech corpus | Tsinghua-Corpus of speech synthesis |
| RASC863-annotated 4 regional accent speech corpus (Ⅰ) | RASC863-annotated 4 regional accent speech corpus (Ⅱ) |
| RASC863-annotated 4 regional accent speech corpus (Ⅲ) | Telephone speech corpus for speech recognition |
| CASIA-Mandarin continuous digit speech corpus | CASIA-Northern China accent speech corpus |
| CASIA-Southern China accent speech corpus | CASIA-Chinese Question Structures Corpus |
| CASIA-Chinese Emotional Speech Corpus | CASIA-863 Chinese Speech Synthesis Corpus |
| Chinese geographic name database | ASCCD-Annotated Speech Corpus of Chinese Discourse |
| CADCC-Chinese Annotated Dialogue and Conversation Corpus | SCSC-Syllable Corpus of Standard Chinese |
| WCSC-Word Corpus of Standard Chinese | TSC973-Telephone Speech Corpus 973 |
| The identifiable speech database of telephone speech - the name of person, the name of place (265 people using mobile telephone) | The identifiable speech database of telephone speech - the name of person, the name of place (285 speakers using fixed-line telephone) |
| The identifiable speech database of telephone speech - the number string (265 people using mobile telephone) | The identifiable speech database of telephone speech - the number string (285 speakers using fixed-line telephone) |
| The identifiable speech database of telephone speech - the stock (265 people using mobile telephone) | The identifiable speech database of telephone speech - the stock (285 people using fixed-line telephone) |
| The identifiable speech database of telephone speech - the message (64 people using mobile telephone) | The identifiable speech database of telephone speech - the message (86 people using mobile telephone) |
| The identifiable speech database of tabletop speech - the message (200 persons) | The identifiable speech database of tabletop speech - the number string (200 persons) |
| The identifiable speech database of tabletop speech - the number string (10 persons) | The identifiable speech database of tabletop speech - the message (120 persons) |
| The identifiable speech database of tabletop speech - the number string (120 persons) | The identifiable speech database of tabletop speech - the name of person, the name of place (120 persons) |
| The identifiable speech database of tabletop speech - the stock (70 persons) | The identifiable speech database of tabletop speech - free topic (50 persons) |
| The identifiable speech database of Chinese Mandarin - wide label | 6 regional accent speech corpus - spoken language |
| 6 regional accent speech corpus - Recitation | 863 program in 2007 SSMT machine translation evaluation data |
| Chinese Web 5-gram Corpus | English Keywords for public servants |
|
|
| 863 program in 2003 speech synthesis evaluation data |
Code: 2003-863-002
Description: Covers the general domain and the Olympic Games domain. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences, and Institute of Linguistics, Chinese Academy of Social Sciences.
Usage: Speech synthesis
Price:
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 Program in 2003 Assessment and test data of Chinese recognition |
Code: 2003-863-003
Description: Includes the large character database, the small character database, and a database of characters written with irregular stroke order, 100 sets in total. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences
Usage: Chinese recognition
Price:
Large character database:
4000 RMB for research organizations in China;
4000 USD for foreign research organizations;
12000 RMB for commercial organizations in China;
12000 USD for foreign commercial organizations.
Small character database:
2000 RMB for research organizations in China;
2000 USD for foreign research organizations;
6000 RMB for commercial organizations in China;
6000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 program in 2003 machine translation evaluation data |
Code: 2003-863-004
Description: For evaluating Chinese-to-English, English-to-Chinese, Chinese-to-Japanese, and Japanese-to-Chinese translation of dialogues in the Olympic Games domain and of paragraphs. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences
Usage: Machine translation
Price:
For each language (four parts):
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 program in 2003 automatic index evaluation data |
Code: 2003-863-005
Description: The corpus includes 10 articles, ranging in length from 1755 to 4502 words. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences
Usage: Automatic index evaluation
Price: Free
Preferential Price: Participate in the preferential activity
|
| 863 program in 2003 full text retrieval evaluation data |
Code: 2003-863-006
Description: Small-scale testing corpus, small-scale evaluation corpus, and large-scale evaluation corpus. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences
Usage: Full text retrieval
Price:
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 Program in 2003 Assessment and test data of text classification |
Code: 2003-863-007
Description: 3600 files in total. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences, and Open System and Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences
Usage: Text classification
Price:
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 program in 2003 part-of-speech evaluation data |
Code: 2003-863-008
Description: 242 files, about 400 thousand Chinese characters. Log in at http://www.863data.org.cn
Usage: For evaluation of part-of-speech tagging
Price:
2000 RMB for research organizations in China;
2000 USD for foreign research organizations;
6000 RMB for commercial organizations in China;
6000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 Program in 2004 Assessment and test data of machine translation |
Code: 2004-863-001
Description: Covers three languages: Chinese, English, and Japanese. It contains both dialogue and discourse, in two domains: Olympics and general. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences, and the Keihanna research center of the National Institute of Information and Communications Technology (NICT), Japan
Usage: Machine translation
Price:
Each part (five parts in total):
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 program in 2004 name entry identification evaluation data |
Code: 2004-863-002
Description: The corpus has two categories: simplified characters (241 files, about 400 thousand Chinese characters) and traditional characters (126 files, about 400 thousand Chinese characters). Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences; Institute of Computer Application, Shanxi University (Computer Science); Language Information Sciences Research Centre, City University of Hong Kong
Usage: For evaluation of named entity recognition
Price:
500 RMB for research organizations in China;
500 USD for foreign research organizations;
1500 RMB for commercial organizations in China;
1500 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 program in 2004 information index evaluation data |
Code: 2004-863-003
Description: 30 queries in total. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences; Open System and Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences; and Computer Network and Distributed System Lab, Peking University
Usage: Information index
Price:
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 program in 2004 automatic index evaluation data |
Code: 2004-863-004
Description: The corpus includes 20 articles, ranging in length from 800 to 2500 Chinese characters. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences
Usage: For evaluation of automatic index
Price: Free
Preferential Price: Participate in the preferential activity
|
| 863 Program in 2004 Assessment and test data of text classification |
Code: 2004-863-005
Description: 3600 files in total. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences; Open System & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences
Usage: Text classification
Price:
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 program in 2004 speech recognition evaluation data |
Code: 2004-863-006
Description: The corpus includes three parts: Chinese desktop speech, telephone speech, and PDA speech. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences; Institute of Linguistics, Chinese Academy of Social Sciences; Capital Information Development Co., Ltd.
Usage: For evaluation of speech recognition
Price:
Chinese desktop speech:
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
English desktop speech:
500 RMB for research organizations in China;
500 USD for foreign research organizations;
1500 RMB for commercial organizations in China;
1500 USD for foreign commercial organizations.
Telephone speech (including syntax labels):
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
PDA speech:
500 RMB for research organizations in China;
500 USD for foreign research organizations;
1500 RMB for commercial organizations in China;
1500 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 Program in 2004 Assessment and test data of speech synthesis |
Code: 2004-863-007
Description: The corpus is divided into a general domain and the specific domain of the Olympic Games. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences
Usage: Speech synthesis
Price:
500 RMB for research organizations in China;
500 USD for foreign research organizations;
1500 RMB for commercial organizations in China;
1500 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| Three parallel language Chinese, English, Japanese corpus developed for Olympic (Chinese and English) |
Code: 2004-863-008
Description: This corpus is part of the "three parallel language (Chinese, English, Japanese) corpus developed for Olympic". The Chinese and English portions were extracted to form a parallel corpus. It includes dialogue content on travel, food, transportation, sports, and commerce. There are 52227 parallel sentences, all manually checked, which can be used in the development of machine translation.
Creator: Harbin Institute of Technology
Usage: For the research and development of machine translation
Price:
1500 RMB for research organizations in China;
3000 RMB for foreign research organizations;
18000 RMB for commercial organizations in China;
45000 RMB for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| Three parallel language Chinese, English, Japanese corpus developed for Olympic |
Code: 2004-863-009
Description: This resource is mainly intended for developing Olympic Games-oriented machine translation systems among Chinese, English, and Japanese, in particular as training material for spoken language processing. It is also of some value for developing other cross-language information management systems among these three languages.
Creator: Harbin Institute of Technology
Usage: The corpus covers travel, dining, sports, transportation, commerce, and other domains closely related to the Olympic Games. The whole corpus has been aligned at the sentence level and manually corrected. Annotation uses XML and retains the natural structure of the source material, such as paragraphs, dialogue turn structure, and the number of dialogue participants; it also records the scene in which the speech occurs and chapter information such as the spoken-language topic.
Price:
2000 RMB for research organizations in China;
4000 RMB for foreign research organizations;
25000 RMB for commercial organizations in China;
60000 RMB for foreign commercial organizations.
Preferential Price: Participate in the preferential activity
|
| 863 program in 2005 machine translation evaluation data |
Code: 2005-863-001
Description: Includes Chinese-English, English-Chinese, Chinese-Japanese, Japanese-Chinese, English-Japanese, and Japanese-English. Two types: dialogue and writing. Domain: Olympic-related for dialogue and news for writing. http://www.863data.org.cn
Creator: Institute of Computing Technology, CAS, China, and National Institute of Information and Communications Technology (NICT), Japan.
Usage: Machine translation
Price:
Price per part (six parts in total):
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Word alignment:
500 RMB for research organizations in China;
500 USD for foreign research organizations;
1500 RMB for commercial organizations in China;
1500 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity.
|
| 863 program in 2005 information index evaluation data |
Code: 2005-863-002
Description: CWT100g, a Chinese web corpus containing 5,712,710 web pages. The relevant documents were extracted after pooling the results submitted by the participating systems in the IR evaluation. http://www.863data.org.cn
Creator: Computer Network and Distributed System Lab, Peking University; Institute of Software, Chinese Academy of Sciences
Usage: Information index
Price:
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity.
|
| 863 program in 2005 speech recognition evaluation data |
Code: 2005-863-003
Description: The data consist of desktop PC speech data and telephone speech data. http://www.863data.org.cn
Creator: Institute of Computing Technology, CAS, China.
Usage: Speech recognition
Price:
Desktop PC speech data:
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Telephone speech data:
3000 RMB for research organizations in China;
3000 USD for foreign research organizations;
9000 RMB for commercial organizations in China;
9000 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity.
|
| 863 program in 2007 SSMT machine translation evaluation data |
Code: 2007-863-001
Description: Data from the machine translation evaluation of SSMT2007, the third Symposium on Statistical Machine Translation. SSMT2007 includes machine translation test corpora for two directions, Chinese-English and English-Chinese; the texts are article-type and come from the information domain. It also includes a Chinese-English word alignment test corpus, provided as word-segmented Chinese-English sentence pairs, also from the information domain. In addition, the package contains the evaluation outline, a report on the evaluation results, and the evaluation software.
Creator: Institute of Computing Technology
Usage: Machine translation
Price:
Price per part (six parts in total):
1000 RMB for research organizations in China;
1000 USD for foreign research organizations;
3000 RMB for commercial organizations in China;
3000 USD for foreign commercial organizations.
Word alignment:
500 RMB for research organizations in China;
500 USD for foreign research organizations;
1500 RMB for commercial organizations in China;
1500 USD for foreign commercial organizations.
Preferential Price: Participate in the preferential activity.
|
| Chinese Lexicon |
Code: CLDC-LAC-2003-001
Description: A Chinese lexicon of 98000 word entries, each accompanied by frequency and Pinyin information.
Creator: Tsinghua University, CASIA
Usage: Natural language comprehension
Price:
2500 RMB for research organizations in China;
2500 USD for foreign research organizations;
12500 RMB for commercial organizations in China;
12500 USD for foreign commercial organizations.
|