Resources Tables
863 program in 2003 speech recognition evaluation data 863 program in 2004 speech recognition evaluation data
863 program in 2003 speech synthesis evaluation data 863 program in 2004 speech synthesis evaluation data
863 program in 2003 machine translation evaluation data 863 program in 2004 machine translation evaluation data
863 program in 2003 automatic index evaluation data 863 program in 2004 automatic index evaluation data
863 Program in 2003 Assessment and test data of text classification 863 Program in 2004 Assessment and test data of text classification
863 Program in 2003 Assessment and test data of chinese recognition 863 program in 2004 information index evaluation data
863 program in 2003 full text retrieval evaluation data 863 program in 2003 name entry identification evaluation data
863 program in 2003 part-of-speech evaluation data Three parallel language Chinese, English, Japanese corpus developed for Olympic(Chinese and English)
Three parallel language Chinese, English, Japanese corpus developed for Olympic Chinese Lexicon
863 program in 2005 machine translation evaluation data 863 program in 2005 information index evaluation data
863 program in 2005 speech recognition evaluation data The identifiable speech database of Chinese mandarin -----extract database
The Grammatical Knowledge-base of Contemporary  Chinese (High Frequency Words) Chinese POS Tagged Corpus
Chinese-English Sentence aligned Bilingual Corpus Tsinghua Chinese Treebank
Chinese-English/Chinese-Japanese parallel corpora Chinese-English Olympic Dictionary
Special Scene and special domain dialogue corpus Modern Chinese semantic Dictionary based on International Logical Model
CASIA98-99 speech testing library CASIA The weather forecast broadcasts the pronunciation storehouse
CASIA single syllable isolated word speech corpus Natural Broadcasting Speech corpus
Chinese and English speech corpus Tsinghua-Corpus of speech synthesis
RASC863-annotated 4 regional accent speech corpus(Ⅰ) RASC863-annotated 4 regional accent speech corpus(Ⅱ)
RASC863-annotated 4 regional accent speech corpus(Ⅲ) Telephone speech corpus for speech recognition
CASIA-Mandarin continuous digit speech corpus CASIA- Northern China accent speech corpus
CASIA- Southern China accent speech corpus CASIA-Chinese Question Structures Corpus
CASIA-Chinese Emotional Speech Corpus CASIA-863Chinese Speech Synthesis Corpus
Chinese geographic name storehouse ASCCD- Annotated Speech Corpus of Chinese Discourse
CADCC-Chinese Annotated Dialogue and Conversation Corpus SCSC——Syllable Corpus of Standard Chinese  
WCSC——Word Corpus of Standard Chinese TSC973-Telephone Speech Corpus 973  
The identifiable speech database of telephone speech——the name of person, the name of place ( 265 people using mobile telephone )  The identifiable speech database of telephone speech——the name of person, the name of place (285 speakers using stable telephone ) 
The identifiable speech database of telephone speech——the number string (265 people using mobile telephone ) The identifiable speech database of telephone speech——the number string (285 speakers using stable telephone)
The identifiable speech database of telephone speech——stock (265 people using mobile telephone ) The identifiable speech database of telephone speech——the stock (285 people using stable telephone )
The identifiable speech database of telephone speech——the message (64 people using mobile telephone ) The identifiable speech database of telephone speech——the message (86 people using mobile telephone )
The identifiable speech database of tabletop speech——the message (200 persons ) The identifiable speech database of tabletop speech——the number string (200 persons )
The identifiable speech database of tabletop speech——the number string (10 persons ) the identifiable speech database of tabletop speech——the message (120 persons )
The identifiable speech database of tabletop speech——the number string (120 persons ) The identifiable speech database of tabletop speech——the people’s name, the place’ name (120 persons )
The identifiable speech database of tabletop speech——the stock (70 persons ) The identifiable speech database of tabletop speech——free topic (50 persons )
The identifiable speech database of Chinese mandarin -----wide label 6 regional accent speech corpus-spoken language
6 regional accent speech corpus-Recitation 863 program in 2007 SSMT machine translation evaluation data
Chinese Web 5-gram Corpus English Keywords for public servants
863 program in 2003 speech synthesis evaluation data

Code: 2003-863-002

Description: Covers the general domain and the Olympic Games domain. Log in at http://www.863data.org.cn

Creator: Institute of Computing Technology, Chinese Academy of Sciences; Institute of Linguistics, Chinese Academy of Social Sciences.

Usage: Speech Synthesis

Price:  

        1000RMB for research  organization of China;

        1000USD for foreign research organization;

        3000RMB for commercial organization of China

        3000USD for foreign commercial organization

Preferential Price: Participate in the preferential activity


863 Program in 2003 Assessment and test data of Chinese recognition Code: 2003-863-003
Description: Includes the big character storehouse, the small character storehouse, and the storehouse of characters written with irregular stroke order, amounting to 100 sets.
Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences
Usage: Chinese character recognition
Price:

The big character storehouse:
       
4000RMB for research  organization of China;

        4000USD for foreign research organization;

        12000RMB for commercial organization of China

        12000USD for foreign commercial organization


The small character storehouse:
        2000RMB for research organization of China;

        2000USD for foreign research organization;

        6000RMB for commercial organization of China

        6000USD for foreign commercial organization


Preferential Price:
Participate in the preferential activity


863 program in 2003 machine translation evaluation data

Code: 2003-863-004

Description: For evaluating translation of Olympic Games dialogues and of paragraphs in four directions: Chinese to English, English to Chinese, Chinese to Japanese, and Japanese to Chinese. Log in at http://www.863data.org.cn

Creator: Institute of Computing Technology, Chinese Academy of Sciences

Usage: Machine translation

Price: For each language (Four Parts):

        1000RMB for research  organization of China;

        1000USD for foreign research organization;

        3000RMB for commercial organization of China

        3000USD for foreign commercial organization

Preferential Price: Participate in the preferential activity


863 program in 2003 automatic index evaluation data Code: 2003-863-005
Description: The corpus includes 10 articles, ranging in length from 1755 to 4502 words.
Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences
Usage: Automatic article indexing
Price:Free
Preferential Price:
Participate in the preferential activity


863 program in 2003 full text retrieval evaluation data

Code: 2003-863-006

Description: Small-scale testing corpus, small-scale evaluation corpus, and large-scale evaluation corpus. Log in at http://www.863data.org.cn

Creator: Institute of Computing Technology, Chinese Academy of Sciences

Usage: full text retrieval

Price:

        1000RMB for research  organization of China;

       1000USD for foreign research organization;

        3000RMB for commercial organization of China

        3000USD for foreign commercial organization

Preferential Price: Participate in the preferential activity


863 Program in 2003 Assessment and test data of text classification Code: 2003-863-007
Description: 3600 files in total. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences; Open System & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences
Usage: Text classification
Price:

        1000RMB for research  organization of China;

        1000USD for foreign research organization;

        3000RMB for commercial organization of China

        3000USD for foreign commercial organization


Preferential Price:
Participate in the preferential activity


863 program in 2003 part-of-speech evaluation data

Code: 2003-863-008

Description: 242 files, about 400 thousand Chinese characters. Log in at http://www.863data.org.cn

Usage: For evaluation of part-of-speech tagging

Price: 

        2000RMB for research  organization of China;

        2000USD for foreign research organization;

       6000RMB for commercial organization of China

       6000USD for foreign commercial organization

Preferential Price: Participate in the preferential activity


863 Program in 2004 Assessment and test data of machine translation Code: 2004-863-001
Description: Covers three languages: Chinese, English, and Japanese. It contains both dialogue and discourse data, and can be divided into two domains: Olympics and general. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences; National Institute of Information and Communications Technology (NICT), Japan, Keihanna Human Info-Communication Research Center
Usage: Machine translation
Price:

    Each part  (Total Five Parts):

        1000RMB for research  organization of China;

        1000USD for foreign research organization;

        3000RMB for commercial organization of China

        3000USD for foreign commercial organization
Preferential Price:
Participate in the preferential activity


863 program in 2004 name entry identification evaluation data

Code: 2004-863-002

Description: The corpus falls into two categories: simplified characters (241 files, about 400 thousand Chinese characters) and traditional characters (126 files, about 400 thousand Chinese characters). Log in at http://www.863data.org.cn

Creator: Institute of Computing Technology, Chinese Academy of Sciences; Institute of Computer Application, Shanxi University (Computer Science); Language Information Sciences Research Centre, City University of Hong Kong

Usage: For evaluation of name entry identification (named entity recognition)

Price:  

        500RMB for research  organization of China;

        500USD for foreign research organization;

        1500RMB for commercial organization of China

        1500USD for foreign commercial organization

Preferential Price: Participate in the preferential activity


863 program in 2004 information index evaluation data Code: 2004-863-003
Description: 30 queries in total. Log in at http://www.863data.org.cn
Creator: Institute of Computing Technology, Chinese Academy of Sciences; Open System & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences; Computer Network and Distributed System Lab, Peking University
Usage: Information index
Price:

        1000RMB for research  organization of China;

        1000USD for foreign research organization;

        3000RMB for commercial organization of China

        3000USD for foreign commercial organization


Preferential Price:
Participate in the preferential activity


863 program in 2004 automatic index evaluation data

Code: 2004-863-004

Description: The corpus includes 20 articles ranging in length from 800 to 2500 Chinese characters. Log in at http://www.863data.org.cn

Creator: Institute of Computing Technology, Chinese Academy of Sciences

Usage: for evaluation of automatic index

Price: free

Preferential Price: Participate in the preferential activity


863 Program in 2004 Assessment and test data of text classification

Code: 2004-863-005

Description: 3600 files in total. Log in at http://www.863data.org.cn

Creator: Institute of Computing Technology, Chinese Academy of Sciences; Open System & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences

Usage: Text classification

Price:  

        1000RMB for research  organization of China;

        1000USD for foreign research organization;

        3000RMB for commercial organization of China

        3000USD for foreign commercial organization

Preferential Price: Participate in the preferential activity


863 program in 2004 speech recognition evaluation data

Code: 2004-863-006

Description: The corpus includes three parts: Chinese desktop speech, telephone speech, and PDA speech. Log in at http://www.863data.org.cn

Creator: Institute of Computing Technology, Chinese Academy of Sciences; Institute of Linguistics, Chinese Academy of Social Sciences; Capital Information Development Limited Liability Company.

Usage: for evaluation of speech recognition

Price:

Chinese desktop speech: 

         1000RMB for research  organization of China;

        1000USD for foreign research organization;

        3000RMB for commercial organization of China

        3000USD for foreign commercial organization

 English desktop speech: 

        500RMB for research  organization of China;

        500USD for foreign research organization;

        1500RMB for commercial organization of China

        1500USD for foreign commercial organization

Telephone speech (including syntax labels): 

        1000RMB for research  organization of China;

        1000USD for foreign research organization;

        3000RMB for commercial organization of China

        3000USD for foreign commercial organization

PDA speech: 

        500RMB for research  organization of China;

        500USD for foreign research organization;

        1500RMB for commercial organization of China

        1500USD for foreign commercial organization

Preferential Price: Participate in the preferential activity


863 Program in 2004 Assessment and test data of speech synthesis

Code: 2004-863-007

Description: The corpus is divided into a general domain and the specific domain of the Olympic Games. Log in at http://www.863data.org.cn

Creator: Institute of Computing Technology, Chinese Academy of Sciences

Usage: Speech synthesis

Price: 

        500RMB for research  organization of China;

        500USD for foreign research organization;

        1500RMB for commercial organization of China

        1500USD for foreign commercial organization

Preferential Price: Participate in the preferential activity


Three parallel language Chinese, English, Japanese corpus developed for Olympic(Chinese and English) Code: 2004-863-008

Description: This corpus is part of the corpus named "Three parallel language (Chinese, English, Japanese) corpus developed for Olympic". It is a bilingual parallel corpus obtained by extracting the Chinese and English portions. It covers dialogue content on travel, food, transportation, sports, and commerce. It contains 52227 manually checked parallel sentences, which can be used in the development of machine translation.

Creator: Harbin Institute of Technology

Usage: For the research and development of machine translation

Price: 1500RMB for research organization of China

         3000RMB for foreign research organization

         18000RMB for commercial organization of China

         45000RMB for foreign commercial organization.

Preferential Price: Participate in the preferential activity

Three parallel language Chinese, English, Japanese corpus developed for Olympic Code: 2004-863-009

Description: This resource is mainly intended for developing Olympic-oriented machine translation systems among Chinese, English, and Japanese, and in particular provides training material for spoken language processing. It also has some application value for developing other cross-language information management systems among the three languages.

Creator: Harbin Institute of Technology

Usage: The corpus covers travel, dining, sports, transportation, commerce, and other domains closely related to the Olympic Games. The whole corpus has been aligned at the sentence level and manually corrected. Annotations use XML format and retain the natural structure of the original material, such as paragraphs, dialogue turn structure, and the number of dialogue participants. The annotations also record spoken-language information such as the scene in which the language occurs and chapter information.

Price: 2000RMB for research organization of China

         4000RMB for foreign research organization

         25000RMB for commercial organization of China

         60000RMB for foreign commercial organization.

Preferential Price: Participate in the preferential activity

863 program in 2005 machine translation evaluation data Code: 2005-863-001

Description: Includes six translation directions: Chinese-English, English-Chinese, Chinese-Japanese, Japanese-Chinese, English-Japanese, and Japanese-English. Two types: dialogue and writing. Domain: Olympic-related for dialogue and news for writing. http://www.863data.org.cn

Creator: Institute of Computing Technology, CAS, China  and  National Institute of Information and Communications Technology (NICT), Japan.

Usage: Machine translation 

Price:  

Each part price : (Total Six Parts)

         1000RMB for research organization of China

         1000USD for foreign research organization

         3000RMB for commercial organization of China

         3000USD for foreign commercial organization.

Word Alignment:

          500RMB for research organization of China

          500USD for foreign research organization

         1500RMB for commercial organization of China

         1500USD for foreign commercial organization.

Preferential Price: Participate in the preferential activity.

 

863 program in 2005 information index evaluation data Code: 2005-863-002

Description: CWT100g, a Chinese web corpus containing 5,712,710 web pages. The relevant documents were extracted by pooling the submitted results of the systems participating in the IR evaluation. http://www.863data.org.cn

Creator: Computer Network and Distributed System Lab, Peking University; Institute of Software, Chinese Academy of Sciences

Usage: Information index 

Price: 1000RMB for research organization of China

          1000USD for foreign research organization

           3000RMB for commercial organization of China

           3000USD for foreign commercial organization.

Preferential Price: Participate in the preferential activity.

 

863 program in 2005 speech recognition evaluation data Code: 2005-863-003

Description: The data consist of desktop PC speech data and telephone speech data. http://www.863data.org.cn

Creator: Institute of Computing Technology, CAS, China.

Usage: Speech recognition

Price:  

     Desktop PC speech data:

         1000RMB for research organization of China

         1000USD for foreign research organization

         3000RMB for commercial organization of China

         3000USD for foreign commercial organization.

      Telephone speech data :

         3000RMB for research organization of China

         3000USD for foreign research organization

         9000RMB for commercial organization of China

         9000USD for foreign commercial organization.

Preferential Price: Participate in the preferential activity.

 

863 program in 2007 SSMT machine translation evaluation data Code: 2007-863-001

Description: Data from the machine translation evaluation held at SSMT2007, the third seminar on statistical machine translation.
SSMT2007 includes machine translation test corpora for two directions, Chinese-English and English-Chinese; the texts are of the chapter (discourse) type and come from the information field. It also includes a test corpus for Chinese-English word alignment, provided as word-segmented Chinese-English sentence pairs, also from the information field. In addition, the package contains the evaluation outline, the report on the evaluation results, and the evaluation software.

Creator: Institute of Computing Technology

Usage: Machine translation 

Price:  

Each part price : (Total Six Parts)

         1000RMB for research organization of China

         1000USD for foreign research organization

         3000RMB for commercial organization of China

         3000USD for foreign commercial organization.

Word Alignment:

          500RMB for research organization of China

          500USD for foreign research organization

         1500RMB for commercial organization of China

         1500USD for foreign commercial organization.

Preferential Price: Participate in the preferential activity.

 

Chinese Lexicon Code: CLDC-LAC-2003-001

Description: A Chinese lexicon of 98000 word items, each accompanied by frequency and pinyin information.

Creator: Tsinghua University, CASIA

Usage: Natural Language Comprehension

Price: 2500RMB for research organization of China

          2500USD for foreign research organization

          12500RMB for commercial organization of China

          12500USD for foreign commercial organization.