Resource list
Makelib
- A word-collection program written in VC. It also builds the database programmatically and creates an index for it.
sortwenzi
- A general-purpose sorting algorithm for text in various scripts, implemented in VC++. Convenient and practical, but it requires a code table of the base characters for each script.
gets
- A character-level phrase comparison tool that can be used to compute phrase similarity.
TestUnicode
- Converts Chinese characters to Unicode codes so that strings can be passed over a communication link. The receiving end converts them back into a string with the corresponding function.
wenben.txt
- Finds the positions where a given word occurs in a file and counts the occurrences.
html2txt
- Extracts the displayable text content from HTML files. Works on Windows and Linux.
ful2hlf
- Converts full-width characters in text to half-width for later processing. Mainly useful for preprocessing web page content.
text2idngram
- The tool from the well-known CMU language model toolkit that converts text into trigram statistics. Runs on Linux. See the -help option for usage.
ngram2mgram
- A tool that converts trigrams to bigrams. Also from the CMU toolkit. See the -help option for usage.
Lexical analyzer
- A lexical analysis program that runs in Visual C++.
ziku
- A character library that can help you develop a Pinyin input method system. Works quite well.
transfont
- Converts a 24*24 dot-matrix font file into a 24*16 dot-matrix file.
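
As an illustration of the full-width to half-width conversion that the ful2hlf entry describes, here is a minimal sketch in Python. It is not the code of the listed tool, whose source is not shown here. It assumes the usual Unicode layout, in which the full-width forms U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from ASCII 0x21 to 0x7E, and the ideographic space U+3000 maps to an ordinary space. The function name full_to_half is hypothetical.

```python
def full_to_half(text: str) -> str:
    """Map full-width ASCII variants to their half-width equivalents.

    Illustrative sketch only; not taken from the ful2hlf tool.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            # Ideographic (full-width) space becomes a plain space.
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            # Full-width forms are offset from ASCII by 0xFEE0.
            out.append(chr(code - 0xFEE0))
        else:
            # Everything else (including Chinese characters) is kept.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(full_to_half("ＨＴＭＬ　２００８"))
```

Characters outside the full-width block, such as Chinese text, pass through unchanged, which is what a web-page preprocessing step would typically want.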