File name: BootCaT-0.1.2.tar
Upload date: 2012-11-16
File size: 50.37 KB
Downloads: 1
Description:
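This is open-source software, used mainly for Chinese information processing and information retrieval; I use it mainly to collect bilingual corpora from the web. It is written in Perl and its modules are highly independent of one another, so once some seeds have been collected it can be used to gather bilingual material from the web.

The Perl scripts included in the BootCaT toolkit implement an
iterative procedure to bootstrap specialized corpora and terms from
the web, requiring only a list of "seeds" (terms that are expected
to be typical of the domain of interest) as input.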
In implementing the algorithm, we followed the old UNIX adage that
each program should do only one thing, but do it well. Thus, we
developed a small, independent tool for each separate subtask of the
algorithm.
As a result, BootCaT is extremely modular: One can easily run a subset
of the programs, look at intermediate output files, add new tools to
the suite, or change one program without having to worry about the
others.
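The description says the procedure needs only a list of seeds as input. The script name build_random_tuples.pl suggests that one step combines seeds into random tuples that are then used as search-engine queries. The Python sketch below shows one way to do that under stated assumptions: tuples of three distinct seeds, ten queries per round, and quoting of multi-word seeds as phrases are all assumptions, and build_random_tuples and to_query are hypothetical helpers, not the interface of the Perl script.

```python
import random
from itertools import combinations


def build_random_tuples(seeds, tuple_size=3, n_tuples=10, rng=None):
    """Hypothetical helper: draw up to n_tuples distinct, unordered
    combinations of tuple_size distinct seed terms; each combination
    would be submitted to a search engine as one query."""
    rng = rng or random.Random()
    unique_seeds = sorted(set(seeds))  # drop duplicates, fix order for reproducibility
    if tuple_size > len(unique_seeds):
        raise ValueError("tuple_size is larger than the number of distinct seeds")
    # Enumerating every combination is fine for seed lists of modest size.
    candidates = list(combinations(unique_seeds, tuple_size))
    return rng.sample(candidates, min(n_tuples, len(candidates)))


def to_query(terms):
    """Assumed convention: quote multi-word seeds so they are searched as phrases."""
    return " ".join(f'"{t}"' if " " in t else t for t in terms)


if __name__ == "__main__":
    seeds = ["enzyme", "protein", "substrate", "catalysis", "active site"]
    for tup in build_random_tuples(seeds, rng=random.Random(0)):
        print(to_query(tup))
```

Sampling without replacement from all combinations guarantees that no query is repeated within a round; with five seeds and tuples of three there are only ten combinations, so the example prints each of them once, in random order.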
Files in the archive:
BootCaT-0.1.2/
BootCaT-0.1.2/examples/
BootCaT-0.1.2/examples/mw_terms
BootCaT-0.1.2/examples/candidate_uniterms
BootCaT-0.1.2/examples/final_urls
BootCaT-0.1.2/add1_smoothing.pl
BootCaT-0.1.2/Readme.BootCaT-0.1.2
BootCaT-0.1.2/basic_tokenizer.pl
BootCaT-0.1.2/build_random_tuples.pl
BootCaT-0.1.2/collect_mw_terms.pl
BootCaT-0.1.2/collect_urls_from_google.pl
BootCaT-0.1.2/connect_bi_connectors.pl
BootCaT-0.1.2/doc_delimited_uniq.pl
BootCaT-0.1.2/filter_unigrams.pl
BootCaT-0.1.2/get_connector_grams.pl
BootCaT-0.1.2/get_top_percentage.pl
BootCaT-0.1.2/log_odds_ratio.pl
BootCaT-0.1.2/print_good_ngrams.pl
BootCaT-0.1.2/print_pages_from_url_list.pl
BootCaT-0.1.2/print_rank.pl
BootCaT-0.1.2/simple_filter.pl
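Several script names in the list (add1_smoothing.pl, log_odds_ratio.pl, filter_unigrams.pl, print_rank.pl) suggest that candidate single-word terms are ranked by comparing their frequency in the bootstrapped corpus with their frequency in a reference corpus. The Python sketch below illustrates that idea, assuming the comparison is a log odds ratio over a 2x2 contingency table with one added to every cell; log_odds_ratios and top_terms are hypothetical names, not the interfaces of the Perl scripts.

```python
import math
from collections import Counter


def log_odds_ratios(target_tokens, reference_tokens):
    """Score each word of the target corpus by the log odds ratio of its
    frequency in the target corpus versus a reference corpus, adding one
    to every cell of the 2x2 contingency table (add-one smoothing)."""
    tgt = Counter(target_tokens)
    ref = Counter(reference_tokens)
    n_tgt = sum(tgt.values())
    n_ref = sum(ref.values())
    scores = {}
    for word, a in tgt.items():
        b = ref.get(word, 0)  # count 0 before smoothing if unseen in the reference
        odds_tgt = (a + 1) / (n_tgt - a + 1)
        odds_ref = (b + 1) / (n_ref - b + 1)
        scores[word] = math.log(odds_tgt / odds_ref)
    return scores


def top_terms(scores, k):
    """Return the k highest-scoring words, breaking ties alphabetically."""
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    target = "the kinase binds the substrate and the kinase".split()
    reference = "the cat sat on the mat and the dog ran".split()
    scores = log_odds_ratios(target, reference)
    print(top_terms(scores, 3))
```

Smoothing keeps the ratio finite for words that never occur in the reference corpus, which are exactly the domain-specific candidates the ranking is meant to surface; function words such as "the" score near zero because their relative frequency is similar in both corpora.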