搜索资源列表
HTMLtoTXT
- 将HTML网页格式中的正文提取出来 主要是小说网上下载的打包小说
content_abstract
- 针对高校教师的个人网页的源文件进行的正文提取,也可应用于一般的网页的正文提取。-Colleges and universities for their personal web page of the source file to extract the body, but also can be applied to the general body of the page extraction.
gekhtml
- 基于ekhtml,自动提取网页正文,将提取出来的title,author,正文text, 文章发布的时间存入mysql数据库.-Based on ekhtml, Automatic extraction of web page text, will be extracted out of the title, author, body text, the article published time into mysql database.