Merge pull request #91 from wardseptember/master

add goose3
tangyouhua 2020-06-23 20:08:17 +08:00 committed by GitHub
commit a528d6085e
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 2 additions and 1 deletion

@@ -634,7 +634,8 @@ Databases implemented in Python.
 * micawber: A small library for extracting rich content from URLs. [Official site](https://github.com/coleifer/micawber)
 * [newspaper](http://hao.importnew.com/python-newspaper/): News extraction, article extraction, and content curation in Python. [Official site](https://github.com/codelucas/newspaper)
 * opengraph: A Python module for parsing the Open Graph Protocol. [Official site](https://github.com/erikriver/opengraph)
-* [python-goose](http://hao.importnew.com/python-goose/): HTML content/article extractor. [Official site](https://github.com/grangier/python-goose)
+* [python-goose](http://hao.importnew.com/python-goose/): HTML content/article extractor (Python 2). [Official site](https://github.com/grangier/python-goose)
+* [goose3](http://goose3.readthedocs.io/en/latest/index.html): HTML content/article extractor (Python 3). [Official site](https://github.com/goose3/goose3)
 * python-readability: A fast Python port of arc90's readability tool. [Official site](https://github.com/buriy/python-readability)
 * sanitize: Brings sanity to a world of messed-up data. [Official site](https://github.com/Alir3z4/python-sanitize)
 * sumy: A module for automatic summarization of text documents and HTML pages. [Official site](https://github.com/miso-belica/sumy)