## Web Mining

1. scrapy

   Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

   Project Source: https://github.com/scrapy/scrapy

   Project Homepage: http://scrapy.org/

1. Pattern

   Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

   Project Source: https://github.com/clips/pattern

   Project Homepage: http://www.clips.ua.ac.be/pages/pattern

1. portia

   Portia is a tool for visually scraping web sites without any programming knowledge.

   Project Source: https://github.com/scrapinghub/portia

1. python-goose

   HTML content and article extractor; a web scraping library in Python.

   Project Source: https://github.com/grangier/python-goose

1. newspaper

   News extraction, article extraction and content curation in Python.

   Project Source: https://github.com/codelucas/newspaper

   Project Homepage: http://newspaper.readthedocs.org/en/latest/

1. gensim

   Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora.

   Project Source: https://github.com/piskvorky/gensim

   Project Homepage: http://radimrehurek.com/gensim/

1. distribute_crawler

   A distributed web crawler.

   Project Source: https://github.com/gnemoug/distribute_crawler

1. pyspider

   A spider system in Python.

   Project Source: https://github.com/binux/pyspider

1. tagger

   A Python module for extracting relevant tags from text documents.

   Project Source: https://github.com/apresta/tagger

1. cola

   A distributed crawling framework.

   Project Source: https://github.com/chineking/cola
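
For readers new to the topic, "extracting structured data from pages" can be illustrated with a minimal sketch that uses only the Python standard library. It does not use the API of any project listed above; the class `LinkExtractor`, the function `extract`, and the sample HTML are hypothetical names chosen for illustration. The frameworks in this list add fetching, scheduling, retries and far more robust parsing on top of this basic idea.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the page title and absolute link URLs from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(html, base_url):
    """Turn raw HTML into a small structured record (a dict)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "links": parser.links}


# Hypothetical sample page; a real crawler would download this over HTTP.
SAMPLE_HTML = """
<html><head><title>Example listing</title></head>
<body><a href="/items/1">First</a> <a href="https://example.org/about">About</a></body></html>
"""

record = extract(SAMPLE_HTML, "https://example.com/")
print(record)
```

A crawler would repeat this step for each URL found in `record["links"]`, which is the part that scrapy, pyspider and cola automate and distribute.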