## Web Mining

1. scrapy  
Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.  
Project Source: https://github.com/scrapy/scrapy  
Project Homepage: http://scrapy.org/

1. Pattern  
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.  
Project Source: https://github.com/clips/pattern  
Project Homepage: http://www.clips.ua.ac.be/pages/pattern

1. portia  
Portia is a tool for visually scraping websites without any programming knowledge.  
Project Source: https://github.com/scrapinghub/portia

1. python-goose   
HTML content/article extractor; a web scraping library in Python.  
Project Source: https://github.com/grangier/python-goose

1. newspaper  
News extraction, article extraction and content curation in Python.  
Project Source: https://github.com/codelucas/newspaper  
Project Homepage: http://newspaper.readthedocs.org/en/latest/ 

1. gensim  
Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora.   
Project Source: https://github.com/piskvorky/gensim  
Project Homepage: http://radimrehurek.com/gensim/  

1. distribute_crawler   
A distributed web crawler.    
Project Source: https://github.com/gnemoug/distribute_crawler     

1. pyspider   
A web crawler (spider) system in Python.    
Project Source: https://github.com/binux/pyspider    

1. tagger    
A Python module for extracting relevant tags from text documents.    
Project Source: https://github.com/apresta/tagger  

1. cola    
A distributed crawling framework.    
Project Source: https://github.com/chineking/cola
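
Crawling frameworks such as Scrapy and pyspider take care of scheduling, deduplication and link following for you. As a rough, framework-free illustration of that core loop (this is not Scrapy's or pyspider's API), the sketch below performs a breadth-first crawl over a hypothetical in-memory site `PAGES` that stands in for real HTTP fetches, extracting links with the standard library's `html.parser`. The names `PAGES`, `fetch`, `crawl` and `allowed_prefix` are illustrative assumptions.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin

# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="http://other.org/">ext</a>',
    "http://example.com/b": '<a href="/a#top">A again</a>',
}


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    # Assumption: a dict lookup replaces a real network request.
    return PAGES.get(url)


def crawl(start_url, allowed_prefix, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    frontier = deque([start_url])
    seen = {start_url}
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(allowed_prefix) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


print(crawl("http://example.com/", "http://example.com/"))
```

This prints `['http://example.com/', 'http://example.com/a', 'http://example.com/b']`: the external link is filtered out by `allowed_prefix`, and `/a#top` is recognized as already seen once its fragment is removed. A real framework adds politeness delays, retries, robots.txt handling and concurrent downloads on top of this loop.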
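
distribute_crawler and cola both spread crawling across several machines. One common way to split the work, shown here as an assumption rather than as what either project actually does, is to assign each URL to a worker by hashing its hostname, so that every page of a given site lands on the same worker and per-site rate limits can be enforced locally. The functions `worker_for` and `partition` are hypothetical names for illustration.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def worker_for(url, num_workers):
    """Map a URL to a worker index using a stable hash of its hostname."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    # md5 serves only as a stable, well-distributed hash (not for security);
    # Python's built-in hash() is randomized per process for str values.
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def partition(urls, num_workers):
    """Group URLs into per-worker queues keyed by worker index."""
    queues = defaultdict(list)
    for url in urls:
        queues[worker_for(url, num_workers)].append(url)
    return dict(queues)


urls = [
    "http://example.com/a",
    "http://example.com/b",
    "https://example.org/",
    "https://news.example.net/story/1",
]
for worker, batch in sorted(partition(urls, 3).items()):
    print(worker, batch)
```

Because the hash is deterministic, any node can compute where a newly discovered URL belongs without a central coordinator. The trade-off is that adding or removing workers reshuffles most hosts; consistent hashing is the usual refinement when the worker pool changes often.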
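
Similarity retrieval of the kind gensim offers is typically built on vector-space models such as TF-IDF. As a plain-Python sketch of the idea (not gensim's API, which streams large corpora and works with sparse matrices), the code below weights each term by tf × log(N / df) and ranks documents by cosine similarity to a query. The tokenizer, the toy corpus and the function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens, no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    """Inverse document frequency: log(N / df) for each term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return (similarity, doc index) pairs, most similar first."""
    idf = build_idf(docs)
    q = tfidf(query, idf)
    scores = [(cosine(q, tfidf(doc, idf)), i) for i, doc in enumerate(docs)]
    return sorted(scores, reverse=True)


docs = [
    "web crawling framework for extracting structured data",
    "topic modelling and document similarity",
    "distributed web crawler",
]
print(rank("web crawler", docs))
```

For the query `"web crawler"` the third document ranks first (it contains both terms), the first document second (it shares only the common term `web`, which gets a low IDF weight), and the second document scores 0.0.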