## Data Processing

1. pandas

    pandas is a package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.

    Project Source: https://github.com/pydata/pandas

    Project Homepage: http://pandas.pydata.org/

1. Faker

    Faker is a package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill your persistence layer to stress-test it, or anonymize data taken from a production service, Faker is for you.

    Project Source: https://github.com/joke2k/faker

    Project Documentation: http://fake-factory.readthedocs.org/en/latest/

1. tablib

    Tablib is a format-agnostic tabular dataset library, written in Python.

    Project Source: https://github.com/kennethreitz/tablib

    Project Documentation: http://docs.python-tablib.org/en/latest/

1. data_hacks

    Command-line utilities for data analysis.

    Project Source: https://github.com/bitly/data_hacks

1. fuzzywuzzy

    Fuzzy string matching like a boss.

    Project Source: https://github.com/seatgeek/fuzzywuzzy

1. snownlp

    Python library for processing Chinese text.

    Project Source: https://github.com/isnowfy/snownlp

1. jieba

    Chinese text segmentation.

    Project Source: https://github.com/fxsjy/jieba

    Online Demo: http://jiebademo.ap01.aws.af.cm/

1. cubes

    Lightweight Python OLAP framework for multi-dimensional data analysis.

    Project Source: https://github.com/Stiivi/cubes

    Project Homepage: http://cubes.databrewery.org/
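For readers unfamiliar with the term, the "fuzzy string matching" mentioned in the fuzzywuzzy entry can be sketched with the standard library alone. The snippet below is only an illustration of the idea, not fuzzywuzzy's API: the helper names `similarity` and `best_match` are hypothetical, and the scoring relies on `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher
from typing import Sequence, Tuple


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score between two strings (hypothetical helper)."""
    # SequenceMatcher.ratio() returns a float in [0, 1]; scale it to a percentage.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(query: str, choices: Sequence[str]) -> Tuple[str, int]:
    """Return the choice most similar to `query`, ignoring case, with its score."""
    scored = ((choice, similarity(query.lower(), choice.lower())) for choice in choices)
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    teams = ["Atlanta Braves", "New York Mets", "New York Yankees"]
    print(best_match("new york mets", teams))
```

A dedicated library is likely the better choice for real workloads, since it typically adds token-based scorers and faster matching; the sketch only shows the kind of problem the package addresses.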