Pandas with NumPy and Matplotlib
Pandas is a Python library that, together with the IPython toolkit and other libraries, provides an environment for doing data analysis in Python.
We're not going to do a lot in this article; we'll just present a simple example that reads in a data file and does a little bit of data manipulation using NumPy. Then, we'll draw a simple scatter plot.
We'll use a Jupyter notebook throughout this material, and the notebook I used is available on GitHub: H-R-Diagram-Pandas-Matplotlib.ipynb.
The data we will read in describes an H-R diagram, which looks like this:
Picture source: Hertzsprung–Russell diagram.
Here is my Jupyter notebook (GitHub: H-R-Diagram-Pandas-Matplotlib.ipynb)
Let's read in the data:
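The notebook's own cell is not reproduced here, so the following is only a minimal sketch of this step. It assumes a CSV layout with hypothetical column names (`star`, `temperature`, `luminosity`) and made-up values, and it reads from an in-memory string so that it runs as-is. With a real file, pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the star data file. The column names and the
# values are made up for illustration; one cell is deliberately left blank.
raw = io.StringIO(
    "star,temperature,luminosity\n"
    "star_a,5800,1.0\n"
    "star_b,9900,25.0\n"
    "star_c, ,0.05\n"
    "star_d,3500,0.01\n"
)

# read_csv accepts a file path, a URL, or a file-like object.
df = pd.read_csv(raw)

print(df.head())    # first rows of the table
print(df.dtypes)    # a column containing blanks may not come back as numeric
```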
Since the data contains blank (whitespace-only) values, we need to clean it up. In this case, I just removed the affected rows, though there are a couple of other ways to handle missing data (see scikit-learn : Data Preprocessing I - Missing / Categorical data).
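One possible way to drop such rows is sketched below. It is not necessarily what the notebook does: it assumes hypothetical column names and made-up values, converts every column to numbers with `pd.to_numeric(..., errors="coerce")` so that blanks become `NaN`, and then removes the rows containing `NaN` with `dropna()`.

```python
import io

import pandas as pd

# Made-up sample with two blank cells (hypothetical column names).
raw = io.StringIO(
    "temperature,luminosity\n"
    "5800,1.0\n"
    " ,0.05\n"
    "9900,25.0\n"
    "3500, \n"
)
df = pd.read_csv(raw)

# Anything that cannot be parsed as a number (such as a blank) becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop every row that has at least one NaN.
cleaned = numeric.dropna()

print(len(df), "rows before,", len(cleaned), "rows after")
print(cleaned)
```

Dropping rows is the simplest option; it throws away the good values in those rows too, which is why imputing the missing values is sometimes preferred.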
Now we do not have flawed data, and we're ready to plot.
Here is the scatter plot via Matplotlib:
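As a rough sketch of how such a plot can be drawn (the column names and values here are made up, and the axis choices are my assumption rather than the notebook's), an H-R diagram conventionally puts temperature on a reversed x-axis and luminosity on a logarithmic y-axis:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative values only (hypothetical column names).
df = pd.DataFrame({
    "temperature": [30000, 12000, 9900, 5800, 3500],
    "luminosity": [20000.0, 0.002, 25.0, 1.0, 0.01],
})

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(df["temperature"], df["luminosity"], s=20)

ax.set_yscale("log")   # luminosity spans many orders of magnitude
ax.invert_xaxis()      # hot stars on the left, cool stars on the right
ax.set_xlabel("Temperature (K)")
ax.set_ylabel("Luminosity (solar units)")
ax.set_title("H-R diagram (illustrative values)")
fig.tight_layout()
```

In a notebook the figure is rendered inline; in a plain script, call `plt.show()` with an interactive backend or `fig.savefig(...)` to write it to a file.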
Just for fun, in this section we'll cluster the data with scikit-learn's DBSCAN, which is one of the unsupervised clustering algorithms.
The code is available from GitHub : H-R-Diagram-Pandas-Matplotlib.ipynb:
Here is the plot after clustering. It is still not catching the "White Dwarf" group, though:
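For readers who have not used DBSCAN before, here is a minimal, self-contained sketch of the call. It is not the notebook's code: the two synthetic blobs, the `eps=0.5` and `min_samples=5` settings, and the choice to work in log-scaled coordinates are all assumptions picked so that the example separates cleanly.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Two synthetic, well-separated groups in a made-up
# (log10 temperature, log10 luminosity) plane.
group_a = rng.normal(loc=[3.8, 0.0], scale=0.1, size=(50, 2))
group_b = rng.normal(loc=[4.1, -2.5], scale=0.1, size=(20, 2))
X = np.vstack([group_a, group_b])

# eps is the neighborhood radius (in data units) and min_samples is the
# number of neighbors needed for a point to count as a core point.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# DBSCAN marks noise points with the label -1, so exclude it from the count.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
```

Real star data is much less tidy than these blobs. The result depends strongly on `eps`, `min_samples`, and how the features are scaled, so a sparse group can easily end up labeled as noise or merged into a neighbor.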
The next dataset we're going to read in is the "Breast Cancer Wisconsin" dataset.
It contains 569 samples of malignant and benign tumor cells.
The first two columns in the dataset hold the unique ID numbers of the samples and the corresponding diagnosis (M=malignant, B=benign), respectively. Columns 3-32 contain 30 real-valued features that have been computed from digitized images of the cell nuclei, which can be used to build a model to predict whether a tumor is benign or malignant.
Let's read in the dataset directly from the UCI website using pandas:
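A sketch of this step follows. The URL is an assumption on my part (the UCI site has been reorganized over the years, so verify it before use), and `load_wdbc` is a hypothetical helper name. Since the file has no header row, `header=None` is passed so pandas numbers the columns 0, 1, 2, and so on. The demo reads a tiny made-up stand-in from memory so it runs offline.

```python
import io

import pandas as pd

# Assumed location of the raw data file; check it before relying on it.
WDBC_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "breast-cancer-wisconsin/wdbc.data"
)


def load_wdbc(source=WDBC_URL):
    """Read the header-less file: column 0 = ID, 1 = diagnosis, 2.. = features."""
    return pd.read_csv(source, header=None)


# Offline demo: two made-up rows with 3 features instead of 30.
sample = io.StringIO(
    "101,M,17.9,10.3,122.8\n"
    "102,B,13.5,14.3,87.4\n"
)
df = load_wdbc(sample)

print(df.shape)
print(df.head())
```

Calling `load_wdbc()` with no argument would attempt the download from the assumed URL.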
Then, let's assign the 30 features to a NumPy array X, and the corresponding diagnosis (M=malignant, B=benign) to y.
Using LabelEncoder, we can transform the class labels from their original string representation (B and M) into integers (0 and 1):
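The split can be sketched as below. To keep the example runnable without a download, it builds a small synthetic table with the same column layout (ID, diagnosis, then 30 feature columns); the values are random and purely illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_samples, n_features = 6, 30

# Synthetic stand-in with the same layout as the header-less data file.
ids = pd.Series(range(1000, 1000 + n_samples))
diagnosis = pd.Series(["M", "B", "B", "M", "B", "B"])
features = pd.DataFrame(rng.random((n_samples, n_features)))
df = pd.concat([ids, diagnosis, features], axis=1, ignore_index=True)

# Column labels are the integers 0..31, so label-based slicing works:
X = df.loc[:, 2:].to_numpy()   # the 30 feature columns
y = df.loc[:, 1].to_numpy()    # the diagnosis column

print(X.shape, y.shape)
```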
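A minimal sketch of the encoding step, using a short made-up label array in place of the real diagnosis column:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up labels standing in for the diagnosis column.
y = np.array(["M", "B", "B", "M"])

le = LabelEncoder()
y_encoded = le.fit_transform(y)

# Classes are sorted, so 'B' maps to 0 and 'M' maps to 1.
print(le.classes_)                 # ['B' 'M']
print(y_encoded)                   # [1 0 0 1]
print(le.transform(["M", "B"]))    # [1 0]
```

The fitted encoder can also map integers back to the original strings with `le.inverse_transform`.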