A common Python package that makes everything easy when you’re dealing with CSV/Excel files
Reference: Yuan’s blog
Load files
pd.read_{filetype} is a family of functions used to load CSV/Excel files into the workspace as a data type called DataFrame (df for short).
import pandas as pd
# 1. load csv file
df = pd.read_csv('data.csv')
# 2. load excel file
df = pd.read_excel('data.xlsx')
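To try the loader without a file on disk, here is a minimal sketch that reads CSV text from an in-memory buffer. The column names and values are made up for illustration and stand in for whatever 'data.csv' actually contains.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = "name,age,city\nAnn,31,Taipei\nBob,25,Tokyo\nCy,40,Taipei\n"

# read_csv accepts a file-like object as well as a file path
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))
print(df)
```

The first row of the text is used as the column names by default.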
Display data
A quick peek at the overall shape and distribution of the data.
# 1. display first n rows
df.head(n)
# 2. display last n rows
df.tail(n)
# 3. display the shape of the dataframe as (rows, columns)
df.shape
# 4. display the name of the columns
df.columns
# 5. display the type of the data
df.dtypes
# 6. display the general info of dataframe
df.info()
# 7. display summary statistics (numeric columns by default)
df.describe()
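The calls above can be tried on a tiny hand-made DataFrame. This is only a sketch, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "age": [31, 25, 40, 28],
})

first_two = df.head(2)      # first 2 rows
last_one = df.tail(1)       # last row
n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple

print(first_two)
print(last_one)
print(n_rows, n_cols)
print(df.columns)
print(df.dtypes)
df.info()                   # prints its summary directly
print(df.describe())        # count, mean, std, min, quartiles, max
```

Calling df.head() or df.tail() without an argument shows 5 rows.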
Data filtering
Similar to the filter function in Excel!
# 1. Filtering columns
df[['column1', 'column2']]
# 2. Filtering by a specific criterion
df[df['column'] == value]
# 3. Filtering with multiple criteria
df[(df['column1'] == value1) & (df['column2'] == value2)]
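Here is the same filtering pattern with concrete values filled in. The column names and data are assumptions made up for this sketch.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "city": ["Taipei", "Tokyo", "Taipei", "Tokyo"],
    "age": [31, 25, 40, 28],
})

# Keep only some columns
subset = df[["name", "age"]]

# Keep rows matching one condition
taipei = df[df["city"] == "Taipei"]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
taipei_over_35 = df[(df["city"] == "Taipei") & (df["age"] > 35)]

print(subset)
print(taipei)
print(taipei_over_35)
```

The parentheses matter because & binds more tightly than == and > in Python, so leaving them out changes what gets compared.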
Data addition/removal
# 1. Concatenate multiple dataframes
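The original code for this section is cut off, so what follows is only a sketch of how concatenation, plus adding and removing columns or rows, might look. The data and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical dataframes with the same columns
df_a = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 25]})
df_b = pd.DataFrame({"name": ["Cy"], "age": [40]})

# Stack the rows; ignore_index=True renumbers the index from 0
combined = pd.concat([df_a, df_b], ignore_index=True)

# Add a new column computed from an existing one
combined["is_adult"] = combined["age"] >= 18

# Remove a column (returns a new dataframe, the original is unchanged)
without_flag = combined.drop(columns=["is_adult"])

# Remove a row by its index label
without_first = combined.drop(index=[0])

print(combined)
print(without_flag)
print(without_first)
```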
Data processing
# 1. sort the dataframe by value
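The rest of this block is missing from the page, so here is a minimal sketch of sorting by value with sort_values, using made-up data.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 25, 40]})

# Sort ascending by one column (the original row labels are kept)
by_age = df.sort_values(by="age")

# Sort descending, then renumber the index from 0
by_age_desc = df.sort_values(by="age", ascending=False).reset_index(drop=True)

print(by_age)
print(by_age_desc)
```

sort_values returns a new dataframe, so df itself stays in its original order.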
Data cleaning
# 1. Check any null data in dataframe
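Since the original snippet is truncated here, this is just one possible sketch of checking for nulls, along with two common follow-ups (dropping or filling them). The data is hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing age
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "age": [31, float("nan"), 40],
})

# Count null values in each column
null_counts = df.isnull().sum()

# True if there is at least one null anywhere in the dataframe
has_null = bool(df.isnull().values.any())

# Drop rows containing any null value
dropped = df.dropna()

# Or fill nulls in a given column with a default value
filled = df.fillna({"age": 0})

print(null_counts)
print(has_null)
print(dropped)
print(filled)
```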
Export DataFrame
# 1. Export to csv file
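The export code is cut off in the original, so below is a small sketch using to_csv. The file name output.csv and the data are assumptions, and the file is written to a temporary folder so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 25]})

# With no path, to_csv returns the CSV text as a string
csv_text = df.to_csv(index=False)
print(csv_text)

# With a path, to_csv writes the file; index=False leaves out the row labels
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.csv")  # hypothetical file name
    df.to_csv(path, index=False)
    round_trip = pd.read_csv(path)  # read it back to check the result

print(round_trip)
```

Without index=False, the row index is written as an extra unnamed first column, which then shows up when the file is loaded again.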