Have you ever wondered what Pandas is? With the rising demand for data science, you have probably heard of Python, and if you know Python, you have probably heard of Pandas. Not the panda who knows Kung Fu! But once you know Pandas, you will gain a superpower called data analysis, which is as cool as Kung Fu. In this post, I will introduce some common and useful Pandas functions. Get ready!

What is Pandas?

pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term “panel data”, an econometrics term for data sets that include observations over multiple time periods for the same individuals. It is also a play on the phrase “Python data analysis” itself. Wes McKinney started building what would become pandas at AQR Capital while he was a researcher there from 2007 to 2010.

How do I import the Pandas library?

It’s fairly easy to import Pandas in Python. We just use import pandas as pd. You don’t have to use “pd” as the alias for Pandas; you can choose whatever you want. It’s just that “pd” is used by almost all Pandas users, so everyone knows “pd” represents Pandas!

How to create a series or a dataframe in Pandas?

Series:
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

Dataframe:
data = {'Country': ['Taiwan', 'Korea', 'Japan'], 'Capital': ['Taipei', 'Seoul', 'Tokyo'], 'Population': [23920776, 51375508, 125543793]}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])
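To see what these two objects look like in practice, here is a small sketch that rebuilds them and looks up a few values. It assumes pandas is installed and imported as pd; the lookups at the end are only illustrative.

```python
import pandas as pd

# A Series is a one-dimensional array with a label for each value.
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# A DataFrame is a table; each dictionary key becomes a column.
data = {
    'Country': ['Taiwan', 'Korea', 'Japan'],
    'Capital': ['Taipei', 'Seoul', 'Tokyo'],
    'Population': [23920776, 51375508, 125543793],
}
df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population'])

print(s['b'])                   # the value stored under label 'b'
print(df['Capital'].tolist())   # one column as a plain Python list
print(df.shape)                 # (rows, columns)
```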


Read and write to csv/Excel

Oftentimes we need to load a csv or Excel file into a dataframe, or output our dataframe to a file. Here is how we do it:
csv:
pd.read_csv('file.csv', header=None, nrows=10)
df.to_csv('mydataFrame.csv')

Excel:
pd.read_excel('file.xlsx')
df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')
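A quick way to get comfortable with these calls is a round trip: write a dataframe out, then read it back. The sketch below does this for csv only, using a temporary folder (my choice, so that no files are left behind). The index=False argument is an addition that is not in the calls above; without it, to_csv also writes the row labels as an extra column. The Excel calls work the same way but need an Excel engine such as openpyxl installed, so they are left out here.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Country': ['Taiwan', 'Korea', 'Japan'],
    'Capital': ['Taipei', 'Seoul', 'Tokyo'],
    'Population': [23920776, 51375508, 125543793],
})

# Work in a temporary folder so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'mydataFrame.csv')

    # index=False keeps the row labels (0, 1, 2) out of the file.
    df.to_csv(path, index=False)

    # By default the first line of the file becomes the column names.
    df_back = pd.read_csv(path)

    # nrows limits how many data rows are read.
    first_two = pd.read_csv(path, nrows=2)

print(df_back)
print(len(first_two))
```

Note that header=None, as used in the read_csv call above, tells pandas the file has no header line, so the first line is treated as data.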

Selecting

By label/position
df[1:2] will get you the second row of this dataframe (the row at index 1). Likewise, df[1:3] will get you index 1 to index 2, because the end of a slice is not included.
Note: The index in Python starts at 0. So if you want the fifth item, the index for it is 4.
df[1:] will get the rest of the dataframe starting from index 1.
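Here is a small sketch of these slices on the dataframe from earlier. The last two lines use df.iloc and df.loc, which are not covered above; I include them as the usual pandas accessors for selecting by position and by label, so treat them as an extra rather than part of the walkthrough.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Taiwan', 'Korea', 'Japan'],
    'Capital': ['Taipei', 'Seoul', 'Tokyo'],
    'Population': [23920776, 51375508, 125543793],
})

# Row slices follow Python rules: start included, end excluded.
second_row = df[1:2]   # only the row at index 1
last_two = df[1:3]     # rows at index 1 and 2
rest = df[1:]          # everything from index 1 onward

# Extra: explicit accessors for position and label.
first_by_position = df.iloc[0]           # first row, by position
capital_by_label = df.loc[0, 'Capital']  # row label 0, column 'Capital'

print(second_row)
print(capital_by_label)
```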


Conditional Selecting
Sometimes we need to select rows by some conditions. This is when >, <, ==, !=, & (and), | (or) come in handy.

Let’s see some examples here:

  1. df[df['Population']>120000000]


  2. df[(df['Population']<50000000)&(df['Capital']!='Seoul')]


  3. df[(df['Population']<120000000)|(df['Capital']=='Tokyo')]
    The above code will give us all three rows since we use the | (or) expression. Every row satisfies at least one of the two conditions.
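The three examples above can be run together as a quick sketch on the same dataframe. The variable names are mine. Each condition needs its own parentheses because & and | bind more tightly than the comparison operators.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Taiwan', 'Korea', 'Japan'],
    'Capital': ['Taipei', 'Seoul', 'Tokyo'],
    'Population': [23920776, 51375508, 125543793],
})

# 1. A single condition produces a True/False mask over the rows.
big = df[df['Population'] > 120000000]

# 2. Both conditions must hold (&).
small_not_seoul = df[(df['Population'] < 50000000) & (df['Capital'] != 'Seoul')]

# 3. At least one condition must hold (|).
either = df[(df['Population'] < 120000000) | (df['Capital'] == 'Tokyo')]

print(big['Country'].tolist())
print(small_not_seoul['Country'].tolist())
print(either['Country'].tolist())
```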


Drop, sort and rank

Drop:
df.drop('Country', axis=1) This will drop the “Country” column. It returns a new dataframe and leaves df unchanged, so assign the result if you want to keep it.
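A short sketch of drop on the same dataframe follows. The columns= form and the row drop are additions of mine to show that the same method covers both axes; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Taiwan', 'Korea', 'Japan'],
    'Capital': ['Taipei', 'Seoul', 'Tokyo'],
    'Population': [23920776, 51375508, 125543793],
})

# axis=1 means "look for this label among the columns".
without_country = df.drop('Country', axis=1)

# Equivalent spelling that names the axis through the keyword.
also_without_country = df.drop(columns='Country')

# With the default axis=0, the label refers to a row instead.
without_first_row = df.drop(0)

print(list(without_country.columns))
print(list(df.columns))   # the original dataframe still has all columns
```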

Sort & Rank
df.sort_index() Sort by labels along an axis
df.sort_values(by='Country') Sort by the values along an axis
df.rank() Assign ranks to entries
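To make the difference between the three calls concrete, here is a minimal sketch on the same dataframe. The ascending=False argument and the choice to rank only the Population column are my additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Taiwan', 'Korea', 'Japan'],
    'Capital': ['Taipei', 'Seoul', 'Tokyo'],
    'Population': [23920776, 51375508, 125543793],
})

# Sort rows alphabetically by the values in one column.
by_country = df.sort_values(by='Country')

# Largest population first.
by_population_desc = df.sort_values(by='Population', ascending=False)

# Sort by the row labels (0, 1, 2), here in reverse order.
by_index_desc = df.sort_index(ascending=False)

# Rank gives each value its position in sorted order, starting at 1.
population_rank = df['Population'].rank()

print(by_country['Country'].tolist())
print(population_rank.tolist())
```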

Retrieve dataframe information

Basic information
df.shape Retrieve (rows, columns)
df.index Describe index
df.columns Describe DataFrame columns
df.info() Info on DataFrame
df.count() Number of non-NA values
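These can all be tried on the small dataframe from earlier, as in the sketch below. Note that shape, index and columns are attributes, so they take no parentheses, while info() and count() are methods.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Taiwan', 'Korea', 'Japan'],
    'Capital': ['Taipei', 'Seoul', 'Tokyo'],
    'Population': [23920776, 51375508, 125543793],
})

print(df.shape)          # (rows, columns)
print(list(df.index))    # row labels
print(list(df.columns))  # column names

# info() prints its summary (dtypes, non-null counts) directly.
df.info()

# count() returns one number per column: how many values are not missing.
non_missing = df.count()
print(non_missing)
```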

Basic statistical information
df.sum() Sum of values
df.cumsum() Cumulative sum of values
df.min()/df.max() Minimum/maximum values
df.idxmin()/df.idxmax() Index label of the minimum/maximum value
df.describe() Summary statistics
df.mean() Mean of values
df.median() Median of values

Reshaping data - change the layout of a data set

  1. Reshape your data set from wide to long
    pd.melt(df, id_vars=['id_var'], value_vars=['value_var']) This is extremely helpful when you are tidying your data
  2. Reshape your data set from long to wide
    df.pivot(columns='var', values='val') This is the reverse of melt: it spreads the values of one column out into several columns
  3. Append rows of dataframes
    pd.concat([df1, df2])
  4. Append columns of dataframes
    pd.concat([df1, df2], axis=1)
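Reshaping is easier to follow with data in hand, so here is a small sketch of all four operations. The tables and column names (id, x, y, A, B) are made up for illustration. I also pass index='id' to pivot and ignore_index=True to concat, which the calls in the list do not use, to keep the results tidy. Note that axis=1 goes outside the list of dataframes.

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per measurement.
wide = pd.DataFrame({'id': ['a', 'b'], 'x': [1, 2], 'y': [3, 4]})

# Wide to long: one row per (id, measurement) pair.
# melt names the new columns 'variable' and 'value' by default.
long = pd.melt(wide, id_vars=['id'], value_vars=['x', 'y'])

# Long to wide: spread 'variable' back out into columns.
back_to_wide = long.pivot(index='id', columns='variable', values='value')

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})
df3 = pd.DataFrame({'B': [5, 6]})

# Stack rows; ignore_index=True renumbers the rows 0..3.
stacked_rows = pd.concat([df1, df2], ignore_index=True)

# Place dataframes side by side; rows are matched on their index.
side_by_side = pd.concat([df1, df3], axis=1)

print(long)
print(back_to_wide)
print(stacked_rows)
print(side_by_side)
```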



Conclusion

Pandas is an extremely powerful tool for data tidying, wrangling, and analysis. This post is not an exhaustive look at Pandas; there are a lot of different ways to achieve the same results, and some are smarter and faster than my methods. Below is a list of helpful links that will enhance your understanding of this robust library and your ability to use it. As you diligently research and apply different methods in your Python scripts using Pandas, you will surely become a better data analyst and a Pandas expert!

W3Schools on Pandas
GeeksforGeeks on Pandas
Python Pandas tutorial
tutorialspoint on Pandas
Real Python on Pandas