Pandas#1

And in the blink of an eye, summer is gone and it is time to return to blogging. I recently had the idea of starting a short blog series, and over the next few posts I will cover all the basics (and more) that you need to start using Pandas.

First and foremost – what is Pandas?

Pandas is a popular Python library that makes it easy to analyse and manipulate data. It offers powerful, flexible data structures and is widely used by data scientists and analysts. As with any other library, you have to import Pandas before you can use it.

import pandas as pd

Before you start manipulating data with Pandas, you should understand how it shapes data into a readable form, which is the topic of Pandas#1. The library provides two main data structures: Series and DataFrame.

Series

A Pandas Series is a one-dimensional labelled array: each element is a value paired with an index label, much like a key-value pair. You can think of a Series as a single column in a table. Series are similar to Python dictionaries; however, a Series makes data manipulation much easier.

To create a Series we use pd.Series(). There are several optional arguments, but the most commonly used one is data, which specifies the elements of the Series.

sample_series = pd.Series(['September', 'October', 'November'])
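To see how pandas lays out a Series, you can print it and inspect its parts. The snippet below is a minimal sketch that reuses the sample_series name from above; nothing else is assumed beyond a working pandas installation.

```python
import pandas as pd

sample_series = pd.Series(['September', 'October', 'November'])

# Printing shows the index labels on the left and the values on the right
print(sample_series)

# The index and the values can also be inspected separately
print(list(sample_series.index))  # [0, 1, 2]
print(sample_series.tolist())     # ['September', 'October', 'November']

# len() reports the number of elements
print(len(sample_series))         # 3
```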

When you print the Series, the integers on the left-hand side of the elements are known as the index of the Series. By default the index is a sequence of integers starting at 0, but you can replace it with your own labels. A custom index must have the same length as the number of elements in the Series.

sample_series_index = pd.Series(['September', 'October', 'November'], index=['first', 'second', 'third'])
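A custom index is most useful once you start looking values up by label. The following is a small sketch of that idea, including what happens when the index length does not match. The replacement labels 'a', 'b' and 'c' are made up for illustration.

```python
import pandas as pd

sample_series_index = pd.Series(
    ['September', 'October', 'November'],
    index=['first', 'second', 'third'],
)

# Look up a value by its label, much like a dictionary key
print(sample_series_index['second'])  # October

# Replace the index on an existing Series by assigning new labels
# ('a', 'b', 'c' are hypothetical labels used only for this example)
sample_series_index.index = ['a', 'b', 'c']
print(sample_series_index['c'])  # November

# An index of the wrong length is rejected with a ValueError
try:
    pd.Series(['September', 'October', 'November'], index=['first', 'second'])
except ValueError as error:
    print('Could not build the Series:', error)
```

Label-based lookup is where a Series feels most like a dictionary, while still keeping the order of its elements.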

DataFrame

A DataFrame is a two-dimensional data structure, similar to a SQL table or a spreadsheet. You can also think of it as a collection of Series that share the same index, one per column. Each column of a DataFrame can hold a different data type. As with a Series, you can specify a custom index when you create a DataFrame.

We can create a DataFrame using pd.DataFrame():

sample_DataFrame = pd.DataFrame([['2021', 'London', 'GBP'], ['2019', 'Paris', 'EUR'], ['2021', 'Berlin', 'EUR']], index=['row_1', 'row_2', 'row_3'], columns=['year', 'city', 'currency'])
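As an alternative sketch, a table like the one above can also be built from a dictionary of columns, which makes the "collection of Series" idea easier to see. The name sample_frame is made up for this example, and the years are stored as integers here (an assumption, purely to show that columns can have different data types).

```python
import pandas as pd

# Each dictionary key becomes a column name; each list becomes a column
sample_frame = pd.DataFrame(
    {
        'year': [2021, 2019, 2021],  # integers, for illustration only
        'city': ['London', 'Paris', 'Berlin'],
        'currency': ['GBP', 'EUR', 'EUR'],
    },
    index=['row_1', 'row_2', 'row_3'],
)

# Selecting one column returns a Series that shares the DataFrame's index
city_column = sample_frame['city']
print(type(city_column).__name__)  # Series
print(city_column['row_2'])        # Paris

# Each column keeps its own data type
print(sample_frame.dtypes)

# shape is (number of rows, number of columns)
print(sample_frame.shape)          # (3, 3)
```

Passing a dictionary keeps each column's values together, so it is harder to mix up rows and columns than with a nested list.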

Now you know how Pandas shapes data into a readable format. In the next post I will cover some data manipulation techniques, such as adding, dropping and updating rows, and more. I hope this mini blog series will be useful to other folks, and I would love it if people who use Pandas shared their experience in the comments below. Pandas#2 coming soon 🙂