This post presents an intuitive overview of the features of the *pandas* DataFrame object. Minimum temperature data from 1901 to 2017, provided by data.gov.in, is used as an example.

### Table of Contents

- What is pandas?
- Installing pandas
- Running this example on Kaggle
- Creating a DataFrame from Excel or CSV
- Glancing at the data
- Statistical overview of the data
- Finding the hottest year
- Visualizing annual minimum temperature over years
- Visualizing temperatures rise and fall (Mean Temp – Months)
- Finding hottest seasons (1901-2017)
- Finding the most extreme year
- Plotting Difference over Years
- Looking into abnormal winters

### 1. What is pandas?

*pandas* is a Python library for data analysis. Its name is derived from *PAnel DAta*. It provides rich data structures and tools for working with structured data sets common in statistics and other fields. Its main data structure is the *DataFrame*.

### 2. Installing pandas

`conda install pandas`

- If you have Anaconda installed, you can install pandas using the command above.
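
To confirm that the installation worked, one quick check (a minimal sketch, not part of the original example) is to import the library and print its version:

```python
import pandas as pd

# Printing the version confirms that pandas can be imported
print(pd.__version__)
```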

### 3. Running this example on Kaggle

- You can fork this on Kaggle from here.
- You can download the dataset from here.

### 4. Creating a DataFrame from Excel or CSV

```
import pandas as pd

# Load the Excel file into a DataFrame
temp = pd.read_excel('../input/temp.xls')
# temp = pd.read_csv('../input/temp.csv')
# Use the YEAR column as the index
temp = temp.set_index(temp.YEAR)
```

- First, we import the pandas library.
- *read_excel()* and *read_csv()* both return a DataFrame object. Here we use *read_excel()* because the input file is an Excel file.
- Every DataFrame has an index; in this case we want the YEAR column to be the index.
- *set_index()* returns a new DataFrame and doesn’t modify the existing one, which is why the result is assigned back to *temp*.
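
The behaviour of set_index() can be checked on a tiny frame. This is only a sketch: the `sample` DataFrame and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration
sample = pd.DataFrame({'YEAR': [1901, 1902, 1903],
                       'ANNUAL': [19.5, 19.4, 19.3]})

# set_index() returns a new DataFrame; `sample` itself is unchanged
indexed = sample.set_index(sample.YEAR)

print(sample.index.tolist())   # the original still has its default index
print(indexed.index.tolist())  # the new frame is indexed by year
```

Passing the Series `sample.YEAR` keeps YEAR as a regular column as well. Passing the column name, as in `sample.set_index('YEAR')`, moves the column into the index instead.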

### 5. Glancing at the data

```
temp.head()
```

- head() returns the first five rows of the data, along with the column headers.
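
head() also accepts the number of rows to return. A small sketch on made-up values (the `sample` frame is hypothetical, not the real dataset):

```python
import pandas as pd

# Hypothetical sample data with six rows
sample = pd.DataFrame({'ANNUAL': [19.5, 19.4, 19.3, 19.6, 19.2, 19.7]},
                      index=[1901, 1902, 1903, 1904, 1905, 1906])

first_five = sample.head()   # default: the first five rows
first_two = sample.head(2)   # only the first two rows

print(first_two)
```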

### 6. Statistical overview of the data

```
temp.describe()
```

*describe()* returns basic statistics for each numeric column of the dataset, e.g. count, mean, std, min, and max.
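
The result of describe() is itself a DataFrame, so individual statistics can be looked up by label. A minimal sketch, assuming a made-up two-column frame rather than the real data:

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration
sample = pd.DataFrame({'JAN': [13.0, 14.0, 12.0],
                       'FEB': [15.0, 14.0, 16.0]})

stats = sample.describe()

# Rows of the result are statistics, columns are the original columns
print(stats.loc[['count', 'mean', 'min', 'max']])
jan_mean = stats.loc['mean', 'JAN']
```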

### 7. Finding the hottest year

```
temp['ANNUAL'].idxmax()
```

2016

*idxmax()* returns the index label of the row where the column value is maximum. Because YEAR is our index, calling idxmax() on the ANNUAL column directly gives the year with the highest annual value, i.e. the hottest year.
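
The difference between idxmax() and max() is easy to see on a small example. This is a sketch with made-up numbers, not values from the dataset:

```python
import pandas as pd

# Hypothetical sample data indexed by year
sample = pd.DataFrame({'ANNUAL': [19.5, 20.1, 19.8]},
                      index=[2014, 2015, 2016])
sample.index.name = 'YEAR'

hottest_year = sample['ANNUAL'].idxmax()  # index label of the maximum
hottest_value = sample['ANNUAL'].max()    # the maximum value itself

print(hottest_year, hottest_value)
```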

### 8. Visualizing annual minimum temperature over years

```
import matplotlib.pyplot as plt
x = temp.index
y = temp.ANNUAL
plt.scatter(x,y)
plt.show()
```

- We’ve imported matplotlib for plotting.
- Here the ANNUAL column is plotted against YEAR as a scatter plot.

### 9. Visualizing temperatures rise and fall (Mean Temp – Months)

```
mean_months = temp.loc[:,'JAN':'DEC'].mean()
plt.plot(mean_months.index, mean_months)
```

    JAN    13.167009
    FEB    14.656239
    MAR    17.774872
    APR    21.054274
    MAY    23.233846
    JUN    23.838291
    JUL    23.718462
    AUG    23.386838
    SEP    22.228974
    OCT    19.735299
    NOV    16.255470
    DEC    13.735641
    dtype: float64

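*loc* is used to access values by labels. Here we are accessing the columns from ‘JAN’ through ‘DEC’; with *loc*, both ends of a label slice are included. Selecting a single label with *loc* returns a Series, while selecting a list of labels or a slice returns a DataFrame. *mean()* then computes the mean of each column, giving a Series indexed by month.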
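
The return types of the different *loc* selections can be verified on a small frame. A minimal sketch, assuming made-up values for three months:

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration
sample = pd.DataFrame({'JAN': [13.0, 14.0],
                       'FEB': [15.0, 16.0],
                       'MAR': [17.0, 18.0]},
                      index=[1901, 1902])

single = sample.loc[:, 'JAN']        # single label -> Series
as_list = sample.loc[:, ['JAN']]     # list of labels -> DataFrame
sliced = sample.loc[:, 'JAN':'FEB']  # label slice -> DataFrame, ends included

# mean() on a DataFrame averages each column
col_means = sliced.mean()
print(type(single).__name__, type(as_list).__name__)
print(col_means)
```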

### 10. Finding hottest seasons (1901-2017)

```
hottest_seasons = {'Winter': temp['JAN-FEB'].idxmax(),
                   'Summer': temp['MAR-MAY'].idxmax(),
                   'Monsoon': temp['JUN-SEP'].idxmax(),
                   'Autumn': temp['OCT-DEC'].idxmax()}
print(hottest_seasons)
```

{'Winter': 2016, 'Summer': 2016, 'Monsoon': 2016, 'Autumn': 2017}
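
A similar dictionary can be built in one call, because idxmax() on a DataFrame returns one index label per column. The sketch below uses two of the dataset's column names, but the values are made up and the keys are column names rather than season names:

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the dataset
sample = pd.DataFrame({'JAN-FEB': [14.0, 15.0, 14.5],
                       'MAR-MAY': [21.0, 20.5, 22.0]},
                      index=[2015, 2016, 2017])

# One idxmax() call covers every selected column
hottest = sample[['JAN-FEB', 'MAR-MAY']].idxmax().to_dict()
print(hottest)
```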

### 11. Finding the most extreme year

```
temp['DIFF'] = temp.loc[:, 'JAN':'DEC'].max(axis=1) - temp.loc[:, 'JAN':'DEC'].min(axis=1)
temp.DIFF.idxmax()
```

1921

- Calculate min() and max() on JAN to DEC columns for each row
- Calculate difference = max – min for each row
- Add difference (DIFF) column to the dataframe
- Do idxmax() on DIFF column
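
The steps above can be traced on a small frame where the arithmetic is easy to follow by hand. This is a sketch with made-up values and only three month columns:

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration
sample = pd.DataFrame({'JAN': [13.0, 12.0],
                       'JUN': [24.0, 25.0],
                       'DEC': [14.0, 13.5]},
                      index=[1901, 1902])

monthly = sample.loc[:, 'JAN':'DEC']

# axis=1 computes max and min across the columns of each row
sample['DIFF'] = monthly.max(axis=1) - monthly.min(axis=1)

most_extreme = sample['DIFF'].idxmax()
print(sample['DIFF'].tolist(), most_extreme)
```

Here 1901 has a spread of 24.0 - 13.0 = 11.0 and 1902 has 25.0 - 12.0 = 13.0, so 1902 is the most extreme year in this toy data.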

### 12. Plotting Difference over Years

    axes = plt.axes()
    axes.set_ylim([5, 15])
    axes.set_xlim([1901, 2017])
    plt.plot(temp.index, temp.DIFF)
    temp.DIFF.mean()

10.895128205128202

### 13. Looking into abnormal winters

```
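# Map each year to its {month: temperature} dictionary
year_dict = temp.loc[:, 'JAN':'DEC'].to_dict(orient='index')

# For each year, keep the names of the four coldest months
sorted_months = []
for year, months in year_dict.items():
    sorted_months.append(sorted(months, key=months.get)[:4])

# Store them as sets so that the order of the months does not matter
temp['WINTER'] = [set(months) for months in sorted_months]

# The most frequent combination is treated as the routine winter
winter_routine = max(sorted_months, key=sorted_months.count)

# Years whose coldest months differ from the routine winter
temp.WINTER[temp.WINTER != set(winter_routine)]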
```

    YEAR
    1957    {FEB, JAN, MAR, DEC}
    1976    {FEB, JAN, MAR, DEC}
    1978    {FEB, JAN, MAR, DEC}
    1979    {FEB, JAN, MAR, DEC}
    Name: WINTER, dtype: object

- Here, an abnormal winter means a year whose four coldest months differ in at least one month from the most commonly observed set of four coldest months.
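
The same idea can be sketched more compactly with *nsmallest()* and a *Counter*. This is an alternative formulation, not the code used above, and the `sample` frame and its values are made up for illustration:

```python
import pandas as pd
from collections import Counter

# Hypothetical sample data; in 1903 MAR is colder than NOV
sample = pd.DataFrame({'JAN': [13.0, 13.2, 13.1],
                       'FEB': [14.5, 14.6, 14.4],
                       'MAR': [17.5, 17.8, 15.0],
                       'JUN': [24.0, 24.2, 24.1],
                       'NOV': [16.0, 16.2, 16.5],
                       'DEC': [13.5, 13.6, 13.4]},
                      index=[1901, 1902, 1903])

# Four coldest months of each year, as order-independent frozensets
coldest = pd.Series(
    [frozenset(row.nsmallest(4).index) for _, row in sample.iterrows()],
    index=sample.index)

# The most frequent combination is treated as the usual winter
usual = Counter(coldest).most_common(1)[0][0]

# Years whose coldest months differ from the usual winter
abnormal_years = [year for year, months in coldest.items() if months != usual]
print(abnormal_years)
```

Using frozensets makes each combination hashable, so it can be counted directly instead of comparing ordered lists.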