Week 7 Day 1 - More File I/O
More file loading!
Today’s class is going to be primarily practical: we’ll talk a little bit about some of the things we missed from last class, do a little review, and then you’ll spend the class working with data.
Pandas
Reading in tabular data like this is a very common use of Python, so there are a lot of prewritten modules that make it easier to do things like this.
One of them is pandas, which handles files in CSV format. CSV file format is basically how things like excel spreadsheets are stored. If you’ve used R, you’ll be familiar with a lot of the syntax of Pandas.
Pandas has a lot of features, so we’ll just look at a few of them. Here’s a good supplementary guide to these notes if this all makes good sense to you, or if you have some experience with R and want to know how to do similar things in Python.
Installing Pandas
First we have to install Pandas with
pip3 install pandas
Check if it’s installed by running a python script with the following:
import pandas as pd
prof_info = {
"Ackles" : {
"classes": ["Python", "Algorithms"],
"office": "Steitz 131",
"firstname": "Acacia"
},
"Krebsbach" : {
"classes": ["Java", "FYS"],
"office": "Briggs 411"
},
"Gregg" : {
"classes": ["Web Devel", "Algorithms"],
"office": "Briggs 413"
}
}
df = pd.DataFrame(data=prof_info)
print(df)
Now we have a table with all of this information, instead of a dictionary.
Reading to Pandas
We can read an entire file directly into a pandas dataframe with read_csv
.
Take the example of the temps file from earlier.
hattemps = pd.read_csv("hats.txt", sep = " ")
Note that it doesn’t actually have to be a csv, just saved in csv format.
Note also that this automatically detects the header row!
Working with Data
There are many, many things you can do with data in Pandas. Here are a few common ones.
Indexing:
templist = df["temps"]
temp = df["temps"][4]
Or .loc
and .iloc
t1 = df.loc[2, "temp"]
t2 = df.iloc[2, 0]
Changing names:
df = df.rename(columns={"hats": "num_hats"})
or
df.rename(columns={"num_hats": "hats"}, inplace=True)
Same for rows, but use index.
Creating columns from other columns:
def isWarm(temp):
return row['temp'] > 70
df['warm'] = df.apply(lambda row: isWarm(row), axis =1)
Writing from Pandas
Finally, we can write out from pandas to a new file.
df.write_csv("new_temps.csv")
In-Class Exercise
Produce a visualization of some of the data from any of the CSV files at this site. You can create a bar graph, a line graph, a pie chart – whatever you want! Here’s the matplotlib site where you can see some popular kinds of plots you might want to make.
Go wild with it in your groups, and then after a few minutes you’ll share with each other. Then, each group will pick one visualization to share with the class.