[MUSIC]
0:00
In the data science and
business software worlds,
0:04
it's hard to escape
the dreaded spreadsheet.
0:07
Well, maybe you don't dread
them as much as I do.
0:09
I just never learned to use
Office software appropriately.
0:12
Okay, enough about me.
0:15
Anyway, in those two worlds and
many others,
0:16
a lot of your work as a coder will be
using CSV or comma separated value files.
0:19
These files usually have a header row and
then multiple rows below them with each
0:24
columns of value separated from
the previous column by a comma.
0:29
Yeah, it's a spreadsheet without
all of the Excel excellence.
0:33
Python though has a pretty robust
CSV module that lets us read and
0:37
write tabular data in this popular format.
0:40
Now if you've done the data science
basics course you've already seen how to
0:43
use CSVs, but it's worth focusing on this
for all of you that have to use CSVs but
0:46
don't need all of
the information in that course.
0:51
I have a CSV here from the Canada Science
and Technology Museums Corporation.
0:55
And has about a hundred of
their artifacts in it but I
1:00
can't display CSVs in work spaces and even
if I could I don't really wanna read it.
1:04
I wanna use it, so
let's make a file for this.
1:09
I'm gonna say new file, artifacts.py,
1:12
Python's CSV module is where we wanna
start, at least most of the time.
1:17
I'll put a link or two in the teacher's
notes for some other CSV libraries.
1:21
The biggest problem that
Python's built-in one has
1:24
is that it's slightly lacking
when it comes to Unicode support.
1:27
But let's just see what we can do with it.
1:31
So we're gonna import csv.
1:33
And we're gonna, obviously the first
thing I have to do is import the library.
1:35
And then I wanna see about
reading from our file.
1:38
We need to open the file a little bit
differently than we normally would do.
1:41
Normally we would just get with
you know a flag or something.
1:45
And in this case we wanna give it a new
line argument, so let's just say,
1:48
with open museum.csv,
1:51
and then when I say newline is equal
to and just an empty thing here.
1:57
The reason I want to do that is
because if we have a new line inside
2:00
of a quoted section then
we want to be sure and
2:04
be able to differentiate that from
an actual new line marking a new line.
2:08
So yeah.
2:13
And we're just gonna call this csvfile.
2:15
Oops, I forgot my colon.
2:17
And so now we need to make our CSV reader.
2:18
And we instantiate this reader so that we
can tell it how to read from the file.
2:21
So I'm gonna call this, I don't know,
artreader, and it will be csv.reader.
2:25
And it's gonna be a csv file.
2:32
And the delimiter,
2:34
I know this from looking at my,
I think that's how you spell it,
2:35
they actually used a pipe character
as the delimiter instead of a comma.
2:40
So really it's a pipe
separated value file.
2:45
Whatever, the idea is that
something separates each thing.
2:48
You kind of have to look at your csv files
to see what they're using as a delimiter.
2:51
All right, so
our artreader object here is a generator.
2:55
That means we can't index it.
2:59
We can't do like artreader two but
I only wanna look at a few of the records.
3:00
So I'm gonna go ahead and
turn this thing into a list.
3:05
So I'm gonna say rows = list (artreader).
3:07
And now I can loop over it.
3:12
So for row in rows up to two,
let's print a comma and a space.
3:14
And we'll join whatever is in that row.
3:22
So okay, cool.
3:25
And now, let's go ahead and
python artifacts.py.
3:27
So, cool.
3:33
There is the stuff that's in our file.
3:34
So you can see our first row
that we printed out there,
3:37
those are our field names.
3:40
So, artifact number, object name,
category three, and so on.
3:41
And then we get this thing here,
that here's the number, it's a cover,
3:45
it's paper.
3:50
And so on.
3:51
So that's cool.
3:53
The problem with this is, I have to know,
like, which column the thing I want is in.
3:53
And that's not always the easiest thing,
right?
3:58
So let's turn these into
dictionaries instead of just lists.
4:02
And we can do that by changing our reader.
4:07
So we have this reader here.
4:10
Think of this as like a list reader and
4:12
we can actually replace that
with what's called DictReader.
4:14
And DictReader reads and
gives us dictionaries.
4:18
So that's pretty cool, all right.
4:22
So we'll just take those first two.
4:24
Actually, let's skip that first one.
4:26
And let's print out row and
4:29
then let's print out group1, so
whatever they have in their group1 field.
4:31
Let's go ahead and run that again.
4:37
And our two groups, or
4:40
our two rows rather,
both have Aviation as their group1.
4:42
So that's pretty cool.
4:46
We can give a list of field
names to the DictReader,
4:48
when we specify the DictReader right here.
4:51
And it'll use those as the field names
in order, but since we didn't do that,
4:53
it actually reads in that first row,
and uses those as the field names.
4:57
So it's kind of smart,
which is pretty cool.
5:03
Let's change group one
over there to manucity.
5:05
Which is the manufactured city.
5:09
If we run that,
then we don't have a manufactured city for
5:12
either of those two items.
5:15
Alright, cool.
5:17
This is for working with a reader.
5:18
Just the thing that reads in the data and
parses it.
5:21
Let's talk about making something
that writes out to a CSV.
5:24
I'm actually going to do a new file for
this.
5:28
Just because I kind of want to keep
these a little bit separate, right?
5:32
So I'm gonna call this teachers.py,
and again I'm gonna import CSV.
5:36
And then I'm gonna do with open
teachers.csv with the a format as csvfile,
5:39
and I'm gonna say my field
names are first name
5:48
and last name and topic.
5:55
You can, of course,
put whatever fieldnames in here you want.
5:58
You can have hundreds of these,
that's completely up to you.
6:03
I'm going to call this 'Teach Writer'.
6:05
And we'll do csv.DictWriter and
we're going to write to our csv files.
6:08
And our field names
are the field names variable.
6:14
So then let's do teachwriter.
6:20
Writeheader and so that writes out
a row that has the header stuff in it.
6:23
And then we'll do teachwriter.writerow.
6:30
And we'll write first_name is Kenneth.
6:36
Last_name is Love and topic is Python.
6:43
And I'm gonna rack those up.
6:51
All right, cool.
6:55
I'm actually pretty good.
6:58
If I had a list of dictionaries that
already had all my things in it,
6:59
then I could pass those in to
the right rows method as well.
7:01
Or I could call right row multiple times.
7:04
So, like let's add in two rows here.
7:06
I'm gonna add in Alena Holligan,
and she teaches PHP.
7:10
All right, so now let's run this.
7:16
Python teachers.py.
7:19
And then if I was to cat teachers.csv,
7:23
then we've got our header row, and then we
have the row for me and the row for Alena.
7:31
So that's pretty cool, so
7:38
reading and writing CSVs is
fairly straightforward in Python.
7:39
The main thing you have to remember to
do is create these field names if you're
7:42
writing I recommend using
the DictReader and DictWriter.
7:46
They're a little bit more useful.
7:49