Today: int type, str type, accumulate pattern, int modulo, text files, standard output, print(), file-reading, crazycat example program

123 vs. '123'

What is the difference between these two? To a non-programmer, they look basically the same.

123 is an int number, and '123' is a string of length 3, made of 3 digit chars.

Every Value Has A "Type"

Python code works on values, and each value has a "type" which determines how it behaves. The name of the integer type is int and the string type is str. Most often, Python code will take actions that make intuitive sense, so it's natural to not think much about the internal structure. Nonetheless, here we'll look under the hood to see how Python really works, tracking values and types.

See the Variables chapter in the guide.

What are int Values For - Arithmetic

1 + 2 * 3 -> 7

What are str Values For - Text Manipulation

'hi' + 'there' -> 'hithere'

int, str, +

One obvious difference is that the + operator takes different actions based on the type of its data — addition for int, and concatenation for str.

>>> 1 + 2
3
>>>
>>> 'a' + 'b'
'ab'
>>>

The results here are not too surprising, but how does the + know what to do?

int and str Variables

Suppose we set up these three variables

>>> x = 3
>>> y = 'hi'
>>> z = '7'

Here is what memory looks like. Each variable points to its assigned value, as usual. In addition, each value in memory is tagged with its type - here int and str.

alt: a b c variables, each pointing to value+type

How + Operator Works - Type

Python uses the type of a value to guide operations on that value. Look at the + operator in the expressions below. At the moment the + runs, it follows the arrow to see the values to use. On each value, in particular, it can see the type. In this case, when it sees int on the left value, it does arithmetic and returns an int value. When it sees str, it does string concatenation and returns a str value.

For each variable, Python follows the arrow to get the value to use, and each value is tagged with its type. What is the result for the expressions like x + x below?

alt: hilight types on variables

>>> x = 3
>>> y = 'hi'
>>> z = '7'
>>>
>>> x + x
6
>>> y + y
'hihi'
>>> z + z
'77'
>>>

The + with int values does addition, but with str values it does string concatenation.

The type of '7' is str, so '7' + '7' is '77'
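
The type tag on each value can be inspected with Python's built-in type() function. Here is a minimal sketch reusing the x, y, z variables from this section. The comments show what each print() produces.

```python
x = 3
y = 'hi'
z = '7'

# type() reports the type tag stored with each value
print(type(x))  # <class 'int'>
print(type(y))  # <class 'str'>
print(type(z))  # <class 'str'>

# + picks its behavior from the types of the values it finds
print(x + x)    # 6   (int addition)
print(z + z)    # 77  (str concatenation of '7' and '7')
```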

(optional) Python Does Not Deduce Type from Variable Name

Normally we follow the convention that a variable named s points to a string. This is a good convention, allowing people reading the code to get the right impression of what the variable stores. We always follow this convention in our example code, so students naturally get the impression that it's some sort of rule, as if Python knows the value is a string because the variable name is s.

In fact, Python does not have a rule that a certain variable name must point to a certain type. To Python, the variable name is just a label of that variable used to identify it within the code. Python's attitude to the variable name is like: this is the name my human uses for this variable.

The type comes from the value at the end of the arrow, such as 7 (int) or 'Hello' (str).

(optional) Contrary Name Example

Just to be difficult, here we've chosen variable names that do not correspond to the types. What does Python do in this case?

>>> s = 7
>>> x = '9'
>>>
>>> s + s
14
>>> x + x
'99'

Type Conversions - int() str()

Conversion Functions

>>> str(25)  # Convert int -> str
'25'
>>>
>>> int('200')  # str -> int
200
>>> int(2.5)  # float -> int
2
>>>

Example 1 - Convert int to str, Concatenate

Suppose we have a number we want to combine with a string, say to print out a user's current score, like this:

score:13

Use the str(n) function to convert int to str, then concatenate to combine strings.

>>> # Concatenate str with int - error
>>> 'score:' + 13
TypeError: can only concatenate str (not "int") to str
>>>
>>> # Use str() to convert int -> str, then it works
>>> 'score:' + str(13)
'score:13'
>>>

Example 2 - extract2(s)

> extract2()

For this problem, given a string, extract a substring showing a number, convert it to an int, and return its square.

'magic:12' -> 144

Make a little drawing to think about how to write the slice. Recall that omitting the second number of the slice goes through the end of the string.

 'magic:12'
       ^
       colon

So extract like this...

 colon = s.find(':')
 s[colon + 1:] -> '12'

extract2(s) Solution

Extract the substring containing the number. Then convert the substring to an int number, so we can do arithmetic with it. Here each step is done on its own line, although in practice it might be written with fewer lines.

def extract2(s):
    colon = s.find(':')
    if colon == -1:
        return -1
    # Here putting each step on its own line
    num_str = s[colon + 1:]  # '12'
    num = int(num_str)       # 12
    return num ** 2
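
As a sketch of the "fewer lines" form mentioned above, the slice, conversion, and squaring can be combined into one expression. The name extract2_short is hypothetical, used here only to keep it distinct from extract2().

```python
def extract2_short(s):
    # Hypothetical compact variant of extract2(), same behavior
    colon = s.find(':')
    if colon == -1:
        return -1
    # Slice out the digits, convert to int, and square, all in one expression
    return int(s[colon + 1:]) ** 2


print(extract2_short('magic:12'))  # 144
print(extract2_short('no colon'))  # -1
```

The one-step-per-line version is easier to debug, since each intermediate value has a name you can print.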

Leveraging Patterns

Often when you confront a computer problem, you've seen something similar before. It's nice to lean on patterns like this, filling in some remembered structure quickly, and then focussing on what is specific to this problem.

Accumulate Pattern

Look at the double_char() function, and we see an "accumulate" code pattern which will solve a whole class of problems.

1. Before the start: result = empty

2. In the loop, some form of: result += xxx

3. At the end: return result

Recognizing this pattern gives you a head start solving similar problems.

e.g. Loop Counting

A common problem in computer code is counting the number of times something happens within a data set. This fits the accumulate pattern, using count = 0 before the loop and count += 1 in the loop. Recall that the line count += 1 will increase the int stored in the variable by 1.

count = 0
loop:
    if thing-to-count:
        count += 1
return count

Example count_e()

This string problem shows how to use += 1 to count the occurrences of something, in this case the number of 'e' in a string.

> count_e()

count_e() Solution

def count_e(s):
    count = 0
    for i in range(len(s)):
        if s[i] == 'e':
            count += 1
    return count

e.g. Loop Summing

Suppose we want to add up a bunch of numbers. We can use the accumulate pattern here too. Set total = 0 before the loop. Inside the loop, use total += next_number to add each number to the sum. When the loop is done, the total variable holds the answer.

total = 0
loop:
    total += next_number
return total

Aside: the variable name sum seems like a good choice for the variable above. However, there is a built-in Python function named sum(), and as a matter of style, we avoid giving a variable a name which is also the name of a function. That's why we use total here.
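
To illustrate the naming point, here is a small sketch with two hypothetical functions. Both compute the same answer, but the first hides the built-in sum() for the rest of its body, while the second leaves it available.

```python
def sum_shadowed(nums):
    sum = 0  # legal, but hides the built-in sum() inside this function
    for n in nums:
        sum += n
    return sum


def total_style(nums):
    total = 0  # preferred style: the built-in sum() stays usable
    for n in nums:
        total += n
    return total


print(sum_shadowed([1, 2, 3]))  # 6
print(total_style([1, 2, 3]))   # 6
print(sum([1, 2, 3]))           # 6, the built-in function
```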

Example shout_score()

> shout_score()

Say we want to rate an email by how long it is and how much shouting it contains before we read it - like scoring emails from your nutty relatives.

Example high-score email:

Hi Sarah, just relaxing in retirement.
I CAN'T BELIEVE WHAT YOUR MOM IS UP TO!!!!!!
WITH THAT NEW HAIRCUT!!!!!!!!!!!!
AND WHY IS THANKSGIVING SO EARLY THIS YEAR!!!!!

Scoring for each char:

lowercase char -> 1 point
uppercase char -> 2 points
 '!' char -> 10 points

Reminder, boolean string tests:

s.isalpha() s.isdigit() s.isspace() s.islower() s.isupper()

shout_score(s): Given a string s, we'll say the "shout" score is defined this way: each exclamation mark '!' is 10 points, each lowercase char is 1 point, and each uppercase char is 2 points. Return the total of all the points for the chars in s.

'Arg!!' -> 24 points
'A' -> 2
'r' -> 1
'g' -> 1
'!' -> 10
'!' -> 10

In the loop, use the sum pattern to compute the score for the string.

shout_score() Solution

def shout_score(s):
    score = 0
    for i in range(len(s)):
        if s[i] == '!':
            score += 10
        elif s[i].islower():
            score += 1
        elif s[i].isupper():
            score += 2
    return score

Here we use the if/elif structure, since our intention is to pick out 1 of N tests. As a practical matter, it also works as a series of plain if statements: since '!', lowercase, and uppercase chars are all mutually exclusive, only one if test will be true for each char.
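
For comparison, here is a sketch of the plain-if form. The name shout_score_ifs is hypothetical. It gives the same result because no char can pass more than one of the three tests.

```python
def shout_score_ifs(s):
    # Hypothetical variant of shout_score() using a series of plain if tests
    score = 0
    for i in range(len(s)):
        if s[i] == '!':
            score += 10
        if s[i].islower():
            score += 1
        if s[i].isupper():
            score += 2
    return score


print(shout_score_ifs('Arg!!'))  # 24
```

The if/elif form is still the better choice: it states the 1-of-N intent, and it skips the remaining tests once one has matched.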


Exercise sum_digits()

> sum_digits()

'12abc3' -> 6

Students try this one. It combines the accumulate pattern and str/int conversion. Reminder, boolean string test: s.isdigit()

sum_digits(s): Given a string s. Consider the digit chars in s. Return the arithmetic sum of all those digits, so for example, '12abc3' returns 6. Return 0 if s does not contain any digits.

sum_digits() Starter

Here are the rote parts of sum_digits() you can start with. Work out the code inside the loop.

def sum_digits(s):
    total = 0
    for i in range(len(s)):
        # use s[i]
        pass
    return total

sum_digits() Solution

def sum_digits(s):
    total = 0
    for i in range(len(s)):
        if s[i].isdigit():
            # str '7' -> int 7
            num = int(s[i])
            total += num
    return total


Int Division Operator //

(Not really doing this today, just mentioning)

The int division operator // rounds down to produce an int, so we use it when we need an int. This division discards any remainder, rounding the result down to the next lower integer.

>>> 5 / 2  # Problem: / produces float
2.5
>>>
>>> 5 // 2  # Solution: // rounds down to int
2
>>> 6 // 2
3
>>> 7 // 2
3
>>> 100 // 25
4
>>> 107 // 25
4
>>>

Modulo, Mod % Operator

Related to int division, we have the "modulo" operator % which is essentially the remainder after int division. It's usually called the "mod" operator for short. So for example (57 % 10) yields 7 — int divide 57 by 10 and 7 is the leftover remainder. The mod operator makes the most sense with positive integers, so avoid negative numbers or floats with modulo.

Say we have positive ints a and n, then a % n is the modulo, the remainder left after dividing a by n. Two facts about modulo:

1. a % n -> 0..n-1 inclusive
2. a % n -> 0 means divided evenly

Mod by 0 is an error, just like divide by 0
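
Int division and mod fit together: for positive ints, multiplying the // result by n and adding the % result gives back a. Here is a minimal sketch of that relationship, using a hypothetical helper check_mod().

```python
def check_mod(a, n):
    # For positive ints: a == (a // n) * n + (a % n)
    quotient = a // n
    remainder = a % n
    return quotient * n + remainder == a


print(57 // 10, 57 % 10)  # 5 7
print(check_mod(57, 10))  # True
print(60 % 10 == 0)       # True, 10 divides 60 evenly
```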

Mod Examples

>>> 31 % 10
1
>>> 56 % 10
6
>>> 60 % 10  # 0 result -> divides evenly
0
>>> 54 % 5
4
>>> 55 % 5
0
>>> 56 % 5
1
>>> 57 % 5
2
>>> 56 % 0
ZeroDivisionError: integer division or modulo by zero
>>>

Mod - Even vs. Odd

A simple use of mod is checking if an int is even or odd. Consider the result of n % 2. If the result is 0, then n is even, otherwise odd. It's common to use mod like this to, say, color every other row of a table in a green, white, green, white pattern. (See next example)

>>> 8 % 2
0
>>> 9 % 2
1
>>> 10 % 2
0
>>> 11 % 2
1
>>> 12 % 2
0
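
The table-row coloring idea can be sketched in a few lines. The function name row_color() and the color strings are made up for illustration.

```python
def row_color(i):
    # Even row numbers are green, odd row numbers are white
    if i % 2 == 0:
        return 'green'
    return 'white'


for i in range(4):
    print(i, row_color(i))
# 0 green
# 1 white
# 2 green
# 3 white
```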

Example crazy_str()

Produce that internet crazy capitalization like

tHeRe aRe nO MoRe bUgS

crazy_str(s): Given a string s, return a crazy looking version where the first char is lowercase, the second is uppercase, the third is lowercase, and so on. So 'Hello' returns 'hElLo'. Use the mod % operator to detect even/odd index numbers. For even indexes, convert the char to lowercase, for odd convert to uppercase.

'Hello' -> 'hElLo'
index i:  0      1      2      3      4
          even   odd    even   odd    even
          lower  upper  lower  upper  lower ...

> crazy_str()

crazy_str() Solution

def crazy_str(s):
    result = ''
    for i in range(len(s)):
        if i % 2 == 0:  # even
            result += s[i].lower()
        else:
            result += s[i].upper()
    return result

File Processing - crazycat example

Your code often wants to access masses of data stored in files.

Today we'll look at the crazycat example to demonstrate how to read data out of files, along with printing and standard output.

Foreshadow: Parts of the Computer

alt: computer is made of CPU, RAM, storage

We'll meet these later, but the CPU does the computation, RAM stores data when it's worked on, and storage holds files that store the data.

crazycat.zip

What Are Files?

alt: hibye.txt file

Text File

hibye.txt Text File Example

The file named "hibye.txt" is in the crazycat folder. The hibye.txt file has 2 lines of text, each with a '\n' marking its end (here the '\n' are shown in gray, but normally in an editor they are not shown on screen).

Hi and\n
bye\n

Here is what that file looks like in an editor that shows little gray marks for the space and \n (the show-invisibles mode in a word processor):

alt: hibye.txt chars, showing \n ending each line

Backslash Chars in a String

There are some commonly used codes in CS using backslash \ to include special chars in a string literal. Note: backslash is different from the regular slash / on the ? key.

\n newline char
\' single quote
\" double quote
\\ backslash char

# Write the word: isn't
s = 'isn\'t' # use \'
s = "isn't" # or use " outside

hibye.txt as a String

Using '\n' to write each newline char, we can write the contents of the file as a Python string - see how the newline chars end each line:

'Hi and\nbye\n'

(optional) How many chars? How many bytes?

How many chars are in that file (each \n is one char)?

There are 11 chars. Latin alphabet chars like A-Z take up 1 byte per char. Characters in other languages typically take 2 to 4 bytes per char in the common UTF-8 encoding. Use your operating system to get the information about the hibye.txt file. What size in bytes does your operating system report for this file?

So when you send a 50 char text message .. that's about 50 bytes sent on the network + some overhead. Text data like this uses very few bytes compared to sound or images or video.
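
The char and byte counts can be checked in Python. This is a sketch assuming the text is stored in the UTF-8 encoding with plain '\n' line endings. The accented letter, written with the escape '\u00e9', is just an illustrative example of a non-A-Z char.

```python
text = 'Hi and\nbye\n'
print(len(text))                  # 11 chars
print(len(text.encode('utf-8')))  # 11 bytes, 1 byte per char

accented = '\u00e9'               # the single char: e with an acute accent
print(len(accented))                  # 1 char
print(len(accented.encode('utf-8')))  # 2 bytes
```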

Aside: Detail About Line Endings

In the old days, there were two chars to mark the end of a line. The \r "carriage return" would move the typing head back to the left edge. Then the \n "new line" would advance the paper to the next line. So in old systems, e.g. DOS, the end of a line is marked by two chars next to each other: \r\n. On Windows, you will see text files with this convention to this day. Python largely insulates your code from this detail - the for line in f form shown below will go through the lines correctly, adjusting the line ending found in the file to look like the regular '\n' form.
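
Here is a small experiment showing that adjustment. It is a sketch only: the file name dos.txt and the temporary directory are made up for the demonstration. The file is written in binary mode so the \r\n bytes land in the file exactly as given, then read back with the regular text-mode loop.

```python
import os
import tempfile

# Write a small file with DOS-style \r\n line endings.
# Binary mode ('wb') writes the bytes exactly as given.
path = os.path.join(tempfile.mkdtemp(), 'dos.txt')
with open(path, 'wb') as f:
    f.write(b'Hi and\r\nbye\r\n')

# Read it back with the regular text-mode loop
lines = []
with open(path) as f:
    for line in f:
        lines.append(line)

print(lines)  # ['Hi and\n', 'bye\n']
```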


Before reading the file, we need a little more background.

Recall: Function Dataflow - Parameters and Return

Q: How does data flow in and out of the functions in your program?

A: Parameters and Return value

Parameters carry data from the caller code into a function when it is called. The return value of a function carries data back out to the caller.

This is the key data flow in your program. It is 100% the basis of the Doctests. It is also the basis of the old black-box picture of a function. This is still true, despite what we see in the next section.

alt: black-box function, params in, return value out

"Standard Output" Text Area / print()

In reality, there is an additional, parallel output area for a program, shared by all its functions.

There is a text area known as Standard Output associated with every run of a program. By default standard output is made of text, a series of text lines, just like a text file. The standard output area is an informal catch-all area — any function can append a line of text to standard out by calling the print() function, and conveniently that text will appear in the terminal window that started the python program. The standard output area works in other computer languages too, and each language has its own form of the print() function.

Here we see the print() output from calling the main() function in this example:

alt: print() function prints to standard output text area

print() function

See guide chapter: print()

>>> print(1, 2, 3)
1 2 3
>>> print('hello there')
hello there
>>> print('hello', 123, '!')
hello 123 !
>>> print()  # just the newline

>>>

Data out of function: return vs. print

Return and print() are both ways to get data out of a function, so they can be confused with each other. We will be careful when specifying a function to say that it should "return" a value (very common), or it should "print" something to standard output (rare). Return is the most common form, but several of today's examples below use print().
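
To make the contrast concrete, here is a sketch with two hypothetical functions that compute the same thing but deliver it differently.

```python
def double_return(n):
    # Hands the result back to the caller
    return n * 2


def double_print(n):
    # Writes the result to standard output; the function returns None
    print(n * 2)


a = double_return(5)  # a is 10, nothing is printed
b = double_print(5)   # prints 10, b is None
print(a, b)           # 10 None
```

The returned value can be stored and used in later computation. The printed text just goes to standard output, and the caller gets nothing back.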

Crazycat Program example

This example program is complete, showing some functions, Doctests, and file-reading.

crazycat.zip

1. Try "ls" and "cat" in terminal

See guide: Command line

See guide: File Read/Write

Open the crazycat project in PyCharm. The crazycat folder has some text files in it. Open a terminal in the crazycat directory (see the Command Line guide for more information on running in the terminal). These terminal commands work on both Mac and Windows. When you type a command in the terminal, you are typing the command directly to the operating system that runs your computer - Mac OS, or Windows, or Linux.

pwd - print out the location of the folder we are in

ls - see list of filenames ("dir" on older Windows)

cat filename - see file contents ("type" on older Windows)

$ ls
__pycache__	hibye.txt	quote2.txt
alice-book.txt	poem.txt	quote3.txt
crazycat.py	quote1.txt	quote4.txt
$ 
$ cat hibye.txt 
Hi and
bye
$
$ cat poem.txt 
Roses Are Red
Violets Are Blue
This Does Not Rhyme
$
$ cat quote1.txt 
Shut up, he explained.
 - Ring Lardner
$ 

2. Run crazycat.py with filename

$ python3 crazycat.py poem.txt 
Roses Are Red
Violets Are Blue
This Does Not Rhyme
$ python3 crazycat.py hibye.txt 
Hi and
bye
$

3. Standard File-Read Code v1

Say the variable filename holds the name of a file as a string, like 'poem.txt'. The file 'poem.txt' is out in the file system with lines of text in it. Here is the standard code to read through the lines of the file. (This is v1 of the code, and we'll improve it to v2 below.)

with open(filename) as f:
    for line in f:
        # use line (with \n)
        ...

1. The phrase - with open(filename) as f - opens a connection to that file and stores it in the variable f. Code that wants to read the data from the file works through the f variable, which is a sort of conduit to the file.

2. The phrase for line in f: accesses each line of the file, one line at a time, as detailed below.

File-Read Picture

This picture shows how the variables f and line loop through all the lines in the file.

alt:file read loop, gets one line at a time from file

Details: the chars for each line reside out in the file system, not in memory. The loop constructs a string in memory to hold the chars of each line on the fly. This can be done quickly and with only a small amount of memory, since it only needs to represent one line at a time.

There are other, less commonly used variations on the open function, and these are described in the guide. If the file read fails with a unicode error, the file may have an unexpected unicode encoding. The following variation lets you specify a specific encoding, so you can try to find an encoding that matches the file: open(filename, encoding='utf-8'). The encoding "utf-8" is one widely used encoding shown as an example.
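
Here is a sketch of the encoding parameter in action. The file name sample.txt and its contents are made up for the demonstration. The file is written as utf-8, then read once with the matching encoding and once with a mismatched one, which produces the unicode error.

```python
import os
import tempfile

# Create a small utf-8 file containing one non-ASCII char
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('caf\u00e9\n')

# Reading with the matching encoding works
with open(path, encoding='utf-8') as f:
    for line in f:
        text = line.strip()
print(text == 'caf\u00e9')  # True

# Reading with a mismatched encoding fails with a unicode error
try:
    with open(path, encoding='ascii') as f:
        for line in f:
            pass
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)  # True
```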

4. s.strip() Function

The newline character '\n' at the end of each line can be a nuisance. We can remove it with the s.strip() function which returns a version of the string with whitespace chars like space and newline removed from the beginning and end of a string. Here we use it as an easy way to get rid of the newline. This uses the x = change(x) pattern for modifying a string.

>>> line = ' hello there\n'  # with \n
>>> line
' hello there\n'
>>> line = line.strip()  # remove \n
>>> line
'hello there'
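
Note that s.strip() removes whitespace from both ends, including leading spaces, but leaves interior spaces alone. If the leading whitespace matters for some problem, the related s.rstrip('\n') call removes only trailing newline chars. A small sketch:

```python
line = '  hello there  \n'

# strip() removes whitespace from both ends, keeps the interior space
print(repr(line.strip()))       # 'hello there'

# rstrip('\n') removes only newline chars at the right end
print(repr(line.rstrip('\n')))  # '  hello there  '
```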

5. Standard File Read Code v2 - line.strip()

with open(filename) as f:
    for line in f:
        line = line.strip()
        # use line (no \n)

This file read loop has line = line.strip() added in the loop to eliminate the newline char from each line. For CS106A, this will be our standard way to loop over a file, so we never need to think about the '\n'.

alt:file read loop, gets one line at a time from file, here the '\n' newline is removed from each line

If some CS106A problem asks you to read all the lines of a file, you could paste in the above.

6. Look at print_file_plain() Code

Back to the crazycat example - look at the code.

This command line we saw earlier calls the print_file_plain() function below, passing in the string 'poem.txt' as the filename.

$ python3 crazycat.py poem.txt 
Roses Are Red
Violets Are Blue
This Does Not Rhyme

Here is the print_file_plain() function that implements the "cat" feature - printing out the contents of a file. You can see the code is simply the standard file-reading code, and then for each line, it simply prints the line to standard output.

def print_file_plain(filename):
    """
    Given a filename, read all its lines and print them out.
    This shows our standard file-reading loop.
    """
    with open(filename) as f:
        for line in f:
            line = line.strip()
            print(line)

7. Run With -crazy Command Line Option

The program looks for a '-crazy' option on the command line, and if present, calls the print_file_crazy() function. (You'll see how to code that yourself on a later homework.)
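
As a rough preview, here is one way such option handling could look. This is a sketch only, not the actual code in crazycat.py: the helper parse_args() is hypothetical, and the print() calls stand in for the real calls to print_file_plain() and print_file_crazy(). It relies on sys.argv, the standard list of command line words, where sys.argv[0] is the program name.

```python
import sys


def parse_args(args):
    """
    Hypothetical helper: given the command line words after the
    program name, return a (crazy, filename) pair.
    filename is None if the words are not in a recognized form.
    """
    if len(args) == 2 and args[0] == '-crazy':
        return True, args[1]
    if len(args) == 1:
        return False, args[0]
    return False, None


def main():
    # sys.argv[0] is the program name, so skip it
    crazy, filename = parse_args(sys.argv[1:])
    if filename is None:
        print('usage: python3 crazycat.py [-crazy] filename')
    elif crazy:
        print('would call print_file_crazy on', filename)
    else:
        print('would call print_file_plain on', filename)


if __name__ == '__main__':
    main()
```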

Here is the command line to run with the -crazy option:

$ python3 crazycat.py -crazy poem.txt 
rOsEs aRe rEd
vIoLeTs aRe bLuE
tHiS DoEs nOt rHyMe

What is the code in the print_file_crazy() function to produce that output?

8. Recall: crazy_str(s) Function

Recall the crazy_str(s) black-box function that takes in a string, and computes and returns a funny-capitalization version of it. This function is included in crazycat.py.

crazy_str('Hello') -> 'hElLo'

9. -crazy Code Plan

1. Read each line of text from the file with the standard loop.

2. For each line, call the crazy_str() function passing the line in as a parameter, getting back the crazy version of that line.

3. Print the crazy version of the line.

10. print_file_crazy() Code

The code is similar to print_file_plain() but passes each line through the crazy_str() function before printing. Think about the flow of data for each iteration of the loop - from the file, to the line variable, through crazy_str(), and printed to standard output.

def print_file_crazy(filename):
    """
    Given a filename, read all its lines and print them out
    in crazy form.
    """
    with open(filename) as f:
        for line in f:
            line = line.strip()
            line_crazy = crazy_str(line)
            print(line_crazy)

Experiments

1. Run on alice-book.txt - 3600 lines. The file for-loop rips through the data in a fraction of a second. You can get a feel for how some future research project of yours could use Python to tear through some giant text file of data.

$ python3 crazycat.py -crazy alice-book.txt

2. Shorten the print() in the loop to one line, as below. Describe the sequence of things that happens to each line:

 print(crazy_str(line))

This syntax is reminiscent of Math class, where we write f(g(x)) to take the output of the function g and feed it as input to the function f. To emphasize the theme, you could add in the function call to remove the newline, so that's 3 functions stacked into one expression, which is a bit excessive:

 print(crazy_str(line.strip()))

3. (very optional) Try removing the line = line.strip(). What happens to the output? What is happening: the line has a '\n' at its end. The print() function also adds a newline at the end of what it prints.
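
The doubled newline can be seen directly by capturing what print() writes. This is a sketch using the standard io and contextlib modules to capture standard output into a string. The sample line is taken from poem.txt.

```python
import io
from contextlib import redirect_stdout

line = 'Roses Are Red\n'  # a line as read from a file, ending in '\n'

captured = io.StringIO()
with redirect_stdout(captured):
    print(line)  # print() adds its own '\n' after the line's '\n'

print(repr(captured.getvalue()))  # 'Roses Are Red\n\n'
```

The two newline chars in a row are what show up as a blank line after each line of output.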

Optional > Trick

Try running this way, works on all operating systems:

$ python3 crazycat.py -crazy alice-book.txt > capture.txt

What does this do? Instead of printing to the terminal, it captures standard output to a file "capture.txt". You can look at the contents of capture.txt from within PyCharm. Or use "ls" and "cat" to look at the new file. This is a super handy way to use your programs. You run the program, experimenting and seeing the output interactively in the terminal. When you have a form you like, use > once to capture the output. Like the pros do it!
