Python Unit 2
Control flow
The Python interpreter executes the statements in a Python script one by one, and quits after the last statement.
a = 4
b = 9
c = 5
mean = (a + b + c) / 3
print(mean)
Evidently, this basic flow of control only allows you to write the simplest programs. Therefore, Python supports compound statements that alter the control flow.
You may run all of the following examples in interactive mode. However, if you are writing a compound statement, entering several indented lines may be tedious. Thus, you might prefer to create one Python script per example (such as func_test.py) and then run this script (e.g., python func_test.py).
Functions
Call a function by writing
- function name
- left parenthesis
- arguments separated by commas
- right parenthesis
We already encountered print(), a so-called built-in function that prints its argument(s) to the standard output (stdout).
print("Hello world!")
Define a function by writing
- keyword def
- function name
- left parenthesis
- parameters separated by commas
- right parenthesis
- colon
- indented statements (suite)
The following function calculates the mean of three numbers:
def mean_of_three(a, b, c):
    result = (a + b + c) / 3
    return result

m = mean_of_three(5, 8, 2)
print(m)
A function may pass values to the caller by using the return statement. In the example above, Python assigns the return value to the variable m.
- A statement such as return value1, value2, value3 tells a function to return several values.
- A function that lacks a return statement implicitly returns None.
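As a brief illustration of both points, the following sketch uses two hypothetical helper functions, min_and_max() and greet(), which are not part of this unit. Several return values arrive at the caller as a tuple that may be unpacked into separate variables, whereas a function without a return statement yields None.

```python
def min_and_max(numbers):
    # Return two values at once; Python packs them into a tuple.
    return min(numbers), max(numbers)


def greet(name):
    # No return statement: the function implicitly returns None.
    print("Hello", name)


# Unpack the returned tuple into two variables.
smallest, largest = min_and_max([5, 8, 2])
print(smallest, largest)
#> 2 8

# greet() prints its message, but the value it returns is None.
result = greet("world")
print(result)
#> None
```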
In order to correctly use functions, we need to understand the difference between parameters and arguments:
- A parameter (also called “formal parameter”) is part of the function definition and specifies the number (and sometimes also the type) of input values the function may receive.
- An argument (also called “actual parameter”) is the value that is passed to a function when it is called.
When defining a function, we need to distinguish between required and optional parameters:
- A required parameter must receive a value when the function is called.
- An optional parameter may be omitted when the function is called. In this case, a default value that was specified in the function definition is assigned to the parameter.
Below, we define an exponential function. The parameter exponent is required; by contrast, base is optional, with a default value of 2.71828.
def exp(exponent, base=2.71828):
    return base ** exponent
print(exp(4)) # equal to exp(4, 2.71828)
print(exp(4, 10))
When calling a function, we need to distinguish between positional and keyword arguments:
- A positional argument is assigned to a parameter based on its position.
- A keyword argument is assigned to a parameter based on its key, i.e., the name of the parameter.
print(exp(4, 10)) # 4 -> exponent, 10 -> base
print(exp(base=10, exponent=4)) # 10 -> base, 4 -> exponent
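Both kinds of arguments may also be combined in a single call. The sketch below repeats the definition of exp() so that it runs on its own; it shows one valid combination and one typical mistake, namely supplying a value for the same parameter twice.

```python
def exp(exponent, base=2.71828):
    return base ** exponent


# Positional arguments come first, keyword arguments follow.
print(exp(4, base=10))
#> 10000

# Here, 4 is already assigned to exponent by position, so passing
# exponent again as a keyword argument raises a TypeError.
try:
    exp(4, exponent=2)
except TypeError as error:
    print("TypeError:", error)
```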
If statements
if statements will execute a block of code if and only if a condition is true:
n = -1

if n < 0:
    print(n, "is negative")
Create an if statement by writing
- keyword if
- condition (an expression that yields a Boolean value)
- colon
- indented statements (suite)
Python executes the code block following the if statement if the condition is true. If Python should also do something if the condition is false, you must add an else block:
n = 5

if n < 0:
    print(n, "is negative")
else:
    print(n, "is positive")
elif blocks allow you to chain if statements, i.e., to test an arbitrary number of conditions and execute a distinct set of statements for each condition. Thus, our final if statement, which also checks whether a provided number is zero, is:
if n < 0:
    print(n, "is negative")
elif n > 0:
    print(n, "is positive")
else:
    print(n, "is zero")
Let’s put the if statement which we developed above into a function. This will allow us to test several numbers for their sign. We will replace the print() calls with return statements, since our function should not print its result, but rather return it.
The function will format the result by means of an f-string (“formatted string literal”). An f-string is written as f" ... ". It behaves like a normal string, but each part enclosed in curly braces {} is evaluated as a Python expression whose result is inserted into the string.
f"An addition: {2+3}"
#> 'An addition: 5'
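The braces may contain any expression, including variables and function calls. The following sketch uses two illustrative variables (name and mass, which do not appear elsewhere in this unit) and also shows an optional format specifier, which is written after a colon inside the braces.

```python
name = "glycine"
mass = 75.0666  # illustrative value

# Variables and function calls are evaluated inside the braces.
print(f"{name} has {len(name)} letters")
#> glycine has 7 letters

# The format specifier .2f rounds a number to two decimal places.
print(f"Molar mass: {mass:.2f} g/mol")
#> Molar mass: 75.07 g/mol
```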
Store the following code in test_sign.py and execute the file with Python:
def test_sign(n):
    if n < 0:
        return f"{n} is negative"
    elif n > 0:
        return f"{n} is positive"
    else:
        return f"{n} is zero"

print(test_sign(102))
print(test_sign(-12))
print(test_sign(0))
Loops
while loops
A while loop executes its suite as long as the given condition is true. Create such a loop by writing
- keyword while
- condition (an expression that yields a Boolean value)
- colon
- indented statements (suite, loop body)
Store the following code in calc_powers.py and execute the file with Python:
i = 1

while i < 10000:
    i = i * 2
    print(i)
Typically, the condition will become false after several iterations of the loop body. Above, the variable i receives a new value whenever the loop body is run.
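To see when exactly the loop stops, the following variant of the loop counts its iterations instead of printing every value. The variable iterations is an addition made for this sketch and is not required in calc_powers.py.

```python
i = 1
iterations = 0

while i < 10000:
    i = i * 2
    iterations = iterations + 1

# The condition is checked before each run of the loop body,
# so the final value of i may exceed the limit of 10000.
print(i, iterations)
#> 16384 14
```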
for loops
In Python, a for loop is a so-called iterative loop, since it executes the same code block once for each value in a sequence and may access the current element within the loop body.
Create a for loop by writing
- keyword for
- loop variable
- keyword in
- iterable object (e.g., string or list)
- colon
- indented statements (suite, loop body)
for letter in "Python":
    print(letter)
The loop above is executed six times, and during each run, the loop variable letter contains a different character of the string "Python" (i.e., the values "P", "y", "t", "h", "o", and "n").
A common theme is to iterate over a list and store some information on each list element for later. Below, we initialize a counter variable with zero. We then iterate over a list of numbers and add each element to the counter variable. After the loop has finished, this variable therefore contains the sum of all numbers in the list.
counter = 0

for n in [1, 5, 13, 9, 4]:
    counter = counter + n

print(counter)
#> 32
Sometimes we wish to iterate over a long sequence of numbers. It would be cumbersome if we had to explicitly write a list containing the numbers from, say, 0 to 10000; there must be a simpler approach, right? Indeed, there is the built-in range() function, which generates a sequence of numbers.
Depending on the number of arguments passed to range(), it will return
- a sequence of integers from 0 to a final value (excluded)
for i in range(10):
    print(i)
#> 0, 1, …, 9
- a sequence of integers from an initial value (included) to a final value (excluded)
for i in range(2, 7):
    print(i)
#> 2, 3, 4, 5, 6
- a sequence of integers as above, using the third argument as step size:
for i in range(20, 3, -3):
    print(i)
#> 20, 17, 14, 11, 8, 5
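A quick way to inspect the numbers generated by range() is to convert them into a list. The sketch below does this for the three call variants and then uses range() to sum the integers from 0 to 10000 with the counter pattern shown earlier; the variable name total is chosen for illustration only.

```python
# list() collects the numbers generated by range().
print(list(range(5)))
#> [0, 1, 2, 3, 4]
print(list(range(2, 7)))
#> [2, 3, 4, 5, 6]
print(list(range(20, 3, -3)))
#> [20, 17, 14, 11, 8, 5]

# Sum the integers from 0 to 10000. Since the final value is
# excluded, we have to pass 10001 to include 10000.
total = 0
for i in range(10001):
    total = total + i
print(total)
#> 50005000
```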
Modularization
Modular programming is a technique that allows you to split code into several files, each of which contains related functions and data types. Thereby, code may easily be reused, since a function
- has to be implemented only once in a module (several modules may be bundled into a package)
- and then may be used by every program that imports this module.
For instance, the math module defines common mathematical functions. (math is part of the Python standard library, whose modules are available on each Python installation. Other modules must be installed by the user.)
The keyword import imports a module and creates a namespace with the same name, which allows us to call a function from the module as follows:
- namespace
- dot (.)
- function name
import math
math.factorial(7)
We also may select a different name for the namespace upon importing:
import math as mathematics
mathematics.factorial(7)
Moreover, individual (or all) functions of a module may be imported into the global namespace:
from math import factorial
factorial(7)
To increase readability of your code, you should place all import statements at the top of your Python script.
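The three import variants can be compared side by side in a single script. In the sketch below, the alias m is an arbitrary name chosen for illustration; all three variants give access to the same function from the math module.

```python
import math
import math as m
from math import sqrt

# Each call below uses the same function from the math module.
print(math.sqrt(16))
#> 4.0
print(m.sqrt(16))
#> 4.0
print(sqrt(16))
#> 4.0

print(math.factorial(7))
#> 5040
```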
Data structures
Lists
Lists belong to the class of compound data types, which serve as a “collection” of other data types. Lists are created by enclosing comma-separated elements in brackets. For instance, the following list comprises the first eight prime numbers:
primes = [2, 3, 5, 7, 11, 13, 17, 19]
primes
Indexing and slicing work as explained for strings, returning a single element or a new list containing the sliced elements, respectively:
primes[3]
primes[2:5]
Since lists are mutable, you may change their elements:
primes[3] = 100
primes
The built-in function len() returns the number of elements in a list:
len(primes)
Importantly, you may iterate over a list via a for loop:
for p in primes:
    print(p)
The elements of a list may be lists themselves, which yields nested lists:
nested_list = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9]
]
nested_list
Indexing and slicing also work for nested lists:
nested_list[2]
nested_list[2][1]
nested_list[2][1:3]
Since lists are objects, they may be manipulated by special functions called methods. Methods are called like functions, but refer to a given object. Thus, the syntax for calling a method is as follows:
- name of the object
- dot (.)
- name of the method
- left parenthesis etc. (like a function call)
Here are some of the methods supported by lists (check the contents of bases after every statement!):
bases = ["A", "G", "X", "Y"]
bases.append("Z")         # append an element to the list
del bases[2]              # delete element with index 2
l = bases.pop()           # remove the last element and return it
bases.extend(["T", "U"])  # appends elements from another list
bases.insert(1, "C")      # insert the element given by the second argument
                          # at the index given by the first argument
bases.reverse()           # reverse order of elements
bases.clear()             # delete all elements
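If you would like to compare your observations, the sketch below runs the same statements and prints the list after each of them. The only deviation from the code above is that the popped element is stored in a variable called last.

```python
bases = ["A", "G", "X", "Y"]

bases.append("Z")
print(bases)
#> ['A', 'G', 'X', 'Y', 'Z']

del bases[2]
print(bases)
#> ['A', 'G', 'Y', 'Z']

last = bases.pop()
print(last, bases)
#> Z ['A', 'G', 'Y']

bases.extend(["T", "U"])
print(bases)
#> ['A', 'G', 'Y', 'T', 'U']

bases.insert(1, "C")
print(bases)
#> ['A', 'C', 'G', 'Y', 'T', 'U']

bases.reverse()
print(bases)
#> ['U', 'T', 'Y', 'G', 'C', 'A']

bases.clear()
print(bases)
#> []
```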
Tuples
Tuples are similar to lists, since they also represent a sequence of items. In contrast to lists, however, tuples are immutable – once you have created a tuple, you can’t change it. Tuples support common list operations:
bases = ("A", "C", "C", "G", "T")
"A" in bases      # check whether the tuple contains an element
bases[1:3]        # slicing
len(bases)        # number of elements
min(bases)        # smallest element
max(bases)        # largest element
bases.count("C")  # count the number of a given element
bases.index("G")  # find the index of a given element
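The immutability of tuples can be observed directly. The following sketch tries to replace an element, which fails, and then builds a new tuple from a slice instead; the name rna_bases is chosen for illustration only.

```python
bases = ("A", "C", "C", "G", "T")

# Tuples do not support item assignment.
try:
    bases[4] = "U"
except TypeError as error:
    print("TypeError:", error)

# Instead, create a new tuple; the original one stays unchanged.
# Note the trailing comma, which marks ("U",) as a one-element tuple.
rna_bases = bases[:4] + ("U",)
print(rna_bases)
#> ('A', 'C', 'C', 'G', 'U')
print(bases)
#> ('A', 'C', 'C', 'G', 'T')
```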
Sets
The set is a collection data type that represents a mathematical set; thus, sets are unordered and contain each element only once. Both mutable and immutable variants are available (set and frozenset, respectively). We create a set by placing its elements between curly braces, or by converting another collection such as a list:
A = {2, 4, 5}        # variant 1: direct
A1 = set([2, 4, 5])  # variant 2: converting a list

A == A1              # are these sets equivalent?
Besides adding and removing elements, you may also perform the classical set operations:
B = {1, 2, 3, 4, 5}
B.discard(5)  # remove element if it exists
B.add(7)      # add element
B.add(7)      # set does not change, since each element must be unique!

A | B  # union (OR)
A & B  # intersection (AND)
A - B  # difference
B - A  # note that the difference is not commutative
A ^ B  # symmetric difference (XOR)
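Two consequences of these properties are sketched below, using made-up example values: converting a list into a set discards duplicate elements, and a frozenset supports the set operations but offers no methods that would modify it.

```python
# Converting a list into a set removes duplicates.
letters = set(["A", "C", "C", "G", "A"])
print(len(letters))
#> 3

# A frozenset supports the classical set operations ...
frozen = frozenset([2, 4, 5])
print(frozen & {4, 5, 6} == {4, 5})
#> True

# ... but lacks modifying methods such as add().
try:
    frozen.add(7)
except AttributeError as error:
    print("AttributeError:", error)
```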
Dictionaries
Dictionaries (“dicts”) are collections of key-value pairs, where keys have to be unique. (Since Python 3.7, dicts preserve the order in which the pairs were inserted.) Create a dict
- either by separating each key from its value with a colon and enclosing the comma-separated pairs in curly braces,
atom_names = {"C": "carbon", "H": "hydrogen", "N": "nitrogen"}
atom_names
- or by using the built-in function dict(), to which we supply the key-value pairs as keyword arguments:
atom_names_2 = dict(C="carbon", H="hydrogen", N="nitrogen")
atom_names_2
atom_names_2 == atom_names
Typical dict operations include inserting, deleting, and, most importantly, searching:
atom_names["C"]             # find the value associated with key "C"
atom_names["O"] = "oxygen"  # insert a new key-value pair
del atom_names["H"]         # delete the key "H" and its associated value
You may iterate over all key-value pairs in a dict via the .items() method and a for loop:
for key, value in atom_names.items():
    print(key, value)
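Searching for a key that is not stored in the dict requires some care. The sketch below shows three ways of handling the missing key "S": catching the resulting KeyError, testing for the key with the in operator, and using the dict method get(), which returns a fallback value (here the arbitrary string "unknown").

```python
atom_names = {"C": "carbon", "H": "hydrogen", "N": "nitrogen"}

# Looking up a missing key with brackets raises a KeyError.
try:
    atom_names["S"]
except KeyError as error:
    print("KeyError:", error)

# The in operator checks whether a key exists.
print("S" in atom_names)
#> False

# get() returns the second argument if the key is missing.
print(atom_names.get("S", "unknown"))
#> unknown
print(atom_names.get("C", "unknown"))
#> carbon
```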
Exercises
Store all files that you generate for unit 2 in the folder python2 in your home directory.
Exercise 2.1 (3 P)
You have successfully solved this exercise as soon as you have worked through this unit. In particular, the folder python2 must contain the following files, which you have created in the course of this unit:
test_sign.py
calc_powers.py
Exercise 2.2 (3 P)
Implement a function get_charge that checks whether an amino acid is positively charged (e.g., arginine), negatively charged (e.g., aspartate), or neutral (e.g., valine).
- The function should be called with a single argument that specifies the amino acid using its single-letter abbreviation. Only the 21 eukaryotic proteinogenic amino acids should be considered.
- Depending on the charge, the function should return one of the strings "positive", "negative", or "neutral".
- If the user does not supply a valid abbreviation, the function should return "invalid input".
Store the function in exercise_2_2.py.
You may check that your program works correctly by using the following exemplary calls:
print(get_charge("D"))
#> "negative"
print(get_charge("F"))
#> "neutral"
print(get_charge("H"))
#> "positive"
print(get_charge("foo"))
#> "invalid input"
Exercise 2.3 (3 P)
Implement a function get_average_gc that calculates the average GC-content of several DNA or RNA sequences.
- The function should be called with a single argument. It will receive a list containing an arbitrary number of strings, each of which represents a nucleotide sequence.
- The function should return a single number, i.e., the average GC-content across all sequences.
Store the function in exercise_2_3.py.
You may check that your program works correctly by using the following exemplary calls:
print(get_average_gc(["CGACCACATTAATGGACTAC", "CTTGGATTATCACCCCCGTC"]))
#> 0.5
print(get_average_gc(["CCACCGAGACGCCAG", "CGGCCTCCCGAC", "GCCCGGGCCCGCGGGGGT"]))
#> 0.837037037037037
- Use a for loop to iterate over the strings in the list.
- You will need a counter variable.
- s.count("X") counts the number of times the character "X" appears in the string stored in s.
- The len() function also works on strings.
Exercise 2.4 (3 P)
If you code a lot, you will often use third-party packages and consult their documentation. This is what we will do in this exercise: Implement a function make_sequence that creates a random DNA or RNA sequence of a given length and calculates how often a given base appears in this sequence.
- The first (required) parameter sets the sequence length.
- The second (optional) parameter rna decides whether an RNA or DNA sequence should be created. This parameter has a Boolean type and is False by default.
- The third (optional) parameter count_base names the base that should be counted (default: "A").
- The function should return two values: (1) A string representing the created sequence, and (2) how often the specified base appears.
Since the function should create the sequence randomly, you might consider using the function choices() available in the random module. This module is part of the standard library; please read the documentation. You will also need the string method join() to convert the value returned by choices() into a string. Since strings are a built-in data type, their methods are also described in the standard library documentation.
Store the function in exercise_2_4.py.
You may check that your program works correctly by using the following exemplary calls:
random.seed(42)
print(make_sequence(20, False, "A"))
#> ('GACAGGTACAAGAAGGAGTA', 9)
print(make_sequence(15, rna=True, count_base="C"))
#> ('UGCAUCAAUGUGGUC', 3)
- Exactly follow the specification of function parameters and return values:
  - Should a parameter have a certain name?
  - If a parameter is optional, what is its default value?
  - How many values should the function return?
- Remember how to import a function from a module.
Exercise 2.5 (3 P)
In this exercise, you will create a command line program that calculates the similarity of two tissue samples by means of their expressed genes.
- The program should process two command line arguments genes1 and genes2.
- Each argument specifies the genes that are expressed in the respective sample. Individual genes should be separated by semicolons (e.g., GENE1;GENE2;GENE3;GENE4). It does not matter whether a gene appears more than once.
- The program should print three values, each on a separate line:
  - the number of (unique) genes detected in the first sample
  - the number of (unique) genes detected in the second sample
  - the similarity of the samples, which is measured by the Jaccard index (the number of genes found in both samples divided by the number of genes found in at least one sample).
Store the program in exercise_2_5.py.
You may check that your program works correctly by using the following exemplary calls:
$ python exercise_2_5.py "RAB5C;PITX2;ZNF222;LMTK2;LMTK2" "RAB5C;PABPC4;ZNF222;PKN1;PTMA"
4
5
0.2857142857142857
$ python exercise_2_5.py "THOC5;RAD23B;GPR31;PIRC85;PANO1" "THOC5;ATP1A2;GPR31;THOC5"
5
3
0.3333333333333333
$ python exercise_2_5.py "RAD23B" "RAD23B"
1
1
1.0
- Your program has to process command line arguments.
- To split each argument into individual genes, the string method split() may be useful (see the documentation).
- Consult the sets section on set creation from lists and calculation of union and intersection.