Python Unit 2

Control flow

The Python interpreter executes the statements in a Python script one by one, and quits after the last statement.

a = 4
b = 9
c = 5
mean = (a + b + c) / 3
print(mean)

Evidently, this basic flow of control only allows you to write the most simple programs. Therefore, Python supports compound statements that alter the control flow.

Note

You may run all of the following examples in interactive mode. However, if you are writing a compound statement, entering several indented lines may be tedious. Thus, you might prefer to create one Python script per example (such as func_test.py) and then run this script (e.g., python func_test.py).

Functions

Call a function by writing

  • function name
  • left parenthesis
  • arguments separated by commas
  • right parenthesis

We already encountered print(), a so-called built-in function that prints its argument(s) to the standard output (stdout).

print("Hello world!")

Define a function by writing

  • keyword def
  • function name
  • left parenthesis
  • parameters separated by commas
  • right parenthesis
  • colon
  • indented statements (suite)

The following function calculates the mean of three numbers:

def mean_of_three(a, b, c):
    result = (a + b + c) / 3
    return result

m = mean_of_three(5, 8, 2)
print(m)

A function may pass values to the caller by using the return statement. In the example above, Python assigns the return value to the variable m.

  • A statement such as return value1, value2, value3 tells a function to return several values.
  • A function that lacks a return statement implicitly returns None.

In order to correctly use functions, we need to understand the difference between parameters and arguments:

  • A parameter (also called “formal parameter”) is part of the function definition and specifies the number (and sometimes also the type) of input values the function may receive.
  • An argument (also called “actual parameter”) is the value that is passed to a function when it is called.

When defining a function, we need to distinguish between required and optional parameters:

  • A required parameter must receive a value when the function is called.
  • An optional parameter may be omitted when the function is called. In this case, a default value that was specified in the function definition is assigned to the parameter.

Below, we define an exponential function. The parameter exponent is required; by contrast, base is optional, with a default value of 2.71828.

def exp(exponent, base=2.71828):
    return base ** exponent

print(exp(4))  # equal to exp(4, 2.71828)
print(exp(4, 10))

When calling a function, we need to distinguish between positional and keyword arguments:

  • A positional argument is assigned to a parameter based on its position.
  • A keyword argument is assigned to a parameter based on its key.
print(exp(4, 10))                # 4 -> exponent, 10 -> base
print(exp(base=10, exponent=4))  # 10 -> base, 4 -> exponent

If statements

if statements will execute a block of code if and only if a condition is true:

n = -1

if n < 0:
    print(n, "is negative")

Create an if statement by writing

  • keyword if
  • condition (an expression that yields a Boolean value)
  • colon
  • indented statements (suite)

Python executes the code block following the if statement if the condition is true. If Python should also do something if the condition is false, you must add an else block:

n = 5

if n < 0:
    print(n, "is negative")
else:
    print(n, "is positive")

elif blocks allow you to chain if statements, i.e., to test an arbitrary number of conditions and execute a distinct set of statements for each condition. Thus, our final if statement, which also checks whether a provided number is zero, is:

if n < 0:
    print(n, "is negative")
elif n > 0:
    print(n, "is positive")
else:
    print(n, "is zero")
Example

Let’s put the if statement which we developed above into a function. This will allow us to test several numbers for their sign. We will replace the print() statements with return, since our function should not print its result, but rather return it.

The function will format the result by means of an f-string (“formatted string literal”). An f-string is written as f" ... ". It behaves like a normal string, but each part enclosed in curly braces {} is interpreted as Python statement whose result is inserted into the string.

f"An addition: {2+3}"
#> 'An addition: 5' 

Store the following code in test_sign.py and execute the file with Python:

def test_sign(n):
    if n < 0:
        return f"{n} is negative"
    elif n > 0:
        return f"{n} is positive"
    else:
        return f"{n} is zero"

print(test_sign(102))
print(test_sign(-12))
print(test_sign(0))

Loops

while loops

A while loop executes its suite as long as the given condition is true. Create such a loop by writing

  • keyword while
  • condition (an expression that yields a Boolean value)
  • colon
  • indented statements (suite, loop body)

Store the following code in calc_powers.py and execute the file with Python:

i = 1

while i < 10000:
    i = i * 2
    print(i)

Typically, the condition will become false after several iterations of the loop body. Above, the variable i receives a new value whenever the loop body is run.

for loops

In Python, a for loop is a so-called iterative loop, since it executes the same code block once for each value in a sequence and may access the current element within the loop body.

Create a for loop by writing

  • keyword for
  • loop variable
  • keyword in
  • iterable object (e.g., string or list)
  • colon
  • indented statements (suite, loop body)
for letter in "Python":
    print(letter)

The loop above is executed six times, and during each run, the loop variable letter contains a different character of the string "Python" (i.e., the values "P", "y", "t", "h", "o", and "n").

Example

A common theme is to iterate over a list and store some information on each list element for later. Below, we initialize a counter variable with zero. We then iterate over a list of numbers and add each element to the counter variable. After the loop has finished, this variable therefore contains the sum of all numbers in the list.

counter = 0

for n in [1, 5, 13, 9, 4]:
    counter = counter + n

print(counter)
#> 32

Sometimes we wish to iterate over a long sequence of numbers. It would be cumbersome if we had to explicitly write a list containing the numbers from, say, 0 to 10000 – there must be a simpler approach, right? Indeed, there is the built-in range() function, which generates a sequence of numbers.

Depending on the numbers of arguments passed to range(), it will return

  • a sequence of integers from 0 to a final value (excluded)

    for i in range(10):
        print(i)
    #> 0, 1, …, 9
  • a sequence of integers from an initial value (included) to a final value (excluded)

    for i in range(2, 7):
        print(i)
    #> 2, 3, 4, 5, 6
  • a sequence of integers as above, using the third argument as step size:

    for i in range(20, 3, -3):
        print(i)
    #> 20, 17, 14, 11, 8, 5

Modularization

Modular programming is a technique that allows to split code into several files, each of which contains related functions and data types. Thereby, code may easily be reused, since a function

  • has to be implemented only once in a module (also called package)
  • and then may be used by every program that imports this module.

For instance, the math module defines common mathematical functions. (math is part of the Python standard library, whose modules are available on each Python installation. Other modules must be installed by the user.)

The keyword import imports a module and creates a namespace with the same name, which allows us to call a function from the module as follows:

  • namespace
  • .
  • function name
import math

math.factorial(7)

We also may select a different name for the namespace upon importing:

import math as mathematics

mathematics.factorial(7)

Moreover, individual (or all) functions of a module may be imported into the global namespace:

from math import factorial

factorial(7)

To increase readability of your code, you should place all import statements at the top of your Python script.

Data structures

Lists

Lists belong to the class of compound data types, which serve as a “collection” of other data types. Lists are created by enclosing comma-separated elements in brackets. For instance, the following list comprises the first eight prime numbers:

primes = [2, 3, 5, 7, 11, 13, 17, 19]
primes

Indexing and slicing work as explained for strings, returning a single element or a new list containing the sliced elements, respectively:

primes[3]
primes[2:5]

Since lists are mutable, you may change their elements:

primes[3] = 100
primes

The built-in function len() returns the number of elements in a list:

len(primes)

Importantly, you may iterate over a list via a for loop:

for p in primes:
    print(p)

The elements of a list may be lists themselves, which yields nested lists:

nested_list = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9]
]
nested_list

Indexing and slicing also work for nested lists:

nested_list[2]
nested_list[2][1]
nested_list[2][1:3]

Since lists are objects, they may be manipulated by special functions called methods. Methods are called like functions, but refer to a given object. Thus, the syntax for calling a method is as follows:

  • name of the object
  • .
  • name of the method
  • left parenthesis etc. (like a function call)

Here are some of the methods supported by lists (check the contents of bases after every statement!):

bases = ["A", "G", "X", "Y"]
bases.append("Z")         # append an element to the list
del bases[2]              # delete element with index 2
l = bases.pop()           # remove the last element and return it
bases.extend(["T", "U"])  # appends elements from another list
bases.insert(1, "C")      # insert the element given by the second argument
                          # at the index given by the first argument
bases.reverse()           # reverse order of elements
bases.clear()             # delete all elements

Tuples

Tuples are similar to lists, since they also represent a sequence of items. In contrast to lists, however, tuples are immutable – once you have created a tuple, you can’t change it. Tuples support common list operations:

bases = ("A", "C", "C", "G", "T")
"A" in bases      # check whether the tuple contains an element
bases[1:3]        # slicing
len(bases)        # number of elements
min(bases)        # smallest element
max(bases)        # largest element
bases.count("C")  # count the number of a given element
bases.index("G")  # find the index of a given element

Sets

The set is a sequential data type that represents a mathematical set – thus, sets are unordered. Both mutable and immutable variants are available (set and frozenset, respectively). We create a set by placing their elements between curly braces, or by converting from another sequential type:

A = {2, 4, 5}        # variant 1: direct
A1 = set([2, 4, 5])  # variant 2: converting a list

A == A1              # are these sets equivalent?

Besides adding and removing elements, you may also perform the classical set operations:

B = {1, 2, 3, 4, 5}
B.discard(5)  # remove element if it exists
B.add(7)      # add element
B.add(7)      # set does not change – each element must be unique!

A | B  # union (OR)
A & B  # intersection (AND)
A - B  # difference
B - A  # note that the difference is not commutative
A ^ B  # symmetric difference (XOR)

Dictionaries

Dictionaries (“dicts”) are unordered collections of key-value pairs, where keys have to be unique. Create a dict

  • either by separating key-value pairs by colons and enclosing comma-separated pairs with curly braces,

    atom_names = {"C": "carbon", "H": "hydrogen", "N": "nitrogen"}
    atom_names
  • or by using the built-in function dict(), to which we supply the key-value-pairs as keyword arguments:

    atom_names_2 = dict(C="carbon", H="hydrogen", N="nitrogen")
    atom_names_2
    atom_names_2 == atom_names

Typical dict operations include inserting, deleting, and – most importantly – searching:

atom_names["C"]             # find the value associated with key "S"
atom_names["O"] = "oxygen"  # insert a new key-value pair
del atom_names["H"]         # delete the value associated with "H"

You may iterate over all key-value pairs in a dict via the .items() method and a for loop:

for key, value in atom_names.items():
    print(key, value)

Exercises

Store all files that you generate for unit 2 in the folder python2 in your home directory.

Exercise 2.1 (3 P)

You have successfully solved this exercise as soon as you have worked through this unit. In particular, the folder python2 must contain the following files, which you have created in the course of this unit:

  • test_sign.py
  • calc_powers.py

Exercise 2.2 (3 P)

Implement a function get_charge that checks whether an amino acid is positively charged (e.g., arginine), negatively charged (e.g., aspartate), or neutral (e.g., valine).

  • The function should be called with a single argument that specifies the amino acid using its single-letter abbreviation. Only the 21 eukaryotic proteinogenic amino acids should be considered.
  • Depending on the charge, the function should return one of the strings "positive", "negative", or "neutral".
  • If the user does not supply a valid abbreviation, the function should return "invalid input".

Store the function in exercise_2_2.py.

You may check that your program works correctly by using the following exemplary calls:

print(get_charge("D"))
#> "negative"

print(get_charge("F"))
#> "neutral"

print(get_charge("H"))
#> "positive"

print(get_charge("foo"))
#> "invalid input"

You have learned how to put an if statement into a function. In the section on tuples, you checked whether a tuple contained a given element. A similar test will be required for the if statement in this exercise.

Exercise 2.3 (3 P)

Implement a function get_average_gc that calculates the average GC-content of several DNA or RNA sequences.

  • The function should be called with a single argument. It will receive a list containing an arbitrary number of strings, each of which represents a nucleotide sequence.
  • The function should return a single number, i.e., the average GC-content across all sequences.

Store the function in exercise_2_3.py.

You may check that your program works correctly by using the following exemplary calls:

print(get_average_gc(["CGACCACATTAATGGACTAC", "CTTGGATTATCACCCCCGTC"]))
#> 0.5

print(get_average_gc(["CCACCGAGACGCCAG", "CGGCCTCCCGAC", "GCCCGGGCCCGCGGGGGT"]))
#> 0.837037037037037
  • Use a for loop to iterate over the strings in the list.
  • You will need a counter variable.
  • s.count("X") counts the number of times the character "X" appears in the string stored in s.
  • The len() function also works on strings.

Exercise 2.4 (3 P)

If you code a lot, you will often use third-party packages and consult their documentation. This is what we will do in this exercise: Implement a function make_sequence that creates a random DNA or RNA sequence of a given length and calculates how often a given base appears in this sequence.

  • The first (required) parameter sets the sequence length.
  • The second (optional) parameter rna decides whether an RNA or DNA sequence should be created. This parameter has a Boolean type and is false by default.
  • The third (optional) parameter count_base names the base that should be counted (default: "A").
  • The function should return two values: (1) A string representing the created sequence, and (2) how often the specified base appears.

Since the function should create the sequence randomly, you might consider to use the function choices() available in the random module. This module is part of the standard library – please read the documentation. You will also need the string method join() to convert the value returned by choices() into a string. Since string is a built-in data type, it is also documented in the standard library.

Store the function in exercise_2_4.py.

You may check that your program works correctly by using the following exemplary calls:

random.seed(42)
print(make_sequence(20, False, "A"))
#> ('GACAGGTACAAGAAGGAGTA', 9)

print(make_sequence(15, rna=True, count_base="C"))
#> ('UGCAUCAAUGUGGUC', 3)
  • Exactly follow the specification of function parameters and return values:
    • Should a parameter have a certain name?
    • If a parameter is optional, what is its default value?
    • How many values should the function return?
    See the functions section for details.
  • Remember how to import a function from a module.

Exercise 2.5 (3 P)

In this exercise, you will create a command line program that calculates the similarity of two tissue samples by means of their expressed genes.

  • The program should process two command line arguments genes1 and genes2.
  • Each argument specifices the genes that are expressed in the respective sample. Individual genes should be separated by semicolons (e.g., GENE1;GENE2;GENE3;GENE4). It does not matter whether a gene appears more than once.
  • The program should print three values, each on a separate line:
    1. the number of (unique) genes detected in the first sample
    2. the number of (unique) genes detected in the second sample
    3. the similarity of the samples, which is measured by the Jaccard index.

Store the function in exercise_2_5.py.

You may check that your program works correctly by using the following exemplary calls:

$ python exercise_2_5.py "RAB5C;PITX2;ZNF222;LMTK2;LMTK2" "RAB5C;PABPC4;ZNF222;PKN1;PTMA"
4
5
0.2857142857142857

$ python exercise_2_5.py "THOC5;RAD23B;GPR31;PIRC85;PANO1" "THOC5;ATP1A2;GPR31;THOC5"
5
3
0.3333333333333333

$ python exercise_2_5.py "RAD23B" "RAD23B"
1
1
1.0
  • Your program has to process command line arguments.
  • To split each argument into individual genes, the string method split()may be useful – see documentation.
  • Consult the sets section on set creation from lists and calculation of union and intersection.