## Section 1: Data Type

A. Overview:

• Number
• Boolean
• List
• String
• Data Type Conversion
• Dictionary
• DataFrame

For more details, please refer to Python official documentation: https://docs.python.org/3.7/library/datatypes.html

B. Number:

• Integer: abbr. int, signed whole number, e.g. 1 or 0
• Float: signed decimal number, e.g. 1.0 or 0.0
**Extra Knowledge** We can use the function type(variable_name) to find out the data type of a variable.

Try running cells below:

In [ ]:
a=0
print(type(a)) #Nested function call: print() takes the return value of type() as its input value.

In [ ]:
b=0.0
print(type(b))

**Tip** Python sets a variable's type based on its value. If you assign it a value of another data type, the variable's type changes accordingly.
In [ ]:
a=0
print(type(a))
a=0.0
print(type(a))

**Tip** You can use round(number,decimals) to round numbers.
In [ ]:
a=1.24343
print(round(a))
print(round(a,2))


#### Practice:

Suppose a=1.0, b=1, c=2

1. What is the data type of (a+b)?

2. What is the data type of (a/b)?

3. What is the data type of (a*b)?

4. What is the data type of (b+c)?

5. What is the data type of (b/c)?

In [ ]:
#Write down your code here
#---------------------------------------------------------
#HINT:
#Step 1: assign values to a, b and c.
#Step 2: print out data types of required questions.
#---------------------------------------------------------


C. Boolean:

• abbr. bool
• Special type of integer (bool is a subclass of int) that is dichotomous, with only two possible values: True and False
In [ ]:
a=True
print(type(a))

In [ ]:
#Boolean and Integer are often used interchangeably. True = 1 and False = 0.
#Try applying mathematical calculation onto variable "a".
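Because bool is a subclass of int in Python, True and False take part in arithmetic as 1 and 0. The cell below is a minimal sketch of that behaviour; the variable names total, flags and true_count are illustrative only.

```python
a = True
print(type(a))             # <class 'bool'>
print(isinstance(a, int))  # True, because bool is a subclass of int

total = a + 1              # True is treated as 1 in arithmetic
print(total)               # 2

flags = [True, False, True]
true_count = sum(flags)    # summing booleans counts the True values
print(true_count)          # 2
```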


D. List:

• List contains a series of values. Each value is an "element" or "item" of list.
• List elements can be of heterogeneous data types.
• How to create a list?
• Use square brackets, separating elements with commas: [a,b,c]
• How to refer to an element?
• Format: list_name[index]
• 0-Based Index: the index of element starts from 0, i.e. the first element is with index of 0.

| Element | H | e | l | l | o | ! |
| --- | --- | --- | --- | --- | --- | --- |
| Index | 0 | 1 | 2 | 3 | 4 | 5 |

In [ ]:
a=[1,2,3] #list of whole numbers
print(type(a))

b=[1,2,True] #list of whole numbers and a boolean value
print(type(b))

c=[1,2,'hello','world'] #list of whole numbers and strings
print(type(c))

d=[1,2,[3,4]] #list of list
print(type(d))
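The list_name[index] format described above can be sketched as follows. The lists reuse the values from the cell above; the names first, third, inner, inner_first, word and last_char are assumptions made for illustration.

```python
c = [1, 2, 'hello', 'world']
first = c[0]           # index 0 is the first element
third = c[2]           # index 2 is the third element
print(first, third)    # 1 hello

d = [1, 2, [3, 4]]
inner = d[2]           # the third element is itself a list
inner_first = d[2][0]  # chain indexes to reach inside the nested list
print(inner, inner_first)  # [3, 4] 3

word = 'Hello!'
last_char = word[5]    # the same 0-based indexing applies to the table above
print(last_char)       # !
```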

**Extra Knowledge**
1. We can use function len(list) to check the length of list.
2. We can use function sum(list) to get the total of a list of numbers.
3. We can use function max(list) to get the max value of a list of numbers.
4. We can use function min(list) to get the min value of a list of numbers.
5. We can use method .count(value) to count the frequency of a certain value.
6. We can use method .extend(list) to add every element of another list, and method .append(value) to add a single new element to the end of the list.
In [ ]:
#Try len(), sum(), max(), min(), .count(), .extend() and .append()
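One possible way to try these helpers is sketched below. The list scores and the other variable names are made up for the example. Note the difference between .extend(), which adds each element of its argument, and .append(), which adds its argument as one single element.

```python
scores = [3, 1, 4, 1, 5]
length = len(scores)     # number of elements
total = sum(scores)      # 3 + 1 + 4 + 1 + 5
highest = max(scores)
lowest = min(scores)
ones = scores.count(1)   # how many times the value 1 appears
print(length, total, highest, lowest, ones)  # 5 14 5 1 2

extended = [1, 2]
extended.extend([3, 4])  # adds 3 and 4 as two separate elements
print(extended)          # [1, 2, 3, 4]

appended = [1, 2]
appended.append([3, 4])  # adds the whole list as one element
print(appended)          # [1, 2, [3, 4]]
```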


E. String

• String behaves like a list whose elements are all characters: it supports indexing and loops in the same way, but unlike a list it cannot be modified in place.
• Use quotes to denote string
• No difference between single quotes and double quotes.
• Quotes must be used in matching pairs: a string that begins with a double quote must end with a double quote.
In [ ]:
a='hello world!'
print(type(a))
a="hello world!"
print(type(a))

**Extra Knowledge** We can use the methods .lower() and .upper() to quickly change the case of a string.
In [ ]:
#Try lower() and upper() methods.

**Extra Knowledge** We can use method .split(char) to split the string by a given character and get a list of sub-strings.
In [ ]:
a='This is an apple.'
#How many words are there in sentence a?
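A minimal sketch of the string methods mentioned above, using the sentence from the cell. It assumes that words are separated by single spaces, so the final full stop stays attached to the last word.

```python
a = 'This is an apple.'
lower = a.lower()
upper = a.upper()
print(lower)   # this is an apple.
print(upper)   # THIS IS AN APPLE.

words = a.split(' ')     # split on the space character
word_count = len(words)  # the number of sub-strings is the number of words
print(words)             # ['This', 'is', 'an', 'apple.']
print(word_count)        # 4
```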


F. Data Type Conversion

• Forced conversion by function: int(), float(), bool(), list(), str()

#### Practice:

Write some commands to figure out which of the following pairs of conversions work.

Hint: to solve the first question, you can code as follows:

a = 0
print(type(a))
a=float(a)
print(type(a))

In [ ]:
#Write down your code here
#------------------------------------------------
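As a rough illustration of forced conversion, the sketch below converts a few hand-picked values and shows one conversion that fails. The values are arbitrary examples, not the pairs asked for in the practice.

```python
as_float = float(0)    # int -> float
as_int = int(1.9)      # float -> int drops the decimal part
as_bool = bool(0)      # 0 converts to False, other numbers to True
as_str = str(12)       # number -> string
from_str = int('12')   # a string of digits -> int
chars = list('abc')    # string -> list of characters
print(as_float, as_int, as_bool, as_str, from_str, chars)

# Not every conversion is workable: a word cannot be read as a whole number.
try:
    int('hello')
    failed = False
except ValueError:
    failed = True
print(failed)          # True
```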


# Break

G. Dictionary

• Dictionary is a collection of key → value pairs. Keys are unique identifiers of elements. Values can be heterogeneous.
• The main usage of dictionary is to look up a value based on its key.
• How to create a dictionary?
• Use curly bracket, separating elements with comma.
• Each element contains a unique key and a value, separated by colon.
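• Values can be of any data type. Keys can be of any immutable (hashable) data type, such as numbers or strings; a list cannot be used as a key.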
• Format: {key1:value1, key2:value2}
• How to look up a value for a given key?
• Format: dictionary_name[key]
In [ ]:
# Suppose we have three participants in our class: king-wa (id=1001), junior (id=1002), benjamin (id=1003).
# Try generating a dictionary of participants' ids and their names.

In [ ]:
# What are the name, gender and affiliation of participant whose id is 1003?

**Tip** The dictionary has a well-known counterpart in JavaScript: the object notation on which JSON is based. Compared with the previous five data types, a dictionary operates at a higher level because it can represent not only values but also the relationships between them. It is also highly human-readable.
**Extra Knowledge** We can use method .keys() to extract all keys and .values() to extract all values.
In [ ]:
# Try .keys() and .values() methods.
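One way the dictionary exercises above could look is sketched here. The ids and names come from the exercise text; the gender and affiliation values are taken from the DataFrame example in the next part; the variable names participants and records are assumptions.

```python
# id -> name
participants = {1001: 'king-wa', 1002: 'junior', 1003: 'benjamin'}
name = participants[1003]            # look up a value by its key
print(name)                          # benjamin

ids = list(participants.keys())      # all keys
names = list(participants.values())  # all values
print(ids)                           # [1001, 1002, 1003]
print(names)                         # ['king-wa', 'junior', 'benjamin']

# A value can itself be a dictionary, which holds several fields per id.
records = {1003: {'name': 'benjamin', 'gender': 'M', 'affiliation': 'SHKS'}}
affiliation = records[1003]['affiliation']
print(affiliation)                   # SHKS
```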


H. DataFrame

• A special data type of Pandas library.
• The main usage of a DataFrame is to store a table of data and look up rows by their index.
• How to create a DataFrame?
• Use Pandas' Function pd.DataFrame(data=...[,index=...,columns=...]). Arguments in square bracket are optional.
• Import from another data source, like csv, excel or json. For example, pd.read_csv(file path [, index_col=..., header=...]).
• How to look up a row for a given index?
• Format: dataframe.loc[index]
In [ ]:
import pandas as pd
#We have three ways to create a DataFrame, i.e. Progressive way, Radical way and Easy way.
#1: Progressive way
a=pd.DataFrame([[1001,'king-wa','M','JMSC'],[1002,'junior','F','JMSC'],[1003,'benjamin','M','SHKS']])

In [ ]:
a.columns=['id','name','gender','affiliation']

In [ ]:
a=a.set_index('id')

In [ ]:
a

In [ ]:
#2: Radical way
a=pd.DataFrame([['king-wa','M','JMSC'],['junior','F','JMSC'],['benjamin','M','SHKS']],columns=['name','gender','affiliation'],index=[1001,1002,1003])

In [ ]:
#Retrieve values

In [ ]:
#3: Easy way

In [ ]:
b


## Quiz

### Q1. Among the following which will create a list?

(A) a = "1,2,3"
(B) a = [1,2,3]
(C) a = (1,2,3)
(D) a = {1,2,3}


### Q2. Among the following which will create a dictionary?

(A) a = {1:2}
(B) a = {1,2}
(C) a = {1;2}
(D) a = (1,2)


#### Suppose I have a dictionary named "dic"

dic = {'a':5,'b':4,'c':3,'d':2,'e':1}


### Q3. Among the following which can help retrieve the value of "a" from dic?

(A) dic{a}
(B) dic[a]
(C) dic['a']
(D) dic{'a'}


## Section 2: Some useful built-in functions

A. File I/O:

• Use open(path[,mode='r']) function.
• Modes: read ('r') or write ('w') or both ('r+') or append ('a')
• Input: .readlines() method will extract all content in the file as a list of strings. One line, one string (each string keeps its trailing newline character).
• Output: .write(string) method writes the given content to the file. Mode 'w' erases any existing content and writes from the beginning of the file; mode 'a' adds to the end of the file.
• Save and Close File: .close() method
In [ ]:
# Create a new file, add new lines to it and close it.

In [ ]:
# Open file you just created, read the existing lines and print them one by one.
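The steps above (open, write, close, then open again and read) can be sketched as follows. The file name demo.txt and the use of a temporary folder are assumptions chosen so that the example does not overwrite anything on disk.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

f = open(path, 'w')          # 'w' creates the file, or empties an existing one
f.write('first line\n')      # '\n' ends the line
f.write('second line\n')
f.close()                    # save and close

f = open(path, 'a')          # 'a' adds to the end of the file
f.write('third line\n')
f.close()

f = open(path, 'r')
lines = f.readlines()        # one string per line, newline included
f.close()

for line in lines:
    print(line.strip())      # .strip() removes the trailing newline
```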


B. For Loop

• for loop is used to iterate through every element in a list and repeatedly execute commands after the colon.
• Format: for a in list_name: ...
• Usually coupled with the range([start=0,] stop[, step=1]) function, which automatically creates a sequence of whole numbers that begins at the start number, increases by step, and stops before (not including) the stop number.
• Syntax: In the above format, ... is a block of commands subordinate to the for loop. They run only within the for loop. Python requires indentation to group commands into a block. For example:
for a in range(5):
    print(a)       #Use indentation to denote a Block subordinate to above statement
    print(a+1)
    print(a+2)

In [ ]:
a=[0,1,2,3]
for i in a:
    print(i)

In [ ]:
for i in range(4):
    print(i)

In [ ]:
for i in range(0,4,2):
    print(i)

**Extra Knowledge**
We can put a loop inside the square brackets of a list (a list comprehension) to create a new list based on an old one.
In [7]:
a=[0,1,2,3]
#two ways to increase every element in a by one unit
#---------------------------------
#1

#---------------------------------
#2
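A possible version of the two ways is sketched below: an ordinary for loop that appends to an empty list, and a loop written inside the square brackets. The names b and c are illustrative.

```python
a = [0, 1, 2, 3]

# 1: ordinary for loop that builds the new list step by step
b = []
for i in a:
    b.append(i + 1)
print(b)   # [1, 2, 3, 4]

# 2: loop inside the list (list comprehension)
c = [i + 1 for i in a]
print(c)   # [1, 2, 3, 4]
```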

**Extra Knowledge**
1. We can use the continue statement to skip the remaining commands in the current iteration and jump directly to the next iteration.
2. We can use the break statement to quit the loop.
In [ ]:
#Try continue and break
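A small sketch of continue and break follows. It relies on the if statement, which is covered in the next part, and the list printed is only there to record which numbers were reached.

```python
printed = []
for i in range(10):
    if i % 2 == 1:
        continue       # odd number: skip the rest of this iteration
    if i > 6:
        break          # first even number above 6: leave the loop entirely
    printed.append(i)
    print(i)

print(printed)         # [0, 2, 4, 6]
```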

In [ ]:
for i in 'Hello!':
    print(i)

In [ ]:
#Print the characters at even indexes in 'Hello!'
a='Hello!'

In [ ]:
#Repeat every line in above created file three times, i.e. copy each line and paste it three times to the file. Save the outputs.

In [ ]:
#Cut lines in above created file by words. One word, one line. Save the outputs.


C. If/Else Statement

• If/Else Statement is used to test whether a condition is True. If yes, do something. If not, do something else. Else statement is optional.
• Format: if logical_condition1 :... (else: ...)
• Example:
if a==1:
    print('yes') #Block A
else:
    print('no') #Block B

In [ ]:
#Please use for loop and If/Else statement to select all even numbers from 0 to 19 and print them out one by one.
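A possible solution sketch: loop over range(20) and use the remainder operator % to test whether a number is even. The list evens is an illustrative addition that collects the results.

```python
evens = []
for i in range(20):       # 0 to 19
    if i % 2 == 0:        # remainder 0 after dividing by 2 means even
        evens.append(i)
        print(i)

print(evens)
```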

**Extra Knowledge** An If/Else statement can be extended into an If/Elif/Else statement.
In [ ]:
a=1
if a<0:
    print('negative')
elif a==0:
    print('neutral')
else:
    print('positive')


#### Practice

In [ ]:
#Use If/Elif/Else statement to allocate a patient with records as below:
a={'new patient':False,'unpaid bill':False}


## Section 3: Build our own function

• A function is a block of reusable code. Notation: y=f(x), where x is a list of input variables and y is a list of output variables.
• Terminology: input variables = parameters, output variables = returned variables, and the actual values passed in for the parameters = arguments
• Global vs Local: a function can create local variables that are only used inside its boundary. Local variables can use the same names as global variables without overriding their values.
• Format:
def function_name(input1[,input2,input3...]):
    command line
    return

• The function of function is to transform x into y. Like a magic trick turning a girl into a tiger.
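The points above can be illustrated with a tiny function. The name add_one and the values used are hypothetical; the sketch shows a parameter, an argument, a returned value, and a local variable that shares its name with a global one.

```python
x = 10                 # global variable

def add_one(x):        # x here is a parameter, local to the function
    x = x + 1          # changes only the local x
    return x           # returned variable

y = add_one(1)         # 1 is the argument passed in for parameter x
print(y)               # 2
print(x)               # 10, the global x is untouched
```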
In [ ]:
#Wrap our previous If/Elif/Else statements into a custom function, which takes the patient record dictionary as input and returns the result.
a={'new patient':False,'unpaid bill':False}


Presidential inauguration speeches capture the sentiment of the time.

## Practice: Inauguration Speech

Expected Objectives:

1. Total number of sentences in the speech
2. Total number of words in the speech
3. Average length of sentences

### Coleman–Liau index:

CLI = 0.0588 * L - 0.296 * S - 15.8
L is the average number of letters per 100 words and S is the average number of sentences per 100 words.
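The formula above can be wrapped in a small function. This is a sketch: the function name coleman_liau_index and the example counts (450 letters, 100 words, 5 sentences) are made up for illustration and do not come from any speech.

```python
def coleman_liau_index(letter_count, word_count, sentence_count):
    # L: average number of letters per 100 words
    L = letter_count / word_count * 100
    # S: average number of sentences per 100 words
    S = sentence_count / word_count * 100
    return 0.0588 * L - 0.296 * S - 15.8

# Made-up counts: 450 letters, 100 words, 5 sentences
example = coleman_liau_index(450, 100, 5)
print(round(example, 2))   # 9.18
```

With these counts L is 450 and S is 5, so the index is 0.0588 * 450 - 0.296 * 5 - 15.8, which is about 9.18.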

In [ ]:
presidents=['Washington','Jefferson','Lincoln','Roosevelt','Kennedy','Nixon','Reagan','Bush','Clinton','W Bush','Obama','Trump']

In [ ]:
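for president in presidents:
    file=open('doc\\'+president+'.txt','r')
    #Count sentences, words and letters with readablity_test(), which is defined in the cell below
    sentence_count,word_count,letter_count=readablity_test(file.readlines())
    file.close()
    CLI=0.0588*(letter_count/word_count*100)-0.296*(sentence_count/word_count*100) - 15.8
    if CLI <= 6:
        pass #Fill in what to do for this CLI range
    elif CLI<=12:
        pass
    elif CLI<=16:
        pass
    else:
        pass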

In [ ]:
def readablity_test(paragraphs):
    #Define a custom function readablity_test() to output sentence_count, word_count and letter_count

    return sentence_count,word_count,letter_count

In [ ]:
#Save results to a new file