

Section 1: Data Type

A. Overview:

  • Number
  • Boolean
  • List
  • String
  • Data Type Conversion
  • Dictionary
  • DataFrame

For more details, please refer to Python official documentation: https://docs.python.org/3.7/library/datatypes.html

B. Number:

  • Integer: abbr. int, signed whole number, e.g. 1 or 0
  • Float: signed decimal number, e.g. 1.0 or 0.0
**Extra Knowledge** We can use the function type(variable_name) to find out the data type of a variable.

Try running cells below:

In [ ]:
a=0
print(type(a)) #This is a nested function call: print() takes the return value of type() as its input value.
In [ ]:
b=0.0
print(type(b))
**Tip** Python sets a variable's type based on its value. If you assign it a value of another data type, the variable's type changes accordingly.
In [ ]:
a=0
print(type(a))
a=0.0
print(type(a))
**Tip** You can use `round(number,decimals)` to round numbers.
In [ ]:
a=1.24343
print(round(a))
print(round(a,2))

Practice:


Suppose a=1.0, b=1, c=2

1. What is the data type of (a+b)?

2. What is the data type of (a/b)?

3. What is the data type of (a*b)?

4. What is the data type of (b+c)?

5. What is the data type of (b/c)?

In [ ]:
#Write down your code here
#---------------------------------------------------------
#HINT:
#Step 1: assign values to a, b and c.
#Step 2: print out data types of required questions.
#---------------------------------------------------------

C. Boolean:

  • abbr. bool
  • Special type of integer that is dichotomous: it has only two possible values, True and False
In [ ]:
a=True
print(type(a))
In [ ]:
#Boolean and Integer are often used interchangeably. True = 1 and False = 0.
#Try applying mathematical calculation onto variable "a".
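As a minimal sketch of the point above, the cell below applies arithmetic to Boolean values. The variable names (`flags`, `true_count`) are illustrative only.

```python
a = True
b = False

# bool is a subclass of int, so True behaves as 1 and False as 0 in arithmetic
plus_one = a + 1
times_ten = a * 10
both = a + b
print(plus_one, times_ten, both)
print(isinstance(a, int))

# A common use: count how many items in a list are True
flags = [True, False, True, True]
true_count = sum(flags)
print(true_count)
```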

D. List:

  • List contains a series of values. Each value is an "element" or "item" of list.
  • List elements can be of heterogeneous data types.
  • How to create a list?
    • Use square brackets, separating elements with commas: [a,b,c]
  • How to refer to an element?
    • Format: list_name[index]
    • 0-Based Index: the index of element starts from 0, i.e. the first element is with index of 0.
Element  H  e  l  l  o  !
Index    0  1  2  3  4  5
In [ ]:
a=[1,2,3] #list of whole numbers
print(type(a))

b=[1,2,True] #list of whole numbers and a boolean value
print(type(b))

c=[1,2,'hello','world'] #list of whole numbers and strings
print(type(c))

d=[1,2,[3,4]] #list of list
print(type(d))
**Extra Knowledge**
1. We can use function len(list) to check the length of a list.
2. We can use function sum(list) to get the total of a list of numbers.
3. We can use function max(list) to get the max value of a list of numbers.
4. We can use function min(list) to get the min value of a list of numbers.
5. We can use method .count(value) to count the frequency of a certain value.
6. We can use method .extend(list) to add all elements of another list, and method .append(value) to add a single new element to the list.
In [ ]:
#Try len(), sum(), max(), min(), .count(), .extend() and .append()
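The functions and methods listed above can be sketched as follows. The list `numbers` and the other variable names are made up for illustration.

```python
numbers = [3, 1, 2, 3]

length = len(numbers)      # number of elements
total = sum(numbers)       # sum of all elements
largest = max(numbers)     # largest element
smallest = min(numbers)    # smallest element
threes = numbers.count(3)  # how many times the value 3 appears
print(length, total, largest, smallest, threes)

numbers.append(4)          # adds one element to the end
numbers.extend([5, 6])     # adds every element of another list to the end
print(numbers)

nested = [1, 2]
nested.append([3, 4])      # append adds the whole list as ONE element
print(nested)
```

Note the difference in the last step: .append() with a list as its argument produces a list inside a list, while .extend() merges the elements.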

E. String

  • String behaves like a list whose elements are all characters: it supports indexing and len(). Unlike a list, a string cannot be modified in place.
  • Use quotes to denote string
    • No difference between single quotes and double quotes.
    • Quotes must be used in matching pairs: a string beginning with a double quote must end with a double quote.
In [ ]:
a='hello world!'
print(type(a))
a="hello world!"
print(type(a))
**Extra Knowledge** We can use method .lower() and .upper() to quickly change the case of a string.
In [ ]:
#Try lower() and upper() methods.
**Extra Knowledge** We can use method .split(char) to split the string by a given character and get a list of sub-strings.
In [ ]:
a='This is an apple.'
#How many words are there in sentence a?
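A minimal sketch of the string methods mentioned above, assuming words are separated by single spaces. The variable names are illustrative.

```python
sentence = 'This is an apple.'

lowered = sentence.lower()   # all characters in lower case
uppered = sentence.upper()   # all characters in upper case
print(lowered)
print(uppered)

# Split by a space; punctuation stays attached to the neighbouring word
words = sentence.split(' ')
print(words)
word_count = len(words)
print(word_count)
```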

F. Data Type Conversion

  • Forced conversion by function: int(), float(), bool(), list(), str()
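The conversion functions can be sketched as below. The values are arbitrary examples, and the last part shows that a conversion can fail when the value does not fit the target type.

```python
as_float = float(1)        # int -> float
as_int = int(1.9)          # float -> int, the decimal part is dropped (not rounded)
as_str = str(12)           # int -> string
from_str = int('12')       # string of digits -> int
as_bool = bool(0)          # 0 -> False, any other number -> True
as_list = list('abc')      # string -> list of characters
print(as_float, as_int, as_str, from_str, as_bool, as_list)

# Not every conversion works: a non-numeric string cannot become an int
try:
    int('hello')
    failed = False
except ValueError as error:
    failed = True
    print('conversion failed:', error)
```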

Practice:


Write some commands to find out which of the following pairs of conversions work.

Hint: to solve the first question, you can code as follows:

a = 0
print(type(a))
a=float(a)
print(type(a))
In [ ]:
#Write down your code here
#------------------------------------------------

Break


G. Dictionary

  • Dictionary is a collection of key → value pairs. Keys are unique identifiers of elements. Values can be heterogeneous.
  • The main usage of dictionary is to look up a value based on its key.
  • How to create a dictionary?
    • Use curly brackets, separating elements with commas.
    • Each element contains a unique key and a value, separated by a colon.
    • Values can be of any data type. Keys must be of an immutable type, such as numbers or strings (a list cannot be a key).
    • Format: {key1:value1, key2:value2}
  • How to look up a value for a given key?
    • Format: dictionary_name[key]
In [ ]:
# Suppose we have three participants in our class: king-wa (id=1001), junior (id=1002), benjamin (id=1003).
# Try generating a dictionary of participants' ids and their names.
In [ ]:
# What are the name, gender and affiliation of participant whose id is 1003?
**Tip** Dictionary has a famous counterpart in JavaScript named JSON. Compared with the previous five data types, dictionary operates at a higher level, as it can represent not only the values but also the relationship between values. Moreover, it is highly human readable.
**Extra Knowledge** We can use method `.keys()` to extract all keys and `.values()` to extract all values.
In [ ]:
# Try .keys() and .values() methods.
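A minimal sketch of creating a dictionary, looking up a value and extracting keys and values. The dictionary `prices` and its contents are made up for illustration and are not part of the class data.

```python
# Keys are fruit names, values are prices
prices = {'apple': 3, 'banana': 1, 'cherry': 8}

apple_price = prices['apple']        # look up a value by its key
print(apple_price)

keys = list(prices.keys())           # all keys, converted to a list
values = list(prices.values())       # all values, converted to a list
print(keys)
print(values)

# Looking up a missing key with [] raises KeyError; .get() returns a default instead
missing = prices.get('durian', 'unknown')
print(missing)
```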

H. DataFrame

  • A special data type of Pandas library.
  • The main usage of DataFrame is to store tabular data, with rows identified by an index and columns identified by names.
  • How to create a DataFrame?
    • Use Pandas' function pd.DataFrame(data=...[,index=...,columns=...]). Arguments in square brackets are optional.
    • Import from another data source, such as csv, excel or json. For example, pd.read_csv(file path [, index_col=..., header=...]).
  • How to look up a row for a given index value?
    • Format: dataframe.loc[index]
In [ ]:
import pandas as pd
#We have three ways to create a DataFrame, i.e. Progressive way, Radical way and Easy way.
#1: Progressive way
a=pd.DataFrame([[1001,'king-wa','M','JMSC'],[1002,'junior','F','JMSC'],[1003,'benjamin','M','SHKS']])
In [ ]:
a.columns=['id','name','gender','affiliation']
In [ ]:
a=a.set_index('id')
In [ ]:
a
In [ ]:
#2: Radical way
a=pd.DataFrame([['king-wa','M','JMSC'],['junior','F','JMSC'],['benjamin','M','SHKS']],columns=['name','gender','affiliation'],index=[1001,1002,1003])
a.index.name='id' #The ids are the index, so they are not listed among the columns
In [ ]:
#Retrieve values
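A minimal sketch of retrieving values with .loc, assuming pandas is installed. The DataFrame `scores` is a made-up example, not the class data.

```python
import pandas as pd

# A small table with the row index 1 and 2
scores = pd.DataFrame(
    [['alice', 90], ['bob', 85]],
    columns=['name', 'score'],
    index=[1, 2],
)

row = scores.loc[1]              # one whole row, selected by its index value
value = scores.loc[2, 'score']   # one cell: row index 2, column 'score'
column = scores['name']          # one whole column, selected by its name
print(row)
print(value)
print(column)
```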
In [ ]:
#3: Easy way
#format: pd.read_csv(file path [, header=..., index_col=...])
b=pd.read_csv('COMM_journals.csv',header=0,index_col=0)
In [ ]:
b

Quiz

Q1. Among the following which will create a list?

(A) a = "1,2,3"
(B) a = [1,2,3]
(C) a = (1,2,3)
(D) a = {1,2,3}

Q2. Among the following which will create a dictionary?

(A) a = {1:2}
(B) a = {1,2}
(C) a = {1;2}
(D) a = (1,2)

Suppose I have a dictionary named "dic"

dic = {'a':5,'b':4,'c':3,'d':2,'e':1}

Q3. Among the following which can help retrieve the value of "a" from dic?

(A) dic{a}
(B) dic[a]
(C) dic['a']
(D) dic{'a'}

Section 2: Some useful built-in functions

A. File I/O:

  • Use open(path[,mode='r']) function.
    • Modes: read ('r') or write ('w') or both ('r+') or append ('a')
  • Input: .readlines() method will extract all content in the file as a list of strings. One line, one string (in our text files, each paragraph is stored as one line).
  • Output: .write(string) method writes the given content to the file. Mode 'w' clears the file and writes from the beginning; mode 'a' keeps the existing content and writes at the end of the file.
  • Save and Close File: .close() method
In [ ]:
# Create a new file, add new lines to it and close it.
In [ ]:
# Open file you just created, read the existing lines and print them one by one.
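The open / write / readlines / close steps described above can be sketched as follows. The file name `example_notes.txt` and the temporary folder are assumptions made so that the example does not touch your own files.

```python
import os
import tempfile

# Hypothetical file location inside the system's temporary folder
path = os.path.join(tempfile.gettempdir(), 'example_notes.txt')

# Mode 'w' creates the file (or clears it if it already exists)
file = open(path, 'w')
file.write('first line\n')    # '\n' marks the end of a line
file.write('second line\n')
file.close()

# Mode 'a' keeps existing content and writes at the end
file = open(path, 'a')
file.write('third line\n')
file.close()

# Mode 'r' reads; readlines() returns one string per line
file = open(path, 'r')
lines = file.readlines()
file.close()

for line in lines:
    print(line.strip())       # strip() removes the trailing '\n'
```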

B. For Loop

  • for loop is used to iterate through every element in a list and repeatedly execute commands after the colon.
    • Format: for a in list_name: ...
    • Usually coupled with the range([start=0,] stop[, step=1]) function, which generates a sequence of whole numbers beginning at the start number and stopping before (not including) the stop number, advancing by step each time.
    • Syntax: In the above format, ... is a block of commands subordinate to the for loop. They are executed only within the for loop. Python requires indentation to group commands into a block. For example:
      for a in range(5):
          print(a)       #Use indentation to denote a block subordinate to the statement above
          print(a+1)
          print(a+2)
      
In [ ]:
a=[0,1,2,3]
for i in a:
    print(i)
In [ ]:
for i in range(4):
    print(i)
In [ ]:
for i in range(0,4,2):
    print(i)
**Extra Knowledge**
We can put a loop inside square brackets (a "list comprehension") to create a new list based on an old one.
In [7]:
a=[0,1,2,3]
#two ways to increase every element in a by one unit
#---------------------------------
#1



#---------------------------------
#2
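One possible way to fill in the two approaches is sketched below: an ordinary for loop and a list comprehension. The names `b`, `c` and `value` are illustrative.

```python
a = [0, 1, 2, 3]

# 1: ordinary for loop, appending to a new list
b = []
for value in a:
    b.append(value + 1)
print(b)

# 2: list comprehension, the same loop written inside square brackets
c = [value + 1 for value in a]
print(c)
```

Both approaches leave the original list `a` unchanged and produce a new list.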
**Extra Knowledge**
1. We can use the continue statement to skip the remaining commands in the block and jump directly to the next iteration.
2. We can use the break statement to quit the loop.
In [ ]:
#Try continue and break
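A minimal sketch of continue and break inside one loop. The numbers 2 and 5 are arbitrary, and the list `printed` exists only to record which values were reached.

```python
printed = []
for i in range(10):
    if i == 2:
        continue          # skip the rest of this iteration, go on with i = 3
    if i == 5:
        break             # leave the loop entirely; 5 to 9 are never processed
    printed.append(i)
    print(i)
```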
In [ ]:
for i in 'Hello!':
    print(i)
In [ ]:
#Print the characters at even indexes in 'Hello!'
a='Hello!'
In [ ]:
#Repeat every line in the file created above three times, i.e. copy each line and paste it three times to the file. Save the outputs.
In [ ]:
#Cut the lines in the file created above into words. One word, one line. Save the outputs.

C. If/Else Statement

  • If/Else statement is used to test whether a condition is True. If yes, do something. If not, do something else. The else part is optional.
  • Format: if logical_condition: ... [else: ...]
  • Example:
    if a==1:
      print('yes') #Block A
    else:
      print('no') #Block B
    
In [ ]:
#Please use for loop and If/Else statement to select all even numbers from 0 to 19 and print them out one by one.
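One possible sketch of combining a for loop with an If statement is shown below. It assumes "even" is tested with the remainder operator `%`; the list `even_numbers` is added only to keep a record of the result.

```python
even_numbers = []
for i in range(20):          # 0 to 19
    if i % 2 == 0:           # remainder of division by 2 is 0 for even numbers
        print(i)
        even_numbers.append(i)
```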
**Extra Knowledge** An If/Else statement can be extended into an If/Elif/Else statement.
In [ ]:
a=1
if a<0:
    print('negative')
elif a==0:
    print('neutral')
else:
    print('positive')

Practice

In [ ]:
#Use If/Elif/Else statement to allocate a patient with records as below:
a={'new patient':False,'unpaid bill':False}

Section 3: Build our own function

  • Function is a block of reusable code. Notation: y=f(x), where x is a list of input variables and y is a list of output variables.
    • Terminology: input variables named in the definition = parameters, the actual values passed in when calling the function = arguments, output variables = returned values.
    • Global vs Local: a function can create local variables that exist only inside the function. Local variables can use the same names as global variables without overriding their values.
    • Format:
      def function_name(input1[,input2,input3...]):
        command line
        return output
      
  • The function of function is to transform x into y. Like a magic trick turning a girl into a tiger.
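A minimal sketch of defining and calling a function, including the global vs local distinction. The function `add_and_double` and the variable `total` are hypothetical names chosen for illustration.

```python
total = 100                      # global variable

def add_and_double(x, y):        # x and y are parameters
    total = x + y                # local variable; the global "total" is untouched
    return total * 2             # returned value

result = add_and_double(1, 2)    # 1 and 2 are arguments
print(result)
print(total)                     # still the global value
```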
In [ ]:
#Wrap our previous If/Elif/Else statements into a custom function, which takes a patient record dictionary as input and returns the result.
a={'new patient':False,'unpaid bill':False}

Presidential inauguration speeches capture the sentiment of the time.

Practice: Inauguration Speech

Expected Objectives:

  1. Total number of sentences in the speech
  2. Total number of words in the speech
  3. Average length of sentences
  4. Coleman–Liau index of Readability

Coleman–Liau index:

CLI = 0.0588 * L - 0.296 * S - 15.8
L is the average number of letters per 100 words and S is the average number of sentences per 100 words.
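The formula can be sketched as a small function. The function name `coleman_liau` and the counts used below (450 letters, 100 words, 5 sentences) are made up for illustration and do not come from any speech.

```python
def coleman_liau(letter_count, word_count, sentence_count):
    # L: average number of letters per 100 words
    L = letter_count / word_count * 100
    # S: average number of sentences per 100 words
    S = sentence_count / word_count * 100
    return 0.0588 * L - 0.296 * S - 15.8

# Hypothetical text: 450 letters, 100 words, 5 sentences
# L = 450, S = 5, so CLI = 26.46 - 1.48 - 15.8 = 9.18
cli = coleman_liau(450, 100, 5)
print(round(cli, 2))
```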

In [ ]:
presidents=['Washington','Jefferson','Lincoln','Roosevelt','Kennedy','Nixon','Reagan','Bush','Clinton','W Bush','Obama','Trump']
In [ ]:
for president in presidents:
    file=open('doc\\'+president+'.txt','r')
    paragraphs=file.readlines()
    paragraph_count=           #Write your command here
    sentence_count,word_count,letter_count=readablity_test(paragraphs)
    CLI=0.0588*(letter_count/word_count*100)-0.296*(sentence_count/word_count*100) - 15.8
    if CLI <= 6:
        grade_level='primary'
    elif CLI<=12:
        grade_level='secondary'
    elif CLI<=16:
        grade_level='undergrad'
    else:
        grade_level='postgrad'
    print(president,':',sentence_count,'sentences,',word_count,'words,',round(word_count/sentence_count),'words/sentence, CLI at',round(CLI),',',grade_level,' level')
In [ ]:
def readablity_test(paragraphs):
#Define a custom function readablity_test() to output sentence_count,word_count and letter_count







    return sentence_count,word_count,letter_count
In [ ]:
#Save results to a new file