Week 4: Function


Section 1: Assignment QA


Section 2: Data Type (Continued) & Function

G. Dictionary

  • A dictionary is a collection of key → value pairs. Keys are unique identifiers of elements; values can be of different data types.
  • The main use of a dictionary is to look up a value based on its key.
  • How to create a dictionary?
    • Use curly braces {}, separating elements with commas.
    • Each element contains a unique key and a value, separated by a colon.
    • Values can be of any data type; keys must be of an immutable type, such as a number, a string, or a tuple (a list cannot be a key).
    • Format: {key1:value1, key2:value2}
  • How to look up a value for a given key?
    • Format: dictionary[key]
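
The rule that keys must be immutable (hashable) can be checked directly. The sketch below uses a hypothetical scores dictionary (not part of the workshop data): a tuple works as a key, while a list raises a TypeError.

```python
# Tuples are immutable, so they can be dictionary keys
scores = {("peter", "week1"): 90, ("yuner", "week1"): 85}
print(scores[("yuner", "week1")])  # 85

# Lists are mutable, so using one as a key raises a TypeError
list_key_accepted = True
try:
    bad = {["peter", "week1"]: 90}
except TypeError as err:
    list_key_accepted = False
    print("TypeError:", err)

print(list_key_accepted)  # False
```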
In [ ]:
# Suppose we have three participants in our class: peter (id=1001), yuner (id=1002), benjamin (id=1003).
# Try generating a dictionary of participants' ids and their names.
student_dic={1001:"peter",1002:'yuner',1003:'benjamin'}
In [ ]:
student_dic[1001]
In [ ]:
# We can expand the dictionary to have more values about students, such as gender and age
student_dic={1001:["peter","M",23],1002:['yuner','F',32],1003:['benjamin','M',24]}
In [ ]:
student_dic

We can use the .keys() method to access all keys and the .values() method to access all values.
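
A minimal sketch of these methods on a small hypothetical dictionary (capitals is an illustrative name, chosen so the student_dic exercise below stays open). The built-in len() function counts the key → value pairs.

```python
# Hypothetical dictionary, used only for illustration
capitals = {"France": "Paris", "Japan": "Tokyo", "Kenya": "Nairobi"}

print(capitals.keys())    # dict_keys(['France', 'Japan', 'Kenya'])
print(capitals.values())  # dict_values(['Paris', 'Tokyo', 'Nairobi'])

# Convert the views to ordinary lists if you need indexing
key_list = list(capitals.keys())
print(key_list[0])        # France

# len() counts the key -> value pairs
print(len(capitals))      # 3
```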

In [ ]:
# Try .keys() and .values() methods.
In [ ]:
# How many items are there in student_dic?

Exercise

Create two dictionaries of the top 5 movies on IMDB (https://www.imdb.com/chart/top/).
1. First dictionary: Use ranking as the key. Include movie name, rating and release year as the values.
2. Second dictionary: Use movie name as the key. Include ranking, rating, and release year as the values.

In [ ]:
 

To add a new item, use the following syntax, which is similar to variable assignment:


dictionary[key] = value
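
The same syntax both adds and updates, depending on whether the key already exists. A short sketch with a hypothetical prices dictionary:

```python
prices = {"apple": 3}

prices["banana"] = 5  # "banana" is a new key, so a new item is added
prices["apple"] = 4   # "apple" already exists, so its value is replaced

print(prices)         # {'apple': 4, 'banana': 5}
print(len(prices))    # 2
```

Because keys are unique, assigning to an existing key never creates a duplicate; it overwrites the old value.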


In [ ]:
#add one more movie to the dictionary
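
**Tip** Dictionaries closely resemble JSON (JavaScript Object Notation), a widely used data format that originated in JavaScript. Compared with the previous five data types, a dictionary operates at a higher level, as it represents not only values but also the relationships between them (each key is linked to its value). Moreover, it is highly human-readable, and looking up a value by its key is usually the fastest way to search in Python: on average it takes roughly the same time however large the dictionary is, whereas searching a list checks items one by one.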
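
To see the resemblance to JSON, Python's built-in json module converts a dictionary to JSON text and back. A minimal sketch using a shortened copy of the student records (records is an illustrative name):

```python
import json

records = {1001: ["peter", "M", 23], 1002: ["yuner", "F", 32]}

# dict -> JSON text; note the integer keys are turned into strings
text = json.dumps(records)
print(text)  # {"1001": ["peter", "M", 23], "1002": ["yuner", "F", 32]}

# JSON text -> dict; the keys come back as strings, not integers
back = json.loads(text)
print(back["1001"])  # ['peter', 'M', 23]
```

JSON object keys are always strings, so a round trip does not restore integer keys such as 1001.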

H. DataFrame

  • A DataFrame is a two-dimensional data structure provided by the pandas library; it is equivalent to a table in the everyday sense.
  • How to create a DataFrame?
    • Use the pandas function pd.DataFrame(data=...[, index=..., columns=...]). Arguments in square brackets are optional.
    • Import from other data sources, such as csv, excel or json files. For example, pd.read_csv(file path [, index_col=..., header=...]).
  • DataFrame structure: rows and columns
    • each column is a Series
  • Look up a row by its index label: dataframe.loc[label]
  • Look up a row by its relative (integer) position: dataframe.iloc[position]
  • Access a single column: dataframe[column_name]
  • Access multiple columns: dataframe[[column1, column2 ...]]
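
Since pd.DataFrame accepts data=..., one convenient input is a dictionary of lists, which links this section back to dictionaries. A minimal sketch, assuming pandas is installed (the data mirrors the participants above):

```python
import pandas as pd

# Keys become column names; each list holds that column's values
data = {
    "name": ["peter", "yuner", "benjamin"],
    "gender": ["M", "F", "M"],
}
df = pd.DataFrame(data=data, index=[1001, 1002, 1003])

print(df.loc[1001])            # row whose index label is 1001
print(df.iloc[0])              # first row by position (the same row here)
print(df["name"])              # a single column, returned as a Series
print(df[["name", "gender"]])  # multiple columns, returned as a DataFrame
```

Here .loc[1001] and .iloc[0] reach the same row, but only because 1001 is the first label; .loc uses labels, .iloc uses positions.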
In [ ]:
! pip3 install pandas
In [ ]:
import pandas as pd
#We have two ways to create a DataFrame, i.e. progressive way and import way.
#1.1: Progressive way BY ROW
a=pd.DataFrame([[1001,'peter','M','HKU'],[1002,'yuner','F','HKBU'],[1003,'benjamin','M','CityU']])
In [ ]:
a
In [ ]:
#change headers
a.columns=['id','name','gender','affiliation']
In [ ]:
a
In [ ]:
a.index=['a','b','c']
In [ ]:
a
In [ ]:
a.columns
In [ ]:
#1.2: Progressive way BY COLUMN
a=pd.DataFrame()
a['id']=[1001,1002,1003]
a['name']=['peter','yuner','benjamin']
a['gender']=['M','F','M']
a['affiliation']=['HKU','HKBU','CityU']
In [ ]:
#check out the index for each row
a.index
In [ ]:
a
In [ ]:
#assign a column to be index column
a=a.set_index('id')
In [ ]:
a
In [ ]:
#Retrieve a row by its index label
a.loc[1001]
In [ ]:
#Slice dataframe by index label (with .loc, the end label 1003 is included)
a.loc[1001:1003]
In [ ]:
#Retrieve a row by its relative location (integer position, 0 = first row)
a.iloc[0]
In [ ]:
#Slice dataframe by relative location (with .iloc, the end position 3 is excluded, as in list slicing)
a.iloc[:3]
In [ ]:
#Access a single column/series
a['name']
In [ ]:
#Access two columns/series
a[['name','gender']]
In [ ]:
#Value comparison: returns a Series of True/False values, one per row
a['gender']=='M'
In [ ]:
#filter dataframe based on value comparison results
a[a['gender']=='M']

Download the data here: https://juniorworld.github.io/python-workshop/doc/COMM_journals.csv

In [ ]:
#2: Import way (read from a file)
#format: pd.read_csv(file path [, header=..., index_col=...])
b=pd.read_csv('COMM_journals.csv',header=0,index_col=0)
In [ ]:
b
In [ ]:
#read_csv also accepts a URL instead of a local file path
b=pd.read_csv('https://juniorworld.github.io/python-workshop/doc/COMM_journals.csv',header=0,index_col=0)
In [ ]:
b
In [ ]:
b.head()
In [ ]:
b.tail()

Differences in Format/Syntax:

  • format of method: object.method_name(), e.g. L.sort(), L.remove(1)
  • format of function: function_name(), e.g. max(L), min(L)
  • format of attribute: object.attribute_name, e.g. b.shape, b.index, b.columns
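
The three syntaxes side by side, as a small sketch (L and df are illustrative; pandas is assumed to be installed):

```python
import pandas as pd

L = [3, 1, 2]

# Method: object.method_name(...), here sorting L in place
L.sort()
print(L)         # [1, 2, 3]

# Function: function_name(object), the object is passed as an argument
print(max(L))    # 3
print(len(L))    # 3

# Attribute: object.attribute_name, with no parentheses
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
print(df.shape)  # (3, 2)
```

Note that L.sort() modifies L itself and returns None, so writing L = L.sort() would lose the list.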
In [ ]:
#check the dimensions of the dataframe with .shape attribute
b.shape
In [ ]:
#min, max, mode, median, sum, var, std of Journal Impact Factor

We can use methods:

  • Dataframe.sort_values(column_name[, ascending=True]) to sort the dataframe by a column
  • Series.value_counts([normalize=False]) to count how many times each unique value appears (normalize=True returns proportions instead of counts)
  • Dataframe.pivot_table(index=column1, [columns=column2,] values=column3, aggfunc=function_name) to generate an aggregate cross-tabulation of the dataframe
    • It combines three steps: splitting the dataframe into groups, applying a function (e.g. mean or sum) to each group, and combining the results into a table
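
The notebook below demonstrates sort_values; here is a hedged sketch of value_counts and pivot_table on a small hypothetical dataframe (the ages and affiliations are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["M", "F", "M", "F", "M"],
    "affiliation": ["HKU", "HKU", "CityU", "CityU", "HKU"],
    "age": [23, 32, 24, 28, 30],
})

# Counts and proportions of each gender
counts = df["gender"].value_counts()
shares = df["gender"].value_counts(normalize=True)
print(counts)
print(shares)

# Split by gender (rows) and affiliation (columns), apply mean to age, combine
table = df.pivot_table(index="gender", columns="affiliation",
                       values="age", aggfunc="mean")
print(table)
```

Each cell of table is the mean age of one gender/affiliation group; for example, the two male HKU members are 23 and 30, giving 26.5.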
In [ ]:
b.sort_values('Journal Impact Factor')
In [ ]:
b.sort_values('Journal Impact Factor',ascending=False)
In [ ]:
#save Dataframe to a csv file
b.to_csv("output_data.csv")

Download the stackoverflow developer survey data here: https://juniorworld.github.io/python-workshop/doc/stack-overflow-developer-survey-2022-first1000.csv

In [ ]:
#read survey data into dataframe c
c = pd.read_csv("https://juniorworld.github.io/python-workshop/doc/stack-overflow-developer-survey-2022-first1000.csv")
In [ ]:
c.head()
In [ ]:
c.columns
In [ ]:
c.shape
In [ ]:
#break down dataframe by Gender, EdLevel, and Age
#.value_counts() & .value_counts(normalize=True)

Quiz 1

https://www.menti.com/alyoht9a49ac

Break


Section 3: Built-in functions

A. File I/O:

  • Use the open(path[, mode='r']) function.
    • Modes: read ('r', the default), write ('w', creates the file or empties an existing one), read and write ('r+', the file must already exist), or append ('a')
  • Input: the .readlines() method extracts all content in the file as a list of strings, one string per line (each string keeps its trailing \n).
  • Output: the .write(string) method writes the given content to the file. With mode 'w' the file is emptied first, so writing starts from the beginning; with mode 'a' the content is added to the end of the file.
  • Save and close the file: .close() method
  • Tips:
    • Use \n to add a line break
    • Remove line breaks or whitespace at the beginning or end of a string: method .strip()
    • Use a forward slash / to separate folders in a path, e.g. test/text2.txt; .. refers to the parent folder
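
A minimal sketch of the write, append, and read cycle, using a hypothetical file name demo_io.txt in the current folder (separate from the text.txt exercise below):

```python
# Mode 'w' creates demo_io.txt, or empties it if it already exists
f = open("demo_io.txt", "w")
f.write("first line\n")
f.write("second line\n")
f.write("third line\n")
f.close()  # save the content and release the file

# Mode 'a' adds new content at the end of the file
f = open("demo_io.txt", "a")
f.write("appended line\n")
f.close()

# Read everything back: one string per line, each ending with \n
f = open("demo_io.txt", "r")
lines = f.readlines()
f.close()

print(lines)                  # ['first line\n', 'second line\n', 'third line\n', 'appended line\n']
print(lines[0].strip())       # first line
print(len(lines[0].strip()))  # 10
```

As an alternative, the statement with open("demo_io.txt") as f: closes the file automatically when its block ends.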
In [ ]:
# Create a new file named "text.txt" in the current folder
In [ ]:
#add three new lines, separated by line break mark \n
In [ ]:
#close it
In [ ]:
# Create a child folder test
# Create another file named "text2.txt" in the child folder
In [ ]:
# Create a file named "text3.txt" in the parent folder
In [ ]:
# grandparent folder?
In [ ]:
#Open text.txt file
In [ ]:
#read the existing lines
In [ ]:
#print out the first line
In [ ]:
#calculate the length (number of characters, including spaces) of the first line

B. If/Else Statement

  • An if/else statement tests whether a condition is True. The if keyword is followed by a condition, such as a comparison (a==1), possibly combined with logical operators (and, or, not), which evaluates to a Boolean value of True or False. If True, the if block runs; if False, the else block runs.
  • Format: if condition: ... (optionally followed by else: ...)
  • Example:

if a==1:
    print('yes') #Block A
else:
    print('no') #Block B

**Extra Knowledge** An If/Else statement can be extended into an If/Elif/Else statement. elif is short for "else if": it tests another condition only when all the conditions before it are False.
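
A short sketch of If/Elif/Else; the age thresholds and group names are hypothetical, chosen only to show that conditions are checked from top to bottom and only the first True branch runs:

```python
age = 32

if age < 18:
    group = "minor"
elif age < 65:   # checked only because age < 18 was False
    group = "adult"
else:            # runs only if every condition above was False
    group = "senior"

print(group)  # adult
```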

Practice: Implementing a Decision Tree

In [ ]:
#Use If/Elif/Else statement to realize this decision tree
In [ ]:
#Demonstration
#Use If/Elif/Else statement to allocate a patient with records as below:
a={'new patient':False,'unpaid bill':False}