Week 2: Data Type


Section 1: Data Type

A. Overview:

  • Number
  • Boolean
  • List
  • String
  • Data Type Conversion
  • Dictionary
  • DataFrame

For more details, please refer to the official Python documentation: https://docs.python.org/3.7/library/datatypes.html

B. Number:

  • Integer: abbr. int, signed whole number, e.g. 1 or 0
  • Float: signed decimal number, e.g. 1.0 or 0.0
**Extra Knowledge** We can use the function type(variable_name) to find out the data type of a variable.

Try running cells below:

In [ ]:
a=0
print(type(a)) #Nested function call: print() uses the return value of type() as its input value.
In [ ]:
b=0.0
print(type(b))
**Tip** Python sets a variable's type based on its value. If you assign a value of another data type, the variable's type changes accordingly.
In [ ]:
a=0
print(type(a))
a=0.0
print(type(a))
**Tip** You can use round(number, decimals) to round a number to the given number of decimal places. If decimals is omitted, the number is rounded to the nearest integer.
In [ ]:
a=1.24543
print(round(a))
print(round(a,2))
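
The following is a small sketch of how round() behaves beyond the two calls above. The variable names are only illustrative. One detail worth knowing: Python rounds exact halves to the nearest even integer, so round(2.5) gives 2 rather than 3.

```python
price = 1.24543

whole = round(price)           # no second argument: the result is an int
two_places = round(price, 2)   # two decimal places: the result is a float

print(whole, type(whole))
print(two_places, type(two_places))

# Exact halves go to the nearest even integer
half_a = round(2.5)
half_b = round(3.5)
print(half_a, half_b)
```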

Practice:


Suppose a=1.0, b=1, c=2

1. What is the data type of (a+b)?

2. What is the data type of (a*b)?

3. What is the data type of (b+c)?

4. What is the data type of (b/c)?


In [ ]:
#Write down your code here
#---------------------------------------------------------
#HINT:
#Step 1: assign values to a, b and c.
#Step 2: print out data types of required questions.
#---------------------------------------------------------

Advanced Math

  • Try the math or numpy module: modules are additional tools that you import before use
  • As in a command-line interface (CLI), you can press Tab to autocomplete syntax
  • You can run ?function_name or function_name? in a cell to open the documentation for a function
In [ ]:
import math
print(math.pi) #get the value of pi
print(math.sqrt(9)) #get the square root of 9
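
A few more functions from the math module, as a minimal sketch. The triangle and circle values are made up for illustration.

```python
import math

# Hypotenuse of a right triangle with legs 3 and 4
leg_a, leg_b = 3, 4
hypotenuse = math.sqrt(leg_a**2 + leg_b**2)   # math.sqrt always returns a float
print(hypotenuse)

# Round down and round up to whole numbers
floor_value = math.floor(2.7)
ceil_value = math.ceil(2.1)
print(floor_value, ceil_value)

# Area of a circle with radius 2, rounded to two decimal places
radius = 2
area = round(math.pi * radius**2, 2)
print(area)
```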

Practice

Write two Python programs, one converting degrees to radians and the other converting radians to degrees.

In [ ]:
#Converting degree to radian
degree = 23

#Converting radian to degree
radian = 216
In [ ]:
#try autocompleting syntax by Tab
In [ ]:
#try retrieving the official explanation for math.cos, math.radians, and  

C. Boolean:

  • abbr. bool
  • A special type of integer (bool is a subclass of int) with only two possible values: True and False
In [ ]:
a=False
print(type(a))
In [ ]:
#Boolean and Integer are often used interchangeably. True = 1 and False = 0.
#Try applying mathematical calculation onto boolean variable "a".
print(a+1)
print(a/2)
print(a-1)
print(a*2)
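
Because True behaves as 1 and False as 0, booleans can be counted directly. A short sketch with a made-up list of answers:

```python
answers = [True, False, True, True]

# sum() adds up the 1s and 0s, so it counts the True values
correct = sum(answers)
print(correct)

# bool is a subclass of int
print(isinstance(True, int))

# Comparisons produce boolean values
is_passing = correct >= 3
print(is_passing, type(is_passing))
```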

D. List:

  • A list contains a series of values. Each value is an "element" or "item" of the list.
  • List elements can be of heterogeneous data types.
  • How to create a list?
    • Use square brackets, separating elements with commas: [a,b,c]
In [ ]:
a=[1,2,3] #list of whole numbers
print(type(a))

b=[1,2,True] #list of whole numbers and a boolean value
print(type(b))

c=[1,2,'hello','world'] #list of whole numbers and strings
print(type(c))

d=[1,2,[3,4]] #list containing another list (nested list)
print(type(d))
  • How to refer to an element?
    • Format: list_name[index]
    • 0-Based Index: the index of element starts from 0, i.e. the first element is with index of 0.
    • Backward index: negative indexes count back from the right and start with -1, i.e. the last element is with index of -1
    • Slicing: Use list_name[start_index : end_index] to access a part of the list
      • The slice stops right before end_index. The element at end_index is not included in the returned list.
      • The left bound defaults to zero, and the right bound defaults to the length of the sequence being sliced.
      • So start_index can be omitted when the slice starts at the first element, and end_index can be omitted when the slice runs through the last element.
| Element | H | e | l | l | o | ! |
|---------|---|---|---|---|---|---|
| Index   | 0 | 1 | 2 | 3 | 4 | 5 |
In [ ]:
e = [1, 2, 3, 4, 5, 6, 7]
#access the first element
print(e[0])
In [ ]:
#access the first two elements.
#print() also works on a list, but a notebook cell automatically displays the value of its last expression.
#For now, rely on that cell output instead of print(): write one expression per cell.
e[0:2]
In [ ]:
#access everything past the first element
e[1:]
In [ ]:
#access the second through the fourth element (indexes 1, 2 and 3)
e[1:4]
In [ ]:
#when the slice starts with the first element, the start index 0 can be omitted
e[:2]
In [ ]:
#access the last two elements
e[-2:]
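
The indexing and slicing rules can be checked on the six characters H, e, l, l, o, ! used in the index table. This is only a sketch; the variable names are illustrative.

```python
letters = ['H', 'e', 'l', 'l', 'o', '!']

first = letters[0]          # index 0 is the first element
last = letters[-1]          # index -1 is the last element
middle = letters[1:4]       # indexes 1, 2 and 3; index 4 is excluded
from_start = letters[:2]    # same as letters[0:2]
to_end = letters[3:]        # from index 3 through the end of the list

print(first, last)
print(middle)
print(from_start)
print(to_end)
```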

Practice Initialize a list variable X=[1, 0, 1, 2, 3, 5, 10]

  1. Get the third element
  2. Get the first four elements
  3. Get the first to the fourth elements
  4. Get the last three elements
  5. Get the last element
In [ ]:
#Q1
In [ ]:
#Q2
In [ ]:
#Q3
In [ ]:
#Q4
In [ ]:
#Q5

Extra Knowledge

  • Use function len(list) to check the length of list.
  • Use function sum(list) to get the total of a list of numbers.
  • Use function max(list) to get the max value of a list of numbers.
  • Use function min(list) to get the min value of a list of numbers.
  • Use method .count(value) to count the frequency of a certain value.
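  • Use method .extend(list) to add all elements of another list, and .append(element) to add a single element to the end of the list.
  • Extending a list: list_1.extend(list_2) gives list_1 the same contents as list_1 + list_2, but it changes list_1 in place instead of creating a new list
  • Use method .remove(value) to remove the first item that matches a value
  • Use statement del list[index] to remove an item by index
  • Use method .sort() to sort a list by value in ascending order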
  • Use method .reverse() to reverse a list

Difference between Methods and Functions: A method belongs to ONE object and is called as object.method(); list methods such as .append() and .sort() change the list directly. A function is not tied to any object and is called as function(object); it can take one object or SEVERAL objects as inputs for a calculation.
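
The difference shows up clearly when sorting and when adding elements. A minimal sketch with made-up lists:

```python
scores = [3, 1, 2]

# sorted() is a function: it returns a new list and leaves scores untouched
ordered = sorted(scores)
print(scores, ordered)

# .sort() is a method: it reorders scores in place and returns None
scores.sort()
print(scores)

# .append() adds its argument as ONE element; .extend() adds each element
nested = [1, 2]
nested.append([3, 4])
flat = [1, 2]
flat.extend([3, 4])
print(nested, len(nested))
print(flat, len(flat))

# .remove() deletes by value, del deletes by index
flat.remove(1)   # removes the first occurrence of the value 1
del flat[0]      # removes the element at index 0
print(flat)
```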

In [ ]:
#Try len(), sum(), max(), min(), .count(), .extend(), .append(), +, .remove(), del, .sort() and .reverse()

Break


E. String

  • A string is a sequence of characters. It can be indexed and sliced like a list, but unlike a list it cannot be changed in place.
  • Use quotes to denote string
    • No difference between single quotes and double quotes.
    • A string beginning with a double quote must end with a double quote.
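
A short sketch of the quoting rules, and of how a string compares with a list. The sample text is made up.

```python
single = 'hello'
double = "hello"
print(single == double)      # the quote style does not change the value

# Use one quote style outside so the other can appear inside the text
quote = 'She said "hi"'
contraction = "It's fine"
print(quote)
print(contraction)

# Strings support indexing and slicing like lists...
word = "hello"
print(word[0], word[-1], word[1:3])

# ...but they cannot be changed in place
try:
    word[0] = 'J'
    changed = True
except TypeError:
    changed = False
print(changed)
```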
In [ ]:
a='hello world!'
#print the length of a -- How many CHARACTERS are there in the sentence?

#print the first letter in a

#print the first two letters in a

#print the last three letters in a
**Extra Knowledge** We can use the methods .lower() and .upper() to quickly change the case of a string.
In [ ]:
#Try lower() and upper() methods.
a='Hello World!'
a.upper()
In [ ]:
b="HeLLo WorlD!"
b.lower()
**Extra Knowledge**
Use method .split(char) to split the string at a given character and get a list of sub-strings.
Use method .replace(old_text, new_text) to replace a given set of characters with another.
In [ ]:
name = "Jay Chou"
name.split(" ")
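
As a sketch of how .split() and .replace() behave, here is a made-up sentence. Both methods return new values and leave the original string unchanged. The " ".join(list) call is an addition not covered above: it reverses .split(" ").

```python
sentence = "data types are fun"

words = sentence.split(" ")      # list of sub-strings
print(words)

# .replace() returns a new string; the original is unchanged
swapped = sentence.replace("fun", "useful")
print(swapped)
print(sentence)

# " ".join(list) glues the pieces back together with spaces
rebuilt = " ".join(words)
print(rebuilt == sentence)
```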
In [ ]:
random_text='fasdjkfakhfewiljrhewhfkjanfkjdsahgkjadhdfgjkald'
#How many times does the character "a" appear in this piece of text?
In [ ]:
a='This is an apple. An apple a day keeps the doctor away.'
#How many WORDS does a have?

#How many times does the word "apple" appear in a?

#Replace "apple" with "orange"
In [ ]:
b='This is an apple. Apple is good for our health.'
#How many times does the word "apple" appear in b?

F. Data Type Conversion

  • Forced conversion by function: int(), float(), bool(), list(), str()

Exercise:
Write some commands to figure out which of the following conversions work.

| Conversion | Conversion | Conversion | Conversion |
|---|---|---|---|
| integer -> float | integer -> boolean | integer -> list | integer -> string |
| float -> integer | float -> boolean | float -> list | float -> string |
| boolean -> integer | boolean -> float | boolean -> list | boolean -> string |
| list -> integer | list -> float | list -> boolean | list -> string |
| string -> integer | string -> float | string -> boolean | string -> list |
In [ ]:
a = 1 #integer
float(a)
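
A conversion that is not possible raises an error (TypeError or ValueError) and stops the cell. One way to test many conversions without interruption is a small helper like the one sketched here. try_convert is a hypothetical name for this exercise, not a built-in function.

```python
def try_convert(converter, value):
    """Return the converted value, or None if the conversion fails.

    Hypothetical helper for this exercise; not part of Python itself.
    """
    try:
        return converter(value)
    except (TypeError, ValueError):
        return None

digits = try_convert(int, "12")         # a string of digits can become an integer
text = try_convert(int, "twelve")       # this text cannot, so the helper returns None
as_float = try_convert(float, 3)
as_string = try_convert(str, 3.5)

print(digits, text, as_float, as_string)
```

Call it with any of int, float, bool, list or str as the first argument to work through the table.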
In [ ]:
#Write down your code here
#------------------------------------------------

G. Dictionary

  • A dictionary is a collection of key → value pairs. Keys are unique identifiers of elements. Values can be of different data types.
  • The main use of a dictionary is to look up a value based on its key.
  • How to create a dictionary?
    • Use curly braces {}, separating elements with commas.
    • Each element contains a unique key and a value, separated by a colon.
    • Values can be of any data type; keys must be of an unchangeable (immutable) type such as a number or a string, so a list cannot be a key
    • Format: {key1:value1, key2:value2}
  • How to look up a value for a given key?
    • Format: dictionary[key]

In [ ]:
# Suppose we have three participants in our class: peter (id=1001), yuner (id=1002), benjamin (id=1003).
# Try generating a dictionary of participants' ids and their names.
student_dic={1001:'peter',1002:'yuner',1003:'benjamin'}
In [ ]:
student_dic[1001]
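
Looking up a key that is not in the dictionary raises a KeyError. The sketch below shows two common ways around that, and a dictionary whose values are themselves dictionaries. The variable names and the age value are made up for illustration.

```python
student_names = {1001: 'peter', 1002: 'yuner', 1003: 'benjamin'}

print(student_names[1002])                     # direct lookup by key

# Check with "in" first, or use .get() to supply a default value
has_1004 = 1004 in student_names
fallback = student_names.get(1004, 'unknown')
print(has_1004, fallback)

# A value can itself be a dictionary, so one key can hold several fields
# (the age below is a made-up value)
student_info = {1001: {'name': 'peter', 'gender': 'M', 'age': 25}}
first_name = student_info[1001]['name']
print(first_name)
```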
In [ ]:
# We can expand the dictionary to have more values about students, such as gender and age
We can use method `.keys()` to extract all keys and `.values()` to extract all values.
In [ ]:
# Try .keys() and .values() methods.

Exercise

Create two dictionaries of the top 10 movies on IMDB (https://www.imdb.com/chart/top/).
1. First dictionary: Use ranking as the key. Include movie name, rating and release year as the values.
2. Second dictionary: Use movie name as the key. Include ranking, rating, and release year as the values.

In [ ]:

To add new items, you can simply use the following syntax, much similar to variable assignment:

dictionary[key] = value
In [ ]:
#add one more movie to the dictionary
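
The same assignment syntax either adds a pair or overwrites one, depending on whether the key already exists. A minimal sketch with a made-up dictionary:

```python
inventory = {'apple': 3}

inventory['banana'] = 5     # new key: the pair is added
inventory['apple'] = 10     # existing key: the old value is overwritten
print(inventory)
size_after_adding = len(inventory)

del inventory['banana']     # remove a pair by key
remaining_keys = list(inventory.keys())
remaining_values = list(inventory.values())
print(remaining_keys, remaining_values)
```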
**Tip** A dictionary closely resembles an object in JSON (JavaScript Object Notation), a widely used data-exchange format. Compared with the data types covered so far, a dictionary operates at a higher level: it represents not only values but also the relationship between them, because each key labels its value. It is also easy for humans to read, and looking up a value by key is typically very fast, even in a large dictionary.
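
To see how close dictionaries and JSON are, the standard-library json module can turn one into the other. This is a sketch; the student record is made up.

```python
import json

student = {'id': 1001, 'name': 'peter', 'courses': ['python', 'statistics']}

as_text = json.dumps(student)      # dictionary -> JSON text (a string)
print(as_text)
print(type(as_text))

restored = json.loads(as_text)     # JSON text -> dictionary
print(restored == student)
print(restored['courses'][0])
```

One difference to keep in mind: JSON keys are always strings, so a dictionary with integer keys such as 1001 comes back with the keys '1001'.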

H. DataFrame

  • A DataFrame is a data structure provided by the pandas module. It is a table with labeled rows and columns.
  • How to create a DataFrame?
    • Use the pandas function pd.DataFrame(data=...[,index=...,columns=...]). Arguments in square brackets are optional.
    • Import from another data source, such as csv, excel or json. For example, pd.read_csv(file path [, index_col=..., header=...]).
  • Look up rows by index label: dataframe.loc[index]
  • Look up rows by integer position: dataframe.iloc[position]
  • Access a single column: dataframe[column_name]
  • Access multiple columns: dataframe[[column1, column2 ...]]
In [ ]:
import pandas as pd
#We have three ways to create a DataFrame, i.e. Progressive way, Radical way and Easy way.
#1: Progressive way
a=pd.DataFrame([[1001,'peter','M','HKU'],[1002,'yuner','F','HKBU'],[1003,'benjamin','M','CityU']])
In [ ]:
a
In [ ]:
#change headers
a.columns=['id','name','gender','affiliation']
In [ ]:
a.columns
In [ ]:
#check out the index for each row
a.index
In [ ]:
#assign a column to be row index
a=a.set_index('id')
In [ ]:
a
In [ ]:
#2: Radical way
a=pd.DataFrame([['peter','M','HKU'],['yuner','F','HKBU'],['benjamin','M','CityU']],columns=['name','gender','affiliation'],index=[1001,1002,1003])
In [ ]:
a
In [ ]:
#Retrieve values by index
a.loc[1001]
In [ ]:
#Slice dataframe by index label (both end labels are included)
a.loc[1001:1003]
In [ ]:
#Retrieve values by relative location
a.iloc[0]
In [ ]:
#Slice dataframe by relative location (the end position is excluded)
a.iloc[:3]
In [ ]:
#Access a single column
a['name']
In [ ]:
#Access two columns
a[['name','gender']]
In [ ]:
#Value comparison
a['gender']=='M'
In [ ]:
#filter dataframe based on value comparison results
a[a['gender']=='M']
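
The sketch below rebuilds the same three-student table from a dictionary of columns and contrasts .loc with .iloc. It assumes pandas is installed. The variable names are illustrative, and combining conditions with & is an addition that goes slightly beyond the cells above.

```python
import pandas as pd

students = pd.DataFrame(
    {'name': ['peter', 'yuner', 'benjamin'],
     'gender': ['M', 'F', 'M'],
     'affiliation': ['HKU', 'HKBU', 'CityU']},
    index=[1001, 1002, 1003],
)

# .loc slices by label and INCLUDES the end label
by_label = students.loc[1001:1002]
# .iloc slices by position and EXCLUDES the end position, like a list
by_position = students.iloc[0:2]
print(by_label.equals(by_position))

# .loc[row_label, column_name] returns a single cell
cell = students.loc[1002, 'affiliation']
print(cell)

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
males_at_hku = students[(students['gender'] == 'M') & (students['affiliation'] == 'HKU')]
print(males_at_hku)
```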
In [ ]:
#3: Easy way
#format: pd.read_csv(file path [, header=..., index_col=...])
b=pd.read_csv('COMM_journals.csv',header=0,index_col=0)
In [ ]:
b=pd.read_csv('https://juniorworld.github.io/python-workshop/doc/COMM_journals.csv',header=0,index_col=0)
In [ ]:
b.head()
In [ ]:
#check the dimensions of the dataframe with the .shape attribute
b.shape
In [ ]:
#Try different math operations!
#min, max, mode, median, sum, var, std of Journal Impact Factors
We can use methods:
  • `DataFrame.sort_values(column_name[, ascending=True])` to sort a dataframe by a column
  • `Series.value_counts([normalize=False])` to count how often each value appears in a column
  • `DataFrame.pivot_table(index=column1, [columns=column2,] values=column3, aggfunc=function_name)` to generate an aggregate cross-tabulation of the dataframe
  • `DataFrame.groupby(column_name)`: a combination of splitting the dataframe, applying some function, and combining the results
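
A minimal sketch of these methods on a tiny table, assuming pandas is installed. The survey rows and the column names (gender, level, score) are made up for illustration and are not taken from the workshop's data files.

```python
import pandas as pd

# Made-up survey answers, for illustration only
survey = pd.DataFrame({
    'gender': ['F', 'M', 'F', 'M', 'F'],
    'level':  ['BA', 'BA', 'MA', 'MA', 'MA'],
    'score':  [80, 70, 90, 60, 85],
})

# Sort rows by one column, highest score first
ranked = survey.sort_values('score', ascending=False)
print(ranked)

# Frequency of each value in one column, as counts and as proportions
counts = survey['gender'].value_counts()
shares = survey['gender'].value_counts(normalize=True)
print(counts)
print(shares)

# Mean score for each gender/level combination
table = survey.pivot_table(index='gender', columns='level', values='score', aggfunc='mean')
print(table)

# Split by gender, apply mean to score, combine the results into one Series
means = survey.groupby('gender')['score'].mean()
print(means)
```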
In [ ]:
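#sort_values() needs the column to sort by; b.columns[0] is only a placeholder, replace it with the column you want
b.sort_values(by=b.columns[0])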
In [ ]:
#read survey data into dataframe c
c =
In [ ]:
c.columns
In [ ]:
#break down dataframe by Gender, EdLevel, and Age
#.value_counts() & .value_counts(normalize=True)
In [ ]:
#create a cross-tabulation between gender and ethnicity