A. Overview:
For more details, please refer to the official Python documentation: https://docs.python.org/3.7/library/datatypes.html
B. Number:
int: signed whole number, e.g. 1 or 0
Try running the cells below:
a=0
print(type(a)) #This is a nested function call: print() takes the return value of type() as its input.
b=0.0
print(type(b))
a=0
print(type(a))
a=0.0
print(type(a))
a=1.24543
print(round(a))
print(round(a,2))
Suppose a=1.0, b=1, c=2
1. What is the data type of (a+b)?
2. What is the data type of (a*b)?
3. What is the data type of (b+c)?
4. What is the data type of (b/c)?
#Write down your code here
#---------------------------------------------------------
#HINT:
#Step 1: assign values to a, b and c.
#Step 2: print out data types of required questions.
#---------------------------------------------------------
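As a hint for the questions above, here is a minimal sketch of how Python picks the result type of an arithmetic operation. The variables x, y and z are hypothetical and deliberately differ from the exercise values, so the exercise still needs to be worked through.

```python
# Hypothetical values, chosen only to illustrate the rule
x = 7    # int
y = 2.0  # float
z = 2    # int

mixed_sum = x + y        # int combined with float gives float
int_product = x * z      # int combined with int gives int
true_division = x / z    # the / operator always gives float
floor_division = x // z  # the // operator on two ints gives int

print(type(mixed_sum))       # <class 'float'>
print(type(int_product))     # <class 'int'>
print(type(true_division))   # <class 'float'>
print(type(floor_division))  # <class 'int'>
```

The takeaway is that a float anywhere in the calculation, or the use of /, produces a float.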
math or numpy module: modules are additional tools that can be imported for use.
? function_name: opens the official explanation of a function.
import math
print(math.pi) #get the value of pi
print(math.sqrt(9)) #get the square root of 9
Practice
Write two Python programs, one converting degrees to radians and the other converting radians to degrees.
#Converting degree to radian
degree = 23
#Converting radian to degree
radian = 216
#try autocompleting syntax by Tab
#try retrieving the official explanation for math.cos, math.radians, and
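One possible way to approach the conversion is sketched below. The function names degrees_to_radians and radians_to_degrees are hypothetical helpers written for this illustration; the formulas rely only on math.pi, and math.radians is used as a cross-check.

```python
import math

def degrees_to_radians(degrees):
    """Convert degrees to radians using radians = degrees * pi / 180."""
    return degrees * math.pi / 180

def radians_to_degrees(radians):
    """Convert radians to degrees using degrees = radians * 180 / pi."""
    return radians * 180 / math.pi

# A half turn is 180 degrees, which is pi radians
half_turn = degrees_to_radians(180)
print(round(half_turn, 4))                    # 3.1416
print(round(radians_to_degrees(math.pi), 4))  # 180.0

# The built-in math.radians should agree with the hand-written formula
print(math.isclose(degrees_to_radians(90), math.radians(90)))  # True
```

The same two functions can then be applied to the degree and radian variables defined above.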
C. Boolean:
bool
a=False
print(type(a))
#Booleans and integers are often used interchangeably: True = 1 and False = 0.
#Try applying mathematical operations to the Boolean variable "a".
print(a+1)
print(a/2)
print(a-1)
print(a*2)
D. List:
[a,b,c]
a=[1,2,3] #list of whole numbers
print(type(a))
b=[1,2,True] #list of whole numbers and a boolean value
print(type(b))
c=[1,2,'hello','world'] #list of whole numbers and strings
print(type(c))
d=[1,2,[3,4]] #list of list
print(type(d))
Use list_name[index] to access a single element.
Use list_name[start_index : end_index] to access a part of the list. The element at end_index is not included.

Element | H | e | l | l | o | ! |
---|---|---|---|---|---|---|
Index | 0 | 1 | 2 | 3 | 4 | 5 |
e = [1, 2, 3, 4, 5, 6, 7]
#access the first element
print(e[0])
#access the first two elements
#A notebook displays the value of the last expression in a cell automatically.
#From here on, rely on that cell output instead of print(), one expression per cell.
e[0:2]
#access everything past the first element
e[1:]
#access the second through the fourth element (the end index 4 is excluded)
e[1:4]
#when the slice starts with the first element, the start index 0 can be omitted
e[:2]
#access the last two elements
e[-2:]
Practice
Initialize a list variable X=[1, 0, 1, 2, 3, 5, 10]
#Q1
#Q2
#Q3
#Q4
#Q5
Extra Knowledge
Difference between Methods and Functions: A method is bound to ONE object and is called as object.method(); list methods such as .append() and .sort() change the list directly. A function is not bound to any object; it takes one object or SEVERAL objects as inputs and returns the result of a calculation.
#Try len(), sum(), max(), min(), .count(), .extend(), .append(), +, .remove(), del, .sort() and .reverse()
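The difference between functions and methods can be seen in a short sketch. The list scores is a hypothetical example, and sorted() is a built-in function added here only for contrast with the .sort() method.

```python
scores = [3, 1, 2]  # hypothetical example list

# Functions take the list as input and return a value; the list is unchanged
print(len(scores))     # 3
print(sum(scores))     # 6
print(sorted(scores))  # [1, 2, 3]
print(scores)          # [3, 1, 2]

# Methods are called on the list and change it in place
scores.append(2)        # scores is now [3, 1, 2, 2]
print(scores.count(2))  # 2
scores.sort()           # scores is now [1, 2, 2, 3]
scores.reverse()        # scores is now [3, 2, 2, 1]
scores.remove(2)        # removes the first 2: [3, 2, 1]
del scores[0]           # deletes by index: [2, 1]

# The + operator builds a new list, while .extend() changes the existing one
combined = scores + [10]
scores.extend([5])
print(combined)  # [2, 1, 10]
print(scores)    # [2, 1, 5]
```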
E. String
a='hello world!'
#print the length of a -- How many CHARACTERS are there in the sentence?
#print the first letter in a
#print the first two letters in a
#print the last three letters in a
#Try lower() and upper() methods.
a='Hello World!'
a.upper()
b="HeLLo WorlD!"
b.lower()
name = "Jay Chou"
name.split(" ")
random_text='fasdjkfakhfewiljrhewhfkjanfkjdsahgkjadhdfgjkald'
#How many times does the character "a" appear in this piece of text?
a='This is an apple. An apple a day keeps the doctor away.'
#How many WORDS does a have?
#How many times does the word "apple" appear in a?
#Replace "apple" with "orange"
b='This is an apple. Apple is good for our health.'
#How many times does the word "apple" appear in b?
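The counting questions above hinge on the fact that string methods are case-sensitive. Here is a minimal sketch on a hypothetical sentence (not one of the exercise strings), using only .split(), .count(), .lower() and .replace().

```python
text = 'Tea is nice. I like tea.'  # hypothetical example sentence

# len() counts every character, including spaces and punctuation
print(len(text))  # 24

# .split() with no argument splits on whitespace, giving a list of words
words = text.split()
print(len(words))  # 6

# .count() is case-sensitive: 'Tea' is not counted as 'tea'
print(text.count('tea'))          # 1
print(text.lower().count('tea'))  # 2

# .replace() returns a new string; the original is left unchanged
replaced = text.replace('tea', 'coffee')
print(replaced)  # Tea is nice. I like coffee.
print(text)      # Tea is nice. I like tea.
```

Lowercasing before counting is one simple way to make the count ignore capitalization.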
F. Data Type Conversion
int(), float(), bool(), list(), str()
Exercise:
Write some commands to find out which of the following conversions work.
Conversion | Conversion | Conversion | Conversion |
---|---|---|---|
integer -> float | integer -> boolean | integer -> list | integer -> string |
float -> integer | float -> boolean | float -> list | float -> string |
boolean -> integer | boolean -> float | boolean -> list | boolean -> string |
list -> integer | list -> float | list -> boolean | list -> string |
string -> integer | string -> float | string -> boolean | string -> list |
a = 1 #integer
float(a)
#Write down your code here
#------------------------------------------------
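As a starting point for the exercise, the sketch below tries a few conversions and uses try/except to catch the ones that fail. The list name failed is a hypothetical helper used only to record the failures.

```python
# Conversions that work
print(int(3.9))       # 3 (the decimal part is dropped, not rounded)
print(float('2.5'))   # 2.5
print(str(10) + '1')  # 101 (string concatenation, not addition)
print(list('abc'))    # ['a', 'b', 'c']
print(bool(0))        # False
print(bool('False'))  # True (any non-empty string converts to True)

# Conversions that raise an error; record them instead of stopping
failed = []
try:
    list(5)  # an integer is not a sequence
except TypeError:
    failed.append('int -> list')
try:
    int('hello')  # the text does not look like a number
except ValueError:
    failed.append("'hello' -> int")

print(failed)  # ['int -> list', "'hello' -> int"]
```

Note that whether a string converts to a number depends on its content, not only on its type.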
G. Dictionary
A dictionary is a collection of key → value pairs. Keys are unique identifiers of elements. Values can be of different data types.
A value is retrieved based on its key.
Format: {key1:value1, key2:value2}
Access: dictionary[key]
# Suppose we have three participants in our class: peter (id=1001), yuner (id=1002), benjamin (id=1003).
# Try generating a dictionary of participants' ids and their names.
student_dic={1001:'peter',1002:'yuner',1003:'benjamin'}
student_dic[1001]
# We can expand the dictionary to have more values about students, such as gender and age
# Try .keys() and .values() methods.
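One way to expand the dictionary is to make each value a dictionary of its own. The sketch below assumes this nested layout; the name student_info and the age values are hypothetical and serve only as an illustration.

```python
# Each id maps to a dictionary of attributes (ages are made-up values)
student_info = {
    1001: {'name': 'peter', 'gender': 'M', 'age': 20},
    1002: {'name': 'yuner', 'gender': 'F', 'age': 22},
}

# Chain two keys to reach a nested value
print(student_info[1001]['name'])  # peter

# .keys() and .values() return views; list() turns them into lists
ids = list(student_info.keys())
print(ids)  # [1001, 1002]

# Loop over the values to collect one attribute from every student
ages = []
for record in student_info.values():
    ages.append(record['age'])
print(ages)  # [20, 22]
```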
Exercise
Create two dictionaries of the top 10 movies on IMDB (https://www.imdb.com/chart/top/).
1. First dictionary: Use ranking as the key. Include movie name, rating and release year as the values.
2. Second dictionary: Use movie name as the key. Include ranking, rating, and release year as the values.
To add new items, you can simply use the following syntax, which works much like variable assignment:
dictionary[key] = value
#add one more movie to the dictionary
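The assignment syntax above both adds and overwrites, depending on whether the key already exists. The sketch below uses a hypothetical dictionary with placeholder titles, not real IMDB data.

```python
# Placeholder entries for illustration only
movies = {1: 'Movie A', 2: 'Movie B'}

# Assigning to a new key adds an item
movies[3] = 'Movie C'
print(len(movies))  # 3

# Assigning to an existing key overwrites its value
movies[2] = 'Movie B (remastered)'
print(movies[2])    # Movie B (remastered)
print(len(movies))  # 3
```

Because keys are unique, it is worth checking with `key in dictionary` before assigning if overwriting is not intended.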
H. DataFrame
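A DataFrame is a data type from the pandas module, which is equivalent to a table in the everyday sense.
Create a DataFrame: pd.DataFrame(data=...[,index=...,columns=...]). Arguments in square brackets are optional.
Read a DataFrame from a file: pd.read_csv(file path [, index_col=..., header=...]).
Access rows by index label: dataframe.loc[index]
Access rows by position: dataframe.iloc[index]
Access a single column: dataframe[column_name]
Access several columns: dataframe[[column1, column2 ...]]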
import pandas as pd
#We have three ways to create a DataFrame, i.e. Progressive way, Radical way and Easy way.
#1: Progressive way
a=pd.DataFrame([[1001,'petter','M','HKU'],[1002,'yuner','F','HKBU'],[1003,'benjamin','M','CityU']])
a
#change headers
a.columns=['id','name','gender','affiliation']
a.columns
#check out the index for each row
a.index
#assign a column to be row index
a=a.set_index('id')
a
#2: Radical way
a=pd.DataFrame([['petter','M','HKU'],['yuner','F','HKBU'],['benjamin','M','CityU']],columns=['name','gender','affiliation'],index=[1001,1002,1003])
a
#Retrieve values by index
a.loc[1001]
#Slice dataframe by index (with .loc, both the start and the end label are included)
a.loc[1001:1003]
#Retrieve values by relative location
a.iloc[0]
#Slice dataframe by relative location
a.iloc[:3]
#Access a single column
a['name']
#Access two columns
a[['name','gender']]
#Value comparison
a['gender']=='M'
#filter dataframe based on value comparison results
a[a['gender']=='M']
Download the data here: https://juniorworld.github.io/python-workshop/doc/COMM_journals.csv
#3: Easy way
#format: pd.read_csv(file path [, header=..., index_col=...])
#Option 1: read the downloaded file from the working directory
#b=pd.read_csv('COMM_journals.csv',header=0,index_col=0)
#Option 2: read the file directly from its URL
b=pd.read_csv('https://juniorworld.github.io/python-workshop/doc/COMM_journals.csv',header=0,index_col=0)
b.head()
#check the dimensions of the dataframe with the .shape attribute
b.shape
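#Try different math operations!
#min, max, mode, median, sum, var, std of Journal Impact Factors
#sort_values() on a DataFrame needs the column to sort by; replace b.columns[0] with the impact factor column
b.sort_values(by=b.columns[0])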
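The summary statistics listed above can be tried first on a small table where the answers are easy to check by hand. The DataFrame journals, its column name impact_factor and all values below are hypothetical and do not come from COMM_journals.csv.

```python
import pandas as pd

# Hypothetical journals and impact factors, for illustration only
journals = pd.DataFrame(
    {'impact_factor': [2.0, 4.0, 3.0, 3.0]},
    index=['Journal A', 'Journal B', 'Journal C', 'Journal D'],
)

factors = journals['impact_factor']  # a single column is a Series
print(factors.min(), factors.max())  # 2.0 4.0
print(factors.median())              # 3.0
print(factors.mode()[0])             # 3.0 (mode() returns a Series of the most frequent values)
print(factors.sum())                 # 12.0
print(factors.var(), factors.std())  # sample variance and standard deviation

# Sort the whole table by one column, largest first
ranked = journals.sort_values(by='impact_factor', ascending=False)
print(ranked.index[0])  # Journal B
```

Note that pandas computes the sample variance by default, dividing by n-1 rather than n.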
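Download the Stack Overflow developer survey data here: https://juniorworld.github.io/python-workshop/doc/stack-overflow-developer-survey-2022-first1000.csv
#read survey data into dataframe c
c = pd.read_csv('https://juniorworld.github.io/python-workshop/doc/stack-overflow-developer-survey-2022-first1000.csv')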
c.columns
#break down dataframe by Gender, EdLevel, and Age
#.value_counts() & .value_counts(normalize=True)
#create a cross-tabulation between gender and ethnicity
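The breakdown and cross-tabulation steps can be sketched on a toy table before applying them to the survey. The DataFrame survey and its values are hypothetical; only the column names Gender and EdLevel are taken from the comments above, and pd.crosstab is one possible way to build the cross-tabulation.

```python
import pandas as pd

# Hypothetical toy survey with made-up answers
survey = pd.DataFrame({
    'Gender': ['Man', 'Woman', 'Man', 'Man'],
    'EdLevel': ['Bachelor', 'Master', 'Master', 'Bachelor'],
})

# Count how often each answer appears in one column
counts = survey['Gender'].value_counts()
print(counts)

# normalize=True returns proportions instead of counts
shares = survey['Gender'].value_counts(normalize=True)
print(shares)

# Cross-tabulate two columns: rows are Gender, columns are EdLevel
table = pd.crosstab(survey['Gender'], survey['EdLevel'])
print(table)
```

On the real survey, the same calls would be applied to columns of c, after checking the exact names in c.columns.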