WEEK 1¶
Section 1: Knowing Python¶
Python is a Programming Language
- Element: English words, numbers, and special characters.
- Syntax: sensitive to case and indentation
Python is basically a simplified and logical version of the English language, which makes it easy to learn.
Why is Python so popular?
- Readability: coherent and easy to understand, even when you didn't write the code yourself
- Open source: free and fast-developing
- General-purpose: suitable for computer scientists, statisticians, enterprises and generally everyone
- Development efficiency: low-cost, quick to write, and requiring fewer lines (often cited as about one-fifth the size of equivalent C++ or Java code)
- Extensive libraries: statistical analytics, visualization, machine learning
If you only have time to learn one new programming language, Python would be your best choice.
Everyone can program, no matter what your background is.
You don't need to know much to make incredible little things in Python, just some willingness to try and the courage to make errors.
Section 2: Course Outline¶
Lectures
We will meet for 13 classes in total, each a three-hour session combining lectures and hands-on exercises. Please bring along YOUR OWN LAPTOP to class. Students on sick leave should study the course materials on their own and book a face-to-face appointment with me or Randy if necessary.
Weekly Assignment
To help you consolidate what you learn in class, I will assign homework after every class. Assignments are due at 23:59 every Saturday. Late submissions will be subject to a 10% deduction per day.
Quiz
We will have graded quizzes during the lecture, each worth 1 point. The purpose of these quizzes is to reinforce what you have just learned and to encourage more attentive learning.
Hackathon
At the end of the course, we will have an integrated Hackathon to demonstrate your coding and storytelling skills.
The Hackathon will last for 3 hours, and you will need to team up with 4 other students. You will turn in a draft and your code immediately after the Hackathon; these will be graded on a scale of 8 points, based on your problem-solving ability under time pressure.
You can continue refining your report afterwards and submit a final version within 7 days. The Hackathon's topic and format will be released one week ahead.
Time Management
Learning how to program is hard and time-consuming. You need to put in real effort. I expect you to spend at least four hours every week, with your full concentration.
If you don't have that much time, please skip this course. It would be disastrous for you.
Don't Cheat
- Plagiarism and dishonesty in all forms will result in an F grade.
Gen AI
Generative AI tools, such as ChatGPT, are allowed in this class. However, you are expected to:
- Use Gen AI ethically, wisely, and responsibly.
- Ensure that the material submitted for assessment is your own work based on your own ideas.
- Modify AI's answers to include your own insights.
- ACKNOWLEDGE all and any use of Generative AI tools in assignments. An acknowledgement template will be offered and you need to indicate your usage honestly by filling out that template.
- Assignment is a pathway to success. Gen AI might erode your potential to fully digest knowledge and generate original viewpoints. Overreliance on Gen AI tools might reduce your opportunities to develop your own distinctive intellectual qualities.
- Different grading criteria will be adopted for assignment submissions using and not using Gen AI.
Grading Criteria¶
- AI-assisted submissions:
  - Integrity: Students who have used Gen AI tools without proper acknowledgement will receive no credit for that assignment.
  - Accuracy-based grading: Errors and bugs will result in a deduction of grades.
  - Students should demonstrate a clear understanding of the code submitted.
- Submissions without AI assistance:
  - Integrity: Do your own work. No plagiarism will be tolerated.
  - Effort-based grading: You will get a decent (though not necessarily full) grade as long as you have made an effort.
If you are uncertain about the correct answer, you can:
- Put down remarks using # syntax to demonstrate your chain of thought.
- Display the different versions of the code that you have tried out.
You will get credit for this even if you fail to obtain the correct answer.
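As an illustration of what such a submission might look like, here is a minimal sketch for a made-up task (averaging three scores). The task, the variable names, and the numbers are hypothetical and are not taken from any actual assignment.

```python
# Hypothetical task: compute the average of three quiz scores.
score_1 = 7
score_2 = 8
score_3 = 10

# Attempt 1: my first try. Division happens before addition,
# so only score_3 gets divided by 3 and the result is too large.
attempt_1 = score_1 + score_2 + score_3 / 3

# Attempt 2: parentheses force the sum to be computed first.
attempt_2 = (score_1 + score_2 + score_3) / 3

print(attempt_1)
print(attempt_2)
```

Keeping both attempts, each with a remark on what was tried and why, shows the chain of thought even when the final answer is not perfect.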
Course Preview
Week | Content | Week | Content |
---|---|---|---|
1 | Introduction & Installation | 8 | Web Crawling |
2 | Data Structure | 9 | Web Crawling (Continued) |
3 | Function | 10 | Natural Language Processing |
4 | Function (Continued) | 11 | Word2Vec |
5 | Data Visualization | 12 | Unsupervised Machine Learning |
6 | No class | 13 | No class |
7 | Knowing HTML | 14 | Topic Modeling |
Assessment¶
Item | Credit |
---|---|
Quiz | 8 * 1% |
Weekly Assignment [scalable] | 12 * 6% |
Hackathon Team Project | |
※ 3-hour Draft (findings & codes) | 8% |
※ Final Report (text & codes) | 8% |
※ Peer Evaluation | 4% |
Total | 100% |
Section 3: Intended Outcomes¶
After this course, you will:
- know how to program in Python
- have more skills to analyze communication technology
- share more common language with developers and better understand tech terminology, such as data frames, functions, supervised machine learning, and unsupervised machine learning
- have hands-on experience with popular Python libraries and know they are there if needed
Section 4: Computational Research Workflow¶
Association between Research Tasks and Python Libraries to learn¶
- Data Collection
  - Selenium
- Data Cleaning (Preprocessing)
  - Pandas
- Data Exploration
  - NumPy
  - Pandas
- Model Development
  - Scikit-learn
- Visualization
  - Plotly
Section 5: Installing Python¶
A. install python environment
Python 3.X is recommended. The number following "Python", i.e. 3.X, is the version number, which is composed of a generation (major) number and a minor version number. The greater the number, the more recent the version.
After more than a decade of development, Python 3 has become very mature and stable. Almost all libraries and packages support Python 3.X, and some important libraries, like TensorFlow, are exclusive to Python 3.X.
On Windows, tick "Add python.exe to PATH" in the installer.
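Once Python is running, you can check which version you have from inside Python itself. A minimal sketch using the standard `sys` module; the variable names `major` and `minor` are just illustrative.

```python
import sys

# sys.version_info describes the interpreter that is running this code.
major = sys.version_info.major   # the generation number, e.g. 3
minor = sys.version_info.minor   # the minor version number, i.e. the X in 3.X

print("Python " + str(major) + "." + str(minor))
```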
B. run python via CLIs
- Open Command Line Interface (CLI)
A CLI is an interface that lets you type textual instructions line by line to make the computer do something at your command.
Compared to a GUI (Graphical User Interface), a CLI is often faster, and many functions are only available through a CLI.
There are several CLIs available in Mac OS and Windows.
Mac OS:
- Terminal (built-in CLI): Spotlight Search/Launchpad -> Type "Terminal" -> Type "python3"
- IDLE (python-specific CLI): Spotlight Search/Launchpad -> Type "IDLE"
Mac OS Terminal:
string inside the blue box indicates current working directory
Or, you can type "pwd" to print working directory
Mac OS IDLE:
number inside red box is the version number
Windows:
- PowerShell (built-in CLI): Start Menu -> Type "PowerShell" -> Type "py"
- Command Prompt (built-in CLI): Start Menu -> Type "Command Prompt" -> Type "py"
- Python Command Line (python-specific CLI): Start Menu -> Type "Python" -> Choose "Python (Command Line)"
- Shortcut to PowerShell: Windows+R -> Type "powershell"
- Shortcut to Command Prompt: Windows+R -> Type "cmd"
Windows Command Prompt:
string inside the blue box indicates current working directory
C. run some basic commands
- math
  - add: 1 + 2
  - subtract: 1 - 2
  - multiply: 1 * 2
  - divide: 1 / 2
  - quotient: 14 // 2
  - remainder: 14 % 2
  - power: 5 ** 3
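The operators above can be tried out together. A minimal sketch; the variable names are only for illustration, and the comments show the value each expression produces.

```python
total = 1 + 2          # 3
difference = 1 - 2     # -1
product = 1 * 2        # 2
ratio = 1 / 2          # 0.5 (division always gives a decimal number)
quotient = 14 // 4     # 3   (how many whole 4s fit into 14)
remainder = 14 % 4     # 2   (what is left over after taking out the 4s)
power = 5 ** 3         # 125 (5 * 5 * 5)

print(total, difference, product, ratio, quotient, remainder, power)
```

Note that `/` returns a decimal number even when the division is exact, e.g. `14 / 2` gives `7.0`, while `14 // 2` gives `7`.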
Quiz 1: Basic Math
String
- string: single, double or even triple quotes
- print(string): print('hi')
- extend: 'hi' + '!'
- duplicate: 'hello' * 20
- To include a quotation mark that matches the string's own quotes, or another special character, add a backslash (\) before it
'I don\'t think so'
'I don't think so'
Cell In[2], line 1 'I don't think so' ^ SyntaxError: unterminated string literal (detected at line 1)
'I don\'t think so'
"I don't think so"
"I don't think so"
"I don't think so"
'She says, "look at me and answer my question."'
'She says, "look at me and answer my question."'
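The string operations above can be combined in one short cell. A minimal sketch; the variable names are only for illustration.

```python
greeting = 'hi' + '!'                 # extend: 'hi!'
laugh = 'ha' * 3                      # duplicate: 'hahaha'

# Three ways to write a string that contains an apostrophe.
escaped = 'I don\'t think so'         # backslash before the inner quote
double_quoted = "I don't think so"    # different outer quotes, no backslash needed
triple_quoted = '''She says, "I don't think so."'''  # both kinds of quote inside

print(greeting)
print(laugh)
print(escaped == double_quoted)       # True: both hold the same text
print(triple_quoted)
```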
Comparison
- comparison: also called logical conditions, because the return value is a Boolean, i.e. True or False.
  - equal: 1 == 0
  - not equal: 1 != 0
  - less than: 1 < 0
  - less than or equal: 1 <= 0
  - greater than: 1 > 0
  - greater than or equal: 1 >= 0
  - Note: is and is not check whether two names refer to the very same object, so use == and != when comparing values
#comparison order -> you can add remarks like this by putting down a # in front of the line
2+1==1
False
2+1>1
True
2+1==2+1
True
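A few more comparisons to try, as a minimal sketch. The last two lines use lists, which are only introduced in Week 2; they are included here, as an assumption about what is helpful, to show how `==` (same value) differs from `is` (same object).

```python
equal = (2 + 1 == 3)                              # True
not_equal = (2 + 1 != 3)                          # False
case_matters = ("hello world" == "Hello World")   # False: strings are case-sensitive

list_a = [1, 2]
list_b = [1, 2]
same_value = (list_a == list_b)    # True: the contents are equal
same_object = (list_a is list_b)   # False: they are two separate lists

print(equal, not_equal, case_matters, same_value, same_object)
```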
Boolean
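- boolean/logical operation:
  - Only two possible results: True or False
  - and: True only if all conditions are True. Write and; & also works between Boolean values, but it is a bitwise operator that binds more tightly than comparisons, so always keep the parentheses
  - or: True if at least one condition is True. Write or; | also works between Boolean values, with the same caveat as &
  - not: Reverse True to False and False to True. Write not; Python has no ! operator (in Jupyter, a line starting with ! is sent to the system shell instead)
- boolean operation exercises:
  - (1 > 0) and (1 == 0)
  - (1 < 0) & (2 < 4)
  - (1 > 0) or (3 == 2)
  - (9 > 10) | (2 >= 2)
  - (9 > 10) | ("hello world" == "Hello World")
  - not (1 == 0)
  - not (11 > 10)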
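The operators can be compared side by side in a short sketch. The variable names are only for illustration, and the last two lines show why the parentheses matter when `&` is used between comparisons.

```python
both = (1 > 0) and (2 > 1)        # True: both conditions hold
either = (1 > 0) or (3 == 2)      # True: the first condition holds
flipped = not (1 == 0)            # True: 1 == 0 is False, and not reverses it

with_parens = (2 > 1) & (3 > 2)   # True: True & True
# Without parentheses, & is evaluated first: 2 > (1 & 3) > 2, i.e. 2 > 1 > 2.
without_parens = 2 > 1 & 3 > 2    # False

print(both, either, flipped, with_parens, without_parens)
```

Using the words `and`, `or`, and `not` avoids this pitfall, because they are evaluated after the comparisons.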
Variable
- Variables are placeholders, which can be used to hold some values, put something in memory, and later take it back out. They are the nicknames of values, designated by users.
- In Python, a variable can be directly assigned a value.
- Assign values to variable: a = 1, here "a" is the variable name and 1 is its value.
- A variable's value can be updated anytime with different data types.
- a = 100 → a = "money" OK
- increase variables by 1 unit: a = a + 1 or a += 1
- decrease variables by 1 unit: a = a - 1 or a -= 1
- pass along a value from one variable to another: a = 100, b = a
a = 100
b = a
print(b)
100
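The update rules above can be traced step by step. A minimal sketch; the comments show the value of `a` after each line.

```python
a = 100
a = a + 1     # a is now 101
a += 1        # a is now 102
a -= 2        # a is back to 100

b = a         # b receives the current value of a, i.e. 100
a = "money"   # a now holds a string; b is not affected

print(a)
print(b)
```

Passing a value along with `b = a` copies what `a` holds at that moment, so changing `a` afterwards leaves `b` at 100.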
Input
- input: an interactive way to initialize a variable
  - input: print + variable initialization
  - format: a = input("some instruction for users:")
  - effects: (1) print out the instruction, (2) take the input from users, (3) assign the input value to variable a
name = input("what is your name?")
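One detail worth knowing: `input` always hands back a string, even when the user types a number, so the value needs converting before any arithmetic. A minimal sketch; to keep it runnable without typing anything, a fixed string stands in for the user's answer, and the variable names are hypothetical.

```python
# Stand-in for: raw_age = input("How old are you? ")
raw_age = "20"

age = int(raw_age)      # convert the text "20" into the number 20
next_year = age + 1     # arithmetic now works as expected

# str() turns the number back into text so it can be joined with +
print("Next year you will be " + str(next_year))
```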
Quiz 2: String, Boolean & Variable
D. exit python
- Use the quit function: quit()
- Shortcut: Ctrl+Z -> Press Enter (Windows), or Ctrl+D (Mac OS)
E. navigate CLI
- Using command cd [folder name] to move to a child folder or to any folder given by its absolute directory path
  - child folder: cd desktop
  - grandchild folder: cd desktop\backup
  - folder with absolute path: cd C:\Users\yuner\Desktop
  - Magic Trick of Autofill: Tab (or Tab several times to choose the correct directory)
  - Clean screen: cls in Windows and clear in MacOS
  - Home, End, ←,→, Ctrl+←,Ctrl+→
- Using command dir [folder name] in Windows and ls [folder name] in MacOS to list all content under a directory
  - omitting the folder name will give you the content list of the current working directory
  - understand the result list returned
- Using command cd .. to navigate to the parent folder of the current folder
- Using command mkdir [folder name] to create a new directory
  - Don't use special characters, such as space, in the folder name
- Using command rmdir [folder name] to remove an existing directory
  - Only an empty directory can be deleted directly
  - To remove a non-empty directory in Windows Command Prompt, add /s before the folder name: rmdir /s [folder name] (in MacOS, use rm -r [folder name])
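The same folder operations can also be done from inside Python. A minimal sketch using the standard `pathlib`, `shutil`, and `tempfile` modules; the folder and file names are made up, and everything happens inside a throwaway temporary folder so that nothing else on your computer is touched.

```python
import shutil
import tempfile
from pathlib import Path

# Create a temporary folder to practise in.
base = Path(tempfile.mkdtemp())

# Like "mkdir backup", then put a small file inside it.
(base / "backup").mkdir()
(base / "backup" / "notes.txt").write_text("hello")

# Like "dir" / "ls": list the content of a folder.
contents = sorted(item.name for item in base.iterdir())
print(contents)

# Like "cd ..": the parent of the backup folder is the base folder.
print((base / "backup").parent == base)

# Like "rmdir": this only works on an empty folder.
(base / "empty").mkdir()
(base / "empty").rmdir()

# Like "rmdir /s" or "rm -r": remove a folder together with its content.
shutil.rmtree(base / "backup")
remaining = sorted(item.name for item in base.iterdir())
print(remaining)

# Clean up the practice folder itself.
base.rmdir()
```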
F. Summary
Generally speaking, every operating system comes equipped with built-in CLIs by default. They can be switched to the Python environment with a single command, i.e. "py" on Windows or "python3" on Mac OS.
Besides, a Python-specific CLI is provided after installation. You can find it by searching for its name in the Start Menu or Spotlight. Unlike the built-in CLIs, it runs in the Python environment by default, so it doesn't require an extra command to get there.
However, the biggest weakness of CLIs is that you can only write commands line by line, which is inefficient and even disruptive to integrative thinking. To overcome this, we will introduce two alternatives to CLIs, namely Jupyter Notebook and Sublime Text (optional).
G. Alternatives
1. Text Editors
We recommend that you use the Sublime Text editor for this course.
You can download it from here: https://www.sublimetext.com/download
print('Hello World!')
- Save the file as "test.py"
  - Naming convention: most python executable files are named with a suffix .py for consistency
  - Color changes to indicate Python syntax
- Run "python test.py"
  - Tell python to execute the file by running all statements line by line from top to bottom
- Save output to a new file "output.txt": python test.py > output.txt
print("Hello World!")
name = input("What is your name?")
print("Hello "+name+"!")
2. Jupyter Notebook
Other than CLIs, we can choose to use external software to run Python.
Here we will learn an interactive Python editor called Jupyter Notebook, which has been widely adopted in the IT industry.
Step 1. Install jupyter via pip
- pip is Python's package installer, which is installed along with Python.
- usage: you can use pip by typing pip3 install [library name] in the system's built-in CLIs (Terminal/PowerShell/Command Prompt).
- example: to install jupyter notebook, you should type pip3 install jupyter
Step 2. Run jupyter notebook
- type "jupyter notebook" in CLIs
- it will automatically open a new page in your default browser
- the page provides a view of current folder
On the iMacs in the lab, you may fail to open jupyter because it is not on the system path, and only an admin can add it permanently. In this case, you can:
- Type export PATH=/Users/your_SSOID/Library/Python/3.9/bin:$PATH to add jupyter temporarily to the system path and then rerun "jupyter notebook" in Terminal
- Or, you can double-click the jupyter-notebook executable directly in the abovementioned directory
  - You will need to copy the address "localhost:8888/blablabla" to your browser, which contains the token (secret key) for the notebook
Step 3. Create new notebook and run cells
- Click New▾
- Select "Notebook" -> "Python 3"
- Unit of codes here is Cell, not Line. You can write multiple lines in one cell and run all of them in a batch.
- How to run Cell:
- Click ▶ Run
- Use shortcut: Ctrl+Enter (Ctrl+Return on a Mac keyboard)
- Repeat what we have learned for CLIs
- maths
- string
- comparing values
- boolean operation
- assign values to variables
- input value to variable
Step 4. Rename and save notebook
Step 5. Relocate notebook and re-open it
3. Google Colab
Google Colab is a free online Jupyter Notebook platform that runs completely in the cloud. It is highly portable, allowing you to code anywhere, anytime, without using your own computer's memory and other computational resources. The most commonly used libraries are pre-installed. However, the downside of Colab is that it only supports intensive programming within a short period of time: if you do not interact with it for 90 minutes, the connection will be aborted and everything in your session will be removed. It is not suitable for long-running tasks.
Besides, it cannot access local files directly, which means you need to upload files to your Google Drive or Colab in order to use them. File import and output are troublesome in Colab.
H. Install requisite libraries
- Data Collection
  - Selenium: pip3 install selenium
- Data Cleaning (Preprocessing)
  - Pandas: pip3 install pandas
- Data Exploration
  - NumPy: pip3 install numpy
- Model Development
  - Scikit-learn: pip3 install scikit-learn
- Visualization
  - Plotly: pip3 install plotly
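After installing, you may want to confirm that Python can actually find each library. A minimal sketch using the standard `importlib.util.find_spec` function, which returns `None` when a module cannot be found; the names `import_names` and `status` are illustrative. Note that the name used when importing can differ from the name used with pip: Scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Names as used in "import ...", which is not always the pip name.
import_names = ["selenium", "pandas", "numpy", "sklearn", "plotly"]

status = {}
for name in import_names:
    # find_spec returns None when no module with this name is installed
    status[name] = importlib.util.find_spec(name) is not None

for name, installed in status.items():
    print(name, "installed" if installed else "missing")
```

Any library reported as missing can be installed with the matching pip3 command and checked again.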
Section 6: Assignment¶
To help you better grasp the material, I will prepare an assignment for you every week. Assignments will be carefully designed so that an average student can finish them within four hours.
I will display and explain the solutions in the next class.
Should you have any questions, feel free to contact me at yunerzhu@hkbu.edu.hk.