WEEK 1

Section 1: Knowing Python

Python is a Programming Language

  • Element: English words, numbers, and special characters.
  • Syntax: sensitive to case and indent

Python reads like a simplified and logical version of the English language, which makes it easy to learn.

Why is Python so popular?

  • Readability: coherent and easy to understand, even if you didn't write the code yourself
  • Open source: free and fast-developing
  • General-purpose: suitable for computer scientists, statisticians, enterprises and generally everyone
  • Efficiency: low-cost, fast to write, and requiring fewer lines (reportedly around one-fifth the size of equivalent C++ or Java code)
  • Extensive libraries: statistical analytics, visualization, machine learning

If you only have limited time and can learn just one new programming language, Python would be your best choice.

Everyone can program, no matter what your background is.
You don't need to know that much to make incredible little things in Python. You just need some willingness to try and the courage to make errors.

Section 2: Course Outline

Lectures

We will have 13 classes in total. Each class is a three-hour meeting that combines lectures and hands-on exercises. Please bring along YOUR OWN LAPTOP to class.

Students who are on sick leave should study the course materials on their own and book a face-to-face appointment with me or Yajing if necessary.

Weekly Assignment

To help you consolidate what you learn in class, I will assign some homework after every class.
Assignments are due at 12:30 on Monday. Late submissions will be subject to a 10% deduction per day.

Quiz

We will have graded quizzes during the lectures, each accounting for 1 point only. The purpose of these quizzes is to strengthen your immediate recall and motivate more attentive learning.

Hackathon

At the end of the course, we will have an integrated Hackathon to demonstrate your coding and storytelling skills.

The Hackathon will last for 3 hours and you need to team up with 3 or 4 other students. You will need to turn in your draft and code immediately after the Hackathon, which will be graded on a scale of 8 points in terms of intensive problem-solving abilities.

You can continue refining the report afterwards and submit a final version within 14 days. The Hackathon's topic and format will be released one week ahead.

Time Management

Learning how to program is hard and time-consuming. You need to put in real effort. I expect you to spend around four hours every week, working at full capacity and with full concentration.
If you don't have that much time, please skip this course. It would be disastrous for you.

Don't Cheat

  • Plagiarism and dishonesty in all forms will result in an F grade.

Gen AI

Generative AI tools, such as ChatGPT, are allowed in this class. However, you are expected to:

  • Use Gen AI ethically, wisely, and responsibly.
  • Ensure that the material submitted for assessment is your own work based on your own ideas.
    • Modify AI's answers to include your own insights.
  • ACKNOWLEDGE all and any use of Generative AI tools in assignments. An acknowledgement template will be offered and you need to indicate your usage honestly by filling out that template.
  • Assignments are a pathway to success. Gen AI might erode your potential to fully digest knowledge and generate original viewpoints. Overreliance on Gen AI tools might reduce your opportunities to develop your own distinctive intellectual qualities.
  • Different grading criteria will be adopted for assignment submissions using and not using Gen AI.

Grading Criteria

  • AI-assisted submissions:
    • Integrity: Students who have used Gen AI tools without proper acknowledgement will lose all marks for that assignment.
    • Accuracy-based grading: Errors and bugs will result in a deduction of marks.
    • Students should demonstrate a clear understanding of the code submitted.
  • Submissions without AI assistance:
    • Integrity: Do your own work. No plagiarism will be tolerated.
    • Effort-based grading: You will get a decent (though not necessarily full) grade as long as you have made an effort. If you are uncertain about the correct answer, you can:
      • Put down remarks using # syntax to demonstrate your chain of thought.
      • Display the different versions of code that you have tried out.
      • You will receive credit even if you fail to obtain the correct answer.
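
As a rough illustration of the two suggestions above (remarks written with # and earlier attempts kept visible), a submission might look like the sketch below. The task (averaging three quiz scores) and the variable names are made up for this example and are not taken from any actual assignment.

```python
# Task (hypothetical): compute the average of the three scores 7, 8 and 9.

# Attempt 1: my first try. The result looked too large, because
# the division is carried out before the additions.
attempt_1 = 7 + 8 + 9 / 3

# Attempt 2: parentheses force the sum to be computed first.
attempt_2 = (7 + 8 + 9) / 3

print(attempt_1)   # 18.0, not the average
print(attempt_2)   # 8.0, the average
```

Keeping the failed attempt and the remark next to it shows how the final answer was reached.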

Course Preview

Week | Content                     | Week | Content
1    | Introduction & Installation | 8    | Web Crawling
2    | Data Structure              | 9    | Web Crawling (Continued)
3    | No class                    | 10   | Natural Language Processing
4    | Function                    | 11   | Word2Vec
5    | Function (continued)        | 12   | Unsupervised Machine Learning
6    | Data Visualization          | 13   | Topic Modeling & GenAI
7    | Knowing HTML                | 14   | Hackathon

Assessment

Item                                 | Credit
Quiz                                 | 8 * 1%
Weekly Assignment [scalable]         | 12 * 6%
Hackathon Team Project               |
  ※ 3-hour Draft (findings & codes)  | 8%
  ※ Final Report (text & codes)      | 8%
  ※ Peer Evaluation                  | 4%
Total                                | 100%

Section 3: Intended Outcomes

After this course, you will:

  • know how to program in Python
  • have more skills to analyze communication technology
  • share more common language with developers and better understand tech terminology, such as data frames, functions, supervised machine learning, and unsupervised machine learning
  • have hands-on experience with popular Python libraries and know they are there if needed

Section 4: Computational Research Workflow

Association between Research Tasks and Python Libraries to learn

  • Data Collection
    • Selenium
  • Data Cleaning (Preprocessing)
    • Pandas
  • Data Exploration
    • Numpy
    • Pandas
  • Model Development
    • Scikit-learn
  • Visualization
    • Plot.ly

Section 4: Installing Python

A. install python environment

Download Link

Python 3.X is recommended. The number following "Python", i.e. 3.X, is the version number, which is composed of a generation (major) number and a minor version number. The greater the number, the newer the version.

After more than a decade of development, Python 3 has become very mature and stable. Almost all libraries and packages support Python 3.X, and some important libraries, like TensorFlow, are exclusive to Python 3.X.
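
Once Python is installed, the version numbers described above can be read from inside Python itself. The snippet below is a minimal sketch using the standard sys module; the variable names major, minor and is_python_3 are chosen only for this illustration.

```python
import sys

# sys.version_info describes the Python that is running this code.
major = sys.version_info.major   # the generation number, e.g. the 3 in 3.12
minor = sys.version_info.minor   # the minor version number, e.g. the 12 in 3.12

print("Running Python " + str(major) + "." + str(minor))

# A simple check that this is Python 3 rather than Python 2.
is_python_3 = (major == 3)
print(is_python_3)
```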

Tick "Add python.exe to PATH"

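Tip: Some older Mac OS systems have Python 2.7 pre-installed by default. Python 2.7 and 3.X can be installed side by side without interfering with each other, although code written for one version does not always run on the other. On such systems you can typically choose which version to start by typing "python" (2.7) or "python3" (3.X).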

B. run python via CLIs

  1. Open Command Line Interface (CLI)

    A CLI is an interface that lets you type textual instructions line by line to make the computer do something at your command.

    Compared to a GUI (Graphical User Interface), a CLI is often faster, and many functions are only available through a CLI.

    There are several CLIs available in Mac OS and Windows.

Mac OS:

  • Terminal (built-in CLI): Spotlight Search/Launchpad -> Type "Terminal" -> Type "python3"
  • IDLE (python-specific CLI): Spotlight Search/Launchpad -> Type "IDLE"

Mac OS Terminal:

The string inside the blue box in the screenshot indicates the current working directory.

Alternatively, you can type "pwd" to print the working directory.

[Screenshot: Mac OS Terminal]

Mac OS IDLE:

The number inside the red box in the screenshot is the version number.

[Screenshot: Mac OS IDLE]

Windows:

  • PowerShell (built-in CLI): Start Menu -> Type "PowerShell" -> Type "py"
  • Command Prompt (built-in CLI): Start Menu -> Type "Command Prompt" -> Type "py"
  • Python Command Line (python-specific CLI): Start Menu -> Type "Python" -> Choose "Python (Command Line)"
**Tip:** Windows users can use shortcuts to open PowerShell/Command Prompt.

PowerShell: Windows+R -> Type "powershell"

Command Prompt: Windows+R -> Type "cmd"

Windows Command Prompt:

The string inside the blue box in the screenshot indicates the current working directory.

[Screenshot: Windows Command Prompt]

C. run some basic commands

  • math
    • add: 1+2
    • subtract: 1-2
    • multiply: 1*2
    • divide: 1/2
    • quotient: 14//2
    • remainder: 14%2
    • power: 5**3
Order of Operations (from highest precedence to lowest): parentheses ( ), power **, multiply/divide/quotient/remainder * / // %, add/subtract + -
**Warning**: Python follows the mathematical order of operations. Use parentheses to make sure you get the answer you intend.
**Tip:** To quickly recall the last line, you can press ↑. To recall the second-to-last line, you can press ↑ twice. And so on.
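
The operators listed above can be tried together in one go. The following is a small sketch; the two variable names at the end are made up to show how parentheses change the result.

```python
# Basic arithmetic operators
print(1 + 2)     # add
print(1 - 2)     # subtract
print(1 * 2)     # multiply
print(1 / 2)     # divide (the result is a decimal number)
print(14 // 2)   # quotient
print(14 % 2)    # remainder
print(5 ** 3)    # power

# Multiplication is carried out before addition unless parentheses say otherwise.
without_parentheses = 2 + 3 * 4     # 2 + 12
with_parentheses = (2 + 3) * 4      # 5 * 4
print(without_parentheses)
print(with_parentheses)
```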

Quiz 1: Basic Math

String

  • string: single, double or even triple quotes
    • print(string): print('hi')
    • extend: 'hi'+'!'
    • duplicate: 'hello'*20
    • To include a special character such as a quote inside a string, add a backslash (\) before it: 'I don\'t think so'
In [2]:
'I don't think so'
  Cell In[2], line 1
    'I don't think so'
                     ^
SyntaxError: unterminated string literal (detected at line 1)
In [3]:
'I don\'t think so'
Out[3]:
"I don't think so"
In [4]:
"I don't think so"
Out[4]:
"I don't think so"
In [5]:
'She says, "look at me and answer my question."'
Out[5]:
'She says, "look at me and answer my question."'
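
The string operations above can be combined in a short script. This is only a sketch; the variable names are invented for the example.

```python
greeting = 'hi' + '!'                # extend (join) two strings
repeated = 'ha' * 3                  # duplicate a string three times
escaped = 'I don\'t think so'        # backslash keeps the quote inside the string
double_quoted = "I don't think so"   # or wrap the text in double quotes instead

# Triple quotes allow a string to run over several lines.
two_lines = '''first line
second line'''

print(greeting)
print(repeated)
print(escaped == double_quoted)   # both spellings produce the same string
print(two_lines)
```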

Comparison

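  • comparison: also called logical conditions, because the return value is a Boolean value, i.e. True or False.
    • equal: 1 == 0 (avoid 1 is 0: "is" checks whether two names refer to the same object, not whether two values are equal)
    • not equal: 1 != 0 (likewise, avoid 1 is not 0)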
    • less than: 1 < 0
    • less than or equal: 1 <= 0
    • greater than: 1 > 0
    • greater than or equal: 1 >= 0
In [6]:
#comparison order -> you can add remarks like this by putting down a # in front of the line
2+1==1
Out[6]:
False
In [7]:
2+1>1
Out[7]:
True
In [8]:
2+1==2+1
Out[8]:
True
**Warning**: Python is a case-sensitive language. So, if you type a keyword with the wrong case, such as "1 Is 0" instead of "1 is 0", Python will return an error message.
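
The sketch below runs the comparison operators and also illustrates why == is the safer way to test equality: == compares values, while "is" asks whether two names refer to the very same object. The two lists are hypothetical and only serve to make the difference visible (a list is a data structure that has not been covered at this point).

```python
print(1 == 0)    # equal
print(1 != 0)    # not equal
print(1 < 0)     # less than
print(1 <= 0)    # less than or equal
print(1 > 0)     # greater than
print(1 >= 0)    # greater than or equal

# Two separate lists that happen to hold the same numbers.
first_list = [1, 2, 3]
second_list = [1, 2, 3]

same_value = (first_list == second_list)    # compares the contents
same_object = (first_list is second_list)   # compares the identity

print(same_value)
print(same_object)
```

Here the values are equal but the objects are different, so the two checks disagree.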

Boolean

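  • boolean/logical operation:
    • Only two possible results: True or False
    • and: True only if all conditions are True. Written as and; & gives the same result on True/False values, but each condition must then be wrapped in parentheses because & is evaluated before comparisons.
    • or: True if at least one condition is True. Written as or; | works the same way as & above.
    • not: Reverse True to False and False to True. Written as not. Python has no ! operator for this (! only appears in !=).
  • boolean operation exercises:
    • (1>0) and (1==0)
    • (1<0) & (2<4)
    • (1>0) or (3==2)
    • (9>10) | (2>=2)
    • (9>10) | ("hello world"=="Hello World")
    • not (1==0)
    • not (11>10)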
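
A few of these operations can be checked with the sketch below. The last two lines are an added illustration (with made-up variable names) of why each condition should be wrapped in parentheses when & is used: & is evaluated before the comparison operators.

```python
print((1 > 0) and (1 == 0))
print((1 > 0) or (3 == 2))
print((9 > 10) | ("hello world" == "Hello World"))   # strings are case-sensitive
print(not (1 == 0))

# With parentheses: (True) & (True)
with_parentheses = (2 > 1) & (3 > 1)

# Without parentheses Python reads this as 2 > (1 & 3) > 1, i.e. 2 > 1 > 1
without_parentheses = 2 > 1 & 3 > 1

print(with_parentheses)
print(without_parentheses)
```

The word forms and, or and not avoid this pitfall, which is one reason to prefer them for True/False conditions.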

Variable

  • Variables are placeholders, which can be used to hold some values, put something in memory, and later take it back out. They are the nicknames of values, designated by users.
  • In Python, a variable can be directly assigned a value.
  • Assign values to variable: a = 1, here "a" is the variable name and 1 is its value.
  • A variable's value can be updated at any time, even with a value of a different data type.
    • a = 100 → a = "money" OK
  • increase a variable by 1 unit: a = a + 1 or a += 1
  • decrease a variable by 1 unit: a = a - 1 or a -= 1
  • pass along a value from one variable to another: a = 100, b = a
In [38]:
a = 100
b = a
print(b)
100
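
The assignment rules above can be traced step by step. This sketch reuses the names a and b from the example and adds comments showing how each line changes the stored value.

```python
a = 100
a = a + 1      # a is now 101
a += 1         # shorthand for a = a + 1; a is now 102
a -= 2         # shorthand for a = a - 2; a is back to 100

b = a          # b receives the current value of a
a = "money"    # a now holds a string; b is not affected

print(a)
print(b)
```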

Input

  • input: an interactive way to initialize a variable:
    • input: print + variable initialization
    • format: a = input("some instruction for users:")
    • effects: (1) print out the instruction, (2) take the input from users, (3) assign the input value to variable a
    • name = input("what is your name?")
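
One detail worth knowing is that input() always hands back a string, even when the user types digits. The sketch below shows the conversion with int(). So that it runs without a keyboard, a fixed string stands in for the input() call; the question text and the variable names are assumptions made for this example.

```python
# Stand-in for: raw_text = input("How old are you?")
raw_text = "18"

print(type(raw_text))    # the value is text, not a number

age = int(raw_text)      # convert the text to a whole number
next_year = age + 1      # arithmetic now works as expected

print(next_year)
```

Without the conversion, raw_text + 1 would raise a TypeError because text and numbers cannot be added together.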

Quiz 2: String, Boolean & Variable

D. exit python

  • Use Quit Function: quit()
  • Shortcut: Ctrl+Z -> Press Enter (Windows), or Ctrl+D (Mac OS)

E. navigate CLI

  • Use the command cd [folder name] to move into a child folder, or into any folder given by its absolute path
    • child folder: cd desktop
    • grandchild folder: cd desktop\backup (Windows) or cd desktop/backup (Mac OS)
    • folder with absolute path: cd C:\Users\yuner\Desktop
  • Magic trick of autofill: press Tab (or press Tab several times to choose the correct directory)
  • Clear the screen: cls in Windows and clear in Mac OS
  • Move the cursor within a line: Home, End, ←, →, Ctrl+←, Ctrl+→
**Tip**: To find the absolute path in Mac OS, right-click the target folder and either 1) select Get Info -> Where -> ⌘ + C; or 2) press ⌘ + Option + C.
  • Use the command dir [folder name] in Windows and ls [folder name] in Mac OS to list all content under a directory
    • omitting the folder name will give you the content list of the current working directory
    • understand the result list returned
  • Use the command cd .. to navigate to the parent folder of the current folder
  • Use the command mkdir [folder name] to create a new directory
    • Don't use special characters, such as spaces, in the folder name
  • Use the command rmdir [folder name] to remove an existing directory
    • Only an empty directory can be deleted directly
    • To remove a non-empty directory in Windows Command Prompt, add /s before the folder name: rmdir /s [folder name]. In Mac OS Terminal, use rm -r [folder name] instead
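
The same ideas (current working directory, listing, creating and removing folders) are also available from inside Python. The sketch below uses the standard pathlib and tempfile modules and works inside a throwaway folder so that nothing on your computer is touched. It is an optional illustration rather than part of the CLI commands above, and the folder name "backup" is just an example.

```python
import tempfile
from pathlib import Path

print(Path.cwd())   # similar to pwd: the current working directory

# Create a temporary parent folder that is deleted again at the end.
with tempfile.TemporaryDirectory() as scratch:
    parent = Path(scratch)

    new_folder = parent / "backup"
    new_folder.mkdir()                                   # similar to mkdir backup

    names = [item.name for item in parent.iterdir()]     # similar to dir / ls
    print(names)

    new_folder.rmdir()                                   # similar to rmdir backup (empty folders only)
    exists_after_removal = new_folder.exists()
    print(exists_after_removal)
```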

F. Summary

Generally speaking, every operating system comes equipped with built-in CLIs by default. They can be switched into the Python environment with a single command, i.e. "py" on Windows or "python3" on Mac OS.

Besides, a Python-specific CLI is provided after installation. You can find it by searching for its name in the Start Menu or Spotlight. Unlike the built-in CLIs, it runs in the Python environment by default, so it doesn't require an extra command to switch into it.

However, the biggest weakness of CLIs is that you can only write commands line by line, which is inefficient and even disruptive to integrative thinking. To overcome this, we will introduce two alternatives to CLIs, namely Jupyter Notebook and Sublime Text (optional).

G. Alternatives

1. Text Editors

We recommend that you use the Sublime Text editor for this course.
You can download it from here: https://www.sublimetext.com/download

In [ ]:
print('Hello World!')
  • Save the file as "test.py"
    • Naming convention: most python executable files are named with a suffix .py for consistency
  • Color changes to indicate Python syntax
  • Run "python test.py"
    • Tell python to execute the file by running all statements line by line from top to bottom
  • Save output to a new file "output.txt": python test.py > output.txt
In [ ]:
print("Hello World!")
name = input("What is your name?")
print("Hello "+name+"!")
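
The redirection python test.py > output.txt shown earlier is handled by the CLI, not by Python. As a rough alternative, a script can also write to a file itself by passing file= to print. The sketch below assumes a file named output.txt placed in the system's temporary folder, chosen only so the example does not clutter your working directory.

```python
import tempfile
from pathlib import Path

# Build the full path of the output file inside the temporary folder.
output_path = Path(tempfile.gettempdir()) / "output.txt"

# Open the file for writing ("w") and send the printed text into it.
with open(output_path, "w") as output_file:
    print("Hello World!", file=output_file)

# Read the file back to confirm what was saved.
saved_text = output_path.read_text()
print(saved_text)
```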

2. Jupyter Notebook

Besides CLIs, we can choose to use external software to run Python.

Here we will learn an interactive Python editor called Jupyter Notebook, which is widely used in the IT industry.

Step 1. Install jupyter via pip

  • pip is a package installer, which is installed along with Python.
  • usage: you can use pip by typing pip3 install [library name] in system built-in CLIs (Terminal/PowerShell/Command Prompt).
  • example: to install jupyter notebook, you should type pip3 install jupyter
**Tip**: Here we use "pip3" instead of "pip" so that the library is installed for Python 3.X, even if an older Python 2 installation also exists on the computer. If "pip3" is not recognized, "python3 -m pip install [library name]" (or "py -m pip install [library name]" on Windows) does the same job.

Step 2. Run jupyter notebook

  • type "jupyter notebook" in CLIs
  • it will automatically open a new page in your default browser
  • the page provides a view of current folder
**Tip**: Even though it looks like a web page, jupyter notebook is running on your local computer. As you can tell from the link address "localhost:8888", "localhost" means the notebook is served from your own machine, and "8888" is the port number.

On the iMacs in the lab, you may fail to open jupyter because it has not been added to the system path, which only an admin is allowed to do. In this case, you can:

  1. Type export PATH=/Users/your_SSOID/Library/Python/3.9/bin:$PATH to add jupyter temporarily to the system path and then rerun "jupyter notebook" in Terminal
  2. Or, you can double-click the jupyter-notebook executable directly in the above-mentioned directory
  3. You will need to copy the address "localhost:8888/blablabla" into your browser, which contains the token (secret key) for the notebook

Step 3. Create new notebook and run cells

  • Click New▾
  • Select "Notebook" -> "Python 3"
  • The unit of code here is the cell, not the line. You can write multiple lines in one cell and run all of them in a batch.
  • How to run a cell:
    • Click ▶ Run
    • Use shortcut: Ctrl+Enter (or Ctrl+return)
  • Repeat what we have learned for CLIs
    • maths
    • string
    • comparing values
    • boolean operation
    • assign values to variables
    • input value to variable

Step 4. Rename and save notebook

Step 5. Relocate notebook and re-open it

3. Google Colab

Google Colab is a free online Jupyter Notebook platform that runs completely in the cloud. It is highly portable, allowing you to code anywhere, anytime, without using your own computer's memory and other computational resources. The most commonly used libraries are already installed. However, the downside of Colab is that it only supports intensive programming within a short period of time. Everything you have produced will be removed and the connection will be aborted if you do not interact with it for 90 minutes, so it is not suitable for long-running tasks.

Besides, it cannot access local files directly, which means you need to upload files to your Google Drive or Colab in order to use them. File import and export are troublesome in Colab.

H. Install requisite libraries

  • Data Collection
    • Selenium: pip3 install selenium
  • Data Cleaning (Preprocessing)
    • Pandas: pip3 install pandas
  • Data Exploration
    • Numpy: pip3 install numpy
  • Model Development
    • Scikit-learn: pip3 install scikit-learn
  • Visualization
    • Plot.ly: pip3 install plotly
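
After installing, you may want to confirm that Python can actually find each library. The sketch below uses the standard importlib module to check without importing anything heavy. The list contains the names used in import statements, which is why scikit-learn appears as "sklearn"; the variable names are chosen for this illustration only.

```python
import importlib.util

# Names as they are written after the word "import" in Python code.
module_names = ["selenium", "pandas", "numpy", "sklearn", "plotly"]

status = {}
for name in module_names:
    # find_spec gives back None when the library cannot be found.
    status[name] = importlib.util.find_spec(name) is not None

for name in module_names:
    if status[name]:
        print(name + ": installed")
    else:
        print(name + ": MISSING")
```

Any library reported as MISSING can be installed with the matching pip3 command from the list above.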

Section 4: Assignment

To help you better grasp the knowledge, I will prepare an assignment for you every week. Assignments will be mindfully designed so that an average student can finish them within four hours.
I will display and explain the solutions in the next class.

Should you have any questions, feel free to contact me at yunerzhu@hkbu.edu.hk.