My Projects

Please note that I am currently restructuring my GitHub profile and repos; new content will be available very soon!

Content is being reworked for better presentation.

Research Interest

I am interested in studying the differences between Cantonese and English text on social media such as Facebook and Twitter, including sentiment analysis. This work falls under Natural Language Processing, and I would love to explore how the two languages differ.

Current Projects

Twitter Tweet Text Classification Project

Live Personal Project

I am currently working on data extraction, using the Twitter API v2 to collect recent and past tweets on various topics.

After the data collection stage, I will preprocess the data and apply NLP techniques for sentiment analysis, classification, and other insights from the tweet text.
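As a rough illustration of the preprocessing step, here is a minimal Python sketch that cleans a single tweet. The cleaning rules and the function name `clean_tweet` are assumptions made for this example, not the project's actual pipeline.

```python
import re
from typing import List

# Hypothetical cleaning rules; the project's real preprocessing may differ.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^\w\s]")


def clean_tweet(text: str) -> List[str]:
    """Strip URLs, mentions, and punctuation, then lowercase and split on whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = NON_WORD_RE.sub(" ", text)  # \w is Unicode-aware, so CJK characters are kept
    return text.lower().split()


if __name__ == "__main__":
    print(clean_tweet("Loving the new #NLP course by @someone! https://example.com/abc"))
```

Splitting on whitespace is enough for English. Cantonese text would likely need a dedicated word segmenter in place of `split()`, since words are not separated by spaces.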

Past Projects

Group Machine Learning Project

I lead a group of students working on small and medium-sized machine learning and deep learning projects on real-world data sets, with ideas drawn from kaggle.com.

We meet on Zoom every Monday evening for progress updates and tutorials, and we use Slack for group chat.

Event and Location Based Twitter Text Classification

This is the project that I am currently working on. My goal is to analyze text on social media (in this case, Twitter) based on the user’s location and to classify the event the user is mentioning, to see whether it relates to variables such as location, time, events, level of interest, and sentiment.

The project is currently in the data collection, data wrangling, and text pre-processing phase.

Click Here to See my Project!

STAT694 Applied Research in Statistics Research Project

Worked from Aug 2020 - Dec 2020

This project uses NLP algorithms to analyze the effect of COVID-19 on Twitter text.

Click Here to See my Project!

Click Here to See my Presentation!

STAT697 Issues in Statistics Project

Worked from Jan 2020 - Mar 2020

This project analyzes figures and trends for California K-12 schools using SAS and SQL. We used the official data set produced by the California Department of Education. It was a two-person team project. As the team lead, I was responsible for communicating with my partner and with the professor of the class. Since the course was taught entirely online, we used Google Meet and Slack for team communication and for weekly meetings with the professor.

Project Repo!

Lending Club Classification in R

Feb/Mar 2021

This is a Kaggle competition that was part of the project in my STAT 653 course. I used a basic neural network to predict the variable loan_status, the repayment status of a borrower. In this project, I compared various machine learning algorithms that are widely used in classification, picked the one with the best performance on the sample training and testing data sets, and used it for prediction on the full testing data set. The model comparison code is located in the folder model_comparison, separated by model.
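The model selection idea described above can be sketched in a few lines. The example below is in Python rather than the project's R code, and everything in it is hypothetical: the toy data, the two candidate classifiers, and the names `majority_class`, `threshold_rule`, and `select_best` only illustrate the general "fit each candidate, score on held-out data, keep the best" pattern.

```python
import random


def majority_class(train):
    """Baseline: always predict the most common label in the training data."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top


def threshold_rule(train):
    """Predict the class whose mean lies on the same side of the midpoint as x."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    mean0, mean1 = sum(xs0) / len(xs0), sum(xs1) / len(xs1)
    cut = (mean0 + mean1) / 2
    return lambda x: int((x > cut) == (mean1 > mean0))


def accuracy(model, data):
    """Fraction of examples the fitted model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def select_best(candidates, train, holdout):
    """Fit every candidate on train, score on holdout, return the best name and all scores."""
    scores = {name: accuracy(fit(train), holdout) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy one-feature data set: class 0 centred at 0, class 1 centred at 3.
rng = random.Random(0)
data = [(rng.gauss(0, 1), 0) for _ in range(100)] + [(rng.gauss(3, 1), 1) for _ in range(100)]
rng.shuffle(data)
train, holdout = data[:140], data[140:]

candidates = {"majority_class": majority_class, "threshold_rule": threshold_rule}
best, scores = select_best(candidates, train, holdout)

if __name__ == "__main__":
    print(best, scores)
```

The chosen model would then be refit and applied to the full testing data set. Scoring on held-out rows rather than on the training rows avoids rewarding a candidate for simply memorizing the data.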

Click Here to See my Project!

Titanic Survival Classification in R

Feb/Mar 2021

This is also a Kaggle competition. In this project, I compared various machine learning algorithms that are widely used in classification, picked the one with the best performance on the sample training and testing data sets, and used it for prediction on the full testing data set. The model comparison is in the report (rmd and pdf).

Click Here to See my Project!

OpenCV Projects

My latest experiments with OpenCV in Python. They will include edge and feature detection, color conversion and detection, machine learning, deep learning, and more.

GitHub repo is currently under construction. Please check back later!

Lyft BayWheel Data Dashboard

A flexdashboard of Lyft BayWheel data, built with R packages.