My Projects
Please note that I am currently restructuring my GitHub profile and repos; new content will be available very soon!
The content is being reworked for better presentation.
Research Interest
I am interested in sentiment analysis of Cantonese and English text on social media such as Facebook and Twitter, and in studying how the two languages differ. This work is part of Natural Language Processing, and I would love to explore those differences.
Current Projects
Twitter Tweet Text Classification Project
Live Personal Project
I am currently working on data extraction, using the Twitter API v2 to collect recent and past tweets on various topics.
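As a rough illustration (not the project's actual code), a recent-search request to the Twitter API v2 could be built as below. The helper name `build_recent_search_request`, the chosen `tweet.fields`, and the placeholder token are assumptions for this sketch; the 10 to 100 range for `max_results` reflects my understanding of the recent search endpoint and should be checked against the official documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Recent search endpoint of the Twitter API v2.
RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_recent_search_request(query, bearer_token, max_results=10):
    """Build (but do not send) a recent-search request.

    Hypothetical helper for illustration only.
    """
    # Assumed limit for the recent search endpoint; verify in the API docs.
    if not 10 <= max_results <= 100:
        raise ValueError("max_results must be between 10 and 100")
    params = {
        "query": query,
        "max_results": max_results,
        # Extra tweet fields picked for this sketch.
        "tweet.fields": "created_at,lang,geo",
    }
    url = f"{RECENT_SEARCH_URL}?{urlencode(params)}"
    # App-only authentication uses a bearer token in the Authorization header.
    return Request(url, headers={"Authorization": f"Bearer {bearer_token}"})


# "YOUR_BEARER_TOKEN" is a placeholder, not a real credential.
request = build_recent_search_request("covid lang:en", "YOUR_BEARER_TOKEN")
```

The request could then be sent with `urllib.request.urlopen(request)` and the JSON body parsed with the `json` module; building the request separately keeps the sketch runnable without network access.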
After the data collection stage, I will preprocess the data and apply NLP techniques for further data and sentiment analysis of Twitter text, classification, and other insights.
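A minimal sketch of what the preprocessing step might look like, assuming a simple regex-based cleanup; the function name `clean_tweet` and the exact cleaning rules are illustrative choices, not the project's pipeline.

```python
import re

URL_RE = re.compile(r"https?://\S+")    # links such as https://t.co/...
MENTION_RE = re.compile(r"@\w+")        # user mentions such as @someone
NON_WORD_RE = re.compile(r"[^\w\s]")    # punctuation and other symbols


def clean_tweet(text):
    """Return a list of lowercase tokens from raw tweet text."""
    text = URL_RE.sub(" ", text)        # drop URLs
    text = MENTION_RE.sub(" ", text)    # drop @mentions
    text = text.replace("#", " ")       # keep the hashtag word, drop the symbol
    text = NON_WORD_RE.sub(" ", text)   # drop remaining punctuation
    return text.lower().split()         # lowercase and split on whitespace


tokens = clean_tweet("Loving #NLP! Thanks @user https://t.co/abc")
```

In Python 3, `\w` matches Unicode word characters, so Cantonese characters pass through this cleanup. Splitting on whitespace does not segment Chinese text into words, though, so Cantonese input would need a dedicated word segmenter on top of this.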
Past Projects
Group Machine Learning Project
I lead a group of students working on small and medium-sized machine learning and deep learning projects on real-world data sets, with ideas from kaggle.com.
We meet on Zoom every Monday evening for progress updates and tutorials, and we use Slack for group chat and other conversations.
Event and Location Based Twitter Text Classification
This is a project that I am currently working on. My goal is to analyze text on social media (in this case, Twitter) based on the user's location, classify the event the user is mentioning, and see whether there is any relationship among variables such as location, time, event, level of interest, and sentiment.
The project is currently in the data collection, data wrangling, and text pre-processing phase.
STAT694 Applied Research in Statistics Research Project
Worked from Aug 2020 - Dec 2020
This project uses NLP algorithms to analyze the effect of COVID-19 on Twitter text.
Click Here to See my Presentation!
STAT697 Issues in Statistics Project
Worked from Jan 2020 - Mar 2020
This project analyzes figures and trends for California K-12 schools using SAS and SQL. We used the official data set produced by the California Department of Education. It was a two-person team project. As the team lead, I was responsible for communicating with my partner and with the professor of the class. Since the course was taught entirely online, we used Google Meet and Slack for team communication and weekly meetings with the professor. Project Repo!
Lending Club Classification in R
Feb/Mar 2021
This Kaggle competition was part of the project in my STAT 653 course. I used a basic neural network to predict the variable loan_status, the repayment status of a person. In this project, I compared various machine learning algorithms that are widely used in classification, picked the one with the best performance on the sample training and testing data sets, and used it for prediction on the full testing data set. The model comparison code is located in the folder model_comparison, separated by model.
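The selection step described above can be sketched as follows. This is a Python illustration only (the project itself is in R); the toy data, the two candidate "models", and the helper names `accuracy` and `pick_best_model` are all hypothetical, and accuracy on a held-out sample is assumed as the comparison metric.

```python
def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(labels)


def pick_best_model(candidates, rows, labels):
    """Score every candidate on held-out data and return the best one."""
    scores = {
        name: accuracy(predict, rows, labels)
        for name, predict in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Hypothetical held-out sample: one numeric feature per row, binary label.
holdout_rows = [0.2, 0.4, 0.6, 0.9]
holdout_labels = [0, 0, 1, 1]

# Stand-ins for fitted classifiers; each maps a row to a predicted label.
candidates = {
    "always_zero": lambda x: 0,
    "threshold_0.5": lambda x: int(x > 0.5),
}

best_name, scores = pick_best_model(candidates, holdout_rows, holdout_labels)
```

The winning candidate would then be applied to the full testing data set to produce the final predictions.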
Titanic Survival Classification in R
Feb/Mar 2021
This is also a Kaggle competition. In this project, I compared various machine learning algorithms that are widely used in classification, picked the one with the best performance on the sample training and testing data sets, and used it for prediction on the full testing data set. The model comparison is in the report (rmd and pdf).
OpenCV Projects
My latest experiments with OpenCV in Python. They will include edge and feature detection, color conversion and detection, machine learning, deep learning, and more.
GitHub repo is currently under construction. Please check back later!