Education
University of Virginia
Bachelor of Science in Computer Science
Minor: Statistics
Aug 2018 - May 2022
Overall GPA: 3.95
Major-course GPA: 4.0
Awards
Degree Honors: with Highest Distinction
Scholarship: UVA International Student Office’s scholarship
Competition: Ranked 5/284 at the Tencent Advertising Algorithm Competition (TAAC) ACM Multimedia 2021 Grand Challenge - Multimodal Video Ads Tagging
Internship Experience
Duolingo
Software Engineer Intern
Skills:
Python, Swift, Objective-C, HTML, JavaScript, React, Jira, Jenkins, Xcode, Postman, Agile, Prometheus, Grafana, Git
Tencent
Software Engineer / Data Scientist Intern
Skills:
Python, SQL, pandas, NumPy, Matplotlib, Git, TensorFlow
Publications
Wu, X., Yang, F., Lin, X., Zhou, T. August 2021. Rethinking the Impacts of Overfitting and Feature Quality on Small-Scale Video Classification. 29th ACM International Conference on Multimedia
[Python, PyTorch, TensorFlow]
https://dl.acm.org/doi/10.1145/3474085.3479226
Alternative Link to Paper
Lin, X., Connors, J., Lim, C., Hott, J. R. March 2021. How Do Students Collaborate? Analyzing Group Choice in a Collaborative Learning Environment. 52nd ACM Technical Symposium on Computer Science Education (SIGCSE)
[Python, SQL, Gephi]
https://dl.acm.org/doi/10.1145/3408877.3432389
Alternative Link to Paper
Link to Video Presentation on YouTube
Research and Projects
Footprints Web App [Python, SQL, Django, Apache, HTML, JS, CSS, Linux]
- Developed a map platform with Django to pin, describe, group, and share locations, supporting topic-based, location-based, and keyword-based filtering.
- Built on Google Maps features, used AJAX for fast asynchronous updates, implemented OAuth for authentication, designed the database, hosted images on Apache, and managed deployment with CI/CD pipelines on Heroku.
Machine Learning and Deep Learning Algorithms [Python, PyTorch, TensorFlow, Keras, Scikit-learn, AWS]
- Implemented a multilayer perceptron with batch normalization and a convolutional neural network with max pooling for Street View House Numbers (SVHN) classification.
- Built a deep belief network (DBN) from Restricted Boltzmann Machines (RBMs) and trained it with Gibbs sampling.
- Implemented a natural language processing system to classify restaurant reviews, a neural network to classify images of alphabet letters, a named entity recognition (NER) model using hidden Markov models (HMMs) to classify nouns into categories, and Q-learning with linear function approximation to learn an optimal policy for a car stuck in a valley (the Mountain Car problem).
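As a rough illustration of the last item, here is a minimal sketch of semi-gradient Q-learning with a linear approximator, where Q(s, a) = w_a · φ(s). It assumes a small discrete action set (the standard Mountain Car task has three: push left, no push, push right) and a precomputed feature vector φ(s); the function names, learning rate, and discount factor are illustrative choices, not the original project's code.

```python
from typing import List, Sequence

Weights = List[List[float]]  # one weight vector per discrete action

def q_value(weights: Weights, features: Sequence[float], action: int) -> float:
    """Linear approximation: Q(s, a) = w_a . phi(s)."""
    return sum(w * f for w, f in zip(weights[action], features))

def greedy_action(weights: Weights, features: Sequence[float]) -> int:
    """Action with the highest estimated Q-value (ties go to the lowest index)."""
    return max(range(len(weights)), key=lambda a: q_value(weights, features, a))

def q_learning_update(weights: Weights, features: Sequence[float], action: int,
                      reward: float, next_features: Sequence[float], done: bool,
                      alpha: float = 0.1, gamma: float = 0.99) -> float:
    """One semi-gradient Q-learning step; updates `weights` in place, returns the TD error."""
    if done:
        target = reward  # no bootstrapping past a terminal state
    else:
        best_next = max(q_value(weights, next_features, a) for a in range(len(weights)))
        target = reward + gamma * best_next
    td_error = target - q_value(weights, features, action)
    # For a linear model, the gradient of Q(s, a) with respect to w_a is just phi(s).
    weights[action] = [w + alpha * td_error * f for w, f in zip(weights[action], features)]
    return td_error
```

In a full agent, actions would usually be chosen epsilon-greedily around `greedy_action`, and φ(s) would be built from the car's position and velocity (for example with tile coding); both are assumptions outside this sketch.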
Advanced AWS Auto-scaling [Python, Linux, AWS]
- Improved Amazon EC2 auto-scaling efficiency for TensorFlow image classification with three strategies: using Terraform for automatic AWS resource provisioning, designing a custom auto-scaling controller that collects metrics from running EC2 instances to make scaling decisions, and routing requests to event-driven serverless AWS Lambda functions.
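The decision step of a custom auto-scaling controller can be sketched as a target-tracking rule: given utilization samples from the running instances, choose a fleet size that moves average utilization toward a target. This is a hypothetical sketch; the policy fields, the 50% target, and the tolerance band are assumptions, and collecting metrics from EC2 and applying the decision through AWS are left out.

```python
import math
from dataclasses import dataclass
from typing import Sequence

@dataclass
class ScalingPolicy:
    target_utilization: float = 0.5  # hypothetical target average CPU utilization (0 to 1)
    tolerance: float = 0.1           # ignore deviations within 10% of the target to avoid flapping
    min_instances: int = 1
    max_instances: int = 10

def desired_instance_count(utilizations: Sequence[float], policy: ScalingPolicy) -> int:
    """Target-tracking rule: resize the fleet so average utilization moves toward the target."""
    current = len(utilizations)
    if current == 0:
        return policy.min_instances
    average = sum(utilizations) / current
    ratio = average / policy.target_utilization
    if abs(ratio - 1.0) <= policy.tolerance:
        desired = current  # close enough to the target: leave the fleet alone
    else:
        desired = math.ceil(current * ratio)
    # Clamp to the configured bounds.
    return max(policy.min_instances, min(policy.max_instances, desired))
```

For example, two instances at 75% utilization against a 50% target yield three instances, while four instances at 25% shrink to two.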
Distributed Bitcoin Miner [Golang]
- Implemented the Live Sequence Protocol (LSP) in Go to provide reliable client-server communication on top of UDP.
- Built a distributed Bitcoin miner on top of LSP that handles various failures and balances load efficiently across miners for scalability.
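The load-balancing idea can be sketched under a few assumptions: the server's job is to find the nonce with the smallest hash over a range, it splits that range into chunks, hands each chunk to the least-loaded miner, and merges the miners' answers by taking the minimum. The hash function here (truncated SHA-256) is a hypothetical stand-in, and the remote miner calls that would travel over LSP are simulated with local function calls.

```python
import hashlib
from typing import Dict, List, Tuple

Chunk = Tuple[int, int]  # inclusive nonce range (lower, upper)

def hash_value(message: str, nonce: int) -> int:
    """Hypothetical hash: first 8 bytes of SHA-256("<message> <nonce>") as an integer."""
    digest = hashlib.sha256(f"{message} {nonce}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def split_range(upper: int, chunk_size: int) -> List[Chunk]:
    """Split nonces [0, upper] into consecutive chunks of at most chunk_size."""
    return [(lo, min(lo + chunk_size - 1, upper)) for lo in range(0, upper + 1, chunk_size)]

def assign_chunks(chunks: List[Chunk], miners: List[str]) -> Dict[str, List[Chunk]]:
    """Give each chunk to the miner with the fewest nonces assigned so far."""
    load = {m: 0 for m in miners}
    plan: Dict[str, List[Chunk]] = {m: [] for m in miners}
    for lo, hi in chunks:
        miner = min(miners, key=lambda m: load[m])
        plan[miner].append((lo, hi))
        load[miner] += hi - lo + 1
    return plan

def mine_chunk(message: str, lo: int, hi: int) -> Tuple[int, int]:
    """One miner's work: the (hash, nonce) pair with the smallest hash in [lo, hi]."""
    return min((hash_value(message, n), n) for n in range(lo, hi + 1))

def mine(message: str, upper: int, miners: List[str], chunk_size: int = 100) -> Tuple[int, int]:
    plan = assign_chunks(split_range(upper, chunk_size), miners)
    # A real server would send each chunk to its miner over LSP (and reassign
    # chunks from miners that fail); here the results are computed locally.
    results = [mine_chunk(message, lo, hi) for chunks in plan.values() for lo, hi in chunks]
    return min(results)
```

Because the minimum is associative, the merged answer does not depend on which miner handled which chunk, which is what makes reassigning work from a failed miner safe.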
Multiuser Game Storage Backend [Golang]
- Architected, in Go, the backend storage system for a massively multiplayer text-based online game, with synchronization to keep game state consistent across geographically distributed servers.
- Facilitated coordination across the network and enabled high concurrency and availability by implementing distributed actors with mailboxes and message-passing RPC.
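The actor pattern in the second bullet can be sketched in a single process: each actor owns its state, receives messages through a mailbox, and handles them one at a time, so the state needs no locks. The original system was written in Go with RPC between servers; in this Python sketch a thread stands in for a remote actor, a one-slot reply queue stands in for the RPC response, and the `put`/`get` storage operations are hypothetical.

```python
import queue
import threading
from typing import Any, Dict

class StorageActor:
    """Hypothetical key-value actor: owns its state and processes one message at a time."""

    def __init__(self) -> None:
        self._mailbox: "queue.Queue[Dict[str, Any]]" = queue.Queue()
        self._store: Dict[str, Any] = {}  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message: Dict[str, Any]) -> None:
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def call(self, op: str, key: str, value: Any = None, timeout: float = 1.0) -> Any:
        """RPC-style request/reply built on top of message passing."""
        reply: "queue.Queue[Any]" = queue.Queue(maxsize=1)
        self.send({"op": op, "key": key, "value": value, "reply": reply})
        return reply.get(timeout=timeout)

    def stop(self) -> None:
        self.send({"op": "stop"})
        self._thread.join()

    def _run(self) -> None:
        while True:
            msg = self._mailbox.get()  # blocks until a message arrives
            if msg["op"] == "stop":
                return
            if msg["op"] == "put":
                self._store[msg["key"]] = msg["value"]
                result: Any = True
            elif msg["op"] == "get":
                result = self._store.get(msg["key"])
            else:
                result = None  # unknown operation
            if msg.get("reply") is not None:
                msg["reply"].put(result)
```

Serializing all access through the mailbox trades some throughput per actor for simplicity; concurrency comes from running many actors side by side.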
Raft Consensus Algorithm [Golang]
- Implemented vote-request RPC handlers and heartbeat mechanisms in Go to ensure safety and liveness during leader election.
- Implemented log replication to commit commands and keep servers consistent despite server and network failures.
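The two bullets map onto Raft's two RPCs. Below is a minimal follower-side sketch of the RequestVote and AppendEntries handlers (a heartbeat is an AppendEntries call with no entries). It is a Python illustration of the published Raft rules, not the original Go code; persistence, commit-index updates, and election timers are omitted, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class LogEntry:
    term: int
    command: str

@dataclass
class RaftState:
    current_term: int = 0
    voted_for: Optional[int] = None
    log: List[LogEntry] = field(default_factory=list)  # Raft index i lives at log[i - 1]

    def observe_term(self, term: int) -> None:
        """On seeing a newer term, adopt it and clear the vote cast in the old term."""
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None

    def last_log_term_and_index(self) -> Tuple[int, int]:
        return (self.log[-1].term, len(self.log)) if self.log else (0, 0)

def handle_request_vote(state: RaftState, term: int, candidate_id: int,
                        last_log_index: int, last_log_term: int) -> Tuple[int, bool]:
    """Return (current_term, vote_granted)."""
    if term < state.current_term:
        return state.current_term, False  # candidate is from a stale term
    state.observe_term(term)
    # Election restriction: the candidate's log must be at least as up to date
    # as ours (compare last log terms first, then log lengths).
    up_to_date = (last_log_term, last_log_index) >= state.last_log_term_and_index()
    if state.voted_for in (None, candidate_id) and up_to_date:
        state.voted_for = candidate_id  # real Raft persists this before replying
        return state.current_term, True
    return state.current_term, False

def handle_append_entries(state: RaftState, term: int, prev_log_index: int,
                          prev_log_term: int, entries: List[LogEntry]) -> Tuple[int, bool]:
    """Return (current_term, success); commit-index handling is omitted."""
    if term < state.current_term:
        return state.current_term, False
    state.observe_term(term)
    # Consistency check: our log must contain prev_log_index with prev_log_term.
    if prev_log_index > len(state.log):
        return state.current_term, False
    if prev_log_index > 0 and state.log[prev_log_index - 1].term != prev_log_term:
        return state.current_term, False
    for offset, entry in enumerate(entries):
        pos = prev_log_index + offset  # 0-based position for this entry
        if pos < len(state.log):
            if state.log[pos].term == entry.term:
                continue  # entry already present
            del state.log[pos:]  # conflict: drop it and everything after it
        state.log.append(entry)
    return state.current_term, True
```

Comparing `(term, index)` tuples implements Raft's "at least as up to date" rule directly, since a later last term wins and equal terms fall back to log length.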
Causal Inference Algorithm Toolkit [Python, PyTorch, TensorFlow, Keras, Scikit-learn]
- Developed a causal inference pipeline to automate causal effect estimation and analysis.
- Implemented machine learning and statistical models such as X-Learner, Dragonnet, and Counterfactual Regression.
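To show how one of these estimators works, here is a minimal X-Learner sketch. It fits outcome models per treatment group, imputes individual treatment effects, regresses those effects on the covariate, and blends the two effect models with a propensity score g(x) as τ(x) = g(x)·τ₀(x) + (1 − g(x))·τ₁(x). One-feature ordinary least squares is swapped in as the base learner purely for brevity, the propensity function is supplied by the caller, and both groups are assumed non-empty; the pipeline itself may have used different learners.

```python
from typing import Callable, Sequence

Model = Callable[[float], float]

def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Stand-in base learner: one-feature least squares, y ~ a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def x_learner(x: Sequence[float], t: Sequence[int], y: Sequence[float],
              propensity: Callable[[float], float]) -> Model:
    """Return an estimate of the conditional average treatment effect tau(x)."""
    x0 = [xi for xi, ti in zip(x, t) if ti == 0]
    y0 = [yi for yi, ti in zip(y, t) if ti == 0]
    x1 = [xi for xi, ti in zip(x, t) if ti == 1]
    y1 = [yi for yi, ti in zip(y, t) if ti == 1]
    # Stage 1: outcome models for the control and treated groups.
    mu0 = fit_linear(x0, y0)
    mu1 = fit_linear(x1, y1)
    # Stage 2: imputed individual effects, then regress them on x.
    d1 = [yi - mu0(xi) for xi, yi in zip(x1, y1)]  # treated: observed minus predicted control outcome
    d0 = [mu1(xi) - yi for xi, yi in zip(x0, y0)]  # control: predicted treated outcome minus observed
    tau1 = fit_linear(x1, d1)
    tau0 = fit_linear(x0, d0)
    # Stage 3: blend the two effect models using the propensity score g(x).
    return lambda xi: propensity(xi) * tau0(xi) + (1 - propensity(xi)) * tau1(xi)
```

On noise-free data generated as y = 2x + 3t, both imputed effects equal 3 everywhere, so the estimate recovers a constant effect of 3.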
Software Development: Friend Matching [SQL, Django, Travis CI, Heroku, HTML, JS, CSS]
- Developed a server-side-rendered web application that matches users based on similarity. Implemented an authentication system restricted to institutional accounts, a matching algorithm, a profile display board, and a chat channel.
- Designed the database and managed it through modeling, migrations, and SQL queries. Ran unit tests with Travis CI and deployed the application on Heroku.
Machine Learning: Video Information Extraction [Python, PyTorch, TensorFlow] (refer to the Publications section)
https://doi.org/10.1145/3474085.3479226
- Designed algorithms to increase video labeling accuracy from 69% to 82%.
- Improved the baseline model's performance by enhancing feature quality and alleviating overfitting. Strategies included image feature extraction with Pyramid EfficientNet, inserting Word2Vec embeddings of ASR (automatic speech recognition) text into the image stream, temporal shifting, data augmentation, random truncation, random dropout, random noise, and k-fold training.
- Ranked 5/284 in the Tencent Advertising Algorithm Competition (TAAC) ACM Multimedia 2021 Grand Challenge – Multimodal Video Ads Tagging.
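Three of the regularization strategies listed above (random truncation, random dropout, random noise) can be sketched as augmentations on a video's frame-level features. The sketch assumes each clip is a non-empty list of per-frame feature vectors and interprets "random dropout" as dropping whole frames; the ratios, probabilities, and noise scale are hypothetical, not the competition settings.

```python
import random
from typing import List

Frames = List[List[float]]  # one feature vector per video frame (assumed non-empty)

def random_truncate(frames: Frames, rng: random.Random, min_ratio: float = 0.8) -> Frames:
    """Keep a random contiguous window covering at least min_ratio of the clip."""
    n = len(frames)
    length = rng.randint(max(1, int(n * min_ratio)), n)
    start = rng.randint(0, n - length)
    return frames[start:start + length]

def random_frame_dropout(frames: Frames, rng: random.Random, p: float = 0.1) -> Frames:
    """Drop each frame independently with probability p, but never drop every frame."""
    kept = [f for f in frames if rng.random() >= p]
    return kept or [rng.choice(frames)]

def add_gaussian_noise(frames: Frames, rng: random.Random, sigma: float = 0.01) -> Frames:
    """Perturb every feature value with zero-mean Gaussian noise."""
    return [[v + rng.gauss(0.0, sigma) for v in frame] for frame in frames]

def augment(frames: Frames, seed: int = 0) -> Frames:
    """Apply truncation, then frame dropout, then feature noise, with a seeded RNG."""
    rng = random.Random(seed)
    return add_gaussian_noise(random_frame_dropout(random_truncate(frames, rng), rng), rng)
```

Passing an explicit `random.Random` keeps each augmentation reproducible, which matters when comparing runs across k-fold splits.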
Machine Learning Music Generation [Python, Magenta, TensorFlow]
- Implemented a Q-learning agent for music generation that transforms any given monophonic melody into a two-line polyphonic one. Extracted music notes, defined the Q-learning states, and designed a reward function based on music theory.
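A reward function based on music theory might look like the hypothetical sketch below, which scores the harmony note chosen under each melody note. It assumes notes are MIDI numbers, rewards imperfect consonances (thirds and sixths) most, gives perfect consonances (unisons, octaves, fifths) a smaller reward, treats other intervals as dissonant, and penalizes parallel fifths and octaves. The weights are arbitrary, and this is not the project's actual reward function.

```python
# Intervals are measured in semitones and reduced modulo an octave (12 semitones).
PERFECT_CONSONANCES = {0, 7}          # unison/octave, perfect fifth
IMPERFECT_CONSONANCES = {3, 4, 8, 9}  # minor/major thirds and sixths

def interval_class(melody_note: int, harmony_note: int) -> int:
    return abs(melody_note - harmony_note) % 12

def harmony_reward(prev_melody: int, prev_harmony: int,
                   melody: int, harmony: int) -> float:
    """Hypothetical reward for placing `harmony` under `melody` (MIDI note numbers)."""
    interval = interval_class(melody, harmony)
    if interval in IMPERFECT_CONSONANCES:
        reward = 1.0
    elif interval in PERFECT_CONSONANCES:
        reward = 0.5
    else:
        reward = -1.0  # seconds, fourths, tritone, and sevenths treated as dissonant
    # Penalize parallel fifths/octaves: the same perfect interval approached
    # with both voices moving in the same direction.
    prev_interval = interval_class(prev_melody, prev_harmony)
    same_direction = (melody - prev_melody) * (harmony - prev_harmony) > 0
    if interval in PERFECT_CONSONANCES and interval == prev_interval and same_direction:
        reward -= 2.0
    return reward
```

In a Q-learning setup, this value would be the reward for the action "choose this harmony note" in a state built from the current and previous notes.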
Natural Language Processing: Adversarial Attacks and Model Training [Python, Transformers, Hugging Face, PyTorch, TensorFlow]
github.com/QData/TextAttack
Documentation
- Trained Chinese and Korean adversarial models, designed a perturbation strategy, and added part-of-speech and cosine similarity constraints to perform adversarial attacks, which lowered the target models' success rate by 25% on average.
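The two constraints can be illustrated by a filter that keeps only the substitution candidates that share the original word's part of speech and stay close to it in embedding space. This is a hypothetical sketch: TextAttack provides its own constraint classes, which this code neither uses nor reproduces, and the toy embeddings and context-free POS lookup below stand in for a real embedding model and tagger.

```python
import math
from typing import Dict, List, Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def allowed_substitutions(word: str, candidates: List[str],
                          embeddings: Dict[str, Sequence[float]],
                          pos_tags: Dict[str, str],
                          min_cos_sim: float = 0.8) -> List[str]:
    """Keep candidates with the same part of speech as `word` whose embedding has
    cosine similarity >= min_cos_sim with the original word's embedding."""
    if word not in embeddings:
        return []
    original = embeddings[word]
    return [
        c for c in candidates
        if c in embeddings
        and pos_tags.get(c) == pos_tags.get(word)
        and cosine_similarity(original, embeddings[c]) >= min_cos_sim
    ]
```

With toy vectors where "great" lies near "good", "bad" points the opposite way, and "run" is close but tagged as a verb, only "great" survives both constraints.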
Data Pipeline for Student Collaboration Investigation [Python, SQL, Gephi] (refer to the Publications section)
https://doi.org/10.1145/3408877.3432389
- Designed a data analysis pipeline, investigating raw data to determine the optimal study-group size and factors that affected how students collaborate. Used Excel and Gephi to visualize data interactions and results.
- Led group discussions to design research questions, distributed tasks, and organized timelines.
Selected Project Demos
Music Generation
Convert a one-line monophonic melody into a two-line polyphonic melody using Q-learning.
Image and Animation Generation
Contains images and GIFs generated with rasterization, ray tracing, and scene graph techniques.
Friend Finder Web App
Create an account and log in to meet friends, view similarity scores, and talk to them!
Note: requires UVA login. Not clickable now because this project is archived.
FootPrint Web App
Use this app to pin, describe, group, and share locations!
Note: not clickable now because this project is archived.
Disney Movie Database
Use this application to look up and filter Disney movie information!
Note: not clickable now because this project is archived.
Data Visualization with R
Machine Learning
My artwork
Skills
Programming and Data Analysis
- Python, Java, SQL, TensorFlow, PyTorch, Keras, R, SAS, C, C++, MATLAB, LaTeX, Excel, HTML, CSS, GitHub, Django, Heroku, Linux and Bootstrap
Communication and Presentation
- Chinese, English and Spanish
Computer Aided Design
- Inventor, iMovie, Photoshop
Citations
Below is a list of picture sources I used throughout this website.
https://www.visualpharm.com/free-icons/music-595b40b65ba036ed117d150av
https://blogs.sas.com/content/sascom/2014/12/05/data-visualization-first-prepare-your-data/#prettyPhoto
https://www.forbes.com/sites/tomtaulli/2019/03/02/what-you-need-to-know-about-machine-learning/?sh=22e691a82fe4
https://www.economist.com/united-states/2018/05/10/faced-with-a-housing-crisis-california-could-further-restrict-supply
https://pngtree.com/freepng/continuous-drawing-line-playing-the-piano_4175018.html?share=3