San Wang

Direction is more important than speed

Email LinkedIn GitHub

Currently

Senior Data Scientist & Product Owner in Healthcare

Education

2016 - 2018 George Washington University

  • M.S. in Data Science

2011 - 2015 Sichuan University

  • B.S. in Mathematics, concentration in Statistics

Publications

  • Wang S, Han J, Jung SY, et al. Development and implementation of patient-level prediction models of end-stage renal disease for type 2 diabetes patients using fast healthcare interoperability resources. Sci Rep. 2022;12(1):11232. Published 2022 Jul 4. doi:10.1038/s41598-022-15036-6

  • Koker TE, Chintapalli SS, Wang S, et al. On Identification and Retrieval of Near-Duplicate Biological Images: a New Dataset and Protocol. Published online January 10, 2021. doi:10.1109/icpr48806.2021.9412849

Work

2019 - present Senior Data Scientist & Product Owner, Enolink

2019 Computer Vision Research Associate, Harvard Medical School

2017 NLP Research Assistant, George Washington University

Projects

Fashion Product Discovery App [CV/NLP]

  • Equipped the Glancer app with machine learning solutions for indexing products, generating titles for mobile display, and surfacing trending new products
  • Developed product tagging pipelines that use multimodal learning to analyze product images and description text
  • Improved tagging performance by ensembling deep learning models with a keyword-matching agent

Movie Recommendation System [MySQL/Spark/Flask]
Link to Demo

  • Built a Flask demo backed by MySQL and integrated recommendation models to provide live movie recommendations
  • Created ETL pipelines for online analytical processing (OLAP) with Spark SQL to analyze user behavior and trending patterns
  • Trained collaborative filtering (CF) models for personalized recommendation using ALS matrix factorization from Spark MLlib, and provided a user-based CF model to handle item cold start

Product Demand Analysis [CV/NLP/ML]

  • Predicted product demand for an online ads platform by analyzing unstructured data (titles and descriptions in Russian, images) and structured data (price, category, location, and time) for users and products, using VGG16, Random Forest, LightGBM, and logistic regression

Master's Capstone, German Traffic Sign Classification [Caffe/TensorFlow]
Blog for Caffe
Blog for TensorFlow

  • Classified 39K traffic sign images into 43 categories using convolutional neural networks (CNNs) and benchmarked results across two deep learning frameworks (TensorFlow and Caffe)
  • Analyzed CNN models by building custom TensorBoard visualizations that show how kernels change during training

Bachelor's Thesis, Treatment Effect Prediction [SAS/SPSS]

  • Applied a logistic regression model to predict the effect of NIPPV treatment for patients with respiratory failure, using SPSS and SAS

Tableau Portfolio

Link to Tableau profile

...

Honors and Awards

05/2015 Outstanding Undergraduate Thesis (5%)
01/2015 Excellent Volunteer of SK Sunny Undergraduates Volunteer Service
11/2014 Second Class Scholarship
05/2013 Active Volunteer in Love Passing Voluntary Service of SCU
04/2013 Second Prize in Undergraduates Tennis Championship of SCU

Conference

Cambridge, 2019, 2020 Women in Data Science (WiDS) Cambridge
Ann Arbor, 08/08/2019-08/10/2019 Machine Learning for Healthcare
DC, 10/09/2017 DevFest DC 2017
DC, 05/15/2017-05/17/2017 Know Identity Conference
DC, 05/05/2017-05/06/2017 DevFest DC 2017
DC, 12/03/2016 GW DATA Data Driven Insights Conference: Extract, Transform, Learn
DC, 11/29/2016 Exploring some of the latest and greatest tools in Data Science
DC, 09/28/2016 Data Transparency 2016 with Open Data Innovation Summit
DC, 06/30/2016 ATARC Federal Big Data Summit
DC, 03/04/2016-03/05/2016 Open Data Day DC 2016

Community Involvement

Boston, 2021-2024 Athlete, CYPN STORM Dragon Boat Club

  • 2023 Rhode Island Race: Mixed Division 3rd Place
  • 2023 Boston Dragon Boat Festival: Club Division 2nd Place; A Major Division 3rd Place
  • 2022 Rhode Island Race (Captain): Mixed Division 1st Place
  • 2022 Mercer GWN Race: Sport Women Division 3rd Place
  • 2022 Riverfront Hartford Race: A Division 2nd Place
  • 2022 IDBF 13th Club Crew World Championship: Premier Mixed Division Participant
  • 2022 Boston Dragon Boat Festival: Club Division 1st Place; Women Division 1st Place; A Major Division 3rd Place
  • 2021 USDBF Club Crew National Championships: Premier Mixed Division Participant
  • 2021 Mercer GWN Race: Sport Mixed Division 2nd Place
  • 2021 ERDBA Regional Championship: Premier Mixed Division 2nd Place

New Orleans, LA, 03/2017 Habitat for Humanity
Arlington, VA, 10/2016 Marine Corps Marathon
Washington Monument Grounds, DC, 06/2016 Moving Day DC (National Parkinson Fundraising)
DC, 04/2016 DC Central Kitchen
Chengdu, China, 03/2014-06/2014 SK Sunny Undergraduates Volunteer Service
Chengdu, China, 05/2013 Love Passing Voluntary Service
Chengdu, China, 04/2013 Ya’an Earthquake Volunteer
Chengdu, China, 04/2013-05/2013 The Love-Package Volunteer Service