CV
Education
- B.S. in Physics, Georgia Tech, 2016
- M.S. in Informatics, Indiana University, 2019
- Ph.D. in Informatics - Complex Systems, Indiana University, 2022 (expected)
Work experience
- 2021: Data Scientist
- GeniusMesh
- Duties included: Data science, NLP, recommender systems. Worked with the development team to deploy a client-facing dashboard to production.
- 2019-present, 2016-2017: Assistant Instructor
- Indiana University. See Teaching Experience section below
- 2021-present, 2017-2019: Research Assistant
- Indiana University
- Duties included: Modeling and simulating online social systems to forecast user activity, spread of information, and market manipulation (2017-2019). Analyzing social media data and UK Parliamentary data (2021-present).
- Supervisors: Fil Menczer 2017-2019, John Bryden 2021-present
- Summer 2014, Summer 2015: Radiological Surveyor and CAD Technician
- USA Environment LP
- Duties included: Assessing land for possible radiological contamination. Making and maintaining hardware packages to perform these tasks. Making maps of surveyed land and 3D models of future excavation plans.
Ongoing Research
- Measuring memetic impacts of interacting with social media content
- Measuring changes in topical attention (using spaCy models)
- Measuring changes in sentiment toward topics (using transformer models)
- Processing a ~20TB json.gz dataset of Twitter data
- COVID vaccine hesitancy
- Collecting social media data on COVID vaccine hesitancy
- Analyzing its relationship to public health behaviors and outcomes
- Classifying tweets as Antivax or not, examining relationship between antivax content and vaccine uptake in the USA
- UK Parliament and Social Media
- Analyzing Facebook, Twitter, New Media, and UK Parliamentary speech data.
- Measuring impacts of media sources on UK Parliamentary discourse
Skills
- Data Science and Machine Learning
- Python fluency. pandas, scikit-learn, spaCy, etc.
- Classical machine learning. Clustering, classification, regression, collaborative filtering, anomaly detection, feature engineering, etc.
- Natural Language Processing (NLP). Text cleaning, NER and keyword extraction, targeted sentiment analysis, text classification and embedding, etc. Efficient text labeling with Prodigy.
- Deep learning. PyTorch, HuggingFace Transformers. Mainly NLP, some experience in vision.
- Big Data. PySpark, Koalas.
- High dimensional data methods. Dimensionality reduction, manifold learning, graph embeddings, etc.
- Network science. Community detection, diffusion modeling, graph characterization, etc.
- Signal processing. Denoising, source separation.
- Time Series. ARMAX, VAR, and related models.
- Point process models. Hawkes processes, etc.
- Causal inference. Granger causal modeling for time series and point processes.
- Geospatial data science. Spatial regression, clustering, etc.
- Model selection and model validation. Feature ablation studies.
- Data visualization. Seaborn, matplotlib, Altair, plotly.
- High Performance Computing (HPC) for data science and simulation.
- MATLAB, Mathematica, R. Non-expert, but some experience.
- SQL, MongoDB, Neo4j
- Unix systems, bash, etc.
- Git fluency.
- Modeling and Simulation
- Focus especially on complex systems and network systems, e.g. social systems, ecological systems, economic networks.
- Agent based models, Bayesian models, machine learning models, point process models.
- Large scale simulation and HPC
- Model validation
- Communications
- Technical writing. Academic publication, grants, public-facing reports, etc.
- Interfacing with clients/funding agencies
- Presentations, demos, etc.
Publications
Conference Presentations
- “Forecasting Vaccine Refusal Rates by Modeling Social Influence,” NetSci, Full Talk, September 2020.
- “The empirical limits of prediction of microscopic dynamics of online conversation,” NetSci, e-Poster Session, September 2020.
- “Forecasting Vaccine Refusal Rates by Modeling Social Influence,” IC2S2, e-Poster Session, June 2020.
- “The empirical limits of prediction of microscopic dynamics of online conversation,” IC2S2, e-Poster Session, June 2020.
Other Projects
- Comparison of spectrogram generation methods for audio classification using CNN classifiers, 2019.
- Used CNN classifiers (i.e. ResNet) to classify audio spectrograms
- Generated and compared audio spectrograms
- Forecasting Yellowstone Visitor Traffic, 2019.
- Used ARMAX-like methods to forecast Yellowstone National Park visitor traffic
- Performed feature ablation study on rich feature set
- Predicting Academic Citations with Collaborative Filtering, 2018.
- Used collaborative filtering methods to forecast academic citations
- Radon Decay Chain, 2015.
- Modeled concentrations of radioactive decay products over time for risk assessments
- Modeling Gravitational Wave Detection, 2014.
- Used MCMC methods to fit model parameters for gravitational wave detection events
- Observations of the Faraday Instability, 2014.
- Used particle image velocimetry to measure internal fluid flows of Faraday waves
- Wrote related software tools to enable data processing
Teaching Experience
- 2019-2021: Assistant Instructor for I-453, an ethics of technology/computing course.
- Helped design curricula, including assignments, tests, and lectures; occasionally lectured; graded coursework.
- 2016-2017: Assistant Instructor for I-201, a discrete math course.
- Managed lab sessions and taught course material; graded coursework.
References
Contact me for a current list of references.