Spring Semester Undergraduate Course at UVA
This course provides an introduction to a fundamental aspect of data science and engineering: working with data. Students learn to obtain, manipulate, store, and analyze data efficiently and effectively (i.e., convert data to information) to support decision making and future data modeling efforts (e.g., regression, data mining, machine learning). The emphasis is on obtaining, cleaning, combining, and wrangling data into a more usable form. Students learn how to break up a large data set into manageable pieces and then use a variety of quantitative and visual tools to summarize and extract information from it. The challenges of big data (e.g., size, streaming data, mixed variable types) are addressed throughout the course. Because the course is introductory, the focus is on understanding basic concepts and how to implement them in R, a leading data science language.
Introduction to Data Science and Engineering
Data Collection
Getting to Know Your Data
Data Types
Basic Statistical Descriptions of Data
Data Visualization
Measuring Data Similarity and Dissimilarity
Data Preprocessing
Data Cleaning
Data Integration
Data Reduction and Transformation
Dimensionality Reduction
The COVID-19 vaccine rollout has differed across the US. This group explored the effects of income per capita, political party, and population on a state's vaccine rollout.
An analysis of the five most important player statistics in the NBA and their effect on win/loss outcomes in games.
This project analyzed how societal factors such as geography, legalization, age, family structure, education, and employment correspond with substance use, abuse, and recovery.
Sports betting has recently become legal in some states and remains illegal in others. This project explored over/under and spread bets and how weather and game status affect these metrics.
This study investigates socioeconomic disparities in Virginia, specifically looking at educational attainment, employment, income, poverty, and degree of urbanization.
Influencers, advertisers, and companies that want to promote their products effectively on social media need to understand how users interact with them. This project analyzed how a company can optimize the popularity and engagement of its posts on Facebook.
This project explores how race, through systemic racial bias, impacts different aspects of American society.
This analysis addresses the various factors that have influenced the spread of COVID-19 across the US since early 2020. While there are numerous underlying factors, the focus of this analysis is on vaccines, mask policies, variants, and change of virus “hotspots” over time, as well as a comparison of the spread of COVID-19 in the USA to other countries and regions of the world.
Is there an optimal way to distribute money to build the most successful athletic team? This group took a closer look at whether spending more money means more success for sports teams.
Mental health is a pressing subject that can deeply affect anyone, anywhere. This project explores location, housing, demographics, COVID-19, and technology industries, observing how they affect mental health.
This group's analysis investigates cybersecurity trends to detect possible vulnerabilities that must be attended to when designing and implementing new cybersecurity measures.
Rising temperatures have created concerns among the scientific community about rising sea levels and how they will affect communities and infrastructure.
This project analyzed ocean level rise, gross domestic product, disastrous weather, Arctic ice concentration, and crop yields.
As COVID-19 began to spread throughout the world in March 2020, this group created a live coronavirus tracker that focuses on state testing data to show how states were affected differently from one another.
This group created a Twitter word cloud to visually represent and understand trends in topics that are relevant to users in specific locations.
This group used the expansive data on Twitter to analyze trends in topics relevant to users in specific locations around the United States.
Because the coronavirus pandemic has drastically affected the US economy, this project attempts to discover underlying relationships between various sectors of the stock market and coronavirus data, both in the United States and globally.
This project characterizes the behavior patterns of anonymized UVA undergraduate students, including movement, social communication, and activities, from Aware data and identifies how those patterns relate to the students' productivity and wellbeing levels.
In this study, the team sought to research and analyze crime data in their town of Charlottesville, VA by creating an interactive map of a dataset of crime from Charlottesville Open Data.