Victor Yuan

rbiotechsalary

Published

May 15, 2024

In 2022 I pulled the salary survey data from Reddit’s r/biotech to inform my job search. The data needed extensive cleaning, but it ultimately proved very useful to me as a new graduate looking for a job. Over time, I have used this dataset to explore software, data visualization, and data science methods out of personal interest. I’ve shared some of these explorations here.

The raw data is a live Google Sheets spreadsheet connected to a Google Form survey. The spreadsheet is pulled automatically every week using GitHub Actions, and an ETL script (source) publishes the dataset to GitHub as a flat file (CSV). The ETL script is a Quarto markdown document, and its rendered output is also published here, which makes it convenient to examine each step of the ETL pipeline in detail.
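To give a rough idea of what one cleaning step in a pipeline like this can look like, here is a minimal sketch in Python. It is not the project's actual ETL script (which is a Quarto document); the column names `title` and `salary`, the accepted salary formats, and the helper names `parse_salary`, `clean_rows`, and `to_flat_file` are all assumptions made for illustration.

```python
import csv
import io
import re


def parse_salary(raw):
    """Convert free text such as '$95,000' or '95k' to a float, or None if unparseable."""
    text = raw.strip().lower().replace(",", "").replace("$", "")
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(k?)", text)
    if match is None:
        return None
    value = float(match.group(1))
    # A trailing 'k' means thousands.
    return value * 1000 if match.group(2) else value


def clean_rows(raw_csv_text):
    """Read raw survey CSV text and keep only rows whose salary can be parsed.

    The 'title' and 'salary' columns are hypothetical, not the survey's real schema.
    """
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    cleaned = []
    for row in reader:
        salary = parse_salary(row["salary"])
        if salary is None:
            continue  # drop responses with an unusable salary
        cleaned.append({"title": row["title"].strip(), "salary": salary})
    return cleaned


def to_flat_file(rows):
    """Serialize cleaned rows back to CSV text, ready to be written out as a flat file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "salary"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

In a scheduled job, the raw text would come from the downloaded spreadsheet export and the result of `to_flat_file` would be committed to the repository; both of those steps are left out here.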

To explore the data, I built a Shiny app, deployed as a Docker container on a DigitalOcean droplet. The app reads the data from the flat file hosted on GitHub and provides interactive controls to filter the data and examine salaries and other survey responses.
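The filter-then-summarize logic behind interactive controls like these can be sketched in a few lines. The real app is written with Shiny, so the Python below is only an illustration of the idea; the row fields `title` and `salary` and the function names `filter_rows` and `median_salary` are hypothetical.

```python
import statistics


def filter_rows(rows, title=None, min_salary=None):
    """Return the rows matching the selected filters; a filter set to None is ignored."""
    selected = []
    for row in rows:
        if title is not None and row["title"] != title:
            continue
        if min_salary is not None and row["salary"] < min_salary:
            continue
        selected.append(row)
    return selected


def median_salary(rows):
    """Median salary of the given rows, or None when no rows remain after filtering."""
    if not rows:
        return None
    return statistics.median(row["salary"] for row in rows)


# Example survey rows (made-up values for illustration only).
example_rows = [
    {"title": "Scientist", "salary": 95000.0},
    {"title": "Scientist", "salary": 105000.0},
    {"title": "Research Associate", "salary": 70000.0},
]
scientists = filter_rows(example_rows, title="Scientist")
print(median_salary(scientists))
```

Each control in the app would map to one optional argument, so leaving a control unset simply skips that filter.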

I have also used this dataset to explore building dashboards with Observable JS.