Colleges love to send advertising emails, so much so that they clutter my inbox. This project analyzes that spam and looks at some interesting trends in the college-related emails I've received over the past year.
/client
A Svelte frontend, hosted on Netlify, that displays the statistics.
/scripts
A number of Node.js scripts that parse emails and gather college data. They are designed to run through GitHub Actions but can also be run locally.
/scripts/index.js
Runs every script, from downloading the emails to generating the statistics.
/scripts/utils
A set of utilities used to download and parse the data found in data.
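As an illustration of the kind of parsing these utilities perform, here is a minimal sketch of pulling the sender's domain out of an email's From header. The function name senderDomain and the header formats it handles are assumptions for illustration, not the project's actual utility code.

```javascript
// Hypothetical helper (not from the project): extract the sender's domain
// from a raw From header. Handles "Name <user@host>" and bare "user@host".
function senderDomain(fromHeader) {
  // Prefer the address inside angle brackets if one is present.
  const bracketed = fromHeader.match(/<([^>]+)>/);
  const address = (bracketed ? bracketed[1] : fromHeader).trim();
  const at = address.lastIndexOf("@");
  if (at === -1) return null; // no usable email address in the header
  // Domain names are case-insensitive, so normalize to lowercase.
  return address.slice(at + 1).toLowerCase();
}

console.log(senderDomain("Example University <Admissions@Mail.Example.edu>")); // mail.example.edu
console.log(senderDomain("info@college.example.org")); // college.example.org
console.log(senderDomain("Undisclosed recipients")); // null
```

Lowercasing keeps later lookups consistent, since the same college may send from "Example.edu" and "example.edu".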
To create the same type of visualization locally for your own emails, follow these steps.
1. Clone the repository: git clone https://github.com/louismeunier/college-emails.git
2. Delete the existing client/src/data.json, client/src/dates.json, and client/src/updated.json.
3. Install dependencies: cd client && yarn && cd ../scripts && yarn
4. Place your credentials file at scripts/credentials.json.
5. From the repository root, run node scripts. If your setup was done correctly, it should prompt you to visit a URL and authenticate, and then save a file at scripts/token.json.
6. Run node scripts a second time. It should now actually run the program, regularly printing progress to the screen.
7. When it finishes, it should have created client/src/dev_data.json, client/src/dev_dates.json, and client/src/dev_updated.json. Remove the dev_ prefix from each of these filenames.
8. Start the frontend: cd client && yarn dev

The dataset of college websites, names, locations, etc. used by this project was found here.
Because emails are linked to their respective college via the sender's domain name, some emails cannot be linked to any college and are therefore excluded from the final statistics. These account for only about 2% of the emails parsed per run.
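A rough sketch of domain-based linking shows why some emails fall through. Each sender domain is looked up against hostnames derived from the dataset's website field, also trying parent domains so that a subdomain like mail.example.edu matches example.edu. The record shape ({ name, website }), the parent-domain rule, and the names hostnameOf and linkEmails are assumptions for illustration; the project's real matching may differ. In this sketch, senders whose domain belongs to no listed college, such as a third-party mailing service, count toward the unmatched share.

```javascript
// Hypothetical: normalize a dataset website ("http://www.example.edu",
// "sample.edu", ...) to a bare hostname. new URL throws on malformed entries.
function hostnameOf(website) {
  const withScheme = /^https?:\/\//i.test(website) ? website : `https://${website}`;
  return new URL(withScheme).hostname.toLowerCase().replace(/^www\./, "");
}

// Hypothetical: count emails per college and report the unmatched share.
function linkEmails(senderDomains, colleges) {
  const byHost = new Map(colleges.map((c) => [hostnameOf(c.website), c.name]));
  const counts = new Map();
  let unmatched = 0;
  for (const domain of senderDomains) {
    // Try the full domain, then each parent: mail.example.edu -> example.edu.
    // The bare top-level label ("edu") is never tried on its own.
    const labels = domain.toLowerCase().split(".");
    let name;
    for (let i = 0; i < labels.length - 1 && name === undefined; i++) {
      name = byHost.get(labels.slice(i).join("."));
    }
    if (name === undefined) unmatched++;
    else counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  const unmatchedShare = senderDomains.length ? unmatched / senderDomains.length : 0;
  return { counts, unmatchedShare };
}

const colleges = [
  { name: "Example University", website: "http://www.example.edu" },
  { name: "Sample College", website: "sample.edu" },
];
const result = linkEmails(
  ["mail.example.edu", "example.edu", "sample.edu", "newsletter.vendor.com"],
  colleges
);
console.log(Object.fromEntries(result.counts)); // { 'Example University': 2, 'Sample College': 1 }
console.log(result.unmatchedShare); // 0.25
```

Matching parent domains recovers mail sent from college subdomains, but it cannot help when a college mails from an unrelated domain, which is exactly the case that ends up excluded.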