Web Scraping Employee Happiness

Introduction

If you are in the finance space, you are surely already quite aware of environmental, social and governance (ESG) implementation. If you are not, it is an incredibly exciting and relatively new development that aims to add historically qualitative factors to financial analysis. For this blog post, I will be diving into the popular ESG-related concept of employee happiness. My personal value-driven and long-term-focused investment philosophy aligns well with the fundamentals of ESG, and therefore I would like to explore the merit of some ESG metrics that might work well for a trading bot down the line. I will start this exploration with quantifying employee happiness.

Quantifying Employee Happiness

Goal:

  • Input: list of stock company names
  • Output: same list of companies ranked in order of happiest employees to unhappiest employees based on a composite score

The resulting table in this analysis can be used in a variety of ways. It can be a tool to filter companies for investment, employment, or partnership. For my investment purposes, I will most likely be using the results to aid in a company filtering process.

Methodology

Unlike my previous post on the moving average strategy, this task required a lot of assumptions. Employee happiness cannot be boiled down to a single metric. Yet, for this project, that is exactly what I am attempting to do, so the resulting composite score rests on many assumptions.

Companies

To simplify this process, I used the first 70 companies listed in alphabetical order from the S&P 500. I wanted enough results to make a solid case for comparison, but, given the time delays I built into the scraper, I didn’t want it to take forever. That said, these scrapers can be used to get data on any company as long as the sites used have available data. Here’s the code I used to grab the company names:

import requests
from bs4 import BeautifulSoup

# Fetch the Wikipedia list of S&P 500 constituents
page = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = BeautifulSoup(page.content, 'html.parser')
table = soup.find(id='constituents').tbody
table_rows = table.find_all('tr')
companies = []
# Skip the header row and keep the first 70 companies
for row in table_rows[1:71]:
    elements = row.find_all('td')
    company = elements[1].text.strip()
    companies.append(company)

Target Sites

For anybody familiar with employee review sites, Glassdoor is widely considered the market leader. The strength of its offerings, however, comes in the form of text-based employee reviews. Other sites provide far more numerical data for the companies in question. I found that Indeed and Comparably both offer extensive numerical data about a wide range of companies, so I went with those two options. Down the line, I may very well use text-based reviews to complement this analysis. If you are interested in web scraping Glassdoor, the sign-in pop-up is the main challenge. Within this project notebook, I have a commented-out Glassdoor scraper that can bypass this issue.

Calculations

Here is where the largest assumptions had to be made. There were two issues to overcome: (1) For each company I was scraping about 20 metrics. To condense these metrics into a single composite score, I grouped them into 5 categories (company culture, company opportunity, company perks and benefits, company executive team and company employee treatment). Each category was made up of 3 or more of the original metrics, which I weighted equally to produce the category score. To get the final composite score, all 5 categories were equally weighted as well. Therefore, the composite score runs on the assumption that employee happiness can be summed up by equally weighting those five categories. For the purpose of this exploration project, that calculation will have to do. (2) Not all metrics were available for all companies. To address this problem, the final calculations reflect only the available metrics. Therefore, the final composite scores are not necessarily made up of the same underlying metrics.
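To make the two-level equal weighting concrete, here is a minimal sketch of how such a score could be computed. It is not the notebook's actual code: the metric names in CATEGORIES are hypothetical placeholders, and it assumes every metric has already been normalized to a common scale (for example 0 to 100). Missing metrics are simply skipped, and a category with no available metrics is left out of the composite.

```python
from statistics import mean

# Hypothetical mapping of categories to metric names. These names are
# placeholders, not the actual fields scraped from Indeed or Comparably.
CATEGORIES = {
    "culture": ["culture_score", "work_life_balance", "environment"],
    "opportunity": ["career_growth", "professional_development", "retention"],
    "perks_and_benefits": ["compensation", "perks", "benefits"],
    "executive_team": ["ceo_rating", "leadership", "manager_rating"],
    "employee_treatment": ["diversity", "gender_score", "happiness"],
}


def category_scores(metrics):
    """Average the available metrics in each category (equal weights).

    `metrics` maps metric name -> number; a missing key or None means the
    metric was not available. A category with no data scores None.
    """
    scores = {}
    for category, names in CATEGORIES.items():
        available = [metrics[n] for n in names if metrics.get(n) is not None]
        scores[category] = mean(available) if available else None
    return scores


def composite_score(metrics):
    """Equally weight the categories that have a score; None if none do."""
    available = [s for s in category_scores(metrics).values() if s is not None]
    return mean(available) if available else None


def rank_companies(data):
    """Return (company, score) pairs sorted from happiest to unhappiest.

    Companies with no usable metrics are dropped from the ranking.
    """
    scored = [(name, composite_score(m)) for name, m in data.items()]
    scored = [(name, s) for name, s in scored if s is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because only available values are averaged, a company missing an entire category is scored on the remaining categories, which is why two composite scores are not always built from the same inputs.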

Challenges

The main challenge I faced was getting my IP address blocked while web scraping. This article below was extremely helpful. My biggest takeaway from being blocked a few times is that it is crucial to stay patient. If you are in a rush while web scraping, you are setting yourself up for trouble, as so many sites now have protections in place against any fast-paced and automated behavior. Be patient and be random!
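As a rough illustration of "patient and random", the sketch below waits a random amount of time between requests. The function name polite_pause and the 2 to 6 second defaults are my own assumptions for illustration, not values taken from the notebook, and a random delay alone does not guarantee that a site will not block you.

```python
import random
import time


def polite_pause(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Sleep for a random interval and return the delay that was used.

    The default bounds are illustrative assumptions. The `sleep` argument
    exists only so the pause can be swapped out (for example in tests).
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("need 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Example usage inside a scraping loop (hypothetical URL list):
# for url in urls:
#     page = requests.get(url)
#     ...parse the page...
#     polite_pause()  # wait a random 2-6 seconds before the next request
```

Varying the delay avoids the perfectly regular request pattern that a fixed sleep would produce.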

Findings

Below are the first 20 companies in the dataframe, listed in descending order of final composite metric:

A couple of things to note about the final product here. (1) The data does hold up. After doing some digging to confirm the results separately, I found that the information provided in the dataframe is generally quite accurate. (2) The company employee treatment metric is missing for most companies. Despite its frequent absence, it is worth keeping in for increased accuracy in the final composite metric wherever it is available.

Conclusion

While far from perfect, the project was a success. That being said, the code certainly needs a lot of optimization, and I may want to move toward the more text-based reviews going forward. With that in mind, if anybody reading this has any suggestions for improving any or all elements of this project, please reach out. I would love to discuss. Finally, below I have included the code I used for scraping Indeed and Comparably, along with the code that gets you into Glassdoor. Feel free to check out the full notebook here.

Michael Wirtz
