Introduction:
The digital age has transformed the job market, making data analysis an invaluable tool for navigating it. This project, sourced from the Coursera Project Network, involved a deep dive into the world of job vacancy websites. The goal was to uncover trends and insights that could benefit both job seekers and recruiters.
Step 1: Setting Up the Environment
The first step in any data analysis project is to establish a solid foundation. For this project, Python was my go-to language, known for its powerful data analysis libraries. Here’s a snippet of the initial setup:
# Importing essential libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
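If any of these libraries are missing from your environment, they can typically be installed from PyPI first, for example with pip install requests beautifulsoup4 pandas matplotlib (the standard package names; adjust for your own setup).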
Step 2: Data Collection
Collecting data is crucial, and ethical web scraping is a must. To avoid legal issues with real job sites, I used a mock job site, ensuring a responsible approach to data gathering.
# Defining the base URL for the mock job board
base_url = 'https://example-job-board.w3spaces.com'

# Define the URLs for page 1, page 2, and page 3
page_urls = [base_url] + [f"{base_url}/page-{page_num}.html" for page_num in range(2, 4)]

# List to hold job data
jobs_data = []

# Iterate through each page URL
for url in page_urls:
    # Send a GET request to the page
    response = requests.get(url)

    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all job postings on the page
    job_postings = soup.find_all('div', class_='job-posting')

    # Extract data from each job posting
    for job in job_postings:
        title = job.find('h2', class_='title').text.strip()
        company = job.find('span', class_='company').text.strip()
        location = job.find('span', class_='location').text.strip()
        description = job.find('p', class_='description').text.strip()

        # Add the job data to the list
        jobs_data.append({
            'Title': title,
            'Company': company,
            'Location': location,
            'Description': description
        })
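The mock board tolerates rapid, anonymous requests, but a real job site generally should not be scraped this way. The helper below is a minimal sketch of a politer fetch; the delay, User-Agent string, and timeout are illustrative assumptions, and a real site's robots.txt and terms of service still apply.

import time
import requests

def polite_get(url, delay_seconds=1.0):
    """Fetch a page with a short pause and fail loudly on HTTP errors."""
    time.sleep(delay_seconds)  # pause between requests so the server is not hammered
    response = requests.get(
        url,
        headers={'User-Agent': 'job-market-study/0.1'},  # identify the scraper (example value)
        timeout=10,
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of silently parsing an error page
    return response

Swapping requests.get(url) for polite_get(url) in the loop above would be enough to use it.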
Step 3: Data Processing
With raw data in hand, I moved on to cleaning and structuring it, ensuring the dataset was ready for analysis.
# Convert the list to a DataFrame
jobs_df = pd.DataFrame(jobs_data)

# Save the DataFrame to a CSV file
jobs_df.to_csv('job_postings.csv', index=False)

# Load the scraped data
jobs_df = pd.read_csv('job_postings.csv')

# Data Cleaning: remove any duplicates
jobs_df.drop_duplicates(inplace=True)

# Handling Missing Values: fill missing descriptions with a generic placeholder
jobs_df['Description'] = jobs_df['Description'].fillna('Not provided')

# Feature Engineering: add a column for the length of the job description
jobs_df['Description_Length'] = jobs_df['Description'].apply(len)
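Before moving on, a quick sanity check confirms the cleaned DataFrame looks as expected. This snippet simply inspects the frame built above.

# Quick sanity check on the cleaned dataset
print(jobs_df.shape)         # rows and columns after dropping duplicates
print(jobs_df.isna().sum())  # confirm no missing values remain
print(jobs_df.head())        # eyeball a few records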
Step 4: Data Analysis
The analysis phase was where patterns began to emerge. I focused on the most common job titles, the distribution of jobs across locations, and the average length of job descriptions.
# Most common job titles
common_titles = jobs_df['Title'].value_counts()
common_titles
# Distribution of jobs by location
location_distribution = jobs_df['Location'].value_counts()
location_distribution
# Average length of job descriptions
average_description_length = jobs_df['Description_Length'].mean()
average_description_length
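To read the three results together rather than cell by cell, a short summary print can help. This is just a convenience sketch over the variables defined above.

# Print a compact summary of the analysis results
print("Most common job titles:")
print(common_titles.head())

print("\nJobs by location:")
print(location_distribution.head())

print(f"\nAverage description length: {average_description_length:.1f} characters")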
Step 5: Data Visualization
Visualizations brought the data to life, making the insights clear and actionable.
import matplotlib.pyplot as plt

# Plot the most common job titles
common_titles.plot(kind='bar')
plt.title('Most Common Job Titles')
plt.xlabel('Job Title')
plt.ylabel('Frequency')
plt.show()

# Plot the distribution of jobs by location
location_distribution.plot(kind='bar')
plt.title('Job Distribution by Location')
plt.xlabel('Location')
plt.ylabel('Number of Jobs')
plt.show()

# Histogram of job description lengths
jobs_df['Description_Length'].plot(kind='hist', bins=20)
plt.title('Histogram of Job Description Lengths')
plt.xlabel('Length of Description')
plt.ylabel('Frequency')
plt.show()
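If the charts need to go into a report or slide deck, each one can also be saved to disk. This is a small sketch reusing the series computed above; the filename and DPI are arbitrary choices, not project requirements.

# Re-create a chart and write it to a PNG file
common_titles.plot(kind='bar')
plt.title('Most Common Job Titles')
plt.xlabel('Job Title')
plt.ylabel('Frequency')
plt.tight_layout()                                   # avoid clipped axis labels
plt.savefig('most_common_job_titles.png', dpi=150)   # save before showing/closing
plt.close()                                          # free the figure once saved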
Conclusions:
The project’s findings were illuminating. The average job description length provided insight into employer expectations, while the job distribution and common titles highlighted market demands.
Recommendations:
Armed with these insights, I recommend job seekers focus on areas with the most opportunities and align their applications with prevalent job titles. Recruiters should craft detailed job descriptions to attract the right talent.
Closing Thoughts:
This project, supported by the Coursera Project Network, was a testament to the power of data analysis in understanding job market dynamics. While the job board was a mock-up, the skills and insights are real and applicable, demonstrating the potential of data science in the recruitment industry.