Sesugh Barnabas

MSc Data Science

5+ DS Projects

MSc Data Science Graduate

I'm finishing an MSc in Data Science, working across the full pipeline — from multi-source ETL and database design to NLP classifiers and deployed ML dashboards. My coursework uses real datasets and follows structured methodologies like CRISP-DM, so what I build is traceable, reproducible, and honest about its limitations.

What drives me is closing the gap between a notebook experiment and something genuinely useful. I care about clean pipelines, fair evaluation, and outputs that people outside data science can understand and act on. I'm looking for graduate roles in the UK where data has a real stake in the outcome.

Degree: MSc Data Science

University: University of Sunderland

Location: United Kingdom

Availability: Grad Roles · Visa Sponsorship

AirGuard UK — Air Quality Dashboard

Urban air quality data exists but rarely reaches the people who need it. Built a Streamlit dashboard on the UCI Air Quality Dataset (9,000+ sensor readings) that surfaces daily and weekly pollution trends and trains a Random Forest to predict CO concentrations. The model achieves R² > 0.85 on held-out data. Any local authority can deploy this template in under 10 minutes.

Python · Streamlit · scikit-learn · Plotly · Pandas
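
For readers unfamiliar with the R² figure quoted above, here is a minimal sketch of how the score is defined on held-out data. The values are illustrative placeholders, not readings or predictions from the dashboard, and the helper name `r2_score` is chosen here to mirror `sklearn.metrics.r2_score`, which computes the same quantity.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative held-out CO values and model predictions (made up for this sketch).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

held_out_r2 = r2_score(y_true, y_pred)
print(f"held-out R^2 = {held_out_r2:.2f}")
```

A score of 1.0 means perfect predictions and 0.0 means no better than predicting the mean, so the score only says something about generalisation when it is computed on rows the model never saw during training.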

Tweet Topic Classifier — NLP Pipeline

Manual content moderation doesn't scale. Built a full NLP classification pipeline on 6,443 tweets across 6 imbalanced topic categories, benchmarking three models under a CRISP-DM framework. TF-IDF bigrams with a LinearSVC achieved 79.83% accuracy and 63.18% macro F1 — outperforming the Naive Bayes baseline by 5 points on a genuinely hard, skewed dataset.

Python · NLTK · scikit-learn · TF-IDF · LinearSVC
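
The gap between accuracy and macro F1 reported above is typical of imbalanced classes. The sketch below computes both metrics in plain Python on made-up labels to show why they diverge. The topic names and predictions are hypothetical, not taken from the tweet dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical skewed sample: 8 "sports" tweets, 2 "health" tweets,
# and a lazy classifier that predicts the majority class every time.
y_true = ["sports"] * 8 + ["health"] * 2
y_pred = ["sports"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}, macro F1 = {f1:.2f}")
```

Here accuracy is 0.80 while macro F1 is about 0.44, because the minority class contributes an F1 of zero. This is why macro F1 is the fairer headline number on a skewed dataset.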

Multi-Source ETL Pipeline

Organisations routinely receive customer records from incompatible systems. Built a Python ETL pipeline that extracts from CSV, JSON, and XML sources, applies regex-based normalisation (name standardisation, type casting, deduplication), and loads into a unified SQLite schema via PonyORM. A production-ready template for consolidating heterogeneous data — a daily reality in enterprise data engineering.

Python · PonyORM · SQLite · ETL · Data Engineering
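
The extract, normalise, and load flow described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the inline sample records, the `name` and `age` fields, and the `customer` table are all hypothetical, and `sqlite3` stands in for PonyORM so the sketch has no third-party dependencies.

```python
import csv
import io
import json
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical inline samples standing in for the CSV, JSON, and XML source files.
CSV_TEXT = "name,age\n  alice   SMITH ,34\nBob Jones,41\n"
JSON_TEXT = '[{"name": "bob  jones", "age": "41"}]'
XML_TEXT = "<customers><customer><name>carol WHITE</name><age>29</age></customer></customers>"


def extract():
    """Yield raw dict records from all three sources."""
    yield from csv.DictReader(io.StringIO(CSV_TEXT))
    yield from json.loads(JSON_TEXT)
    for node in ET.fromstring(XML_TEXT).iter("customer"):
        yield {"name": node.findtext("name"), "age": node.findtext("age")}


def normalise(record):
    """Collapse whitespace with a regex, title-case the name, cast age to int."""
    name = re.sub(r"\s+", " ", record["name"]).strip().title()
    return name, int(record["age"])


def load(rows, conn):
    """Insert rows; the UNIQUE constraint plus OR IGNORE drops duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(name TEXT, age INTEGER, UNIQUE(name, age))"
    )
    conn.executemany("INSERT OR IGNORE INTO customer VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load((normalise(r) for r in extract()), conn)
customers = conn.execute("SELECT name, age FROM customer ORDER BY name").fetchall()
print(customers)
```

Normalising before loading is what makes the deduplication work: "Bob Jones" from the CSV and "bob  jones" from the JSON only collide on the unique constraint once both have been standardised.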

UK Property Price Predictor

HM Land Registry publishes Price Paid Data for residential sales in England and Wales: 28M+ transactions going back to 1995, all open data. Building an XGBoost regression model to predict property values by postcode, with SHAP feature attribution so predictions are explainable to non-technical stakeholders. Deploying as a Streamlit app on Render. Directly applicable to PropTech and housing analytics roles.

Python · XGBoost · SHAP · Streamlit · In Progress

NHS A&E Demand Forecaster

NHS trusts spend heavily on staffing without reliable short-term demand forecasts. Using NHS England weekly A&E attendance open data to train a Prophet time-series model with seasonal and bank-holiday effects. Predictions served via FastAPI with a Streamlit dashboard for trust-level comparison — built for the kind of operational analytics roles the NHS and its partners hire for.

Python · Prophet · FastAPI · Streamlit · In Progress

Customer Churn Intelligence

Most churn models tell you who will leave but not why — making them hard for business teams to act on. Building a LightGBM classifier on the IBM Telco Churn dataset with SHAP waterfall charts for per-customer explanations and a segment-level Streamlit dashboard. The explainability layer is the differentiator — UK employers in finance and telecoms increasingly require models they can interrogate.

Python · LightGBM · SHAP · Streamlit · In Progress

Titanic Survival Analysis — EDA

Exploratory data analysis on 891 Titanic passenger records to uncover what factors determined survival. Handled missing data in Age (20% gap) and Cabin (77% missing), then visualised survival rates across gender, passenger class, and age groups. Women in first class survived at 97% vs 13% for men in third class — a stark signal that rescue priority was shaped by both gender and economic class.

Python · Pandas · Matplotlib · Seaborn · EDA
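
The survival-rate comparison above comes down to a grouped mean. The sketch below shows that aggregation in plain Python on a handful of made-up rows, which are not the real passenger records. In Pandas the equivalent would be something like `df.groupby(["Sex", "Pclass"])["Survived"].mean()`, assuming the column names used in the common Kaggle version of the dataset.

```python
from collections import defaultdict

# Made-up rows of (sex, passenger class, survived flag); not the real Titanic data.
passengers = [
    ("female", 1, 1), ("female", 1, 1), ("female", 3, 1), ("female", 3, 0),
    ("male", 1, 1), ("male", 1, 0), ("male", 3, 0), ("male", 3, 0),
]


def survival_rates(rows):
    """Return the survival rate for each (sex, class) group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [survivors, passengers]
    for sex, pclass, survived in rows:
        totals[(sex, pclass)][0] += survived
        totals[(sex, pclass)][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


rates = survival_rates(passengers)
for (sex, pclass), rate in sorted(rates.items()):
    print(f"{sex:<6} class {pclass}: {rate:.0%}")
```

Grouping on both columns at once is what exposes the interaction between gender and class, which a single-column breakdown would hide.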

Superstore Sales Dashboard

Analysis of 8,399 retail transactions across 21 variables to surface regional performance gaps, category trends, and top-selling products. The West region led with £3.6M in sales while Nunavut trailed at £116K — a 31× gap pointing to untapped regional opportunity. Monthly trend analysis revealed consistent Q4 spikes, informing inventory and staffing decisions for a business running on seasonal demand.

Python · Pandas · Matplotlib · GroupBy · EDA
