Hi, I'm
I build data products that turn messy, real-world datasets into decisions people can act on. MSc Data Science student seeking roles in the UK that can offer visa sponsorship.
I'm finishing an MSc in Data Science, working across the full pipeline — from multi-source ETL and database design to NLP classifiers and deployed ML dashboards. My coursework uses real datasets and follows structured methodologies like CRISP-DM, so what I build is traceable, reproducible, and honest about its limitations.
What drives me is closing the gap between a notebook experiment and something genuinely useful. I care about clean pipelines, fair evaluation, and outputs that people outside data science can understand and act on. I'm looking for graduate roles in the UK where data work directly shapes real outcomes.
Urban air quality data exists but rarely reaches the people who need it. Built a Streamlit dashboard on the UCI Air Quality Dataset (9,000+ sensor readings) that surfaces daily and weekly pollution trends and trains a Random Forest to predict carbon monoxide (CO) concentrations. The model achieves R² > 0.85 on held-out data. The app is packaged as a template that a local authority could redeploy in under 10 minutes.
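R² compares the model's squared error on the held-out set with that of a naive baseline that always predicts the mean of the held-out targets, so R² > 0.85 means the remaining error is under 15% of that baseline's. A minimal stdlib sketch of the metric (the same definition scikit-learn's `r2_score` uses), with toy numbers rather than the project's data:

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of a baseline that always predicts the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only (not the UCI readings).
print(round(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]), 3))  # prints 0.98
```

A model that just predicts the mean scores exactly 0, and a perfect model scores 1.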
Manual content moderation doesn't scale. Built a full NLP classification pipeline on 6,443 tweets across 6 imbalanced topic categories, benchmarking three models under a CRISP-DM framework. TF-IDF bigrams with a LinearSVC achieved 79.83% accuracy and 63.18% macro F1 — outperforming the Naive Bayes baseline by 5 points on a genuinely hard, skewed dataset.
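The gap between 79.83% accuracy and 63.18% macro F1 is typical of imbalanced data: accuracy is dominated by the large classes, while macro F1 gives every class equal weight, so weak minority classes pull it down. A minimal stdlib sketch of both metrics using invented toy labels (not the tweet dataset), with per-class F1 = 2TP / (2TP + FP + FN), the same definition behind scikit-learn's `f1_score(average="macro")`:

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class got a false positive
            fn[t] += 1  # true class got a false negative
    f1s = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1s.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical skewed labels: one majority class, two rare ones.
y_true = ["news"] * 8 + ["sports", "arts"]
y_pred = ["news"] * 10  # a model that always predicts the majority class
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 3))
```

Here a majority-class predictor reaches 80% accuracy but only about 0.30 macro F1, which is why macro F1 is the more honest headline metric on a skewed dataset.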
Organisations routinely receive customer records from incompatible systems. Built a Python ETL pipeline that extracts from CSV, JSON, and XML sources, cleans the records (regex-based name standardisation, type casting, deduplication), and loads them into a unified SQLite schema via PonyORM. A reusable template for consolidating heterogeneous data, a daily reality in enterprise data engineering.
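A minimal sketch of the extract, clean, deduplicate and load steps, assuming a hypothetical customer schema (`name`, `email`, `age`) and deduplication on normalised email. To stay dependency-free it uses Python's built-in `sqlite3` in place of PonyORM, and the inline sample records are invented for illustration:

```python
import csv
import io
import json
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample sources; the real project's files and schemas are not shown.
CSV_SRC = "name,email,age\n  ada   LOVELACE ,Ada@Example.com,36\n"
JSON_SRC = '[{"name": "alan turing", "email": "alan@example.com", "age": "41"}]'
XML_SRC = """<customers>
  <customer><name>Ada  Lovelace</name><email>ada@example.com</email><age>36</age></customer>
</customers>"""

WHITESPACE = re.compile(r"\s+")


def extract():
    """Read all three sources into a list of plain dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SRC)))
    rows += json.loads(JSON_SRC)
    for node in ET.fromstring(XML_SRC).iter("customer"):
        rows.append({child.tag: child.text for child in node})
    return rows


def normalise(row):
    """Regex whitespace clean-up and title-casing of names, email lower-casing, age cast to int."""
    name = WHITESPACE.sub(" ", row["name"]).strip().title()
    age = row.get("age")
    return {
        "name": name,
        "email": row["email"].strip().lower(),
        "age": int(age) if age not in (None, "") else None,
    }


def deduplicate(rows):
    """Keep the first record seen for each normalised email (hypothetical key)."""
    unique = {}
    for row in rows:
        unique.setdefault(row["email"], row)
    return list(unique.values())


def load(rows, conn):
    """Create the unified table and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(email TEXT PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO customer (email, name, age) VALUES (:email, :name, :age)",
        rows,
    )
    conn.commit()


def run_pipeline():
    conn = sqlite3.connect(":memory:")
    load(deduplicate([normalise(r) for r in extract()]), conn)
    return conn


if __name__ == "__main__":
    conn = run_pipeline()
    print(conn.execute("SELECT name, email, age FROM customer ORDER BY email").fetchall())
```

Deduplicating on email before loading, with `INSERT OR IGNORE` against the primary key as a second guard, means a customer who appears in two source formats lands in the table once.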
HM Land Registry publishes residential property sales in England and Wales as open data: 28M+ transactions going back to 1995. Building an XGBoost regression model to predict property values by postcode, with SHAP feature attribution so predictions can be explained to non-technical stakeholders. Deploying as a Streamlit app on Render. Directly applicable to PropTech and housing analytics roles.
NHS trusts spend heavily on staffing without reliable short-term demand forecasts. Using NHS England weekly A&E attendance open data to train a Prophet time-series model with seasonal and bank-holiday effects. Predictions are served via FastAPI, with a Streamlit dashboard for trust-level comparison, aimed at the operational analytics roles the NHS and its partners recruit for.
Most churn models tell you who will leave but not why — making them hard for business teams to act on. Building a LightGBM classifier on the IBM Telco Churn dataset with SHAP waterfall charts for per-customer explanations and a segment-level Streamlit dashboard. The explainability layer is the differentiator — UK employers in finance and telecoms increasingly require models they can interrogate.
I'm actively looking for graduate data science and ML engineer roles in the UK. If you're hiring or working on something interesting, feel free to reach out — I respond quickly.