LLM-MCTS Ensemble Game Playing Agent
Sep. 2025 – Present
MSc Thesis
Java
TAG Framework
Gemini API
MCTS
LLM
- Designing and implementing an ensemble game-playing agent that dynamically switches between an LLM
(Gemini via Vertex AI) and MCTS at designated game phases within the Tabletop
Games (TAG) framework.
- Implementing game-specific state summarisation and prompt engineering so that LLMs can play TAG games
(Catan, Sushi Go, Poker, Connect4) competitively, and identifying their strategic strengths and weaknesses.
- Developing a phase-based switching mechanism with manually selected rules, with plans to extend to learned
selectors via supervised classification and multi-armed bandit approaches.
- Evaluating ensemble variants against MCTS opponents optimised for each game under fair time budgets,
measuring win rates and mean ordinal rankings across games with varying strategic properties.
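
A minimal sketch of the rule-based, phase-keyed switching described above. The thesis itself is implemented in Java; this Python version is illustrative only, and the phase names, the `state` dictionary, and the agent callables are all hypothetical stand-ins rather than TAG framework APIs.

```python
from typing import Callable, Dict

# Hypothetical agent type: takes a game-state dict and returns an action label.
Agent = Callable[[dict], str]

def make_phase_switcher(rules: Dict[str, Agent], default: Agent) -> Agent:
    """Build an agent that delegates to whichever agent the rules assign to the current phase."""
    def act(state: dict) -> str:
        # Fall back to the default agent when the phase is missing or has no rule.
        agent = rules.get(state.get("phase"), default)
        return agent(state)
    return act

# Stand-in agents; real ones would query an LLM or run a tree search.
llm_agent: Agent = lambda state: "llm-move"
mcts_agent: Agent = lambda state: "mcts-move"

switcher = make_phase_switcher({"opening": llm_agent}, default=mcts_agent)
```

A learned selector could later replace the fixed `rules` mapping without changing the callers of `switcher`.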
Temporal-Aware Hybrid Retrieval System
Feb. 2026 – Apr. 2026
QMUL · ECS736P
Python
SPLADE
Sentence-BERT
ChromaDB
RRF
MMR
- Designed and implemented a five-stage hybrid retrieval pipeline for COVID-19 medical
literature search over the TREC-COVID benchmark (171,332 papers), serving as Data Engineering &
Pre-processing Lead.
- Combined neural sparse retrieval (SPLADE-inspired, neural TF-IDF) with dense semantic retrieval
(Sentence-BERT + ChromaDB HNSW index), fused via Reciprocal Rank Fusion (RRF) and an
alternative projection-based fusion strategy for empirical comparison.
- Integrated temporal filtering to prioritise documents within a ±1-year window of the query date, and
applied Maximal Marginal Relevance (MMR) re-ranking to reduce result redundancy.
- Targeted nDCG@10 > 0.60 and sub-second query latency using entirely open-source tools; evaluated across
six ablation configurations with paired t-tests.
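
The fusion and re-ranking steps above can be sketched as follows. This is an illustrative outline of standard Reciprocal Rank Fusion and Maximal Marginal Relevance, not the project's actual code: the function names, the constant `k=60` (a commonly used default), the trade-off weight `lam`, and the `similarity` callable are assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids: each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

def mmr(candidates, relevance, similarity, lam=0.7, top_n=10):
    """Greedy MMR: trade off relevance against similarity to already selected documents."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < top_n:
        def score(d):
            # Redundancy is the highest similarity to any document picked so far.
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: a sparse and a dense ranking over made-up document ids.
fused = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d3", "d1", "d4"]])
```

Because RRF uses only ranks, the sparse and dense scores need no calibration against each other before fusion.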
RL-Based LLM Fine-Tuning for Abstractive Summarisation
Oct. 2025 – Nov. 2025
QMUL · ECS7032P
Python
PyTorch
GRPO
LoRA
Qwen2.5
HuggingFace TRL
NLP
- Fine-tuned Qwen2.5-0.5B-Instruct on 3,000 CNN/DailyMail articles using Group Relative
Policy Optimisation (GRPO) with LoRA (r=16, α=32), training only 4.8M parameters (0.96% of the model).
- Engineered a multi-objective reward function through 5 iterative design cycles to combat
reward hacking, combining capped ROUGE-L, cosine similarity, piecewise word-length reward, and a
completeness signal.
- Achieved a 22% higher total reward than the base model (1.46 vs. 1.20) and reduced incomplete
sentences from 12% to 4% without degrading semantic fidelity.
- Evaluated cross-domain generalisation across News, Scientific, Business, Sports, and Technology domains;
compared against BART-Large-CNN and T5-Small supervised baselines.
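
A rough sketch of how a multi-objective reward of this shape might be composed. The thresholds (`lo`, `hi`, `rouge_cap`), the unweighted sum, and the punctuation-based completeness check are all assumptions made for illustration; the ROUGE-L and cosine-similarity values are taken as precomputed inputs, and none of this reproduces the project's actual reward or its scale.

```python
def length_reward(n_words, lo=40, hi=80):
    """Piecewise reward: linear ramp below lo, 1.0 inside [lo, hi], linear decay above hi."""
    if n_words <= 0:
        return 0.0
    if n_words < lo:
        return n_words / lo
    if n_words <= hi:
        return 1.0
    return max(0.0, 1.0 - (n_words - hi) / hi)

def completeness_reward(summary):
    """Assumed completeness signal: 1.0 if the summary ends with sentence-final punctuation."""
    return 1.0 if summary.strip().endswith((".", "!", "?")) else 0.0

def total_reward(summary, rouge_l, cosine_sim, rouge_cap=0.5):
    """Combine the four signals; the cap limits how much raw lexical overlap can earn."""
    capped_rouge = min(rouge_l, rouge_cap)
    n_words = len(summary.split())
    # Unweighted sum is an assumption; a real reward would likely weight each term.
    return capped_rouge + cosine_sim + length_reward(n_words) + completeness_reward(summary)
```

Capping the overlap term is one way to discourage a policy from copying source sentences to inflate ROUGE-L, which is the kind of reward hacking the bullet above refers to.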
Sushi Go Game AI Agent (TAG Framework)
Sep. 2024 – Nov. 2024
QMUL · ECS7032P
Java
MCTS
Statistical Planning
- Designed and implemented an autonomous agent for the card game 'Sushi Go' using the Tabletop Games
(TAG) framework.
- Engineered improvements to the Monte Carlo Tree Search (MCTS) algorithm to handle
imperfect-information scenarios.
- Developed domain-specific heuristics to guide the search process, significantly increasing the agent's win
rate against baseline models.
Blackjack Strategy Analysis & Simulation
Graduation Project
Yeditepe University
Python
PyTorch
Statistical Modeling
- Engineered a simulation environment for Blackjack to analyse win/loss probabilities across 7
distinct gameplay strategies.
- Implemented supervised learning models using PyTorch to predict optimal moves based on
the dealer's up-card and the strength of the player's hand.
- Applied statistical optimisation techniques and plotted outcome distributions to identify the most
effective long-term betting strategies.
- Modelled game logic and state management, bridging the gap between game mechanics and data analysis.