LLM-MCTS Ensemble Game Playing Agent

Sep. 2025 – Present MSc Thesis
Java TAG Framework Gemini API MCTS LLM
  • Designing and implementing an ensemble game-playing agent that dynamically switches between an LLM (Gemini via Vertex AI) and MCTS at designated game phases within the Tabletop Games (TAG) framework.
  • Implementing game-specific state summarisation and prompt engineering that let LLMs play TAG games (Catan, Sushi Go, Poker, Connect4) competitively, and using that play to identify the LLMs' strategic strengths and weaknesses.
  • Developing a phase-based switching mechanism driven by hand-crafted rules, with plans to extend it to learned selectors based on supervised classification and multi-armed bandits.
  • Evaluating ensemble variants against MCTS opponents optimised for each game under matched time budgets, measuring win rates and mean ordinal rankings across games with differing strategic properties.
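The planned multi-armed bandit selector could take roughly the following shape. This is a minimal Python sketch (the project itself is in Java), assuming UCB1 run separately for each game phase, two arms named `llm` and `mcts`, and a 0/1 reward per decision; the class `PhaseBandit`, the `simulate` toy environment, the phase labels and the win probabilities are all hypothetical and not part of TAG.

```python
import math
import random
from collections import defaultdict


class PhaseBandit:
    """UCB1 selector that picks an agent independently for each game phase.

    Hypothetical sketch: arm names, phase labels and the 0/1 reward are assumptions.
    """

    def __init__(self, arms=("llm", "mcts"), c=math.sqrt(2)):
        self.arms = tuple(arms)
        self.c = c  # exploration constant (sqrt(2) is the classic UCB1 choice)
        # Per-phase pull counts and summed rewards for each arm
        self.counts = defaultdict(lambda: dict.fromkeys(self.arms, 0))
        self.totals = defaultdict(lambda: dict.fromkeys(self.arms, 0.0))

    def select(self, phase):
        counts = self.counts[phase]
        # Play every arm once before trusting the UCB1 estimates
        for arm in self.arms:
            if counts[arm] == 0:
                return arm
        n = sum(counts.values())

        def ucb(arm):
            mean = self.totals[phase][arm] / counts[arm]
            return mean + self.c * math.sqrt(math.log(n) / counts[arm])

        return max(self.arms, key=ucb)

    def update(self, phase, arm, reward):
        self.counts[phase][arm] += 1
        self.totals[phase][arm] += reward


def simulate(win_prob, phases, n_games=2000, seed=0):
    """Toy environment: win_prob[(phase, arm)] is the chance that choosing arm in phase pays 1."""
    rng = random.Random(seed)
    bandit = PhaseBandit()
    for _ in range(n_games):
        for phase in phases:
            arm = bandit.select(phase)
            reward = 1.0 if rng.random() < win_prob[(phase, arm)] else 0.0
            bandit.update(phase, arm, reward)
    return bandit
```

With, say, the LLM stronger in an "opening" phase and MCTS stronger in an "endgame" phase, the bandit should end up pulling the stronger arm in each phase far more often. Keeping one bandit per phase mirrors the existing rule-based design: a hand-crafted rule fixes the arm for each phase, whereas the bandit learns it from outcomes.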

Temporal-Aware Hybrid Retrieval System

Feb. 2026 – Apr. 2026 QMUL · ECS736P
Python SPLADE Sentence-BERT ChromaDB RRF MMR
  • As Data Engineering & Pre-processing Lead, designed and implemented a five-stage hybrid retrieval pipeline for COVID-19 medical literature search over the TREC-COVID benchmark (171,332 papers).
  • Combined SPLADE-inspired neural sparse retrieval (neural TF-IDF) with dense semantic retrieval (Sentence-BERT embeddings in a ChromaDB HNSW index), fusing the two result lists with Reciprocal Rank Fusion (RRF) and, for empirical comparison, an alternative projection-based fusion strategy.
  • Integrated temporal filtering that prioritises documents dated within ±1 year of the query date, and applied Maximal Marginal Relevance (MMR) re-ranking to reduce redundancy in the results.
  • Targeted nDCG@10 > 0.60 and sub-second query latency using only open-source tools; evaluated six ablation configurations, with paired t-tests for significance.
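RRF and MMR are both short enough to sketch directly. The functions below are a minimal illustration, not the project's code: RRF scores each document as the sum of 1/(k + rank) over the input rankings, with k = 60 assumed here as the commonly used default, and MMR greedily picks the document maximising λ·sim(d, q) − (1 − λ)·max sim(d, s) over the already-selected documents s, using cosine similarity on embedding vectors. The function names, `lam` and `top_n` are assumptions.

```python
import numpy as np


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank), ranks from 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order
    return sorted(scores, key=scores.get, reverse=True)


def mmr(query_vec, doc_vecs, doc_ids, top_n=10, lam=0.7):
    """Greedy Maximal Marginal Relevance over cosine similarities."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    rel = d @ q          # similarity of each document to the query
    pairwise = d @ d.T   # similarity between documents
    selected = []
    candidates = list(range(len(doc_ids)))
    while candidates and len(selected) < top_n:
        def score(i):
            # Penalise a candidate by its closest already-selected document
            redundancy = max((pairwise[i, j] for j in selected), default=0.0)
            return lam * rel[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [doc_ids[i] for i in selected]
```

With λ = 1, MMR reduces to plain relevance ranking; lowering λ trades relevance for diversity, which is what pushes near-duplicate papers further down the list.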

RL-Based LLM Fine-Tuning for Abstractive Summarisation

Oct. 2025 – Nov. 2025 QMUL · ECS7032P
Python PyTorch GRPO LoRA Qwen2.5 HuggingFace TRL NLP
  • Fine-tuned Qwen2.5-0.5B-Instruct on 3,000 CNN/DailyMail articles using Group Relative Policy Optimisation (GRPO) with LoRA (r=16, α=32), training only 4.8M parameters (0.96% of the model's total).
  • Engineered a multi-objective reward function over five design iterations to counter reward hacking, combining capped ROUGE-L, cosine similarity, a piecewise word-length reward, and a completeness signal.
  • Achieved 22% higher total reward than the base model (1.46 vs. 1.20) and reduced the rate of incomplete sentences from 12% to 4%, without degrading semantic fidelity.
  • Evaluated cross-domain generalisation across News, Scientific, Business, Sports, and Technology domains; compared against BART-Large-CNN and T5-Small supervised baselines.
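The GRPO advantage and the kind of composite reward described above can be sketched in plain Python. GRPO samples a group of completions per prompt and normalises each completion's reward by the group's mean and standard deviation, so no learned value network is needed. The reward components follow the bullet's description, but the ROUGE-L cap of 0.5, the 40 to 80 word target range, the weights, and passing cosine similarity in as a precomputed `cosine_sim` value are hypothetical choices, not the project's actual settings.

```python
import statistics


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 from the longest common subsequence of whitespace tokens."""
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    # Dynamic-programming LCS table
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, cw in enumerate(c, 1):
        for j, rw in enumerate(r, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if cw == rw else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def length_reward(n_words, low=40, high=80):
    """Hypothetical piecewise reward: 1 inside [low, high], decaying linearly to 0 outside."""
    if low <= n_words <= high:
        return 1.0
    if n_words < low:
        return n_words / low
    return max(0.0, 1.0 - (n_words - high) / high)


def summary_reward(summary, reference, cosine_sim, rouge_cap=0.5, weights=(1.0, 1.0, 0.5, 0.5)):
    """Hypothetical weighted sum of capped ROUGE-L, cosine similarity, length and completeness."""
    w_rouge, w_cos, w_len, w_done = weights
    rouge = min(rouge_l_f1(summary, reference), rouge_cap)  # cap stops ROUGE-L dominating the total
    complete = 1.0 if summary.rstrip().endswith((".", "!", "?")) else 0.0
    return (w_rouge * rouge + w_cos * cosine_sim
            + w_len * length_reward(len(summary.split())) + w_done * complete)


def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Because advantages are relative within a group, a summary only gets a positive learning signal by beating its sibling samples, which is one reason capping a single component such as ROUGE-L helps against reward hacking.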

Sushi Go Game AI Agent (TAG Framework)

Sep. 2024 – Nov. 2024 QMUL · ECS7032P
Java MCTS Statistical Planning
  • Designed and implemented an autonomous agent for the card game 'Sushi Go' using the Tabletop Games (TAG) framework.
  • Extended the Monte Carlo Tree Search (MCTS) algorithm to handle the game's imperfect information (hidden opponent hands).
  • Developed domain-specific heuristics to guide the search, significantly increasing the agent's win rate against baseline agents.
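One common way to apply MCTS under imperfect information is determinisation: sample concrete hidden hands consistent with what the player has observed, search each sampled world, and aggregate the results. The source does not say which technique the agent used, so the sketch below is an assumption. It is Python rather than TAG's Java, `determinize` and `choose_action` are hypothetical names, `evaluate` stands in for an MCTS search on one determinised state, and for simplicity it ignores the partial knowledge a player gains from seeing hands before passing them.

```python
import random
from collections import Counter


def determinize(full_deck, seen_cards, opponent_hand_sizes, rng=random):
    """Sample opponent hands from the cards the observer has not seen.

    full_deck: Counter of every card in the game.
    seen_cards: Counter of cards whose location the observer knows (own hand, table).
    opponent_hand_sizes: hidden hand size for each opponent.
    """
    unseen = full_deck - seen_cards  # Counter subtraction drops non-positive counts
    pool = list(unseen.elements())
    if sum(opponent_hand_sizes) > len(pool):
        raise ValueError("not enough unseen cards to fill opponent hands")
    rng.shuffle(pool)
    hands, start = [], 0
    for size in opponent_hand_sizes:
        hands.append(pool[start:start + size])
        start += size
    return hands


def choose_action(actions, sample_world, evaluate, n_samples=20):
    """Perfect-information Monte Carlo: sum each action's value over sampled worlds."""
    totals = dict.fromkeys(actions, 0.0)
    for _ in range(n_samples):
        world = sample_world()  # e.g. a fresh determinize(...) call
        for action in actions:
            totals[action] += evaluate(action, world)  # stand-in for a search on this world
    return max(actions, key=totals.get)
```

Summing over several determinisations is perfect-information Monte Carlo; Information Set MCTS goes further by sampling a new determinisation on each iteration while growing a single shared tree.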

Blackjack Strategy Analysis & Simulation

Graduation Project · Yeditepe University
Python PyTorch Statistical Modeling
  • Built a Blackjack simulation environment to analyse win/loss probabilities across seven distinct gameplay strategies.
  • Implemented supervised learning models in PyTorch to predict optimal moves from the dealer's up-card and the player's current hand.
  • Applied statistical optimisation techniques and visualised outcome distributions to identify the most effective long-term betting strategies.
  • Modelled game logic and state management, bridging the gap between game mechanics and data analysis.
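A Monte Carlo comparison of Blackjack strategies can be sketched in a few dozen lines. The source does not list its seven strategies, so this example compares simple hypothetical "hit below a threshold" policies under assumptions chosen for brevity: an infinite deck, a dealer who stands on all 17s, no doubling, splitting or insurance, and a natural Blackjack paid as an ordinary win.

```python
import random

# Infinite-deck assumption: every draw is independent; 11 represents an ace
CARDS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]


def hand_value(cards):
    """Best total, counting aces as 11 unless that would bust."""
    total = sum(cards)
    aces = cards.count(11)
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


def play_hand(stand_at, rng):
    """Return +1 (win), 0 (push) or -1 (loss) for a player who hits below `stand_at`."""
    player = [rng.choice(CARDS), rng.choice(CARDS)]
    dealer = [rng.choice(CARDS), rng.choice(CARDS)]
    while hand_value(player) < stand_at:
        player.append(rng.choice(CARDS))
    p = hand_value(player)
    if p > 21:
        return -1  # player busts before the dealer acts
    while hand_value(dealer) < 17:  # dealer stands on all 17s
        dealer.append(rng.choice(CARDS))
    d = hand_value(dealer)
    if d > 21 or p > d:
        return 1
    return 0 if p == d else -1


def simulate(stand_at, n_hands=100_000, seed=0):
    """Estimate win/push/loss rates for one hypothetical threshold strategy."""
    rng = random.Random(seed)
    results = [play_hand(stand_at, rng) for _ in range(n_hands)]
    return {
        "win": results.count(1) / n_hands,
        "push": results.count(0) / n_hands,
        "loss": results.count(-1) / n_hands,
    }
```

For instance, `{t: simulate(t) for t in (12, 15, 17, 19)}` gives win, push and loss rates per threshold, which can then be plotted as distributions and compared.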