Projects — Deniz Genco Atilla

LLM-MCTS Ensemble Game-Playing Agent

Sep. 2025 – Present MSc Thesis

Java · TAG Framework · Gemini API · MCTS · LLM

Designing and implementing an ensemble game-playing agent that dynamically switches between an LLM (Gemini via Vertex AI) and MCTS at designated game phases within the Tabletop Games (TAG) framework.
Implementing game-specific state summarisation and prompt engineering so that LLMs can play TAG games (Catan, Sushi Go, Poker, Connect4) competitively, and to identify the LLMs' strategic strengths and weaknesses.
Developing a phase-based switching mechanism driven by manually selected rules, with plans to extend it to learned selectors based on supervised classification and multi-armed bandit approaches.
Evaluating ensemble variants against MCTS opponents optimised for each game under fair time budgets, measuring win rates and mean ordinal rankings across games with varying strategic properties.
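
The phase-based switching described above can be pictured with a minimal sketch. It is written in Python for brevity (the project itself is in Java), and every name in it (`PhaseSwitchingAgent`, the phase labels, the stub policies) is hypothetical rather than part of the TAG framework or the thesis code.

```python
from typing import Callable, Dict, List

# A policy maps (state, legal_actions) to one chosen action.
Policy = Callable[[dict, List[str]], str]


def stub_llm_policy(state: dict, actions: List[str]) -> str:
    """Stand-in for an LLM call; here it simply picks the first action."""
    return actions[0]


def stub_mcts_policy(state: dict, actions: List[str]) -> str:
    """Stand-in for an MCTS search; here it simply picks the last action."""
    return actions[-1]


class PhaseSwitchingAgent:
    """Delegates each decision to the policy assigned to the current phase."""

    def __init__(self, rules: Dict[str, Policy], default: Policy):
        self.rules = rules      # manually selected phase -> policy mapping
        self.default = default  # used when a phase has no explicit rule

    def act(self, state: dict, actions: List[str]) -> str:
        policy = self.rules.get(state.get("phase"), self.default)
        return policy(state, actions)


# Hypothetical rule set: LLM for the opening, MCTS for everything else.
agent = PhaseSwitchingAgent({"opening": stub_llm_policy}, default=stub_mcts_policy)
```

A learned selector would replace the fixed `rules` dictionary with a model that chooses the policy from features of the state.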

Feb. 2026 – Apr. 2026 QMUL · ECS736P

Python · SPLADE · Sentence-BERT · ChromaDB · RRF · MMR

Designed and implemented a five-stage hybrid retrieval pipeline for COVID-19 medical literature search over the TREC-COVID benchmark (171,332 papers), serving as Data Engineering & Pre-processing Lead.
Combined neural sparse retrieval (SPLADE-inspired, neural TF-IDF) with dense semantic retrieval (Sentence-BERT + ChromaDB HNSW index), fused via Reciprocal Rank Fusion (RRF) and an alternative projection-based fusion strategy for empirical comparison.
Integrated temporal filtering to prioritise documents within a ±1 year window of the query date, and applied Maximal Marginal Relevance (MMR) re-ranking to reduce result redundancy.
Targeted nDCG@10 > 0.60 and sub-second query latency using entirely open-source tools; evaluated across six ablation configurations with paired t-tests.
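
The fusion and re-ranking stages above can be sketched as follows. This is an illustrative outline only, not the project's code: the function names are hypothetical, `k = 60` is the constant commonly used for RRF rather than a value stated here, and the MMR trade-off `lam = 0.7` is an assumed default.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def mmr_rerank(
    candidates: List[str],
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    lam: float = 0.7,
    top_n: int = 10,
) -> List[str]:
    """Greedy MMR: repeatedly pick the document that maximises
    lam * relevance - (1 - lam) * (max similarity to already picked docs)."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < top_n:
        def mmr_score(d: str) -> float:
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1.0 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy rankings from a sparse and a dense retriever (made-up doc ids).
fused = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d3", "d1", "d4"]])
# fused == ['d1', 'd3', 'd2', 'd4']
```

RRF uses only rank positions, so the sparse and dense scores never need to be put on a common scale before fusion.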

Oct. 2025 – Nov. 2025 QMUL · ECS7032P

Python · PyTorch · GRPO · LoRA · Qwen2.5 · HuggingFace TRL · NLP

Fine-tuned Qwen2.5-0.5B-Instruct on 3,000 CNN/DailyMail articles using Group Relative Policy Optimisation (GRPO) with LoRA (r=16, α=32), training only 4.8M parameters (0.96% of model).
Engineered a multi-objective reward function through 5 iterative design cycles to combat reward hacking, combining capped ROUGE-L, cosine similarity, piecewise word-length reward, and a completeness signal.
Achieved +22% total reward over the base model (1.46 vs. 1.20) and reduced incomplete sentences from 12% to 4% without degrading semantic fidelity.
Evaluated cross-domain generalisation across News, Scientific, Business, Sports, and Technology domains; compared against BART-Large-CNN and T5-Small supervised baselines.
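
A multi-objective reward of the kind described above might be structured as in the sketch below. Everything specific in it is an assumption made for illustration: the equal weighting, the ROUGE cap of 0.5, the 30 to 60 word target band, the punctuation-based completeness check, and the bag-of-words cosine that stands in for whatever similarity measure the project actually used.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 over whitespace tokens (simplified, no stemming)."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def bow_cosine(a, b):
    """Bag-of-words cosine similarity; a stand-in for embedding similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def length_reward(n_words, lo=30, hi=60):
    """Piecewise: full reward inside [lo, hi], linear falloff outside."""
    if lo <= n_words <= hi:
        return 1.0
    if n_words < lo:
        return n_words / lo
    return max(0.0, 1.0 - (n_words - hi) / hi)


def completeness(text):
    """1.0 if the summary ends with terminal punctuation, else 0.0."""
    return 1.0 if text.rstrip().endswith((".", "!", "?")) else 0.0


def total_reward(summary, reference, rouge_cap=0.5):
    """Unweighted sum of the four components (weights are hypothetical)."""
    capped_rouge = min(rouge_l_f1(summary, reference), rouge_cap)
    return (
        capped_rouge
        + bow_cosine(summary, reference)
        + length_reward(len(summary.split()))
        + completeness(summary)
    )
```

Capping the ROUGE term limits how much reward the policy can gain by copying the reference verbatim, which is one plausible way to discourage reward hacking.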

Sep. 2024 – Nov. 2024 QMUL · ECS7032P

Java · MCTS · Statistical Planning

Designed and implemented an autonomous agent for the card game 'Sushi Go' using the Tabletop Games (TAG) framework.
Engineered improvements to the Monte Carlo Tree Search (MCTS) algorithm to handle imperfect-information scenarios.
Developed domain-specific heuristics to guide the search process, significantly increasing the agent's win rate against baseline agents.

Graduation Project Yeditepe University

Python · PyTorch · Statistical Modelling

Engineered a simulation environment for Blackjack to analyse win/loss probabilities across 7 distinct gameplay strategies.
Implemented supervised learning models in PyTorch to predict optimal moves from the dealer's up-card and the strength of the player's hand.
Applied statistical analysis and visualised outcome distributions to identify the most effective long-term betting strategies.
Modelled game logic and state management, bridging the gap between game mechanics and data analysis.
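
As an illustration of the simulation approach, the sketch below estimates win, push, and loss rates for a simple "stand on N" strategy. It is a deliberately simplified model and not the project's environment: it assumes an infinite deck, no splitting, doubling, or natural-blackjack bonus, and a dealer who stands on 17; all function names are hypothetical.

```python
import random

# Card values for an infinite deck: 2-9, four ten-valued cards, ace as 11.
CARDS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]


def hand_total(cards):
    """Best total, counting aces as 1 where needed to avoid busting."""
    total, aces = sum(cards), cards.count(11)
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


def play_round(stand_on, rng):
    """Return +1 (player win), 0 (push) or -1 (loss) for one round."""
    player = [rng.choice(CARDS), rng.choice(CARDS)]
    dealer = [rng.choice(CARDS), rng.choice(CARDS)]
    while hand_total(player) < stand_on:
        player.append(rng.choice(CARDS))
    if hand_total(player) > 21:
        return -1  # player busts before the dealer plays
    while hand_total(dealer) < 17:
        dealer.append(rng.choice(CARDS))
    p, d = hand_total(player), hand_total(dealer)
    if d > 21 or p > d:
        return 1
    return 0 if p == d else -1


def estimate(stand_on, rounds=10_000, seed=0):
    """Monte Carlo estimate of outcome frequencies for one threshold strategy."""
    rng = random.Random(seed)
    outcomes = [play_round(stand_on, rng) for _ in range(rounds)]
    return {
        name: outcomes.count(value) / rounds
        for name, value in (("win", 1), ("push", 0), ("loss", -1))
    }
```

Calling `estimate(n)` for several thresholds `n` gives a side-by-side comparison of strategies under identical random seeds.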