Ahana Deb (অহনা দেব)

I am a ~~first~~ second-year PhD student in the AI-ML group, part of the School of Engineering at Universitat Pompeu Fabra, where I work on Reinforcement Learning. My PhD advisor is Anders Jonsson.

Previously, I did my Master's in Intelligent Interactive Systems at UPF, and I have a Bachelor's in Instrumentation and Electronics Engineering from Jadavpur University, India. I'm currently really into film photography; see my photos here!

GitHub / Google Scholar / BlueSky / LinkedIn

Research

I'm broadly interested in everything reinforcement learning, learning theory, representation learning, and also a bit of robotics. Currently I'm working on Hierarchical Reinforcement Learning, and also on multimodal alignment in LLMs.

News

New! [Mar '25] Starting my research stay at INRIA, Lille, France, under the supervision of Debabrota Basu.
New! [Jan '25] Our paper "Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics" was accepted at ICLR 2025!
[Jan '25] Gave a talk at Indian Statistical Institute (ISI), Kolkata!
[Nov '24] Attending Learning on Graphs Conference 2024!
[Oct '24] Presented our paper "Tractable Offline Learning of Regular Decision Processes" at EWRL, Toulouse!
[Oct '24] Participated in the first LeRobot Hackathon by HuggingFace!
[Jul '24] Our poster was accepted at EWRL 2024!
[May '24] Attended AISTATS 2024 in Valencia, Spain!
[Mar '24] Attended Workshop on Optimal Transport in Berlin!
[Sep '23] Started my PhD under the supervision of Anders Jonsson at the AI&ML lab, UPF!
[Aug '23] Presented our paper (Oral Presentation!) "BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion" at INTERSPEECH 2023 in Dublin!
[July '23] Co-organised Reinforcement Learning Summer School 2023!
[July '23] Attended the ESSAI summer school 2023 in Ljubljana, Slovenia!
[July '23] Defended my Master Thesis at Universitat Pompeu Fabra, 2023

Publications

	Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics Ahana Deb, Roberto Cipollone, Anders Jonsson, Alessandro Ronca, Mohammad Sadegh Talebi ICLR (Poster), 2025 code / poster / website / In this paper, we consider episodic RDPs and show that it is possible to overcome the limitations of existing offline RL algorithms for RDPs by introducing two original techniques: a novel metric grounded in formal language theory and an approach based on Count-Min-Sketch (CMS). We derive Probably Approximately Correct (PAC) sample complexity bounds associated with each of these techniques, and validate the approach experimentally.
	Tractable Offline Learning of Regular Decision Processes Ahana Deb, Roberto Cipollone, Anders Jonsson, Alessandro Ronca, Mohammad Sadegh Talebi EWRL (Poster), 2024 arxiv / poster / We study offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs). We introduce two novel algorithms with improved sample complexity and derive the associated PAC sample complexity bounds.
	BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion Ahana Deb, Sayan Nag, Ayan Mahapatra, Soumitri Chattopadhyay, Aritra Marik, Pijush Kanti Gayen, Shankha Sanyal, Archi Banerjee, Samir Karmakar INTERSPEECH (Oral), 2023 arxiv / website / We develop a novel multimodal approach combining two models, wav2vec2.0 for audio and MarianMT for text translation, by using multimodal attention fusion to predict speech acts in our prepared Bengali speech corpus. We also show that our model BeAts (Bengali speech acts recognition using Multimodal Attention Fusion) significantly outperforms both the unimodal baseline using only speech data and a simpler bimodal fusion using both speech and text data.
	[Master Thesis] Inverse reinforcement learning with linearly-solvable MDPs for multiple reward functions Ahana Deb, Supervisors: Anders Jonsson, Vicenç Gómez, Mario Ceresa Universitat Pompeu Fabra, 2022 We aim to exploit the advantages in problem formulation and ease of computation of linearly solvable Markov Decision Processes (LMDPs) in a multiple-agent, multiple-reward scenario, using non-parametric Bayesian inverse reinforcement learning. Also available here.

Talks

Offline Learning in Regular Decision Processes

2025-01-06
slides /

Invited talk at Indian Statistical Institute (ISI), Kolkata for ACMU seminar, 2025.

Other Projects

These include coursework, side projects and unpublished research work.

	Dia Internacional de la Llengua Materna (International Mother Language Day) UPF Catala 2025-02-26 slides / Gave a short presentation on the history of International Mother Language Day.
	LeRobot Hackathon by HuggingFace HuggingFace 2024-10-26 Assembled a Moss v1 arm from scratch, and used the LeRobot library to configure, calibrate, and teleoperate the follower arm by manually controlling the leader arm. We used this setup to record a dataset, trained an imitation learning policy on it, and ran the trained policy on the real robot.
	EVA: Chatbot project UPF Natural Language Interaction 2023-12-07 code / We built a chatbot and a web application to replace the current virtual secretary of the Department of Information and Communications Technologies at UPF, featuring automatic speech recognition and text-to-speech, with Adriana Basbous and Tobias Glaninger – coursework for Natural Language Interaction (2023).
	Hate Speech analysis in the Manosphere UPF 2023-07-21 Work with Adriana Basbous. We investigated misinformation and hate speech in alt-right and manosphere channels online. We analyzed web and social media corpora from online communities associated with the manosphere. We applied sentiment analysis and topic modeling using different machine learning models, exploring their varying capacity to identify and assess hate speech and misinformation – coursework for Web Intelligence given by Pablo Aragón (2023). [Image designed by Freepik]
	Flappy Birds project UPF 2022-06-14 paper / code / For our final project for Gergely Neu’s course on Reinforcement Learning, we try a bunch of new ideas to learn how to play Flappy Birds, and we compare what works versus what doesn’t.
	Racial Bias in Facial Recognition Technology UPF 2022-06-14 paper / In this work we analyze recent publications on Facial Recognition Technology (FRT) benchmarks, and raise questions about the racial composition of datasets and accuracy reports on racial subgroups. Our analysis indicates that a significant portion of the papers do not consider any kind of bias, that some racial groups are underrepresented in the datasets used, and that these factors need to be taken into account when analyzing facial data, since ignoring them limits the performance of FRTs.
	A Deep Learning Approach for Deeply Inaccurate Wordle Solving UPF 2019-06-14 paper / This is a fun (read mock) paper about (not) solving the WORDLE puzzle that I wrote with Sayan Goswami, for SIGBOVIK 2022.
	State estimation of dynamic particle from a moving camera TCS 2019-06-14 We present a solution for estimating the motion and position of a body moving with a linear velocity while the camera independently undergoes a transformation. The solution narrows the requirement down to three consecutive camera frames to correctly estimate the position and velocity of the moving body with respect to the coordinate system of the camera.

Design and source code on lease from Jon Barron's website