Centre Inria de l'Université de Lille
Project-team Scool -- Spring 2025
Keywords: Contextual multi-armed bandits, group-sequential design, geospatial statistics.
Investigator: The project is proposed and advised by Odalric-Ambrym Maillard from the Inria project-team Scool.
Place: This project will be primarily held at the Centre Inria de l'Université de Lille, 40 avenue Halley, 59650 Villeneuve d'Ascq, France.
Traditional experimental methodologies in medicine and agriculture often rely on long-term monitoring to assess treatment effectiveness. This makes purely sequential experimental designs impractical due to the extended time required before making informed decisions.
Group-sequential methodologies \cite{BartroffLai} address this limitation by allowing interim analyses on cohorts treated in parallel. While both hypothesis testing and bandit-based approaches have been explored within this framework, fundamental optimality properties remain underdeveloped.
This research program aims to extend the scope of group-sequential decision-making by incorporating reinforcement learning (RL) and bandit theory. Specifically, we will explore how contextual information, geospatial correlations, and adversarial robustness can be integrated into group-sequential adaptive sampling strategies. This combination opens new research directions at the intersection of experimental design, adaptive decision-making, and geostatistics: by bringing together contextual RL, structured bandits, adversarial robustness, and geospatial optimization, the program targets novel methodologies with applications across medicine, agriculture, and environmental science.
One of the key advancements in modern bandit theory is the use of linear contextual structures, allowing external covariates (e.g., patient age, weather conditions) to inform decision-making. This motivates a reconsideration of group-sequential methodologies through the lens of contextual bandits. By leveraging this framework, we can design more efficient adaptive experiments where each batch of experiments contributes directly to improving future decisions.
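As a minimal sketch (with illustrative notation, not a model the project is committed to), a batched linear contextual bandit could be written as follows: at interim stage $t$, a batch of $B$ slots with contexts $x_{t,1},\dots,x_{t,B}\in\mathbb{R}^d$ is treated in parallel, and assigning action $a$ to slot $b$ yields
\[
r_{t,b} \;=\; \langle \theta^\star_a,\, x_{t,b} \rangle + \varepsilon_{t,b},
\]
where $\varepsilon_{t,b}$ is zero-mean noise. A LinUCB-style allocation would then pick, for each slot, $a_{t,b}\in\arg\max_a \langle\hat\theta_a, x_{t,b}\rangle + \beta_t \|x_{t,b}\|_{V_{a,t}^{-1}}$, with the estimates $\hat\theta_a$ and design matrices $V_{a,t}$ updated only at interim analyses, once the whole batch has been observed.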
Beyond contextual bandits, a natural extension is to incorporate MDP-based RL algorithms into the group-sequential paradigm. This shift enables decision-making over multiple stages, allowing for a more comprehensive adaptive learning process. Addressing these questions requires developing new methodological tools to balance exploration and exploitation within structured group-sequential setups.
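One possible formalization, sketched here with illustrative notation, views each interim analysis as a stage of an episodic MDP $(\mathcal{S},\mathcal{A},P,r,H)$: the state $s_h$ summarizes the data collected over the first $h$ batches, the action $a_h$ is the allocation of treatments to the slots of the next batch, and the experiment seeks a policy $\pi$ maximizing
\[
V^\pi \;=\; \mathbb{E}_\pi\Big[\textstyle\sum_{h=1}^{H} r(s_h, a_h)\Big],
\]
so that classical exploration bonuses from RL could, in principle, be transferred to the design of the successive batches.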
An important theoretical direction is the adaptation of hypothesis testing methodologies to the group-sequential bandit framework. Unlike purely sequential bandit strategies, which operate one step at a time, group-sequential approaches allow for batch experimentation at each decision step. Several key questions remain open in this batched setting.
In structured settings, the challenge is to rapidly identify the optimal action within specific contexts while minimizing the number of required trials. Exploiting correlations across experimental slots naturally reduces sample complexity, but characterizing the corresponding optimal performance guarantees remains an open problem.
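As a concrete (and standard) illustration of what such guarantees involve, a group-sequential elimination rule could discard, at interim analysis $t$, any action $a$ whose confidence interval is dominated,
\[
\hat\mu_{a,t} + c_{a,t} \;<\; \max_{a'}\big(\hat\mu_{a',t} - c_{a',t}\big),
\qquad c_{a,t} \propto \sqrt{\tfrac{\log(1/\delta_t)}{N_{a,t}}},
\]
where $N_{a,t}$ counts the observations of action $a$ across all slots of the first $t$ batches. Exploiting structure then amounts to replacing $\hat\mu_{a,t}$ and $c_{a,t}$ by estimates that pool information across correlated slots or contexts, which is precisely where the optimal sample-complexity characterization is missing.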
Many real-world applications involve decision-making in geographically distributed environments. Standard contextual bandit models often rely on linear regression or kernel methods, but in geostatistical settings, more advanced techniques such as Kriging, recently revisited in \cite{SivieroKriging}, offer improved interpolation capabilities, including under spatial dependence.
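For background, the Kriging (Gaussian-process) predictor at an unobserved location $s_\star$, given observations $y=(y(s_1),\dots,y(s_n))^\top$ and a covariance kernel $k$, reads
\[
\hat y(s_\star) \;=\; k_\star^\top \big(K + \sigma^2 I\big)^{-1} y,
\qquad (K)_{ij}=k(s_i,s_j),\quad (k_\star)_i = k(s_\star, s_i),
\]
and plugging such spatial interpolants into the optimistic indices of a bandit algorithm is one natural way to exploit geographic correlation between experimental slots; this is a generic recipe, not the specific construction studied in \cite{SivieroKriging}.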
However, integrating geostatistical methods into bandit-based decision-making presents several challenges.
This program will explore how recent advances in geostatistics can be combined with multi-armed bandit theory to improve experimental design and decision-making in geospatial contexts.
One fundamental challenge in real-world experiments is the presence of adversarial corruption, where a subset of observations may be unreliable due to systematic biases, sensor failures, or external disruptions. For instance, in agricultural trials, sensor malfunctions could lead to incorrect soil moisture readings, or a pest outbreak in a subset of test fields might skew yield results. In clinical trials, certain patient subgroups might exhibit non-compliance with treatment protocols, leading to misleading outcomes. While corruption models have been studied as a variant of bandits (Agrawal et al., 2024), traditional sequential bandit models assume a single arm is selected per time step, which limits their ability to mitigate adversarial attacks.
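To fix ideas, a standard way to model such corruption (with illustrative notation) is to assume the learner only observes
\[
\tilde r_t \;=\; r_t + c_t, \qquad \sum_{t=1}^{T} |c_t| \;\le\; C,
\]
where $r_t$ is the clean stochastic reward and the total corruption budget $C$ is unknown; robust algorithms then aim for regret bounds that degrade gracefully, typically by an additive term scaling with $C$.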
A more robust alternative is to consider group-sequential bandits, where multiple experimental slots (e.g., multiple farms, patient cohorts) can be tested in parallel. This setting offers major advantages for robustness.
Additionally, group-sequential bandits introduce new challenges related to group-correlated corruption. When experimental units (e.g., farms, clinical sites) are geographically proximate, external hazards (e.g., weather events, local epidemics) can simultaneously affect multiple slots. For instance, a prolonged drought might impact all test fields in a region, distorting yield comparisons, while a hospital-wide data logging error could corrupt multiple patient records. This motivates the need for diversity-enforcing strategies that avoid over-concentration on a single alternative within geographically correlated environments.
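One simple way to encode such diversity, given here only as a sketch, is to cap the share of a batch assigned to any single alternative or region: writing $n_{t,a}$ for the number of slots of batch $t$ (of size $B$) allocated to action $a$, impose
\[
n_{t,a} \;\le\; \lceil \alpha B \rceil \quad \text{for all } a, \qquad \alpha \in (0,1],
\]
so that a localized hazard hitting the slots assigned to one alternative cannot, on its own, invalidate the interim comparison.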
The project will be hosted at Centre Inria de l'Université de Lille, in the Scool team. Scool (Sequential COntinual and Online Learning) focuses on the study of sequential decision-making under uncertainty.