Research Topic Proposal

Centre Inria de l'Université de Lille
Team project Scool -- Spring 2025

“Reinforcement Learning models for Spontaneous and Self-exciting Disturbances”

Keywords: Autonomous Systems, Markov Decision Process, Hawkes process.

Investigator: The project is proposed and advised by Odalric-Ambrym Maillard from the Inria project-team Scool.

Place: This project will be primarily held at the research center Inria Lille -- Nord Europe, 40 avenue Halley, 59650 Villeneuve d'Ascq, France.


RL models for Spontaneous and Self-exciting Disturbances

Reinforcement learning (RL) has witnessed remarkable success in addressing decision-making problems across discrete and continuous domains. However, many real-world systems remain outside the effective scope of traditional Markov Decision Process (MDP) frameworks, particularly those governed by continuous-time dynamics punctuated by random events. My research aims to bridge this gap by extending RL methodologies to autonomous systems described by, e.g., Piecewise Deterministic Markov Processes (PDMPs) or Hawkes processes. These systems naturally emerge in fields as diverse as physics-inspired RL, manufacturing, healthcare, and agroecology, where control strategies must account for continuous evolution interrupted by stochastic disturbances.

From Classical Models to Poisson Disturbance. Put simply, classical bandit models assume that a treatment produces an immediate outcome. In healthcare, by contrast, a patient's health status evolves continuously (e.g., tumor growth) but is interrupted by stochastic events such as relapses or side effects. Similarly, in agroecology, plant growth dynamics are influenced by random occurrences such as pest outbreaks or adverse weather. Likewise, in manufacturing, a production line operates under predictable dynamics but can experience sudden failures or deterioration over time.

In hypothesis testing and clinical trials \cite{bartroff2012sequential}, these phenomena have been modeled using Poisson processes \cite{li2010conditional}, particularly in vaccine studies that must meet safety requirements over a large population. Likewise, in control, PDMPs, in which deterministic dynamics govern the system's evolution between stochastic Poisson events, offer a more expressive framework than the traditional MDP because they can model spontaneous changes. These scenarios share a critical feature: decision-making is shaped by the interplay of deterministic evolution and stochastic events, and requires balancing the timing of interventions (e.g., optimal scheduling of a treatment or an agricultural practice) with their nature (e.g., dosage or method).

However, current models typically assume static decision contexts, leaving the integration of such disturbances into dynamic systems and RL frameworks underexplored. Incorporating Poisson disturbances into sequential decision-making hence represents a significant opportunity to extend classical bandit and MDP models. Because they support adaptive, resource-efficient decision-making, PDMPs provide a rigorous foundation for adapting RL to these domains, yielding policies that address both when to act and what action to take.

From Poisson to Hawkes Processes. Beyond Poisson processes, many systems exhibit self-exciting events, where one occurrence increases the likelihood of subsequent events. Hawkes processes are the natural extension, enabling the modeling of cascading failures in factories, relapse dynamics in healthcare (e.g., tumor recurrence), or pest infestations in agroecology. These processes capture the temporal dependencies between events, offering a richer representation of real-world phenomena than memoryless Poisson processes. Moreover, many real-world interventions introduce disturbances that are not instantaneous but unfold over time, resembling aftershocks following an earthquake: administering a medical treatment may cause immediate effects but also trigger side effects at random future times, requiring careful monitoring. Similarly, agroecological practices (e.g., pesticide application) can have delayed effects, such as pest outbreaks influenced by environmental factors, while industrial repairs may induce future system fragility.
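To make the contrast with a memoryless Poisson process concrete, the following minimal sketch simulates a univariate Hawkes process using Ogata's thinning method. The exponential kernel, the parameter names `mu`, `alpha`, `beta`, and the function name `simulate_hawkes` are illustrative assumptions and are not taken from the proposal; setting `alpha = 0` recovers a homogeneous Poisson process of rate `mu`.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times of a univariate Hawkes process on [0, horizon].

    Assumed intensity (exponential kernel, an illustrative choice):
        lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # value of the self-exciting sum at the current time t
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound for the thinning step.
        upper = mu + excitation
        wait = rng.expovariate(upper)
        if t + wait > horizon:
            break
        t += wait
        # Decay the excitation down to the candidate time.
        excitation *= math.exp(-beta * wait)
        # Accept the candidate with probability lambda(t) / upper.
        if rng.random() * upper <= mu + excitation:
            events.append(t)
            excitation += alpha  # each accepted event raises the intensity
    return events


if __name__ == "__main__":
    poisson_like = simulate_hawkes(mu=1.0, alpha=0.0, beta=1.0, horizon=100.0)
    self_exciting = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=100.0)
    print(len(poisson_like), len(self_exciting))
```

With `alpha < beta` the process is stable and its long-run event rate is `mu / (1 - alpha / beta)`, so the self-exciting run above should on average produce about twice as many events as the Poisson-like one.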

Towards piecewise-deterministic Hawkes-Markov Decision Processes

In this proposal we further study a variant of the traditional Markov Decision Process (MDP) formulation adapted to capture systems displaying autonomous (that is, uncontrolled) dynamics. Such systems naturally appear when monitoring a physical system. For instance, think of a production line that usually works in a nominal way, but as the physical system ages, some parts spontaneously deteriorate or even break, causing the dynamics of the system to change (autonomously). Another example is that of a patient followed after an important treatment or medical intervention, whose health status evolves autonomously, usually in a positive way, but with potential relapses occurring spontaneously. Yet another example is the monitoring of an agrosystem, where plants autonomously grow from one stage to the next in a predictable manner, depending on the given conditions and available resources, and this evolution can be modified by sudden changes in weather conditions or the appearance of a pathogen. In each situation, the decision maker (maintenance officer, physician, farmer) faces a variant of a control problem, with some variables evolving in an autonomous but predictable way, together with a specific form of non-stationarity, where the dynamics may suddenly change at seemingly random times. The first type of evolution may be modeled, e.g., with a popular linear system or PDE formulation, while the occasional changes modify the parameters of the dynamics. The occurrences and intensities of these changes may further be modeled. Elaborating on the considered examples, when a change of dynamics spontaneously occurs, such as a system part that breaks, it is natural to allow for overshoots, that is, subsequent spontaneous changes of the system caused by the first one. Such self-excitation is reminiscent of the modeling of earthquakes, epidemics, or biological neurons.
Now, the agent acts on this system, causing (controlled) impulse jumps of the dynamics. The autonomous evolution suggests that the decision maker should carefully choose not only what intervention to do but also when to perform it. In particular, this means the set of policies considered by a decision maker should handle decision time.

Modeling

Mathematically, the traditional way to model decisions in dynamical systems is via the Markov Decision Process framework (Puterman, 2014; Sutton, 2018), which models controlled systems. Autonomous dynamics are naturally specified by flow equations (Davis, 1984; De Saporta et al., 2016; Cleynen et al., 2024), while spontaneous changes are naturally modeled using Hawkes processes (Hawkes, 2018; Siviero et al., 2024). All three frameworks have been extensively studied in the literature, offering a sound basis for possible extensions. We here combine the frameworks of Markov Decision Processes, autonomous dynamics via flow equations, and Hawkes processes to model spontaneous changes. This results in the model:

$${\bf M} = ((\mathcal{X}, \Theta), \mathcal{A}, (F, \Lambda, Q), c)$$

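Where \(\mathcal{X}\) is the state space, \(\Theta\) the set of modes, \(\mathcal{A}\) the set of actions, \(F\) the vector field of the autonomous dynamics, \(\Lambda\) the jump intensity, \(Q\) the mode transition kernel, and \(c\) the cost function.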

Light formalization

The process evolves in a state space \(\mathcal{X} \subseteq \mathbb{R}^d\) with \(d \in \mathbb{N}\), where the position at time \(t \in \mathbb{R}_+\) is denoted \(X_t \in \mathcal{X}\).

Autonomous Dynamics

Between jump times \(\tau_n, \tau_{n+1}\), the system follows:

$$\frac{dX_t}{dt} = F(\theta_n, X_t), \quad t \in (\tau_n, \tau_{n+1})$$

The solution flow \(\Phi_n\) satisfies:

$$X_t = \Phi_n(X_{\tau_n}, t - \tau_n), \quad t \in (\tau_n, \tau_{n+1})$$
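As a rough illustration of the flow \(\Phi_n\), the sketch below approximates it with an explicit Euler scheme for a fixed mode \(\theta_n\). The function names `euler_flow` and `decay`, the step count, and the one-dimensional decay dynamics are hypothetical choices made only for this example; any ODE solver could be substituted.

```python
import math


def euler_flow(F, theta, x0, duration, n_steps=1000):
    """Approximate Phi(x0, duration) for dX/dt = F(theta, X) by explicit Euler."""
    x = list(x0)
    dt = duration / n_steps
    for _ in range(n_steps):
        dx = F(theta, x)
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


def decay(theta, x):
    """Hypothetical one-dimensional mode dynamics: dX/dt = -theta * X."""
    return [-theta * x[0]]


if __name__ == "__main__":
    approx = euler_flow(decay, theta=0.5, x0=[1.0], duration=2.0)
    exact = math.exp(-0.5 * 2.0)  # closed-form flow for this linear example
    print(approx[0], exact)
```

For this linear example the flow is known in closed form, \(\Phi(x, s) = x e^{-\theta s}\), which gives a simple check on the numerical approximation.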

Jump Dynamics

At jump times \(\tau=\tau_{n+1}\), the system transitions to a new position \(X_\tau\) and mode \(\theta_{n+1}\). Jump types include:

$$\lambda(x, t) = \mu(x - X_t) + \sum_{(x_n, t_n) \in \mathcal{H}_t} \phi(x - x_n, t - t_n)$$

Finally, the next change occurs at \( \tau_{n+1} = \min\{\tau^\natural,\tau^\flat,\tau^\sharp\} \), and the system transitions to a new mode according to

$$ \theta_{n+1}\sim Q(\theta_n,X_{\tau},a_{n+1}), \quad \text{with extended action }\,a_{n+1}\in \mathcal{A}\cup \{\natural,\flat\}$$
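The selection of the next change time and the mode update can be sketched as follows. This is only one possible reading: it assumes that \(\tau^\sharp\) is the time of the agent's intervention (so that the extended action is then an element of \(\mathcal{A}\)), that the two other candidate times carry their own label as extended action, and that \(Q\) returns a finite distribution over modes. The names `next_change` and `sample_mode`, and the string labels, are hypothetical.

```python
import random


def next_change(tau_natural, tau_flat, tau_sharp, agent_action):
    """Return (tau_next, extended_action) for three competing candidate times."""
    candidates = {"natural": tau_natural, "flat": tau_flat, "sharp": tau_sharp}
    label = min(candidates, key=candidates.get)  # earliest candidate wins
    # Assumption: only the "sharp" time corresponds to a controlled jump.
    extended_action = agent_action if label == "sharp" else label
    return candidates[label], extended_action


def sample_mode(Q, theta, x, extended_action, rng):
    """Draw theta_{n+1} from Q(theta, x, a), given as a dict mode -> probability."""
    dist = Q(theta, x, extended_action)
    modes = list(dist)
    weights = [dist[m] for m in modes]
    return rng.choices(modes, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical kernel: a controlled jump restores the nominal mode,
    # any other jump degrades the system with probability 0.7.
    def Q(theta, x, a):
        if a == "repair":
            return {"nominal": 1.0}
        return {"degraded": 0.7, theta: 0.3}

    rng = random.Random(0)
    tau, a = next_change(2.0, 3.0, 1.5, agent_action="repair")
    print(tau, a, sample_mode(Q, "degraded", [0.0], a, rng))
```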

Cost Function

The instantaneous cost is defined as \(c(X_t)\) for \(t \notin \{\tau_1,\dots,\tau_n,\dots\}\) and \(c(X_\tau) + c_{a_\tau}\) at any jump time \(\tau\).

Objective and Goals

The main objectives of this work are as follows:

Illustrative Example: Pest Infestation

This example models the dynamics of an agroecosystem using a state-space approach. The system is represented by four key state variables, each normalized to lie within the range [0, 1]: growth (\(x_1\)), growth nutrients (\(x_2\)), health nutrients (\(x_3\)), and pest infestation (\(x_4\)).

Intuitively, the model captures the following interactions:

Together, these dynamics highlight the feedback loops between plant growth, nutrient levels, and pest infestation, illustrating the challenges in maintaining a healthy agroecosystem.

The free dynamics of the system, in the absence of external interventions (e.g., fertilizers or pesticides), can be mathematically defined as follows:

\[ F(X) = \begin{bmatrix} a x_2 - e x_4 \\ -b x_1 \\ -c x_1 \\ d(1 - x_3) \end{bmatrix} \]

Here, \(a, b, c, d, e\) are constants that define the specific rates of interaction between the variables. This model provides a foundation for simulating the agroecosystem's evolution and exploring strategies to optimize growth while managing nutrient levels and controlling pests.
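A minimal sketch of how these free dynamics might be simulated is given below, using an explicit Euler scheme. The numerical values of \(a, b, c, d, e\), the step size, and the clipping of each component to \([0, 1]\) are assumptions made for illustration only; the function names `free_dynamics` and `simulate_free` are hypothetical.

```python
def free_dynamics(x, a=1.0, b=0.5, c=0.3, d=0.2, e=0.8):
    """Vector field F(X) of the free dynamics (rate constants are placeholders)."""
    x1, x2, x3, x4 = x
    return [
        a * x2 - e * x4,   # growth: fed by growth nutrients, reduced by pests
        -b * x1,           # growth nutrients: consumed by growth
        -c * x1,           # health nutrients: consumed by growth
        d * (1.0 - x3),    # infestation: increases as health nutrients drop
    ]


def simulate_free(x0, horizon, dt=0.01):
    """Euler simulation of dX/dt = F(X), keeping each component in [0, 1]."""
    x = list(x0)
    path = [tuple(x)]
    for _ in range(int(round(horizon / dt))):
        dx = free_dynamics(x)
        # Clipping enforces the normalization to [0, 1] (an assumption here).
        x = [min(1.0, max(0.0, xi + dt * di)) for xi, di in zip(x, dx)]
        path.append(tuple(x))
    return path


if __name__ == "__main__":
    path = simulate_free([0.2, 0.8, 0.8, 0.1], horizon=5.0)
    print(path[-1])
```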

To model the stochastic nature of pest infestations, we incorporate a Hawkes process into the system. Specifically, the infestation component (\(x_4\)) of the state \(X_t\) is influenced by the intensity function \(\lambda(x, t)\), which governs the occurrence of pest-related events over time.

The intensity function \(\lambda(x, t)\) is defined as:

When a pest infestation event occurs, the state \(X_t\) is updated with a small increase (\(\Delta\)) in the infestation component (\(x_4\)), reflecting the immediate impact of the event on the system.

This extended model captures the interplay between deterministic system dynamics and stochastic pest infestations, providing a framework for simulating and analyzing scenarios where pest occurrences depend on system conditions and past events.
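The coupling between the free dynamics and pest events can be sketched with a simple time-discretized simulation. The sketch assumes a purely temporal intensity \(\lambda(t) = \mu + \sum_{t_n < t} \alpha e^{-\beta (t - t_n)}\), which is only one simple instance of a Hawkes intensity, and it approximates the event probability over a step of length `dt` by \(\lambda(t)\,dt\). All numerical values, the clipping to \([0, 1]\), and the names `free_dynamics` and `simulate_with_pests` are hypothetical.

```python
import math
import random


def free_dynamics(x, a=1.0, b=0.5, c=0.3, d=0.2, e=0.8):
    """Vector field F(X) of the free dynamics (rate constants are placeholders)."""
    x1, x2, x3, x4 = x
    return [a * x2 - e * x4, -b * x1, -c * x1, d * (1.0 - x3)]


def simulate_with_pests(x0, horizon, dt=0.01, mu=0.2, alpha=0.5, beta=1.0,
                        delta=0.05, seed=0):
    """Euler steps of the free dynamics with self-exciting pest events.

    Each event adds `delta` to the infestation component x4 and raises the
    event intensity by `alpha`, which then decays at rate `beta`.
    """
    rng = random.Random(seed)
    x = list(x0)
    excitation = 0.0
    event_times = []
    for k in range(int(round(horizon / dt))):
        # Deterministic evolution between events.
        dx = free_dynamics(x)
        x = [min(1.0, max(0.0, xi + dt * di)) for xi, di in zip(x, dx)]
        excitation *= math.exp(-beta * dt)
        # Bernoulli approximation: an event occurs with probability lambda * dt.
        if rng.random() < (mu + excitation) * dt:
            event_times.append((k + 1) * dt)
            x[3] = min(1.0, x[3] + delta)  # small increase of the infestation
            excitation += alpha            # self-excitation of future events
    return x, event_times


if __name__ == "__main__":
    state, events = simulate_with_pests([0.2, 0.8, 0.8, 0.1], horizon=10.0)
    print(state, len(events))
```

The Bernoulli approximation is only reasonable when \(\lambda(t)\,dt\) is small; an exact scheme such as thinning could replace it.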

Bibliography

Host Institution and Supervision

The project will be hosted at Centre Inria de l'Université de Lille, in the Scool team. Scool (Sequential COntinual and Online Learning) focuses on the study of sequential decision-making under uncertainty.