Hidden Markov Models¶

Hidden Markov Models (HMMs) are probabilistic models for sequential data where observations are generated by a system that transitions between hidden (latent) states. Rather than observing the states directly, we see only outputs that depend probabilistically on each state.

This module provides a high-performance Gaussian HMM implementation with a Rust backend, featuring the Forward-Backward algorithm, Viterbi decoding, and Baum-Welch parameter learning.

Mathematical Foundations¶

Model Definition¶

A Hidden Markov Model \(\lambda\) is defined by three components:

1. States¶

Number of states: \(N\)
State at time \(t\): \(q_t \in \{1, 2, \ldots, N\}\)
State sequence: \(Q = q_1, q_2, \ldots, q_T\)

2. Observations¶

Observation at time \(t\): \(o_t \in \mathbb{R}\) (continuous)
Observation sequence: \(O = o_1, o_2, \ldots, o_T\)

3. Parameters¶

Initial State Distribution:

\[ \pi_i = P(q_1 = i), \quad 1 \leq i \leq N \]

\[ \sum_{i=1}^N \pi_i = 1 \]

State Transition Matrix:

\[ a_{ij} = P(q_{t+1} = j \mid q_t = i), \quad 1 \leq i,j \leq N \]

\[ \sum_{j=1}^N a_{ij} = 1 \quad \text{for all } i \]

Emission Distribution (Gaussian):

\[ b_j(o_t) = P(o_t \mid q_t = j) = \mathcal{N}(o_t; \mu_j, \sigma_j^2) \]

\[ b_j(o_t) = \frac{1}{\sigma_j\sqrt{2\pi}} \exp\left(-\frac{(o_t - \mu_j)^2}{2\sigma_j^2}\right) \]

Complete model: \(\lambda = (\boldsymbol{\pi}, \mathbf{A}, \mathbf{B})\)

The Markov Property¶

First-Order Markov Assumption¶

The future state depends only on the current state, not the history:

\[ P(q_{t+1} \mid q_1, q_2, \ldots, q_t) = P(q_{t+1} \mid q_t) \]

Output Independence¶

Observations are conditionally independent given the state:

\[ P(o_t \mid q_1, \ldots, q_T, o_1, \ldots, o_{t-1}, o_{t+1}, \ldots, o_T) = P(o_t \mid q_t) \]

The Three Fundamental Problems¶

HMMs are used to solve three fundamental problems:

Problem	Given	Find	Solution
Evaluation	Model \(\lambda\), observations \(O\)	\(P(O \mid \lambda)\)	Forward Algorithm
Decoding	Model \(\lambda\), observations \(O\)	Most likely state sequence \(Q^*\)	Viterbi Algorithm
Learning	Observations \(O\)	Optimal parameters \(\lambda^*\)	Baum-Welch Algorithm

Forward Algorithm¶

Computes \(P(O \mid \lambda)\) efficiently using dynamic programming.

Forward Variable¶

\[ \alpha_t(i) = P(o_1, o_2, \ldots, o_t, q_t = i \mid \lambda) \]

The probability of observing the first \(t\) observations AND being in state \(i\) at time \(t\).

Algorithm¶

Initialization (\(t = 1\)):

\[ \alpha_1(i) = \pi_i \cdot b_i(o_1), \quad 1 \leq i \leq N \]

Recursion (\(1 \leq t < T\)):

\[ \alpha_{t+1}(j) = \left[\sum_{i=1}^N \alpha_t(i) \cdot a_{ij}\right] \cdot b_j(o_{t+1}) \]

Termination:

\[ P(O \mid \lambda) = \sum_{i=1}^N \alpha_T(i) \]

Complexity¶

Metric	Value
Time	\(O(N^2 T)\)
Space	\(O(N T)\)
Without DP	\(O(N^T)\) — exponential!

Backward Algorithm¶

Alternative computation for completeness and use in Baum-Welch.

Backward Variable¶

\[ \beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda) \]

Initialization (\(t = T\)):

\[ \beta_T(i) = 1, \quad 1 \leq i \leq N \]

Recursion (\(t = T-1, T-2, \ldots, 1\)):

\[ \beta_t(i) = \sum_{j=1}^N a_{ij} \cdot b_j(o_{t+1}) \cdot \beta_{t+1}(j) \]

Viterbi Algorithm¶

Finds the single most likely state sequence given observations.

Objective¶

\[ Q^* = \arg\max_Q P(Q \mid O, \lambda) = \arg\max_Q P(Q, O \mid \lambda) \]

Viterbi Variable¶

\[ \delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1}, q_t = i, o_1, \ldots, o_t \mid \lambda) \]

The maximum probability of any path ending in state \(i\) at time \(t\).

Algorithm¶

Initialization (\(t = 1\)):

\[ \delta_1(i) = \pi_i \cdot b_i(o_1) \]

\[ \psi_1(i) = 0 \]

Recursion (\(2 \leq t \leq T\)):

\[ \delta_t(j) = \max_{1 \leq i \leq N} \left[\delta_{t-1}(i) \cdot a_{ij}\right] \cdot b_j(o_t) \]

\[ \psi_t(j) = \arg\max_{1 \leq i \leq N} \left[\delta_{t-1}(i) \cdot a_{ij}\right] \]

Termination:

\[ P^* = \max_{1 \leq i \leq N} \delta_T(i) \]

\[ q_T^* = \arg\max_{1 \leq i \leq N} \delta_T(i) \]

Backtracking (\(t = T-1, T-2, \ldots, 1\)):

\[ q_t^* = \psi_{t+1}(q_{t+1}^*) \]

Baum-Welch Algorithm¶

An Expectation-Maximization (EM) algorithm for learning HMM parameters from data.

Auxiliary Variables¶

State occupation probability:

\[ \gamma_t(i) = P(q_t = i \mid O, \lambda) = \frac{\alpha_t(i) \cdot \beta_t(i)}{\sum_{j=1}^N \alpha_t(j) \cdot \beta_t(j)} \]

Transition probability:

\[ \xi_t(i,j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) \]

\[ = \frac{\alpha_t(i) \cdot a_{ij} \cdot b_j(o_{t+1}) \cdot \beta_{t+1}(j)}{\sum_{i=1}^N \sum_{j=1}^N \alpha_t(i) \cdot a_{ij} \cdot b_j(o_{t+1}) \cdot \beta_{t+1}(j)} \]

E-Step¶

Compute \(\gamma_t(i)\) and \(\xi_t(i,j)\) for all \(t\), \(i\), \(j\) using current parameters.

M-Step¶

Update parameters to maximize expected log-likelihood:

Initial state probabilities:

\[ \bar{\pi}_i = \gamma_1(i) \]

Transition probabilities:

\[ \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \]

Emission parameters (Gaussian):

\[ \bar{\mu}_j = \frac{\sum_{t=1}^T \gamma_t(j) \cdot o_t}{\sum_{t=1}^T \gamma_t(j)} \]

\[ \bar{\sigma}_j^2 = \frac{\sum_{t=1}^T \gamma_t(j) \cdot (o_t - \bar{\mu}_j)^2}{\sum_{t=1}^T \gamma_t(j)} \]

Convergence¶

Iterate E-step and M-step until:

\[ |L(\lambda^{(k+1)}) - L(\lambda^{(k)})| < \epsilon \]

where \(L(\lambda) = \log P(O \mid \lambda)\) is the log-likelihood.

Properties:

Guaranteed to converge to a local maximum
May converge to different solutions depending on initialization
Multiple random restarts recommended

Numerical Stability¶

Scaling¶

Raw probabilities underflow for long sequences. Use scaling factors:

\[ c_t = \frac{1}{\sum_{i=1}^N \alpha_t(i)} \]

Scaled forward variables:

\[ \hat{\alpha}_t(i) = c_t \cdot \alpha_t(i) \]

Log-Space Computation¶

For Viterbi, work in log-space:

\[ \log \delta_t(j) = \max_{1 \leq i \leq N} \left[\log \delta_{t-1}(i) + \log a_{ij}\right] + \log b_j(o_t) \]

Use log-sum-exp for stable additions:

\[ \log(e^a + e^b) = \max(a,b) + \log(1 + e^{-|a-b|}) \]

Model Selection¶

Number of States¶

Choose the number of states \(N\) using information criteria:

Criterion	Formula	Description
AIC	\(-2\log L + 2k\)	Akaike Information Criterion
BIC	\(-2\log L + k\log n\)	Bayesian Information Criterion

where \(k\) is the number of parameters and \(n\) is the sample size.

Lower values indicate better models (penalized for complexity).

Python API¶

Basic Usage¶

import numpy as np
from optimizr import HMM

# Generate synthetic regime data
np.random.seed(42)
returns = np.concatenate([
    np.random.normal(0.01, 0.02, 500),   # Bull regime
    np.random.normal(-0.015, 0.03, 500), # Bear regime
])

# Fit a 2-state HMM
model = HMM(n_states=2)
model.fit(returns, n_iterations=100)

# Decode the most likely state sequence
states = model.predict(returns)
print("State counts:", np.unique(states, return_counts=True))

Expected output:

State counts: (array([0, 1]), array([498, 502]))

Model Inspection¶

# Examine learned parameters
print("Initial distribution:", model.pi)
print("Transition matrix:\n", model.A)
print("Emission means:", model.means)
print("Emission stds:", model.stds)

# Compute log-likelihood
ll = model.score(returns)
print(f"Log-likelihood: {ll:.2f}")

Expected output:

Initial distribution: [0.95 0.05]
Transition matrix:
 [[0.992 0.008]
  [0.010 0.990]]
Emission means: [ 0.0098 -0.0152]
Emission stds: [0.0198 0.0301]
Log-likelihood: 2847.31

Feature Engineering for HMM¶

from optimizr import prepare_for_hmm_py

# Prepare features from price data with rolling statistics
features = prepare_for_hmm_py(
    prices,
    lag_periods=[1, 5, 20],  # multi-scale lookback
)

# Fit HMM with richer features
hmm = HMM(n_states=3).fit(features, n_iterations=120)

Use rolling statistics from timeseries_utils as additional features for richer regime classification:

from optimizr import rolling_hurst, rolling_halflife

hurst = rolling_hurst(prices, window=50)
halflife = rolling_halflife(prices, window=50)
features = np.column_stack([returns, hurst, halflife])

hmm = HMM(n_states=3).fit(features, n_iterations=100)

Visualization¶

State Sequence Plot¶

import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 1, figsize=(12, 6), sharex=True)

# Price/returns
axes[0].plot(returns, alpha=0.7)
axes[0].set_ylabel('Returns')
axes[0].set_title('Returns Time Series')

# State sequence
axes[1].plot(states, drawstyle='steps-post', color='orange')
axes[1].set_ylabel('State')
axes[1].set_xlabel('Time')
axes[1].set_title('Decoded State Sequence')
axes[1].set_yticks([0, 1])
axes[1].set_yticklabels(['Bull', 'Bear'])

plt.tight_layout()
plt.savefig('hmm_states.png', dpi=150)

Generated plot

Output:

This visualization creates a two-panel plot:

Top panel: Returns time series showing price movements
Bottom panel: Decoded state sequence highlighting regime transitions (e.g., Bull/Bear markets)

For complete examples with regime detection on real market data, see the HMM Tutorial notebook.

Transition Diagram¶

        stateDiagram-v2
    direction LR
    [*] --> Normal
    Bull : 📈 Bull Market
    Normal : 📊 Normal Regime
    Bear : 📉 Bear Market
    Bull --> Bull : a₁₁ (self)
    Bull --> Normal : a₁₂
    Normal --> Bull : a₂₁
    Normal --> Normal : a₂₂ (self)
    Normal --> Bear : a₂₃
    Bear --> Normal : a₃₂
    Bear --> Bear : a₃₃ (self)

Code example — build and visualize the transition graph programmatically:

import networkx as nx

G = nx.DiGraph()
for i in range(model.n_states):
    for j in range(model.n_states):
        if model.A[i, j] > 0.01:  # threshold small probabilities
            G.add_edge(f"State {i}", f"State {j}", weight=model.A[i, j])

pos = nx.spring_layout(G)
edge_labels = {(u, v): f"{d['weight']:.2f}" for u, v, d in G.edges(data=True)}

plt.figure(figsize=(8, 6))
nx.draw(G, pos, with_labels=True, node_size=2000, node_color='lightblue')
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
plt.title('HMM Transition Diagram')
plt.savefig('hmm_transitions.png', dpi=150)

Output:

Generates a directed graph visualization showing:

Nodes: Hidden states (e.g., State 0, State 1)
Edges: Transition probabilities between states (labeled with probability values)
Only transitions with probability > 0.01 are shown for clarity

📓 Full Tutorial: Explore 01_hmm_tutorial.ipynb for hands-on examples including:

Multi-state regime detection
Volatility clustering analysis
Mean-reversion regime identification
Viterbi decoding in financial time series

Applications¶

1. Financial Regime Detection¶

HMMs identify market regimes (bull/bear/sideways) from returns:

from optimizr import HMM

# Daily returns
hmm = HMM(n_states=3)  # bull, bear, high-volatility
hmm.fit(daily_returns)

regimes = hmm.predict(daily_returns)
regime_labels = ['Bull', 'Bear', 'High Vol']
current_regime = regime_labels[regimes[-1]]
print(f"Current market regime: {current_regime}")

2. Mean Reversion Trading¶

Use regime as a filter for mean-reversion strategies:

# Only trade mean-reversion when in low-volatility regime
regime = hmm.predict(recent_returns)[-1]
if regime == 0:  # low-vol regime
    # Execute mean-reversion strategy
    pass
else:
    # Hold or reduce position
    pass

3. Risk Management¶

Adjust position sizing based on detected regime:

regime_volatilities = {
    0: 0.02,  # low vol
    1: 0.03,  # medium vol
    2: 0.05,  # high vol
}

current_regime = hmm.predict(returns)[-1]
position_size = base_size * (target_vol / regime_volatilities[current_regime])

Performance¶

Benchmarks on Apple M1 (1000 observations):

States	Fit Time	Predict Time	Memory
2	45 ms	2 ms	0.5 MB
3	68 ms	3 ms	0.8 MB
5	124 ms	6 ms	1.4 MB
10	412 ms	18 ms	4.2 MB

Complexity: \(O(N^2 T)\) for both forward-backward and Viterbi.

Troubleshooting¶

Symptom	Cause	Fix
States not separating	Too few iterations	Increase `n_iterations` to 150–200
Converges to bad solution	Local optimum	Run multiple restarts with different seeds
Underflow errors	Long sequences	Enable log-space computation (automatic in Rust backend)
All observations in one state	Poor initialization	Try more states or normalize input data

Tips¶

Normalize inputs: Scale features to similar ranges for better learning.
Multiple restarts: Run 5–10 times with different seeds, keep best log-likelihood.
Feature engineering: Include lagged returns, rolling volatility, and technical indicators.
State interpretation: Higher \(\sigma\) usually indicates volatile/bearish regimes.

Implementation Notes¶

Uses Rust backend when available; falls back to Python if unavailable.
fit runs Baum-Welch; predict runs Viterbi.
Call score(X) to compute log-likelihood for model comparison.
Scaling is applied automatically to prevent numerical underflow.

References¶

Rabiner, L.R. (1989). “A tutorial on hidden Markov models and selected applications in speech recognition.” Proceedings of the IEEE, 77(2):257–286.
Baum, L.E. & Petrie, T. (1966). “Statistical inference for probabilistic functions of finite state Markov chains.” Annals of Mathematical Statistics, 37(6):1554–1563.
Viterbi, A. (1967). “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm.” IEEE Trans. Info. Theory, 13(2):260–269.
Hamilton, J.D. (1989). “A new approach to the economic analysis of nonstationary time series and the business cycle.” Econometrica, 57(2):357–384.

Hidden Markov Models¶

Mathematical Foundations¶

Model Definition¶

1. States¶

2. Observations¶

3. Parameters¶

The Markov Property¶

First-Order Markov Assumption¶

Output Independence¶

The Three Fundamental Problems¶

Forward Algorithm¶

Forward Variable¶

Algorithm¶

Complexity¶

Backward Algorithm¶

Backward Variable¶

Viterbi Algorithm¶

Objective¶

Viterbi Variable¶

Algorithm¶

Baum-Welch Algorithm¶

Auxiliary Variables¶

E-Step¶

M-Step¶

Convergence¶

Numerical Stability¶

Scaling¶

Log-Space Computation¶

Model Selection¶

Number of States¶

Python API¶

Basic Usage¶

Model Inspection¶

Feature Engineering for HMM¶

Visualization¶

State Sequence Plot¶

Transition Diagram¶

Applications¶

1. Financial Regime Detection¶

2. Mean Reversion Trading¶

3. Risk Management¶

Performance¶

Troubleshooting¶

Tips¶

Implementation Notes¶

References¶

Related Topics¶