API Reference: Hidden Markov Model (HMM)¶

The HMM class provides a complete implementation of Hidden Markov Models with Gaussian emissions for regime detection, time series modeling, and state inference.

Quick Start¶

from optimizr import HMM
import numpy as np

# Create model with 2 hidden states (e.g., bull/bear market)
model = HMM(n_states=2)

# Train on returns data
returns = np.random.randn(1000, 1)  # Should be 2D: (n_samples, n_features)
model.fit(returns, n_iterations=100, tolerance=1e-6)

# Decode most likely state sequence (Viterbi)
states = model.predict(returns)

# Compute log-likelihood (for model comparison)
logp = model.score(returns)

print(f"Log-likelihood: {logp:.2f}")
print(f"Decoded states: {states[:10]}")

Constructor¶

`HMM(n_states: int)`¶

Creates a new Hidden Markov Model with Gaussian emissions.

Parameters:

n_states (int): Number of hidden states/regimes. Common choices:
- n_states=2: Binary regime (e.g., bull/bear, high/low volatility)
- n_states=3: Three-regime model (e.g., bull/sideways/bear)
- n_states>3: Fine-grained regime detection (requires more data)

Returns:

HMM object with random initialization

Initialization:

Transition matrix \(A\): Uniform with slight self-transition bias
Initial state distribution \(\pi\): Uniform
Emission parameters (means \(\mu_i\), covariances \(\Sigma_i\)): From K-means clustering

Example:

# Binary regime model
hmm_2 = HMM(n_states=2)

# Three-regime model for more nuanced detection
hmm_3 = HMM(n_states=3)

When to use:

n_states=2: Most common, sufficient for many applications
Higher n_states: When you have strong prior belief in multiple regimes and sufficient data (>1000 samples per state)

Methods¶

`fit(X, n_iterations=100, tolerance=1e-6, n_init=1, random_state=None)`¶

Trains the HMM on observed data using the Baum-Welch (Expectation-Maximization) algorithm.

Parameters:

X (np.ndarray): Training data of shape (n_samples, n_features)
- For univariate time series: reshape to (n, 1) with X.reshape(-1, 1)
- For multivariate: pass directly as (n, d) where d is feature dimension
n_iterations (int, default=100): Maximum number of EM iterations
- Typical range: 50-200
- More iterations → better convergence but slower training
tolerance (float, default=1e-6): Convergence threshold
- Algorithm stops when log-likelihood improvement < tolerance
- Typical range: 1e-8 to 1e-4
- Smaller values → tighter convergence but more iterations
n_init (int, default=1): Number of random initializations
- The best model (highest log-likelihood) is kept
- Recommended: 5-10 for production models (helps avoid local minima)
random_state (int, optional): Random seed for reproducibility

Returns:

self: The fitted HMM object (for method chaining)

Algorithm: Baum-Welch (EM for HMMs)

The Baum-Welch algorithm iteratively refines model parameters:

E-step: Compute state occupation probabilities
- Forward pass: \(\alpha_t(i) = P(O_1, \ldots, O_t, S_t = i \mid \lambda)\)
- Backward pass: \(\beta_t(i) = P(O_{t+1}, \ldots, O_T \mid S_t = i, \lambda)\)
- State probabilities: \(\gamma_t(i) = \frac{\alpha_t(i)\beta_t(i)}{\sum_j \alpha_t(j)\beta_t(j)}\)
- Transition probabilities: \(\xi_t(i,j) = \frac{\alpha_t(i)a_{ij}b_j(O_{t+1})\beta_{t+1}(j)}{\sum_{i,j}\alpha_t(i)a_{ij}b_j(O_{t+1})\beta_{t+1}(j)}\)
M-step: Update model parameters
- Initial probabilities: \(\pi_i = \gamma_1(i)\)
- Transition matrix: \(a_{ij} = \frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)}\)
- Emission means: \(\mu_i = \frac{\sum_{t=1}^T \gamma_t(i) O_t}{\sum_{t=1}^T \gamma_t(i)}\)
- Emission covariances: \(\Sigma_i = \frac{\sum_{t=1}^T \gamma_t(i)(O_t - \mu_i)(O_t - \mu_i)^T}{\sum_{t=1}^T \gamma_t(i)}\)
Convergence: Repeat until log-likelihood change < tolerance

Example:

import numpy as np
from optimizr import HMM

# Simulate two-regime data
np.random.seed(42)
n = 2000

# Regime 1: low volatility (first 1000 samples)
regime1 = np.random.normal(0.0, 0.5, 1000)
# Regime 2: high volatility (last 1000 samples)
regime2 = np.random.normal(0.0, 2.0, 1000)
data = np.concatenate([regime1, regime2]).reshape(-1, 1)

# Train HMM
hmm = HMM(n_states=2)
hmm.fit(data, n_iterations=200, tolerance=1e-6, n_init=5)

print("Training complete")

Convergence diagnostics:

# Plot log-likelihood over iterations (requires storing history)
# Check if converged before max_iter
# Verify parameters make sense (e.g., distinct means for each state)

Typical training time:

1000 samples, 2 states, 100 iterations: ~50-100ms
10000 samples, 3 states, 200 iterations: ~500ms-1s

`predict(X)`¶

Decodes the most likely sequence of hidden states using the Viterbi algorithm.

Parameters:

X (np.ndarray): Observation sequence of shape (n_samples, n_features)
- Must match feature dimension used in fit()

Returns:

states (np.ndarray): Most likely state sequence of shape (n_samples,)
- Values are integers in range [0, n_states-1]

Algorithm: Viterbi

The Viterbi algorithm finds the globally optimal state sequence:

Initialization: \(\delta_1(i) = \pi_i \cdot b_i(O_1)\)
Recursion: \(\delta_t(j) = \max_i[\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(O_t)\)
Termination: \(P^* = \max_i[\delta_T(i)]\)
Backtracking: Trace back from \(\arg\max_i[\delta_T(i)]\) to recover state sequence

Complexity: \(O(T \cdot K^2)\) where \(T\) is sequence length, \(K\) is number of states

Example:

# After training (see fit() example)
states = hmm.predict(data)

# Analyze regime distribution
unique, counts = np.unique(states, return_counts=True)
for state, count in zip(unique, counts):
    print(f"State {state}: {count} samples ({count/len(states)*100:.1f}%)")

# Identify regime switches
switches = np.where(np.diff(states) != 0)[0]
print(f"Number of regime switches: {len(switches)}")

# Use for trading: buy in regime 0, sell in regime 1
current_state = states[-1]
if current_state == 0:
    print("Signal: BUY (low volatility regime)")
else:
    print("Signal: SELL (high volatility regime)")

Use cases:

Regime detection: Identify market states (bull/bear, high/low vol)
Trading signals: Generate buy/sell signals based on regime
Risk management: Adjust position size based on estimated regime
Anomaly detection: Flag unusual regime transitions

`score(X)`¶

Computes the log-likelihood of the observation sequence under the fitted model.

Parameters:

X (np.ndarray): Observation sequence of shape (n_samples, n_features)

Returns:

logp (float): Log-likelihood \(\log P(O \mid \lambda)\)

Algorithm: Forward Algorithm

The forward algorithm efficiently computes the likelihood:

Initialization: \(\alpha_1(i) = \pi_i \cdot b_i(O_1)\)
Induction: \(\alpha_t(j) = \left[\sum_{i=1}^K \alpha_{t-1}(i) \cdot a_{ij}\right] \cdot b_j(O_t)\)
Termination: \(P(O \mid \lambda) = \sum_{i=1}^K \alpha_T(i)\)

Numerical stability: Uses log-space computation with scaling to avoid underflow.

Example:

# Model comparison: which number of states fits best?
logp_scores = {}
for n_states in [2, 3, 4]:
    hmm = HMM(n_states=n_states)
    hmm.fit(data, n_iterations=100)
    logp = hmm.score(data)
    logp_scores[n_states] = logp
    print(f"{n_states} states: log-likelihood = {logp:.2f}")

# Higher log-likelihood is better (but watch for overfitting)
best_k = max(logp_scores, key=logp_scores.get)
print(f"Best model: {best_k} states")

# Use BIC for model selection (penalizes complexity)
def bic(logp, n_params, n_samples):
    return -2 * logp + n_params * np.log(n_samples)

n_samples = len(data)
for n_states in [2, 3, 4]:
    n_params = n_states**2 + 2*n_states  # Approx: A, pi, means, variances
    bic_score = bic(logp_scores[n_states], n_params, n_samples)
    print(f"{n_states} states: BIC = {bic_score:.2f}")

Use cases:

Model selection: Compare models with different n_states using BIC/AIC
Convergence monitoring: Track log-likelihood during training
Outlier detection: Low likelihood → data doesn’t match model
Model reliability: Higher likelihood → better fit (but watch overfitting)

Complete Example: Market Regime Detection¶

Here’s a complete workflow for detecting market regimes in financial data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from optimizr import HMM

# 1. Load financial data (example: S&P 500 returns)
# In practice, load from your data source
np.random.seed(42)
n_samples = 2000

# Simulate returns with regime changes
returns = []
for i in range(n_samples):
    if i < 500:  # Bull market
        returns.append(np.random.normal(0.001, 0.01))
    elif i < 1000:  # Correction
        returns.append(np.random.normal(-0.002, 0.02))
    elif i < 1500:  # Recovery
        returns.append(np.random.normal(0.001, 0.015))
    else:  # Bear market
        returns.append(np.random.normal(-0.001, 0.025))

returns = np.array(returns).reshape(-1, 1)

# 2. Train HMM with multiple initializations
print("Training HMM...")
hmm = HMM(n_states=3)  # 3 regimes: bull, neutral, bear
hmm.fit(returns, n_iterations=200, tolerance=1e-6, n_init=10)

# 3. Decode regimes
states = hmm.predict(returns)

# 4. Analyze regimes
print("\nRegime Statistics:")
for state_id in range(3):
    mask = (states == state_id)
    state_returns = returns[mask]
    mean_ret = np.mean(state_returns)
    std_ret = np.std(state_returns)
    count = np.sum(mask)
    print(f"State {state_id}:")
    print(f"  Count: {count} ({count/len(returns)*100:.1f}%)")
    print(f"  Mean return: {mean_ret:.4f}")
    print(f"  Volatility: {std_ret:.4f}")
    print(f"  Sharpe (annualized): {mean_ret/std_ret * np.sqrt(252):.2f}")

# 5. Identify regime switches
switches = np.where(np.diff(states) != 0)[0] + 1
print(f"\nRegime switches: {len(switches)}")
print(f"Average regime duration: {len(returns)/len(switches):.1f} days")

# 6. Visualize regimes
plt.figure(figsize=(14, 8))

# Plot returns with regime colors
plt.subplot(3, 1, 1)
colors = ['green', 'yellow', 'red']
for state_id in range(3):
    mask = (states == state_id)
    plt.scatter(np.where(mask)[0], returns[mask], 
                c=colors[state_id], alpha=0.5, s=10,
                label=f'State {state_id}')
plt.ylabel('Returns')
plt.title('Returns colored by HMM regime')
plt.legend()
plt.grid(True, alpha=0.3)

# Plot cumulative returns per regime
plt.subplot(3, 1, 2)
cumulative = np.cumsum(returns.flatten())
plt.plot(cumulative, color='black', linewidth=1)
for switch in switches:
    plt.axvline(switch, color='red', alpha=0.3, linestyle='--')
plt.ylabel('Cumulative Returns')
plt.title('Cumulative returns with regime switches')
plt.grid(True, alpha=0.3)

# Plot state sequence
plt.subplot(3, 1, 3)
plt.plot(states, linewidth=0.5)
plt.ylabel('State')
plt.xlabel('Time')
plt.title('Decoded state sequence')
plt.yticks(range(3))
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('hmm_regime_detection.png', dpi=150)
print("\nPlot saved to hmm_regime_detection.png")

# 7. Generate trading signals
current_state = states[-1]
state_returns = returns[states == current_state]
expected_return = np.mean(state_returns)
expected_vol = np.std(state_returns)

print(f"\nCurrent regime: State {current_state}")
print(f"Expected return: {expected_return:.4f}")
print(f"Expected volatility: {expected_vol:.4f}")

if expected_return > 0.0005:
    signal = "BUY"
    position_size = 1.0
elif expected_return < -0.0005:
    signal = "SELL"
    position_size = 0.0
else:
    signal = "HOLD"
    position_size = 0.5

print(f"Trading signal: {signal}")
print(f"Recommended position size: {position_size*100:.0f}%")

Advanced Usage¶

Model Selection with BIC¶

Choose the optimal number of states using Bayesian Information Criterion:

from optimizr import HMM
import numpy as np

def bic_score(hmm, X):
    """Compute BIC for HMM: BIC = -2*log(L) + k*log(n)"""
    logp = hmm.score(X)
    n_states = hmm.n_states  # Assuming this attribute exists
    n_features = X.shape[1]
    # Parameters: transition matrix + initial prob + means + covariances
    k = n_states**2 + n_states + n_states*n_features + n_states*n_features**2
    n = X.shape[0]
    return -2*logp + k*np.log(n)

# Test different numbers of states
results = []
for n_states in range(2, 6):
    hmm = HMM(n_states=n_states)
    hmm.fit(data, n_iterations=100, n_init=5)
    bic = bic_score(hmm, data)
    logp = hmm.score(data)
    results.append((n_states, logp, bic))
    print(f"{n_states} states: log-likelihood={logp:.2f}, BIC={bic:.2f}")

# Best model has lowest BIC
best_n_states = min(results, key=lambda x: x[2])[0]
print(f"\nBest model: {best_n_states} states")

Integration with Optimal Control¶

Combine HMM regime detection with regime-specific optimal control:

from optimizr import HMM, estimate_ou_params_py, solve_hjb_py

# 1. Detect regimes with HMM
returns = np.diff(spread)
hmm = HMM(n_states=2)
hmm.fit(returns.reshape(-1, 1), n_iterations=100)
regimes = hmm.predict(returns.reshape(-1, 1))

# 2. Estimate OU parameters per regime
thresholds = {}
for regime_id in range(2):
    mask = (regimes == regime_id)
    spread_regime = spread[1:][mask]  # Align with returns
    
    # Estimate OU parameters
    kappa, theta, sigma, half_life = estimate_ou_params_py(
        spread_regime, dt=1/252
    )
    
    # Solve HJB for regime-specific thresholds
    lower, upper, _, _ = solve_hjb_py(
        kappa=kappa, theta=theta, sigma=sigma,
        rho=0.04, transaction_cost=0.001
    )
    
    thresholds[regime_id] = (lower, upper)
    print(f"Regime {regime_id}: κ={kappa:.2f}, thresholds=({lower:.3f}, {upper:.3f})")

# 3. Apply regime-aware trading
current_regime = regimes[-1]
lower, upper = thresholds[current_regime]
current_spread = spread[-1]

if current_spread < lower:
    action = "BUY"
elif current_spread > upper:
    action = "SELL"
else:
    action = "HOLD"

print(f"\nCurrent regime: {current_regime}")
print(f"Current spread: {current_spread:.3f}")
print(f"Thresholds: ({lower:.3f}, {upper:.3f})")
print(f"Action: {action}")

Multivariate HMM¶

For multiple features (e.g., returns + volume + volatility):

# Prepare multivariate data
returns = np.random.randn(1000, 1)
volume = np.random.randn(1000, 1)
volatility = np.random.randn(1000, 1)

# Stack features
X = np.hstack([returns, volume, volatility])  # Shape: (1000, 3)

# Train multivariate HMM
hmm = HMM(n_states=3)
hmm.fit(X, n_iterations=150)

# Decode regimes based on all features
states = hmm.predict(X)

# Each state now captures joint patterns in returns, volume, and volatility

Best Practices¶

Data Preparation¶

Scaling: Standardize features to similar scales

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Stationarity: Ensure time series is stationary (use returns, not prices)
```
returns = np.diff(np.log(prices))  # Log returns
```

Outlier handling: Winsorize extreme values

from scipy.stats import mstats
X_winsorized = mstats.winsorize(X, limits=[0.01, 0.01])

Model Training¶

Multiple initializations: Use n_init=5-10 to avoid local minima
Convergence: Monitor log-likelihood, ensure convergence before max_iter
Validation: Use held-out data to verify generalization

Parameter Selection¶

Number of states: Start with 2-3, increase if necessary
Iterations: 100-200 typically sufficient
Tolerance: 1e-6 for production, 1e-4 for quick experimentation

Practical Tips¶

Minimum data: Use at least 500 samples per state (1000+ for 2-state model)
Regime persistence: Check average regime duration is meaningful (not too short)
Physical interpretation: Verify decoded regimes make sense (e.g., high-vol state has higher variance)
Robustness: Test on multiple time periods, verify stability

Troubleshooting¶

Model not converging¶

Symptom: Log-likelihood oscillating or not improving
Fix: Increase n_iterations; try different n_init; check data scaling

All samples assigned to one state¶

Symptom: predict() returns all 0s or all 1s
Fix: Reduce n_states; check data has sufficient variation; verify stationarity

Unrealistic regime switches¶

Symptom: State changes every few samples
Fix: Add transition probability constraints (requires model extension); increase minimum regime duration

Poor out-of-sample performance¶

Symptom: High in-sample log-likelihood but poor predictions on new data
Fix: Reduce n_states (overfitting); use cross-validation; add regularization

Performance Characteristics¶

Computational Complexity¶

Training (Baum-Welch): \(O(I \cdot T \cdot K^2)\)
- \(I\): number of iterations (~100-200)
- \(T\): sequence length
- \(K\): number of states
Prediction (Viterbi): \(O(T \cdot K^2)\)
Scoring (Forward): \(O(T \cdot K^2)\)

Memory Requirements¶

Model parameters: \(O(K^2 + K \cdot d^2)\) where \(d\) is feature dimension
Forward/backward matrices: \(O(T \cdot K)\)

Typical Runtimes (on modern CPU)¶

Train (1000 samples, 2 states, 100 iter): ~50-100ms
Train (10000 samples, 3 states, 200 iter): ~500ms-1s
Predict (1000 samples, 2 states): ~5-10ms
Score (1000 samples, 2 states): ~5-10ms

References¶

Hidden Markov Models¶

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257-286.
Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. MIT Press. (Chapter 17: Markov and hidden Markov models)

Financial Applications¶

Guidolin, M., & Timmermann, A. (2008). International asset allocation under regime switching, skew, and kurtosis preferences. The Review of Financial Studies, 21(2), 889-935.
Nystrup, P., Madsen, H., & Lindström, E. (2015). Stylised facts of financial time series and hidden Markov models in continuous time. Quantitative Finance, 15(9), 1531-1541.
Ang, A., & Bekaert, G. (2002). Regime switches in interest rates. Journal of Business & Economic Statistics, 20(2), 163-182.

Algorithms¶

Forney, G. D. (1973). The Viterbi algorithm. Proceedings of the IEEE, 61(3), 268-278.
Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1), 164-171.

API Reference: Hidden Markov Model (HMM)¶

Quick Start¶

Constructor¶

HMM(n_states: int)¶

Methods¶

fit(X, n_iterations=100, tolerance=1e-6, n_init=1, random_state=None)¶

predict(X)¶

score(X)¶

Complete Example: Market Regime Detection¶

Advanced Usage¶

Model Selection with BIC¶

Integration with Optimal Control¶

Multivariate HMM¶

Best Practices¶

Data Preparation¶

Model Training¶

Parameter Selection¶

Practical Tips¶

Troubleshooting¶

Model not converging¶

All samples assigned to one state¶

Unrealistic regime switches¶

Poor out-of-sample performance¶

Performance Characteristics¶

Computational Complexity¶

Memory Requirements¶

Typical Runtimes (on modern CPU)¶

References¶

Hidden Markov Models¶

Financial Applications¶

Algorithms¶

See Also¶

`HMM(n_states: int)`¶

`fit(X, n_iterations=100, tolerance=1e-6, n_init=1, random_state=None)`¶

`predict(X)`¶

`score(X)`¶