Differential Evolution¶

Differential Evolution (DE) is a population-based metaheuristic optimization algorithm introduced by Storn and Price (1997). It is particularly effective for continuous, non-convex, multimodal optimization problems where gradient information is unavailable or unreliable.

This module provides a high-performance Rust implementation with Python bindings, supporting multiple mutation strategies, adaptive parameter control (jDE), and parallel evaluation.

Mathematical Foundations¶

Problem Formulation¶

DE solves unconstrained (or box-constrained) minimization problems:

\[ \min_{\mathbf{x} \in \mathbb{R}^D} f(\mathbf{x}) \]

subject to box constraints:

\[ x_j \in [l_j, u_j], \quad j = 1, \ldots, D \]

DE is well-suited when:

\(f\) is continuous but non-differentiable
Multiple local minima exist
Gradient information is unavailable or expensive
Problem dimension is moderate (\(D < 100\))

Population¶

DE maintains a population of \(N_P\) candidate solutions:

\[ P_g = \{\mathbf{x}_{1,g}, \mathbf{x}_{2,g}, \ldots, \mathbf{x}_{N_P,g}\} \]

where \(g\) is the generation number and \(\mathbf{x}_{i,g} \in \mathbb{R}^D\).

Rule of thumb: \(N_P = 10 \times D\) where \(D\) is the problem dimension.

Main Loop¶

For each generation \(g = 0, 1, 2, \ldots\):

Mutation: Create mutant vectors by combining existing solutions
Crossover: Mix mutant with target vector to form trial vector
Selection: Keep better solution (greedy selection)

Mutation Strategies¶

The mutation operator creates a mutant vector \(\mathbf{v}_{i,g+1}\) from existing population members:

DE/rand/1 (Classic Strategy)¶

\[ \mathbf{v}_{i,g+1} = \mathbf{x}_{r_1,g} + F \cdot (\mathbf{x}_{r_2,g} - \mathbf{x}_{r_3,g}) \]

where:

\(r_1, r_2, r_3 \in \{1, \ldots, N_P\}\) are randomly chosen, distinct, and \(\neq i\)
\(F \in (0, 2]\) is the mutation factor (typically 0.5–1.0)

Interpretation: Start from a random population member \(\mathbf{x}_{r_1}\), move in direction given by the difference \((\mathbf{x}_{r_2} - \mathbf{x}_{r_3})\), scaled by \(F\).

Characteristics: Most explorative, good for diverse populations.

DE/best/1¶

\[ \mathbf{v}_{i,g+1} = \mathbf{x}_{\text{best},g} + F \cdot (\mathbf{x}_{r_1,g} - \mathbf{x}_{r_2,g}) \]

Advantage: Faster convergence toward the best-known solution.

Disadvantage: More likely to get stuck in local minima.

DE/current-to-best/1¶

\[ \mathbf{v}_{i,g+1} = \mathbf{x}_{i,g} + F \cdot (\mathbf{x}_{\text{best},g} - \mathbf{x}_{i,g}) + F \cdot (\mathbf{x}_{r_1,g} - \mathbf{x}_{r_2,g}) \]

Interpretation: Move current solution toward the best while also exploring.

Characteristics: Balanced exploration/exploitation.

DE/rand/2¶

\[ \mathbf{v}_{i,g+1} = \mathbf{x}_{r_1,g} + F \cdot (\mathbf{x}_{r_2,g} - \mathbf{x}_{r_3,g}) + F \cdot (\mathbf{x}_{r_4,g} - \mathbf{x}_{r_5,g}) \]

Characteristics: More disruptive, better for highly multimodal problems.

DE/best/2¶

\[ \mathbf{v}_{i,g+1} = \mathbf{x}_{\text{best},g} + F \cdot (\mathbf{x}_{r_1,g} - \mathbf{x}_{r_2,g}) + F \cdot (\mathbf{x}_{r_3,g} - \mathbf{x}_{r_4,g}) \]

Characteristics: Aggressive convergence to the best solution.

Crossover¶

After mutation, the trial vector \(\mathbf{u}_{i,g+1}\) is formed by mixing components from the mutant and the target vector.

Binomial Crossover¶

For each component \(j = 1, \ldots, D\):

\[\begin{split} u_{i,j,g+1} = \begin{cases} v_{i,j,g+1} & \text{if } \text{rand}(0,1) \leq CR \text{ or } j = j_{\text{rand}} \\ x_{i,j,g} & \text{otherwise} \end{cases} \end{split}\]

where:

\(CR \in [0, 1]\) is the crossover probability
\(j_{\text{rand}} \in \{1, \ldots, D\}\) ensures at least one component comes from the mutant

Effect: \(CR\) controls how much of the mutant vector is used.

CR Value	Effect
Low (0.1–0.3)	Less information exchange, slower convergence. Better for separable problems
High (0.7–0.9)	More information exchange, faster convergence. Better for non-separable problems
0.0	Pure mutation (except \(j_{\text{rand}}\))
1.0	Full crossover

Selection¶

Greedy selection (for minimization):

\[\begin{split} \mathbf{x}_{i,g+1} = \begin{cases} \mathbf{u}_{i,g+1} & \text{if } f(\mathbf{u}_{i,g+1}) \leq f(\mathbf{x}_{i,g}) \\ \mathbf{x}_{i,g} & \text{otherwise} \end{cases} \end{split}\]

Property: Population quality never decreases:

\[ f(\mathbf{x}_{\text{best},g+1}) \leq f(\mathbf{x}_{\text{best},g}) \]

Complete Algorithm¶

        flowchart TD
    A["🎲 Initialize Population\nx_i = l + rand · (u − l)"] --> B["📊 Evaluate Fitness\nf_i = f(x_i) for all i"]
    B --> C{{"g < max_iter?"}}
    C -->|Yes| D["Mutation\nv = x_r1 + F · (x_r2 − x_r3)"]
    D --> E["Crossover\nu_j = v_j if rand ≤ CR or j = j_rand\nelse u_j = x_j"]
    E --> F["Clip to bounds [l, u]"]
    F --> G{{"f(trial) ≤ f(target)?"}}
    G -->|"Yes — better"| H["✅ Accept trial\nx_i ← u_i"]
    G -->|"No — worse"| I["Keep current\nx_i unchanged"]
    H & I --> C
    C -->|No| J["🏆 Return x_best, f(x_best)"]

Parameter Selection Guidelines¶

Population Size (\(N_P\))¶

Size Category	Range	Use Case
Small	< 4D	Faster convergence; risk premature convergence. Simple unimodal problems
Medium	10D (default)	Good balance for most problems
Large	> 20D	Better exploration; slower convergence. Highly multimodal problems

Minimum: \(N_P \geq 4\) (needed for mutation with three distinct indices).

Mutation Factor (\(F\))¶

F Value	Effect
Low (0.4–0.6)	Fine-tuning, local search. Safer, less disruptive
High (0.8–1.2)	Exploration, global search. Escape local minima

Typical range: \(F \in [0.4, 1.0]\), default 0.8.

Crossover Probability (\(CR\))¶

CR Value	Effect
Low (0.1–0.3)	Best for separable problems
High (0.7–0.9)	Best for non-separable problems

Default: 0.7–0.9 for most problems.

Python API¶

Basic Usage¶

import numpy as np
from optimizr import differential_evolution

def rastrigin(x):
    """Multimodal benchmark function with many local minima."""
    A = 10
    return A * len(x) + sum(x**2 - A * np.cos(2 * np.pi * x))

best_x, best_fx = differential_evolution(
    objective_fn=rastrigin,
    bounds=[(-5.12, 5.12)] * 10,  # 10-dimensional problem
    strategy="best1",
    popsize=20,
    maxiter=500,
    adaptive=True,
)

print(f"Best fitness: {best_fx:.6f}")
print(f"Best solution: {best_x}")

Expected output:

Best fitness: 0.000042
Best solution: [ 0.00012 -0.00023  0.00018 ... ]

Configuration Options¶

from optimizr import DifferentialEvolution

de = DifferentialEvolution(
    bounds=[(-5, 5)] * 20,
    strategy="rand1",      # mutation strategy
    popsize=200,           # population size
    maxiter=1000,          # maximum generations
    F=0.8,                 # mutation factor
    CR=0.9,                # crossover probability
    tol=1e-8,              # convergence tolerance
    seed=42,               # reproducibility
)

result = de.minimize(sphere_function)
print(f"Converged in {result.nit} iterations")
print(f"Function evaluations: {result.nfev}")

Adaptive Control (jDE)¶

Optimiz-rs implements jDE (self-adaptive DE), where the parameters \(F\) and \(CR\) evolve with the population:

\[\begin{split} F_{i,g+1} = \begin{cases} F_l + \text{rand}(0,1) \cdot (F_u - F_l) & \text{if } \text{rand}(0,1) < \tau_1 \\ F_{i,g} & \text{otherwise} \end{cases} \end{split}\]

\[\begin{split} CR_{i,g+1} = \begin{cases} \text{rand}(0,1) & \text{if } \text{rand}(0,1) < \tau_2 \\ CR_{i,g} & \text{otherwise} \end{cases} \end{split}\]

Enable jDE:

de = DifferentialEvolution(
    bounds=[(-5, 5)] * 20,
    adaptive=True,   # enables jDE
    tau_F=0.1,       # probability of F mutation
    tau_CR=0.1,      # probability of CR mutation
)

Advantages:

No need to manually tune \(F\) and \(CR\)
Adapts to problem landscape during optimization
Generally robust across problem types

Parallel Evaluation (Rust Backend)¶

For pure-Rust objectives or when Python callbacks are not needed, enable data-parallel evaluation via Rayon:

from optimizr import parallel_differential_evolution_rust

result = parallel_differential_evolution_rust(
    objective="rastrigin",  # built-in benchmark
    dim=50,
    bounds=(-5.12, 5.12),
    popsize=500,
    maxiter=2000,
    n_threads=8,
)

print(f"Best fitness: {result.best_fitness:.8f}")

Speedup: Near-linear up to \(N_P\) processors for expensive objectives.

Convergence Analysis¶

Theoretical Properties¶

Global Convergence Theorem (Lampinen, 2001):

Under these sufficient conditions:

Population size \(N_P > 3\)
Mutation factor \(F > 0\)
At least one component crossed over (\(j_{\text{rand}}\))

DE is a global optimization method: any point can be reached with positive probability.

Diversity Measure¶

\[ D_g = \frac{1}{N_P D} \sum_{i=1}^{N_P} \sum_{j=1}^D |x_{i,j,g} - \bar{x}_{j,g}| \]

Diversity	Behavior
High	Exploration (global search)
Low	Exploitation (local search)

Empirical Budget¶

Rule of thumb: Budget \(10^4 \times D\) function evaluations for moderately difficult problems.

Performance Comparison¶

Algorithm	Gradient	Global	Constraints	Speed	Best For
DE	No	Yes	Box	Medium	Non-convex, continuous
Gradient Descent	Yes	No	Yes	Fast	Smooth, convex
Genetic Algorithm	No	Yes	Yes	Slow	Discrete, combinatorial
Particle Swarm	No	Yes	Box	Fast	Continuous, many dimensions
CMA-ES	No	Yes	Box	Fast	Continuous, noisy

Practical Tips¶

1. Start Simple¶

Use defaults: \(N_P = 10D\), \(F = 0.8\), \(CR = 0.7\), strategy="rand1".

2. Scale Variables¶

Normalize parameters to similar ranges for better performance.

3. Warm Start¶

If you have a good initial guess, seed the population around it.

4. Hybrid Approach¶

Use DE for global search, then a local optimizer for refinement:

# Global search with DE
best_x, _ = differential_evolution(f, bounds, maxiter=200)

# Local refinement with L-BFGS-B
from scipy.optimize import minimize
result = minimize(f, best_x, method='L-BFGS-B', bounds=bounds)

5. Monitor Convergence¶

Plot:

Best fitness vs. generation
Average population fitness vs. generation
Population diversity vs. generation

6. Restarts¶

If premature convergence detected, restart with new random population.

Troubleshooting¶

Symptom	Cause	Fix
Slow convergence	\(F\) or \(CR\) too low	Increase \(F\) to 0.8, \(CR\) to 0.9
Premature convergence	Population too small	Increase \(N_P\) to 15–20D
Oscillating fitness	\(F\) too high	Decrease \(F\) to 0.5–0.6
Stuck in local minimum	Using `best1` strategy	Switch to `rand1` or `rand2`

Benchmark Results¶

Performance on standard test functions (D=30, \(N_P=300\), 1000 generations):

Function	Best Fitness	Iterations	Time (s)
Sphere	1.2e-28	412	0.8
Rosenbrock	2.4e-08	891	1.4
Rastrigin	4.1e-05	1000	2.1
Ackley	8.8e-15	623	1.2
Griewank	3.7e-12	548	1.0

Advantages & Limitations¶

Advantages¶

✅ No gradient information needed

✅ Handles non-convex, multimodal functions well

✅ Few parameters to tune

✅ Simple to implement and understand

✅ Robust across problem types

✅ Naturally handles box constraints

✅ Population maintains diversity

Limitations¶

❌ Slower than gradient methods (when gradients are available)

❌ Scales poorly to high dimensions (\(D > 100\))

❌ No convergence guarantees in finite time

❌ Requires many function evaluations

❌ Performance sensitive to parameter choices

References¶

Storn, R. & Price, K. (1997). “Differential evolution – A simple and efficient heuristic for global optimization over continuous spaces.” Journal of Global Optimization, 11(4):341–359.
Price, K., Storn, R.M. & Lampinen, J.A. (2005). Differential Evolution: A Practical Approach to Global Optimization. Springer.
Das, S. & Suganthan, P.N. (2011). “Differential evolution: A survey of the state-of-the-art.” IEEE Transactions on Evolutionary Computation, 15(1):4–31.
Brest, J. et al. (2006). “Self-adapting control parameters in differential evolution.” IEEE Trans. Evolutionary Computation, 10(6):646–657. (jDE)
Tanabe, R. & Fukunaga, A. (2013). “Success-history based parameter adaptation for differential evolution.” IEEE CEC, pp. 71–78. (SHADE)