Key Backtesting Metrics
The Numbers That Really Matter
You can have a backtest showing a +500% return, but if you don't understand the right metrics, you're looking at a mirage. These are the metrics I use to decide whether a strategy is real or a fantasy.
Profitability Metrics
1. CAGR (Compound Annual Growth Rate)
Definition: Measures the annualized growth rate of an investment over a specific time period.
Mathematical Formula:
CAGR = (Final_Value / Initial_Value)^(1/n) - 1
Where n = number of years (total_days / 252)
import numpy as np
import pandas as pd

def CAGR(datos: pd.DataFrame, calculo_optimizado: bool = True, columna: str = "Close") -> float:
"""
Compound Annual Growth Rate - Optimized reference implementation
Parameters
----------
datos : pd.DataFrame
Historical data of a financial asset
calculo_optimizado : bool, default True
Whether to use direct method (True) or returns-based method (False)
columna : str, default "Close"
Column to use for calculation
Returns
-------
float
Annualized growth rate
"""
    # Calculate years (252 trading days per year; np.ceil rounds partial years up)
    n = np.ceil(datos.shape[0] / 252)
if calculo_optimizado:
# Direct method (more efficient)
valor_inicial = datos[columna].iloc[0]
valor_final = datos[columna].iloc[-1]
return (valor_final / valor_inicial) ** (1 / n) - 1
else:
# Method using daily returns
retornos_diarios = datos[columna].pct_change()
retornos_acumulados = (1 + retornos_diarios).cumprod()
return retornos_acumulados.iloc[-1] ** (1 / n) - 1
def calculate_returns(equity_curve):
"""Calculate different types of return"""
start_value = equity_curve[0]
end_value = equity_curve[-1]
num_years = len(equity_curve) / 252 # Assuming daily data
# Total return
total_return = (end_value - start_value) / start_value
# CAGR using reference formula
cagr = (end_value / start_value) ** (1/num_years) - 1
return {
'total_return': total_return,
'cagr': cagr,
'absolute_profit': end_value - start_value,
'years': num_years
}
2. Benchmark Comparison
def compare_to_benchmark(strategy_returns, benchmark_returns):
"""Compare against benchmark (SPY)"""
strategy_cumret = (1 + strategy_returns).cumprod()
benchmark_cumret = (1 + benchmark_returns).cumprod()
    # Excess cumulative return over the benchmark (a simple alpha proxy, not CAPM alpha)
    alpha = strategy_cumret.iloc[-1] - benchmark_cumret.iloc[-1]
# Beta (correlation with market)
correlation = strategy_returns.corr(benchmark_returns)
beta = strategy_returns.cov(benchmark_returns) / benchmark_returns.var()
return {
'alpha': alpha,
'beta': beta,
'correlation': correlation,
'outperformed': alpha > 0
}
Risk Metrics
1. Sharpe Ratio - The King of Metrics
Definition: Measures the return an investment offers per unit of risk taken.
Mathematical Formula:
Sharpe = (Asset_Return - Risk_Free_Rate) / Annualized_Standard_Deviation
Interpretation:
- > 0: Return exceeds the risk-free rate
- < 0: Better to invest in risk-free assets
def coef_sharpe(datos: pd.DataFrame, tasa_lr: float = 0.03, columna: str = "Close") -> float:
"""
Sharpe Coefficient - Exact reference implementation
Parameters
----------
datos : pd.DataFrame
Historical data of a financial asset
tasa_lr : float, default 0.03
Risk-free rate (3% by default)
columna : str, default "Close"
Column to use for calculation
Returns
-------
float
Sharpe Coefficient
"""
# Calculate annualized asset return
retorno_activo = (datos[columna].iloc[-1] / datos[columna].iloc[0]) ** (1 / np.ceil(datos.shape[0] / 252)) - 1
# Annualized standard deviation
desviacion_estandar_anualizada = datos[columna].pct_change().std() * np.sqrt(252)
return (retorno_activo - tasa_lr) / desviacion_estandar_anualizada
def calculate_sharpe_ratio(returns, risk_free_rate=0.02):
"""Sharpe Ratio: Return per unit of risk - Modern version"""
excess_returns = returns.mean() * 252 - risk_free_rate # Annualized
volatility = returns.std() * np.sqrt(252) # Annualized
sharpe = excess_returns / volatility if volatility > 0 else 0
# Enhanced interpretation
if sharpe > 2:
quality = "Excellent"
interpretation = "Exceptional strategy - check for overfitting"
elif sharpe > 1:
quality = "Good"
interpretation = "Solid strategy with good risk-return tradeoff"
elif sharpe > 0.5:
quality = "Acceptable"
interpretation = "Viable strategy but improvable"
elif sharpe > 0:
quality = "Poor"
interpretation = "Barely beats the risk-free rate"
else:
quality = "Negative"
interpretation = "Losing money - better to invest in bonds"
return {
'sharpe_ratio': sharpe,
'quality': quality,
'interpretation': interpretation,
'excess_return': excess_returns,
'volatility': volatility
}
2. Maximum Drawdown - Your Worst Nightmare
Definition: Measures the worst loss suffered by an investment from a historical peak, allowing you to evaluate downside risk.
Mathematical Formula:
Drawdown = (Highest_Cumulative_Return - Current_Return) / Highest_Cumulative_Return
Max_Drawdown = MAX(Drawdown_Series)
def max_dd(datos: pd.DataFrame, columna: str = "Close") -> float:
"""
Maximum Drawdown - Reference implementation
Parameters
----------
datos : pd.DataFrame
Historical data of a financial instrument
columna : str, default "Close"
Column to use for calculation
Returns
-------
float
Maximum drawdown (as decimal, e.g.: 0.15 = 15%)
"""
# Calculate daily returns
rendimientos_diarios = datos[columna].pct_change()
# Cumulative returns
rendimientos_acumulados = (1 + rendimientos_diarios).cumprod()
# Highest cumulative return up to each point
mayor_rendimiento_acumulado = rendimientos_acumulados.cummax()
# Difference between the maximum and current value
diferencia = mayor_rendimiento_acumulado - rendimientos_acumulados
# Convert to percentage
diferencia_porcentaje = diferencia / mayor_rendimiento_acumulado
# Maximum drawdown
retroceso_maximo = diferencia_porcentaje.max()
return retroceso_maximo
def calculate_drawdown(equity_curve):
"""Drawdown: Your worst loss from the peak - Extended version"""
equity_series = pd.Series(equity_curve)
# Running maximum
peak = equity_series.cummax()
# Drawdown at each point
drawdown = (equity_series - peak) / peak
# Maximum drawdown
max_drawdown = drawdown.min()
# Drawdown duration
drawdown_duration = []
in_drawdown = False
start_dd = None
for i, dd in enumerate(drawdown):
if dd < 0 and not in_drawdown:
# Start of drawdown
in_drawdown = True
start_dd = i
elif dd == 0 and in_drawdown:
# End of drawdown
in_drawdown = False
drawdown_duration.append(i - start_dd)
    # Count a drawdown still open at the end of the series
    if in_drawdown:
        drawdown_duration.append(len(drawdown) - start_dd)
    max_dd_duration = max(drawdown_duration) if drawdown_duration else 0
# Drawdown interpretation
dd_abs = abs(max_drawdown)
if dd_abs < 0.05:
risk_level = "Very Low (Too good to be true?)"
elif dd_abs < 0.10:
risk_level = "Low"
elif dd_abs < 0.20:
risk_level = "Moderate"
elif dd_abs < 0.30:
risk_level = "High"
else:
risk_level = "Very High"
return {
'max_drawdown': max_drawdown,
'max_drawdown_pct': max_drawdown * 100,
'max_drawdown_duration': max_dd_duration,
'drawdown_series': drawdown,
'current_drawdown': drawdown.iloc[-1] if len(drawdown) > 0 else 0,
'risk_level': risk_level,
'peak_values': peak
}
3. Calmar Ratio
def calculate_calmar_ratio(returns, equity_curve):
"""Calmar: CAGR / Max Drawdown"""
cagr = calculate_returns(equity_curve)['cagr']
max_dd = abs(calculate_drawdown(equity_curve)['max_drawdown'])
calmar = cagr / max_dd if max_dd > 0 else 0
return {
'calmar_ratio': calmar,
'interpretation': 'Excellent' if calmar > 1 else 'Good' if calmar > 0.5 else 'Poor'
}
Trading Metrics
1. Win Rate and Profit Factor
def calculate_trade_metrics(trades_df):
"""Trade-specific metrics"""
if trades_df.empty:
return {'error': 'No trades to analyze'}
# Win rate
winning_trades = (trades_df['pnl'] > 0).sum()
total_trades = len(trades_df)
win_rate = winning_trades / total_trades
# Average win/loss
wins = trades_df[trades_df['pnl'] > 0]['pnl']
losses = trades_df[trades_df['pnl'] < 0]['pnl']
avg_win = wins.mean() if len(wins) > 0 else 0
avg_loss = losses.mean() if len(losses) > 0 else 0
# Profit factor
gross_profit = wins.sum()
gross_loss = abs(losses.sum())
profit_factor = gross_profit / gross_loss if gross_loss > 0 else float('inf')
# Expectancy
expectancy = (win_rate * avg_win) + ((1 - win_rate) * avg_loss)
# Largest win/loss
largest_win = wins.max() if len(wins) > 0 else 0
largest_loss = losses.min() if len(losses) > 0 else 0
return {
'total_trades': total_trades,
'win_rate': win_rate,
'avg_win': avg_win,
'avg_loss': avg_loss,
'profit_factor': profit_factor,
'expectancy': expectancy,
'largest_win': largest_win,
'largest_loss': largest_loss,
'gross_profit': gross_profit,
'gross_loss': gross_loss
}
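These definitions are easy to verify by hand on a hypothetical set of four trades:

```python
import pandas as pd

# Hand-check of win rate, profit factor, and expectancy on hypothetical trades.
trades = pd.DataFrame({'pnl': [100.0, -50.0, 200.0, -50.0]})
wins = trades.loc[trades['pnl'] > 0, 'pnl']
losses = trades.loc[trades['pnl'] < 0, 'pnl']
win_rate = len(wins) / len(trades)              # 2 of 4 = 0.5
profit_factor = wins.sum() / abs(losses.sum())  # 300 / 100 = 3.0
expectancy = win_rate * wins.mean() + (1 - win_rate) * losses.mean()
print(expectancy)  # 0.5 * 150 + 0.5 * (-50) = 50.0 per trade
```

Note that avg_loss is negative by construction, which is why expectancy uses a plus sign between the two terms.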
2. Consecutive Wins/Losses
def analyze_streaks(trades_df):
"""Analyze winning and losing streaks"""
if trades_df.empty:
return {}
# Create wins/losses series
wins_losses = (trades_df['pnl'] > 0).astype(int)
# Calculate streaks
streaks = []
current_streak = 1
current_type = wins_losses.iloc[0]
for i in range(1, len(wins_losses)):
if wins_losses.iloc[i] == current_type:
current_streak += 1
else:
streaks.append({
'type': 'win' if current_type else 'loss',
'length': current_streak
})
current_streak = 1
current_type = wins_losses.iloc[i]
# Last streak
streaks.append({
'type': 'win' if current_type else 'loss',
'length': current_streak
})
# Statistics
win_streaks = [s['length'] for s in streaks if s['type'] == 'win']
loss_streaks = [s['length'] for s in streaks if s['type'] == 'loss']
return {
'max_consecutive_wins': max(win_streaks) if win_streaks else 0,
'max_consecutive_losses': max(loss_streaks) if loss_streaks else 0,
'avg_win_streak': np.mean(win_streaks) if win_streaks else 0,
'avg_loss_streak': np.mean(loss_streaks) if loss_streaks else 0,
'all_streaks': streaks
}
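The streak logic above can be cross-checked compactly with `itertools.groupby` on a hypothetical PnL sequence:

```python
from itertools import groupby

# Compact cross-check of the streak logic on a hypothetical PnL sequence.
pnl = [100, 50, -20, -30, -10, 40]
outcomes = [p > 0 for p in pnl]  # True = win, False = loss
streaks = [(kind, len(list(group))) for kind, group in groupby(outcomes)]
# [(True, 2), (False, 3), (True, 1)]
max_win_streak = max(n for kind, n in streaks if kind)       # 2
max_loss_streak = max(n for kind, n in streaks if not kind)  # 3
```

Knowing the historical max losing streak (3 here) is what tells you whether you could psychologically survive the strategy live.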
Consistency Metrics
1. Monthly Returns Analysis
def analyze_monthly_returns(equity_curve, timestamps):
"""Analyze monthly returns"""
equity_df = pd.DataFrame({
'timestamp': timestamps,
'equity': equity_curve
})
equity_df.set_index('timestamp', inplace=True)
    # Monthly returns ('M' is deprecated in pandas >= 2.2; use 'ME' there)
    monthly_equity = equity_df.resample('M').last()
monthly_returns = monthly_equity['equity'].pct_change().dropna()
# Metrics
positive_months = (monthly_returns > 0).sum()
total_months = len(monthly_returns)
monthly_win_rate = positive_months / total_months
# Best and worst month
best_month = monthly_returns.max()
worst_month = monthly_returns.min()
# Consistency (std of monthly returns)
consistency = monthly_returns.std()
return {
'monthly_win_rate': monthly_win_rate,
'best_month': best_month,
'worst_month': worst_month,
'avg_monthly_return': monthly_returns.mean(),
'monthly_consistency': consistency,
'total_months': total_months,
'positive_months': positive_months,
'monthly_returns': monthly_returns
}
2. Rolling Performance
def rolling_performance(returns, window=252):
"""Performance in rolling windows"""
rolling_sharpe = []
rolling_returns = []
for i in range(window, len(returns)):
        period_returns = returns.iloc[i-window:i]
# Rolling Sharpe
sharpe = calculate_sharpe_ratio(period_returns)['sharpe_ratio']
rolling_sharpe.append(sharpe)
# Rolling annual return
annual_return = period_returns.mean() * 252
rolling_returns.append(annual_return)
return {
'rolling_sharpe': rolling_sharpe,
'rolling_returns': rolling_returns,
'sharpe_stability': np.std(rolling_sharpe),
'return_stability': np.std(rolling_returns)
}
Advanced Metrics
1. Value at Risk (VaR)
def calculate_var(returns, confidence_level=0.05):
"""Value at Risk: Maximum expected loss"""
# Historical VaR
var_historical = np.percentile(returns, confidence_level * 100)
# Parametric VaR (assuming normal distribution)
mean_return = returns.mean()
std_return = returns.std()
    var_parametric = mean_return - (1.645 * std_return)  # z = 1.645 for a one-sided 95% level
# Expected Shortfall (CVaR)
shortfall_returns = returns[returns <= var_historical]
expected_shortfall = shortfall_returns.mean() if len(shortfall_returns) > 0 else 0
return {
'var_historical': var_historical,
'var_parametric': var_parametric,
'expected_shortfall': expected_shortfall,
'confidence_level': confidence_level
}
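The historical VaR is just a percentile of the return sample, which a hypothetical five-observation example makes concrete:

```python
import numpy as np

# Hand-check of historical VaR on a tiny hypothetical return sample.
rets = np.array([-0.03, -0.01, 0.0, 0.01, 0.02])
var_5 = np.percentile(rets, 5)  # 5th percentile, default linear interpolation
print(var_5)  # -0.026: interpolated between -0.03 and -0.01
```

Read it as: on 95% of days you should not lose more than 2.6%; the Expected Shortfall then tells you how bad the remaining 5% of days are on average.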
2. Sortino Ratio - Penalizing Only Losses
Definition: Measures risk-adjusted return considering only downside volatility, making it more sensitive to losses than the Sharpe Ratio.
Mathematical Formula:
Sortino = (Asset_Return - Risk_Free_Rate) / Negative_Standard_Deviation
Advantage over Sharpe: Only penalizes downside volatility (unwanted losses), not gains.
def coef_Sortino(datos: pd.DataFrame, tasa_lr: float = 0.03, columna: str = "Close") -> float:
"""
Sortino Coefficient - Exact reference implementation
Parameters
----------
datos : pd.DataFrame
Historical data of a financial asset
tasa_lr : float, default 0.03
Risk-free rate (3% by default)
columna : str, default "Close"
Column to use for calculation
Returns
-------
float
Sortino Coefficient
"""
# Calculate annualized asset return
rendimiento_activo = (datos[columna].iloc[-1] / datos[columna].iloc[0]) ** (1 / np.ceil(datos.shape[0] / 252)) - 1
# Daily returns
rendimientos_diarios = datos[columna].pct_change()
# Only negative returns
rendimientos_diarios_negativos = rendimientos_diarios[rendimientos_diarios < 0]
# Annualized standard deviation of negative returns
desviacion_estandar_negativos = rendimientos_diarios_negativos.std() * np.sqrt(252)
return (rendimiento_activo - tasa_lr) / desviacion_estandar_negativos
def calculate_sortino_ratio(returns, risk_free_rate=0.02):
"""Sortino: Like Sharpe but only penalizes downside - Modern version"""
excess_returns = returns.mean() * 252 - risk_free_rate
# Downside deviation (only negative returns)
negative_returns = returns[returns < 0]
downside_deviation = negative_returns.std() * np.sqrt(252)
sortino = excess_returns / downside_deviation if downside_deviation > 0 else 0
# Interpretation
if sortino > 2:
quality = "Excellent"
interpretation = "Very good loss risk management"
elif sortino > 1:
quality = "Good"
interpretation = "Good downside risk management"
elif sortino > 0.5:
quality = "Acceptable"
interpretation = "Moderate loss risk management"
elif sortino > 0:
quality = "Poor"
interpretation = "Positive return but poor loss control"
else:
quality = "Negative"
interpretation = "Negative return with significant losses"
return {
'sortino_ratio': sortino,
'quality': quality,
'interpretation': interpretation,
'downside_deviation': downside_deviation,
'excess_return': excess_returns,
'negative_periods': len(negative_returns)
}
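A quick hand-check on hypothetical daily returns shows that the Sortino denominator really does ignore the up days:

```python
import numpy as np
import pandas as pd

# Hand-check: the Sortino denominator uses only the negative returns.
rets = pd.Series([0.01, -0.02, 0.015, -0.01, 0.005])
downside = rets[rets < 0]                     # -0.02, -0.01
downside_dev = downside.std() * np.sqrt(252)  # sample std ~0.00707, annualized
print(round(downside_dev, 4))  # ~0.1123
```

Strong up days raise the Sharpe denominator but leave this one untouched, which is why Sortino flatters strategies with volatile gains but controlled losses.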
Benchmarking Framework
class PerformanceAnalyzer:
"""Complete framework for performance analysis"""
def __init__(self, equity_curve, returns, trades_df=None, benchmark_returns=None):
self.equity_curve = equity_curve
self.returns = returns
self.trades_df = trades_df
self.benchmark_returns = benchmark_returns
def full_analysis(self):
"""Complete analysis"""
analysis = {}
# Basic metrics
analysis['returns'] = calculate_returns(self.equity_curve)
analysis['sharpe'] = calculate_sharpe_ratio(self.returns)
analysis['drawdown'] = calculate_drawdown(self.equity_curve)
analysis['calmar'] = calculate_calmar_ratio(self.returns, self.equity_curve)
# Trading metrics
if self.trades_df is not None:
analysis['trades'] = calculate_trade_metrics(self.trades_df)
analysis['streaks'] = analyze_streaks(self.trades_df)
# Advanced metrics
analysis['var'] = calculate_var(self.returns)
analysis['sortino'] = calculate_sortino_ratio(self.returns)
# Benchmark comparison
if self.benchmark_returns is not None:
analysis['vs_benchmark'] = compare_to_benchmark(
self.returns, self.benchmark_returns
)
        # Consistency (synthetic daily timestamps used here; pass real dates in production)
        timestamps = pd.date_range(start='2023-01-01', periods=len(self.equity_curve), freq='D')
analysis['monthly'] = analyze_monthly_returns(self.equity_curve, timestamps)
return analysis
def generate_report(self):
"""Generate readable report"""
analysis = self.full_analysis()
report = f"""
BACKTEST PERFORMANCE REPORT
{'='*50}
PROFITABILITY
Total Return: {analysis['returns']['total_return']:.2%}
CAGR: {analysis['returns']['cagr']:.2%}
Profit: ${analysis['returns']['absolute_profit']:,.2f}
RISK
Sharpe Ratio: {analysis['sharpe']['sharpe_ratio']:.2f} ({analysis['sharpe']['quality']})
Max Drawdown: {analysis['drawdown']['max_drawdown']:.2%}
Calmar Ratio: {analysis['calmar']['calmar_ratio']:.2f}
Volatility: {analysis['sharpe']['volatility']:.2%}
TRADING METRICS
"""
if 'trades' in analysis:
trades = analysis['trades']
report += f"""Total Trades: {trades['total_trades']}
Win Rate: {trades['win_rate']:.2%}
Profit Factor: {trades['profit_factor']:.2f}
Expectancy: ${trades['expectancy']:.2f}
Avg Win: ${trades['avg_win']:.2f}
Avg Loss: ${trades['avg_loss']:.2f}
STREAKS
Max Consecutive Wins: {analysis['streaks']['max_consecutive_wins']}
Max Consecutive Losses: {analysis['streaks']['max_consecutive_losses']}
"""
report += f"""
ADVANCED METRICS
Sortino Ratio: {analysis['sortino']['sortino_ratio']:.2f}
VaR (95%): {analysis['var']['var_historical']:.2%}
Expected Shortfall: {analysis['var']['expected_shortfall']:.2%}
CONSISTENCY
Monthly Win Rate: {analysis['monthly']['monthly_win_rate']:.2%}
Best Month: {analysis['monthly']['best_month']:.2%}
Worst Month: {analysis['monthly']['worst_month']:.2%}
"""
return report
Red Flags in Metrics
def identify_red_flags(analysis):
"""Identify warning signals in metrics"""
red_flags = []
# Returns too good to be true
if analysis['returns']['cagr'] > 1.0: # >100% CAGR
red_flags.append("CAGR too high - possible overfitting")
# Win rate too high
if 'trades' in analysis and analysis['trades']['win_rate'] > 0.8:
red_flags.append("Win rate too high - check for look-ahead bias")
# Drawdown too low
if abs(analysis['drawdown']['max_drawdown']) < 0.05:
red_flags.append("Drawdown too low - not realistic")
# Too few trades
if 'trades' in analysis and analysis['trades']['total_trades'] < 30:
red_flags.append("Too few trades - lacks statistical significance")
# Profit factor too high
if 'trades' in analysis and analysis['trades']['profit_factor'] > 3:
red_flags.append("Profit factor too high - possible curve fitting")
# Sharpe too high
if analysis['sharpe']['sharpe_ratio'] > 3:
red_flags.append("Sharpe ratio too high - check data quality")
return red_flags
def validate_backtest(analysis):
"""Complete backtest validation"""
red_flags = identify_red_flags(analysis)
# Overall score
score = 0
max_score = 100
# Sharpe contribution (30 points max)
sharpe = analysis['sharpe']['sharpe_ratio']
if sharpe > 2:
score += 30
elif sharpe > 1:
score += 20
elif sharpe > 0.5:
score += 10
# Drawdown contribution (20 points max)
dd = abs(analysis['drawdown']['max_drawdown'])
if dd < 0.1:
score += 20
elif dd < 0.2:
score += 15
elif dd < 0.3:
score += 10
# Consistency (30 points max)
if 'monthly' in analysis:
monthly_wr = analysis['monthly']['monthly_win_rate']
if monthly_wr > 0.7:
score += 30
elif monthly_wr > 0.6:
score += 20
elif monthly_wr > 0.5:
score += 10
# Trade stats (20 points max)
if 'trades' in analysis:
if analysis['trades']['total_trades'] > 100:
score += 10
if 1.5 <= analysis['trades']['profit_factor'] <= 2.5:
score += 10
recommendation = "APPROVED" if score >= 70 and not red_flags else "NEEDS WORK"
return {
'score': score,
'max_score': max_score,
'red_flags': red_flags,
'recommendation': recommendation
}
My Personal Dashboard
def create_metrics_dashboard(analysis, equity_curve):
    """Visual metrics dashboard"""
    import matplotlib.pyplot as plt
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 10))
    # 1. Equity curve (passed in explicitly; full_analysis() does not store it)
    ax1.plot(equity_curve)
ax1.set_title('Equity Curve')
ax1.grid(True)
# 2. Drawdown
dd = analysis['drawdown']['drawdown_series']
ax2.fill_between(range(len(dd)), dd, 0, alpha=0.3, color='red')
ax2.set_title(f'Drawdown (Max: {analysis["drawdown"]["max_drawdown"]:.2%})')
ax2.grid(True)
# 3. Monthly returns
if 'monthly' in analysis:
monthly_rets = analysis['monthly']['monthly_returns']
colors = ['green' if x > 0 else 'red' for x in monthly_rets]
ax3.bar(range(len(monthly_rets)), monthly_rets, color=colors, alpha=0.7)
ax3.set_title(f'Monthly Returns (WR: {analysis["monthly"]["monthly_win_rate"]:.1%})')
ax3.grid(True)
# 4. Key metrics
metrics_text = f"""
Sharpe: {analysis['sharpe']['sharpe_ratio']:.2f}
Calmar: {analysis['calmar']['calmar_ratio']:.2f}
CAGR: {analysis['returns']['cagr']:.1%}
Max DD: {analysis['drawdown']['max_drawdown']:.1%}
"""
if 'trades' in analysis:
metrics_text += f"""
Win Rate: {analysis['trades']['win_rate']:.1%}
Profit Factor: {analysis['trades']['profit_factor']:.2f}
Total Trades: {analysis['trades']['total_trades']}
"""
ax4.text(0.1, 0.5, metrics_text, fontsize=12, verticalalignment='center')
ax4.set_xlim(0, 1)
ax4.set_ylim(0, 1)
ax4.axis('off')
ax4.set_title('Key Metrics')
plt.tight_layout()
plt.show()
Practical Example: Evaluating a Strategy
import pandas as pd
import numpy as np
# Example using real data
def evaluate_strategy_example():
"""Complete strategy evaluation example"""
# Simulate equity curve data for a strategy
np.random.seed(42)
dates = pd.date_range('2023-01-01', '2023-12-31', freq='D')
# Simulate returns with slight positive drift
daily_returns = np.random.normal(0.0008, 0.02, len(dates)) # 0.08% daily return, 2% vol
equity_curve = 100000 * (1 + daily_returns).cumprod()
# Create simulated DataFrame
strategy_data = pd.DataFrame({
'Close': equity_curve
}, index=dates)
print("=== COMPLETE STRATEGY EVALUATION ===\n")
# 1. CAGR using both methods
cagr_optimized = CAGR(strategy_data, calculo_optimizado=True)
cagr_returns = CAGR(strategy_data, calculo_optimizado=False)
print(f"CAGR (Optimized Method): {cagr_optimized:.2%}")
print(f"CAGR (Returns Method): {cagr_returns:.2%}")
# 2. Sharpe Ratio
daily_rets = strategy_data['Close'].pct_change().dropna()
sharpe_result = coef_sharpe(strategy_data, tasa_lr=0.03)
print(f"\nSharpe Coefficient: {sharpe_result:.2f}")
if sharpe_result > 0:
print(" Return exceeds the risk-free rate")
else:
print(" Better to invest in risk-free assets")
# 3. Sortino Ratio
sortino_result = coef_Sortino(strategy_data, tasa_lr=0.03)
print(f"\nSortino Coefficient: {sortino_result:.2f}")
print(" (Only penalizes downside volatility)")
# 4. Maximum Drawdown
max_drawdown = max_dd(strategy_data)
print(f"\nMaximum Drawdown: {max_drawdown:.2%}")
print(f" Maximum loss from peak: ${100000 * max_drawdown:,.2f}")
# 5. Complete analysis using modern framework
analyzer = PerformanceAnalyzer(
equity_curve=equity_curve.values,
returns=daily_rets
)
print("\n" + "="*50)
print(analyzer.generate_report())
# 6. Backtest validation
analysis = analyzer.full_analysis()
validation = validate_backtest(analysis)
print(f"\nFINAL SCORE: {validation['score']}/{validation['max_score']}")
print(f"RECOMMENDATION: {validation['recommendation']}")
if validation['red_flags']:
print("\nRED FLAGS DETECTED:")
for flag in validation['red_flags']:
print(f" {flag}")
# Run example
if __name__ == "__main__":
evaluate_strategy_example()
Metrics Best Practices
Do’s
- Use multiple metrics: Never rely on a single metric
- Compare with benchmark: Always evaluate vs SPY or relevant index
- Analyze drawdown: A strategy with 50% DD is not viable
- Validate statistically: Minimum 30-50 trades for significance
- Consider period consistency: Metrics should be stable across periods
Don’ts
- Don’t ignore transaction costs: Include commissions and slippage
- Don’t optimize only Sharpe: Can lead to overfitting
- Don’t use future data: Avoid look-ahead bias
- Don’t ignore red flags: “Perfect” metrics are suspicious
- Don’t trade without out-of-sample: Always reserve data for validation
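The first of the don'ts can be sketched as code: net costs out of each trade before computing any metric. This is a minimal illustration, assuming a flat commission per trade and proportional slippage on notional; both figures and the `apply_costs` helper are hypothetical, not a real broker model.

```python
import pandas as pd

# Minimal sketch: net hypothetical costs out of gross PnL before computing
# metrics. The commission and slippage figures are illustrative assumptions.
def apply_costs(trades: pd.DataFrame, commission: float = 1.0,
                slippage_pct: float = 0.0005) -> pd.DataFrame:
    out = trades.copy()
    out['slippage'] = out['notional'] * slippage_pct      # proportional slippage
    out['net_pnl'] = out['pnl'] - commission - out['slippage']
    return out

trades = pd.DataFrame({'pnl': [120.0, -40.0], 'notional': [10_000.0, 10_000.0]})
net = apply_costs(trades)
print(net['net_pnl'].tolist())  # each trade loses $1 commission + $5 slippage
```

Feed the `net_pnl` column, not the gross one, into `calculate_trade_metrics`; a thin edge often disappears entirely at this step.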
Realistic Targets for Small Caps
# Realistic benchmarks for small cap strategies
REALISTIC_METRICS = {
'sharpe_ratio': {
'excellent': '>1.5',
'good': '1.0-1.5',
'acceptable': '0.7-1.0',
'poor': '<0.7'
},
'max_drawdown': {
'excellent': '<15%',
'good': '15-25%',
'acceptable': '25-35%',
'poor': '>35%'
},
'cagr': {
'excellent': '>25%',
'good': '15-25%',
'acceptable': '10-15%',
'poor': '<10%'
},
'win_rate': {
'excellent': '>60%',
'good': '50-60%',
'acceptable': '45-50%',
'poor': '<45%'
}
}
Validation Checklist
Before going live with a strategy:
- Sharpe > 1.0
- Max Drawdown < 30%
- Minimum 50 trades in backtest
- Profit Factor between 1.3-2.5
- Win Rate 45-70% (not extreme)
- Consistency in rolling windows
- Out-of-sample testing passed
- No red flags in metrics
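The quantitative items of the checklist above can be encoded directly. This is a minimal sketch: the thresholds mirror the list, and the `analysis` dict layout follows this article's `PerformanceAnalyzer` output (an assumption if you plug in a different framework); the qualitative items (rolling consistency, out-of-sample, red flags) still need their own checks.

```python
# Minimal sketch of the quantitative checklist items as code.
def pre_live_checklist(analysis: dict) -> dict:
    trades = analysis.get('trades', {})
    checks = {
        'sharpe_above_1': analysis['sharpe']['sharpe_ratio'] > 1.0,
        'max_dd_below_30pct': abs(analysis['drawdown']['max_drawdown']) < 0.30,
        'min_50_trades': trades.get('total_trades', 0) >= 50,
        'profit_factor_sane': 1.3 <= trades.get('profit_factor', 0.0) <= 2.5,
        'win_rate_not_extreme': 0.45 <= trades.get('win_rate', 0.0) <= 0.70,
    }
    checks['all_passed'] = all(checks.values())
    return checks

# Hypothetical analysis values for illustration
sample = {
    'sharpe': {'sharpe_ratio': 1.4},
    'drawdown': {'max_drawdown': -0.18},
    'trades': {'total_trades': 120, 'profit_factor': 1.8, 'win_rate': 0.55},
}
print(pre_live_checklist(sample)['all_passed'])  # True
```

A single failing item is enough to send the strategy back to research rather than live.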
Next Step
With metrics mastered and reference formulas integrated, let’s move on to How to Avoid Overfitting to ensure your results are real.