Model Evaluation & Backtesting Results
Evaluated on AAPL, GOOGL, TSLA test sets · January 2015 – January 2025
1. Predictive Performance
| Model | RMSE ↓ | MAE ↓ | R² ↑ | vs Baseline |
|---|---|---|---|---|
| Standard LSTM (Price Only) | 3.452 | 2.120 | 0.824 | baseline |
| XGBoost Regressor | 3.105 | 1.984 | 0.851 | -10.1% RMSE |
| Hybrid LSTM + VADER | 2.850 | 1.755 | 0.887 | -17.4% RMSE |
| 🏆 Proposed Multi-modal (FinBERT + TLSTM) | 2.341 🏆 | 1.428 🏆 | 0.942 🏆 | -32.2% RMSE |
* Evaluated on AAPL test set · January 2015 – January 2025 · Lower RMSE = better · Our model: 32.2% RMSE improvement over LSTM baseline
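The "vs Baseline" column can be reproduced from the RMSE column alone. The sketch below is only an illustration of that arithmetic: the helper name `rmse_change_pct` is hypothetical, and the inputs are the rounded RMSE values from the table, so it recovers -10.1%, -17.4% and -32.2% rather than anything computed from raw predictions.

```python
def rmse_change_pct(model_rmse: float, baseline_rmse: float) -> float:
    """Relative RMSE change versus the baseline, in percent (negative = lower error)."""
    return (model_rmse - baseline_rmse) / baseline_rmse * 100.0


# Rounded RMSE values copied from the table above.
rmse = {
    "Standard LSTM (Price Only)": 3.452,
    "XGBoost Regressor": 3.105,
    "Hybrid LSTM + VADER": 2.850,
    "Proposed Multi-modal (FinBERT + TLSTM)": 2.341,
}

baseline = rmse["Standard LSTM (Price Only)"]
changes = {name: round(rmse_change_pct(value, baseline), 1) for name, value in rmse.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}% RMSE")
```

Because the inputs are already rounded to three decimals, the last digit of each percentage can differ slightly from one computed on unrounded errors.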
2. Backtest: Predicted vs Actual
3. Trading Strategy Performance (2015–2025)
PPO Agent Total Return: 315.6% (vs 210.4% Buy-and-Hold)
Sharpe Ratio: 2.45 (risk-adjusted return)
Max Drawdown: -12.4% (largest peak-to-trough decline)
Strategy Comparison — Total Return (%)
| Strategy | Return | Sharpe | Drawdown | Win Rate |
|---|---|---|---|---|
| Buy-and-Hold | 210.4% | 1.2 | -33.9% | not reported |
| Standard LSTM Trader | 245.8% | 1.55 | -28.2% | 54.2% |
| Our PPO Agent | 315.6% | 2.45 | -12.4% | 63.7% |
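The metrics compared above (total return, Sharpe ratio, drawdown, win rate) follow standard definitions. The page does not state how they were computed here, so the following is a minimal sketch under stated assumptions: simple per-period returns, a zero risk-free rate, 252 trading periods per year for annualization, and a win counted as any strictly positive return. All function names and the toy return series are hypothetical.

```python
import math
from statistics import mean, stdev


def total_return_pct(returns):
    """Compound per-period simple returns into a total return, in percent."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return (growth - 1.0) * 100.0


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free_per_period for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown_pct(returns):
    """Largest peak-to-trough decline of the equity curve, in percent (zero or negative)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst * 100.0


def win_rate_pct(trade_returns):
    """Share of entries with a strictly positive return, in percent."""
    wins = sum(1 for r in trade_returns if r > 0)
    return wins / len(trade_returns) * 100.0


# Toy returns invented purely for illustration; not data from this evaluation.
toy_returns = [0.10, -0.20, 0.05, 0.10]

print(f"total return: {total_return_pct(toy_returns):.2f}%")
print(f"max drawdown: {max_drawdown_pct(toy_returns):.2f}%")
print(f"win rate:     {win_rate_pct(toy_returns):.1f}%")
print(f"Sharpe:       {sharpe_ratio(toy_returns):.2f}")
```

A reported win rate is usually measured per closed trade rather than per day, so `win_rate_pct` would be fed one return per trade in that case.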
4. Ablation Study
💡 Key Finding: The FinBERT sentiment module contributes a +45.8% Sharpe improvement (2.45 vs 1.68), and transductive LSTM weighting reduces RMSE by 17.8% (2.341 vs 2.850).
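Both percentages in the key finding are relative changes between the two values quoted in parentheses. As a worked check using only those quoted, rounded figures:

```latex
\begin{align*}
\text{Sharpe improvement} &= \frac{2.45 - 1.68}{1.68} = \frac{0.77}{1.68} \approx 0.458 \quad (+45.8\%) \\
\text{RMSE reduction} &= \frac{2.850 - 2.341}{2.850} = \frac{0.509}{2.850} \approx 0.1786 \quad (17.9\%\ \text{rounded},\ 17.8\%\ \text{truncated})
\end{align*}
```

The small gap on the RMSE figure comes down to rounding versus truncating, or to the use of unrounded RMSE values in the original calculation.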
Sharpe Ratio Comparison
RMSE Comparison (lower = better)
5. COVID-19 Crash Case Study (Feb–Apr 2020)
📌 The system switched to Sell/Hold on Feb 24, 2020, when FinBERT detected negative news sentiment, protecting the portfolio from the subsequent 36% market crash.
Portfolio value indexed to 100 at Feb 3, 2020
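Indexing to 100 means dividing every portfolio value by the value on the base date and multiplying by 100, so strategies with different starting capital can share one axis. A minimal sketch, assuming the series is ordered by date with the base date first; the function name and the dollar amounts are hypothetical, not figures from this case study:

```python
def index_to_base(values, base=100.0):
    """Rescale a series so that its first observation equals `base`."""
    first = values[0]
    if first == 0:
        raise ValueError("cannot index a series whose first value is zero")
    return [v / first * base for v in values]


# Hypothetical portfolio values in dollars; the first entry stands in for Feb 3, 2020.
portfolio_usd = [250_000.0, 255_000.0, 240_000.0, 262_500.0]
indexed = index_to_base(portfolio_usd)
print([round(v, 1) for v in indexed])
```

On the indexed scale, a reading of 96 means the portfolio is 4% below its value on the base date.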