Why We Chose XGBoost Over LSTM for Crypto Prediction

#machinelearning #xgboost #lstm #crypto

The Deep Learning Hype Problem

Every crypto prediction tool claims to use "deep learning" or "neural networks." It sounds impressive. But does it actually work better than simpler methods?

We tested this rigorously. Eight different model architectures, 10 cryptocurrencies, walk-forward validation. No cherry-picking. Here's what happened.



The Contenders

We tested these architectures in a controlled experiment — same features, same coins, same train/test splits:

Gradient Boosting (XGBoost)

An ensemble of decision trees where each new tree corrects the errors of previous trees. We tested a "conservative" configuration (100 trees, depth 4) and an "aggressive" one (200 trees, depth 8, lower regularisation).

LSTM (Long Short-Term Memory)

A recurrent neural network designed for sequential data. We tested 6 variants:

  • Sequence lengths of 5, 10, and 20 candles
  • Single-layer and two-layer architectures
  • Bidirectional LSTM (processes sequences forwards and backwards)
  • Hidden size of 32 with dropout for regularisation
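
For concreteness, here is a minimal sketch of one such variant, assuming PyTorch (the post names no framework, and everything beyond the hidden size of 32 and the use of dropout is our illustration):

```python
import torch
import torch.nn as nn

class DirectionLSTM(nn.Module):
    """One LSTM variant: hidden size 32, dropout, predicting
    next-candle direction from a sequence of feature vectors."""

    def __init__(self, n_features: int, hidden: int = 32,
                 num_layers: int = 1, bidirectional: bool = False):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=n_features,
            hidden_size=hidden,
            num_layers=num_layers,
            batch_first=True,
            dropout=0.2 if num_layers > 1 else 0.0,  # between-layer dropout
            bidirectional=bidirectional,             # True = BiLSTM variant
        )
        out_dim = hidden * (2 if bidirectional else 1)
        self.head = nn.Sequential(nn.Dropout(0.2), nn.Linear(out_dim, 1))

    def forward(self, x):             # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # logit for "next candle up"

# seq_len in {5, 10, 20} covers the sequence-length variants
model = DirectionLSTM(n_features=60, num_layers=1)
logits = model(torch.randn(8, 10, 60))  # batch of 8, seq_len=10
```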

GRU (Gated Recurrent Unit)

A simplified version of LSTM with fewer parameters. Same test configurations.

Temporal Convolutional Network (TCN)

Dilated causal convolutions over the time series. Three variants with different sequence lengths (10, 20) and filter counts (32, 64).
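
For intuition, a hedged sketch of the building block, again assuming PyTorch (kernel size and the exact stacking are our illustration): left-only padding keeps each convolution causal, and doubling the dilation per layer grows the receptive field exponentially.

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalConv(nn.Module):
    """One dilated causal convolution: pad on the left only, so the
    output at time t never sees candles after t."""
    def __init__(self, c_in, c_out, dilation, kernel_size=3):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(c_in, c_out, kernel_size, dilation=dilation)

    def forward(self, x):                    # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))

# Dilations 1, 2, 4: receptive field grows exponentially with depth;
# 32 channels corresponds to the 32-filter variant.
tcn = nn.Sequential(
    CausalConv(60, 32, dilation=1), nn.ReLU(),
    CausalConv(32, 32, dilation=2), nn.ReLU(),
    CausalConv(32, 32, dilation=4),
)
```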

Transformer Encoder

Self-attention over feature sequences — the architecture behind large language models, adapted for time series. Three variants with different depths and sequence lengths.

LSTM + XGBoost Hybrid

Train an LSTM as a feature extractor, then feed its hidden states into XGBoost. Four variants using different combinations of LSTM outputs and original features.
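
A minimal sketch of one way to wire this, assuming PyTorch plus the xgboost package (the experiment's exact feature combinations aren't specified, so the concatenation below is just one illustrative choice):

```python
import numpy as np
import torch
import torch.nn as nn
import xgboost as xgb

lstm = nn.LSTM(input_size=60, hidden_size=32, batch_first=True)
# ... assume `lstm` has already been trained to predict direction ...

X_seq = torch.randn(2000, 10, 60)   # stand-in for real feature sequences
y = np.random.randint(0, 2, 2000)   # stand-in for up/down labels

with torch.no_grad():
    out, _ = lstm(X_seq)
    hidden = out[:, -1].numpy()     # learned temporal features (2000, 32)

flat = X_seq[:, -1].numpy()         # original per-candle features (2000, 60)
X_hybrid = np.hstack([flat, hidden])  # one of the four combinations

clf = xgb.XGBClassifier(n_estimators=200, max_depth=8)
clf.fit(X_hybrid, y)
```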


The Results

| Model | Avg Accuracy | Best Coin | Worst Coin | Training Speed |
| --- | --- | --- | --- | --- |
| XGBoost Aggressive | 54.9% | 58.2% | 49.5% | Fast |
| XGBoost Conservative | 53.1% | 55.6% | 50.2% | Fast |
| Random Forest | 52.3% | 54.8% | 49.1% | Fast |
| LSTM (seq=10) | 53.6% | 56.1% | 48.9% | Slow |
| LSTM (seq=20) | 52.8% | 55.3% | 47.2% | Slow |
| GRU (seq=10) | 52.4% | 54.7% | 49.3% | Moderate |
| BiLSTM | 51.9% | 54.2% | 48.1% | Slow |
| TCN (32 filters) | 51.2% | 53.8% | 48.4% | Moderate |
| TCN (64 filters) | 50.8% | 52.9% | 47.6% | Moderate |
| Transformer (1 layer) | 51.4% | 53.2% | 48.7% | Slow |
| Transformer (2 layers) | 50.1% | 52.6% | 46.9% | Very Slow |
| LSTM+XGB Hybrid | 52.1% | 54.3% | 49.0% | Very Slow |

XGBoost Aggressive won across the board. Not by a dramatic margin on any single test — but consistently, across every coin and every fold.


Why Deep Learning Lost

This wasn't what we expected. Deep learning dominates computer vision and NLP. Why not crypto prediction?

1. Dataset Size

This is the biggest factor. Our walk-forward windows use 2,000 candles for training. Deep learning models — especially Transformers — need orders of magnitude more data to learn effectively.

With 2,000 samples, a Transformer with just 35,000 parameters is already prone to overfitting. XGBoost's decision trees handle small datasets much better because they don't need to learn sequential patterns from scratch — they work with flat feature vectors.
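
To put that figure in perspective, here is a hedged sketch in PyTorch (the layer sizes are our assumption, chosen only to illustrate the scale): even a deliberately small two-layer encoder over roughly 60 input features lands near 35,000 parameters.

```python
import torch.nn as nn

# Hypothetical small Transformer; sizes are illustrative, not the
# experiment's exact architecture. Even this tiny model has ~35k
# trainable parameters to fit from only 2,000 samples.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4,
                                   dim_feedforward=128, batch_first=True)
model = nn.Sequential(
    nn.Linear(60, 32),                           # 60 features -> d_model
    nn.TransformerEncoder(layer, num_layers=2),
    nn.Linear(32, 1),                            # direction logit
)
print(sum(p.numel() for p in model.parameters()))  # roughly 35,000
```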

2. Feature Engineering Does the Heavy Lifting

We already extract 60+ engineered features (RSI, MACD, Bollinger Bands, etc.) from the raw OHLCV data. These features encode the temporal patterns that LSTMs try to learn from scratch.

When you hand-engineer momentum, trend, and volatility features, you're essentially doing the LSTM's job for it. XGBoost then just needs to learn how to combine these pre-computed signals — a much easier task than learning temporal patterns from raw price data.
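
As a sketch of what this means in code (indicator parameters and column names are ours for illustration; the production pipeline computes 60+ such features), each row becomes a flat snapshot of pre-computed signals, which is exactly what a tree model consumes:

```python
import pandas as pd

def engineer_features(ohlcv: pd.DataFrame) -> pd.DataFrame:
    """ohlcv has columns open, high, low, close, volume, one row per
    candle. Temporal context is baked into the features, so no
    sequence model is needed to recover it."""
    f = pd.DataFrame(index=ohlcv.index)

    # RSI(14): relative strength of recent gains vs losses
    delta = ohlcv["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    f["rsi_14"] = 100 - 100 / (1 + gain / loss)

    # MACD: fast EMA minus slow EMA of close
    f["macd"] = (ohlcv["close"].ewm(span=12).mean()
                 - ohlcv["close"].ewm(span=26).mean())

    # Bollinger %B: where close sits inside the 20-period band
    ma = ohlcv["close"].rolling(20).mean()
    sd = ohlcv["close"].rolling(20).std()
    f["boll_pct_b"] = (ohlcv["close"] - (ma - 2 * sd)) / (4 * sd)

    return f.dropna()
```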

3. Market Noise

Crypto markets are extremely noisy. Deep learning models are powerful pattern recognisers, but when there's more noise than pattern, that power becomes a liability — the model memorises noise instead of learning signal.

XGBoost's tree-based approach with max_depth=8 naturally limits how complex the decision boundaries can get. This acts as built-in regularisation against noise.

4. Non-Stationarity

Crypto markets change character over time. A pattern that works during a trending market may not work during consolidation. LSTMs learn specific temporal patterns that can become stale when the market regime shifts.

XGBoost with walk-forward retraining adapts more quickly because it doesn't carry forward assumptions about temporal ordering — it just looks at the current feature snapshot.
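
A minimal sketch of that retraining loop, assuming the 2,000-candle training windows mentioned earlier (the test-window size and the helper itself are our illustration, not the production code):

```python
import numpy as np
import xgboost as xgb

def walk_forward(X: np.ndarray, y: np.ndarray,
                 train_size: int = 2000, test_size: int = 200) -> float:
    """Refit on each rolling window, then test on the candles that
    follow. The model never sees the future, and each refit picks up
    the current market regime from scratch."""
    accuracies = []
    for start in range(0, len(X) - train_size - test_size, test_size):
        tr = slice(start, start + train_size)
        te = slice(start + train_size, start + train_size + test_size)
        clf = xgb.XGBClassifier(n_estimators=200, max_depth=8)
        clf.fit(X[tr], y[tr])
        accuracies.append((clf.predict(X[te]) == y[te]).mean())
    return float(np.mean(accuracies))
```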


When Would Deep Learning Win?

Deep learning isn't bad; it's the wrong tool for our specific use case. It would likely outperform XGBoost given:

  • Much larger datasets (50,000+ training samples instead of 2,000)
  • Tick-level data where raw sequences contain patterns that feature engineering can't capture
  • Multi-modal inputs combining price data with order book, news text, and social media
  • Transfer learning pre-trained on massive financial datasets then fine-tuned

We may revisit deep learning as our data pipeline matures and we accumulate more historical data.


The Aggressive Tuning That Mattered

The gap between "conservative" and "aggressive" XGBoost was significant:

| Parameter | Conservative | Aggressive |
| --- | --- | --- |
| Trees | 100 | 200 |
| Max Depth | 4 | 8 |
| Learning Rate | 0.1 | 0.1 |
| Regularisation (alpha) | 0.1 | 0.01 |
| Regularisation (lambda) | 1.0 | 0.01 |
| Column Sampling | 0.8 | 0.8 |

Lower regularisation lets the model fit the data more closely. With walk-forward validation guarding against overfitting to any single period, this pays off: the aggressive model consistently beat the conservative one by 1-2 percentage points.
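
For reference, the two configurations expressed as scikit-learn-style XGBoost estimators (a sketch: we map "column sampling" to colsample_bytree, and anything not in the table, such as the objective, is left at library defaults):

```python
import xgboost as xgb

conservative = xgb.XGBClassifier(
    n_estimators=100, max_depth=4, learning_rate=0.1,
    reg_alpha=0.1, reg_lambda=1.0, colsample_bytree=0.8,
)
aggressive = xgb.XGBClassifier(
    n_estimators=200, max_depth=8, learning_rate=0.1,
    reg_alpha=0.01, reg_lambda=0.01, colsample_bytree=0.8,
)
```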


Key Takeaways

  1. Don't assume deep learning is better. On small financial datasets, gradient boosting typically wins.
  2. Feature engineering matters more than model architecture. Good features with XGBoost > raw data with LSTM.
  3. Aggressive hyperparameters work when combined with robust validation. Walk-forward prevents overfitting even with deep trees.
  4. Speed matters. XGBoost trains in seconds; LSTMs take minutes per fold. Over 13,500 fits, that's the difference between hours and days.

Part of Our Research Series

This is one of four posts covering our ML research:

  1. 13,500 Model Fits Later: What Actually Works — Overview
  2. This post — XGBoost vs deep learning
  3. How Macro Indicators Predict Crypto Prices — Macro features
  4. Meta-Labeling: Filtering Bad Trades — Signal quality

Full methodology: How Our AI Works


AI trading signals are probabilistic predictions, not financial advice. Past performance does not guarantee future results.


Originally published at Nydar. Nydar is a free trading platform with AI-powered signals and analysis.