Historical Data Sources
Overview
The quality of your data determines the quality of your backtests. Here are the sources I currently use, each with its own pros and cons.
Yahoo Finance
Features
- Free for EOD data
- Global stock coverage
- Data from the 1960s for many tickers
- Simple Python API with yfinance
Installation and Usage
pip install yfinance
import yfinance as yf
import pandas as pd
# Daily data
ticker = yf.Ticker("AAPL")
data = ticker.history(start="2023-01-01", end="2023-12-31")
# Multiple tickers
tickers = ["AAPL", "MSFT", "GOOGL"]
data = yf.download(tickers, start="2023-01-01", end="2023-12-31")
# Intraday data (1-minute bars reach back only about a week; coarser intraday intervals about 60 days)
intraday = ticker.history(period="1d", interval="1m")
Limitations
- Only the EOD data is reliable
- Intraday history is limited to the last 60 days
- Does not include dark pool prints or odd lots
- Split and dividend adjustments are sometimes incorrect
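Bad split adjustments usually show up as an implausibly large close-to-close move. A minimal sketch of a check for that, using plain tuples rather than a DataFrame; the function name, the 40% threshold, and the toy prices are all assumptions for illustration, not values from any provider:

```python
from datetime import date

def find_suspect_jumps(closes, threshold=0.4):
    """Return (date, ratio) pairs where the close-to-close change exceeds threshold.

    closes: list of (date, close) tuples sorted by date.
    threshold: 0.4 is an arbitrary illustrative cutoff (assumption).
    """
    suspects = []
    for (_, prev_close), (day, close) in zip(closes, closes[1:]):
        ratio = close / prev_close
        if abs(ratio - 1.0) > threshold:
            suspects.append((day, round(ratio, 4)))
    return suspects

# Toy series: the drop from 400 to 100 looks like an unadjusted 4-for-1 split
sample = [
    (date(2023, 6, 1), 398.0),
    (date(2023, 6, 2), 400.0),
    (date(2023, 6, 5), 100.0),
    (date(2023, 6, 6), 101.0),
]
suspects = find_suspect_jumps(sample)
```

Anything this flags still needs a manual look against a corporate actions calendar, since a genuine crash produces the same signature.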
When to Use
- Initial backtesting of swing strategies
- Quick idea screening
- Studies that need long price histories
Polygon.io
Features
- Professional API with tick-by-tick data
- WebSocket for real-time data
- Complete history since 2003
- Includes dark pools, odd lots, conditions
Setup
pip install polygon-api-client
from polygon import RESTClient
import os
client = RESTClient(api_key=os.getenv("POLYGON_API_KEY"))
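# Aggregates (bars)
aggs = client.get_aggs(
    ticker="AAPL",
    multiplier=5,
    timespan="minute",
    from_="2023-01-01",
    to="2023-12-31",
    # A single request is capped, so a full year of 5-minute bars may come
    # back truncated; request shorter windows if it does
    limit=50000,
)
# Trades (tick data); list_trades returns a lazy iterator that pages through results
trades = client.list_trades(
    ticker="AAPL",
    timestamp="2023-06-01"
)
# Quote data (NBBO)
quotes = client.list_quotes(
    ticker="AAPL",
    timestamp="2023-06-01"
)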
Plans and Pricing (2024)
- Basic: $0/month - 2 years historical, EOD, 5 calls/min
- Starter: $29/month - 5 years historical, 15-min delayed, WebSockets
- Developer: $79/month - 10 years historical, 15-min delayed, trades data
- Advanced: $199/month - 20+ years historical, real-time data, quotes
Note: Prices for individual users. Professional plans have different costs.
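On the Basic plan the 5 calls/min cap is easy to hit in a loop. A minimal sketch of a client-side throttle you could call before each request; the class name and the injectable `clock`/`sleep` parameters are my own assumptions (they make the logic testable without waiting), not part of the Polygon client:

```python
import time
from collections import deque

class CallThrottle:
    """Block until a call is allowed under a max-calls-per-window limit."""

    def __init__(self, max_calls=5, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        now = self.clock()
        # Drop timestamps that have left the sliding window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call ages out of the window
            self.sleep(self.window - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)

throttle = CallThrottle(max_calls=5, window_seconds=60.0)
```

Calling `throttle.wait()` immediately before each API request keeps a batch download under the limit without hard-coding sleeps between calls.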
When to Use
- Intraday strategies that need precision
- Microstructure analysis
- Backtesting with realistic fills
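What makes fills more realistic with quote data is pricing orders off the NBBO instead of the last trade. A minimal sketch under simplifying assumptions: marketable orders fill fully at the far side of the quote, and size, latency, and queue position are ignored. The function names and the example quote are hypothetical:

```python
def simulate_fill(side, bid, ask):
    """Fill a marketable order at the far side of the quote."""
    if side == "BUY":
        return ask
    if side == "SELL":
        return bid
    raise ValueError(f"unknown side: {side}")

def round_trip_cost(bid, ask, shares):
    """Spread cost of buying and immediately selling at an unchanged quote."""
    buy_price = simulate_fill("BUY", bid, ask)
    sell_price = simulate_fill("SELL", bid, ask)
    return (buy_price - sell_price) * shares

# Made-up quote: a 4-cent spread on 100 shares costs about $4 per round trip
cost = round_trip_cost(bid=189.98, ask=190.02, shares=100)
```

A backtest that fills at the bar close silently skips this cost, which adds up quickly for strategies that trade many times a day.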
Interactive Brokers TWS
Features
- Real-time data included with account
- Robust API for automation
- Limited but free history
- Direct connection for live trading
Configuration with ib_insync
pip install ib_insync
from ib_insync import IB, Stock, util
ib = IB()
ib.connect('127.0.0.1', 7497, clientId=1)  # Paper: 7497, Live: 7496
# Contract
contract = Stock('AAPL', 'SMART', 'USD')
# Historical data
bars = ib.reqHistoricalData(
    contract,
    endDateTime='',
    durationStr='30 D',
    barSizeSetting='1 min',
    whatToShow='TRADES',
    useRTH=True
)
# Convert to DataFrame
df = util.df(bars)
# Real-time bars
def onBarUpdate(bars, hasNewBar):
    print(bars[-1])
bars = ib.reqRealTimeBars(contract, 5, 'TRADES', True)
bars.updateEvent += onBarUpdate
# In a standalone script, call ib.run() afterwards to keep the event loop alive
Limitations
- Maximum 1 year of minute data
- Strict rate limits
- Requires TWS or IB Gateway to be running
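Because of the history and rate limits, long downloads are usually split into smaller windows that are requested one at a time. A minimal sketch of the window-splitting step only; the function name and the 30-day chunk size are assumptions, and the pacing between requests is left to you:

```python
from datetime import date, timedelta

def split_date_range(start, end, chunk_days=30):
    """Yield (chunk_start, chunk_end) pairs covering [start, end] inclusive."""
    if chunk_days < 1:
        raise ValueError("chunk_days must be at least 1")
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=chunk_days - 1), end)
        yield cursor, chunk_end
        cursor = chunk_end + timedelta(days=1)

chunks = list(split_date_range(date(2023, 1, 1), date(2023, 3, 15), chunk_days=30))
```

Each pair can then drive one historical request, with the chunk end as the request's end time and the chunk length as its duration.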
When to Use
- Automated trading in production
- Paper trading with real data
- Verification of other data sources
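Verifying one source against another can be as simple as comparing closes date by date. A minimal sketch using plain dicts; the function name, the 0.5% tolerance, and the prices are made up for illustration:

```python
def compare_closes(primary, reference, tolerance=0.005):
    """Compare closes keyed by date; return mismatches and missing dates.

    primary, reference: dicts mapping an ISO date string to a close price.
    tolerance: maximum relative difference (0.5% here, an arbitrary assumption).
    """
    mismatches = {}
    for day in sorted(primary.keys() & reference.keys()):
        a, b = primary[day], reference[day]
        if abs(a - b) / b > tolerance:
            mismatches[day] = (a, b)
    # Dates the reference has but the primary source lacks
    missing = sorted(reference.keys() - primary.keys())
    return mismatches, missing

# Toy data: the third close is off by half, and one date is absent
source_a = {"2023-06-01": 100.00, "2023-06-02": 101.00, "2023-06-05": 50.50}
source_b = {"2023-06-01": 100.01, "2023-06-02": 101.00,
            "2023-06-05": 101.00, "2023-06-06": 102.00}
mismatches, missing = compare_closes(source_a, source_b)
```

A mismatch of roughly half or double often points at a split that one source adjusted and the other did not.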
DAS Trader
Features
- Professional day trading platform
- Complete Level 2
- Hotkeys and automation
- Integration with multiple brokers
Exporting Data
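# DAS saves logs in CSV format
import pandas as pd
import glob
# Read executed trades (adjust the path to your DAS install)
trades_files = glob.glob('C:/DAS/Trades/*.csv')
trades = pd.concat([pd.read_csv(f) for f in trades_files], ignore_index=True)
# Process for analysis (column names depend on your export layout)
trades['Time'] = pd.to_datetime(trades['Time'])
# Per-share P&L for a long trade; multiply by share count and flip the sign for shorts
trades['PnL'] = trades['ExitPrice'] - trades['EntryPrice']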
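The price difference above is a per-share figure that only holds for long trades. A minimal sketch of a dollar P&L that accounts for side, share count, and fees; the function name, the `"LONG"`/`"SHORT"` labels, and the `fees` argument are hypothetical and need mapping to whatever your export actually contains:

```python
def trade_pnl(side, quantity, entry_price, exit_price, fees=0.0):
    """Dollar P&L for one round-trip trade.

    side: "LONG" or "SHORT" (hypothetical labels, adapt to your export).
    """
    direction = {"LONG": 1, "SHORT": -1}[side]
    return direction * (exit_price - entry_price) * quantity - fees

# Long 100 shares from 10.00 to 10.50
long_pnl = trade_pnl("LONG", 100, 10.00, 10.50)
# Short 200 shares from 20.00 down to 19.00, paying 2.00 in fees
short_pnl = trade_pnl("SHORT", 200, 20.00, 19.00, fees=2.0)
```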
Python Integration
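# Conceptual sketch only (requires DAS Trader Pro): the COM object name and
# SendOrder signature are illustrative, so confirm both against the DAS API docs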
import win32com.client
das = win32com.client.Dispatch("DAS.Application")
das.SendOrder("BUY", "AAPL", 100, "MARKET")
When to Use
- Manual execution with post-trade analysis
- Combine manual execution with quant analysis
- Strategy testing before automating
QuantConnect
Features
- Complete cloud platform
- Data included (equities, options, futures, forex, crypto)
- Professional backtesting engine
- Direct deploy to live trading
Algorithm Example
from AlgorithmImports import *

class MyAlgorithm(QCAlgorithm):
    def Initialize(self):
        self.SetStartDate(2023, 1, 1)
        self.SetEndDate(2023, 12, 31)
        self.SetCash(100000)
        # Add securities
        self.spy = self.AddEquity("SPY", Resolution.Minute)
        # Indicators
        self.sma = self.SMA("SPY", 20, Resolution.Daily)

    def OnData(self, data):
        # Wait for the SMA to warm up and skip slices without SPY data
        if not self.sma.IsReady or not data.ContainsKey("SPY"):
            return
        if data["SPY"].Price > self.sma.Current.Value:
            self.SetHoldings("SPY", 1.0)
        else:
            self.Liquidate("SPY")
Advantages
- No need to manage infrastructure
- Clean, adjusted data
- Extensive community and examples
When to Use
- Complex multi-asset strategies
- When you don’t want to manage data
- Quick transition from backtest to live
Flash Research
Features
- Focused on market microstructure
- Tape reading analysis
- Institutional footprint identification
- Options flow data
Use Cases
# Conceptual example - Flash Research provides insights, not raw data
insights = {
    'institutional_accumulation': ['AAPL', 'MSFT'],
    'unusual_options_activity': [
        {'ticker': 'NVDA', 'strike': 500, 'volume': 10000}
    ],
    'dark_pool_prints': [
        {'ticker': 'TSLA', 'size': 500000, 'price': 250.50}
    ]
}
# Use these insights to filter universe (screen_stocks is a placeholder for your own screener)
universe = screen_stocks(insights['institutional_accumulation'])
When to Use
- Technical signal confirmation
- Identify institutional accumulation
- Options flow for directionality
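Signal confirmation amounts to keeping only the technical setups whose ticker also shows up in the insights. A minimal sketch, assuming insights arrive in the dictionary shape used in the conceptual example; the function name and the signal fields are hypothetical:

```python
def confirm_signals(technical_signals, insights):
    """Keep only technical signals whose ticker also appears in the insights."""
    confirmed = set(insights.get("institutional_accumulation", []))
    confirmed.update(
        item["ticker"] for item in insights.get("unusual_options_activity", [])
    )
    return [s for s in technical_signals if s["ticker"] in confirmed]

# Made-up inputs for illustration
example_insights = {
    "institutional_accumulation": ["AAPL", "MSFT"],
    "unusual_options_activity": [{"ticker": "NVDA", "strike": 500, "volume": 10000}],
}
example_signals = [
    {"ticker": "AAPL", "setup": "breakout"},
    {"ticker": "AMD", "setup": "breakout"},
    {"ticker": "NVDA", "setup": "pullback"},
]
confirmed_signals = confirm_signals(example_signals, example_insights)
```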
My Current Stack
# config.py
DATA_SOURCES = {
    'historical': 'polygon',      # For precise backtesting
    'realtime': 'ibkr_tws',       # For execution
    'screening': 'yahoo',         # For quick ideas
    'research': 'quantconnect',   # For complex strategies
    'insights': 'flash_research'  # For confirmation
}

# data_manager.py
# PolygonClient, YahooClient and IBKRClient are your own wrappers, each exposing get_data(ticker)
class DataManager:
    def __init__(self):
        self.polygon = PolygonClient()
        self.yahoo = YahooClient()
        self.ibkr = IBKRClient()

    def get_data(self, ticker, source='auto', need_intraday=False):
        if source == 'auto':
            # Choose by resolution: Polygon for intraday, Yahoo for EOD
            source = 'polygon' if need_intraday else 'yahoo'
        # Dispatch to the client attribute with the matching name
        return getattr(self, source).get_data(ticker)
Monthly Costs
| Source | Cost | What You Get |
|---|---|---|
| Yahoo Finance | $0 | EOD data, basic screening |
| Polygon.io | $79 | 10 years intraday, websocket |
| IBKR | $10 + commissions | Real-time, execution |
| DAS Trader | $150 | Pro platform + data |
| QuantConnect | $0-$200 | Backtest + live deployment |
| Flash Research | Variable | Market intelligence |
Total: ~$300-400/month for a professional setup
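For reference, the fixed rows of the table can be summed directly. This sketch leaves out IBKR commissions and Flash Research because the table gives no fixed figure for either, so the real total lands somewhere above the low end:

```python
# (low, high) monthly cost in USD, taken from the table above
monthly_costs = {
    "Yahoo Finance": (0, 0),
    "Polygon.io": (79, 79),
    "IBKR": (10, 10),          # commissions excluded
    "DAS Trader": (150, 150),
    "QuantConnect": (0, 200),
}

low_total = sum(low for low, _ in monthly_costs.values())
high_total = sum(high for _, high in monthly_costs.values())
print(f"Fixed costs: ${low_total}-${high_total}/month")
```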
Tips for Getting Started
- Start free: Yahoo Finance + IBKR paper trading
- First upgrade: Polygon.io Developer ($79)
- When you’re consistent: Add DAS or similar
- To scale: QuantConnect for multiple strategies
Data Quality Checklist
import pandas as pd

def validate_data(df):
    checks = {
        # Timestamps are in order; this alone does not detect missing bars
        'index_sorted': df.index.is_monotonic_increasing,
        'no_nulls': not df.isnull().any().any(),
        'volume_positive': (df['Volume'] >= 0).all(),
        'prices_positive': (df[['Open', 'High', 'Low', 'Close']] > 0).all().all(),
        'high_low_valid': (df['High'] >= df['Low']).all(),
        # High and low must bracket both the open and the close
        'ohlc_valid': bool(
            (df['High'] >= df[['Open', 'Close']].max(axis=1)).all()
            and (df['Low'] <= df[['Open', 'Close']].min(axis=1)).all()
        )
    }
    return pd.Series(checks)
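A monotonic index only shows that timestamps are in order; it says nothing about missing sessions. A minimal sketch of a gap check for daily bars that treats every weekday as an expected session. That is a simplifying assumption: market holidays will be flagged too, so a proper exchange calendar is needed for anything beyond a quick sanity check. The function name is hypothetical:

```python
from datetime import date, timedelta

def missing_weekdays(dates):
    """Return weekdays between the first and last date that have no bar."""
    present = set(dates)
    day, last = min(present), max(present)
    missing = []
    while day <= last:
        # weekday() is 0-4 for Monday through Friday
        if day.weekday() < 5 and day not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing

# Thursday, Friday, Monday, Wednesday: Tuesday 2023-06-06 is missing
bar_dates = [date(2023, 6, 1), date(2023, 6, 2), date(2023, 6, 5), date(2023, 6, 7)]
gaps = missing_weekdays(bar_dates)
```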
Next Step
Continue with Data Types to understand the differences between EOD, intraday, and tick data.