Sentiment Analysis in Financial Markets
Introduction
Sentiment analysis quantifies the emotions and opinions expressed in text, providing an additional dimension for market analysis. In trading, shifts in sentiment can sometimes precede price movements, surfacing before they show up in traditional technical data.
Core Concepts
Why Does Sentiment Analysis Work?
Psychological Impact on Markets:
- News directly influences investment decisions
- Retail sentiment can create momentum in small caps
- Social media amplifies the impact of sentiment
- Institutional algorithms now incorporate sentiment data
Sentiment Data Sources:
- Financial news (Bloomberg, Reuters, FinViz)
- Social media (Twitter, Reddit, StockTwits)
- Analyst reports
- Earnings call transcripts
- Specialized investment forums
Implementation with VADER Sentiment
VADER (Valence Aware Dictionary and sEntiment Reasoner) is specifically designed to analyze sentiment in social media texts and news.
Base Analysis Framework
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import nltk
import string
from datetime import datetime, timedelta
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from fake_useragent import UserAgent
import matplotlib.pyplot as plt
import seaborn as sns
from textblob import TextBlob
import yfinance as yf
from warnings import filterwarnings
filterwarnings("ignore")
# Download required resources
nltk.download('punkt', quiet=True)
nltk.download('wordnet', quiet=True)
nltk.download('stopwords', quiet=True)
class SentimentAnalyzer:
"""
Sentiment analyzer for financial markets
"""
def __init__(self):
self.vader = SentimentIntensityAnalyzer()
self.stop_words = set(nltk.corpus.stopwords.words('english'))
self.lemmatizer = nltk.stem.WordNetLemmatizer()
self.stemmer = nltk.stem.PorterStemmer()
# Financial market-specific words
self.financial_keywords = {
'bullish': ['bull', 'bullish', 'rally', 'moon', 'rocket', 'pump', 'surge', 'soar'],
'bearish': ['bear', 'bearish', 'crash', 'dump', 'plunge', 'tank', 'drop', 'fall'],
'neutral': ['hold', 'sideways', 'flat', 'consolidate', 'range']
}
def preprocess_text(self, text, advanced=True):
"""
Preprocess text for sentiment analysis
Parameters
----------
text : str
Original text
advanced : bool
Whether to apply advanced preprocessing
Returns
-------
str
Processed text
"""
if not advanced:
return text.lower().strip()
# 1. Tokenization
tokens = nltk.tokenize.word_tokenize(text.lower())
# 2. Normalization (remove punctuation)
tokens = [token for token in tokens if token not in string.punctuation]
# 3. Remove stop words (before stemming, so inflected forms still match the list)
tokens = [token for token in tokens if token not in self.stop_words]
# 4. Lemmatization (convert to base form)
lemmatized_tokens = [self.lemmatizer.lemmatize(token) for token in tokens]
# 5. Stemming (reduce to root; note that VADER itself scores raw text best)
stemmed_tokens = [self.stemmer.stem(token) for token in lemmatized_tokens]
# 6. Reassemble processed text
processed_text = " ".join(stemmed_tokens)
return processed_text
def analyze_sentiment(self, text, method='vader'):
"""
Analyze sentiment of a text
Parameters
----------
text : str
Text to analyze
method : str
Analysis method ('vader', 'textblob', 'both')
Returns
-------
dict
Sentiment scores
"""
results = {}
if method in ['vader', 'both']:
# VADER Analysis
vader_scores = self.vader.polarity_scores(text)
results['vader'] = {
'compound': vader_scores['compound'],
'positive': vader_scores['pos'],
'negative': vader_scores['neg'],
'neutral': vader_scores['neu'],
'classification': 'positive' if vader_scores['compound'] >= 0.05 else 'negative' if vader_scores['compound'] <= -0.05 else 'neutral'
}
if method in ['textblob', 'both']:
# TextBlob Analysis
blob = TextBlob(text)
results['textblob'] = {
'polarity': blob.sentiment.polarity,
'subjectivity': blob.sentiment.subjectivity,
'classification': 'positive' if blob.sentiment.polarity > 0.1 else 'negative' if blob.sentiment.polarity < -0.1 else 'neutral'
}
# Financial keyword analysis
results['financial_sentiment'] = self.analyze_financial_keywords(text)
return results
def analyze_financial_keywords(self, text):
"""
Analyze financial market-specific keywords
"""
text_lower = text.lower()
bullish_count = sum(1 for word in self.financial_keywords['bullish'] if word in text_lower)
bearish_count = sum(1 for word in self.financial_keywords['bearish'] if word in text_lower)
neutral_count = sum(1 for word in self.financial_keywords['neutral'] if word in text_lower)
total_keywords = bullish_count + bearish_count + neutral_count
if total_keywords == 0:
return {'score': 0, 'classification': 'neutral', 'keywords_found': 0}
# Score based on proportion of bullish vs bearish words
score = (bullish_count - bearish_count) / total_keywords
if score > 0.2:
classification = 'bullish'
elif score < -0.2:
classification = 'bearish'
else:
classification = 'neutral'
return {
'score': score,
'classification': classification,
'keywords_found': total_keywords,
'bullish_words': bullish_count,
'bearish_words': bearish_count
}
def batch_analyze(self, texts, preprocess=True):
"""
Analyze sentiment of multiple texts
"""
results = []
for text in texts:
if preprocess:
processed_text = self.preprocess_text(text, advanced=True)
raw_sentiment = self.analyze_sentiment(text, method='both')
processed_sentiment = self.analyze_sentiment(processed_text, method='both')
results.append({
'original_text': text,
'processed_text': processed_text,
'raw_sentiment': raw_sentiment,
'processed_sentiment': processed_sentiment
})
else:
sentiment = self.analyze_sentiment(text, method='both')
results.append({
'text': text,
'sentiment': sentiment
})
return results
class NewsScraperFinViz:
"""
FinViz news scraper for sentiment analysis
"""
def __init__(self):
self.base_url = "https://finviz.com/quote.ashx?t={}&p=d"
self.sentiment_analyzer = SentimentAnalyzer()
def scrape_news(self, tickers, max_retries=3):
"""
Extract news from FinViz for multiple tickers
Parameters
----------
tickers : list
List of stock symbols
max_retries : int
Maximum number of attempts per ticker
Returns
-------
pd.DataFrame
DataFrame with news and sentiment
"""
news_data = []
for ticker in tickers:
print(f"Scraping news for {ticker}...")
for attempt in range(max_retries):
try:
# Random user agent to avoid blocks
ua = UserAgent()
headers = {"User-Agent": str(ua.chrome)}
# Make request
response = requests.get(
self.base_url.format(ticker),
headers=headers,
timeout=10
)
response.raise_for_status()
# Parse HTML
soup = BeautifulSoup(response.content, "html.parser")
news_table = soup.find(id="news-table")
if news_table is None:
print(f"No news table found for {ticker}")
break
# Extract individual news items
news_rows = news_table.find_all("tr")
for row in news_rows:
try:
# Extract headline
news_link = row.find("a", class_="tab-link-news")
if news_link is None:
continue
headline = news_link.text.strip()
# Extract date and time
time_data = row.find("td").text.replace("\n", "").strip().split()
if len(time_data) == 2:
date_str = time_data[0]
time_str = time_data[1]
# Handle "Today"
if date_str.lower() == "today":
date_str = datetime.now().strftime("%b-%d-%y")
elif len(time_data) == 1:
# Time only, use current date
time_str = time_data[0]
date_str = datetime.now().strftime("%b-%d-%y")
else:
continue
# Convert date
try:
news_date = datetime.strptime(date_str, "%b-%d-%y")
except ValueError:
news_date = datetime.now()
# Analyze sentiment
sentiment_result = self.sentiment_analyzer.analyze_sentiment(headline, method='both')
news_data.append({
'ticker': ticker,
'date': news_date,
'time': time_str,
'headline': headline,
'vader_compound': sentiment_result['vader']['compound'],
'vader_classification': sentiment_result['vader']['classification'],
'textblob_polarity': sentiment_result['textblob']['polarity'],
'financial_sentiment': sentiment_result['financial_sentiment']['score'],
'financial_classification': sentiment_result['financial_sentiment']['classification'],
'keywords_found': sentiment_result['financial_sentiment']['keywords_found']
})
except Exception as e:
print(f"Error processing news row for {ticker}: {e}")
continue
break # Success, exit retry loop
except Exception as e:
print(f"Attempt {attempt + 1} failed for {ticker}: {e}")
if attempt == max_retries - 1:
print(f"Failed to scrape {ticker} after {max_retries} attempts")
# Convert to DataFrame
if news_data:
df = pd.DataFrame(news_data)
df['date'] = pd.to_datetime(df['date'])
return df
else:
return pd.DataFrame()
def sentiment_trading_strategy(price_data, sentiment_data,
sentiment_threshold=0.1,
lookback_days=3):
"""
Trading strategy based on sentiment analysis
Parameters
----------
price_data : pd.DataFrame
Historical price data
sentiment_data : pd.DataFrame
Sentiment data with dates
sentiment_threshold : float
Threshold for generating signals
lookback_days : int
Days to look back for aggregating sentiment
"""
# Aggregate sentiment by day
daily_sentiment = sentiment_data.groupby('date').agg({
'vader_compound': 'mean',
'financial_sentiment': 'mean',
'keywords_found': 'sum'
}).reset_index()
# Create trading signals
signals = pd.DataFrame(index=price_data.index)
signals['price'] = price_data['Close']
signals['signal'] = 0
signals['sentiment_score'] = np.nan
signals['confidence'] = 0
for i, date in enumerate(price_data.index):
# Look for sentiment in the last N days
start_date = date - timedelta(days=lookback_days)
end_date = date
period_sentiment = daily_sentiment[
(daily_sentiment['date'] >= start_date) &
(daily_sentiment['date'] <= end_date)
]
if len(period_sentiment) > 0:
# Calculate weighted average score (more weight to recent days)
weights = np.linspace(0.5, 1.0, len(period_sentiment))
avg_vader = np.average(period_sentiment['vader_compound'], weights=weights)
avg_financial = np.average(period_sentiment['financial_sentiment'], weights=weights)
total_keywords = period_sentiment['keywords_found'].sum()
# Combined score
combined_score = (avg_vader * 0.6 + avg_financial * 0.4)
# Adjust by news volume (more news = more confidence)
confidence = min(total_keywords / 10.0, 1.0) # Normalize to 0-1
signals.loc[date, 'sentiment_score'] = combined_score
signals.loc[date, 'confidence'] = confidence
# Generate signals only with minimum confidence
if confidence > 0.3:
if combined_score > sentiment_threshold:
signals.loc[date, 'signal'] = 1 # Buy
elif combined_score < -sentiment_threshold:
signals.loc[date, 'signal'] = -1 # Sell
return signals
def analyze_sentiment_correlation(price_data, sentiment_data, ticker):
"""
Analyze correlation between sentiment and price movements
"""
# Prepare daily data
daily_sentiment = sentiment_data.groupby('date').agg({
'vader_compound': 'mean',
'financial_sentiment': 'mean',
'keywords_found': 'count'
}).reset_index()
# Add price returns
price_returns = price_data['Close'].pct_change()
daily_data = pd.DataFrame({
'date': price_data.index,
'return': price_returns.values,
'price': price_data['Close'].values
})
# Combine data
combined_data = daily_data.merge(daily_sentiment, on='date', how='inner')
if len(combined_data) == 0:
return {'error': 'No matching dates between price and sentiment data'}
# Calculate correlations
correlations = {
'vader_sentiment_correlation': combined_data['vader_compound'].corr(combined_data['return']),
'financial_sentiment_correlation': combined_data['financial_sentiment'].corr(combined_data['return']),
'news_volume_correlation': combined_data['keywords_found'].corr(abs(combined_data['return'])),
}
# Lead/lag analysis
lead_lag_analysis = {}
for lag in range(-3, 4): # -3 to +3 days
if lag == 0:
continue
if lag > 0:
# Sentiment predicts future returns
shifted_returns = combined_data['return'].shift(-lag)
lead_lag_analysis[f'sentiment_leads_{lag}d'] = combined_data['vader_compound'].corr(shifted_returns)
else:
# Returns predict future sentiment
shifted_sentiment = combined_data['vader_compound'].shift(lag)
lead_lag_analysis[f'price_leads_{abs(lag)}d'] = combined_data['return'].corr(shifted_sentiment)
return {
'correlations': correlations,
'lead_lag_analysis': lead_lag_analysis,
'data_points': len(combined_data),
'date_range': f"{combined_data['date'].min()} to {combined_data['date'].max()}"
}
def create_sentiment_dashboard(tickers, sentiment_data):
"""
Create visual sentiment analysis dashboard
"""
# Configure subplot
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
# 1. Sentiment Score by Ticker
daily_sentiment = sentiment_data.groupby(['ticker', 'date']).agg({
'vader_compound': 'mean',
'financial_sentiment': 'mean'
}).reset_index()
for ticker in tickers:
ticker_data = daily_sentiment[daily_sentiment['ticker'] == ticker]
axes[0, 0].plot(ticker_data['date'], ticker_data['vader_compound'], label=ticker, marker='o')
axes[0, 0].set_title('VADER Sentiment Score Over Time')
axes[0, 0].set_xlabel('Date')
axes[0, 0].set_ylabel('Sentiment Score')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].axhline(y=0, color='black', linestyle='--', alpha=0.5)
# 2. Sentiment Distribution
axes[0, 1].hist(sentiment_data['vader_compound'], bins=30, alpha=0.7, edgecolor='black')
axes[0, 1].set_title('Distribution of Sentiment Scores')
axes[0, 1].set_xlabel('VADER Compound Score')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].axvline(x=0, color='red', linestyle='--', alpha=0.7)
axes[0, 1].grid(True, alpha=0.3)
# 3. Sentiment by Ticker (Box plot)
sentiment_by_ticker = [sentiment_data[sentiment_data['ticker'] == ticker]['vader_compound']
for ticker in tickers]
axes[1, 0].boxplot(sentiment_by_ticker, labels=tickers)
axes[1, 0].set_title('Sentiment Distribution by Ticker')
axes[1, 0].set_ylabel('VADER Score')
axes[1, 0].grid(True, alpha=0.3)
axes[1, 0].axhline(y=0, color='red', linestyle='--', alpha=0.7)
# 4. Keywords found per day
keywords_by_date = sentiment_data.groupby('date')['keywords_found'].sum()
axes[1, 1].plot(keywords_by_date.index, keywords_by_date.values, color='purple', linewidth=2)
axes[1, 1].set_title('Financial Keywords Found Over Time')
axes[1, 1].set_xlabel('Date')
axes[1, 1].set_ylabel('Keywords Count')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
return fig
# Complete usage example
def sentiment_analysis_example():
"""
Complete sentiment analysis example for trading
"""
# Tickers to analyze
tickers = ["AAPL", "TSLA", "NVDA", "AMZN"]
print("=== FINANCIAL SENTIMENT ANALYSIS ===\\n")
# 1. News scraping
print("Extracting news...")
scraper = NewsScraperFinViz()
news_data = scraper.scrape_news(tickers)
if news_data.empty:
print("Could not extract news")
return
print(f"Extracted {len(news_data)} news items")
# 2. Statistical analysis
print(f"\\nGENERAL STATISTICS:")
for ticker in tickers:
ticker_news = news_data[news_data['ticker'] == ticker]
if len(ticker_news) > 0:
avg_sentiment = ticker_news['vader_compound'].mean()
total_news = len(ticker_news)
positive_news = (ticker_news['vader_compound'] > 0.05).sum()
negative_news = (ticker_news['vader_compound'] < -0.05).sum()
print(f" {ticker}:")
print(f" Total News: {total_news}")
print(f" Average Sentiment: {avg_sentiment:.3f}")
print(f" Positive News: {positive_news} ({positive_news/total_news:.1%})")
print(f" Negative News: {negative_news} ({negative_news/total_news:.1%})")
# 3. Correlation analysis with prices
print(f"\\nCORRELATION ANALYSIS:")
for ticker in tickers:
try:
# Get price data
end_date = datetime.now()
start_date = end_date - timedelta(days=30)
price_data = yf.download(ticker, start=start_date, end=end_date, interval="1d")
ticker_sentiment = news_data[news_data['ticker'] == ticker]
if len(ticker_sentiment) > 0 and len(price_data) > 0:
correlation_analysis = analyze_sentiment_correlation(price_data, ticker_sentiment, ticker)
if 'error' not in correlation_analysis:
print(f" {ticker}:")
print(f" Sentiment-Return Correlation: {correlation_analysis['correlations']['vader_sentiment_correlation']:.3f}")
print(f" Data Points: {correlation_analysis['data_points']}")
except Exception as e:
print(f" {ticker}: Error in analysis - {e}")
# 4. Generate example strategy
print(f"\\nSTRATEGY EXAMPLE:")
ticker = "AAPL" # Use Apple as example
try:
price_data = yf.download(ticker, start=start_date, end=end_date, interval="1d")
ticker_sentiment = news_data[news_data['ticker'] == ticker]
if len(ticker_sentiment) > 0:
strategy_signals = sentiment_trading_strategy(price_data, ticker_sentiment)
total_signals = strategy_signals['signal'].abs().sum()
buy_signals = (strategy_signals['signal'] == 1).sum()
sell_signals = (strategy_signals['signal'] == -1).sum()
avg_confidence = strategy_signals[strategy_signals['confidence'] > 0]['confidence'].mean()
print(f" Ticker: {ticker}")
print(f" Total Signals: {total_signals}")
print(f" Buy Signals: {buy_signals}")
print(f" Sell Signals: {sell_signals}")
print(f" Average Confidence: {avg_confidence:.1%}")
except Exception as e:
print(f" Error generating strategy: {e}")
# 5. Create visualization
print(f"\\nGenerating dashboard...")
try:
create_sentiment_dashboard(tickers, news_data)
except Exception as e:
print(f"Error creating dashboard: {e}")
return news_data
# Sentiment analysis for small caps
def small_cap_sentiment_strategy(ticker, sentiment_threshold=0.15):
"""
Small cap-specific sentiment strategy
"""
# Small caps are more sensitive to sentiment
scraper = NewsScraperFinViz()
sentiment_data = scraper.scrape_news([ticker])
if sentiment_data.empty:
return {'error': 'No sentiment data available'}
# Get price data
end_date = datetime.now()
start_date = end_date - timedelta(days=30)
price_data = yf.download(ticker, start=start_date, end=end_date)
# Parameters adjusted for small caps
signals = sentiment_trading_strategy(
price_data,
sentiment_data,
sentiment_threshold=sentiment_threshold, # Higher threshold
lookback_days=1 # Faster reaction
)
# Add small cap-specific filters
signals['volume_filter'] = price_data['Volume'] > price_data['Volume'].rolling(20).mean()
signals['volatility_filter'] = price_data['Close'].pct_change().rolling(5).std() > 0.02
# Only generate signals when there is volume and volatility
signals['final_signal'] = np.where(
signals['volume_filter'] & signals['volatility_filter'],
signals['signal'],
0
)
return {
'signals': signals,
'sentiment_data': sentiment_data,
'price_data': price_data
}
if __name__ == "__main__":
sentiment_analysis_example()
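The keyword scorer in `analyze_financial_keywords` can also be exercised on its own. A minimal standalone sketch with made-up headlines (note that substring matching means "bull" also fires on "bullish"):

```python
FINANCIAL_KEYWORDS = {
    "bullish": ["bull", "bullish", "rally", "moon", "rocket", "pump", "surge", "soar"],
    "bearish": ["bear", "bearish", "crash", "dump", "plunge", "tank", "drop", "fall"],
}

def keyword_score(text):
    """(bullish hits - bearish hits) / total hits, or 0.0 when nothing matches."""
    text = text.lower()
    bull = sum(word in text for word in FINANCIAL_KEYWORDS["bullish"])
    bear = sum(word in text for word in FINANCIAL_KEYWORDS["bearish"])
    total = bull + bear
    return (bull - bear) / total if total else 0.0

print(keyword_score("Shares surge as bulls rally"))                # 1.0
print(keyword_score("Stock tanks after earnings, shares plunge"))  # -1.0
print(keyword_score("Company announces new CFO"))                  # 0.0
```

A score of 0.0 can mean either balanced sentiment or no keywords at all, which is why the class also returns `keywords_found` alongside the score.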
Integration with Trading Strategies
1. Sentiment + Gap & Go
def sentiment_gap_strategy(ticker, gap_threshold=0.03):
"""
Combine sentiment analysis with Gap & Go strategy
"""
# Get data
scraper = NewsScraperFinViz()
sentiment_data = scraper.scrape_news([ticker])
end_date = datetime.now()
start_date = end_date - timedelta(days=30)
price_data = yf.download(ticker, start=start_date, end=end_date)
signals = pd.DataFrame(index=price_data.index)
signals['price'] = price_data['Close']
signals['gap_pct'] = (price_data['Open'] / price_data['Close'].shift(1)) - 1
signals['volume_ratio'] = price_data['Volume'] / price_data['Volume'].rolling(20).mean()
signals['signal'] = 0
# Get previous day sentiment
for i, date in enumerate(price_data.index[1:], 1):
prev_date = price_data.index[i-1]
# Look for previous day sentiment
day_sentiment = sentiment_data[
sentiment_data['date'].dt.date == prev_date.date()
]
if len(day_sentiment) > 0:
avg_sentiment = day_sentiment['vader_compound'].mean()
# Gap up with positive sentiment
if (signals.loc[date, 'gap_pct'] > gap_threshold and
avg_sentiment > 0.1 and
signals.loc[date, 'volume_ratio'] > 2):
signals.loc[date, 'signal'] = 1
# Gap down with very negative sentiment (potential reversal)
elif (signals.loc[date, 'gap_pct'] < -gap_threshold and
avg_sentiment < -0.2 and
signals.loc[date, 'volume_ratio'] > 2):
signals.loc[date, 'signal'] = 1 # Contrarian play
return signals
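The gap condition above boils down to a single pandas expression. A toy sketch with hypothetical prices, where the second session opens 3.5% above the prior close:

```python
import pandas as pd

prices = pd.DataFrame(
    {"Open": [100.0, 103.5], "Close": [100.0, 104.0]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)

# Gap = today's open relative to yesterday's close
gap_pct = prices["Open"] / prices["Close"].shift(1) - 1
print(gap_pct.iloc[1])  # ~3.5% gap up
```

With `gap_threshold=0.03`, this bar would qualify as a gap-up candidate, pending the sentiment and volume filters.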
2. Sentiment + VWAP
def sentiment_vwap_strategy(ticker):
"""
Combine sentiment with VWAP strategy
"""
# Get intraday data if possible
price_data = yf.download(ticker, period="5d", interval="1h")
# Calculate VWAP, restarting at each session (a cumsum across days would drift)
pv = (price_data['Close'] * price_data['Volume']).groupby(price_data.index.date).cumsum()
price_data['vwap'] = pv / price_data['Volume'].groupby(price_data.index.date).cumsum()
# Get sentiment
scraper = NewsScraperFinViz()
sentiment_data = scraper.scrape_news([ticker])
# Generate signals
signals = pd.DataFrame(index=price_data.index)
signals['price'] = price_data['Close']
signals['vwap'] = price_data['vwap']
signals['signal'] = 0
# Current day sentiment
current_date = datetime.now().date()
today_sentiment = sentiment_data[
sentiment_data['date'].dt.date == current_date
]
if len(today_sentiment) > 0:
avg_sentiment = today_sentiment['vader_compound'].mean()
for i, date in enumerate(price_data.index):
# Long: price near VWAP + positive sentiment
if (signals.loc[date, 'price'] > signals.loc[date, 'vwap'] * 0.999 and
signals.loc[date, 'price'] < signals.loc[date, 'vwap'] * 1.001 and
avg_sentiment > 0.05):
signals.loc[date, 'signal'] = 1
# Short: price rejected at VWAP + negative sentiment
elif (signals.loc[date, 'price'] < signals.loc[date, 'vwap'] and
avg_sentiment < -0.05):
signals.loc[date, 'signal'] = -1
return signals
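VWAP is conventionally anchored to each session's open rather than accumulated across days. A toy sketch of per-session cumulative sums on hypothetical hourly bars:

```python
import pandas as pd

idx = pd.to_datetime(["2024-01-02 10:00", "2024-01-02 11:00",
                      "2024-01-03 10:00", "2024-01-03 11:00"])
bars = pd.DataFrame({"Close": [10.0, 11.0, 20.0, 22.0],
                     "Volume": [100, 300, 200, 200]}, index=idx)

# Group by calendar date so the cumulative sums restart every session
pv = (bars["Close"] * bars["Volume"]).groupby(bars.index.date).cumsum()
vv = bars["Volume"].groupby(bars.index.date).cumsum()
bars["vwap"] = pv / vv
print(bars["vwap"].tolist())  # [10.0, 10.75, 20.0, 21.0]
```

Note how the second session's VWAP starts fresh at 20.0 instead of being dragged down by the first session's prints.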
Best Practices
1. Sentiment Data Validation
def validate_sentiment_data(sentiment_df):
"""
Validate sentiment data quality
"""
validation_results = {
'total_articles': len(sentiment_df),
'date_range': (sentiment_df['date'].min(), sentiment_df['date'].max()),
'sentiment_distribution': sentiment_df['vader_compound'].describe(),
'missing_data': sentiment_df.isnull().sum(),
'duplicate_headlines': sentiment_df['headline'].duplicated().sum()
}
# Detect potential issues
warnings = []
if validation_results['total_articles'] < 10:
warnings.append("Too few news items for reliable analysis")
if abs(sentiment_df['vader_compound'].mean()) > 0.5:
warnings.append("Extremely biased sentiment")
if validation_results['duplicate_headlines'] > len(sentiment_df) * 0.1:
warnings.append("Many duplicate news items")
validation_results['warnings'] = warnings
return validation_results
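Syndicated stories often repeat the same headline across outlets, which is what the duplicate check above catches. A toy illustration with made-up rows:

```python
import pandas as pd

headlines = pd.Series([
    "Acme beats Q3 earnings estimates",
    "Acme beats Q3 earnings estimates",   # syndicated duplicate
    "Acme announces $1B buyback",
])
duplicates = headlines.duplicated().sum()
print(duplicates)                          # 1
print(duplicates > len(headlines) * 0.1)   # True -> "Many duplicate news items"
```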
2. Temporal Normalization
def normalize_sentiment_by_time(sentiment_df, method='zscore'):
"""
Normalize sentiment by time period
"""
sentiment_df = sentiment_df.copy()
if method == 'zscore':
# Z-score normalization
sentiment_df['normalized_sentiment'] = (
sentiment_df['vader_compound'] - sentiment_df['vader_compound'].mean()
) / sentiment_df['vader_compound'].std()
elif method == 'rolling_zscore':
# Rolling z-score (30-day window)
rolling_mean = sentiment_df['vader_compound'].rolling(30).mean()
rolling_std = sentiment_df['vader_compound'].rolling(30).std()
sentiment_df['normalized_sentiment'] = (
sentiment_df['vader_compound'] - rolling_mean
) / rolling_std
elif method == 'percentile':
# Percentile ranking
sentiment_df['normalized_sentiment'] = sentiment_df['vader_compound'].rank(pct=True)
return sentiment_df
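The normalization modes behave quite differently on the same series. A quick sketch on made-up scores comparing the full-sample z-score against the percentile rank:

```python
import pandas as pd

scores = pd.Series([0.2, -0.1, 0.4, 0.1, -0.3])

# Full-sample z-score: centered at 0, scaled by the sample std
zscore = (scores - scores.mean()) / scores.std()

# Percentile rank: maps each value into (0, 1] by its position in the sample
percentile = scores.rank(pct=True)
print(percentile.tolist())  # [0.8, 0.4, 1.0, 0.6, 0.2]
```

The rolling variant needs at least 30 observations before it produces any values, so it is better suited to longer sentiment histories.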
3. Quality Filters
def apply_quality_filters(sentiment_df, min_keywords=1, confidence_threshold=0.5):
"""
Apply quality filters to sentiment data
"""
filtered_df = sentiment_df.copy()
# Filter by financial keywords found
filtered_df = filtered_df[filtered_df['keywords_found'] >= min_keywords]
# Filter very short headlines (probably not informative)
filtered_df = filtered_df[filtered_df['headline'].str.len() > 20]
# Remove exact duplicates
filtered_df = filtered_df.drop_duplicates(subset=['headline'])
# Filter by classification confidence
abs_sentiment = abs(filtered_df['vader_compound'])
filtered_df = filtered_df[abs_sentiment > confidence_threshold * abs_sentiment.std()]
return filtered_df
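A toy run of the same filter pipeline on made-up rows, one keyword-free headline and one syndicated duplicate:

```python
import pandas as pd

news = pd.DataFrame({
    "headline": [
        "Acme shares surge after earnings beat expectations",
        "Acme shares surge after earnings beat expectations",  # duplicate
        "Acme up",                                             # too short
    ],
    "keywords_found": [1, 1, 0],
})

filtered = news[news["keywords_found"] >= 1]
filtered = filtered[filtered["headline"].str.len() > 20]
filtered = filtered.drop_duplicates(subset=["headline"])
print(len(filtered))  # 1
```

Order matters only for efficiency here; each filter is independent, so the surviving row is the same regardless of how they are chained.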
Limitations and Considerations
1. Limitations of Sentiment Analysis
- Sarcasm and context: Models may not detect sarcasm
- Financial jargon: Sector-specific words may be misinterpreted
- News volume: Small caps may have few news articles
- Timing: The impact of sentiment can be immediate or delayed
2. Implementation Best Practices
SENTIMENT_BEST_PRACTICES = {
'data_quality': {
'min_articles_per_day': 3,
'max_sentiment_abs': 0.8, # Avoid suspiciously extreme sentiments
'min_headline_length': 20,
'duplicate_threshold': 0.1
},
'trading_integration': {
'sentiment_weight': 0.3, # No more than 30% weight in decisions
'confirmation_required': True, # Confirm with technical indicators
'volume_filter': True, # Only trade with confirming volume
'time_decay': 24 # Hours before sentiment loses relevance
},
'risk_management': {
'max_position_sentiment': 0.05, # Maximum 5% of capital in sentiment trades
'stop_loss_tight': True, # Tighter stops for sentiment trades
'sentiment_correlation_limit': 0.7 # Avoid too much correlation with sentiment
}
}
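The `time_decay: 24` entry suggests down-weighting older headlines. One way to sketch that is exponential decay with an assumed 12-hour half-life, dropping items past the 24-hour window; both numbers are illustrative, not prescriptive:

```python
import numpy as np

ages_hours = np.array([1.0, 6.0, 18.0, 30.0])   # hypothetical headline ages
scores = np.array([0.4, 0.2, -0.1, 0.5])        # per-headline compound scores

half_life = 12.0
weights = np.where(ages_hours <= 24.0, 0.5 ** (ages_hours / half_life), 0.0)

# The 30-hour-old item gets zero weight; recent items dominate the average
decayed_sentiment = np.average(scores, weights=weights)
print(round(decayed_sentiment, 3))
```

Note that the stale 0.5 score is excluded entirely, so a single old but extreme headline cannot swing the aggregate.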
Alternative Data Sources
1. Reddit/Twitter Integration
def reddit_sentiment_analysis(ticker, subreddit='wallstreetbets'):
"""
Placeholder for Reddit sentiment analysis
(Requires Reddit API)
"""
# Implementation requires praw library and API keys
pass
def twitter_sentiment_analysis(ticker):
"""
Placeholder for Twitter sentiment analysis
(Requires Twitter API)
"""
# Implementation requires tweepy library and API keys
pass
2. StockTwits Integration
def stocktwits_sentiment(ticker):
"""
Placeholder for StockTwits sentiment
(Requires StockTwits API)
"""
pass
Next Step
With sentiment analysis implemented, let's continue with Fundamental Analysis to complete the arsenal of quantitative tools.