Sentiment Analysis in Financial Markets
Introduction
Sentiment analysis quantifies the emotions and opinions expressed in text, adding an extra dimension to market analysis. In trading, sentiment can anticipate price moves before they show up in traditional technical data.
Core Concepts
Why Does Sentiment Analysis Work?
Psychological Impact on Markets:
- News directly influences investment decisions
- Retail sentiment can create momentum in small caps
- Social media amplifies the impact of sentiment
- Institutional algorithms now incorporate sentiment data
Sources of Sentiment Data:
- Financial news (Bloomberg, Reuters, FinViz)
- Social media (Twitter, Reddit, StockTwits)
- Analyst reports
- Earnings call transcripts
- Specialized investment forums
Implementation with VADER Sentiment
VADER (Valence Aware Dictionary and sEntiment Reasoner) is specifically designed to analyze sentiment in social media text and news headlines.
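Before the full framework below, here is a minimal, dependency-free sketch of the core idea. The tiny lexicon and helper names are illustrative assumptions (not part of VADER itself); only the classification thresholds mirror the conventional VADER rule used later in this chapter: compound >= 0.05 is positive, <= -0.05 is negative, anything in between is neutral.

```python
# Illustrative toy lexicon scorer; TOY_LEXICON and its scores are made up.
TOY_LEXICON = {"surge": 0.5, "rally": 0.4, "crash": -0.6, "plunge": -0.5}

def toy_compound(text):
    # Average the valence of every lexicon word found in the text
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def classify(compound):
    # Same thresholds VADER users conventionally apply to the compound score
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(classify(toy_compound("Shares surge after earnings rally")))  # positive
```

The real VADER library adds punctuation, capitalization, negation, and intensifier heuristics on top of its lexicon, which is why it works well on short, informal headlines.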
Base Analysis Framework
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import nltk
import string
from datetime import datetime, timedelta
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from fake_useragent import UserAgent
import matplotlib.pyplot as plt
import seaborn as sns
from textblob import TextBlob
import yfinance as yf
from warnings import filterwarnings
filterwarnings("ignore")
# Download required NLTK resources
nltk.download('punkt', quiet=True)
nltk.download('wordnet', quiet=True)
nltk.download('stopwords', quiet=True)
class SentimentAnalyzer:
    """
    Sentiment analyzer for financial markets.
    """

    def __init__(self):
        self.vader = SentimentIntensityAnalyzer()
        self.stop_words = set(nltk.corpus.stopwords.words('english'))
        self.lemmatizer = nltk.stem.WordNetLemmatizer()
        self.stemmer = nltk.stem.PorterStemmer()
        # Market-specific vocabulary
        self.financial_keywords = {
            'bullish': ['bull', 'bullish', 'rally', 'moon', 'rocket', 'pump', 'surge', 'soar'],
            'bearish': ['bear', 'bearish', 'crash', 'dump', 'plunge', 'tank', 'drop', 'fall'],
            'neutral': ['hold', 'sideways', 'flat', 'consolidate', 'range']
        }

    def preprocess_text(self, text, advanced=True):
        """
        Preprocess text for sentiment analysis.

        Parameters
        ----------
        text : str
            Original text
        advanced : bool
            Whether to apply the full preprocessing pipeline

        Returns
        -------
        str
            Processed text
        """
        if not advanced:
            return text.lower().strip()
        # 1. Tokenization
        tokens = nltk.tokenize.word_tokenize(text.lower())
        # 2. Lemmatization (reduce words to their base form)
        lemmatized_tokens = [self.lemmatizer.lemmatize(token) for token in tokens]
        # 3. Stemming (reduce words to their root)
        stemmed_tokens = [self.stemmer.stem(token) for token in lemmatized_tokens]
        # 4. Remove stop words
        filtered_tokens = [token for token in stemmed_tokens if token not in self.stop_words]
        # 5. Normalization (strip punctuation tokens)
        normalized_tokens = [token for token in filtered_tokens if token not in string.punctuation]
        # 6. Rejoin the processed tokens
        return " ".join(normalized_tokens)

    def analyze_sentiment(self, text, method='vader'):
        """
        Analyze the sentiment of a text.

        Parameters
        ----------
        text : str
            Text to analyze
        method : str
            Analysis method ('vader', 'textblob', 'both')

        Returns
        -------
        dict
            Sentiment scores
        """
        results = {}
        if method in ['vader', 'both']:
            # VADER analysis
            vader_scores = self.vader.polarity_scores(text)
            compound = vader_scores['compound']
            results['vader'] = {
                'compound': compound,
                'positive': vader_scores['pos'],
                'negative': vader_scores['neg'],
                'neutral': vader_scores['neu'],
                'classification': ('positive' if compound >= 0.05
                                   else 'negative' if compound <= -0.05
                                   else 'neutral')
            }
        if method in ['textblob', 'both']:
            # TextBlob analysis
            blob = TextBlob(text)
            results['textblob'] = {
                'polarity': blob.sentiment.polarity,
                'subjectivity': blob.sentiment.subjectivity,
                'classification': ('positive' if blob.sentiment.polarity > 0.1
                                   else 'negative' if blob.sentiment.polarity < -0.1
                                   else 'neutral')
            }
        # Financial keyword analysis
        results['financial_sentiment'] = self.analyze_financial_keywords(text)
        return results

    def analyze_financial_keywords(self, text):
        """
        Count market-specific keywords and derive a directional score.
        """
        text_lower = text.lower()
        bullish_count = sum(1 for word in self.financial_keywords['bullish'] if word in text_lower)
        bearish_count = sum(1 for word in self.financial_keywords['bearish'] if word in text_lower)
        neutral_count = sum(1 for word in self.financial_keywords['neutral'] if word in text_lower)
        total_keywords = bullish_count + bearish_count + neutral_count
        if total_keywords == 0:
            return {'score': 0, 'classification': 'neutral', 'keywords_found': 0}
        # Score based on the balance of bullish vs. bearish words
        score = (bullish_count - bearish_count) / total_keywords
        if score > 0.2:
            classification = 'bullish'
        elif score < -0.2:
            classification = 'bearish'
        else:
            classification = 'neutral'
        return {
            'score': score,
            'classification': classification,
            'keywords_found': total_keywords,
            'bullish_words': bullish_count,
            'bearish_words': bearish_count
        }

    def batch_analyze(self, texts, preprocess=True):
        """
        Analyze the sentiment of multiple texts.
        """
        results = []
        for text in texts:
            if preprocess:
                processed_text = self.preprocess_text(text, advanced=True)
                raw_sentiment = self.analyze_sentiment(text, method='both')
                processed_sentiment = self.analyze_sentiment(processed_text, method='both')
                results.append({
                    'original_text': text,
                    'processed_text': processed_text,
                    'raw_sentiment': raw_sentiment,
                    'processed_sentiment': processed_sentiment
                })
            else:
                sentiment = self.analyze_sentiment(text, method='both')
                results.append({
                    'text': text,
                    'sentiment': sentiment
                })
        return results
class NewsScraperFinViz:
    """
    FinViz news scraper for sentiment analysis.
    """

    def __init__(self):
        self.base_url = "https://finviz.com/quote.ashx?t={}&p=d"
        self.sentiment_analyzer = SentimentAnalyzer()

    def scrape_news(self, tickers, max_retries=3):
        """
        Scrape FinViz news for multiple tickers.

        Parameters
        ----------
        tickers : list
            List of stock symbols
        max_retries : int
            Maximum number of attempts per ticker

        Returns
        -------
        pd.DataFrame
            DataFrame with headlines and sentiment scores
        """
        news_data = []
        for ticker in tickers:
            print(f"Scraping news for {ticker}...")
            for attempt in range(max_retries):
                try:
                    # Random user agent to avoid blocks
                    ua = UserAgent()
                    headers = {"User-Agent": str(ua.chrome)}
                    # Request the quote page
                    response = requests.get(
                        self.base_url.format(ticker),
                        headers=headers,
                        timeout=10
                    )
                    response.raise_for_status()
                    # Parse the HTML
                    soup = BeautifulSoup(response.content, "html.parser")
                    news_table = soup.find(id="news-table")
                    if news_table is None:
                        print(f"No news table found for {ticker}")
                        break
                    # Extract individual news rows
                    news_rows = news_table.find_all("tr")
                    for row in news_rows:
                        try:
                            # Extract the headline
                            news_link = row.find("a", class_="tab-link-news")
                            if news_link is None:
                                continue
                            headline = news_link.text.strip()
                            # Extract date and time
                            time_data = row.find("td").text.replace("\n", "").strip().split()
                            if len(time_data) == 2:
                                date_str, time_str = time_data
                                # Handle "Today"
                                if date_str.lower() == "today":
                                    date_str = datetime.now().strftime("%b-%d-%y")
                            elif len(time_data) == 1:
                                # Time only: assume the current date
                                time_str = time_data[0]
                                date_str = datetime.now().strftime("%b-%d-%y")
                            else:
                                continue
                            # Parse the date
                            try:
                                news_date = datetime.strptime(date_str, "%b-%d-%y")
                            except ValueError:
                                news_date = datetime.now()
                            # Score the headline
                            sentiment_result = self.sentiment_analyzer.analyze_sentiment(headline, method='both')
                            news_data.append({
                                'ticker': ticker,
                                'date': news_date,
                                'time': time_str,
                                'headline': headline,
                                'vader_compound': sentiment_result['vader']['compound'],
                                'vader_classification': sentiment_result['vader']['classification'],
                                'textblob_polarity': sentiment_result['textblob']['polarity'],
                                'financial_sentiment': sentiment_result['financial_sentiment']['score'],
                                'financial_classification': sentiment_result['financial_sentiment']['classification'],
                                'keywords_found': sentiment_result['financial_sentiment']['keywords_found']
                            })
                        except Exception as e:
                            print(f"Error processing news row for {ticker}: {e}")
                            continue
                    break  # Success: leave the retry loop
                except Exception as e:
                    print(f"Attempt {attempt + 1} failed for {ticker}: {e}")
                    if attempt == max_retries - 1:
                        print(f"Failed to scrape {ticker} after {max_retries} attempts")
        # Build the DataFrame
        if news_data:
            df = pd.DataFrame(news_data)
            df['date'] = pd.to_datetime(df['date'])
            return df
        return pd.DataFrame()
def sentiment_trading_strategy(price_data, sentiment_data,
                               sentiment_threshold=0.1,
                               lookback_days=3):
    """
    Trading strategy driven by sentiment analysis.

    Parameters
    ----------
    price_data : pd.DataFrame
        Historical price data
    sentiment_data : pd.DataFrame
        Dated sentiment data
    sentiment_threshold : float
        Threshold for generating signals
    lookback_days : int
        Lookback window for aggregating sentiment
    """
    # Aggregate sentiment per day
    daily_sentiment = sentiment_data.groupby('date').agg({
        'vader_compound': 'mean',
        'financial_sentiment': 'mean',
        'keywords_found': 'sum'
    }).reset_index()
    # Build the signal frame
    signals = pd.DataFrame(index=price_data.index)
    signals['price'] = price_data['Close']
    signals['signal'] = 0
    signals['sentiment_score'] = np.nan
    signals['confidence'] = 0
    for date in price_data.index:
        # Look up sentiment from the last N days
        start_date = date - timedelta(days=lookback_days)
        end_date = date
        period_sentiment = daily_sentiment[
            (daily_sentiment['date'] >= start_date) &
            (daily_sentiment['date'] <= end_date)
        ]
        if len(period_sentiment) > 0:
            # Weighted average score (recent days weigh more)
            weights = np.linspace(0.5, 1.0, len(period_sentiment))
            avg_vader = np.average(period_sentiment['vader_compound'], weights=weights)
            avg_financial = np.average(period_sentiment['financial_sentiment'], weights=weights)
            total_keywords = period_sentiment['keywords_found'].sum()
            # Combined score
            combined_score = avg_vader * 0.6 + avg_financial * 0.4
            # Scale confidence by news volume (more news = more confidence)
            confidence = min(total_keywords / 10.0, 1.0)  # Normalize to [0, 1]
            signals.loc[date, 'sentiment_score'] = combined_score
            signals.loc[date, 'confidence'] = confidence
            # Only trade above a minimum confidence
            if confidence > 0.3:
                if combined_score > sentiment_threshold:
                    signals.loc[date, 'signal'] = 1   # Buy
                elif combined_score < -sentiment_threshold:
                    signals.loc[date, 'signal'] = -1  # Sell
    return signals
def analyze_sentiment_correlation(price_data, sentiment_data, ticker):
    """
    Analyze the correlation between sentiment and price moves.
    """
    # Aggregate sentiment per day
    daily_sentiment = sentiment_data.groupby('date').agg({
        'vader_compound': 'mean',
        'financial_sentiment': 'mean',
        'keywords_found': 'count'
    }).reset_index()
    # Daily price returns
    price_returns = price_data['Close'].pct_change()
    daily_data = pd.DataFrame({
        'date': price_data.index,
        'return': price_returns.values,
        'price': price_data['Close'].values
    })
    # Join price and sentiment on date
    combined_data = daily_data.merge(daily_sentiment, on='date', how='inner')
    if len(combined_data) == 0:
        return {'error': 'No matching dates between price and sentiment data'}
    # Contemporaneous correlations
    correlations = {
        'vader_sentiment_correlation': combined_data['vader_compound'].corr(combined_data['return']),
        'financial_sentiment_correlation': combined_data['financial_sentiment'].corr(combined_data['return']),
        'news_volume_correlation': combined_data['keywords_found'].corr(abs(combined_data['return'])),
    }
    # Lead/lag analysis
    lead_lag_analysis = {}
    for lag in range(-3, 4):  # -3 to +3 days
        if lag == 0:
            continue
        if lag > 0:
            # Does sentiment predict future returns?
            shifted_returns = combined_data['return'].shift(-lag)
            lead_lag_analysis[f'sentiment_leads_{lag}d'] = combined_data['vader_compound'].corr(shifted_returns)
        else:
            # Do returns predict future sentiment?
            shifted_sentiment = combined_data['vader_compound'].shift(lag)
            lead_lag_analysis[f'price_leads_{abs(lag)}d'] = combined_data['return'].corr(shifted_sentiment)
    return {
        'correlations': correlations,
        'lead_lag_analysis': lead_lag_analysis,
        'data_points': len(combined_data),
        'date_range': f"{combined_data['date'].min()} to {combined_data['date'].max()}"
    }
def create_sentiment_dashboard(tickers, sentiment_data):
    """
    Build a visual sentiment-analysis dashboard.
    """
    # Set up the 2x2 grid
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    # 1. Sentiment score per ticker over time
    daily_sentiment = sentiment_data.groupby(['ticker', 'date']).agg({
        'vader_compound': 'mean',
        'financial_sentiment': 'mean'
    }).reset_index()
    for ticker in tickers:
        ticker_data = daily_sentiment[daily_sentiment['ticker'] == ticker]
        axes[0, 0].plot(ticker_data['date'], ticker_data['vader_compound'], label=ticker, marker='o')
    axes[0, 0].set_title('VADER Sentiment Score Over Time')
    axes[0, 0].set_xlabel('Date')
    axes[0, 0].set_ylabel('Sentiment Score')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    axes[0, 0].axhline(y=0, color='black', linestyle='--', alpha=0.5)
    # 2. Distribution of sentiment scores
    axes[0, 1].hist(sentiment_data['vader_compound'], bins=30, alpha=0.7, edgecolor='black')
    axes[0, 1].set_title('Distribution of Sentiment Scores')
    axes[0, 1].set_xlabel('VADER Compound Score')
    axes[0, 1].set_ylabel('Frequency')
    axes[0, 1].axvline(x=0, color='red', linestyle='--', alpha=0.7)
    axes[0, 1].grid(True, alpha=0.3)
    # 3. Sentiment by ticker (box plot)
    sentiment_by_ticker = [sentiment_data[sentiment_data['ticker'] == ticker]['vader_compound']
                           for ticker in tickers]
    axes[1, 0].boxplot(sentiment_by_ticker, labels=tickers)
    axes[1, 0].set_title('Sentiment Distribution by Ticker')
    axes[1, 0].set_ylabel('VADER Score')
    axes[1, 0].grid(True, alpha=0.3)
    axes[1, 0].axhline(y=0, color='red', linestyle='--', alpha=0.7)
    # 4. Financial keywords found per day
    keywords_by_date = sentiment_data.groupby('date')['keywords_found'].sum()
    axes[1, 1].plot(keywords_by_date.index, keywords_by_date.values, color='purple', linewidth=2)
    axes[1, 1].set_title('Financial Keywords Found Over Time')
    axes[1, 1].set_xlabel('Date')
    axes[1, 1].set_ylabel('Keywords Count')
    axes[1, 1].grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
    return fig
# Complete usage example
def sentiment_analysis_example():
    """
    End-to-end sentiment analysis example for trading.
    """
    # Tickers to analyze
    tickers = ["AAPL", "TSLA", "NVDA", "AMZN"]
    print("=== FINANCIAL SENTIMENT ANALYSIS ===\n")
    # 1. Scrape the news
    print("📰 Scraping news...")
    scraper = NewsScraperFinViz()
    news_data = scraper.scrape_news(tickers)
    if news_data.empty:
        print("❌ Could not retrieve any news")
        return
    print(f"✅ Retrieved {len(news_data)} headlines")
    # 2. Summary statistics
    print("\n📊 OVERALL STATISTICS:")
    for ticker in tickers:
        ticker_news = news_data[news_data['ticker'] == ticker]
        if len(ticker_news) > 0:
            avg_sentiment = ticker_news['vader_compound'].mean()
            total_news = len(ticker_news)
            positive_news = (ticker_news['vader_compound'] > 0.05).sum()
            negative_news = (ticker_news['vader_compound'] < -0.05).sum()
            print(f"  {ticker}:")
            print(f"    Headlines: {total_news}")
            print(f"    Average sentiment: {avg_sentiment:.3f}")
            print(f"    Positive: {positive_news} ({positive_news/total_news:.1%})")
            print(f"    Negative: {negative_news} ({negative_news/total_news:.1%})")
    # 3. Correlation with prices
    print("\n🔍 CORRELATION ANALYSIS:")
    end_date = datetime.now()
    start_date = end_date - timedelta(days=30)
    for ticker in tickers:
        try:
            # Fetch price data
            price_data = yf.download(ticker, start=start_date, end=end_date, interval="1d")
            ticker_sentiment = news_data[news_data['ticker'] == ticker]
            if len(ticker_sentiment) > 0 and len(price_data) > 0:
                correlation_analysis = analyze_sentiment_correlation(price_data, ticker_sentiment, ticker)
                if 'error' not in correlation_analysis:
                    print(f"  {ticker}:")
                    print(f"    Sentiment-return correlation: {correlation_analysis['correlations']['vader_sentiment_correlation']:.3f}")
                    print(f"    Data points: {correlation_analysis['data_points']}")
        except Exception as e:
            print(f"  {ticker}: analysis failed - {e}")
    # 4. Example strategy
    print("\n📈 STRATEGY EXAMPLE:")
    ticker = "AAPL"  # Use Apple as the example
    try:
        price_data = yf.download(ticker, start=start_date, end=end_date, interval="1d")
        ticker_sentiment = news_data[news_data['ticker'] == ticker]
        if len(ticker_sentiment) > 0:
            strategy_signals = sentiment_trading_strategy(price_data, ticker_sentiment)
            total_signals = strategy_signals['signal'].abs().sum()
            buy_signals = (strategy_signals['signal'] == 1).sum()
            sell_signals = (strategy_signals['signal'] == -1).sum()
            avg_confidence = strategy_signals[strategy_signals['confidence'] > 0]['confidence'].mean()
            print(f"  Ticker: {ticker}")
            print(f"  Total signals: {total_signals}")
            print(f"  Buy signals: {buy_signals}")
            print(f"  Sell signals: {sell_signals}")
            print(f"  Average confidence: {avg_confidence:.1%}")
    except Exception as e:
        print(f"  Error generating strategy: {e}")
    # 5. Visualization
    print("\n📊 Building dashboard...")
    try:
        create_sentiment_dashboard(tickers, news_data)
    except Exception as e:
        print(f"Error creating dashboard: {e}")
    return news_data
# Sentiment analysis for small caps
def small_cap_sentiment_strategy(ticker, sentiment_threshold=0.15):
    """
    Sentiment strategy tuned for small caps.
    """
    # Small caps are more sensitive to sentiment
    scraper = NewsScraperFinViz()
    sentiment_data = scraper.scrape_news([ticker])
    if sentiment_data.empty:
        return {'error': 'No sentiment data available'}
    # Fetch price data
    end_date = datetime.now()
    start_date = end_date - timedelta(days=30)
    price_data = yf.download(ticker, start=start_date, end=end_date)
    # Parameters adjusted for small caps
    signals = sentiment_trading_strategy(
        price_data,
        sentiment_data,
        sentiment_threshold=sentiment_threshold,  # Higher threshold
        lookback_days=1  # Faster reaction
    )
    # Small-cap-specific filters
    signals['volume_filter'] = price_data['Volume'] > price_data['Volume'].rolling(20).mean()
    signals['volatility_filter'] = price_data['Close'].pct_change().rolling(5).std() > 0.02
    # Only keep signals confirmed by volume and volatility
    signals['final_signal'] = np.where(
        signals['volume_filter'] & signals['volatility_filter'],
        signals['signal'],
        0
    )
    return {
        'signals': signals,
        'sentiment_data': sentiment_data,
        'price_data': price_data
    }


if __name__ == "__main__":
    sentiment_analysis_example()
Integration with Trading Strategies
1. Sentiment + Gap & Go
def sentiment_gap_strategy(ticker, gap_threshold=0.03):
    """
    Combine sentiment analysis with a Gap & Go strategy.
    """
    # Fetch news and price data
    scraper = NewsScraperFinViz()
    sentiment_data = scraper.scrape_news([ticker])
    end_date = datetime.now()
    start_date = end_date - timedelta(days=30)
    price_data = yf.download(ticker, start=start_date, end=end_date)
    signals = pd.DataFrame(index=price_data.index)
    signals['price'] = price_data['Close']
    signals['gap_pct'] = (price_data['Open'] / price_data['Close'].shift(1)) - 1
    signals['volume_ratio'] = price_data['Volume'] / price_data['Volume'].rolling(20).mean()
    signals['signal'] = 0
    # Use the previous day's sentiment
    for i, date in enumerate(price_data.index[1:], 1):
        prev_date = price_data.index[i - 1]
        # Look up the prior day's headlines
        day_sentiment = sentiment_data[
            sentiment_data['date'].dt.date == prev_date.date()
        ]
        if len(day_sentiment) > 0:
            avg_sentiment = day_sentiment['vader_compound'].mean()
            # Gap up with positive sentiment
            if (signals.loc[date, 'gap_pct'] > gap_threshold and
                    avg_sentiment > 0.1 and
                    signals.loc[date, 'volume_ratio'] > 2):
                signals.loc[date, 'signal'] = 1
            # Gap down with strongly negative sentiment (potential reversal)
            elif (signals.loc[date, 'gap_pct'] < -gap_threshold and
                    avg_sentiment < -0.2 and
                    signals.loc[date, 'volume_ratio'] > 2):
                signals.loc[date, 'signal'] = 1  # Contrarian play
    return signals
2. Sentiment + VWAP
def sentiment_vwap_strategy(ticker):
    """
    Combine sentiment with a VWAP strategy.
    """
    # Fetch intraday data where available
    price_data = yf.download(ticker, period="5d", interval="1h")
    # Compute VWAP
    price_data['vwap'] = (price_data['Close'] * price_data['Volume']).cumsum() / price_data['Volume'].cumsum()
    # Fetch sentiment
    scraper = NewsScraperFinViz()
    sentiment_data = scraper.scrape_news([ticker])
    # Build the signal frame
    signals = pd.DataFrame(index=price_data.index)
    signals['price'] = price_data['Close']
    signals['vwap'] = price_data['vwap']
    signals['signal'] = 0
    # Sentiment for the current day
    current_date = datetime.now().date()
    today_sentiment = sentiment_data[
        sentiment_data['date'].dt.date == current_date
    ]
    if len(today_sentiment) > 0:
        avg_sentiment = today_sentiment['vader_compound'].mean()
        for date in price_data.index:
            # Long: price hugging VWAP plus positive sentiment
            if (signals.loc[date, 'price'] > signals.loc[date, 'vwap'] * 0.999 and
                    signals.loc[date, 'price'] < signals.loc[date, 'vwap'] * 1.001 and
                    avg_sentiment > 0.05):
                signals.loc[date, 'signal'] = 1
            # Short: price rejected at VWAP plus negative sentiment
            elif (signals.loc[date, 'price'] < signals.loc[date, 'vwap'] and
                    avg_sentiment < -0.05):
                signals.loc[date, 'signal'] = -1
    return signals
Best Practices
1. Validating Sentiment Data
def validate_sentiment_data(sentiment_df):
    """
    Validate the quality of a sentiment dataset.
    """
    validation_results = {
        'total_articles': len(sentiment_df),
        'date_range': (sentiment_df['date'].min(), sentiment_df['date'].max()),
        'sentiment_distribution': sentiment_df['vader_compound'].describe(),
        'missing_data': sentiment_df.isnull().sum(),
        'duplicate_headlines': sentiment_df['headline'].duplicated().sum()
    }
    # Flag potential problems
    warnings = []
    if validation_results['total_articles'] < 10:
        warnings.append("Too few headlines for a reliable analysis")
    if abs(sentiment_df['vader_compound'].mean()) > 0.5:
        warnings.append("Sentiment is extremely skewed")
    if validation_results['duplicate_headlines'] > len(sentiment_df) * 0.1:
        warnings.append("Many duplicate headlines")
    validation_results['warnings'] = warnings
    return validation_results
2. Temporal Normalization
def normalize_sentiment_by_time(sentiment_df, method='zscore'):
    """
    Normalize sentiment scores over time.
    """
    sentiment_df = sentiment_df.copy()
    if method == 'zscore':
        # Z-score normalization over the whole sample
        sentiment_df['normalized_sentiment'] = (
            sentiment_df['vader_compound'] - sentiment_df['vader_compound'].mean()
        ) / sentiment_df['vader_compound'].std()
    elif method == 'rolling_zscore':
        # Rolling z-score (30-day window)
        rolling_mean = sentiment_df['vader_compound'].rolling(30).mean()
        rolling_std = sentiment_df['vader_compound'].rolling(30).std()
        sentiment_df['normalized_sentiment'] = (
            sentiment_df['vader_compound'] - rolling_mean
        ) / rolling_std
    elif method == 'percentile':
        # Percentile ranking
        sentiment_df['normalized_sentiment'] = sentiment_df['vader_compound'].rank(pct=True)
    return sentiment_df
3. Quality Filters
def apply_quality_filters(sentiment_df, min_keywords=1, confidence_threshold=0.5):
    """
    Apply quality filters to a sentiment dataset.
    """
    filtered_df = sentiment_df.copy()
    # Require at least `min_keywords` financial keywords
    filtered_df = filtered_df[filtered_df['keywords_found'] >= min_keywords]
    # Drop very short headlines (likely uninformative)
    filtered_df = filtered_df[filtered_df['headline'].str.len() > 20]
    # Drop exact duplicates
    filtered_df = filtered_df.drop_duplicates(subset=['headline'])
    # Keep only confidently classified headlines
    abs_sentiment = abs(filtered_df['vader_compound'])
    filtered_df = filtered_df[abs_sentiment > confidence_threshold * abs_sentiment.std()]
    return filtered_df
Limitations and Considerations
1. Limitations of Sentiment Analysis
- Sarcasm and context: models can miss sarcasm entirely
- Financial jargon: sector-specific words can be misinterpreted
- News volume: small caps may have very few headlines
- Timing: the impact of sentiment can be immediate or delayed
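The timing caveat above can be handled by down-weighting stale headlines explicitly. A minimal sketch, assuming an exponential half-life decay; the `time_decay_weights` helper and the 24-hour half-life are illustrative choices, not part of the framework above:

```python
import numpy as np

def time_decay_weights(ages_hours, half_life=24.0):
    # A headline loses half its weight every `half_life` hours
    return 0.5 ** (np.asarray(ages_hours, dtype=float) / half_life)

# Four headlines aged 0, 12, 24, and 48 hours with VADER compound scores
ages = [0, 12, 24, 48]
scores = np.array([0.6, 0.2, -0.1, 0.4])
weighted_sentiment = np.average(scores, weights=time_decay_weights(ages))
```

The linear weights in `sentiment_trading_strategy` are a simpler variant of the same idea; an exponential decay punishes old news more aggressively.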
2. Implementation Best Practices
SENTIMENT_BEST_PRACTICES = {
    'data_quality': {
        'min_articles_per_day': 3,
        'max_sentiment_abs': 0.8,  # Flag suspiciously extreme sentiment
        'min_headline_length': 20,
        'duplicate_threshold': 0.1
    },
    'trading_integration': {
        'sentiment_weight': 0.3,  # At most 30% of the decision weight
        'confirmation_required': True,  # Confirm with technical indicators
        'volume_filter': True,  # Only trade with confirming volume
        'time_decay': 24  # Hours before sentiment loses relevance
    },
    'risk_management': {
        'max_position_sentiment': 0.05,  # At most 5% of capital in sentiment trades
        'stop_loss_tight': True,  # Tighter stops for sentiment trades
        'sentiment_correlation_limit': 0.7  # Avoid over-correlation with sentiment
    }
}
Alternative Data Sources
1. Reddit/Twitter Integration
def reddit_sentiment_analysis(ticker, subreddit='wallstreetbets'):
    """
    Placeholder for Reddit sentiment analysis
    (requires the Reddit API).
    """
    # Implementation requires the praw library and API keys
    pass


def twitter_sentiment_analysis(ticker):
    """
    Placeholder for Twitter sentiment analysis
    (requires the Twitter API).
    """
    # Implementation requires the tweepy library and API keys
    pass
2. StockTwits Integration
def stocktwits_sentiment(ticker):
    """
    Placeholder for StockTwits sentiment
    (requires the StockTwits API).
    """
    pass
Next Step
With sentiment analysis in place, we continue with Fundamental Analysis to complete the quantitative toolkit.