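Multi-factor models are the basic framework for quantitative stock selection. This post starts from the Fama-French three-factor model, then covers factor computation, IC analysis, and a Python framework for multi-factor stock selection.
The Fama-French three-factor model
The classic CAPM explains returns with a single market factor. Fama and French added two more factors on top of it:
- Market factor (MKT): the excess return of the overall market, Rm - Rf
- Size factor (SMB): Small Minus Big, the return of a small-cap portfolio minus the return of a large-cap portfolio
- Value factor (HML): High Minus Low, the return of a high book-to-market (B/M) portfolio minus the return of a low B/M portfolio
The regression equation:
Ri - Rf = α + β1(Rm - Rf) + β2·SMB + β3·HML + ε
α is the part of a strategy's excess return that the three factors do not explain. If a strategy's α is significantly positive, it has genuine stock-picking ability rather than mere exposure to one of the factors.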
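To make the SMB and HML definitions above concrete, here is a minimal one-period sketch. It assumes a cross-section DataFrame with hypothetical columns `market_cap`, `bm` (book-to-market) and `ret`, and it simplifies the original construction: portfolios are equal-weighted, with a median size split and 30%/70% B/M breakpoints, whereas Fama and French use value-weighted 2x3 sorts.

```python
import pandas as pd

def simple_smb_hml(cross_section):
    """Simplified one-period SMB and HML from a stock cross-section.

    cross_section: DataFrame with hypothetical columns
        market_cap, bm (book-to-market), ret (period return).
    """
    size_median = cross_section['market_cap'].median()
    bm_low, bm_high = cross_section['bm'].quantile([0.3, 0.7])

    small = cross_section['market_cap'] <= size_median
    high = cross_section['bm'] >= bm_high
    low = cross_section['bm'] <= bm_low

    ret = cross_section['ret']
    smb = ret[small].mean() - ret[~small].mean()  # small minus big
    hml = ret[high].mean() - ret[low].mean()      # high B/M minus low B/M
    return smb, hml

# Toy cross-section of six stocks (made-up numbers)
toy = pd.DataFrame({
    'market_cap': [10, 20, 30, 100, 200, 300],
    'bm':         [1.2, 0.8, 0.3, 1.0, 0.5, 0.2],
    'ret':        [0.05, 0.03, 0.01, 0.02, 0.00, -0.01],
})
smb, hml = simple_smb_hml(toy)
print(f"SMB: {smb:.4f}, HML: {hml:.4f}")
```

Repeating this every period gives the SMB and HML time series that enter the regression as explanatory variables.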
Running the factor regression in Python is straightforward:

import pandas as pd
import statsmodels.api as sm

def fama_french_regression(stock_returns, factor_data):
    '''
    stock_returns: Series, excess returns of a stock or portfolio
    factor_data: DataFrame with the three columns MKT, SMB, HML
    '''
    X = factor_data[['MKT', 'SMB', 'HML']]
    X = sm.add_constant(X)  # adds the intercept column 'const' (alpha)
    y = stock_returns
    model = sm.OLS(y, X).fit()

    print(f"Alpha: {model.params['const']:.6f} "
          f"(t={model.tvalues['const']:.2f}, p={model.pvalues['const']:.4f})")
    print(f"MKT beta: {model.params['MKT']:.4f}")
    print(f"SMB beta: {model.params['SMB']:.4f}")
    print(f"HML beta: {model.params['HML']:.4f}")
    print(f"R-squared: {model.rsquared:.4f}")
    return model
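Common stock-selection factors
Multi-factor stock selection needs a set of factors that quantify the characteristics of each stock. The commonly used categories are:
Valuation factors

def calc_valuation_factors(df):
    '''
    df: DataFrame of financial data, one row per stock
    '''
    factors = pd.DataFrame(index=df.index)
    # EP: inverse of the P/E ratio; higher means cheaper
    factors['EP'] = df['net_profit_ttm'] / df['market_cap']
    # BP: inverse of the P/B ratio
    factors['BP'] = df['book_value'] / df['market_cap']
    # SP: inverse of the P/S ratio
    factors['SP'] = df['revenue_ttm'] / df['market_cap']
    # DP: dividend yield
    factors['DP'] = df['dividend_ttm'] / df['market_cap']
    return factors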
Quality factors

def calc_quality_factors(df):
    factors = pd.DataFrame(index=df.index)
    # ROE: return on equity
    factors['ROE'] = df['net_profit_ttm'] / df['book_value']
    # ROA: return on assets
    factors['ROA'] = df['net_profit_ttm'] / df['total_assets']
    # GPM: gross profit margin
    factors['GPM'] = df['gross_profit_ttm'] / df['revenue_ttm']
    # LEV: debt-to-assets ratio, negated so that lower leverage scores higher
    factors['LEV'] = -df['total_debt'] / df['total_assets']
    return factors
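Momentum factors

def calc_momentum_factors(price_df, periods=(20, 60, 120)):
    '''
    price_df: DataFrame of daily closing prices, columns = stock codes, index = dates
    periods: look-back windows in trading days (a tuple, to avoid a mutable default)
    '''
    factors = {}
    for p in periods:
        mom = price_df.pct_change(p)
        factors[f'MOM_{p}D'] = mom.iloc[-1]  # momentum as of the latest date
    return pd.DataFrame(factors)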
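The momentum value computed by `pct_change(p)` is simply the price today divided by the price p rows earlier, minus one. The small check below uses made-up prices and hypothetical tickers `AAA` and `BBB` to confirm the two forms agree.

```python
import pandas as pd

# Four days of made-up closing prices for two hypothetical tickers
prices = pd.DataFrame(
    {'AAA': [10.0, 10.5, 11.0, 12.0], 'BBB': [20.0, 19.0, 18.0, 19.0]},
    index=pd.date_range('2024-01-01', periods=4, freq='D'),
)

p = 3  # look-back window in rows
mom = prices.pct_change(p).iloc[-1]

# Same quantity written out by hand: P_t / P_{t-p} - 1
manual = prices.iloc[-1] / prices.iloc[-1 - p] - 1

print(mom)
```

A window of p rows needs at least p + 1 rows of history; with fewer rows the result is NaN for every stock.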
Factor IC analysis
The IC (Information Coefficient) measures how well a factor predicts future returns. It is computed as the cross-sectional correlation between the factor values and the returns of the following period:

def calc_factor_ic(factor_series, forward_returns, method='rank'):
    '''
    factor_series: factor cross-section per period (dict: date -> Series)
    forward_returns: next-period return cross-section (dict: date -> Series)
    method: 'rank' uses Spearman rank correlation (more robust),
            'normal' uses Pearson correlation
    '''
    ic_values = []
    dates = sorted(set(factor_series.keys()) & set(forward_returns.keys()))
    for dt in dates:
        f = factor_series[dt].dropna()
        r = forward_returns[dt].dropna()
        common = f.index.intersection(r.index)
        if len(common) < 30:
            continue  # too few stocks for a meaningful correlation
        if method == 'rank':
            corr = f[common].rank().corr(r[common].rank())
        else:
            corr = f[common].corr(r[common])
        ic_values.append({'date': dt, 'IC': corr})

    if not ic_values:
        # No period had enough overlapping stocks
        return pd.DataFrame(columns=['IC'])
    ic_df = pd.DataFrame(ic_values).set_index('date')

    # IC summary statistics
    ic_mean = ic_df['IC'].mean()
    ic_std = ic_df['IC'].std()
    icir = ic_mean / ic_std if ic_std > 0 else 0
    ic_positive_ratio = (ic_df['IC'] > 0).mean()
    print(f"IC Mean: {ic_mean:.4f}")
    print(f"IC Std: {ic_std:.4f}")
    print(f"ICIR: {icir:.4f}")
    print(f"IC > 0: {ic_positive_ratio:.2%}")
    return ic_df
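As a rule of thumb, a factor whose mean IC exceeds 0.03 in absolute value is considered effective, and an ICIR above 0.5 is considered fairly good.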
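To get a feel for these statistics, the following sketch computes the rank IC, ICIR, and the t-statistic of the mean IC on purely synthetic data. The signal strength, the number of periods, and the number of stocks are all assumptions chosen for illustration. The t-statistic formula assumes the per-period ICs are roughly independent.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_periods, n_stocks = 60, 500

ic_list = []
for _ in range(n_periods):
    factor = pd.Series(rng.normal(size=n_stocks))
    # Next-period return = weak factor signal + dominant noise (synthetic)
    fwd_ret = 0.05 * factor + pd.Series(rng.normal(size=n_stocks))
    # Spearman rank IC: Pearson correlation of the ranks
    ic_list.append(factor.rank().corr(fwd_ret.rank()))

ic = pd.Series(ic_list)
ic_mean = ic.mean()
icir = ic_mean / ic.std()
# t-statistic of the mean IC: ICIR scaled by sqrt(number of periods)
t_stat = icir * np.sqrt(len(ic))

print(f"IC mean: {ic_mean:.4f}, ICIR: {icir:.2f}, t-stat: {t_stat:.2f}")
```

The per-period correlation is small, but averaging over many periods can still make the mean IC statistically distinguishable from zero, which is why ICIR is reported alongside the mean.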
A multi-factor stock-selection framework
The factor computation and IC analysis above can be combined into one stock-selection framework:

class MultiFactorStrategy:
    def __init__(self):
        self.factors = {}     # name -> factor_func
        self.weights = {}     # name -> weight
        self.ic_history = {}  # name -> IC Series

    def add_factor(self, name, func, weight=1.0):
        self.factors[name] = func
        self.weights[name] = weight

    def calc_composite_score(self, stock_data, date):
        '''Compute the composite factor score for one cross-section.

        date is kept for the caller's bookkeeping; it is not used here.
        '''
        scores = pd.DataFrame()
        for name, func in self.factors.items():
            raw = func(stock_data)

            # Winsorize with the MAD rule first, so that outliers
            # do not distort the mean and std used below
            median = raw.median()
            mad = (raw - median).abs().median()
            upper = median + 3 * 1.4826 * mad
            lower = median - 3 * 1.4826 * mad
            clipped = raw.clip(lower, upper)

            # Cross-sectional standardization (z-score)
            standardized = (clipped - clipped.mean()) / clipped.std()

            scores[name] = standardized * self.weights[name]

        # Weighted sum; skipna=False keeps NaN for stocks missing any factor,
        # so that select_stocks can drop them
        composite = scores.sum(axis=1, skipna=False)
        return composite

    def select_stocks(self, stock_data, date, top_n=50):
        '''Select the top_n stocks with the highest composite score.'''
        scores = self.calc_composite_score(stock_data, date)
        scores = scores.dropna()
        selected = scores.nlargest(top_n)
        return selected.index.tolist()
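
# Usage example. stock_data is one cross-section: one row per stock.
strategy = MultiFactorStrategy()
strategy.add_factor('EP', lambda df: df['net_profit_ttm'] / df['market_cap'], weight=0.3)
strategy.add_factor('ROE', lambda df: df['net_profit_ttm'] / df['book_value'], weight=0.3)
# 'mom_60d' is an assumed, precomputed 60-day return column. Calling
# pct_change(60) on a cross-section would difference across stocks, not time.
strategy.add_factor('MOM_60D', lambda df: df['mom_60d'], weight=0.2)
strategy.add_factor('BP', lambda df: df['book_value'] / df['market_cap'], weight=0.2)
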
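Outlier removal and standardization are the key steps in factor preprocessing. The MAD (Median Absolute Deviation) method used in the code above is more robust than a plain 3-sigma rule, because the median and the MAD are much less sensitive to extreme values than the mean and the standard deviation.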
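The difference is easy to see on a toy sample. The sketch below uses synthetic data (99 standard-normal draws plus one extreme value of 1000, both arbitrary choices) and compares the upper clipping bound produced by each rule.

```python
import numpy as np
import pandas as pd

# 99 well-behaved values plus one extreme outlier (synthetic data)
rng = np.random.default_rng(1)
x = pd.Series(np.append(rng.normal(0.0, 1.0, 99), 1000.0))

# 3-sigma upper bound: mean and std are both pulled up by the outlier
sigma_upper = x.mean() + 3 * x.std()

# MAD upper bound: median and MAD barely move.
# 1.4826 scales the MAD to match the std under a normal distribution.
median = x.median()
mad = (x - median).abs().median()
mad_upper = median + 3 * 1.4826 * mad

print(f"3-sigma upper bound: {sigma_upper:.1f}")
print(f"MAD upper bound:     {mad_upper:.1f}")
```

A single outlier inflates the 3-sigma bound far beyond the bulk of the data, while the MAD bound stays close to where it would be without the outlier.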
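Real-world investing also has to account for turnover constraints, industry neutralization, risk models, and more, but the basic framework is what is shown here. The strength of a multi-factor model is its interpretability: the contribution of every factor is transparent.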
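As an illustration of one of these extensions, industry neutralization can be approximated by subtracting each stock's industry mean from its factor value. This is only one simple approach (regressing the factor on industry dummies is another), and the `industry` labels and stock codes below are hypothetical.

```python
import pandas as pd

def industry_neutralize(factor, industry):
    """Subtract each stock's industry mean from its factor value.

    factor: Series of factor values indexed by stock code
    industry: Series of industry labels with the same index (assumed input)
    """
    return factor - factor.groupby(industry).transform('mean')

# Toy example with made-up values
factor = pd.Series([1.0, 3.0, 10.0, 20.0], index=['a', 'b', 'c', 'd'])
industry = pd.Series(['bank', 'bank', 'tech', 'tech'], index=['a', 'b', 'c', 'd'])

neutral = industry_neutralize(factor, industry)
print(neutral)
```

After this step every industry has a mean factor value of zero, so the composite score no longer favors an industry just because its raw factor values are high across the board.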