A function to do pairs bootstrap | Python
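These notes are not yet organized. The key material is in the linked pages, which can be opened by clicking the titles; I have excerpted the key implementation code here.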
import numpy as np

def draw_bs_pairs_linreg(x, y, size=1):
    """Perform pairs bootstrap for linear regression."""
    # Set up array of indices to sample from: inds
    inds = np.arange(len(x))
    # Initialize replicates: bs_slope_reps, bs_intercept_reps
    bs_slope_reps = np.empty(size)
    bs_intercept_reps = np.empty(size)
    # Generate replicates
    for i in range(size):
        # Resample indices with replacement so each (x, y) pair stays together
        bs_inds = np.random.choice(inds, size=len(inds))
        bs_x, bs_y = x[bs_inds], y[bs_inds]
        # np.polyfit with degree 1 returns (slope, intercept)
        bs_slope_reps[i], bs_intercept_reps[i] = np.polyfit(bs_x, bs_y, 1)
    return bs_slope_reps, bs_intercept_reps
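As a sanity check on the function above, here is a minimal sketch that runs the same pairs bootstrap on synthetic data with a known slope. The name `draw_bs_pairs_linreg_seeded`, the `seed` parameter, and the synthetic data are all assumptions made for illustration; they are not part of the course code. It uses NumPy's `default_rng` so that the result is reproducible.

```python
import numpy as np

def draw_bs_pairs_linreg_seeded(x, y, size=1, seed=None):
    """Pairs bootstrap for linear regression with a seedable RNG (hypothetical variant)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(size)
    intercepts = np.empty(size)
    for i in range(size):
        # Draw n indices with replacement; x and y are indexed together
        idx = rng.integers(0, n, size=n)
        slopes[i], intercepts[i] = np.polyfit(x[idx], y[idx], 1)
    return slopes, intercepts

# Synthetic data with true slope 2 and intercept 1 (made up for this check)
data_rng = np.random.default_rng(0)
x = data_rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + data_rng.normal(0, 1, size=200)

slopes, intercepts = draw_bs_pairs_linreg_seeded(x, y, size=500, seed=1)
ci_low, ci_high = np.percentile(slopes, [2.5, 97.5])
print(ci_low, ci_high)
```

If the resampling is working, the slope replicates should cluster around the true slope of 2, and rerunning with the same `seed` should give identical replicates.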
Pairs bootstrap of literacy/fertility data | Python
import matplotlib.pyplot as plt

# Assumes illiteracy and fertility are already loaded as NumPy arrays
# Generate replicates of slope and intercept using pairs bootstrap
bs_slope_reps, bs_intercept_reps = draw_bs_pairs_linreg(illiteracy, fertility, size=1000)
# Compute and print 95% CI for slope
print(np.percentile(bs_slope_reps, [2.5, 97.5]))
# Plot the histogram (density=True replaces the removed normed=True argument)
_ = plt.hist(bs_slope_reps, bins=50, density=True)
_ = plt.xlabel('slope')
_ = plt.ylabel('PDF')
plt.show()
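Now I understand why \(\beta\) can be thought of as a distribution. Each bootstrap resample produces its own estimate of \(\beta\), so \(\beta\) is a statistic with a distribution across resamples. As the number of bootstrap replicates grows, the histogram of replicates settles toward that distribution, which for a reasonably large sample is approximately normal (or t). This is why the confidence interval can also be obtained directly from the parameters of a t distribution.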
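To make the comparison concrete, the sketch below puts the bootstrap percentile interval for the slope next to the interval from the usual analytic standard error. Everything here is an assumption for illustration: the data are synthetic, and the normal quantile (about 1.96) stands in for the t quantile, which is only reasonable because the sample is large.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
n = 300
# Synthetic data (made up): true slope 0.5, intercept 2, unit-variance noise
x = rng.uniform(0, 10, size=n)
y = 0.5 * x + 2.0 + rng.normal(0, 1, size=n)

# Ordinary least squares fit and the textbook standard error of the slope
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)
se_slope = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum((x - x.mean())**2))

# Large-sample interval: normal quantile used in place of the t quantile
z = NormalDist().inv_cdf(0.975)
analytic_ci = (slope - z * se_slope, slope + z * se_slope)

# Pairs bootstrap percentile interval for the same slope
n_reps = 2000
bs_slopes = np.empty(n_reps)
for i in range(n_reps):
    idx = rng.integers(0, n, size=n)
    bs_slopes[i] = np.polyfit(x[idx], y[idx], 1)[0]
bs_ci = np.percentile(bs_slopes, [2.5, 97.5])

print("analytic CI: ", analytic_ci)
print("bootstrap CI:", tuple(bs_ci))
print("analytic SE:", se_slope, "bootstrap SE:", bs_slopes.std(ddof=1))
```

With well-behaved noise like this, the two intervals should land close to each other. They can diverge when the sample is small or the residuals are heteroscedastic, which is where the pairs bootstrap is most useful.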