Understanding the NMF Matrix Transformation - Jiaxiang Li's Blog

Adapted from my Unsupervised Learning in Python study notes (A Hugo website); any change should be made in both places.

Non-negative matrix factorization (NMF) | Python

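NMF is meant to be interpretable. PCA, honestly, I cannot read at all: interpreting its components feels like pure improvisation, making up ideas as you go. The catch with NMF is that everything has to be non-negative.

The final NMF features can be viewed as a linear combination of the original features.

$\tilde{x} = \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3}$ where $\tilde{x}, x_{1}, x_{2}, x_{3}$ are all vectors.

Both fit() and transform() are available.

It works with NumPy arrays and with csr_matrix.

Setting it up with NMF(n_components=2) looks a lot like PCA.

Import it with from sklearn.decomposition import NMF.

model = NMF(n_components=2) sets the number of NMF features.

model.fit(samples) fits the model.

nmf_features = model.transform(samples) maps the samples to their NMF features.

$\text{number of rows of the NMF features} = \text{number of rows of the original samples}$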
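The row-count relation can be checked on a toy example. This is only a sketch: the array `samples` is random, its 5 x 4 shape is an assumption made for illustration, and `init`, `random_state` and `max_iter` are set only so the run is repeatable.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import NMF

# Hypothetical non-negative data: 5 samples (rows), 4 original features (columns).
rng = np.random.default_rng(0)
samples = rng.random((5, 4))

# Fit on the dense array, then transform it into NMF features.
model = NMF(n_components=2, init="random", random_state=0, max_iter=1000)
model.fit(samples)
nmf_features = model.transform(samples)

print(samples.shape)            # (5, 4)
print(nmf_features.shape)       # (5, 2): same number of rows, fewer columns
print(model.components_.shape)  # (2, 4)

# The same calls also accept a SciPy csr_matrix.
sparse_model = NMF(n_components=2, init="random", random_state=0, max_iter=1000)
sparse_features = sparse_model.fit_transform(csr_matrix(samples))
print(sparse_features.shape)    # (5, 2)
```

Only the number of columns changes, from the 4 original features to the 2 NMF features.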

But I still have not worked out how the multiplication actually goes.

NMF applied to Wikipedia articles | Python

# Import NMF
from sklearn.decomposition import NMF

# Create an NMF instance: model
model = NMF(n_components=6)

# Fit the model to articles
model.fit(articles)

# Transform the articles: nmf_features
nmf_features = model.transform(articles)

# Print the NMF features
print(nmf_features)

NMF features of the Wikipedia articles | Python

# Import pandas
import pandas as pd

# Create a pandas DataFrame: df
df = pd.DataFrame(nmf_features, index=titles)

# Print the row for 'Anne Hathaway'
print(df.loc['Anne Hathaway'])

# Print the row for 'Denzel Washington'
print(df.loc['Denzel Washington'])

In [2]: print(df.loc['Denzel Washington'])
0    0.000000
1    0.005601
2    0.000000
3    0.422380
4    0.000000
5    0.000000
Name: Denzel Washington, dtype: float64

The feature at index 3 really stands out.
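To pick out the dominant feature without reading the printout by eye, something like the following could be used. It is a minimal sketch: the values are copied from the 'Denzel Washington' row printed above, and the variable name `row` is only illustrative.

```python
import pandas as pd

# Values copied from the printed 'Denzel Washington' row.
row = pd.Series(
    [0.000000, 0.005601, 0.000000, 0.422380, 0.000000, 0.000000],
    name="Denzel Washington",
)

print(row.idxmax())  # 3
print(row.max())     # 0.42238
```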

I finally get it. Haha.

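$S_{m \times n} \approx F_{m \times \hat{n}} \times C_{\hat{n} \times n}$

Here $S$ is the original sample, a 2-D matrix in which $m$ is the number of IDs and $n$ is the number of feature variables. $F$ is the NMF feature matrix (nmf_features in the code), a 2-D matrix in which $m$ is the number of IDs and $\hat{n}$ is the number of NMF feature variables; it is the reduced, denser result, with $\hat{n} \leq n$. $C$ is the NMF weight matrix (model.components_ in the code), a 2-D matrix in which $\hat{n}$ is the number of NMF feature variables and $n$ is the number of original feature variables. The product only approximates $S$, hence the $\approx$.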
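Written out entry by entry, the factorization is an ordinary matrix product, read as an approximation as is usual for NMF. The indices $i$, $j$, $k$ are introduced here only for illustration, and the non-negativity constraints are the ones NMF itself imposes:

$\begin{aligned} S_{i j} & \approx \sum_{k=1}^{\hat{n}} F_{i k} C_{k j}, \quad i = 1, \dots, m, \quad j = 1, \dots, n \\ F_{i k} & \geq 0, \quad C_{k j} \geq 0 \end{aligned}$

So each entry of a row of $S$ is rebuilt from that row's $\hat{n}$ NMF features, weighted by one column of $C$.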

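$S_{m \times n}$ is high-dimensional and sparse, so models built on it overfit easily and its variables tend to come out statistically insignificant. $F_{m \times \hat{n}}$ is low-dimensional and dense, so models built on it are less prone to overfitting and its variables are more likely to be significant.

$\hat{n} \leq n \to \hat{n} + \text{noise} = n$

The working assumption is therefore that the dimensionality reduction strips out noise. Noise is what drives overfitting, so if NMF removes the noise, it helps to reduce overfitting.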

To make this easier to follow, here is a simple calculation.

Suppose one row of the original sample is $[0.12, 0.18, 0.32, 0.14]$, the four features of one user; this gives $S_{1 \times 4}$. Suppose the NMF feature vector $F_{1 \times 2}$ comes out as $[0.15, 0.12]$, the user's two NMF features, and the NMF weights $C_{2 \times 4}$ are $[\begin{matrix} 0.01 & 0 & 2.13 & 0.54 \\ 0.99 & 1.47 & 0 & 0.5 \end{matrix}]$

The result then satisfies

$\begin{aligned} [0.15, 0.12] \times [\begin{matrix} 0.01 & 0 & 2.13 & 0.54 \\ 0.99 & 1.47 & 0 & 0.5 \end{matrix}] & = [0.1203, 0.1764, 0.3195, 0.141] \\ & \approx [0.12, 0.18, 0.32, 0.14] \\ F_{1 \times 2} \times C_{2 \times 4} & \approx S_{1 \times 4} \end{aligned}$
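The arithmetic can be reproduced in NumPy. This is just a check of the hand calculation with the same numbers; the variable names `F`, `C`, `S` mirror the symbols in the text and `reconstruction` is an illustrative name.

```python
import numpy as np

F = np.array([[0.15, 0.12]])                 # NMF features of one user, 1 x 2
C = np.array([[0.01, 0.00, 2.13, 0.54],
              [0.99, 1.47, 0.00, 0.50]])     # NMF weights, 2 x 4
S = np.array([[0.12, 0.18, 0.32, 0.14]])     # original row, 1 x 4

# Matrix product: (1 x 2) times (2 x 4) gives (1 x 4).
reconstruction = F @ C

print(reconstruction.shape)          # (1, 4)
print(np.round(reconstruction, 2))   # [[0.12 0.18 0.32 0.14]]

# Largest absolute gap between the reconstruction and the original row.
print(np.abs(reconstruction - S).max())
```

The reconstruction matches the original row only after rounding, which is why the relation is an approximation rather than an exact equality.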

The advantages here are:

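It is like a modular logistic regression: the weights $C_{2 \times 4}$ are shared by all users, while each user gets their own $F_{1 \times 2}$, so the transformed variables are tailored to each individual user.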

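It is like PCA: $C_{2 \times 4}$ shows how much weight each original variable carries in each of the NMF variables in $F_{1 \times 2}$, so the transformed result can be explained to business teams.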

$C_{2 \times 4} = [\begin{matrix} V_{1} & V_{2} & V_{3} & V_{4} \\ w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \end{matrix}]$

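Here $V_{j}$ labels the column of the $j$-th original variable, and $w_{i j}$ measures the weight of the $j$-th original variable in the $i$-th NMF variable. For each NMF variable we can sort these weights and see which original variable is the strongest. The NMF weight matrix therefore gives a ranking of the original variables from strongest to weakest.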
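The ranking idea can be sketched with the weights from the worked example. The labels `V1` to `V4` follow the column labels in the matrix above; `variables` and `rankings` are illustrative names, not part of any library.

```python
import numpy as np

# Weights from the worked example: one row per NMF variable, one column per original variable.
C = np.array([[0.01, 0.00, 2.13, 0.54],
              [0.99, 1.47, 0.00, 0.50]])
variables = np.array(["V1", "V2", "V3", "V4"])

rankings = []
for i, weights in enumerate(C):
    order = np.argsort(weights)[::-1]   # indices from largest to smallest weight
    ranked = variables[order].tolist()
    rankings.append(ranked)
    print(f"NMF variable {i + 1}: {ranked}")
```

With these numbers, V3 dominates the first NMF variable and V2 dominates the second.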

NMF learns interpretable parts | Python

I did not understand what the images are good for here!

# Import pandas
import pandas as pd

# Create a DataFrame: components_df
components_df = pd.DataFrame(model.components_, columns=words)

# Print the shape of the DataFrame
print(components_df.shape)

# Select row 3: component
component = components_df.iloc[3]

# Print result of nlargest
print(component.nlargest())

<script.py> output:
    (6, 13125)
    film       0.627877
    award      0.253131
    starred    0.245284
    role       0.211451
    actress    0.186398
    Name: 3, dtype: float64

Explore the LED digits dataset | Python

# Import pyplot
from matplotlib import pyplot as plt

# Select the 0th row: digit
digit = samples[0]

# Print digit
print(digit)

# Reshape digit to a 13x8 array: bitmap
bitmap = digit.reshape(13, 8)

# Print bitmap
print(bitmap)

# Use plt.imshow to display bitmap
plt.imshow(bitmap, cmap='gray', interpolation='nearest')
plt.colorbar()
plt.show()

In [16]: print(bitmap)
[[ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  1.  1.  1.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.]]

This is what a sparse matrix looks like: most of its entries are zero.
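How sparse the bitmap is can be counted directly. In this sketch the array is rebuilt by hand from the printout above rather than loaded from the course data, so the slices below are a transcription, not the original `samples`.

```python
import numpy as np

# Rebuild the 13 x 8 bitmap from the printed output.
bitmap = np.zeros((13, 8))
bitmap[1, 2:6] = 1      # horizontal bar in row 1
bitmap[2:6, 6] = 1      # vertical bar in column 6, rows 2 to 5
bitmap[7:11, 6] = 1     # vertical bar in column 6, rows 7 to 10

nonzero = np.count_nonzero(bitmap)
print(nonzero, bitmap.size)        # 12 104
print(1 - nonzero / bitmap.size)   # fraction of entries that are zero
```

Only 12 of the 104 pixels are lit, so nearly nine tenths of the entries are zero.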