A matrix has 3 features; the three features are mutually independent and each follows a normal distribution. Copy one feature and append it to the matrix, forming a 4-feature matrix. After PCA, what is the largest value in the resulting eigenvectors?
Thanks for any help — I have no idea how to approach this.
2 Answers
The answer is 1.
The question should really read: "after PCA, what is the largest *absolute* value in the eigenvectors?"
With 3 independent features, PCA takes the unit vector along the feature with the i-th largest variance as its i-th component, so the eigenvectors (components) form an orthonormal basis aligned with the x, y, z axes. The first component is [±1, 0, 0], whose largest absolute value is 1.
After duplicating a feature (say z), the x and y axes are unchanged. The z axis and its copy span a plane, and within that plane all the variance lies along the diagonal direction [0, 0, ±1/√2, ±1/√2], since the two coordinates are always equal. Meanwhile the x axis simply extends to [±1, 0, 0, 0], so the largest absolute value is still 1.
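This can also be checked analytically. Assuming variances 300, 20, 1 (the same values the experiment below happens to use), the population covariance of the 4-feature matrix is the 3-feature diagonal covariance with its last row and column duplicated, and its eigendecomposition can be computed directly — a minimal sketch:

```python
import numpy as np

# Population covariance after duplicating the variance-1 feature z:
# cov(z, z_copy) = var(z) = 1, so the lower-right 2x2 block is [[1, 1], [1, 1]].
C = np.array([
    [300.0,  0.0, 0.0, 0.0],
    [  0.0, 20.0, 0.0, 0.0],
    [  0.0,  0.0, 1.0, 1.0],
    [  0.0,  0.0, 1.0, 1.0],
])

# eigh returns the eigenvalues of a symmetric matrix in ascending order
vals, vecs = np.linalg.eigh(C)
print(vals)   # ~[0, 2, 20, 300]: the duplicated pair pools its variance into 2
print(vecs)

# The eigenvector for eigenvalue 2 is [0, 0, 1, 1]/sqrt(2); the one for 300 is
# [±1, 0, 0, 0] -- the largest absolute entry across all eigenvectors is 1.
print(np.abs(vecs).max())
```

The [[1, 1], [1, 1]] block has eigenvalues 2 and 0, which is exactly why the experiment below reports a third eigenvalue of ≈2 and a fourth of ≈0.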
Let's verify with an experiment:
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
n = 100000
## Gaussian: 3 independent features with variances 300, 20, 1
A = np.random.multivariate_normal([0, 0, 0], np.diag([300, 20, 1]), n)
## Poisson (alternative distribution to try)
# A = np.zeros([n, 3])
# for i in range(3):
#     A[:, i] = np.random.poisson((i + 1) * 100, size=n)

pca = PCA(3)
pca.fit(A)
print('---3 features PCA---')
print('eigenvectors:')
print(pca.components_)
print('eigenvalues:')
print(pca.explained_variance_)
print('max abs element of eigenvectors is %f' % max(abs(pca.components_.flatten())))

## Duplicate the last column and append it as a 4th feature
B = np.concatenate((A, A[:, -1].reshape([n, 1])), axis=1)
pca1 = PCA(4)
pca1.fit(B)
print('---3+1 features PCA---')
print('eigenvectors:')
print(pca1.components_)
print('eigenvalues:')
print(pca1.explained_variance_)
print('max abs element of eigenvectors is %f' % max(abs(pca1.components_.flatten())))
Results:
---3 features PCA---
eigenvectors:
[[ 9.99999758e-01 6.61747537e-04 -2.16070381e-04]
[ 6.61727522e-04 -9.99999777e-01 -9.26909242e-05]
[-2.16131671e-04 9.25479220e-05 -9.99999972e-01]]
eigenvalues:
[299.24945095 19.99189396 0.99597203]
max abs element of eigenvectors is 1.000000
---3+1 features PCA---
eigenvectors:
[[ 9.99999734e-01 6.61746150e-04 -2.16794328e-04 -2.16794328e-04]
[ 6.61703761e-04 -9.99999772e-01 -9.78196862e-05 -9.78196862e-05]
[-3.06684953e-04 1.38135016e-04 -7.07106741e-01 -7.07106741e-01]
[-0.00000000e+00 5.26484535e-17 7.07106781e-01 -7.07106781e-01]]
eigenvalues:
[2.99249465e+02 1.99918941e+01 1.99194394e+00 2.49383447e-32]
max abs element of eigenvectors is 1.000000
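A side note on the tiny fourth eigenvalue (2.49e-32): it is not sampling noise. Duplicating a column makes the data matrix rank-deficient, so the sample covariance is singular, and the fourth component [0, 0, 1/√2, -1/√2] carries exactly zero variance. A quick check, reusing the same setup as the experiment above:

```python
import numpy as np

np.random.seed(0)
n = 100000
A = np.random.multivariate_normal([0, 0, 0], np.diag([300, 20, 1]), n)
# Append an exact copy of the last column
B = np.concatenate((A, A[:, -1].reshape([n, 1])), axis=1)

# The duplicated column leaves only 3 linearly independent columns
print(np.linalg.matrix_rank(B))        # 3

# So the 4x4 sample covariance is singular: its smallest eigenvalue is ~0
cov = np.cov(B, rowvar=False)
print(np.linalg.eigvalsh(cov))         # ascending order; first entry ~0
```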