Understanding SVD

SVD

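Singular Value Decomposition (SVD) is a matrix factorization that is widely used in machine learning. Beyond serving as the decomposition step in dimensionality-reduction algorithms, it is also used in recommender systems and natural language processing, and it underpins many machine learning algorithms.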

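References

"The Powerful Matrix Singular Value Decomposition (SVD) and Its Applications" (original title: 强大的矩阵奇异值分解(SVD)及其应用)

"Machine Learning (29): Principles and Applications of Singular Value Decomposition (SVD) in Detail" (original title: 机器学习(29)之奇异值分解SVD原理与应用详解)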

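PCA and SVD implementation

Lanczos iteration is a method for computing a subset of the eigenvalues (typically the largest or smallest ones) of a symmetric square matrix. Spark MLlib wraps the ARPACK and LAPACK libraries to do the solving; the Lanczos method is what ARPACK uses for symmetric problems.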
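To make the idea concrete, here is a minimal pure-Python sketch of Lanczos iteration. It is not what ARPACK does internally (ARPACK adds implicit restarts and much more careful numerics), and the names `lanczos`, `count_below` and `largest_eigenvalue` are made up for this illustration. The sketch reduces a symmetric matrix to a small tridiagonal matrix after m steps, then reads off the largest eigenvalue of that tridiagonal matrix by bisection with a Sturm count. The fixed all-ones start vector and the full reorthogonalization are simplifying assumptions.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def lanczos(a, m):
    """Run up to m Lanczos steps on the symmetric matrix a.

    Returns the diagonal (alphas) and off-diagonal (betas) of the
    tridiagonal matrix T whose eigenvalues approximate those of a.
    """
    n = len(a)
    q = [1.0 / math.sqrt(n)] * n          # assumed start vector: normalized all-ones
    basis = [q]
    alphas, betas = [], []
    for j in range(m):
        w = matvec(a, basis[j])
        alphas.append(dot(w, basis[j]))
        # Full reorthogonalization: remove every component along earlier vectors.
        for q_prev in basis:
            c = dot(w, q_prev)
            w = [x - c * y for x, y in zip(w, q_prev)]
        beta = math.sqrt(dot(w, w))
        if j == m - 1 or beta < 1e-12:    # done, or the Krylov space is exhausted
            break
        betas.append(beta)
        basis.append([x / beta for x in w])
    return alphas, betas


def count_below(alphas, betas, x):
    """Sturm count: number of eigenvalues of T that are smaller than x."""
    count, d = 0, 1.0
    for i, alpha in enumerate(alphas):
        off = betas[i - 1] ** 2 if i > 0 else 0.0
        d = alpha - x - off / d
        if d == 0.0:
            d = 1e-300                    # avoid dividing by zero in the next step
        if d < 0.0:
            count += 1
    return count


def largest_eigenvalue(alphas, betas, tol=1e-12):
    """Bisection for the largest eigenvalue of T inside its Gershgorin bounds."""
    m = len(alphas)
    lo = hi = alphas[0]
    for i in range(m):
        r = (betas[i - 1] if i > 0 else 0.0) + (betas[i] if i < len(betas) else 0.0)
        lo, hi = min(lo, alphas[i] - r), max(hi, alphas[i] + r)
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(alphas, betas, mid) == m:
            hi = mid                      # every eigenvalue is below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Toy 20 x 20 symmetric matrix with known eigenvalues 1..19 and 30.
a = [[0.0] * 20 for _ in range(20)]
for i in range(20):
    a[i][i] = float(i + 1)
a[19][19] = 30.0

alphas, betas = lanczos(a, 12)            # 12 steps, fewer than n = 20
print(largest_eigenvalue(alphas, betas))  # prints a value very close to 30
```

The point of the example is that 12 matrix-vector products are enough to pin down the top eigenvalue of a 20 x 20 matrix. The matrix is only ever touched through `matvec`, which is why the method suits large matrices where only a few eigenvalues are needed.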

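BLAS/ARPACK/LAPACK - CSDN blog

In Spark, LAPACK is used to compute the full SVD, and ARPACK is used to compute the SVD either locally or in distributed mode.

"ARPACK or LAPACK: which is better for computing a matrix's eigenvalues and eigenvectors?" - Baidu Zhidao (original title: 矩阵求特征值和特征向量用arpack和lapack哪个好些)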

Which implementation is used in each case, and what it costs

The following passage is taken from a comment in the Spark source code:

We assume n is smaller than m, though this is not strictly required. The singular values and the right singular vectors are derived from the eigenvalues and the eigenvectors of the Gramian matrix A' * A. U, the matrix storing the [left] singular vectors, is computed via matrix multiplication as U = A * (V * S^-1), if requested by user. The actual method to use is determined automatically based on the cost:

  • If n is small (n < 100) or k is large compared with n (k > n / 2), we compute the Gramian matrix first and then compute its top eigenvalues and eigenvectors locally on the driver. This requires a single pass with O(n^2) storage on each executor and on the driver, and O(n^2 k) time on the driver.
  • Otherwise, we compute (A' * A) * v in a distributive way and send it to ARPACK's DSAUPD to compute (A' * A)'s top eigenvalues and eigenvectors on the driver node. This requires O(k) passes, O(n) storage on each executor, and O(n k) storage on the driver.
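The Gramian route described in that comment can be sketched in a few lines of plain Python. This is an illustration only, not Spark's code: `choose_mode`, `gramian`, `top_eigenpair` and `top_singular_triplet` are hypothetical names, the mode rule encodes only the two conditions stated in the comment, and a simple power iteration stands in for the LAPACK/ARPACK eigensolvers. It recovers only the top singular triplet (k = 1).

```python
import math


def choose_mode(n, k):
    """Strategy choice using only the rule stated in the quoted comment."""
    if n < 100 or k > n / 2:
        return "local-gramian"   # build A' * A once, solve on the driver
    return "dist-arpack"         # distributed (A' * A) * v products fed to ARPACK


def gramian(rows):
    """G = A' * A, accumulated in a single pass over the rows of A."""
    n = len(rows[0])
    g = [[0.0] * n for _ in range(n)]
    for row in rows:
        for i in range(n):
            for j in range(n):
                g[i][j] += row[i] * row[j]
    return g


def top_eigenpair(g, iters=200):
    """Power iteration: a stand-in for the real eigensolver, not ARPACK."""
    n = len(g)
    v = [1.0 / (i + 1) for i in range(n)]   # assumed start vector
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    gv = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(x * y for x, y in zip(gv, v))  # Rayleigh quotient
    return lam, v


def top_singular_triplet(rows):
    """Top singular value and vectors of A via the Gramian."""
    lam, v = top_eigenpair(gramian(rows))
    sigma = math.sqrt(lam)                   # singular value = sqrt(eigenvalue of A' * A)
    # Left singular vector from u = A * v / sigma, i.e. U = A * (V * S^-1) for k = 1.
    u = [sum(x * y for x, y in zip(row, v)) / sigma for row in rows]
    return u, sigma, v


a = [[2.0, 0.0], [1.0, 2.0], [0.0, 1.0]]     # m = 3 rows, n = 2 columns
u, sigma, v = top_singular_triplet(a)
print(choose_mode(len(a[0]), 1))
print(sigma, v, u)
```

For this 3 x 2 matrix the Gramian is [[5, 2], [2, 5]], whose largest eigenvalue is 7, so `sigma` comes out as sqrt(7). Note that `gramian` needs O(n^2) memory no matter how many rows there are, which is why the comment reserves this route for small n, or for k large enough that many ARPACK passes over the data would cost more.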

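Finally, for a hands-on walkthrough of dimensionality reduction with PCA and SVD in Spark, see "Dimensionality Reduction with PCA and SVD" (original title: 利用PCA、SVD进行数据降维) | Oath2yangmen's Blog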

