
SVD via Lanczos Iteration

Written Mon, June 15, 2020.

Background

Every few years, I try to figure out the Lanczos method for approximating the SVD of a rectangular matrix. Unfortunately, every resource I find leaves out enough details to confuse me. All of the information I want is available across multiple writeups, but everyone uses different notation, which makes things even more confusing.

This time I finally sat down and got to the point where I felt like I understood it. This writeup will hopefully clarify how you can use the Lanczos iteration to estimate the singular values/vectors of a rectangular matrix. It’s written for the kind of person who is interested in implementing it in software. If you want more information or mathematical proofs, I cite the important bits in the footnotes. Anything uncited is probably answered in 1.

I assume you know what singular/eigen-values/vectors are. If you don’t know why someone would care about computing approximations of them cheaply, a good application is truncated PCA. If you just take (approximations of) the first few rotations, then you can visualize your data in 2 or 3 dimensions. And if your dataset is big, that can be a valuable time saver.

Throughout, we will use the following notation. For a vector $v$, let $\lVert v \rVert_2$ denote the Euclidean norm $\sqrt{\sum v_i^2}$. Whenever we refer to the norm of a vector, we will take that to be the Euclidean norm.

We can “square up” any rectangular matrix $A$ with dimension $m\times n$ by padding with zeros to create the square, symmetric matrix $H$ with dimension $(m+n)\times (m+n)$:

$$ \begin{align*} H = \begin{bmatrix} 0 & A \\
A^T & 0 \end{bmatrix} \end{align*} $$

Finally, given scalars $\alpha_1, \dots, \alpha_k$ and $\beta_1, \dots, \beta_{k-1}$, we can form the tri-diagonal matrix $T_k$:

$$ \begin{align*} T_k = \begin{bmatrix} \alpha_1 & \beta_1 & 0 & \dots & 0 & 0 \\
\beta_1 & \alpha_2 & \beta_2 & \dots & 0 & 0 \\
0 & \beta_2 & \alpha_3 & \dots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \dots & \alpha_{k-1} & \beta_{k-1} \\
0 & 0 & 0 & \dots & \beta_{k-1} & \alpha_k \end{bmatrix} \end{align*} $$

Lanczos Iteration

We’re going to define the Lanczos iteration (Algorithm 36.1 of 2) in the abstract. Its mathematical motivation is probably too complicated for anyone who would find this writeup helpful. For now, we will just think of it as a process that may have some application in the future.
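Concretely (matching the code below), given a symmetric matrix $A$ of order $n$ and an iteration count $k$, the method produces scalars $\alpha_1, \dots, \alpha_k$ and $\beta_1, \dots, \beta_k$, together with orthonormal vectors $q_1, \dots, q_k$ that form the columns of a matrix $Q_k$. Start from a random vector $q_1$ scaled so that $\lVert q_1 \rVert_2 = 1$, take $\beta_0 q_0 := 0$, and at iteration $i$ compute:

$$ \begin{align*} v &= A q_i \\
\alpha_i &= q_i^T v \\
v &= v - \beta_{i-1} q_{i-1} - \alpha_i q_i \\
\beta_i &= \lVert v \rVert_2 \\
q_{i+1} &= v / \beta_i \end{align*} $$

(This is the plain version, with no reorthogonalization.)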

In R, this might look like:

l2norm = function(x) sqrt(sum(x*x))

lanczos = function(A, k=1)
{
  n = nrow(A)
  
  alpha = numeric(k)
  beta = numeric(k)
  
  # the columns of q are the Lanczos vectors q_1, ..., q_k
  q = matrix(0, n, k)
  q[, 1] = runif(n)
  q[, 1] = q[, 1]/l2norm(q[, 1])
  
  for (i in 1:k)
  {
    v = A %*% q[, i]
    alpha[i] = crossprod(q[, i], v)
    
    # orthogonalize v against the previous two Lanczos vectors
    if (i == 1)
      v = v - alpha[i]*q[, i]
    else
      v = v - beta[i-1]*q[, i-1] - alpha[i]*q[, i]
    
    beta[i] = l2norm(v)
    
    if (i < k)
      q[, i+1] = v/beta[i]
  }
  
  list(alpha=alpha, beta=beta, q=q)
}

As presented, this has nothing to do with calculating eigenvalues or singular values/vectors. That is the subject of the next section.

Application to Spectral Decomposition and SVD

If $A$ is a square, symmetric matrix of order $n$, then you can estimate its eigenvalues by applying the Lanczos method. The $\alpha$ and $\beta$ values form the tridiagonal matrix $T_k$, and the eigenvalues of $T_k$ approximate the eigenvalues of $A$ (Theorem 12.5 of 3). For the eigenvectors, let $Q_k$ be the matrix whose columns are the Lanczos vectors $q_i$, and let $S_k$ be the matrix of eigenvectors of $T_k$. Then approximations to the eigenvectors of $A$ are given by $Q_k S_k$ (Theorem 12.6 of 3). In practice, with $k\ll n$, $T_k$ will be small enough that explicitly forming it and handing it to a symmetric eigensolver like LAPACK’s Xsyevr() or Xsyevd() will suffice. You could also use Xsteqr(), which operates on tridiagonal matrices directly.
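As a small sketch of that recipe, using the lanczos() function from above (the test matrix and iteration count here are made up purely for illustration):

# build a symmetric test matrix
set.seed(1)
n = 500
A = crossprod(matrix(rnorm(n*n), n, n))

k = 25
lz = lanczos(A, k=k)

# form T_k from the alpha and beta values
T_k = diag(lz$alpha)
for (i in 1:(k-1))
{
  T_k[i, i+1] = lz$beta[i]
  T_k[i+1, i] = lz$beta[i]
}

ev = eigen(T_k, symmetric=TRUE)
ev$values[1:3]                                          # approximations of the largest eigenvalues of A
eigen(A, symmetric=TRUE, only.values=TRUE)$values[1:3]  # the "true" values, for comparison
V_approx = lz$q %*% ev$vectors                          # approximate eigenvectors Q_k S_k

The extreme eigenvalues converge first, which is what makes this attractive for truncated decompositions.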

To calculate the SVD of a rectangular matrix $A$ of dimension $m\times n$ with $m>n$, you square it up to the matrix $H$. With $H$, you now have a square, symmetric matrix of order $m+n$, so you can perform the Lanczos iteration $k$ times to approximate the eigenvalues of $H$ as above. Note that for full convergence, we would need to run the iteration $m+n$ times, not $n$ (which would be a very expensive way of computing them). The positive eigenvalues of $H$ are the singular values of $A$, so their approximations from $T_k$ approximate the singular values. The singular vectors (left and right) are found in $Y_k := \sqrt{2}\thinspace Q_k S_k$ (Theorem 3.3.4 of 4). Specifically, the first $m$ rows of $Y_k$ hold the left singular vectors, and the remaining $n$ rows hold the right singular vectors (note: not their transposes).

Since the input matrix $A$ only enters the Lanczos iteration through the matrix-vector product $Hq_i$, it’s possible to avoid explicitly forming the squared-up matrix $H$, or even to apply this to sparse problems, a common use of the algorithm. For similar reasons, this makes it very simple (or as simple as these things can be) to use in out-of-core algorithms (although not online variants, given the iterative nature). Finally, note that if you do not need the eigen/singular vectors, then you do not need to store all of the column vectors $q_i$ of $Q_k$.
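For instance, here is a sketch (my own, not code from this post) of computing the product $Hq$ directly from $A$: split $q$ into its first $m$ and last $n$ entries, then stack $A$ times the bottom piece on top of $A^T$ times the top piece.

# multiply H = [0 A; A' 0] by a vector of length m+n without forming H
matvec_H = function(A, q)
{
  m = nrow(A)
  n = ncol(A)
  c(A %*% q[(m+1):(m+n)], crossprod(A, q[1:m]))
}

A variant of lanczos() that accepts such a function in place of the explicit matrix would then never need the $(m+n)\times(m+n)$ matrix $H$ in memory.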

If you have $m<n$ then I think you would just use the usual “transpose arithmetic”, although maybe even the above is fine. I haven’t thought about it and at this point I don’t care.
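For what it’s worth, a hypothetical wrapper (not something from this post) could handle the wide case by factoring the transpose and swapping the roles of the singular vectors, since $A = U\Sigma V^T$ implies $A^T = V\Sigma U^T$. In terms of the lanczos_svd() function defined below:

# sketch: handle m < n by factoring t(A) and swapping u and v
lanczos_svd_wide = function(A, k, only.values=TRUE)
{
  res = lanczos_svd(t(A), k=k, only.values=only.values)
  list(d=res$d, u=res$v, v=res$u)
}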

Key Takeaways and Example Code

tldr

Square up $A$ into the symmetric matrix $H$, run a handful of Lanczos iterations on $H$, form the tridiagonal matrix $T_k$ from the resulting $\alpha$ and $\beta$ values, and take its eigendecomposition. The largest eigenvalues of $T_k$ approximate the singular values of $A$, and $\sqrt{2}\thinspace Q_k S_k$ holds the approximate singular vectors.

Ignoring some of the performance/memory concerns addressed above, we might implement this in R like so:

square_up = function(x)
{
  m = nrow(x)
  n = ncol(x)
  
  rbind(
    cbind(matrix(0, m, m), x),
    cbind(t(x), matrix(0, n, n))
  )
}

tridiagonal = function(alpha, beta)
{
  n = length(alpha)
  td = diag(alpha)
  for (i in 1:(n-1))
  {
    td[i, i+1] = beta[i]
    td[i+1, i] = beta[i]
  }
  
  td
}

#' @param A rectangular, numeric matrix with `nrow(A) > ncol(A)`
#' @param k The number of singular values/vectors to estimate. The number
#' of lanczos iterations is taken to be twice this number. Should be
#' "small".
#' @param only.values Should only values be returned, or also vectors?
#' @return Estimates of the first `k` singular values/vectors. Returns
#' the usual (for R) list of elements `d`, `u`, `v`.
lanczos_svd = function(A, k, only.values=TRUE)
{
  m = nrow(A)
  n = ncol(A)
  if (m < n)
    stop("only implemented for m>n")
  
  nc = min(n, k)
  
  kl = 2*k # double the lanczos iterations
  
  H = square_up(A)
  lz = lanczos(H, k=kl)
  T_kl = tridiagonal(lz$alpha, lz$beta)
  
  ev = eigen(T_kl, symmetric=TRUE, only.values=only.values)
  d = ev$values[1:nc]
  
  if (!only.values)
  {
    UV = sqrt(2) * (lz$q %*% ev$vectors)
    u = UV[1:m, 1:nc]
    v = UV[(m+1):(m+n), 1:nc]
  }
  else
  {
    u = NULL
    v = NULL
  }
  
  list(d=d, u=u, v=v)
}

And here’s a simple demonstration:


set.seed(1234)
m = 300
n = 10
x = matrix(rnorm(m*n), m, n)

k = 3
lz = lanczos_svd(x, k=k, FALSE)
sv = svd(x)

round(lz$d, 5)
[1] 18.43443 15.83699  0.00160
round(sv$d[1:k], 5)
[1] 19.69018 18.65107 18.41093
firstfew = 1:4
round(lz$v[firstfew, ], 5)
        [,1]    [,2]     [,3]
[1,] 0.40748 0.02210 -0.00228
[2,] 0.21097 0.18573 -0.00051
[3,] 0.13179 0.01563 -0.00411
[4,] 0.39096 0.27893 -0.00117
round(sv$v[firstfew, 1:k], 5)
        [,1]     [,2]     [,3]
[1,] 0.44829 -0.28028 -0.06481
[2,] 0.18420  0.00095  0.67200
[3,] 0.07991  0.01054 -0.14656
[4,] 0.16961 -0.27779 -0.37683

  1. Golub, Gene H., and Charles F. Van Loan. Matrix Computations. Vol. 3. JHU Press, 2012.

  2. Trefethen, Lloyd N., and David Bau III. Numerical Linear Algebra. Vol. 50. SIAM, 1997.

  3. Caramanis and Sanghavi. Large Scale Learning: Lecture 12. (archive.org link)

  4. Demmel, James W. Applied Numerical Linear Algebra. Vol. 56. SIAM, 1997.

