Alternating Least Squares

Given two factor matrices \(X\) and \(Y\) of shape \(k \times n\) and \(k \times m\) respectively, whose columns \(x_u\) and \(y_i\) are the latent factor vectors for user \(u\) and item \(i\), we can find the optimal \(X\) and \(Y\) by formulating an optimization problem in which we minimize an objective function. In particular, we minimize the least squares error on the observed ratings plus a regularization term: \[ \min_{X, Y} \sum_{r_{ui}\,\text{observed}}(r_{ui} - x^T_uy_i)^2 + \lambda\Big(\sum_u||x_u||^2+\sum_i||y_i||^2\Big) \]
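One way to read this objective (writing \(R\) for the \(n \times m\) matrix of observed ratings, a symbol not used elsewhere in this note): each observed entry is approximated by the inner product of the corresponding factor columns, \[ r_{ui} \approx x_u^T y_i, \qquad \text{equivalently} \quad R \approx X^T Y. \]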

This objective is non-convex due to the \(x^T_uy_i\) term, and it is NP-hard to optimize exactly. Gradient descent can be used as an approximate approach, but it is slow and requires many iterations. However, if we fix \(X\) and treat it as constant, the objective becomes convex in \(Y\), and vice-versa. The approach is therefore to fix \(Y\) and optimize \(X\), then fix \(X\) and optimize \(Y\), and repeat until convergence. This is known as Alternating Least Squares (ALS).
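With \(Y\) held fixed, the problem decouples: each \(x_u\) is the solution of an independent ridge regression over the items rated by user \(u\). Writing \(\Omega_u\) for that index set (a symbol introduced here for convenience), the closed-form update is \[ x_u = \Big(\sum_{i \in \Omega_u} y_i y_i^T + \lambda I_k\Big)^{-1} \sum_{i \in \Omega_u} r_{ui}\, y_i, \] and the update for each \(y_i\) with \(X\) fixed is symmetric.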

For a single machine, the computational cost of updating each \(x_u\) is \(O(n_uk^2 + k^3)\), where \(n_u\) is the number of observed ratings for user \(u\): \(O(n_uk^2)\) to form the \(k \times k\) Gram matrix and \(O(k^3)\) to solve the resulting linear system. Similarly, updating each \(y_i\) costs \(O(n_ik^2+k^3)\), where \(n_i\) is the number of observed ratings for item \(i\).
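A minimal single-machine sketch of ALS in NumPy, assuming a dense rating matrix with zeros marking unobserved entries; the names `ratings`, `n_factors`, and `reg` are illustrative, not from the note above. The comments mark where the \(O(n_uk^2)\) and \(O(k^3)\) costs arise.

```python
import numpy as np

def als_step(ratings, factors_fixed, reg):
    """Solve for one factor matrix while the other is held fixed.

    ratings:       (n_rows, n_cols) matrix, 0 = unobserved entry
    factors_fixed: (k, n_cols) matrix held constant (e.g. Y when solving for X)
    reg:           regularization strength lambda
    Returns a (k, n_rows) matrix of updated factors.
    """
    k = factors_fixed.shape[0]
    updated = np.zeros((k, ratings.shape[0]))
    for u in range(ratings.shape[0]):
        observed = ratings[u] != 0            # entries observed for this row
        Y_obs = factors_fixed[:, observed]    # shape (k, n_u)
        r_obs = ratings[u, observed]          # shape (n_u,)
        # Gram matrix: O(n_u * k^2); linear solve: O(k^3)
        A = Y_obs @ Y_obs.T + reg * np.eye(k)
        b = Y_obs @ r_obs
        updated[:, u] = np.linalg.solve(A, b)
    return updated

def als(ratings, n_factors=10, reg=0.1, n_iters=20, seed=0):
    """Alternate between solving for X (users) and Y (items)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    X = rng.normal(scale=0.1, size=(n_factors, n_users))
    Y = rng.normal(scale=0.1, size=(n_factors, n_items))
    for _ in range(n_iters):
        X = als_step(ratings, Y, reg)     # fix Y, solve for X
        Y = als_step(ratings.T, X, reg)   # fix X, solve for Y
    return X, Y                           # predictions: X.T @ Y
```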

