Solving linear systems with Eigen (and OpenCV)

So I guess this post starts my blog. Welcome! 🙂

Over the past couple of days I have been doing a few experiments with the Eigen linear algebra library, because the learning part of my supervised descent library was very slow: inverting a 2000 × 2000 matrix took a few minutes, while Matlab did it in under a second. I ended up learning a bit about Eigen and running a few experiments I thought might be worth sharing.

Initially, I was using the FullPivLU decomposition from Eigen and checked whether the matrix I inverted was actually invertible. If you’re not so familiar with Eigen, their documentation provides a very nice overview table of all the available linear algebra decompositions here. My code looked roughly like this:

using namespace Eigen; // to keep the example concise
cv::Mat AtA = A.t() * A;
// Map the OpenCV data (row-major) into Eigen without copying:
Map<Matrix<float, Dynamic, Dynamic, RowMajor>> AtA_Eigen(AtA.ptr<float>(), AtA.rows, AtA.cols);
FullPivLU<Matrix<float, Dynamic, Dynamic, RowMajor>> luOfAtA(AtA_Eigen);
if (!luOfAtA.isInvertible()) std::cout << "Not invertible." << std::endl;
Matrix<float, Dynamic, Dynamic, RowMajor> AtAInv_Eigen = luOfAtA.inverse();
// Wrap the Eigen result in a cv::Mat header, again without copying:
cv::Mat AtAInv(static_cast<int>(AtAInv_Eigen.rows()), static_cast<int>(AtAInv_Eigen.cols()), CV_32FC1, AtAInv_Eigen.data());
cv::Mat x = AtAInv * A.t() * b;

I’m pretty much using OpenCV everywhere in my code, but OpenCV is horrendously slow at inverting matrices larger than 500 × 500, so I dropped it in favour of Eigen. For people not familiar with the code: what I am doing is mapping the memory allocated by OpenCV with an Eigen::Map, inverting with Eigen, and mapping the data back into an OpenCV cv::Mat. Eigen::RowMajor is needed because OpenCV stores its data in row-major order in memory, whereas Eigen’s default is column-major. No copying of data is involved here. I solve the normal equations Aᵀ * A * x = Aᵀ * b, thus calculating AᵀA, inverting it, and then calculating x.

Now let’s analyse what is not so good about the above code. Profiling showed hotspots at the first and last line, which I didn’t expect, so I ran a few measurements:

Table 1: Runtime (in seconds) of the matrix-matrix multiplications

                A: 2000×2000, b: 2000×40              A: 2000×4400, b: 4400×6
                Aᵀ*A      x = AtAInv*Aᵀ*b             Aᵀ*A      x = AtAInv*Aᵀ*b
OpenCV          6.6       6.4                         14.5      13.8
Eigen           0.8       0.8                         1.6       1.7

The numbers doubled again for both libraries for a 3500 × 3500 matrix. Basically, even for matrix-matrix multiplications, OpenCV is horribly slow.
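
For these measurements I simply wrapped each multiplication in a timer, roughly like this (a minimal sketch, not my exact benchmark code; it assumes A is a single-channel float matrix, CV_32FC1):

#include <chrono>
#include <opencv2/core/core.hpp>
#include <Eigen/Dense>

// Returns the time in seconds that OpenCV needs for Aᵀ*A:
double time_opencv_AtA(const cv::Mat& A)
{
    const auto start = std::chrono::steady_clock::now();
    cv::Mat AtA = A.t() * A;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// The same multiplication with Eigen, mapping the OpenCV data without a copy:
double time_eigen_AtA(const cv::Mat& A)
{
    typedef Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> RowMajorMatrixXf;
    Eigen::Map<const RowMajorMatrixXf> A_Eigen(A.ptr<float>(), A.rows, A.cols);
    const auto start = std::chrono::steady_clock::now();
    RowMajorMatrixXf AtA = A_Eigen.transpose() * A_Eigen;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}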

Next, as I was still wondering what took Eigen so long to calculate the actual inverse, I wanted to figure out whether the Eigen::Map or possibly the row-major memory layout was causing a slowdown. I ended up benchmarking a few of the decomposition algorithms as well, which yielded a few interesting conclusions. First, the results:

Table 2: Runtime (in seconds) for decomposing AᵀA

                                                 A: 2048×558,  A: 2048×2048,  A: 3500×3500,
                                                 b: 2048×6     b: 2048×40     b: 3500×40
MatrixXf & LDLT::solve()                         0.55          0.57           4.4
MatrixXf & PartialPivLU::inverse()               1.4           1.4            5.9
MatrixXf & PartialPivLU::solve()                 0.47          0.46           1.8
MatrixXf & ColPivHouseholderQR::inverse()        6.7           10.8           59.0
Col-major Map & PartialPivLU::inverse()          1.4           13.8           5.8
Row-major Map & PartialPivLU::inverse()          1.3           12.9           5.6
Row-major Map & PartialPivLU::solve()            0.52          0.56           1.8
Col-major Map & ColPivHouseholderQR::inverse()   6.7           11.4           54.8
Row-major Map & ColPivHouseholderQR::inverse()   6.5           9.7            44.0
Row-major Map & FullPivLU::inverse()             40.0          38.4           142.5
Row-major Map & FullPivLU::solve()               35.5          –              –
jacobiSvd(ComputeThinU | ComputeThinV)::solve()  –             993.8          –

First, we can see that using an Eigen::Map over custom data doesn’t incur a speed penalty. It’s just as fast as the Eigen::MatrixXf counterpart – it’s essentially free, which is good news. Second, the data being in row-major layout doesn’t seem to be an issue either – on the contrary, Eigen consistently ran about 10 percent faster when the data was in row-major memory order. This somewhat contradicts what the Eigen documentation suggests (it says Eigen works best and is most tested in its default, column-major layout). Third, there is a huge difference between the slowest decomposition (FullPivLU) and the fastest (PartialPivLU). My initial choice of FullPivLU might not have been the best. Note that I tested only a subset of all the available decompositions (for a complete list, see the link above).
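
As an aside, this is roughly how the row-major and column-major Map variants from Table 2 can be set up (a sketch only, reusing the AtA matrix from the first listing; the typedef is mine). Since OpenCV stores its data row-major, only the row-major Map views the buffer in the intended layout – the column-major Map over the same buffer effectively sees the transpose, which just happens to be harmless here because AᵀA is symmetric.

typedef Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> RowMajorMatrixXf;

// Two views onto the same OpenCV buffer - no data is copied in either case:
Eigen::Map<RowMajorMatrixXf> AtA_row(AtA.ptr<float>(), AtA.rows, AtA.cols);
Eigen::Map<Eigen::MatrixXf>  AtA_col(AtA.ptr<float>(), AtA.rows, AtA.cols);

// The decomposition is templated on the matrix type, so the storage order
// carries through to its internal computations:
Eigen::PartialPivLU<RowMajorMatrixXf> lu_row(AtA_row);
Eigen::PartialPivLU<Eigen::MatrixXf>  lu_col(AtA_col);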

The fourth conclusion requires a bit of an explanation first. You can see in the table that I sometimes used inverse() and sometimes solve(MatrixXf b). To solve Ax = b, one could calculate the inverse of A and then compute x = A⁻¹ * b. However, inverting the matrix is often not the best choice: if we only need the solution for a given right-hand side b, we can solve for that b directly, which is usually an easier problem than inverting the whole matrix. Eigen provides the solve() method for exactly this, and it is indeed a lot faster. Note that it is even faster than the table suggests, because the solve() timings already include computing x, whereas for inverse() the time to compute A⁻¹ * b (or A⁻¹ * Aᵀ * b in our case) still has to be added.

However, it’s unfortunately not always possible to use solve(): A lot of the decomposition algorithms have not (yet?) implemented the case where b is a matrix.
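
As an illustration, here is a sketch of what the solve() variant of the first listing could look like (variable names as in that listing; PartialPivLU is one of the decompositions that did accept a matrix-valued b in my tests):

typedef Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> RowMajorMatrixXf;
Eigen::Map<RowMajorMatrixXf> AtA_Eigen(AtA.ptr<float>(), AtA.rows, AtA.cols);
Eigen::Map<RowMajorMatrixXf> A_Eigen(A.ptr<float>(), A.rows, A.cols);
Eigen::Map<RowMajorMatrixXf> b_Eigen(b.ptr<float>(), b.rows, b.cols);

// Decompose AᵀA once, then solve directly for x = (AᵀA)⁻¹ * Aᵀ * b,
// without ever forming the inverse:
Eigen::PartialPivLU<RowMajorMatrixXf> luOfAtA(AtA_Eigen);
RowMajorMatrixXf x_Eigen = luOfAtA.solve(A_Eigen.transpose() * b_Eigen);

// Wrap the result in a cv::Mat header (no copy; the data stays owned by x_Eigen):
cv::Mat x(static_cast<int>(x_Eigen.rows()), static_cast<int>(x_Eigen.cols()), CV_32FC1, x_Eigen.data());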

One important thing to note is that the computationally cheaper algorithms come at a price: they pose requirements on the matrix, for example that it be invertible or positive (semi-)definite. The more expensive decompositions like FullPivLU, on the other hand, are rank-revealing decompositions. Often, the more expensive algorithms are also more precise or more numerically stable.
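
To make that a bit more concrete, a small sketch (reusing the typedef and variables from above): FullPivLU can report the rank and tell you whether the matrix is actually invertible, whereas the much cheaper LDLT simply assumes a positive (semi-)definite matrix and only signals trouble through its info() flag.

Eigen::FullPivLU<RowMajorMatrixXf> fullLuOfAtA(AtA_Eigen);
std::cout << "rank: " << fullLuOfAtA.rank()
          << ", invertible: " << fullLuOfAtA.isInvertible() << std::endl;

Eigen::LDLT<RowMajorMatrixXf> ldltOfAtA(AtA_Eigen);
if (ldltOfAtA.info() != Eigen::Success)
    std::cout << "LDLT ran into numerical issues." << std::endl;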

There are actually a few more things to consider that are outside the scope of this blog post: one reason why I solve the normal equations and not the original system in the first place is that the original system most likely has no solution anyway, and I want to obtain the least-squares solution. Instead of calculating the pseudo-inverse of the original problem via the SVD (which is very slow), one can solve the normal equations with a QR, LU or even LDLT decomposition. Plus, in practice, even in that case, a regularisation term is often added to AᵀA.
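
As a sketch of that last point (the λ value here is just a placeholder, not a recommendation): add λ to the diagonal of AᵀA and solve the regularised system with the cheap LDLT decomposition, which fits because AᵀA + λI is symmetric and, for λ > 0, positive definite.

float lambda = 1.0f; // hypothetical regularisation weight
Eigen::MatrixXf regularised = AtA_Eigen;   // copy, so the mapped OpenCV data stays untouched
regularised.diagonal().array() += lambda;  // AᵀA + λI
Eigen::MatrixXf x_reg = regularised.ldlt().solve(A_Eigen.transpose() * b_Eigen);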

Technicalities: I used Eigen 3.2.2, OpenCV 2.4.8, and the experiments were run on a Core i7-4700MQ with turbo boost and HT enabled. I compiled with VS2013, 64-bit, Release mode, and the default flags from CMake. I briefly ran with /openmp and some more optimisations turned on, and in some cases I could see Eigen use more than one core, but I turned that off for these experiments. It’s worth noting that I ran every experiment only a couple of times and recorded the numbers by hand – so take the numbers only as estimates (but they were pretty consistent).

To wrap it up – my lessons learned were:

  • OpenCV is very slow for large matrices, even for multiplications
  • Prefer solve() over inverse(), because it does the job we want to do (solving the linear system – not inverting a matrix)
  • b in matrix form is not supported by most of Eigen’s solve() methods, and we are left with few decomposition options
  • Checking for invertibility is computationally very expensive because it requires a full LU decomposition (or similar). A potential workaround may be to use a cheap algorithm and then verify the solution with isApprox() (see the sketch below this list)
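
Here is roughly what I mean by that last bullet point (a sketch with a hypothetical tolerance, variables as in the listings above): solve with the cheap PartialPivLU and then plug the solution back in to check it.

Eigen::MatrixXf rhs = A_Eigen.transpose() * b_Eigen;
Eigen::MatrixXf x_check = AtA_Eigen.partialPivLu().solve(rhs);

// Verify by substituting back in; isApprox() compares with a relative tolerance.
Eigen::MatrixXf residual_check = AtA_Eigen * x_check;
if (!residual_check.isApprox(rhs, 1e-4f))
    std::cout << "Solution did not verify - AᵀA may be (close to) singular." << std::endl;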

For the interested reader, the newest version of the above code is on my GitHub.
