# Stochastic trace estimation for newbies

In this post we are going to discuss an essential method for estimating the trace of an inverse matrix. Our goal is to give an overview of the stochastic approach to computing the trace of the inverse of large sparse matrices.

__Definitions__

*Note: this paragraph will introduce some concepts of matrix calculations. It is in no way intended to explain everything that is going on, but rather just to convey some ideas to the reader that should help them understand the rest of the entry.*

A **matrix** is a collection of numbers shaped like a table.

To a non-mathematician, this could look just like a mere collection of numbers put in a shape, but we can define calculations with them, like one would do with "normal numbers", and this becomes helpful when solving so-called *systems of equations*.

A system of equations is nothing but a collection of equations that have to be satisfied at the same time. You could obtain a system of equations starting from a problem such as the following:

*I have a sister, she's older than I am. Today her age is equal to twice my age plus 7. If you sum our ages, you get 67. How old are my sister and I?*

If you called my age *x* and my sister's age *y*, you could rewrite these statements as
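$$
\begin{cases}
y = 2x + 7 \\
x + y = 67
\end{cases}
$$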

Going through all the details is beyond the scope of this post, but it is possible to rearrange this system and write it in the so-called **matrix form**, which in this case is the following.
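$$
\begin{pmatrix} -2 & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} 7 \\ 67 \end{pmatrix}
$$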

Note that in this case this is a matrix equation of the form
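$$
A \, X = B
$$

where *A* collects the coefficients, *X* the unknowns and *B* the right-hand side.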

Our task is to find the "X" matrix, which is the collection of our unknowns. To do so, we would like to divide both the left and the right hand side by something *that makes A go away.* In standard mathematics, if A were a number, it would be the inverse number to A. So, for instance, if A were 2, it would be 1/2, or 0.5, or if A were 8 it would be 1/8, or 0.125.

For matrices, an analogous concept exists, that of the **inverse matrix**. The inverse matrix of A, denoted as A^{-1}, is a matrix such that
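$$
A \, A^{-1} = A^{-1} A = \mathbb{1}
$$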

where 1 in this case is the matrix analogue of the number 1, the so-called **identity matrix**. Computing the inverse of a matrix is a very computationally expensive task for most matrices; in particular, the bigger the matrix is, the longer it takes to compute its inverse. Unless the matrix has some nice properties, the time it takes to compute the inverse of a matrix of size *n* scales with *n^{3}*, meaning that if a matrix gets twice as big, the computation takes 8 times as long.

Another concept related to a matrix is that of the **trace**. It is not necessary to know *why*, but it enters many calculations, in particular in Lattice QCD.

The trace of a matrix D, denoted by tr(D), is just the sum of the elements on its diagonal, for instance
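$$
\mathrm{tr}\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = 1 + 4 = 5
$$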

This is easy to compute for a known matrix, meaning for a matrix of which we know each component, but in most cases in mathematics, one does not deal with the matrix in its "plain form", but rather knows how to compute the effect a matrix has on another matrix, or on a vector (which is just a very long matrix of only one column).
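To make this concrete, here is a small sketch in Python with numpy. The "shift" operator is just an illustrative example chosen for this post: we compute the effect the matrix has on a vector without ever storing the matrix itself.

```python
import numpy as np

def apply_shift(v):
    """Apply the matrix that shifts a vector's entries up by one position,
    without ever building that matrix: entry i of the result is v[i+1]."""
    out = np.zeros_like(v)
    out[:-1] = v[1:]
    return out

v = np.array([1.0, 2.0, 3.0, 4.0])
w = apply_shift(v)  # same result as multiplying v by the explicit shift matrix
```

The function above and multiplication by the explicit matrix (ones on the superdiagonal, zeros elsewhere) give the same answer, but the function never needs the matrix in its "plain form".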

__Problems__

Lattice QCD (LQCD) is a computationally demanding branch of physics that tries to simulate the behaviour of particles like quarks and gluons, rather than rely on the results obtained experimentally. This offers a framework to compare theoretical and experimental results with an incredibly high precision, to measure any deviation from the *Standard Model* of particle physics, and to shine a light on unsolved problems in physics.

In order to obtain results, LQCD simulates the equations relating quarks and gluons in a finite box (since it wouldn't be possible to simulate an infinite volume) and using a grid of points at a finite distance from one another (as having points be too close to one another would mean having an infinite number of points, which is again impossible). This gives a number of points that is defined as the **volume** of the system, and the equations are written in matrix form, where the main matrix, called the *Dirac operator*, is a "square of numbers" whose side is 24 times the volume. So to simulate a volume of 80 points in each direction, the matrix has to have 11796480000 numbers in it. It would be impossible to store them all, and calculations with them would take far too long.

We are aided by the fact that most of the numbers are actually 0, because only neighbouring points interact with one another, thus leaving all the entries corresponding to non-neighbouring points equal to 0. Regardless of this aid, it is still computationally challenging to carry out the two main tasks that need to be done with this matrix:

a) Computing the action of the inverse of this matrix on a vector;

b) Estimating the trace of the matrix - and the trace of the inverse matrix as well.
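To give an idea of how much the zeros help, here is a small sketch in Python using scipy. The tridiagonal matrix is just an illustrative stand-in for a nearest-neighbour operator, not the actual Dirac operator:

```python
from scipy.sparse import diags

# A nearest-neighbour matrix: nonzero entries only on the diagonal
# and immediately next to it; everything else is 0 and is never stored.
n = 100_000
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

dense_entries = n * n   # numbers a dense matrix of this size would store
sparse_entries = A.nnz  # numbers actually stored: roughly 3 per row
```

For this example the sparse format stores about three numbers per row instead of one hundred thousand.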

Numerical Linear Algebra, the discipline that has matrix calculations at its core, provides many methods to carry out these tasks as efficiently as possible.

__Stochastic method for the trace of the inverse matrix__

A possible solution to the problem is offered by the so-called *stochastic method*. In this case, we do not want to compute the trace exactly, but rather to estimate it. It is possible to show that if we have a collection of vectors *z* whose entries are each either 1 or -1 with 50% probability, the following equation holds
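$$
\mathrm{tr}(A^{-1}) \approx \frac{1}{s} \sum_{k=1}^{s} z_k^{T} A^{-1} z_k
$$

where *s* is the number of random vectors used.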

The more random vectors we multiply the matrix by, the better our estimate will be. The error with respect to the true value decreases with the number of vectors *z* that we use; in particular, it goes as the inverse of the square root of *s*, meaning that, given an estimate, to get one with half the error we need four times as many vectors as the ones we have used so far. Yet the starting value, that is, the maximum error we could ever get, is controlled by the sum of the squares of the *off-diagonal elements* of the matrix we are considering. For very sparse matrices, like the cases we are considering, this number is relatively low, so even a fairly small *s* can yield good results.
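As an illustration, here is a minimal sketch of the stochastic estimator in Python with numpy, on a small matrix for which we can still compute the exact answer. The tridiagonal test matrix is just an example, not a Lattice QCD operator, and `np.linalg.solve` stands in for the solver:

```python
import numpy as np

def estimate_trace_inv(A, s, seed=0):
    """Stochastically estimate tr(A^{-1}) using s random +/-1 vectors."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0.0
    for _ in range(s):
        z = rng.choice([-1.0, 1.0], size=n)  # each entry is +1 or -1 with 50% probability
        x = np.linalg.solve(A, z)            # x = A^{-1} z, without forming the inverse
        total += z @ x
    return total / s

# A small, very sparse test matrix: 4 on the diagonal, -1 next to it
n = 100
A = (np.diag(np.full(n, 4.0))
     + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1))

exact = np.trace(np.linalg.inv(A))
estimate = estimate_trace_inv(A, s=200)
```

Note that the dense solve is used here only for simplicity; for large sparse matrices one would use an iterative solver instead, as discussed below.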

The important thing to note is that in this case the computation will be **much cheaper** than computing the whole inverse and then summing the values on the diagonal. In particular, we do not need to know the actual form of the inverse, but only how it acts on the vectors *z*. This problem is usually treated with *iterative solvers*, which further reduce the cost of the computation.

The idea behind iterative methods is somewhat analogous: we can compute the action of the inverse of a matrix on a vector by repeatedly applying the matrix itself (**not the inverse**) to vectors. Furthermore, if we want to know the action of the inverse on a collection of vectors, and not only one, they can be put together in a matrix, and we can estimate the action of the inverse on that matrix in a similar way at a much lower cost. These methods have been studied for years, and software packages now exist that carry out these computations in a surprisingly small amount of time.
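A minimal sketch of this idea in Python, using scipy's conjugate gradient solver (one of many iterative methods; it assumes the matrix is symmetric positive definite, which our toy nearest-neighbour example is):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# Sparse nearest-neighbour matrix, stored without the zeros
n = 10_000
A = diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

z = np.ones(n)
x, info = cg(A, z)  # iteratively solves A x = z, i.e. computes x = A^{-1} z
assert info == 0    # info == 0 means the solver converged
```

The solver only ever multiplies A by vectors; the inverse matrix is never formed.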

__Conclusion__

Matrix calculation, and in particular the calculation of the inverse of a matrix and of the trace thereof, is one of the main aspects of Lattice QCD. The sheer size of the problems at hand and the need for fast computation have driven scientists and mathematicians to develop better and faster algorithms to compute extremely accurate estimates of the quantities to be measured.

These estimates are obtained by repeating cheaper operations several times, rather than performing one very computationally demanding task only once. This is the go-to strategy when studying systems of equations that can be related to **large** (big in size) **sparse** (most of the numbers are 0) **matrices** (collections of numbers with their own algebraic rules).