Pragmatic Recurrence Relations: Why the Closed Form Won’t Help You

Barnaby Wreford

The Lesson

It’s easy to get hooked on the idea of performance and optimisation, and when it comes to recurrence relation problems in programming, it’s just as easy to fall into the trap of chasing the perfect solution. You’ve done your homework, read up on all the maths, and now you’re ready to blast the next recurrence relation problem you see with algebra, find and exploit the closed form solution, and leave everyone else’s code in the dust.

There’s only one problem: most of the time, it doesn’t work.

For starters, an existing general purpose technique like matrix exponentiation already works well: it’s simple to implement, applies to many recurrence problems, and finds solutions in logarithmic time, and while I hate to admit it, in most situations that is fast enough.
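To make the baseline concrete, here is a minimal sketch of matrix exponentiation in Python: square-and-multiply on a companion matrix, reducing mod m at every step. The recurrence and the initial values (1, 2, 4) match the Tribonacci-style sequence used later in this post as far as I can reconstruct it, so treat them as illustrative.

MOD = 1_000_000_007

def mat_mult(X, Y, m=MOD):
    # Multiply two square matrices, reducing every entry mod m
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) % m
             for j in range(size)] for i in range(size)]

def mat_pow(M, e, m=MOD):
    # Square-and-multiply: O(log e) matrix multiplications
    size = len(M)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    while e:
        if e & 1:
            result = mat_mult(result, M, m)
        M = mat_mult(M, M, m)
        e >>= 1
    return result

def tribonacci_mod(n, f0=1, f1=2, f2=4):
    # F(n) mod MOD for F(n) = F(n-1) + F(n-2) + F(n-3);
    # the initial values are my assumption, not gospel
    if n < 3:
        return (f0, f1, f2)[n] % MOD
    P = mat_pow([[1, 1, 1],
                 [1, 0, 0],
                 [0, 1, 0]], n - 2)
    # [F(n), F(n-1), F(n-2)] = P applied to [F(2), F(1), F(0)]
    return (P[0][0] * f2 + P[0][1] * f1 + P[0][2] * f0) % MOD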

Indeed, about the only time you would actually care about improving on logarithmic time is when the inputs are really large. But large inputs mean you are almost certainly taking the modulus of the answer, at least in any programming challenge, because most recurrence relations grow fast.

Take the Fibonacci sequence, for example: the millionth term has over 200,000 digits. Numbers that large are awkward to deal with, so programming challenges ask for answers to be given modulo some value.
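You can check that claim in a couple of lines (sympy’s fibonacci is fast enough here, and should report 208,988 digits):

import sympy

print(len(str(sympy.fibonacci(10**6))))   # 208988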

This use of modular arithmetic makes implementing closed form solutions much harder, because the closed form often involves real numbers rather than just integers, and the normal rules of modular arithmetic do not apply to real numbers. You will see later, in the problem I tried, just how pesky these real numbers can be.

However, there may still be those out there who would be undeterred by these barriers, who want to push on for greater performance even when it might not seem strictly necessary. Unfortunately, the real killer of the fervent optimiser’s dream is the real-world performance of these solutions: often they actually perform worse than easier methods like matrix exponentiation.

This is partially because, at the end of the day, you are doing exponentiation either way, with scalars in the closed form or with matrices, so the extent to which you can improve the overall time complexity is limited. What can actually make the closed form solution slower, however, is that the real numbers in it must be extremely accurate for large inputs, and finding and using numbers at that precision is extremely computationally expensive.

You will now see how I tried to solve a fairly standard recurrence relation problem using a more mathematical approach than is traditional, and how it failed.

A Failed Attempt at Getting Mathematical

The problem is one I have tackled before: Leetcode Problem 552. If you want more information about the problem and the solution using matrix exponentiation then please go read that article; the crux of the problem, if we want to use recurrence relations, is finding the following: $$ v(n) = \sum_{i=0}^{n-1} F(i)F(n-i-1) $$ where \(F(n)\) is a Tribonacci number: $$ F(n)=F(n-1)+F(n-2)+F(n-3) $$
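Before getting mathematical, it helps to have a direct O(n) reference implementation to sanity-check everything that follows. The initial values F(0)=1, F(1)=2, F(2)=4 are my reconstruction from the base cases that appear later in this post (answers 3 and 8 for n=1 and n=2), so treat them as an assumption:

def reference(n):
    # Build F(0..n) directly from the Tribonacci recurrence
    F = [1, 2, 4]
    for _ in range(3, n + 1):
        F.append(F[-1] + F[-2] + F[-3])
    # v(n) = sum of F(i) * F(n-i-1) for i in 0..n-1
    v = sum(F[i] * F[n - i - 1] for i in range(n))
    # the final answer (derived below) is v(n) + F(n)
    return v + F[n]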

Mathematical Solution

Calculating F(n)

First, I found the roots of the characteristic equation: $$ x^3-x^2-x-1=0 $$

I then used sympy to find the roots and coefficients of the closed form solution: $$ F(n) = A r_1^n + B r_2^n + C r_3^n $$
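Something like the following reproduces those values. The initial conditions F(0)=1, F(1)=2, F(2)=4 are again my assumption, chosen to be consistent with the coefficients quoted below:

import sympy as sp

x = sp.symbols("x")
# all_roots() lists the real root first, then the complex conjugate pair
r1, r2, r3 = sp.Poly(x**3 - x**2 - x - 1, x).all_roots()

# Solve the linear system given by the three initial conditions
A, B, C = sp.symbols("A B C")
eqs = [sp.Eq(A * r1**k + B * r2**k + C * r3**k, f)
       for k, f in enumerate([1, 2, 4])]
coeffs = sp.solve(eqs, [A, B, C])

print(sp.N(coeffs[A], 4), sp.N(r1, 4))   # roughly 1.137 and 1.839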

Here are the numerical approximations to these values:

  • \(A \approx 1.137 \)
  • \(r_1 \approx 1.839 \)
  • \(B \approx -0.0687 + 0.124 i \)
  • \(r_2 \approx -0.420 + 0.606 i\)
  • \(C \approx -0.0687 - 0.124 i\)
  • \(r_3 \approx -0.420 - 0.606 i\)

The exact roots and coefficients are complicated, but what is important about these values is:

  • they aren’t rational
  • two of the terms in our function are complex
  • the two complex terms are very small and get smaller as n grows
  • when combined they make an integer (since \(F(n)\) is an integer)

The small size of the two complex terms is very helpful: they are small enough that we can simply ignore them and round the first term: $$ F(n) \approx A r_1^n $$
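Continuing from the sympy snippet above, a quick check that rounding the dominant term alone reproduces the sequence exactly (the neglected complex part always has magnitude below 0.5):

# Evaluate A and r1 to a comfortable precision
A_num = sp.N(coeffs[A], 30)
r1_num = sp.N(r1, 30)

F = [1, 2, 4]
for _ in range(3, 20):
    F.append(F[-1] + F[-2] + F[-3])

assert all(round(A_num * r1_num**k) == F[k] for k in range(20))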

Calculating v(n)

Remember we have the following formula for \(v(n)\): $$ v(n) = \sum_{i=0}^{n-1} F(i)F(n-i-1) $$

If we now sub in our formula for \(F(n)\)

$$ v(n) = \sum_{i=0}^{n-1} (Ar_1^i+Br_2^i+Cr_3^i)(Ar_1^{n-i-1}+Br_2^{n-i-1}+Cr_3^{n-i-1}) $$

We only need to keep the terms that involve the dominant root \(r_1\) in our approximation; products of two of the small complex terms are negligible: $$ v(n) \approx \sum_{i=0}^{n-1} A^2r_1^{n-1} + Ar_1^i(Br_2^{n-i-1}+Cr_3^{n-i-1}) + Ar_1^{n-i-1}(Br_2^i+Cr_3^i) $$

Summed over i, terms 2 and 3 are equal (substitute \(i \to n-i-1\) in one of them), so: $$ v(n) \approx \sum_{i=0}^{n-1} A^2r_1^{n-1} + 2Ar_1^i(Br_2^{n-i-1} + Cr_3^{n-i-1}) $$

We can use the following formula to help us evaluate the 2nd term: $$ \sum_{i=0}^n a^i b^{n-i} = \frac{b^{n+1}-a^{n+1}}{b-a} $$
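For \(a \ne b\) this identity is just a geometric series in disguise: $$ \sum_{i=0}^n a^i b^{n-i} = b^n \sum_{i=0}^n \left(\frac{a}{b}\right)^i = b^n \cdot \frac{(a/b)^{n+1} - 1}{a/b - 1} = \frac{b^{n+1}-a^{n+1}}{b-a} $$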

In this case, because the complex roots have magnitude less than one, their powers become insignificant for large n, so we can drop the \(a^{n+1}\) part for those terms: $$ v(n) \approx nA^2r_1^{n-1} + 2A(B \frac{r_1^n}{r_1-r_2} + C \frac{r_1^n}{r_1-r_3}) $$

We are only interested in the real part of the sum, and we know the answer we are approximating is real, so we can safely drop the imaginary part (it would have been cancelled out by the complex terms we ignored). The two bracketed terms are complex conjugates (\(C = \bar{B}\) and \(r_3 = \bar{r_2}\)), so their real parts are equal and we can take the real part of one and double it: $$ v(n) \approx nA^2r_1^{n-1} + 4Ar_1^n \cdot \Re(\frac{B}{r_1-r_2}) $$

Final formula

The actual solution requires adding \(F(n)\) to \(v(n)\) and taking the modulus: $$ \text{solution} = \text{round}(Ar_1^{n-1}[nA+4r_1 \cdot \Re(\frac{B}{r_1-r_2})+r_1]) \mod m $$

This formula looks a little complicated, but only two parts of it involve n, so everything else can be folded into a single precomputed constant \(c = A(4r_1 \cdot \Re(\frac{B}{r_1-r_2}) + r_1)\): $$ \text{solution} = \text{round}(r_1^{n-1}[A^2 n + c]) \mod m $$

Note that while the ‘round’ function may look a little messy, it only accounts for the complex terms in the closed form solution that we ignored, which get very small quickly, so even without the round function the answer would be very close, especially for larger n.

Implementing the solution

This is the point where maths meets reality. Here a mathematician would have sat back and enjoyed their neat formula that gives the answer for any n.

However, this formula is very difficult to implement in a program: \(r_1\) is an irrational real number, and so is the factor \(A^2 n + c\), which stops us from being able to do any useful modular arithmetic tricks.

Let’s explore some of the ideas I tried for using modular arithmetic to get a fast solution.

Modular exponentiation

Normally, you would see \(r^n \bmod m\) and think of using the multiplicative order of r, the smallest k such that: $$ r^k \equiv 1 \pmod m $$ which lets you reduce the exponent: $$ r^n \equiv r^{n \bmod k} \pmod m $$

Unfortunately, this does not hold when using real numbers. For example (see the check after this list):

  • let \(k = \log_{1.8}(8)\)
  • then \(1.8^k = 8 \equiv 1 \pmod 7\)
  • \(1.8^{10} \approx 0.05 \pmod 7\)
  • but \(1.8^{10 \bmod k} \approx 5.6 \pmod 7\)
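A quick numeric confirmation of those values:

import math

k = math.log(8, 1.8)          # 1.8**k == 8, and 8 % 7 == 1
print(1.8**10 % 7)            # ~0.05
print(1.8**(10 % k) % 7)      # ~5.6, a completely different value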

Mixed approach with matrices

OK, so at this point I thought: fine, if I can’t find \(r_1^{n-1} \bmod m\) using my closed form solution, I can just find the related value \(F(n-1)\) using matrix exponentiation, and then multiply by \(A^2 n + c\).

This should still be faster than finding the answer directly with matrix exponentiation, since the direct approach requires a 7x7 matrix (in my solution), while finding just \(F(n)\) only requires a 3x3 matrix. Since matrix multiplication is cubic in the matrix dimension, this should result in roughly a 12x speed increase.

Unfortunately, once again the rules of modular arithmetic thwart us. In particular, the following property holds for integers but not for real numbers: $$ ab \bmod m = ((a \bmod m) \cdot (b \bmod m)) \bmod m $$

This means that even if we use matrix exponentiation to calculate \(F(n-1) \bmod m\) (our stand-in for \(Ar_1^{n-1}\)), we cannot multiply it by our real-valued constant and reduce again to get the answer mod m.
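To make that concrete, here is a tiny example with arbitrary real numbers:

a, b, m = 10.5, 4.2, 7

print((a * b) % m)                 # 44.1 % 7 = 2.1
print(((a % m) * (b % m)) % m)     # (3.5 * 4.2) % 7 = 0.7, not equal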

No modular tricks

So we can’t use any modular tricks, so what? We can still use our formula to calculate the final value exactly and then take the modulus at the end.

This approach works and can be easily implemented:

import sympy

def solve_with_recurrence(n, A, B, r1, r2):
    # Base cases for the problem
    if n == 1:
        return 3
    elif n == 2:
        return 8

    # c collects every n-independent part of the final formula
    c = A*(4*r1*sympy.re(B/(r1-r2)) + r1)
    # solution = round(r1^(n-1) * (A^2 * n + c)) mod 10^9 + 7
    return round(r1**(n-1) * (A*A*n + c)) % 1000000007

Analysing my solution

However, what’s hidden in the previous code snippet is the calculation of the constants passed to the function, as well as what they actually look like. The problem is that since our answer can be thousands of digits long, our floats need to be thousands of digits long too. Doing calculations with such large floats is computationally expensive, but calculating them in the first place is even more so.

import sympy as sp

def get_constants(precision):
    # Load precomputed exact (symbolic) representations of the values
    with open("sympy_expr.txt", "r") as f:
        stored_expr = sp.sympify(f.read())

    # Evaluate each one to the requested number of decimal digits
    return [term.evalf(precision) for term in stored_expr]

timeit.timeit(lambda: get_constants(digits_precision), number=5)/5

digits precision    time
1,000               0.09 s
5,000               1.34 s
10,000              4.6 s
20,000              16 s
100,000             5 m 15 s

It takes a long time to compute these values, seemingly exhibiting a time complexity of roughly \(O(n^2)\) in the number of digits. But surely, with thousands of decimal places of precision, we should be able to calculate solutions for massive n? Well, I found through experimentation that in order to maintain accuracy, you need roughly \(\frac{n}{3}\) decimal places of precision to calculate the solution for n, which makes sense: the answer itself has about \(n \log_{10} r_1 \approx 0.26n\) digits. This means that this method quickly becomes infeasible for large n.
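For reference, here is roughly how such an experiment might look. solve_with_matrix is the matrix exponentiation solution from the earlier article, and min_precision is a name I made up:

def min_precision(n, start=50, step=50):
    # Raise the precision until the rounded closed form answer matches
    # the exact matrix exponentiation answer; a coarse lower bound
    precision = start
    while True:
        A, B, r1, r2 = get_constants(precision)
        if solve_with_recurrence(n, A, B, r1, r2) == solve_with_matrix(n):
            return precision
        precision += step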

However, let’s say we can precompute these values to a large precision on a supercomputer; we are then guaranteed our function will work up to a certain n, and any normal computer can compute solutions blazingly fast. It won’t fix the fact that our program now has linear space complexity, but as long as it’s fast, it’s fine. Let’s take a look at how our closed form solution stacks up against the default matrix exponentiation solution:

import timeit

A, B, r1, r2 = get_constants(n//3)
recurrence_time = timeit.timeit(lambda: solve_with_recurrence(n, A, B, r1, r2), number=1000)/1000
matrix_time = timeit.timeit(lambda: solve_with_matrix(n), number=1000)/1000
n              matrix time    recurrence time
50,000         <0.1 ms        5 ms
100,000        <0.1 ms        11 ms
300,000        0.1 ms         75 ms
\(10^{61}\)    1 ms           ?

It turns out the function, even with precomputed values, is actually much slower than the matrix exponentiation method, due to the precision of the floats and the size of the intermediate numbers being calculated, since we cannot make use of any modular arithmetic tricks. In fact, because the precision of the floats grows linearly with n, the new function displays a completely different (worse) time complexity.

Conclusion

There are a lot of smart engineers and mathematicians out there who could probably improve on my attempts and leverage the power of the closed form solution to some extent.

However, I’d prefer not to have to be smart. If you take nothing else away from my little journey, let it be that you can spend a significant amount of time playing around with the maths and code, trying to make your fancy mathematical method work, only for it to be no faster than what you could have done in 10 minutes.

Sometimes it pays to try and be clever, but other times it’s best to just use what’s well established. Sure, go ahead and check what the closed form solution is, but if you see irrational complex numbers, I’d recommend spending your time optimising another problem.