Đặt tít để câu view. Số là học kỳ này tôi dạy lớp cấu trúc dữ liệu. Đây là lớp undergrad đầu tiên tôi dạy từ trước đến nay, cho sinh viên năm 1 năm 2 nên viết bài giảng khá chi tiết. Có bài giảng này mới viết, có vẻ thích hợp cho blog nên đăng lại đây. Bạn đọc nào rảnh thì góp ý giùm.
Bài sau: sắp xếp và tìm kiếm.
1. The Fibonacci numbers
The Fibonacci numbers are the numbers in the sequence 0,1,1,2,3,5,8,… defined by F[0] = 0, F[1]=1, and F[n] = F[n-1]+F[n-2] for integers n ≥ 2. These numbers appeared in Indian mathematics, probably as early as 200BC. The numbers, however, are named after the Italian mathematician Fibonacci, also known as Leonardo of Pisa. Fibonacci wanted to study an idealized model of rabbit population growth (where they never die), and the nth Fibonacci number is the number of rabbits in the population in the nth month starting with a single pair of rabbits in the first month.
Using the method of generating functions, it is not hard to show that , where
is the Golden ratio. Using this formula to compute the
nth Fibonacci number is possible, but quite messy due to round-off errors and the complexity of floating point computation in general. So, let us try to write a program to compute F[n] without doing floating point computation.
1.1. The worst algorithm in the history of humanity
/**
* -----------------------------------------------------------------------
* the most straightforward implementation of Fibonacci comp. algo.
* -----------------------------------------------------------------------
*/
unsigned long long fib1(unsigned long n) {
if (n <= 1) return n;
return fib1(n-1) + fib1(n-2);
}
The above algorithm for computing F[n] turns out to be extremely bad. How do I know that? Well, on my laptop which runs 2.53 GHz Intel Core 2 Duo with 4GB of memory, I ran a little experiment. Computing up to F[37] took under 1 second. Things start to get interesting for F[38] and on.
| n | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
| # secs | 1 | 2 | 2 | 3 | 5 | 7 | 13 | 22 | 31 | 53 | 104 |
Looks like the number of seconds is doubled for each additional 1 we add to n. We wouldn’t be able to compute F[110] this way. Remember that the age of the universe is probably less than 15 billion years, which is less than seconds. Now, wouldn’t you agree that the above algorithm is the worst algorithm in the history of the universe?
The shape of the run-time function, in terms of n, looks exponential. Is there a mathematical justification for this? Yes! We should have at least done some back-of-the-envelope estimate on the run-time of this algorithm before even writing the code.
- Note that when we say “function”, sometimes we mean a C++ funtion, and sometimes we mean a mathematical function. A C++ function might be used to compute a mathematical function, but not always. What kind of functions we mean should be clear from context. In this case, the C++ function
fib1()is used to compute the mathematical functionF[n]. -
fib1()contains calls to itself, with smaller parameters. These are recursive calls, and are one of the most important techniques of algorithmic design. Recursion, if used in the right way, makes the code very clean and easy to understand. That comes at a price: each time we call a function the operating system has to reserve some memory space for that function on a function stack that the operating system maintains. This operation takes both time and space (i.e. memory). Some compilers/interpreters are very smart and they might be able to apply a technique called tail recursion to save the use the time/space imposed by some class of recursive function calls. But you’ll need to take CSE305 to understand the technique. Anyhow, this is not the case here. - Now, let’s consider the call to
fib1(6). Our program spends some time before making the two recursive calls: compare 6 with 2 in theifstatement, computing 6-1 and 6-2, etc. So, there is a fixed amount of work done before the two recursive calls are performed. Then, after the two recursive calls return their values, some amount of work is done to get their sum and return. Overall, letT[n]denote the amount of time it takes the callfib1(n)to computeF[n], then we can estimateT[6]byT[6] = c + T[5] + T[4]
where
cis a constant, probably in the order of miliseconds. Using exactly the same reasoning, we can continue:T[6] = c + T[5] + T[4] = c + (c + T[4] + T[3]) + (c + T[3] + T[2]) = 3c + T[4] + 2T[3] + T[2] = 3c + (c + T[3] + T[2]) + 2(c + T[2] + T[1]) + (c + T[1] + T[0]) = 7c + T[3] + 3T[2] + 3T[1] + T[0] = 7c + (c + T[2] + T[1]) + 3(c + T[1] + T[0]) + 3T[1] + T[0] = 11c + T[2] + 7T[1] + 4T[0] = 11c + (c + T[1] + T[0]) + 7T[1] + 4T[0] = 12c + 8T[1] + 5T[0]Now,
T[1]andT[0]are slightly different because only theifstatement is executed; let’s assume that it takesdunits of time. Overall, we haveT[6] = 12c + 13d. But we want to estimateT[n], not justT[6]. How do we do that?
1.2. First technique for run-time estimation: guess and induct
Our first strategy is to guess a formula for T[n] and then prove it by induction. We can easily get the formulas for T[n] for n ≤ 6 by iterating “bottom-up”:
T[0] = d T[1] = d T[2] = c + 2d T[3] = 2c + 3d T[4] = 4c + 5d T[5] = 7c + 8d T[6] = 12c + 13d ... T[n] = c + T[n-1] + T[n-2]
The coefficients of d looks like … the Fibonacci sequence, and the coefficients of c is just one off. Our conjecture is then:
T[n] = (F[n+1]-1)c + F[n+1]d (*)
Let’s prove it by induction. The base cases are T[0] = d and T[1] = d, obviously hold! Assume (*) holds for all T[m] with m ≤ n-1; in particular, assume we knew that
T[n-1] = (F[n]-1)c + F[n]d T[n-2] = (F[n-1]-1)c + F[n-1]d
Then, we have
T[n] = c + T[n-1] + T[n-2] // from fib1()'s definition
= c + (F[n]-1)c + F[n]d + (F[n-1]-1)c + F[n-1]d // induction hypotheses
= (F[n] + F[n-1] -1)c + (F[n] + F[n-1])d // regrouping
= (F[n+1]-1)c + F[n+1]d // Fibonacci's sequence
Ah hah! We have a precise formula for the time it takes to run fib1(n). However, the formula still looks a little ugly. Let B = max(c,d) and S = min(c,d), then it follows that
(2F[n+1]-1) S ≤ T[n] ≤ (2F[n+1]-1) B
where B and S are off by a few miliseconds, depending on the speed of your computer. In fact, the values of B,S are also in the order of miliseconds. If your computer is twice as fast then probably B,S are twice as small. Now, if we use the golden-ratio formula for F[n] then because as
, we can estimate
.
Overall, there are two constants C ≤ D such that
.
Faster computers reduce those constants but they are limited by the speed of light, and the bulk of the run time of fib1() lies in the exponential factor . This explains the exponential-like curve we saw above.
1.3. Asymptotic analysis
Let’s play “devil’s advocate” and assume that we have an extremely fast computer which allows for C equals 1 nanosecond (10 to the minus 9). Recall that the golden ratio is slightly larger than 1.6. Then, fib1(140) takes at least seconds. The number of seconds since the big-bang is currently estimated to be
. If we had a computer which is a million times faster, does it help in running
fib1(140)? Absolutely not.
Asymptotic analysis of algorithms refers to a technique of doing back-of-the-envelope estimation of an algorithm’s run-time and/or space requirement, without worrying about the speed of the computer that the algorithm is running on. As we have seen above, if an algorithm is very bad then running it on a super-computer does not help much. We might be able to compute, say, fib1(120) but fib1(160) is certainly impossible. The factor is the correct measure of the run-time “quality” of
fib1(), while the constant factor is not. We can always reduce the constant factor by running the algorithm on a faster computer, but we can only change the factor — the dominating factor — by designing a new algorithm.
What we first need is a formal way to state an intuitive statement such as “this algorithm intrinsically runs in time “. Let
be any two functions from the natural numbers to non-negative real numbers.
Big-O
We write (read: f(n) is big-O of g(n)) if and only if there is a constant
and a natural number
such that
. In English, the function
is bounded above by some constant scaling of
when
is sufficiently large.
In the fib1() example, we have and
. The first relation
says that the growth-rate of T[n] is at most as much as the growth-rate of
(up to a constant factor), and the second relation
states that the growth-rate of
is also bounded by the growth rate of
. In other words, the shape of the graph for T[n] is very similar to that of the shape of the graph for
. In fact, we have seen that T[n] is sandwiched between two copies of
, which are different by some constant ratio.
Big-Omega and Theta
We write if and only if
. In other words,
if and only if there exists a constant C and some number
such that
. In the
fib1() example, .
If and
, then we say that
is Theta of
and write
. In the
fib1() example, we have . When
, they have the same growth-rate as n gets large. Two algorithms which run in times
and
on an input of size n are considered to have the same asymptotic running time.
If , then the algorithm with
run time is at least as good as the algorithm with
run time. When
we think of
as an asymptotic upper bound for
.
Conversely, when we say
is an asymptotic lower bound for
. We will consider more example functions later. Let us now get back to the Fibonacci number computation problem and try to improve
fib1(), with the knowledge that the running time of fib1() is in the (asymptotic) order of : an exponential run time. In fact,
fib1() also has an exponential space complexity because of all the recursive calls.
Exercise: suppose for each function call the operating system pushes x bytes of memory onto a “function stack”. The memory will be released when the function returns (and thus its function stack space will be popped out of the function stack). Roughly how many bytes are used by a call to fib1(n)?
2. Better algorithms for Fibonacci sequence
2.1. A slightly better solution
/**
* -----------------------------------------------------------------------
* a better implementation with a linear time/space
* -----------------------------------------------------------------------
*/
unsigned long long fib2(unsigned long n) {
if (n <= 1) return n;
vector<unsigned long long> A;
A.push_back(0); A.push_back(1);
for (int i=2; i<=n; i++) {
A.push_back(A[i-1]+A[i-2]);
}
return A[n];
}
This algorithm uses a vector. Initially the vector is empty, and each time we are able to compute A[i] using two previous entries A[i-1] and A[i-2] we append the vector with A[i]. If we assume the push_back operation takes a constant amount of time (machine dependent, but still a constant), then the total running time is O(n), and the total space required is also O(n). (Here, we also assume that the memory space needed to store a vector of n elements is about O(n). We will see later in the course when we implement the vector class that both the assumptions are valid.)
There is no recursion in this case. The for loop has about n-1 steps, each step takes a constant amount of work. Thus, the overall running time is . How good is this run time compared to the
we had earlier? Here’s a table of run time:
| n | 106 | 107 | 108 | 109 |
| # secs | 1 | 1 | 9 | eats up all my CPU/RAM |
This is dramatically better. We were only able to get up to the tens, and now n can be as large as a hundred million. Well, mathematically it is exponentially better. However, when n = 109 and beyond, my computer becomes extremely slow. The vector class is sort of like an array but the memory space for it has to be dynamically allocated. Those operations’ cost catch up to us eventually, even when they are hidden behind the constant in the big-O notation. Furthermore, when we consume so much memory, the assumption that each push_back takes constant time is no longer valid! Even if each element in the vector is only 8 bytes long (sizeof(unsigned long long) in my MacBook Pro), we need at least 8GB of memory space and my laptop has only 4GB RAM in total. Remember that in a 64-bit machine like mine, theoretically we can have up to 264 ~ 18*1018 bytes of memory. (Question: how large must your computer case be to hold that much of RAM?) So, with only 4GB of RAM, a lot of swapping in and out of memory is done and the operations become increasingly slow. I had to kill the process at n=109.
Now, suppose we keep the same algorithm but we change the data structure to something lighter than the vector, namely we use an array:
unsigned long long fib2(unsigned long n) {
if (n <= 1) return n;
unsigned long long* A = new unsigned long long[n];
A[0] = 0; A[1] = 1;
for (int i=2; i<=n; i++) {
A[i] = A[i-1]+A[i-2];
}
unsigned long long ret = A[n];
delete [] A;
return ret;
}
| n | 106 | 107 | 108 | 109 |
| # secs | 1 | 1 | 1 | got a segmentation fault! |
So we did get quite a dramatic improvement at n = 108, but at n = 109 we got a segmentation fault. This example illustrates 2 points. First, the data structure choice matters — even beyond asymptotic analysis. Second, some of the assumptions we made in asymptotic analysis which holds true for “small” n does not hold true for large n when space consumption is taken into account. A linear-space algorithm on huge inputs is still pretty bad.
2.2. A much better solution
Now, to compute F[n] we only need to hold the previous two values (constant space!). This simple observation leads to the following solution. This strategy is actually called dynamic programming, something you will learn about in CSE331 and CSE431. Please take those courses! They will make you a better computer engineer, a better computer scientist. I guarantee it.
/**
* -----------------------------------------------------------------------
* a better algorithm with a linear time, constant space
* -----------------------------------------------------------------------
*/
unsigned long long fib3(unsigned long n) {
if (n <= 1) return n;
unsigned long long a=0, b=1, temp;
unsigned long i;
for (unsigned long i=2; i<= n; i++) {
temp = a + b; // F[i] = F[i-2] + F[i-1]
a = b; // a = F[i-1]
b = temp; // b = F[i]
}
return temp;
}
The data for this fib3() algorithm are as follows.
| n | 108 | 109 | 1010 | 1011 |
| # secs | 1 | 3 | 35 | 359 |
The run-time still exhibits the exponential behavior, but this time it is exponential in the exponents 8, 9, 10, 11, etc. That makes sense, because a run time of n is exponential in log(n). Note also that we don’t get the correct answer because the 108th Fibonacci number is way larger than the largest number representable by an unsigned long long. But, that’s not our concern here. We want to analyze the running time of the algorithms. And, theoretically, if we had a better data type (an Integer class you can build on your own) then the algorithms will work.
2.3. An even faster solution
You might have thought fib3() is the best one can hope for, and you would be very wrong. The best we can hope for is exponentially faster than fib3(), and thus double-exponentially faster than fib1(). Now the title “the worst algorithm in the history of humanity” doesn’t sound far-fetch, does it? To design a good algorithm,we will often need to know a fair bit of mathematics, and use our knowledge in a smart way. Knowing math alone is not going to make it. Mathematicians, even the very best ones, aren’t necessarily good algorithm designers.
Here’s the key to the new algorithm:
and, in general,
Thus, we have
So, we can compute both F[n+1], F[n] by computing a matrix power and multiply the result with the vector
. How do we compute the
nth power of a matrix quickly?
The repeated squaring trick
In fact, how do we even compute the nth power of an integer quickly, say 3n. Here is a straightforward implementation:
const int base=3;
/**
* -----------------------------------------------------------------------
* a straightforward algorithm for computing base^n
* -----------------------------------------------------------------------
*/
unsigned long long power1(unsigned long n) {
unsigned long i;
unsigned long long ret=1;
for (unsigned long i=0; i<n; i++)
ret *= base;
return ret;
}
The above algorithm runs in time O(n) because it takes n steps and each step takes about a constant amount of time. (See pitfall below.). If you try to run the above for n = 1010 it takes about 44 seconds. It turns out we can do much better! The idea is not to multiply one base at a time. For example, if we want to compute 216, we can compute 28 somehow and then square the result instead of multiplying the result with 2 8 times. To compute 28, we square 24 and so on. The following algorithm should be self-explanatory:
const int base=3;
/**
* -----------------------------------------------------------------------
* a recursive algorithm using the repeated squaring trick
* -----------------------------------------------------------------------
*/
unsigned long long power2(unsigned long n) {
unsigned long long ret;
if (n == 0) return 1;
if (n % 2 == 0) {
ret = power2(n/2);
return ret * ret;
} else {
ret = power2((n-1)/2);
return base * ret * ret;
}
}
The above algorithm on n = 1019 takes … less than 1 second. I could not test for n = 1020 because that would be larger than the largest unsigned long (64-bit ingeger). This is really remarkable! The power1 algorithm runs in linear time, obviously. The power2 algorithm is not that easy to analyze.
Let T[n] denote the run-time of algorithm power2 on input n. When n is even, let’s say the time it takes is some constant c plus the time is take to compute power2(n/2). Similarly, when n is odd, let’s say it takes about d units of time (should be miliseconds) plus T[(n-1)/2]. Also, suppose T[0] == e. So we have the following recurrence relation:
T[0] = e T[n] = c + T[n/2] // if n is even T[n] = d + T[(n-1)/2] // if n is odd
We will apply the guess and induct strategy. When n is even, the binary representation of n/2 is the binary representation of n shifted to the right 1 bit. When n is odd, the binary representation of (n-1)/2 is also the 1-bit shift of the binary representation of n. Hence, each bit 1 in the binary representation of n accounts for d units and each bit 0 accounts for c units. Overall, it follows that
T[n] = O(# bits in the binary representation of n) = O(log n).
Back to the Fibonacci number problem
Well, we will use exactly the same idea. We can now compute F[1019] in under 1 second! Again, I didn’t try larger numbers because we run out of space for unsigned long. Here’s the full code listing for all fibi routines:
// fib.cpp: several routines to compute the n'th Fibonacci number
// usage: fib i n, where i is from 1 to 4, which picks an algorithm
// and n is the order of the Fibonacci number we want to compute
#include <iostream>
#include <sstream>
#include <vector>
using namespace std;
struct Matrix { // a 2x2 matrix
unsigned long long a11;
unsigned long long a12;
unsigned long long a21;
unsigned long long a22;
};
const Matrix base = {1, 1, 1, 0};
unsigned long long fib1(unsigned long); // straightforward implementation
unsigned long long fib2(unsigned long);
unsigned long long fib3(unsigned long);
unsigned long long fib4(unsigned long); // repeated squaring
int main(int argc, char* argv[]) {
if (argc != 3) return 0;
int algo = atoi(argv[1]);
unsigned long long (*fib)(unsigned long); // pointer to a fibi
switch (algo) {
case 1: fib = &fib1; break;
case 2: fib = &fib2; break;
case 3: fib = &fib3; break;
case 4: fib = &fib4; break;
default: return 0;
}
stringstream ss(argv[2]);
unsigned long n;
ss >> n;
cout << "Fib[" << n << "] = " << fib(n) << endl;
return 0;
}
/**
* -----------------------------------------------------------------------
* the most straightforward algorithm of Fibonacci comp. algo.
* -----------------------------------------------------------------------
*/
unsigned long long fib1(unsigned long n) {
if (n <= 1) return n;
return fib1(n-1) + fib1(n-2);
}
/**
* -----------------------------------------------------------------------
* a better algorithm with a linear time/space
* -----------------------------------------------------------------------
*/
unsigned long long fib2(unsigned long n) {
/*
// this is one implementation option
if (n <= 1) return n;
vector<unsigned long long> A;
A.push_back(0); A.push_back(1);
for (unsigned long i=2; i<=n; i++) {
A.push_back(A[i-1]+A[i-2]);
}
return A[n];
*/
// this one should be faster
if (n <= 1) return n;
unsigned long long* A = new unsigned long long[n+1];
A[0] = 0; A[1] = 1;
for (unsigned long i=2; i<=n; i++) {
A[i] = A[i-1]+A[i-2];
}
unsigned long long ret = A[n];
delete [] A;
return ret;
}
/**
* -----------------------------------------------------------------------
* a better algorithm with a linear time, constant space
* -----------------------------------------------------------------------
*/
unsigned long long fib3(unsigned long n) {
if (n <= 1) return n;
unsigned long long a=0, b=1, temp;
unsigned long i;
for (unsigned long i=2; i<= n; i++) {
temp = a + b; // F[i] = F[i-2] + F[i-1]
a = b; // a = F[i-1]
b = temp; // b = F[i]
}
return temp;
}
/**
* -----------------------------------------------------------------------
* multiply two matrices
* -----------------------------------------------------------------------
*/
Matrix matrix_mult(Matrix x, Matrix y) {
Matrix temp;
temp.a11 = x.a11*y.a11 + x.a12*y.a21;
temp.a12 = x.a11*y.a12 + x.a12*y.a22;
temp.a21 = x.a21*y.a11 + x.a22*y.a21;
temp.a22 = x.a21*y.a12 + x.a22*y.a22;
return temp;
}
/**
* -----------------------------------------------------------------------
* compute base^n using repeated squaring
* -----------------------------------------------------------------------
*/
Matrix matrix_power(unsigned long n) {
Matrix unit = {1, 0, 0, 1};
Matrix temp;
if (n == 0) return unit;
if (n%2 == 0) {
temp = matrix_power(n/2);
return matrix_mult(temp, temp);
} else {
temp = matrix_power((n-1)/2);
return matrix_mult(temp, matrix_mult(temp, base));
}
}
/**
* -----------------------------------------------------------------------
* a better implementation with a logarithmic time, constant space
* -----------------------------------------------------------------------
*/
unsigned long long fib4(unsigned long n) {
if (n <= 1) return n;
Matrix mat = matrix_power(n-1);
return mat.a11;
}
3. A pitfall in our assumption
We have assumed that multiplying and adding two numbers take constant time. This is generally true if those numbers are small enough to store in some fixed number of bytes. This assumption is no longer true when we want to compute the 1019th Fibonacchi number, which requires at least bits to store. In general, since the Fibonacci numbers are in the order of
, the number
F[n] requires bits to store. And thus, adding
F[n] and F[n-1] will take time O(n). Let’s not delve into this level of details. The hand-wavy analysis above should give you a good idea of the type of asymptotic analysis to come in this course. We will generally not work with numbers that large, so the assumption that arithmetic operations can be done in constant time is a good one for the rest of the semester.



9 Comments
Thật sự thì cho tới trước khi đọc bài này em vẫn dừng lại ở i=3
.
Sau khi search một hồi, thì ở stackoverflow đã có người có đưa ra công thức tương tự như thầy (nhưng không diễn giải chi tiết ạ).
http://stackoverflow.com/questions/4768781/time-complexity-of-fibonacci-algorithm
@Công Nguyễn, những ai đã từng học một chút về enumerative combinatorics thì chắc sẽ biết cách viết phương trình sai phân theo dạng ma trận. Dừng lại ở i=3 chắc cũng hữu lý. Trên thực tế mấy khi ta dùng Fibonacci
. Nhưng mà cái ý tưởng bình phương lặp thì xuất hiện nhiều nơi.
“Trên thực tế mấy khi ta dùng Fibonacci” <= Not really, Fibonacci numbers are being used everywhere, it called 'golden ratio'. Good designers (including software UI designers), architectures, photographers, artists… must know and care a lot about it
golden ratio ~ Fib(n)/Fib(n-1)
@NQH : Anh ơi typo:
“the age is the universe is ” –> “the age of the universe is”
@Duc Ho: cảm ơn — đã sửa!
Kì này em cũng đang học môn Algorithms. Ví dụ về dãy Fibonacci cũng là ví dụ đầu tiên, có lẽ đây là một ví dụ kinh điển. Tuy nhiên, do lớp em không phải sv khoa KHMT nên các bước trong thuật toán chỉ được mô tả (vẫn có các lệnh cơ bản trong C) chứ không viết tường minh như trong bài giảng.
Em thích câu này trong bài giảng trên: “To design a good algorithm,we will often need to know a fair bit of mathematics, and use our knowledge in a smart way”. Hôm trước em có một bài tập thế này: Cho trước một tập S và số x. Thiết kế một thuật toán để kiểm tra xem x có là tổng của hai phần tử trong S không với thời gian chạy là O(n.logn). Theo suy nghĩ thông thường thì cách nhanh nhất là xét hai số rồi xét tổng. Nhưng như vậy thì sẽ mất thời gian là O(n^2). Cách giải của nó cũng đơn giản, nhưng mà khéo.
Có một chi tiết nhỏ này nữa, đó là việc định nghĩa big-O thì như trên cần thêm điều kiện là f(n) không âm khi n đủ to; nếu không thì phải thêm giá trị tuyệt đối. Tuy nhiên nếu GS giả định là các hàm đều không âm (do đều là các hàm đếm số bước) thì định nghĩa như trên là ổn. Nếu định nghĩa cho tổng quát thì chưa hoàn toàn chặt chẽ về mặt toán học.
Chờ đợi những bài tiếp theo của GS.
Cảm ơn Tùng. Lớp này tập trung vào CTDL, nên ít phần thuật toán hơn một chút. Các hàm f, g tôi đều định nghĩa từ N vào R+, chắc bạn không để ý dấu cộng.
Fib2 loi~ roi` anh Hung oi,
should be:
A= new unsigned long long[n+1];
Cảm ơn Vinh. Đã sửa.