
Algorithmic complexity. Search algorithms

You've probably come across notations like O(log n) more than once, or heard algorithms described as having "logarithmic computational complexity". If you still don't understand what this means, this article is for you.

Complexity rating

The complexity of algorithms is usually measured by their execution time or memory usage. In both cases, the complexity depends on the size of the input data: an array of 100 elements will be processed faster than a similar array of 1000. However, few people are interested in the exact time: it depends on the processor, the data types, the programming language and many other parameters. Only the asymptotic complexity is important, i.e. the complexity as the size of the input data tends to infinity.

Let's say some algorithm needs to perform 4n³ + 7n conditional operations to process n elements of input data. As n increases, the total running time is affected far more by raising n to the cube than by multiplying it by 4 or adding 7n. We then say that the time complexity of this algorithm is O(n³), i.e. it depends cubically on the size of the input data.

The use of the capital letter O (the so-called O-notation) comes from mathematics, where it is used to compare the asymptotic behavior of functions. Formally, O(f(n)) means that the running time of the algorithm (or the amount of memory occupied) grows, as the size of the input data increases, no faster than some constant multiplied by f(n).

Examples

O(n) - linear complexity

For example, the algorithm for finding the largest element in an unsorted array has this complexity: we have to go through all n elements of the array to find out which one is the maximum.
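As an illustration, here is a minimal Java sketch (added for this edition, not part of the original text) of the linear scan for the maximum:

public class MaxDemo {
    public static void main(String[] args) {
        int[] a = {7, 42, -3, 15, 8};
        int max = a[0];
        // One pass over all n elements: O(n) comparisons.
        for (int i = 1; i < a.length; i++) {
            if (a[i] > max) {
                max = a[i];
            }
        }
        System.out.println("max = " + max);
    }
}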

O(log n) - logarithmic complexity

The simplest example is binary search. If the array is sorted, we can check whether it contains a particular value by repeatedly halving it. We check the middle element: if it is larger than the one we are looking for, we discard the second half of the array - the value is definitely not there. If it is smaller, we discard the first half instead. We keep halving in this way, and in the end we check about log n elements.

O(n²) - quadratic complexity

For example, the insertion sort algorithm has this complexity. In the canonical implementation it consists of two nested loops: one to go through the entire array, and the other to find the place for the next element in the already sorted part. Thus, the number of operations depends on the size of the array as n * n, i.e. n².
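A rough Java sketch of the canonical insertion sort described above (illustrative code, names are mine):

public class InsertionSortDemo {
    public static void main(String[] args) {
        int[] a = {5, 2, 4, 6, 1, 3};
        // The outer loop walks the whole array; the inner loop shifts
        // elements of the already sorted prefix, giving about n * n
        // operations in the worst case, i.e. O(n^2).
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        for (int x : a) {
            System.out.print(x + " ");
        }
    }
}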

There are other complexity classes, but they are all based on the same principle.

It also happens that the running time of an algorithm does not depend on the size of the input data at all. Then the complexity is denoted O(1). For example, to determine the value of the third element of an array, you do not need to remember the elements or iterate over them many times: you simply read the third element of the input, and this takes the same time for any amount of data.
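For instance, a tiny illustrative sketch (not from the original): reading the third element takes one index access no matter how long the array is.

public class ThirdElementDemo {
    // A single index access: O(1), independent of a.length.
    static int third(int[] a) {
        return a[2];
    }

    public static void main(String[] args) {
        System.out.println(third(new int[]{10, 20, 30, 40}));
    }
}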

The same applies to memory estimates when they matter. One algorithm may use significantly more memory than another as the input grows, yet still run faster, and vice versa. This helps to choose the best way to solve a problem based on the current conditions and requirements.

We already know that correctness is far from the only quality a good program should have. One of the most important is efficiency, which primarily characterizes the running time of the program for various input data (as a function of a parameter n).

Finding the exact dependence for a specific program is a rather difficult task. For this reason we usually limit ourselves to asymptotic estimates of this function, that is, a description of its approximate behavior for large values of the parameter. Sometimes the traditional relation O(f(n)) (read "big O") between two functions is used for asymptotic estimates; its definition can be found in any textbook on mathematical analysis, although the equivalence relation Θ(f(n)) (read "big theta") is used more often. Its formal definition can be found in the literature; for now it is enough to understand the idea in outline.

As a first example, let's return to the programs just discussed for finding the factorial of a number. It is easy to see that the number of operations that must be performed to find the factorial n! of a number is, to a first approximation, directly proportional to that number, because the number of repetitions of the loop (iterations) in the program is equal to n. In such a situation it is customary to say that the program (or algorithm) has linear complexity (complexity O(n) or Θ(n)).
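For reference, a minimal iterative factorial in the spirit of the programs discussed (a sketch; the course's Xterm I/O class is replaced here with standard output, and n is hard-coded):

public class FactIterDemo {
    public static void main(String[] args) {
        int n = 10;
        long f = 1;
        // The loop body executes n - 1 times: linear complexity, O(n).
        for (int i = 2; i <= n; i++) {
            f *= i;
        }
        System.out.println(n + "! = " + f);
    }
}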

Is it possible to calculate the factorial faster? It turns out that yes: it is possible to write a program that gives the correct result for the same values for which all the programs above do, without using either iteration or recursion. Its complexity will be O(1), which in effect means organizing the calculation according to some formula, without loops or recursive calls!

No less interesting is the example of calculating the n-th Fibonacci number. In the course of studying it, it was in fact already found that the complexity of the naive recursive program is exponential, roughly O(2ⁿ). Such programs are practically unusable. It is very easy to verify this by trying to calculate the 40th Fibonacci number with it. For this reason, the following task is quite relevant.
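The program meant here is the naive doubly recursive one; a Java sketch of it (added for illustration) makes the exponential blow-up easy to reproduce, for example by calling fib(40):

public class FibRecDemo {
    // Each call spawns two further calls, so the call tree - and the
    // running time - grows exponentially, roughly O(2^n).
    static long fib(int n) {
        if (n < 2) return n;
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(fib(40)); // already takes noticeable time
    }
}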

Problem 5.4. Write a program that prints the n-th Fibonacci number and has linear complexity.

Here is a solution to this problem, in which the variables i and j contain the values of two consecutive Fibonacci numbers.

Program text

public class FibIv1 {
    public static void main(String[] args) throws Exception {
        int n = Xterm.inputInt("Enter n -> ");
        Xterm.print("f(" + n + ")");
        if (n < 0) {
            Xterm.print(" is not defined\n");
        } else if (n < 2) {
            Xterm.println(" = " + n);
        } else {
            long i = 0;
            long j = 1;
            long k;
            int m = n;
            while (--m > 0) {
                k = j;
                j += i;
                i = k;
            }
            Xterm.println(" = " + j);
        }
    }
}

The next question is quite natural - is it possible to find Fibonacci numbers even faster?

After studying certain branches of mathematics, it is quite easy to derive the following formula for the n-th Fibonacci number (Binet's formula), which is easy to check for small values:

F(n) = (φⁿ − (1 − φ)ⁿ) / √5, where φ = (1 + √5) / 2.

It might seem that, based on it, it is easy to write a program with complexity O(1) that uses neither iteration nor recursion.

Program text

public class FibIv2 {
    public static void main(String[] args) throws Exception {
        int n = Xterm.inputInt("Enter n -> ");
        double f = (1.0 + Math.sqrt(5.)) / 2.0;
        int j = (int)(Math.pow(f, n) / Math.sqrt(5.) + 0.5);
        Xterm.println("f(" + n + ") = " + j);
    }
}

This program actually calls the exponentiation function (Math.pow(f, n)), which cannot be implemented faster than in logarithmic time, O(log n). Algorithms in which the number of operations is approximately proportional to log n (in computer science it is customary not to indicate the base of the binary logarithm) are said to have logarithmic complexity, O(log n).

To calculate the n-th Fibonacci number there is such an algorithm, whose software implementation we give without additional comments - otherwise too much would have to be explained (the connection between Fibonacci numbers and the powers of a certain matrix of order two, the use of classes for working with matrices, the algorithm for quickly raising a matrix to a power).

Problem 5.5. Write a program that prints the n-th Fibonacci number and has logarithmic complexity.

Program text

public class FibIv3 {
    public static void main(String[] args) throws Exception {
        int n = Xterm.inputInt("Enter n -> ");
        Xterm.print("f(" + n + ")");
        if (n < 0) {
            Xterm.println(" is not defined");
        } else if (n < 2) {
            Xterm.println(" = " + n);
        } else {
            Matrix b = new Matrix(1, 0, 0, 1);
            Matrix c = new Matrix(1, 1, 1, 0);
            while (n > 0) {
                if ((n & 1) == 0) {
                    n >>>= 1;
                    c.square();
                } else {
                    n -= 1;
                    b.mul(c);
                }
            }
            Xterm.println(" = " + b.fib());
        }
    }
}

class Matrix {
    private long a, b, c, d;

    public Matrix(long a, long b, long c, long d) {
        this.a = a; this.b = b; this.c = c; this.d = d;
    }

    public void mul(Matrix m) {
        long a1 = a * m.a + b * m.c;
        long b1 = a * m.b + b * m.d;
        long c1 = c * m.a + d * m.c;
        long d1 = c * m.b + d * m.d;
        a = a1; b = b1; c = c1; d = d1;
    }

    public void square() {
        mul(this);
    }

    public long fib() {
        return b;
    }
}

If you try to calculate the ten millionth Fibonacci number using this and the previous programs, the difference in calculation time will be quite obvious. Unfortunately, the result will be incorrect (in both cases) due to the limited range of long numbers.

In conclusion, we present a comparison of the execution times of algorithms of varying complexity and explain why, as computer speed increases, the importance of using fast algorithms grows significantly.

Let's consider four algorithms for solving the same problem, with different complexities. Suppose the second of them requires exactly one minute to execute on some computer for a certain value of the parameter. Then, for larger parameter values, the execution times of these four algorithms on the same computer diverge dramatically: for the exponential algorithm they reach values on the order of 10^300000 years.

When novice programmers test their programs, the values of the parameters they depend on are usually small. Therefore, even if an inefficient algorithm was used, this may go unnoticed. However, if such a program is applied under real conditions, its practical unsuitability shows up immediately.

As the speed of computers increases, so do the values of the parameters for which the operation of a particular algorithm completes in acceptable time. Thus the average value of n increases, and with it the ratio of the execution times of the fast and slow algorithms. The faster the computer, the greater the relative loss from using a bad algorithm!

If there is one algorithm to solve a problem, then you can come up with many other algorithms to solve the same problem. Which algorithm is best suited to solve a particular problem? What criteria should be used to select an algorithm from among many possible ones? Typically, we make judgments about an algorithm based on its evaluation by a human performer. An algorithm seems complicated to us if, even after carefully studying it, we cannot understand what it does. We can call the algorithm complex and confusing due to the fact that it has a branched logical structure containing many condition checks and transitions. However, for a computer, executing a program that implements such an algorithm will not be difficult, since it executes one command after another, and it does not matter to the computer whether it is a multiplication operation or a condition test.

Moreover, we can write a cumbersome algorithm in which repeated actions are written out in a row (without using a cyclic structure). However, from the point of view of the computer implementation of such an algorithm, there is practically no difference whether a loop operator is used in the program (for example, the word “Hello” is displayed on the screen 10 times) or the operators for displaying the word “Hello” are written 10 times sequentially.

To evaluate the efficiency of algorithms, the concept of algorithm complexity is introduced.

Definition. The computational process generated by an algorithm is the sequence of steps passed through during the execution of this algorithm.

Definition. The complexity of an algorithm is the number of elementary steps in the computational process of this algorithm.

Note that the steps are counted in the computational process, not in the algorithm text itself. Obviously, to compare the complexity of different algorithms, the complexity must be counted in the same elementary actions.

Definition. The time complexity of an algorithm is the time T required to execute it. It equals the product of the number of elementary actions k and the average time t to perform one action: T = k·t.

Since t depends on the executor that implements the algorithm, it is natural to assume that the complexity of the algorithm depends primarily on k. Obviously, the number of operations performed depends most of all on the amount of data being processed: alphabetizing a list of 100 names requires far fewer operations than ordering a list of 100,000 names. Therefore, the complexity of an algorithm is expressed as a function of the amount of input data.

Let there be an algorithm A and a parameter n characterizing the amount of data it processes; this parameter is often called the dimension of the problem. Let us denote by T(n) the algorithm's execution time and by f some function of n.

Definition. We say that T(n) of the algorithm has growth order f(n), or, equivalently, that the algorithm has theoretical complexity Θ(f(n)), if there exist constants c1, c2 > 0 and a number n0 such that c1·f(n) <= T(n) <= c2·f(n) for all n >= n0. Here it is assumed that f(n) is non-negative, at least for n >= n0. If T(n) satisfies only the condition T(n) <= c·f(n), then the algorithm is said to have theoretical complexity O(f(n)) (read "big O of f(n)").

So, for example, an algorithm that only reads data and stores it in RAM has linear complexity O(n). The straight selection sort algorithm has quadratic complexity O(n²), since sorting any array requires (n² − n)/2 comparison operations (there may be no exchange operations at all, for example on an already ordered array; in the worst case n exchanges are needed). And the complexity of the algorithm for multiplying matrices (tables) of size n × n is already cubic, O(n³), since computing each element of the resulting matrix requires n multiplications and n − 1 additions, and there are n² such elements.
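A sketch of the straightforward cubic matrix multiplication mentioned above (illustrative Java code, not from the original text):

public class MatMulDemo {
    // Each of the n * n result elements needs n multiplications and
    // n - 1 additions, so the total work is O(n^3).
    static double[][] multiply(double[][] x, double[][] y) {
        int n = x.length;
        double[][] z = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < n; k++) {
                    z[i][j] += x[i][k] * y[k][j];
                }
            }
        }
        return z;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] c = multiply(a, b);
        System.out.println(c[0][0] + " " + c[0][1]); // 19.0 22.0
        System.out.println(c[1][0] + " " + c[1][1]); // 43.0 50.0
    }
}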

To solve the problem, algorithms of any complexity can be developed. It is logical to use the best among them, i.e. having the least complexity.

Along with complexity, an important characteristic of an algorithm is its efficiency. Efficiency means the following requirement: not only the algorithm as a whole, but each of its steps must be such that the executor can complete it in a finite, reasonable time. For example, if an algorithm that produces a weather forecast for the next day takes a week to run, then nobody needs such an algorithm, even if its complexity is minimal for the class of similar problems.

If we consider algorithms implemented on a computer, then to the requirement of execution in reasonable time is added the requirement of execution within a limited amount of random access memory.

It is known that many programming languages have no exponentiation operation, so the programmer must write such an algorithm himself. Exponentiation is implemented through multiplication operations; as the exponent grows, so does the number of multiplications, which take quite a long time to perform. Therefore, the question of building an efficient exponentiation algorithm is relevant.

Let's consider a method for quickly computing the natural power of a real number x. This method was described before our era, in Ancient India.

Let's write n in the binary number system.

Let's replace each "1" in this representation with the pair of letters KX, and each "0" with the letter K.

Let's cross out the leftmost pair KX.

The resulting string, read from left to right, gives the rule for quickly computing xⁿ, if the letter K is treated as the operation of squaring the current result, and the letter X as the operation of multiplying the current result by x. Initially the result is x.

Example 1. Raise x to the power n = 100.

Let's convert the number n to the binary number system: n = 100 (decimal) = 1100100 (binary).

We build the sequence KXKXKKKXKK.

Cross out KX on the left => KXKKKXKK.

Calculate the desired value

K => square x => we get x²
X => multiply the result by x => we get x³
K => square the result => we get x⁶
K => square the result => we get x¹²
K => square the result => we get x²⁴
X => multiply the result by x => we get x²⁵
K => square the result => we get x⁵⁰
K => square the result => we get x¹⁰⁰

Thus, we computed the hundredth power of a number in just 8 multiplications. This method is quite efficient, since it requires no additional RAM to store intermediate results. Note, however, that it is not always the fastest.
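The same idea written as code - a hedged Java sketch of binary ("fast") exponentiation. It processes the bits of n from the least significant end instead of building the K/X string, but the number of multiplications is of the same order: about log₂ n squarings plus one extra multiplication per 1-bit.

public class FastPowDemo {
    // Squares the base once per binary digit of n and multiplies it
    // into the result for every 1-bit: O(log n) multiplications.
    static double power(double x, int n) {
        double result = 1.0;
        while (n > 0) {
            if ((n & 1) == 1) {
                result *= x;   // current bit is 1: multiply result by x
            }
            x *= x;            // square the base for the next bit
            n >>>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(power(2.0, 100)); // about 1.2676506e30
    }
}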

For example, when n = 49, students can propose the following efficient exponentiation scheme: x → x² → x⁴ → x⁸ → x¹⁶ → x³² → x³²·x¹⁶ = x⁴⁸ → x⁴⁸·x = x⁴⁹.

Implementing this scheme takes 7 multiplication operations (instead of 48 when computing "head-on") and uses 3 memory cells (one for the variable x, one for the value x¹⁶ and one for the current result obtained at each step). If we use the "fast exponentiation" algorithm, we get the same number of operations (7 multiplications), but only 2 cells are needed (one for the variable x and one for the current result).

The multiplication operation itself is implemented in the processor not "head-on" either, but again through efficient recursive algorithms; you can read about one such algorithm in the Reader. Here we will look at the "fast" multiplication algorithm known back in Ancient Egypt; it is also called the "Russian" or "peasant" method. Let's look at an example of its use.

Example 2. Let's multiply 23 by 43 using the "Russian" method. We write 23 in the first column and 43 in the second; at each step the first number is doubled and the second is halved, discarding the remainder:

23    43
46    21
92    10
184    5
368    2
736    1

Answer: 23 x 43 = 23 + 46 + 184 + 736 = 989.

The total includes those numbers from the first column that stand next to odd numbers in the second column.
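A small Java sketch of the same "Russian peasant" multiplication (illustrative, not from the original text): one factor is doubled, the other halved, and the doubled values standing next to odd halves are summed.

public class PeasantMulDemo {
    static long multiply(long a, long b) {
        long sum = 0;
        while (b > 0) {
            if (b % 2 == 1) { // second column is odd: add the first column
                sum += a;
            }
            a *= 2;           // double the first column
            b /= 2;           // halve the second column, discarding the remainder
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(multiply(23, 43)); // 989
    }
}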

O(1) - Most operations in a program are performed only once or only a few times. These are algorithms of constant complexity: any algorithm that always requires the same amount of time, regardless of the data size, has constant complexity.

O(N) - The running time of a program is linear, usually when each element of the input data needs to be processed only a small, constant number of times.

O(N²), O(N³), O(Nᵃ) - Polynomial complexity.

O(N²) - quadratic complexity, O(N³) - cubic complexity.

O(log(N)) - When the running time of a program is logarithmic, the program becomes only slightly slower as N grows. This running time is usually found in programs that divide a big problem into small ones and solve them separately.

O(N*log(N)) - This is the operating time for those algorithms that divide a big problem into small ones, and then, having solved them, combine their solutions.

O(2ᴺ) - Exponential complexity. Such algorithms most often arise from an approach called brute force.

A programmer must be able to analyze algorithms and determine their complexity. The time complexity of an algorithm can be calculated based on an analysis of its control structures.

Algorithms without loops and recursive calls have constant complexity. If there is no recursion and loops, all control structures can be reduced to structures of constant complexity. Consequently, the entire algorithm is also characterized by constant complexity.

Determining the complexity of an algorithm mainly comes down to analyzing loops and recursive calls.

For example, consider an algorithm for processing array elements.
For i:=1 to N do
Begin
...
End;

The complexity of this algorithm is O(N), because the loop body is executed N times and the complexity of the loop body is O(1).

If one loop is nested within another and both loops depend on the size of the same variable, then the entire design is characterized by quadratic complexity.
For i:=1 to N do
For j:=1 to N do
Begin
...
End;
The complexity of this program is O(N²).

There exist two ways to analyze the complexity of an algorithm: bottom-up (from the inner control structures to the outer ones) and top-down (from the outer to the inner ones).


As an example of the bottom-up approach, consider an algorithm composed of fragments A through L (nested loops and conditional statements). Its analysis proceeds as follows:

O(H) = O(1) + O(1) + O(1) = O(1);
O(I) = O(N) * (O(F) + O(J)) = O(N) * O(dominant branch of the condition) = O(N);
O(G) = O(N) * (O(C) + O(I) + O(K)) = O(N) * (O(1) + O(N) + O(1)) = O(N²);
O(E) = O(N) * (O(B) + O(G) + O(L)) = O(N) * O(N²) = O(N³);
O(D) = O(A) + O(E) = O(1) + O(N³) = O(N³)

The complexity of this algorithm is O(N³).

Typically, about 90% of a program's running time is spent on repetitions, and only 10% on the actual computations. Complexity analysis shows which fragments account for these 90% - they are the loops of the greatest nesting depth. Repetitions can be organized as nested loops or as nested recursion.

The programmer can use this information to build a more efficient program as follows. First of all, one can try to reduce the nesting depth of the repetitions. Then one should consider reducing the number of statements in the most deeply nested loops. If 90% of the execution time is spent executing the inner loops, then a 30% reduction in these small sections yields a 90% * 30% = 27% reduction in the execution time of the entire program.

This is the simplest example.

A separate branch of mathematics deals with analyzing the efficiency of algorithms, and finding the optimal function is not so simple.

Let's evaluate the binary search algorithm in an array - dichotomy.

The essence of the algorithm: we go to the middle of the array and compare the key with the value of the middle element. If they do not match, we look at the relative order of the key and the middle element and then move to the lower or upper half of the list. In that half we again find the middle and again compare it with the key. If that fails, we halve the current interval once more.

function search(low, high, key: integer): integer;
var
  mid, data: integer;
begin
  search := -1;                 { returned if the key is not found }
  while low <= high do
  begin
    mid := (low + high) div 2;
    data := a[mid];             { a is the global sorted array }
    if key = data then
    begin
      search := mid;            { found: return the index }
      exit;
    end
    else
      if key < data then
        high := mid - 1
      else
        low := mid + 1;
  end;
end;

The first iteration of the loop deals with the entire list. Each subsequent iteration halves the size of the sublist. So, the list sizes for the algorithm are

n, n/2¹, n/2², n/2³, n/2⁴, ..., n/2ᵐ

Eventually there will be an integer m such that

n/2ᵐ < 2, or n < 2ᵐ⁺¹.

Since m is the first integer for which n/2ᵐ < 2, it must be true that

n/2ᵐ⁻¹ >= 2, or 2ᵐ <= n.

It follows that

2ᵐ <= n < 2ᵐ⁺¹.

Taking the logarithm base 2 of each side of the inequality, we get

m <= log₂ n < m + 1,

i.e. m is the largest integer not exceeding log₂ n.

So the complexity of binary search is O(log₂ n).

Typically the problem to be solved has a natural "size" (usually the amount of data it processes) which we call N. Ultimately, we would like to have an expression for the time it takes the program to process data of size N as a function of N. Typically we are interested in the average case - the expected running time of the program on "typical" input data, and the worst case - the expected running time of the program on the worst input data.

Some algorithms are well understood in the sense that the exact mathematical formulas for the average and worst cases are known. Such formulas are developed by carefully studying the programs to find the running time in terms of mathematical characteristics, and then performing a mathematical analysis of them.

There are several important reasons why this kind of analysis is difficult:
1. Programs written in a high-level language are translated into machine codes, and it can be difficult to understand how long it will take to execute a particular statement.
2. Many programs are very sensitive to input data, and their effectiveness can greatly depend on them. The average case may turn out to be a mathematical fiction unrelated to the data on which the program is used, and the worst case is unlikely.

Best, average and worst cases play a very big role in sorting.
[Table: amount of computation when sorting]


O-complexity analysis has become widespread in many practical applications. However, its limitations must be clearly understood.

The main disadvantages of the approach include the following:
1) for complex algorithms, obtaining O-estimates is usually either very labor-intensive or practically impossible;
2) it is often difficult to determine the complexity "on average";
3) O-estimates are too coarse to reflect finer differences between algorithms;
4) O-analysis provides too little information (or none at all) about the behavior of algorithms on small amounts of data.

Defining complexity in O-notation is far from a trivial task. In particular, the efficiency of binary search is determined not by the depth of nesting of loops, but by the method of selecting each successive attempt.

Another difficulty is defining the “average case.” This is usually quite difficult to do due to the impossibility of predicting the operating conditions of the algorithm. Sometimes an algorithm is used as a fragment of a large, complex program. Sometimes the efficiency of the hardware and/or operating system, or some compiler component, significantly affects the complexity of the algorithm. Often the same algorithm can be used in many different applications.

Because of the difficulties associated with performing an "average-case" analysis of an algorithm's time complexity, one often has to settle for worst-case and best-case estimates. These estimates essentially define the lower and upper bounds of the "average" complexity. Actually, if it is not possible to analyze the complexity of an algorithm "on average," it remains to follow one of Murphy's laws, according to which the estimate obtained for the worst case can serve as a good approximation of the complexity "on average."

Perhaps the main drawback of O-functions is their excessive coarseness. If algorithm A performs a certain task in 0.001*N s, while algorithm B requires 1000*N s to solve the same problem, then A is a million times faster than B. Nevertheless, A and B have the same time complexity O(N).

Most of this lecture was devoted to the analysis of the time complexity of algorithms. There are other forms of complexity that should not be forgotten: spatial and intellectual complexity.

Space complexity as the amount of memory required to execute a program has already been mentioned earlier.

Analyzing the intellectual complexity of an algorithm examines the understandability of algorithms and the complexity of their development.

All three forms of complexity are usually interrelated. Typically, when developing an algorithm with a good time complexity estimate, you have to sacrifice its spatial and/or intellectual complexity. For example, the quicksort algorithm is significantly faster than the selection sort algorithm. The price for increasing sorting speed is expressed in the larger amount of memory required for sorting. The need for additional memory for quicksort is associated with multiple recursive calls.

The quicksort algorithm is also characterized by greater intellectual complexity compared to the selection sort algorithm. If you ask a hundred people to sort a sequence of objects, most likely most of them will use the selection sort algorithm. It is also unlikely that any of them will use quicksort. The reasons for the greater intellectual and spatial complexity of quicksort are obvious: the algorithm is recursive, it is quite difficult to describe, and the algorithm is longer (meaning the program text) than simpler sorting algorithms.

Hello! Today's lecture will be a little different from the rest. It will differ in that it is only indirectly related to Java. However, this topic is very important for every programmer. We'll talk about algorithms. What is an algorithm? In simple terms, this is a certain sequence of actions that must be performed to achieve the desired result. We often use algorithms in everyday life. For example, every morning you are faced with a task: to come to school or work, and at the same time be:

  • Dressed
  • Clean
  • Well-fed
Which algorithm will allow you to achieve this result?
  1. Wake up to an alarm clock.
  2. Take a shower, wash your face.
  3. Prepare breakfast, make coffee/tea.
  4. Eat.
  5. If you haven’t ironed your clothes since the evening, iron them.
  6. Get dressed.
  7. Leave the house.
This sequence of actions will definitely allow you to get the desired result. In programming, the whole point of our work is to constantly solve problems. A significant part of these tasks can be performed using already known algorithms. For example, you are faced with the task: sort a list of 100 names in an array. This task is quite simple, but it can be solved in different ways. Here is one solution: Algorithm for sorting names alphabetically:
  1. Buy or download on the Internet “Dictionary of Russian Personal Names” 1966 edition.
  2. Find every name on our list in this dictionary.
  3. Write down on a piece of paper which page of the dictionary the name is on.
  4. Put the names in order using the notes on a piece of paper.
Will such a sequence of actions allow us to solve our problem? Yes, it will. Will this solution be effective? Hardly. Here we come to another very important property of algorithms - their efficiency. A problem can be solved in different ways, but both in programming and in everyday life we choose the method that will be most effective. If your task is to make a sandwich with butter, you can, of course, start by sowing wheat and milking a cow. But that would be an ineffective solution - it would take a very long time and cost a lot of money. To solve your simple problem, you can simply buy bread and butter. The algorithm with the wheat and the cow, although it solves the problem, is too complex to apply in practice. To assess the complexity of algorithms in programming, a special notation was created called Big-O ("big O"). Big-O allows you to estimate how strongly the execution time of an algorithm depends on the data passed into it.

Let's look at the simplest example - data transfer. Imagine that you need to transmit some information in the form of a file over a long distance (for example, 5000 kilometers). Which algorithm will be the most efficient? It depends on the data it has to work with. For example, suppose we have an audio file 10 megabytes in size.
In this case, the most efficient algorithm would be to transfer the file over the Internet. It will only take a couple of minutes! So, let's state our algorithm again: "If you need to transfer information in the form of files over a distance of 5000 kilometers, you should use data transmission over the Internet." Great. Now let's analyze it. Does it solve our problem? In general, yes, it completely solves it. But what can you say about its complexity? Hmm, now this is where things get interesting. The fact is that our algorithm very much depends on the incoming data, namely the size of the files. Now we have 10 megabytes and everything is fine. What if we need to transfer 500 megabytes? 20 gigabytes? 500 terabytes? 30 petabytes? Will our algorithm stop working? No, all these amounts of data can still be transferred. Will it take longer to complete? Yes, it will! Now we know an important feature of our algorithm: the larger the size of the data to be transferred, the longer the algorithm will take to complete. But we would like to understand more precisely what this relationship looks like (between the size of the data and the time it takes to transfer it). In our case, the complexity of the algorithm is linear. "Linear" means that as the volume of data increases, the time for its transmission increases approximately proportionally: if there is 2 times more data, it will take 2 times more time to transfer it; if there is 10 times more data, the transfer time will increase 10 times. Using Big-O notation, the complexity of our algorithm is O(N). This notation is worth remembering - it is always used for algorithms with linear complexity.

Please note: we are not talking here about various "variable" things: Internet speed, the power of our computer, and so on. When assessing the complexity of an algorithm, these simply make no sense to consider - we have no control over them anyway. Big-O evaluates the algorithm itself, regardless of the "environment" in which it will have to work.

Let's continue with our example. Suppose it eventually turns out that the size of the files to be transferred is 800 terabytes. If we transmit them over the Internet, the problem, of course, will be solved. There's just one catch: transmission over a standard modern link (at 100 megabits per second), which most of us use at home, will take approximately 708 days. Almost 2 years! :O So our algorithm is clearly not suitable here. We need some other solution! Suddenly, the IT giant Amazon comes to our aid! Its Amazon Snowmobile service allows you to load large amounts of data into mobile storage units and deliver them to the desired address by truck!
So we have a new algorithm! "If you need to transfer information in the form of files over a distance of 5,000 kilometers, and transferring it over the Internet would take more than 14 days, you should use Amazon truck transport." The figure of 14 days was chosen here arbitrarily: let's say this is the maximum period we can afford. Let's analyze our algorithm. What about speed? Even if the truck travels at just 50 km/h, it will cover 5,000 kilometers in only 100 hours. That's just over four days! This is much better than the Internet transmission option. What about the complexity of this algorithm? Will it also be linear, O(N)? No, it will not. After all, the truck doesn't care how much you load into it - it will still drive at approximately the same speed and arrive on time. Whether we have 800 terabytes, or 10 times more data, the truck will still get there in 5 days. In other words, the algorithm for delivering data by truck has constant complexity. "Constant" means that it does not depend on the data passed to the algorithm. Put a 1 GB flash drive in the truck - it will arrive in 5 days. Put disks with 800 terabytes of data there - it will arrive in 5 days. When using Big-O, constant complexity is denoted as O(1).

Now that we've met O(N) and O(1), let's look at more "programmer" examples :) Let's say you are given an array of 100 numbers, and the task is to output each of them to the console. You write a regular for loop that performs this task:

int[] numbers = new int[100];
// ...fill the array with numbers
for (int i : numbers) {
    System.out.println(i);
}

What is the complexity of this algorithm? Linear, O(N). The number of actions the program must perform depends on exactly how many numbers were passed into it. If there are 100 numbers in the array, there will be 100 actions (outputs to the screen). If there are 10,000 numbers in the array, 10,000 actions will need to be performed. Can our algorithm be improved? No. In any case, we will have to make N passes through the array and execute N outputs to the console.

Let's look at another example:

public static void main(String[] args) {
    LinkedList<Integer> numbers = new LinkedList<>();
    numbers.add(0, 20202);
    numbers.add(0, 123);
    numbers.add(0, 8283);
}

We have an empty LinkedList into which we insert some numbers. We need to estimate the complexity of inserting a single number into the LinkedList in our example, and how it depends on the number of elements already in the list. The answer is O(1) - constant complexity. Why? Note that each time we insert a number at the beginning of the list. In addition, as you remember, when you insert a number into a LinkedList, the elements do not move anywhere - the links are redefined (if you have forgotten how LinkedList works, take a look at one of our earlier lectures). If the first number in our list is currently x, and we insert the number y at the beginning of the list, all that is needed is:

x.previous = y;
y.previous = null;
y.next = x;

For this link redefinition it does not matter how many numbers are in the LinkedList now - one or a billion. The complexity of the algorithm is constant, O(1).

Logarithmic complexity

Don't panic! :) If the word “logarithmic” makes you want to close the lecture and not read further, wait a couple of minutes. There will be no mathematical difficulties here (there are plenty of such explanations in other places), and we will analyze all the examples “on the fingers”. Imagine that your task is to find one specific number in an array of 100 numbers. More precisely, check whether it is there at all. As soon as the required number is found, the search must be stopped, and the entry “The required number has been found!” should be displayed in the console. Its index in the array = ....” How would you solve such a problem? Here the solution is obvious: you need to iterate through the array elements one by one, starting with the first (or last) and check whether the current number coincides with the desired one. Accordingly, the number of actions directly depends on the number of elements in the array. If we have 100 numbers, then we need to go to the next element 100 times and check the number for a match 100 times. If there are 1000 numbers, then there will be 1000 check steps. This is obviously linear complexity, O(N). Now we will add one clarification to our example: the array in which you need to find a number is sorted in ascending order. Does this change anything for our task? We can still search for the desired number by brute force. But instead we can use the well-known binary search algorithm.
In the top row of the image we see a sorted array. In it we need to find the number 23. Instead of iterating over the numbers, we simply divide the array into 2 parts and check the middle number in the array. We find the number located in cell 4 and check it (second row in the picture). This number is 16, and we are looking for 23. The current number is less. What does this mean? That all the previous numbers (those located before the number 16) do not need to be checked: they will definitely be smaller than the one we are looking for, because our array is sorted! Let's continue the search among the remaining 5 elements. Please note: we have done only one check, but we have already eliminated half of the possible options. We have only 5 elements left. We repeat our step - again divide the remaining part by 2 and again take the middle element (row 3 in the figure). This number is 56, and it is larger than the one we are looking for. What does this mean? That we discard 3 more options - the number 56 itself and the two numbers after it (they are definitely greater than 23, because the array is sorted). We have only 2 numbers left to check (the last row in the figure) - the numbers with array indices 5 and 6. We check the first of them, and this is what we were looking for - the number 23! Its index = 5! Let's look at the results of our algorithm, and then we'll work out its complexity. (By the way, now you understand why it is called binary: its essence is to constantly divide the data by 2.) The result is impressive! If we were looking for the desired number using linear search, we would need 10 checks, but with binary search we did it in 3! In the worst case there would be 4, if at the last step the number we needed turned out to be the second one rather than the first. What about its complexity? This is a very interesting point :) The binary search algorithm depends much less on the number of elements in the array than the linear search algorithm (that is, simple enumeration). With 10 elements in an array, linear search needs at most 10 checks, while binary search needs at most 4. The difference is 2.5 times. But for an array of 1000 elements, linear search needs up to 1000 checks, while binary search needs only 10! The difference is already 100 times! Please note: the number of elements in the array has increased 100 times (from 10 to 1000), but the number of checks needed for binary search has increased only 2.5 times - from 4 to 10. If we get to 10,000 elements, the difference is even more impressive: 10,000 checks for linear search, and only 14 for binary. And again: the number of elements increased 1000 times (from 10 to 10,000), but the number of checks increased only 3.5 times (from 4 to 14). The complexity of the binary search algorithm is logarithmic, or, in Big-O notation, O(log n). Why is it called that? A logarithm is the inverse of raising to a power. The binary logarithm is used to calculate the power of 2. For example, we have 10,000 elements that we need to go through using binary search.
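Putting the walkthrough above into code, here is a hedged Java sketch of binary search over a sorted int array (class and method names are mine, and the array only mimics the one from the picture):

import java.util.Arrays;

public class BinarySearchDemo {
    // Returns the index of key in the sorted array a, or -1 if it is absent.
    // Every iteration halves the search interval, so at most about
    // log2(n) + 1 checks are needed: O(log n).
    static int binarySearch(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            if (a[mid] == key) {
                return mid;
            } else if (a[mid] < key) {
                low = mid + 1;   // the key can only be in the upper half
            } else {
                high = mid - 1;  // the key can only be in the lower half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 8, 16, 23, 56, 72, 91, 95};
        System.out.println(Arrays.toString(a));
        System.out.println("index of 23 = " + binarySearch(a, 23)); // 5
    }
}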
Now you have a picture before your eyes, and you know that this requires a maximum of 14 checks. But what if there is no picture before your eyes and you need to count the exact number of necessary checks? It is enough to answer a simple question: to what power should the number 2 be raised so that the result is >= the number of elements being checked? For 10,000 it is the 14th power. 2 to the 13th power is too small (8192), but 2 to the 14th power = 16384, and this number satisfies our condition (it is >= the number of elements in the array). We have found the logarithm - 14. That's exactly how many checks we need! :) Algorithms and their complexity are a topic too vast to fit into one lecture. But knowing it is very important: in many interviews you will be given algorithmic problems. For theory, I can recommend several books; a good place to start is the "Big-O" video on YouTube. See you at the next lectures! :)