|
| When a collection is sorted, more intelligent techniques for searching are available. One of these techniques is the binary search technique ( also called logarithmic search ).
This technique will continuously divide the search range into two parts ( therefore called BINARY search ) and will use the sort mechanism of the collections to decide which part could contain the value we're looking for.
The steps of this technique are listed below :
(1) determine the element in the middle of the array (2) check if the element in the middle contains the value we're looking for, if so, we can stop searching, if not, we go on with the next steps (3) check to see if the complete array is searched for the value, if so, we can stop searching, if not, we continue with the next step (4) determine ( based on the sort mechanism used in the array ) which part ( left of right from the middle element ) could contain the value we're looking for
Repeat steps (1), (2), (3) and (4) until the value is found, or the complete array is searched for the value.
We'll use next array to illustrate this technique : |
| index 0 1 2 3 4 5 6 7 8 9
value 5 10 15 20 25 30 35 40 45 50 |
| Suppose we need to search for value 20.
First we'll determine the position of the middle element of this array : |
| lowerbound upperbound middle index middle element
0 9 (0 + 9) \ 2 = 4 25 |
| 25 is not the value we are looking for, so well decide to ignore the right part ( right of the middle element ), and search thru the left part, because value 20 were looking for is less than the middle element 25 and can therefore only be found in that left part : |
| lowerbound (*) upperbound (*) middle index middle element
0 4 - 1 = 3 (0 + 3) \ 2 = 1 10
(*) of the search range |
| Value 20 is more than 10, so we decide to look thru the left part : |
| lowerbound (*) upperbound (*) middle index middle element
1 + 1 = 2 3 (2 + 3) \ 2 = 2 15
(*) of the search range |
| Value 20 is more than 15, so we decide to look thru the left part : |
| lowerbound (*) upperbound (*) middle index middle element
2 + 1 = 3 3 (3 + 3) \ 2 = 3 20
(*) of the search range |
| The middle element 20 is what we're looking for so we can stop searching, and conclude that the value is found at index 3.
Suppose we were looking for value 21 ( in stead of 20 ). Then 21 was more than 20, but because there are no more elements to look thru ( the search range only contained one element, which was not the value we were looking for ), we can decide to stop looking, and conclude that the value ( 21 in this case ) could not be found. |
| Module BinarySearch
Sub Main()
Dim base As Integer = 2
Dim count As Integer = 10
Dim upperbound As Integer = count - 1
Dim numbers(upperbound) As Integer
Dim index As Integer
For index = 0 To upperbound
numbers(index) = (index + 1) * base
Next
Dim number As Integer
For number = base - 1 To base * count + 1
upperbound = count - 1
Dim lowerbound As Integer = 0
Dim found As Boolean = False
Dim exhausted As Boolean = False
Do Until found OrElse exhausted
index = (lowerbound + upperbound) \ 2
found = (number = numbers(index))
exhausted = (upperbound <= lowerbound)
If Not (found OrElse exhausted) Then
If number > numbers(index) Then
lowerbound = index + 1
Else
upperbound = index - 1
End If
End If
Loop
If found Then
Console.Write(number & " found at index " & index)
Else
Console.Write(number & " not found")
End If
If exhausted Then
Console.Write(", search exhausted")
Else
Console.Write(", search not exhausted")
End If
Console.WriteLine(", last searched " & lowerbound & " to " & upperbound)
Next
Console.ReadLine()
End Sub
End Module Download Broncode |
| Output : 1 not found, search exhausted, last searched 0 to 0
2 found at index 0, search exhausted, last searched 0 to 0
3 not found, search exhausted, last searched 0 to 0
4 found at index 1, search not exhausted, last searched 0 to 3
5 not found, search exhausted, last searched 2 to 1
6 found at index 2, search not exhausted, last searched 2 to 3
7 not found, search exhausted, last searched 3 to 3
8 found at index 3, search exhausted, last searched 3 to 3
9 not found, search exhausted, last searched 3 to 3
10 found at index 4, search not exhausted, last searched 0 to 9
11 not found, search exhausted, last searched 5 to 4
12 found at index 5, search not exhausted, last searched 5 to 6
13 not found, search exhausted, last searched 6 to 6
14 found at index 6, search exhausted, last searched 6 to 6
15 not found, search exhausted, last searched 6 to 6
16 found at index 7, search not exhausted, last searched 5 to 9
17 not found, search exhausted, last searched 8 to 7
18 found at index 8, search not exhausted, last searched 8 to 9
19 not found, search exhausted, last searched 9 to 9
20 found at index 9, search exhausted, last searched 9 to 9
21 not found, search exhausted, last searched 9 to 9 |
| The more elements the collection we're searching in contains, the more interesting it is to use a binary search in stead of a linear search.
When we need to look for many values within a large unsorted collection, we could benefit by first sorting the collections before we start looking for values. If you only have 10 books ( so a small collection ) on your bookshelf, you will not benefit from sorting them alphabetically. Neither would you benefit from sorting 1000 books ( so a large collection ) when you never search for a book. You would benefit from sorting 1000 books when you often search for books. But always take in to account that a certain effort is needed to maintain the correct sequence. For instance when you add a book, you'll need to insert it on the correct position. |
Predefined Search Methods
| When the one-dimensional array is sorted we can use the shared method BinarySearch of class Array (1). This method will return the index of the searchvalue ( second argument ) within the array ( first argument ). If the searchvalue is not found this method will return -1. |
| Module BinarySearch
Sub Main()
Dim base As Integer = 2
Dim count As Integer = 10
Dim upperbound As Integer = count - 1
Dim numbers(upperbound) As Integer
Dim index As Integer
For index = 0 To upperbound
numbers(index) = (index + 1) * base
Next
Dim number As Integer
For number = base - 1 To base * count + 1
index = Array.BinarySearch(numbers, number)
Dim found As Boolean = index > -1
If found Then
Console.WriteLine(number & " found at index " & index)
Else
Console.WriteLine(number & " not found")
End If
Next
Console.ReadLine()
End Sub
End Module Download Broncode |
| Output : 1 not found
2 found at index 0
3 not found
4 found at index 1
5 not found
6 found at index 2
7 not found
8 found at index 3
9 not found
10 found at index 4
11 not found
12 found at index 5
13 not found
14 found at index 6
15 not found
16 found at index 7
17 not found
18 found at index 8
19 not found
20 found at index 9
21 not found |
Up
Exercise
| Task :
Search for the name of a city based on its zip code.
Manually code a binary search to do this. |
| Module ExerciseTask
Sub Main()
Dim count As Integer = 3
Dim names() As String = {"Brussels", "Antwerp", "Ghent"}
Dim zipCodes() As Integer = {1000, 2000, 9000}
Console.WriteLine("Zip Code ?")
Dim zipCode As String = Console.ReadLine()
Console.ReadLine()
End Sub
End Module Download Broncode |
| Output : Zip Code ?
2000
Antwerp |
| Output : Zip Code ?
4000
City not found. |
| Module ExerciseSolution
Sub Main()
Dim count As Integer = 3
Dim names() As String = {"Brussels", "Antwerp", "Ghent"}
Dim zipCodes() As Integer = {1000, 2000, 9000}
Console.WriteLine("Zip Code ?")
Dim zipCode As String = Console.ReadLine()
Dim found, exhausted As Boolean
Dim lowerbound, index As Integer
Dim upperbound As Integer = count - 1
Do Until found OrElse exhausted
index = (lowerbound + upperbound) \ 2
found = (zipCode = zipCodes(index))
exhausted = (upperbound <= lowerbound)
If Not (found OrElse exhausted) Then
If zipCode > zipCodes(index) Then
lowerbound = index + 1
Else
upperbound = index - 1
End If
End If
Loop
If found Then
Console.WriteLine(names(index))
Else
Console.WriteLine("City not found.")
End If
Console.ReadLine()
End Sub
End Module Download Broncode |
Up
Performance in Big-O-Notation
| To compare the linear and binary search we could have a look at the maximum number of search-iterations used in these techniques.
In the linear search ( with N elements ) this will be N.
In the binary search this is equal to the number of times we can divide the search range into two parts. With R being the first power of 2 above N this will be log2 R ( logarithm of base 2 ). This is why binary search is sometimes called logarithmic search. When an array has 10 elements, the first power of 2 above 10 is 16, so log2 16 ( or 4 ) is the maximum of search-iterations we'll need.
The number of search-iterations needed for the binary search is way below that of the linear search. This still doesn't mean that the binary search will always be more efficient than the linear search. One search-iteration of the binary search takes lots more effort for the processor to perform. All kinds of calculations are needed, calculate the middle index, shift the bound of the search range, ... .
How long it takes to execute a certain algorithm is hard to define. It depends on to many circumstances, for instance the platform, the processor, ... . Profile tools are the best way to check how algorithms/programs perform in a specific environment. The result of the test taken by these tools always take the environment settings ( platform, processor, ... ) into account.
Still it would be useful to compare performances of different algorithms on different runtime environments in one or the other way. This could help us decide what algorithm to chose for a specific need.
A notation often used to express the performance of an algorithm is the Big-O-Notation. This notation does not try to express the exact time it would take to execute an algorithm, but tries to express the scalability of the algorithm. It gives us an idea of how the growth of the input will effect our running-time.
For instance the instruction "x = x + 1" would take up a constant time. If it's executed several times on the same system ( under the same conditions ) it will always take up the same time to execute. This is what the call "constant running-time", expressed in Big-O with "O(1)". |
| For i = 1 to N
x = x + 1
Next |
| Where "x = x + 1" take up a constant time to execute, will here take up N times that constant time. This is what is called "linear running-time", "O(N)" in Big-O. When N multiplies 100 times, it will take 100 times longer to execute this code. There is a linear associating between the growth of N and the growth of the running-time. |
| For i = 1 to N
For j = 1 to N
x = x + 1
Next
Next |
| The instruction "x = x + 1" is now executed N by N times. The running-time is "quadratic" ( "O(N^2)" ).
The Big-O-Notation always tries to express the performance as simple as possible, therefore constant factors are ignored. "O(10N^2)" for instance can be simplified to "O(N^2)". If N is 5, N^2 is 25, if N is 30, N^2 is 900. So if N multiplies by 6 ( 30 / 5 = 6 ) then the running-time will be multiplied by 36 ( 900 / 25 = 36 ). So the running-time is quadratic, 6 ^ 2 = 36. If N is 5, 10N^2 is 250, if N is 30, 10N^2 is 9000. So if N multiplies by 6 then the running-time will be multiplied by 36 ( 9000 / 250 = 36 ). Here as well quadratic running-time is expressed, therefore constant factor 10 can be ignored.
Bases of logarithms can be left out. "O(log2 N)" expresses the same growth as "O(log3 N)" or "O(log4 N)".
If N is 1024, log2 N is 10, log4 N is 5. If N is 4096, log2 N is 12, log4 N is 6. So if N multiplies by 4 ( 4096 / 1024 = 4 ) then the running-time will be multiplied by 1.2 ( 12 / 10 = 1.2 and 6 / 5 = 1.2 ). So running-time is logarithmic, therefore "O(log N)" expresses enough.
Often ( with a large N ) smaller factors are ignored, because they are not very relevant. "O(N^2 + N)" could be expressed as "O(N^2)"/
If N is 1000, N^2 = 1 000 000 and N^2 + N = 1 001 000. If N is 3000, N^2 = 9 000 000 and N^2 + N = 9 003 000. So if N multiplies with 3 ( 3000 / 1000 = 3 ) the running-time will roughly multiply by 9 ( 9 000 000 / 1 000 000 = 9 and 9 003 000 / 1 001 000 ~= 9 ).
The higher N is, the more irrelevant the small factor "+ N" becomes, therefore it is often left out.
The performance expressed in Big-O for the above search algorithms is : - "O(N)" ( linear ) for the linear search - "O(log N)" ( logarithmic ) for the binary search
Again this does not mean that the binary search is always more efficient than the linear search. This only means that the binary search is more scalable ( will suffer less from a raising input ) than the linear search. |
|
|
|