Today I’m going to talk about Binary Search, which could easily be called “divide and conquer”. It’s more efficient than linear search and is sometimes called half-interval or logarithmic search. The caveat for binary search is that it only works with indexes in an array that has already been sorted from low to high or high to low.
Binary search starts at the middle of an array and determines if the desired value is potentially at an index in the first or second half of the array. If it’s the first half, the second half of the list gets dumped and vice versa. If there are an even number of elements, we find the midpoint and subtract one.
Once that new subset is determined, the process is reset and we jump to the middle of the remaining set. Again we determine if the item we’re searching for is potentially in the first or second half of the indices remaining.
This process continues until either the item is found or the whole set has been winnowed down and the item is not found in that set at all, at which point the search fails.
Unlike simple linear search, which starts at the beginning of a list of information and goes through each item until it finds the one you’re looking for, binary search is much more efficient. An array of 1,000 indices could take as much as 1,000 iterations in linear search to arrive at a result as it has to literally go through each index in order. In comparison, a binary search through 1,000 items would at the worst, take 10 guesses because half of the potential indices are eliminated with each guess.
In mathematical terms, binary search is logarithmic (the reverse of exponential) which means that very large sets of indices can be searched quickly as each doubling of the number of items in the array only adds one guess to the total number of guesses it takes to arrive at the result.
To calculate how many guesses are required, at most, to get to the result, we use lg n where n is the number of elements in the array and round up when we get a decimal rather than a whole number. I can use Spotlight to calculate this on my Mac. To find that answer to the problem above, I type: log(1000)/log(2) which yields 9.96 and is rounded up to 10. The log(2) is there because we are calculating this number off of base 2 (the doubling effect I mentioned earlier). So it would take, at most, 10 guesses to find out if an index exists in a set of 1,000 indices.
For comparison, if we double that list of items, we find that we only have to add 1 to the number of guesses. In other words, log(2000)/log(2) = 10.55 which rounds up to 11.
With a diagram or two, I hope this powerful algorithm makes sense. I’ve also included some resources if you’d like to learn a bit more.