Tangible Computing
13. Searching




13.1 Bruteforce Search

One of the most common computing tasks is to find a particular object in a collection, or to determine that the object is not present. There is usually some specific feature that we are looking for, in general called the key. Many of the objects in the collection could have the same key, in which case we often want to find all of them.

If we know nothing about the way a collection is organized, the best we can do is systematically go through the collection looking for objects with the given key. Picking an object in the collection and checking that it has a matching key is called is probe. So if the collection has n objects, then we will need O(n) probes to find the matching object(s). This is obviously a problem if the collection is large - think of trying to find a phone number for a given name by having to look through all the phone records for the city.

The basic pattern of all search is this:

code/Searching/GeneralSearch/GeneralSearch.txt

    # a general structure for a seach
     
    SEARCH: while ( 1 ) {
     
        # keep searching until we find what we want or discover it
        # isn't there
     
        # stop if the thing we are looking cannot be found
        if ( search space is empty ) {
                # thing we are looking for is not present
                last SEARCH;
            }
     
        # probe the search space, that is, pick a spot in the search 
        # space and take a look
     
        # stop if we find it
        if ( thing found ) {
                # thing we are looking for is present
                last SEARCH;
            }
     
        # narrow down the search space and keep looking
     
        } 


We can improve search by putting some structure on the data, that enables us to narrow down the search at each step. The most common way of doing this is to sort the data so that the keys are in order.

13.2 Binary Search

Everyone if familiar with the childrens's game of guess my number. One player picks a number in some range, like 0 to 100, and keeps it secret. The other player attempts to discover the secret number by picking a number in the range and asking if that number is equal to, less than, or greater than the secret number. Depending on the answer, the guesser then revises their guess and keeps asking questions until the secret number is discovered. The game is made interesting by limiting the number of guesses that the guesser can make.

It is a simple and fun exercise to write a program that plays the role of the guesser. Here is a typical interaction be such a program and a human who has chosen the secret number. Lines in green are are the responses from the machine.


     Please pick an integer not less than 0, and not greater than 100
     I guess 1, am I Low, Correct, or High?
     Low
     I guess 2, am I Low, Correct, or High?
     Low
     I guess 3, am I Low, Correct, or High?
     Low
     I guess 4, am I Low, Correct, or High?
     Low
     I guess 5, am I Low, Correct, or High?
     Low
     I guess 6, am I Low, Correct, or High?
     Low
     I guess 7, am I Low, Correct, or High?
     Correct
     Yes! I win!

After playing the game a bit, most people realize that rather than doing a linear search for the number (like the example above), it is more efficient to do something akin to what is done when you search for a word in a dictionary, or a name in a phoe book. In this case you divide and conquer, by splitting the range of choices into roughly half on each guess.


     Please pick an integer not less than 0, and not greater than 100
     I guess 50, am I Low, Correct, or High?
     Low
     I guess 75, am I Low, Correct, or High?
     Low
     I guess 88, am I Low, Correct, or High?
     High
     I guess 81, am I Low, Correct, or High?
     Low
     I guess 84, am I Low, Correct, or High?
     Low
     I guess 86, am I Low, Correct, or High?
     High
     I guess 85, am I Low, Correct, or High?
     Correct
     Yes! I win!

This divide and conquer approach is the essence of a search strategy called binary search. The idea of binary search is that at each step of the search you issue a probe question that divides the range of possible future probe questions in half. Every time you probe, you either find the thing that you are looking for, or have only half as many possible places to look in.

For binary search to work, your search space has to have this ability of being subdivided roughly in 2 every time you ask a probe question.

This is the case for the guess-a-number game. The secret number you are seeking starts off being in the interval [0, 100] (that is 0 <= secret <= 100). Every time you ask a probe question, the answer gives you a new smaller range of numbers that must contain the secret number. By choosing your probe question to be a number roughly in the middle of the range, the number of possible numbers is divided by 2 every time you probe, and thus you get binary search.

So in the example above, the range is narrowed down in the following sequence

[0,100]
[51, 100]
[76, 100]
[76, 87]
[82, 87]
[82, 85]

The final guess of 85 is the number.

The only information that we need to remember as we make our probes and narrow down the range is the low and high value of the range. Our next guess is roughly in the middle of this range.

13.3 A Typical Binary Search Pattern

Here is a typical program structure for performing a binary search.

code/Searching/BinarySearch01/BinarySearch01.pde

    /*
     
    Binary search
     
    Given a list of integers of length len, and a key, determine if key is
    present in list and return the index of key.   If not present, return -1
     
    Pre Conditions:
    list is sorted in ascending order
     
    */
     
    int bsearch_int(int list[], int len, int key) {
        int low = 0;
        int high = len;
     
        int probe;
     
        /* invariant: 
            If present, key is in the index range [low, high), 
            i.e. key is one of list[low], ..., list[high-1]
     
        */
     
        /* keep searching until the search interval is empty */
        while ( low < high ) {
            /* pick a probe point in the middle of the interval
               we assume we do not get integer overflow here, which is safe
               to assume on the Arduino
     
               Invariant: low <= probe < high
            */
            probe = (high + low) / 2;
     
            if ( key == list[probe] ) {
                /* found it */
                return probe;
                }
     
            /* determine which side of probe the key might lie */
            if ( key < list[probe] ) {
                high = probe;
                }
            else {
                low = probe + 1;
                }
            
            } // keep looking
     
        /* if the search interval is empty, the key is not present */
        return -1;
        }


13.4 Performance of Binary Search

Recall that the notation log(x) means logarithms to the base 10, ln(x) means logarithms to the base e. In computing science we do many things involving powers of 2, and so we have the notation lg(n) which means logarithms to the base 2.

The relationships for logarithms and powering are given with this equation:
x = b logb(x)
where b is the base of the logarithm, logb means logarithms to the base b, and x is a positive number.

So in the case of the three common styles of logarithm we have:
x = 10 log(x)
x = e ln(x)
x = 2 lg(x)
and logb of 1 is always 0 for every base b, and logb(b) is always 1.

Note that there is a general realationship between logs of different bases. Consider taking log base c of both sides of the definition of log:
logc(x) = logc(b logb(x)) = logb(x) logc(b)
thus in general
logb(x) = logc(x) / logc(b)


If you are making successive divisions by 2 of a starting number x, then it will take about lg(x) divisions before you get a result that is less than or equal to 2. That is, the logarithm is exponentially smaller than its argument, as this table illustrates:
x number of
divisions by 2
before <=2
lg(x)
1 0 0
2 1 1
3 2 1.584
4 2 2
1024 10 10
1025 10 10.00141
2047 10 10.99930
2048 11 11
1048576 20 20


Since the number of possible places to look is divided by 2 on each probe, in general if there are n possible places to start, it will take roughly lg(n) probes to find the actual place. So if the range for the guess-a-number game is [0, 1000000] it should only take 20 probes to find the secret number. How many could it take if you used the simple approach of trying 0, 1, 2, 3, and do on?

13.5 Other Applications of Binary Search

Binary search can be used in any situation where a probe gives us information that enables us to evenly subdivide the space of possible answers. For example, the Bisection Method that you learn in calculus for locating the zero of a function is a kind of binary search.

13.6 The Searching Library

The standard C library has generic routines for binary search which use the same comparison function as for quicksort. The function is called bsearch. It will find a single matching object in an ordered list, or report that there is no such object. If there are multiple objects with the same key, it will find one of them. Finding all of them requires multiple calls, and some clever manipulation of search intervals.
void * bsearch(const void *key,
    const void *base, size_t nel, size_t width,
    int (*compar) (const void *, const void *));
The bsearch() function searches an array of nel objects, the initial member of which is pointed to by base, for a member that matches the object pointed to by key. The size (in bytes) of each member of the array is specified by width.

The contents of the array should be in ascending sorted order according to the comparison function referenced by compar. The compar routine is expected to have two arguments which point to the key object and to an array member, in that order. It should return an integer which is less than, equal to, or greater than zero if the key object is found, respec- tively, to be less than, to match, or be greater than the array member.

The bsearch() function returns a pointer to a matching member of the array, or a null pointer if no match is found. If two members compare as equal, which member is matched is unspecified.

13.7 Combining Sorting and Searching

Here is some code that combines qsort and bsearch to show how they are used.
code/Searching/BinarySearch/binarysearch.cpp

    #include <Arduino.h>
    #include <stdlib.h>
     
    // type of things to be sorted and searched
    typedef int item_t;
     
    void print_item(item_t a) {
        Serial.print(a);
        }
     
    void print_list(item_t *list, size_t len) {
        for (size_t i=0; i < len; i++) {
            print_item(list[i]);
            Serial.print(" ");
            }
        Serial.println();
        }
     
    // the comparison function for things of type item_t
    int item_compare (const item_t *a, const item_t *b) {
        // Question: what happens if we return a-b instead?
        return *a - *b;
        }
     
    // the test to see if a list of things of type item_t is in ascending order
    uint8_t is_in_order(item_t *list, size_t len) {
        for (size_t i=0; i < len-1; i++) {
            if ( item_compare(&list[i], &list[i+1]) > 0 ) {
                // out of order
                return 0;
                }
            }
        // ok, it's in order
        return 1;
        }
     
    // Question: how would you write a generic is_in_order function similar to
    // qsort and bsearch?
     
    void setup() {
        size_t n_items = 32;
        item_t list[n_items];
     
        item_t  e_first;
        item_t  e_mid;
        item_t  e_last;
     
        Serial.begin(9600);
     
        // generate a random list of item_t things
        for (size_t i=0; i < n_items; i++) {
            list[i] = random(0, 1000);
            }
     
        print_list(list, n_items);
     
        // pick the original first, mid, and last elements
        e_first = list[0];
        e_mid = list[n_items/2];
        e_last = list[n_items-1];
     
        Serial.print("first "); print_item(e_first);
        Serial.print(" mid "); print_item(e_mid);
        Serial.print(" last "); print_item(e_last);
        Serial.println();
     
        // now sort them
        qsort(list, n_items, sizeof(item_t), (__compar_fn_t) item_compare);
     
        print_list(list, n_items);
     
        // are they sorted?
        if ( is_in_order(list, n_items) ) {
            Serial.println("OK, list is in order");
            }
        else {
            Serial.println("ERROR, list is out of order");
            }
     
        // find the sorted positions of the first, mid, and last
     
        // note that bsearch returns a pointe to the item, but you can
        // get it's index by simply subtracting the base address of the list
     
        item_t *i_first = 
            (item_t *) bsearch(&e_first, list, n_items, sizeof(item_t), 
            (__compar_fn_t) item_compare);
     
        item_t *i_mid = 
            (item_t *) bsearch(&e_mid, list, n_items, sizeof(item_t), 
            (__compar_fn_t) item_compare);
     
        item_t *i_last = 
            (item_t *) bsearch(&e_last, list, n_items, sizeof(item_t), 
            (__compar_fn_t) item_compare);
     
        // note the subtraction to get the index
     
        Serial.print("i_first "); Serial.print(i_first - list);
        Serial.print(" i_mid "); Serial.print(i_mid - list);
        Serial.print(" i_last "); Serial.print(i_last - list);
        Serial.println();
     
        // now let's look for something that is not there
        item_t missing = -1;
     
        item_t *i_missing = 
            (item_t *) bsearch(&missing, list, n_items, sizeof(item_t), 
            (__compar_fn_t) item_compare);
        Serial.print("Result of search for "); print_item(missing); 
        Serial.print(" is " ); Serial.print((int) i_missing);
        Serial.println();
     
     
        }
     
    void loop() {
        }


The full code, with Makefile is here code/Searching/BinarySearch

Question: Suppose that we change the type of item to be manipulated to be this:
// type of things to be sorted and searched
typedef struct {
     int16_t x;
     int16_t y;
} item_t;
Now suppose that the ordering rule is as follows: that you order by element x first, and if the x values match, then you refine the order by comparing the y value. So for example (1,2) < (2,3) and (1,2) < (1,3).

Modify the above example to work with this new type of data. To ensure that you get some matching x values, chose them randomly from a small range, say 0 to 4.
13. Searching
Tangible Computing / Version 3.20 2013-03-25