CISC-121
20130205
Searching and Recursion
Python offers the built-in in
test, so that determining whether or not a particular value is in a
list appears to be a one-step operation. This is an illusion.
Today we will explore the problem of searching a list for a
particular value and returning its location (if it is there).
Let my_list be a list, and let target be a value that may or may not be in my_list.
If we don't have any information about my_list then target could be in any position - this means that we may have to compare target to every value in my_list before we find it (and if it is not there, we will look at every value in my_list before we are sure target is not there).
The
following function searches a list for a target value. If the
value is found, its position is returned. If the value is not
found, -1 is returned.
def search(a_list, target):
for i in range(len(a_list)):
if a_list[i] == target:
return i
return -1
But ... suppose we know that my_list
is sorted in ascending order. (Python offers built-in sort
functions and methods - for now we will use these, but soon we will
examine the sorting process in more detail and develop our own sort
functions.)
We can take advantage of the fact that my_list is sorted. Suppose we compare the value of my_list[i] to target, for some value of i. If my_list[i] is too small, then all values in my_list[:i] are also too small - we can eliminate them without looking at them. Similarly if my_list[i] is too large, then all of my_list[i:] can be eliminated from the search.
So if we compare target to the middle value in my_list, there are three possible results
- they are equal - we have found the location of target in my_list
- the middle value in my_list is too small, so we can ignore the left half of my_list
- the middle value in my_list is too big, so we can ignore the right half of my_list
And the clever idea here is that if we didn't find target yet, we can continue by applying exactly the same idea to whichever half of the list is not eliminated. This is called binary search because it effectively divides the size of the list by 2 whenever we make a comparison between target and an element of mylist.
Here's how this might look in Python, using variables start and end to keep track of the part of the list we are still interested in:
def bin_search(my_list,target):
# returns the position of target in my_list, if it is there
# returns -1 if it is not there
first = 0
end = len(my_list)-1
while start <= end:
if start == end:
if my_list[start] == target:
return start
else:
return -1
mid = (start + end)/2 # in Python 2.7 this is integer division, so it automatically rounds down if necessary
if my_list[mid] == target:
return mid
elif my_list[mid] < target:
start = mid + 1
else:
end = mid - 1
You should run this a few times until you are sure
you understand how it works.
Now that we understand binary search, we can use it to illustrate one of the most important ideas in computing: recursion.
Recursion is what happens when a function calls itself.
This may seem like a crazy idea ... if the function calls itself,
then won't it just call itself again, and again, and again, and so on
forever?
Well that can happen (not surprisingly, it is called infinite recursion) but with a little bit of care, we can prevent it. The trick is to identify some situation (the base case)
in which we can solve our problem easily, and then make sure that every
time we call the function again we are closer to the situation that we
know how to solve.
In the case of binary search, the base case is easy to identify: When start > end, (ie when we are down to zero positions where target might be hiding) we just return -1
This is important: every recursive function must have one or more base cases.
In
a recursive function, we first test to see if we have reached the base
case. If we have, we solve the problem on the base case and
return the result. If we have not reached the base case we do
whatever is necessary to reduce the problem to something that is closer
to the base case, and we call the function again.
Here's a recursive version of binary search:
def rec_bin_search(my_list,target,start,end):
# returns the position of target in my_list, if it is there
# returns -1 if it is not there
print "start = ",start
print "end = ",end
if start > end:
return -1
else:
mid = (start + end)/2
# in Python 2.7 this is integer division, so it
automatically rounds down if necessary
if my_list[mid] == target:
return mid
elif my_list[mid] > target:
return rec_bin_search(my_list,target,start,mid-1)
else:
return rec_bin_search(my_list,target,mid+1,end)
Once
again, I recommend that you run this program a
few times and examine the output to get a deeper understanding of how
this works. Observe that the recursive form makes exactly the
same sequence of comparisons as the non-recursive version.
So if
the recursive version does exactly the same thing as the non-recursive
version, why bother with it? One answer relates to one of the
most important things that we do as computer scientists: we prove that
our programs are correct (ie they always give the right answer).
It turns out that proving the correctness of recursive functions
is often very easy.
But there's another answer: thinking
recursively is one of the things that computer scientists learn to do
instinctively, and it seems to be a style of thinking that sets us
apart from most other (normal?) people. Recursive functions often
have a simplicity and elegance that non-recursive functions lack.
How
important are "simplicity" and "elegance"? In some ways, not
important at all. One of our main goals as computer scientists is
to produce provably correct programs that run as fast as possible.
And it is true that recursive functions usually run a little bit
slower than non-recursive functions (calling a function requires adding
a new frame to the frame-stack, as we have discussed in class - this
takes time). And it is also true that it is possible (though
often difficult) to prove the correctness of non-recursive functions.
So it can be argued that non-recursive programs win on speed,
and (sometimes) tie on provable correctness.
But that's not our
only goal. We write programs in the knowledge that they will not
only be read by the computer, but also by other human beings. We
want our programs to be understood by human readers - this is why we
choose meaningful variable names, document our code, and take pains
over the layout of the program (the computer doesn't care about these
things). This begins to touch upon my current area of research:
the relationship between good programming and good story-telling.
For now though, I will just point out that "simple" and "elegant"
programs are often easier for human readers to understand - and that is
a valid reason to design recursive solutions to problems.