|
Comments on Christmas Coding 3 |
2010-02-01 |
|
And now to Christmas coding problem three, which asked for all the partitions of a set of 16 integers. The last few posts have been long, so this one is going to be short. As I mentioned in the problem, there is no getting around the number of partitions which need to be created, so in this case the problem doesn't have any algorithmic speedups, and is a pure implementation challenge. So, imagine if we could represent a partition as a set of bit flags, where an on bit at position k meant that the kth element of the set was in the first sub-set of the partition. To generate the partitions, all we would need to do is generate the correct set of bit flags, then map them back onto the input set. As if by magic, we can do this by generating the numbers from 0 to 2^(N-1) where N is the size of the input set. Since there are 2^N N-bit integers, why use N-1 rather than N? Because the partition (A), (B) is the same as the partition (B), (A), or using our binary representation, 01 == 10. So the problem of generating the partitions is solved for us by using the binary representation of the integers, and all that remains is to map them back onto the input set, like so:
private static void PrintPartitions(int[] ints)
{
for (int i = 0; i < Math.Pow(2, ints.Length - 1); i++)
{
string subset1 = "", subset2 = "";
for (int j = 0; j < ints.Length; j++)
{
if ((i & (1 << j)) > 0) subset1 += ints[j].ToString();
else subset2 += ints[j].ToString();
}
Console.WriteLine(subset1 + " : " + subset2);
}
}
This approach is very fast indeed. Tags:
programming |
|
|
Comments on Last Christmas Coding |
2010-02-01 |
The example I gave takes exponential time because it recalculates smaller fibonacci numbers many times in the process of working out a larger one. This is particularly wasteful since we only ever need to know the Nth and N-1th to calculate the N+1th. Here is my solution:
def fib(n):
current = 1
currentFib = 1
prevFib = 0
while current < n:
nextFib = currentFib + prevFib
prevFib = currentFib
currentFib = nextFib
current += 1
print "%d th fib is %d" % (n, currentFib)
Tags:
programming |
|
|
Comments on Christmas Coding 2 |
2010-01-19 |
|
And now onto challenge #2 - Find the rectangular sub-array with the greatest sum. This one is not as easy as the others, and I'll warn you now that this isn't the optimal solution*, it's just the solution I came up with. So, where to begin with this one? I decided to start by brute forcing the problem, calculating the sum for every sub-array exhaustively. This made me think about how to identify a sub-array, and then how to go about enumerating all of them. I decided that the best way to define a sub-array was by identifying two elements that are opposite corners of the sub-array (say the top left and bottom right, if you think of a two dimensional table representing the array). This definition in turn makes enumerating the sub-arrays easy, using some code like: for each element in array bottomright = element for each element in array <= bottomright topleft = element subarray = topleft, bottomright I am taking a liberty with the "<=" operator in my pseudocode to mean "some other element at i,j where i <= bottomright.i and j <= bottomright.j". Anyway, this sort of looping lets us identify every unique sub-array, and then it is straightforward to loop through the elements that fall inside the bounds defined by the top left and bottom right, totaling them to get the sum for that sub-array. Time for a short digression. How many rectangular sub-arrays are there in an M x N array? Well, the bottom right element of the array will be the bottom right element of M*N sub-arrays (since every element in the array could serve as the top left in our algorithm above), but no other element will be the bottom right element of that many sub-arrays. So we know that there are at least (M*N) but less than (M*N)^2, and intuitively closer to M*N. A big range! I thought about this on and off for a few days and came up with a summation I couldn't easily solve, based on the idea that the element at [i,j] would be the bottom right element of exactly i*j sub-arrays. The summation is: for every i between 1 and M, and for every j between 1 and N, sum i*j. I was able to puzzle this out eventually by looking at some examples: the number of rectangular sub-arrays of an M by N array is the sum of the first M positive integers (i.e. 1 + 2 + 3 + ... M) multiplied by the sum of the first N positive integers. Neat. OK, back to the problem. I now have a brute force solution for identifying and summing each unique sub-array. How to optimise it? The cost of summing each sub-array is significant: if the top left element is at (a,b), and the bottom right is at (c,d), then we need to sum (c - a + 1) * (d - b + 1) values. Luckily there is an easy dynamic programming trick to make the cost of summing an array constant, which relies on the order that the sub-arrays are assessed. This trick exploits the fact that that the sum of a sub-array is equal to the sum of no more than three sub-sub-arrays. For example, the the array with a top left at position (0,0) and bottom right at (3,3) could be summed by summing its sub-arrays (0,0),(3,2), (0,3)(2,3) and (3,3)(3,3). This means that rather than a large number of values in some cases, we can always sum no more than three, provided that these sub-arrays have already been summed. And we can ensure this by looping through the array indices from 0 upwards, like this:
def maxsubarray(array)
bestsub = [-1, -1, -1, -1]
bestscore = nil
if array
for i in 0...array.length
for j in 0...array[i].length
for k in 0..i
for h in 0..j
score = arraysum(array, i, j, i - k, j - h)
if !bestscore || score > bestscore
bestscore = score
bestsub = [i, j, i - k, j - h]
end
end
end
end
end
end
return bestsub
end
@scores = {}
#i,j is the lower right corner of the subarray
#and k,h is the upper left
def arraysum(array, i, j, k, h)
score = 0
key = "#{i}-#{j}-#{k}-#{h}"
if @scores.include?(key)
score = @scores[key]
elsif i == k and j == h #one element sub-array
score = array[i][j]
elsif i >= k and j >= h and
i >= 0 and j >= 0 and k >= 0 and h >= 0
score = arraysum(array, i, j - 1, k, h) +
arraysum(array, i - 1, j, k, j) + array[i][j]
@scores[key] = score
end
return score
end
On consideration, the trick of deriving the sum from the sum of some sub-sub-arrays could also be used in a top down, memoised way, rather than by dynamic programming. At any rate, this solution takes time linear to the number of sub-arrays, which is equal to the sum of the first M positive integers multiplied by the sum of the first N positive integers. As I said at the top, this isn't optimal, but not without charm. * Bentley's "Programming Pearls" sketches a faster solution, which I don't properly understand. Tags:
programming |
|
|
Comments on Christmas Coding 1 |
2010-01-09 |
|
I am going to re-visit some of the Christmas Coding challenges I posted, and explain how I approached them. These may not the best possible solutions, etc.... And so to challenge #1, which was was to find the largest 500,000 of 125 million integers, spread across five files, as quickly as possible. This can be done for not much more* than the cost of reading the files from disk by using two simple strategies:
Read and process in parallel - Each file is 100MB, which means it will take a few seconds to read it from disk, in which time some useful processing could be done. For example, once one file has been read in, it can be processed while the another file is read. Now the best way to do the IO and processing is really technology (OS, library) dependent. I worked out my solution on Windows (on OS X via VMWare, a bad setup for performance analysis!), and found that a synchronous read and then background processing with the threadpool worked best. Interestingly, starting all the reads at once using Windows overlapped IO worked better most of the time, but suffered from occasional long delays before the first read finished, which dragged down that approach's average time. Use a heap - The heap data structure is perfect for this problem. A min-heap (i.e. the top of the heap holds the smallest element) allows constant time checking of the smallest of the largest 500,000 seen so far, and deletion and insertion are very fast, even with 500,000 elements. The only trick is handling concurrent access to the heap, depending on exactly what concurrency demands "read and process in parallel" makes. Two options are to either allow one thread exclusive access to the heap while it processes an entire file, or to make individual heap operations thread safe. The finer grained concurrency of this second approach isn't as daunting as it sounds, since the heap interface is small, but I found there wasn't much difference between these two options in practice, so the simpler approach won out. * "not much more" means a few hundred milliseconds in a VM on my old laptop, which compares with some tens of seconds to read the files from disk. Tags:
programming |
|
|
Last Christmas Coding |
2010-01-04 |
|
I am going to put off writing up a commentary on my Christmas coding problems a little longer by posing one more. This problem presented itself this morning as I was rooting around in /Developer/Examples and found:
def fib(n):
if n<2:
return n
else:
return fib(n-2)+fib(n-1)
This is a an innocuous looking method to find the nth Fibonacci number. The Fibonacci numbers are the sequence of numbers where each successive number is the sum of the previous two, and the first and second numbers in the sequence are each defined as 1 to get the sequence started. Anyway, This function looks innocent enough. But try using it to calculate the 100th Fibonacci... ... I hope you didn't actually try that. Because it would take a long, long time. So my final Christmas coding challenge is to improve on this example and find the 100th fibonacci in sub-millisecond time. In fact, I'll go a little further and say that even sticking with python or similar, it should be possible to get well into the thousands in under a millisecond. Tags:
programming |
|
|
Christmas Coding 3 |
2009-12-22 |
|
This one is a lot easier than the last: Given a set of integers, find all the different partitions of the set into two sub-sets as quickly as possible. For example, given the set {x, y, z}, the partitions we want to find are: {x, y, z} and {}, {x} and {y, z}, {y} and {x, z}, {z} and {x, y}. To give you something to work to, say the set has about 16 elements. I like this one because there is no way around the exponential number of partitions, although there are certainly more or less efficient algorithms. Once we have an algorithm, the challenge is to reduce the constant factors, which in turn has some interesting challenges to do with representing partitions. Enjoy! Tags:
programming |
|
|
Christmas Coding 2 |
2009-12-18 |
|
I hope you had heaps of fun with the previous problem, now on to problem two: Given an m x n array of integers, each of which could be positive or negative or zero, find the rectangular sub-array with the greatest sum of elements.
What's a rectangular sub-array? Well, imagine the m x n array laid out in two dimensions, with rows and columns. A rectangular sub-array is any rectangular shaped portion of the m x n array. Let's take a 2 x 2 (i.e. two rows and two columns) array and find the rectangles: each individual element is, by itself, a (one element) rectangular sub array; the entire array is a 2 x 2 rectangular sub-array; each row is a 1 x 2 rectangluar sub-array; each column is a 2 x 1 rectangular sub array. Easy. For extra credit, how many rectangular sub-arrays are there in an m x n array? Tags:
programming |
|
|
Christmas Coding 1 |
2009-12-16 |
|
Most of us have a little time off over Christmas, and I thought it would be fun to pose some home-coding challenges to give you something to do while everyone else is watching The Grinch, drinking egg nog, or whatever. These aren't tricky interview type problems that require some 'aha' insight. They all have straightforward solutions which are slow. They also have faster solutions. And getting to the faster solution can be fun. I didn't invent the problems, although I have given them some thought. I don't know the best answer to all of them, maybe any of them. Tell me how you go, and maybe I will write up the results. So, on to problem one: Given 5 x 100mb files of random 4 byte integers in random order on a disk, find the largest 500,000 of these integers as quickly as possible.
I like this problem because it poses two substantial questions: Firstly, how should the files be read in from disk? Secondly, how to find the largest 500,000? It's also interesting to think about how the solution changes for finding the n greatest for smaller n, and much larger n. Enjoy! Tags:
programming |
|
|
Divisibility |
2009-11-14 |
|
2. Even numbers are divisible by 2. 3. If a number is divisible by 3, then its digit sum is also divisible by 3. So by repeated application of this digit sum rule, the digit sum of any number divisible by 3 is going to be 3, 6 or 9. 4. If half a number is even, then the number is divisible by four. 5. If a number's least significant digit is 5 or 0, it is divisible by 5. 6. A number that is divisible by 2 and divisible by 3 is also divisible by 6. Or put differently, an even number that passes the '3' repeated digit sum test is also divisible by 6. 8. If half of half of a number is even, it is divisible by eight. When dealing with numbers larger than 1000, one can ignore the thousands (and upwards), since 1000 is divisibly by 8, and just divide the least significant three digits. 9. If a number is divisible by 9, then its digit sum is also divisible by nine. Moreover, its repeated digit sum is 9. There you have it: easy divisibility tests for 2,3,4,5,6,8 and 9. Now you can annoy your friends by saying that the check can definitely be split evenly, although you have no idea what the amount is. Unless you have six friends, in which case you are stuck. Seriously, I thought about these when I came across this interview puzzler: Given a continuously streamed positive integer represented in binary, how can one test if the number is divisible by 3? The divisibility by 3 trick above gives a constant memory solution which doesn't use division. Neat. Tags:
programming |
|
|
Three Tokyo Foods |
2009-08-25 |
First up, I came across the perfect snack for a muggy Tokyo summer's day at the Ghibli Museum - a chilled cucumber rolled in salt. Deliciously cool.
Next we have the Royce chocolate coated potato chip. These taste surprisingly wonderful.
Lastly, the famous Tokyo Banana, which is really just a twinkie with some banana flavoured goo inside. Tags:
food |
|