RSS feed

Linkedin Profile

Tags:
brisbane
bug me
economy
food
london
programming
seattle
wall art

Posts by month: 02/10 (2)
01/10 (3)
12/09 (3)
11/09 (1)
08/09 (1)
04/09 (3)
02/09 (1)
12/08 (2)
10/08 (2)
08/08 (1)
06/08 (2)
05/08 (1)
03/08 (3)
02/08 (1)
01/08 (2)
12/07 (2)
11/07 (1)
07/07 (1)
05/07 (2)
02/07 (1)
01/07 (1)
12/06 (1)
11/06 (1)
10/06 (1)
08/06 (1)
07/06 (1)
06/06 (2)
05/06 (1)
04/06 (2)
02/06 (1)
01/06 (2)
12/05 (3)
11/05 (2)
09/05 (5)
08/05 (5)
07/05 (7)
06/05 (3)
05/05 (6)
04/05 (8)
03/05 (7)
02/05 (7)
01/05 (6)
12/04 (2)
11/04 (3)
10/04 (5)
09/04 (3)
08/04 (5)
07/04 (5)
06/04 (4)
05/04 (4)
04/04 (9)
03/04 (4)
02/04 (3)
01/04 (5)
12/03 (1)
11/03 (14)
10/03 (8)


Comments on Christmas Coding 2
2010-01-19

And now onto challenge #2 - Find the rectangular sub-array with the greatest sum. This one is not as easy as the others, and I'll warn you now that this isn't the optimal solution*, it's just the solution I came up with.

So, where to begin with this one? I decided to start by brute forcing the problem, calculating the sum for every sub-array exhaustively. This made me think about how to identify a sub-array, and then how to go about enumerating all of them. I decided that the best way to define a sub-array was by identifying two elements that are opposite corners of the sub-array (say the top left and bottom right, if you think of a two dimensional table representing the array). This definition in turn makes enumerating the sub-arrays easy, using some code like:

	for each element in array
		bottomright = element
		for each element in array <= bottomright
			topleft = element
			subarray = topleft, bottomright

I am taking a liberty with the "<=" operator in my pseudocode to mean "some other element at i,j where i <= bottomright.i and j <= bottomright.j". Anyway, this sort of looping lets us identify every unique sub-array, and then it is straightforward to loop through the elements that fall inside the bounds defined by the top left and bottom right, totaling them to get the sum for that sub-array.

Time for a short digression. How many rectangular sub-arrays are there in an M x N array? Well, the bottom right element of the array will be the bottom right element of M*N sub-arrays (since every element in the array could serve as the top left in our algorithm above), but no other element will be the bottom right element of that many sub-arrays. So we know that there are at least (M*N) but less than (M*N)^2, and intuitively closer to M*N. A big range!

I thought about this on and off for a few days and came up with a summation I couldn't easily solve, based on the idea that the element at [i,j] would be the bottom right element of exactly i*j sub-arrays. The summation is: for every i between 1 and M, and for every j between 1 and N, sum i*j. I was able to puzzle this out eventually by looking at some examples: the number of rectangular sub-arrays of an M by N array is the sum of the first M positive integers (i.e. 1 + 2 + 3 + ... M) multiplied by the sum of the first N positive integers. Neat.

OK, back to the problem. I now have a brute force solution for identifying and summing each unique sub-array. How to optimise it?

The cost of summing each sub-array is significant: if the top left element is at (a,b), and the bottom right is at (c,d), then we need to sum (c - a + 1) * (d - b + 1) values. Luckily there is an easy dynamic programming trick to make the cost of summing an array constant, which relies on the order that the sub-arrays are assessed. This trick exploits the fact that that the sum of a sub-array is equal to the sum of no more than three sub-sub-arrays. For example, the the array with a top left at position (0,0) and bottom right at (3,3) could be summed by summing its sub-arrays (0,0),(3,2), (0,3)(2,3) and (3,3)(3,3). This means that rather than a large number of values in some cases, we can always sum no more than three, provided that these sub-arrays have already been summed. And we can ensure this by looping through the array indices from 0 upwards, like this:

def maxsubarray(array)
  bestsub = [-1, -1, -1, -1]
  bestscore = nil
  if array
    for i in 0...array.length
      for j in 0...array[i].length
        for k in 0..i
          for h in 0..j
            score = arraysum(array, i, j, i - k, j - h)
            if !bestscore || score > bestscore
              bestscore =  score
              bestsub = [i, j, i - k, j - h]
            end
          end
        end
      end
    end
  end
  return bestsub
end

@scores = {}

#i,j is the lower right corner of the subarray
#and k,h is the upper left
def arraysum(array, i, j, k, h)
  score = 0
  key = "#{i}-#{j}-#{k}-#{h}"
  if @scores.include?(key)
    score = @scores[key]
  elsif i == k and j == h #one element sub-array
    score = array[i][j]
  elsif i >= k and j >= h and 
	i >= 0 and j >= 0 and k >= 0 and h >= 0
    score = arraysum(array, i, j - 1, k, h) + 
      arraysum(array, i - 1, j, k, j) + array[i][j]
    @scores[key] = score
  end
  return score
end

On consideration, the trick of deriving the sum from the sum of some sub-sub-arrays could also be used in a top down, memoised way, rather than by dynamic programming. At any rate, this solution takes time linear to the number of sub-arrays, which is equal to the sum of the first M positive integers multiplied by the sum of the first N positive integers. As I said at the top, this isn't optimal, but not without charm.

* Bentley's "Programming Pearls" sketches a faster solution, which I don't properly understand.

Tags: programming

Back to weblog