#345 - Matrix Sum
We define the Matrix Sum of a matrix as the maximum possible sum of matrix elements such that none of the selected elements share the same row or column.
For example, the Matrix Sum of the matrix below equals 3315 (= 863 + 383 + 343 + 959 + 767):
7 53 183 439 863 497 383 563 79 973 287 63 343 169 583 627 343 773 959 943 767 473 103 699 303Find the Matrix Sum of:
7 53 183 439 863 497 383 563 79 973 287 63 343 169 583 627 343 773 959 943 767 473 103 699 303 957 703 583 639 913 447 283 463 29 23 487 463 993 119 883 327 493 423 159 743 217 623 3 399 853 407 103 983 89 463 290 516 212 462 350 960 376 682 962 300 780 486 502 912 800 250 346 172 812 350 870 456 192 162 593 473 915 45 989 873 823 965 425 329 803 973 965 905 919 133 673 665 235 509 613 673 815 165 992 326 322 148 972 962 286 255 941 541 265 323 925 281 601 95 973 445 721 11 525 473 65 511 164 138 672 18 428 154 448 848 414 456 310 312 798 104 566 520 302 248 694 976 430 392 198 184 829 373 181 631 101 969 613 840 740 778 458 284 760 390 821 461 843 513 17 901 711 993 293 157 274 94 192 156 574 34 124 4 878 450 476 712 914 838 669 875 299 823 329 699 815 559 813 459 522 788 168 586 966 232 308 833 251 631 107 813 883 451 509 615 77 281 613 459 205 380 274 302 35 805
Introduction
A standard first step in solving the problem is to try to simplify it, either by removing restrictions or considering a smaller version. Imagine if the restriction of having each element be in exactly in one row and column was absent. What would be the maximum sum possible? We would simply select the maximum value in each row/column and that would be our answer. Therefore, the actual Matrix sum is smaller than this theoretical maximum sum. We need to find a way to choose other elements such that it decreases the theoretical maximum as little as possible.
For example, the theoretical maximum sum in the 5-by-5 example is $767 + 473 + 563 + 959 + 973 = 3735$. The actual answer (3315) is slightly less than this, as we would need to change our number selection. Intuitively, we want to decrease the sum by the smallest amount, and the algorithm we will use to formalize this is called the Hungarian Algorithm.
Hungarian Algorithm
The Hungarian Algorithm is an optimization algorithm that solves the assignment problem in polynomial time. It primarily operates on finding the minimum assignment. For example, if we have 3 people and need to assign 3 different jobs to them, where each person asks for a different price for each price, how do we find the assignment of jobs to people that results in the lowest total cost?
An equivalent statement is to find the minimum weight bipartite graph matching between two sets of vertices. One set is the set of people $P$, and the second set is the set of jobs $J$. An edge between $p\in P$ and $j\in J$ represents the cost of job $j$ to person $p$.
If we want to apply this algorithm to finding the maximum, then we negate the entire matrix and apply the same steps. The steps of the algorithm reflect the intuition of finding the smallest decrease in total sum in the case of collisions e.g. in the 5-by-5 example, 473 is the highest value in the second column, but we needed to choose 383 (the second highest value) in order to find the optimal solution. The Hungarian Algorithm finds these secondary selections systematically.
Since we have a matrix of values, I will go through the algorithm using the steps shown in this section.
Example
Step 0
The answer asks us to find the maximum, but the algorithm only works on minimizing. Thus, we will negate the entire matrix. \[\begin{bmatrix} -7 & -53 & -183 & -439 & -863 \\ -497 & -383 & -563 & -79 & -973 \\ -287 & -63 & -343 & -169 & -583 \\ -627 & -343 & -773 & -959 & -943 \\ -767 & -473 & -103 & -699 & -303 \end{bmatrix}\]
Step 1
The first step (after negating the matrix) is to subtract the minimum value from each row and column. \[\begin{bmatrix} 856 & 516 & 494 & 424 & 0 \\ 476 & 296 & 224 & 894 & 0 \\ 296 & 226 & 54 & 414 & 0 \\ 332 & 322 & 0 & 0 & 16 \\ 0 & 0 & 478 & 68 & 464 \end{bmatrix}\]
Intuitively, the zeroes represent the minimum points in each row and column, and the nonzero values represent the “extra cost” of switching your selection to that value. For example, the bottom left 0 corresponds to the value of 767 in the original matrix. If we instead picked the value 4 spots to the left to be in our sum (699) this represents a 68 increase in the “cost” i.e. our maximum sum decreased by 68.
Since we subtracted the minimum from each row and column, the corresponding intuition is slightly complicated, but the information is still neatly baked into this new matrix.
Through multiple refinements of this matrix, our ultimate goal is to find a single zero in each row and column. The locations of these zeroes will tell us which values in the original matrix correspond to our sum.
Step 2
Create an assignment of zeroes. The only restriction is that there should not be more than one assigned zero in each row and column. For example, we can do the following: \[\begin{bmatrix} 856 & 516 & 494 & 424 & \color{red} 0 \\ 476 & 296 & 224 & 894 & 0 \\ 296 & 226 & 54 & 414 & 0 \\ 332 & 322 & \color{red} 0 & 0 & 16 \\ \color{red} 0 & 0 & 478 & 68 & 464 \end{bmatrix}\]
Next, we cover all zeroes using the minimum number of lines. There are multiple ways to do this, and the Wikipedia article gives good detail into a method. Using this method, the last column, and the fourth and fifth rows get covered (in green). The bolded elements are the ones which are covered twice (we will need these for the next step). \[\begin{bmatrix} 856 & 516 & 494 & 424 & \color{red} 0 \\ 476 & 296 & 224 & 894 & \color{green} 0 \\ 296 & 226 & 54 & 414 & \color{green} 0 \\ \color{green}332 & \color{green}322 & \color{red}0 & \color{green}0 & \color{green} \mathbf{16} \\ \color{red}0 & \color{green}0 & \color{green}478 & \color{green}68 & \color{green} \mathbf{464} \end{bmatrix}\]
Notice we are still saving the existing assignment.
Step 3
If we do not have a valid assignment of at least $n$ zeroes, then find the smallest value which is uncovered, and subtract this from all uncovered elements. Additionally, add this value to any elements which are covered by two lines.
In our matrix, the smallest uncovered value is 54. We subtract this from all elements in black. We also add 54 to the two elements that are covered twice (the green bolded numbers). Thus, our resulting matrix is \[\begin{bmatrix} 802 & 462 & 440 & 370 & 0 \\ 422 & 242 & 170 & 840 & 0 \\ 242 & 172 & 0 & 360 & 0 \\ 332 & 322 & 0 & 0 & 70 \\ 0 & 0 & 478 & 68 & 518 \end{bmatrix}\]
This step is equivalent to solving a single collision, because this step generates an additional zero.
Then we go back to step 2, and repeat.
Repeat
When we repeat, we preserve the original assignment of zeroes from the previous step. However, in the process of finding the minimum line covering, a new assignment also gets generated (see step 4 in the article). In this case, the last row, and the third, fourth, and fifth columns get covered, and the following is our new assignment. \[\begin{bmatrix} 802 & 462 & \color{green} 440 & \color{green}370 & \color{red}0 \\ 422 & 242 & \color{green}170 & \color{green}840 & \color{green}0 \\ 242 & 172 & \color{red}0 & \color{green}360 & \color{green}0 \\ 332 & 322 & \color{green}0 & \color{red}0 & \color{green}70 \\ \color{red}0 & \color{green}0 & \color{green}\mathbf{478} & \color{green}\mathbf{68} & \color{green}\mathbf{518} \end{bmatrix}\]
Once again, we find the smallest uncovered value (172), and subtract from all uncovered values, and add this to the 3 values that are double-covered. \[\begin{bmatrix} 630 & 290 & 440 & 370 & 0 \\ 250 & 70 & 170 & 840 & 0 \\ 70 & 0 & 0 & 360 & 0 \\ 160 & 150 & 0 & 0 & 70 \\ 0 & 0 & 650 & 240 & 690 \end{bmatrix}\]
Again repeat
The assignment does not change, and the minimum covering results in the last column, and the third, fourth, and fifth columns getting covered. \[\begin{bmatrix} 630 & 290 & 440 & 370 & \color{red}0 \\ 250 & 70 & 170 & 840 & \color{green}0 \\ \color{green}70 & \color{green}0 & \color{red}0 & \color{green}360 & \color{green}0 \\ \color{green}160 & \color{green}150 & \color{green}0 & \color{red}0 & \color{green}70 \\ \color{red}0 & \color{green}0 & \color{green}650 & \color{green}240 & \color{green}690 \end{bmatrix}\]
The smallest uncovered value is 70: \[\begin{bmatrix} 560 & 220 & 370 & 300 & 0 \\ 180 & 0 & 100 & 770 & 0 \\ 70 & 0 & 0 & 360 & 0 \\ 160 & 150 & 0 & 0 & 70 \\ 0 & 0 & 650 & 240 & 690 \end{bmatrix}\]
Repeat once more
At this point, when we find the minimum covering, we see that we need five lines to do so. This simultaneously means that there is an assignment such that there is exactly one assigned zero in each row and column. The location of these zeroes give us the answer: \[\begin{bmatrix} 560 & 220 & 370 & 300 & \color{red}0 \\ 180 & \color{red} 0 & 100 & 770 & 0 \\ 70 & 0 & \color{red} 0 & 360 & 0 \\ 160 & 150 & 0 & \color{red} 0 & 70 \\ \color{red} 0 & 0 & 650 & 240 & 690 \end{bmatrix}\]
Notice these match up exactly with the matrix sum in the problem above. At this point, we are done!
Code
In terms of code, each step maps cleanly to a step we can implement. The most complicated step is finding the minimum covering. We create two separate arrays to keep track which rows and columns are covered. We also have assign_zeroes
and find_uncovered_zero
which creates an assignment of zeroes and finds the first uncovered zero respectively. Our matrix is an np.ndarray
, which allows for some convenient indexing. The entire logic is put in a loop, because we need to keep looping until we find an assignment with the proper number of zeroes. Finally, the large matrix is kept in a space-delimited text file, which is read directly and put into a matrix.
def assign_zeroes(data: np.ndarray):
# Generates an "assignment" of zeroes based on the following rule.
# An assigned zero cannot be in the same row or column as
# another assigned zero.
assigned = []
cols_available = list(range(data.shape[1]))
for r in range(data.shape[0]):
for c in cols_available:
if data[r, c] == 0:
assigned.append((r, c))
cols_available.remove(c)
break
return assigned
def find_uncovered_zero(data: np.ndarray, row_cover_mask, col_cover_mask):
for r in np.where(~row_cover_mask)[0]:
for c in np.where(~col_cover_mask)[0]:
if data[r, c] == 0:
return r, c
return None
def hungarian(data: np.ndarray):
# Subtract the minimum from each row
data -= np.min(data, axis=1)[:, np.newaxis]
# Subtract the minimum from each column (transpose and repeat above)
data = (data.T - np.min(data, axis=0)[:, np.newaxis]).T
# Go through each row, and "assign" the zeroes.
# Assignments can't be in the same row or column as the others
# Save the locations as tuples...
assigned = np.asarray(assign_zeroes(data))
# We keep refining until the number of assigned zeros reaches
# the number of columns
while len(assigned) < data.shape[1]:
# There is a chance that this isn't the best assignment.
# So we will refine this further. We keep refining until
# we have a complete covering.
# The first basic covering is to cover each column which
# has a 0.
row_cover_mask = np.zeros(data.shape[0], dtype=bool)
col_cover_mask = np.zeros(data.shape[1], dtype=bool)
# Initialize just using columns to attempt to cover 0s.
col_cover_mask[[c for _, c in assigned]] = True
primed_zeros = []
# We try to cover ALL the zeroes using the fewest number of lines.
# During the process, it is possible that we end up assigning one more zero.
while np.any(data[:, ~col_cover_mask][~row_cover_mask] == 0):
# Find an uncovered zero and prime it.
uncover_r, uncover_c = find_uncovered_zero(data, row_cover_mask, col_cover_mask)
primed_zeros.append((uncover_r, uncover_c))
# If the zero is on the same row as an assigned zero, then we cover the row, and uncover
# the column of the assigned zero.
if uncover_r in assigned[:, 0]:
row_cover_mask[uncover_r] = True
col_cover_mask[assigned[np.where(assigned[:, 0] == uncover_r)][0, 1]] = False
continue
else:
# This means the uncovered zero doesn't have an assigned zero on the row.
# Find an assigned zero in the same column,
# then find a primed zero in the same row as the assigned zero.
# Keep going until we don't find an assigned zero in the column,
# and keep track of the path...
curr_r, curr_c = uncover_r, uncover_c
assigned = np.append(assigned, [[curr_r, curr_c]], axis=0)
primed_zeros.remove((curr_r, curr_c))
while curr_c in assigned[:-1, 1]:
# Get the row of the assigned zero in this column (don't include the
# location we just assigned)
loc = np.where(assigned[:-1, 1] == curr_c)[0][0]
# Extract the location
assigned_r, assigned_c = assigned[loc, 0], assigned[loc, 1]
# This assigned zero becomes a primed zero, so delete it
# from the assigned list.
assigned = np.delete(assigned, loc, axis=0)
# Find the primed zero on this row and turn this into
# an assigned zero, and set the current column accordingly...
primed_r, primed_c = [(r, c) for r, c in primed_zeros if r == assigned_r][0]
primed_zeros.append((assigned_r, assigned_c))
primed_zeros.remove((primed_r, primed_c))
assigned = np.append(assigned, [[primed_r, primed_c]], axis=0)
curr_c = primed_c
# In this process, we have assigned one more zero.
# Get rid of all the primed zeros, and recover the lines.
primed_zeros = []
row_cover_mask = np.zeros(data.shape[0], dtype=bool)
col_cover_mask = np.zeros(data.shape[1], dtype=bool)
col_cover_mask[[c for _, c in assigned]] = True
# If we ended up assigning one more zero, and this led us to have the correct number,
# then break out of the loop, we have our answer.
if len(assigned) == data.shape[1]:
break
# Otherwise, we find the least element that is NOT covered by a line, and subtract
# it from each element NOT covered by a line, and add it to each element covered
# by TWO lines. Then we repeat the process.
min_value = np.min(data[~row_cover_mask][:, ~col_cover_mask])
row_lines = np.where(~row_cover_mask)[0]
col_lines = np.where(~col_cover_mask)[0]
# Use an inverse zip with product to correctly format the locations where we need to subtract
rows, cols = zip(*[prod for prod in product(row_lines, col_lines)])
data[rows, cols] -= min_value
# Use product to pair the locations where lines cross
for r, c in product(np.where(row_cover_mask)[0], np.where(col_cover_mask)[0]):
data[r, c] += min_value
return assigned
data = np.loadtxt('./problem345.txt', dtype=np.int32)
sum_path = hungarian(-1 * data)
print(sum_path)
rows, cols = zip(*sum_path)
print(data[rows, cols])
# Index into the array and simply return the sum
matrix_sum = sum(data[rows, cols])
print('Maximum sum:', matrix_sum)
Running this long code, we get
[[ 0 9]
[ 2 7]
[ 6 13]
[ 7 2]
[ 8 14]
[ 9 11]
[10 6]
[11 5]
[12 12]
[14 1]
[ 3 4]
[13 8]
[ 5 0]
[ 4 3]
[ 1 10]]
[973 993 992 972 848 976 969 901 823 883 853 966 870 962 957]
Maximum sum: 13938
0.013088769000034972 seconds.
Therefore, the matrix sum for the large matrix is 13938. This was a good exercise in converting an algorithm into a program.