I have got a square matrix consisting of elements either 1 or 0. An ith row toggle toggles all the ith row elements (1 becomes 0 and vice versa) and jth column toggle toggle
Algorithm
Simplify the problem from "Try to transform A into B" into "Try to transform M into 0", where M = A xor B. Now all the positions which must be toggled have a 1 in them.
Consider an arbitrary position in M. It is affected by exactly one column toggle and exactly one row toggle. If its initial value is V, the presence of the column toggle is C, and the presence of the row toggle is R, then the final value F is V xor C xor R. That's a very simple relationship, and it makes the problem trivial to solve.
Notice that, for each position, R = F xor V xor C = 0 xor V xor C = V xor C. If we set C then we force the value of R, and vice versa. That's awesome, because it means if I set the value of any row toggle then I will force all of the column toggles. Any one of those column toggles will force all of the row toggles. If the result is the 0 matrix, then we have a solution. We only need to try two cases!
Pseudo-code
function solve(Matrix M) as bool possible, bool[] rowToggles, bool[] colToggles:
For var b in {true, false}
colToggles = array from c in M.colRange select b xor Matrix(0, c)
rowToggles = array from r in M.rowRange select colToggles[0] xor M(r, 0)
if none from c in M.colRange, r in M.rowRange
where colToggle[c] xor rowToggle[r] xor M(r, c) != 0 then
return true, rowToggles, colToggles
end if
next var
return false, null, null
end function
Analysis
The analysis is trivial. We try two cases, within which we run along a row, then a column, then all cells. Therefore if there are r rows and c columns, meaning the matrix has size n = c * r, then the time complexity is O(2 * (c + r + c * r)) = O(c * r) = O(n). The only space we use is what is required for storing the outputs = O(c + r).
Therefore the algorithm takes time linear in the size of the matrix, and uses space linear in the size of the output. It is asymptotically optimal for obvious reasons.
Rather than look at this as a matrix problem, take the 9 bits from each array, load each of them into 2-byte size types (16 bits, which is probably the source of the arrays in the first place), then do a single XOR between the two.
(the bit order would be different depending on your type of CPU)
The first array would become: 0000000001111101 The second array would become: 0000000111110101
A single XOR would produce the output. No loops required. All you'd have to do is 'unpack' the result back into an array, if you still wanted to. You can read the bits without resorting to that, though.i