As I understand LU factorization, it means that a matrix A can be written as A = LU for a lower-triangular matrix L and an upper-triangular matrix U.
However, the fu
To add to @DomJack: Changing the permutation (aka reordering) can also affect the number of non-zeros in the L and U factors. Thus, reordering can result in a more efficient factorization, memory-wise.
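A minimal sketch of that fill-in effect, under a few assumptions of mine: the matrix is a small "arrow" matrix, the reordering is applied to rows and columns together, and `lu_no_pivot`, `nnz` and `arrow` are hypothetical helper names for this illustration only. No pivoting is done, so it relies on the pivots being non-zero, which holds for this example.

```python
def lu_no_pivot(a):
    """Doolittle LU without pivoting. Assumes every pivot is non-zero."""
    n = len(a)
    u = [[float(x) for x in row] for row in a]
    l = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            if u[i][k] == 0.0:
                continue  # already zero: no row operation, so no fill-in
            m = u[i][k] / u[k][k]
            l[i][k] = m
            u[i][k] = 0.0
            for j in range(k + 1, n):
                u[i][j] -= m * u[k][j]
    return l, u


def nnz(m):
    """Number of non-zero entries in a dense list-of-lists matrix."""
    return sum(1 for row in m for x in row if x != 0.0)


def arrow(n):
    """Arrow matrix: dense first row and column, diagonal elsewhere."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[0][i] = a[i][0] = 1.0
        a[i][i] = 4.0
    return a


n = 5
a = arrow(n)

# Reverse the ordering of rows and columns: the dense row/column moves last.
order = list(range(n - 1, -1, -1))
a_reordered = [[a[i][j] for j in order] for i in order]

l1, u1 = lu_no_pivot(a)
l2, u2 = lu_no_pivot(a_reordered)
fill_original = nnz(l1) + nnz(u1)
fill_reordered = nnz(l2) + nnz(u2)
print(fill_original, fill_reordered)  # prints "30 18"
```

Both matrices have the same 13 non-zeros, but eliminating the dense row first fills in every entry of the factors, while eliminating it last creates no new non-zeros at all.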
Consider the Gaussian elimination process. What do you do if there's a zero on the pivot? You have to switch rows, which introduces a P matrix.
Moreover, very small non-zero pivots lead to numerical instability in floating-point arithmetic. Basic algorithms avoid this by searching the pivot column, at and below the pivot row, for the entry with the largest absolute value and swapping its row with the pivot row.
This switch can be expensive, so implementations often swap only when the largest absolute value exceeds the pivot's absolute value by some factor, e.g. 10. This reduces the number of switches, but keeps those that are necessary to limit floating-point errors.
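The procedure described above can be sketched roughly as follows. This is an illustration under my own assumptions, not how any particular library does it: `lu_threshold_pivot` and its `factor` parameter are hypothetical names, the matrix is a dense list of lists, and the permutation is returned as an index list `perm` such that row `i` of PA is row `perm[i]` of A (the 'PA = LU' convention).

```python
def lu_threshold_pivot(a, factor=1.0):
    """LU with partial pivoting; swap only if the best candidate is more
    than `factor` times larger (in absolute value) than the current pivot.

    Returns (perm, L, U) with rows of A taken in the order `perm` equal to L U.
    factor=1.0 gives ordinary partial pivoting.
    """
    n = len(a)
    u = [[float(x) for x in row] for row in a]
    l = [[0.0] * n for _ in range(n)]
    perm = list(range(n))
    for k in range(n):
        # Row at or below k holding the largest |entry| in column k.
        best = max(range(k, n), key=lambda i: abs(u[i][k]))
        if abs(u[best][k]) > factor * abs(u[k][k]):
            u[k], u[best] = u[best], u[k]
            l[k], l[best] = l[best], l[k]  # carry earlier multipliers along
            perm[k], perm[best] = perm[best], perm[k]
        l[k][k] = 1.0
        if u[k][k] == 0.0:
            continue  # column is zero at and below row k: nothing to eliminate
        for i in range(k + 1, n):
            m = u[i][k] / u[k][k]
            l[i][k] = m
            u[i][k] = 0.0
            for j in range(k + 1, n):
                u[i][j] -= m * u[k][j]
    return perm, l, u


a = [[2, 1, 1],
     [4, 3, 3],
     [8, 7, 9]]

perm_strict, _, _ = lu_threshold_pivot(a, factor=1.0)
perm_relaxed, _, _ = lu_threshold_pivot(a, factor=10.0)
print(perm_strict)   # [2, 0, 1]: two row swaps
print(perm_relaxed)  # [0, 1, 2]: no candidate was 10x larger, so no swaps
```

A zero pivot is always swapped out when a non-zero candidate exists, whatever the factor, since any non-zero value exceeds `factor * 0`.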
Search "LU factorization with partial pivoting" for any number of good resources on the issue.
Note: Since P is a permutation matrix, P^T = P^(-1). Thus, if A = PLU, then Ax = b has the same solution as LUx = P^T b. Some implementations return what you've called P, while others return what you'd call P^T and call it P, so make sure you know which one it is. It's the difference between 'PA = LU' and 'A = PLU': the P's are not the same in each case.
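To make the two conventions concrete, here is a small sketch with hand-picked factors. Everything in it is an assumption for illustration: the matrices `L`, `U` and `P` are made up, and `solve_lu`, `matmul`, `matvec` and `transpose` are hypothetical helpers, not any library's API. `P` follows the 'PA = LU' convention, so `Q = P^T` is the matrix an 'A = QLU' implementation would hand back.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def solve_lu(l, u, rhs):
    """Solve L U x = rhs by forward substitution, then back substitution."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(l[i][j] * y[j] for j in range(i))) / l[i][i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(u[i][j] * x[j] for j in range(i + 1, n))) / u[i][i]
    return x


L = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.25, 0.5, 1.0]]
U = [[4.0, 2.0, 2.0], [0.0, 2.0, 1.0], [0.0, 0.0, 2.0]]
P = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]  # 'P A = L U' convention
Q = transpose(P)                       # 'A = Q L U' convention, Q = P^T

A = matmul(Q, matmul(L, U))            # build A so both conventions hold
b = matvec(A, [1.0, 2.0, 3.0])         # right-hand side with known solution

x_right = solve_lu(L, U, matvec(P, b))  # L U x = P b = Q^T b
x_wrong = solve_lu(L, U, matvec(Q, b))  # mixing up the conventions
print(x_right)  # [1.0, 2.0, 3.0]
print(x_wrong)  # a different vector: this is not a solution of A x = b
```

A 3x3 cyclic permutation is used on purpose: a single row swap is its own transpose, so a 2x2 test would hide the mix-up.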
Not all matrices have an LU decomposition. But every square matrix has at least one row permutation with an LU decomposition.
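A tiny example of both statements, as a sketch: A = [[0, 1], [1, 0]] has no LU decomposition at all, because l11 * u11 would have to equal 0, which forces either the first row or the first column of the product LU to be zero. Swapping its two rows gives the identity, which factors trivially. The function name `lu_no_pivot` is a hypothetical helper for this illustration.

```python
def lu_no_pivot(a):
    """Doolittle LU without row swaps; raises if a zero pivot blocks it."""
    n = len(a)
    u = [[float(x) for x in row] for row in a]
    l = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            if u[i][k] == 0.0:
                continue  # nothing to eliminate in this row
            if u[k][k] == 0.0:
                raise ZeroDivisionError(f"zero pivot in column {k}")
            m = u[i][k] / u[k][k]
            l[i][k] = m
            u[i][k] = 0.0
            for j in range(k + 1, n):
                u[i][j] -= m * u[k][j]
    return l, u


a = [[0, 1],
     [1, 0]]

try:
    lu_no_pivot(a)
    factored_without_swap = True
except ZeroDivisionError:
    factored_without_swap = False

# Apply the row permutation first (swap rows 0 and 1), then factor.
pa = [a[1], a[0]]
l, u = lu_no_pivot(pa)
print(factored_without_swap)  # False
print(l, u)                   # both are the 2x2 identity
```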