I've tackled the same problem and found that there's no way to determine the encoding of some content without metadata about it. That's why I ended up with the same approach you're trying here.
My only additional advice beyond what you've done is to order your list of candidate encodings by specificity rather than by likelihood. I've found that certain character sets are subsets of others, so if you check utf_8 as your second choice, it will successfully decode the data first and you'll never detect any subset of utf_8 that comes later in the list (I think one of the Korean character sets uses the same number space as UTF-8).
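To make the ordering point concrete, here is a minimal sketch of the try-each-encoding approach in Python. The helper `guess_decode` and the `CANDIDATES` list are hypothetical names for illustration, and the three codecs are an assumed example (ascii is a strict subset of UTF-8, and latin_1 maps every byte, so it never fails). Keep in mind that a successful decode only shows the bytes are *valid* in that encoding, not that it's the right one.

```python
def guess_decode(data: bytes, encodings):
    """Return (text, encoding) for the first codec in `encodings` that decodes `data` without error."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            # Not valid in this encoding; fall through to the next, less specific one.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Assumed example list, ordered from most specific to least specific:
# ascii is a subset of utf_8, and latin_1 accepts any byte sequence,
# so it only makes sense as the last-resort fallback.
CANDIDATES = ["ascii", "utf_8", "latin_1"]

print(guess_decode(b"hello", CANDIDATES))                 # ('hello', 'ascii')
print(guess_decode("café".encode("utf_8"), CANDIDATES))   # ('café', 'utf_8')
print(guess_decode(b"caf\xe9", CANDIDATES))               # ('café', 'latin_1')

# With the less specific codec first, the subset is never reported:
print(guess_decode(b"hello", ["utf_8", "ascii"]))         # ('hello', 'utf_8')
```

The last call shows the problem: pure-ASCII input is valid UTF-8, so putting utf_8 ahead of ascii means ascii is never reported.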