I am scraping the DOM of a static site with PHP and pulling out specific bits of data so I can put them into a database.
For this example I am storing the inner H
The number in parentheses is the total byte count. Obviously, a 45-byte string cannot be identical to an 11-byte one.
You can use bin2hex() to inspect the exact bytes. I also suggest you don't look at the output as rendered HTML. In most browsers you can hit Ctrl+U to view the page source instead.
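Here is a minimal sketch of that kind of inspection. The two values are hypothetical stand-ins (I don't know your actual strings): $scraped plays the role of what the scraper returned and $expected the literal you compare it against.

```php
<?php
// Hypothetical values for illustration only, not taken from the question.
$scraped  = '<td>Type</td>'; // what the scraper might really hold
$expected = 'Type';          // what the browser makes it look like

var_dump($scraped === $expected);                          // bool(false)
echo strlen($scraped), ' vs ', strlen($expected), PHP_EOL; // 13 vs 4

// bin2hex() shows every byte, including markup and invisible characters.
echo bin2hex($scraped), PHP_EOL;  // 3c74643e547970653c2f74643e
echo bin2hex($expected), PHP_EOL; // 54797065
```

Run it from the command line, or wrap the output in htmlspecialchars(), so the browser doesn't swallow the tags again.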
Edit: the question of why two strings render as the same words in a web browser is best answered by looking at the actual raw data, not at the output the browser produces from it.
Edit #2:
var_dump( hex2bin('3c74642077696474683d223832222076616c69676e3d22746f70223e547970653c2f74643e') );
... prints this:
string(37) "<td width="82" valign="top">Type</td>"
Do you want to strip HTML tags or something? Did you see the raw HTML?
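If stripping the markup is what you're after, something along these lines might do. It's only a sketch: I'm assuming you want the bare cell text, and $raw is just the value decoded in Edit #2.

```php
<?php
// $raw is the string from Edit #2; that strip_tags() fits your goal is an assumption.
$raw = hex2bin('3c74642077696474683d223832222076616c69676e3d22746f70223e547970653c2f74643e');

// Remove the markup, then trim any surrounding whitespace.
$text = trim(strip_tags($raw));

var_dump($text); // string(4) "Type"
```

Since you're already working with the DOM, reading the node's textContent property rather than its serialized HTML should get you the same text without a separate strip_tags() pass.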