PHP string comparison won't match seemingly identical string

梦毁少年i 2021-01-16 01:25

I am scraping the DOM of a static site with PHP and pulling out specific bits of data so I can put them into a database.

For this example I am storing the inner H

3 Answers
  • 2021-01-16 02:12

    The number in parentheses is the total byte count. Obviously, a 45-byte string cannot be identical to an 11-byte one.

    You can use bin2hex() to inspect the exact bytes. I also suggest you look at the output as raw source rather than as rendered HTML; in most browsers you can hit Ctrl+U to view the page source.
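
    As a minimal sketch of that inspection, the snippet below compares a scraped value with the text it renders as. The variable names and the sample markup are assumptions chosen for illustration, not the asker's actual data.

```php
<?php
// Hypothetical sample values: $scraped stands in for what the scraper returned,
// $expected for the text you see once a browser has rendered it.
$scraped  = '<td width="82" valign="top">Type</td>';
$expected = 'Type';

// The strict comparison fails because the bytes differ, even though a browser
// displays both values as "Type".
var_dump($scraped === $expected);                           // bool(false)

// strlen() counts bytes, which is the number var_dump() shows in parentheses.
echo strlen($scraped), ' vs ', strlen($expected), PHP_EOL;  // 37 vs 4

// bin2hex() exposes every byte, including markup a browser would hide.
echo bin2hex($scraped), PHP_EOL;
echo bin2hex($expected), PHP_EOL;                           // 54797065
```

    Run from the command line, or viewed with Ctrl+U, the difference in length and in the hex dump shows where the extra bytes come from.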

    Edit: asking why two given strings render the same words after being processed by a web browser is better answered by actually looking at the real raw data (as opposed to just looking at the output produced by the browser).

    Edit #2:

    var_dump( hex2bin('3c74642077696474683d223832222076616c69676e3d22746f70223e547970653c2f74643e') );
    

    ... prints this:

    string(37) "<td width="82" valign="top">Type</td>"
    

    Do you want to strip HTML tags or something? Did you see the raw HTML?
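
    If stripping the tags is indeed the goal, one possible approach is sketched below. The helper name visibleText() is hypothetical, and the sketch assumes the scraped value is a small HTML fragment encoded as UTF-8.

```php
<?php
// Hypothetical helper: reduce a scraped HTML fragment to its visible text.
function visibleText(string $html): string
{
    $text = strip_tags($html);                                           // drop the markup
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');  // turn &amp; into &
    return trim($text);                                                  // drop surrounding ASCII whitespace
}

$cell = '<td width="82" valign="top">Type</td>';
var_dump(visibleText($cell) === 'Type');   // bool(true)
```

    Note that trim() only removes ASCII whitespace by default, so a non-breaking space decoded from &nbsp; would still have to be handled separately.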

  • 2021-01-16 02:13

    You should ask why this happens:

    string(45) "Description"
    string(11) "Description"
    

    The second one is 11 bytes long, the first one is 45! Why? Because the first contains hidden (not displayed) characters or symbols. That's why these strings are not equal.

    Try this one: "Remove control characters from PHP string".
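
    A rough sketch of what removing control characters could look like follows. It is one common approach and not necessarily the one the linked question recommends; the helper name stripControlChars() is hypothetical, and the sketch only targets ASCII control characters.

```php
<?php
// Hypothetical helper: drop ASCII control characters (0x00-0x1F and 0x7F).
// This also removes tabs and newlines inside the string, which may or may
// not be what you want.
function stripControlChars(string $s): string
{
    return preg_replace('/[\x00-\x1F\x7F]/', '', $s);
}

$raw = "Description\r\n\t\0";   // 11 visible bytes plus 4 control bytes
echo strlen($raw), PHP_EOL;                             // 15
var_dump(stripControlChars($raw) === 'Description');    // bool(true)
```

    This only helps when the extra bytes really are control characters. If the hidden bytes turn out to be HTML markup, this pattern leaves them in place.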

  • 2021-01-16 02:17

    One solution is to use a regex like this:

    function clean($string) {
        $string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
        return preg_replace('/[^A-Za-z0-9\-\;\,\?\*\%\@\$\!\(\)\#\=\&]/', '', $string); // Removes every character not in the allowed set.
    }
    

    Adapt the character class to your needs: any character listed in it is kept, so add the ones you want to preserve, escaped like \# or \=, and leave out the ones you want removed.
