How to remove htmlentities() values from the database?

星月不相逢 2021-02-08 04:42

Long before I knew anything - not that I know much even now - I designed a web app in PHP which inserted data in my MySQL database after running the values through htmlent

6 Answers
  • 2021-02-08 04:48

    It's a bit kludgy but I think the mass update is the only way to go...

    // Assumes $connection is a mysqli object; the old mysql_* functions were removed in PHP 7.
    $result = $connection->query("SELECT row_id, html_entitied_column FROM `table`");
    // A prepared statement stops quotes in the decoded text from breaking the UPDATE.
    $update = $connection->prepare("UPDATE `table` SET html_entitied_column = ? WHERE row_id = ?");
    while ($row = $result->fetch_assoc()) {
        $updatedValue = html_entity_decode($row['html_entitied_column']);
        $update->bind_param('ss', $updatedValue, $row['row_id']);
        $update->execute();
    }
    

    This is simplified, with no error handling. I'm not sure what the processing time would be on millions of rows, so you might need to break it up into chunks to avoid script timeouts.
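
    The chunking mentioned above could look roughly like this. It is only a sketch: idChunks() is a hypothetical helper, and it assumes row_id is an integer key whose lowest and highest values you have already looked up.

    ```php
    <?php
    /**
     * Yield [from, to] id ranges that cover $minId..$maxId in steps of $chunkSize.
     */
    function idChunks(int $minId, int $maxId, int $chunkSize): Generator
    {
        if ($chunkSize < 1) {
            throw new InvalidArgumentException('Chunk size must be at least 1');
        }
        for ($from = $minId; $from <= $maxId; $from += $chunkSize) {
            // The last range is clamped so it never runs past $maxId
            yield [$from, min($from + $chunkSize - 1, $maxId)];
        }
    }

    // Each range becomes one bounded SELECT, so a single run never touches the whole table.
    // For ids 1..2500 in chunks of 1000 this yields [1, 1000], [1001, 2000], [2001, 2500].
    foreach (idChunks(1, 2500, 1000) as [$from, $to]) {
        echo "SELECT row_id, html_entitied_column FROM `table` WHERE row_id BETWEEN $from AND $to\n";
    }
    ```

    Each range could also be handled by a separate script invocation, which sidesteps the timeout entirely.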

  • 2021-02-08 04:53

    This is my bulletproof version. It iterates over all tables and string columns in a database, determines the primary key(s), and performs the updates.

    Run the PHP file from the command line to get progress information.

    <?php
    $DBC = new mysqli("localhost", "user", "dbpass", "dbname");
    $DBC->set_charset("utf8");
    
    $tables = $DBC->query("SHOW FULL TABLES WHERE Table_type='BASE TABLE'");
    while($table = $tables->fetch_array()) {
        $table = $table[0];
        $columns = $DBC->query("DESCRIBE `{$table}`");
        $textFields = array();
        $primaryKeys = array();
        while($column = $columns->fetch_assoc()) {
            if ($column["Key"] == "PRI") {
                $primaryKeys[] = $column['Field'];
            } else if (strpos( $column["Type"], "char") !== false || strpos($column["Type"], "text") !== false ) {
                // matches char, varchar, text, mediumtext and so on
                $textFields[] = $column['Field'];
            }
        }
        if (!count($primaryKeys)) {
            echo "Cannot convert table without primary key: '$table'\n";
            continue;
        }
        foreach ($textFields as $textField) {
            $sql = "SELECT `".implode("`,`", $primaryKeys)."`,`$textField` from `$table` WHERE `$textField` like '%&%'";
            $candidates = $DBC->query($sql);
            // The result set is buffered, so its row count is available directly
            $rowCount = $candidates->num_rows;
            echo "Updating $rowCount in $table.$textField\n";
            $count=0;
            while($candidate = $candidates->fetch_assoc()) {
                $oldValue = $candidate[$textField];
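                // htmlentities() emits HTML 4.01 named entities by default; ENT_XML1 would only decode the five XML ones plus numeric references
                $newValue = html_entity_decode($candidate[$textField], ENT_QUOTES | ENT_HTML401, 'UTF-8');
                if ($oldValue !== $newValue) {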
                    $sql = "UPDATE `$table` SET `$textField` = '"
                        . $DBC->real_escape_string($newValue)
                        . "' WHERE ";
                    foreach ($primaryKeys as $pk) {
                        $sql .= "`$pk` = '" . $DBC->real_escape_string($candidate[$pk]) . "' AND ";
                    }
                    $sql .= "1";
                    $DBC->query($sql);
                }
                $count++;
                echo "$count / $rowCount\r";
            }
        }
    }
    ?>
    

    cheers Roland

  • 2021-02-08 04:56

    Since PHP was the method of encoding, you'll want to use it to decode. You can use html_entity_decode() to convert the entities back to their original characters. Gotta loop!

    Just be careful not to decode rows that don't need it. Not sure how you'll determine that.
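
    One possible way to tell which rows need it is to decode the value and check whether anything changed. This is a sketch, not a guarantee: needsEntityDecoding() is a hypothetical helper, and the flags assume the data was encoded with the htmlentities() default entity set plus quotes. Text that was meant to contain a literal entity will be flagged too.

    ```php
    <?php
    /**
     * True when decoding would change the value, i.e. it contains at least one entity.
     */
    function needsEntityDecoding(string $value): bool
    {
        return html_entity_decode($value, ENT_QUOTES | ENT_HTML401, 'UTF-8') !== $value;
    }

    var_dump(needsEntityDecoding('caf&eacute;')); // bool(true)
    var_dump(needsEntityDecoding('Tom & Jerry')); // bool(false): a bare ampersand is not an entity
    ```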

  • 2021-02-08 05:01

    I ended up using this, not pretty, but I'm tired, it's 2am and it did its job! (Edit: on test data)

    // $database appears to be the author's own wrapper around the mysql_* API, and
    // $encode_cutoff the timestamp before which rows were stored entity-encoded.
    // Both are defined elsewhere.
    $tables = array('users', 'users_more', 'users_extra', 'forum_posts', 'posts_edits', 'forum_threads', 'orders', 'product_comments', 'products', 'favourites', 'blocked', 'notes');
    foreach ($tables as $table) {
        $sql = "SELECT * FROM {$table} WHERE data_date_ts < '{$encode_cutoff}'";
        $rows = $database->query($sql);
        while ($row = mysql_fetch_assoc($rows)) {
            $assignments = array();
            foreach ($row as $key => $data) {
                // Skip the id (it goes in the WHERE clause) and NULLs, which would otherwise be rewritten as ''
                if ($key === 'id' || $data === null) {
                    continue;
                }
                $decoded = html_entity_decode($data, ENT_QUOTES, 'UTF-8');
                $assignments[] = $key . "='" . $database->escape_value($decoded) . "'";
            }
            if ($assignments) {
                $sql = "UPDATE {$table} SET " . implode(", ", $assignments) . " WHERE id='" . $row['id'] . "'";
                $database->query($sql);
            }
            // plus some code to check that all out
        }
    }
    
  • 2021-02-08 05:04

    I think writing a PHP script is a good thing to do in this situation. You can use, as Dave said, the html_entity_decode() function to convert your texts back.

    Try your script on a table with few entries first. This will save you a lot of testing time. Of course, remember to back up your table(s) before running the PHP script.

    I'm afraid there is no shortcut. The computation for millions of rows remains quite expensive, no matter how you convert the datasets back. So go for a PHP script; it's the easiest way.
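
    To try the conversion on a handful of rows before touching the table, a dry run along these lines may help. previewDecoding() and the sample rows are made up for illustration; in practice the rows would come from a small SELECT.

    ```php
    <?php
    /**
     * Return the rows whose column would change, without writing anything.
     * Each entry holds the id plus the stored and the decoded value.
     */
    function previewDecoding(array $rows, string $column): array
    {
        $changes = [];
        foreach ($rows as $row) {
            $old = $row[$column];
            if ($old === null) {
                continue; // leave NULLs alone
            }
            $new = html_entity_decode($old, ENT_QUOTES | ENT_HTML401, 'UTF-8');
            if ($new !== $old) {
                $changes[] = ['id' => $row['id'], 'old' => $old, 'new' => $new];
            }
        }
        return $changes;
    }

    // Hypothetical sample rows standing in for a small test table.
    $sample = [
        ['id' => 1, 'title' => 'Fish &amp; Chips'],
        ['id' => 2, 'title' => 'Plain title'],
        ['id' => 3, 'title' => null],
    ];
    print_r(previewDecoding($sample, 'title')); // only row 1 is reported
    ```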

  • 2021-02-08 05:06

    I had the exact same problem. Since I had multiple clients running the application in production, I wanted to avoid running a PHP script to clean the database for every one of them.

    I came up with a solution that is far from perfect, but does the job painlessly.

    1. Track down all the spots in your code where you use htmlentities() before inserting data, and remove those calls.
    2. Change your "display data as HTML" method to something like this:

      return htmlentities(html_entity_decode($chaine, ENT_NOQUOTES), ENT_NOQUOTES);

    The undo-redo process is kind of ridiculous, but it does the job. And your database will slowly clean itself every time users update the incorrect data.
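
    To see why a round trip can serve both old and new rows, here is a minimal sketch that decodes first and then encodes. displayAsHtml() and the sample strings are made up for illustration, and the UTF-8 charset argument is an assumption about the data.

    ```php
    <?php
    function displayAsHtml(string $stored): string
    {
        // Undo any entities already present in the stored value...
        $plain = html_entity_decode($stored, ENT_NOQUOTES, 'UTF-8');
        // ...then encode exactly once for output.
        return htmlentities($plain, ENT_NOQUOTES, 'UTF-8');
    }

    $legacy = 'caf&eacute; &lt;b&gt;'; // row saved by the old code
    $clean  = "caf\u{00E9} <b>";       // row saved after htmlentities() was removed
    var_dump(displayAsHtml($legacy) === displayAsHtml($clean)); // bool(true)
    ```

    The trade-off is that a user who really typed a literal entity into a clean row will see it rendered as the decoded character.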
