I am very confused over something and was wondering if someone could explain.
In PHP i validate user input so htmlentitiies, mysql_real_escape_string is used before
mysql_real_escape_string()
is all you need for the database operations. It'll ensure that a malicious user can't embed something into data that'll "break" your queries.
htmlentities()
and htmlspecialchars()
come into play when you're working with sending stuff to the client/browser. If you want to clean up potentially hostile HTML, you'd be better off using HTMLPurifier, which will strip the data to the bedrock and hose it down with bleach and rebuild it properly.
There's no reason to worry about having malicious JavaScript code in the database if you're escaping the HTML when it comes out. Just make sure you always do escape anything that comes out of the DB.
This is a long question, but I think what you're actually asking boils down to:
"Should I escape HTML before inserting it into my database, or when I go to display it?"
The generally accepted answer to this question is that you should escape the HTML (via htmlspecialchars
) when you go to display it to the user, and not before putting it into the database.
The reason is this: a database stores data. What you are putting into it is what the user typed. When you call mysql_real_escape_string
, it does not alter what is inserted into the database; it merely avoids interpreting the user's input as SQL statements. htmlspecialchars
does the same thing for HTML; when you print the user's input, it will avoid having it interpreted as HTML. If you were to call htmlspecialchars
before the insert, you are no longer being faithful.
You should always strive to have the maximum-fidelity representation you can get. Since storing the "malicious" code in your database does no harm (in fact, it saves you some space, since escaped HTML is longer than unescaped!), and you might in the future want that HTML (what if you use an XML parser on user comments, or some day let trusted users have a subset of HTML in their comments, or some such?), why not let it be?
You also ask a bit about other types of input validation (integer constraints, etc). Your database schema should enforce these, and they can also be checked at the application layer (preferably on input via JS and then again server side).
On another note, the best way to do database escaping with PHP is probably to use PDO, rather than calling mysql_real_escape_string
directly. PDO has more advanced functionality, including type checking.