Ok, this subject is a hotbed I understand that. I also understand that this situation is dependent on what you are using as code. I have three situations that need to be r
It's a very important question and it actually has a simple answer in the form of encodings. The problem you are facing it that you use a lot of languages at the same time. First you are in HTML, then in PHP and a few seconds later in SQL. All these languages have their own syntax rules.
The thing to remember is: a string should at all times be in its proper encoding.
Lets take an example. You have a HTML form and the user enters the following string into it:
I really <3 dogs & cats ;')
Upon pressing the submit button, this string is being send to your PHP script. Lets assume this is done through GET. It gets appended to the URL, which has its own syntax (the & character has special meaning for instance) so we are changing languages. This means the string must be transformed into the proper URL-encoding. In this case the browser does it, but PHP also has an urlencode
function for that.
In the PHP script, the string is stored in $_GET
, encoded as a PHP string. As long as you are coding PHP, this is perfectly fine. But now lets put the string to use in a SQL query. We change languages and syntax rules, therefore the string must be encoded as SQL through the mysql_real_escape_string
function.
At the other end, we might want to display the string back to the users again. We retrieve the string from the database and it is returned to us as a PHP string. When we want to embed it in HTML for output, we're changing languages again so we must encode our string to HTML through the htmlspecialchars
function.
Throughout the way, the string has always been in the proper encoding, which means any character the user can come up with will be dealt with accordingly. Everything should be running smooth and safe.
A thing to avoid (sometimes this is even recommended by the ignorant) is prematurely encoding your string. For instance, you could apply htmlspecialchars
to the string before putting it in the database. This way, when you retrieve the string later from the database you can stick it in the HTML no problem. Sound great? Yeah, really great until you start getting support tickets of people wondering why their PDF receipts are full of & >
junk.
In code:
form.html:
URL it generates:
http://www.example.org/form.php?comment=I%20really%20%3C3%20dogs%20&%20cats%20;')
post.php:
// Connect to database, etc....
// Place the new comment in the database
$comment = $_GET['comment']; // Comment is encoded as PHP string
// Using $comment in a SQL query, need to encode the string to SQL first!
$query = "INSERT INTO posts SET comment='". mysql_real_escape_string($comment) ."'";
mysql_query($query);
// Get list of comments from the database
$query = "SELECT comment FROM posts";
print 'Posts
';
print '';
while($post = mysql_fetch_assoc($query)) {
// Going from PHP string to HTML, need to encode!
print ''. htmlspecialchars($post['comment']) .' ';
}
print '
';
print ''