I am a PHP newbie and am working on a basic form validation script. I understand that input filtering and output escaping are both vital for security reasons. My question is whe
I understand that input filtering and output escaping are both vital for security reasons.
I'd say rather that output escaping is vital for security and correctness reasons, and input filtering is potentially-useful measure for defence-in-depth and to enforce specific application rules.
The input filtering step and the output escaping step are necessarily separate concerns, and cannot be combined into one step, not least because there are many different types of output escaping, and the right one has to be chosen for each output context (eg HTML-escaping in a page, URL-escaping to make a link, SQL-escaping, and so on).
Unfortunately PHP is traditionally very hazy on these issues and so offers a bunch of mixed-message functions that are likely to mislead you.
In the example field below, the field is plain text, so all I need to do is sanitize it.
Yes. Alas, FILTER_SANITIZE_STRING
is not in any way a sane sanitiser. It completely removes some content (strip_tags
, which is itself highly non-sensible) whilst HTML-escaping other content. eg quotes turn into "
. This is a nonsense.
Instead, for input sanitisation, look at:
checking it's a valid string for the encoding you're using (hopefully UTF-8; see eg this regex for that);
removing control characters, U+0000–U+001F and U+007F–U+009F. Allow the newline through only on deliberate multi-line text fields;
removing the characters that are not suitable for use in markup;
validating the input conforms to application requirements on a field-by-field basis, for data whose content model is more specific than arbitrary text strings. Although your escaping should handle a <
character correctly, it's probably a good idea to get rid of it early in fields where it makes no sense to have one.
For the output escaping step I'd generally prefer htmlspecialchars()
to htmlentities()
, though your correct use of the UTF-8
argument stops the latter function breaking in the way it usually does.
Depending on what you want to secure, the filter you call might be overactive (see comments). Injectionwise you should be safe since you're using Prepared Statements (see this answer)
On a design note you might want to filter first, then check for empty values. Doing that you can shorten your code ;)
I understand that input filtering ... is vital for security reasons.
This is wrong statement.
Although it can be right in some circumstances, in such a generalised form it can do no good but false feeling of safety.
all I need to do is sanitize it.
There is no such thing like "general sanitizing". You have to understand each particular case and it's limitations. For example, for the database you need to use several different sanitization techniques, not one. While for the filenames it is going to be completely different one.
I am using prepared statements for my database interaction.
Thus, you should not touch the data at all. Just leave it as is.
Here is the (slightly cleaned up) code:
It seems there is some overkill in your code.
you are cleaning your HTML data twice while it is possible that you won't need it at all.
and for some reason you are raising an error on success.
I'd make it rather this way
$formerrors = '';
if ($_POST['fname'] == "") {
$formerrors .= 'Please enter a valid first name.<br/><br/>';
}
if (!$formerrors) {
$html = array();
foreach ($_POST as $key => $val) {
$html[$key] = htmlspecialchars($val,ENT_QUOTES);
}
}