My website was recently attacked by, what seemed to me as, an innocent code:
Another way to sanitize the input is to make sure that only allowed characters (no "/", ".", ":", ...) are in it. However don't use a blacklist for bad characters, but a whitelist for allowed characters:
$page = preg_replace('[^a-zA-Z0-9]', '', $page);
... followed by a file_exists.
That way you can make sure that only scripts you want to be executed are executed (for example this would rule out a "blabla.inc.php", because "." is not allowed).
Note: This is kind of a "hack", because then the user could execute "h.o.m.e" and it would give the "home" page, because all it does is removing all prohibited characters. It's not intended to stop "smartasses" who want to cute stuff with your page, but it will stop people doing really bad things.
BTW: Another thing you could do in you .htaccess file is to prevent obvious attack attempts:
RewriteEngine on
RewriteCond %{QUERY_STRING} http[:%] [NC]
RewriteRule .* /–http– [F,NC]
RewriteRule http: /–http– [F,NC]
That way all page accesses with "http:" url (and query string) result in an "Forbidden" error message, not even reaching the php script. That results in less server load.
However keep in mind that no "http" is allowed in the query string. You website might MIGHT require it in some cases (maybe when filling out a form).
BTW: If you can read german: I also have a blog post on that topic.
@pek - That won't work, as your array keys are 0 and 1, not 'home' and 'page'.
This code should do the trick, I believe:
<?php
$whitelist = array(
'home',
'page',
);
if(in_array($_GET['page'], $whitelist)) {
include($_GET['page'] . '.php');
} else {
include('home.php');
}
?>
As you've a whitelist, there shouldn't be a need for file_exists()
either.
Use a whitelist and make sure the page is in the whitelist:
$whitelist = array('home', 'page');
if (in_array($_GET['page'], $whitelist)) {
include($_GET['page'].'.php');
} else {
include('home.php');
}
I'm assuming you deal with files in the same directory:
<?php
if (isset($_GET['page']) && !empty($_GET['page'])) {
$page = urldecode($_GET['page']);
$page = basename($page);
$file = dirname(__FILE__) . "/{$page}.php";
if (!file_exists($file)) {
$file = dirname(__FILE__) . '/home.php';
}
} else {
$file = dirname(__FILE__) . '/home.php';
}
include $file;
?>
This is not too pretty, but should fix your issue.
I know this is a very old post and I expect you don't need an answer anymore, but I still miss a very important aspect imho and I like it to share for other people reading this post. In your code to include a file based on the value of a variable, you make a direct link between the value of a field and the requested result (page becomes page.php). I think it is better to avoid that. There is a difference between the request for some page and the delivery of that page. If you make this distinction you can make use of nice urls, which are very user and SEO friendly. Instead of a field value like 'page' you could make an URL like 'Spinoza-Ethica'. That is a key in a whitelist or a primary key in a table from a database and will return a hardcoded filename or value. That method has several advantages besides a normal whitelist:
the back end response is effectively independent from the front end request. If you want to set up your back end system differently, you do not have to change anything on the front end.
Always make sure you end with hardcoded filenames or an equivalent from the database (preferrabley a return value from a stored procedure), because it is asking for trouble when you make use of the information from the request to build the response.
Because your URLs are independent of the delivery from the back end you will never have to rewrite your URLs in the htAccess file for this kind of change.
The URLs represented to the user are user friendly, informing the user about the content of the document.
Nice URLs are very good for SEO, because search engines are in search of relevant content and when your URL is in line with the content will it get a better rate. At least a better rate then when your content is definitely not in line with your content.
If you do not link directly to a php file, you can translate the nice URL into any other type of request before processing it. That gives the programmer much more flexibility.
You will have to sanitize the request, because you get the information from a standard untrustfull source (the rest of the Web). Using only nice URLs as possible input makes the sanitization process of the URL much simpler, because you can check if the returned URL conforms your own format. Make sure the format of the nice URL does not contain characters that are used extensively in exploits (like ',",<,>,-,&,; etc..).
The #1 rule when accepting user input is always sanitize it. Here, you're not sanitizing your page GET variable before you're passing it into include. You should perform a basic check to see if the file exists on your server before you include it.