Say I have a website www.abc.com. Under the website directory there is a page secret.html. It can be accessed directly like www.abc.com/secret.html, but there are no pages on the site that link to it. Can a crawler or anyone else still discover that page?
Any crawler or spider will read your index.htm (or equivalent) if it is exposed to the web. It reads the source code of that page and finds everything linked from it, including paths into subdirectories. If the page has a "contact us" button, for example, its markup probably includes the path to the page or PHP script that handles the contact-us action, so the crawler now has one more subdirectory/folder name to dig into. Even so, if that folder contains an index.htm (or equivalent) file, the server will not list all the files in the folder.
If, by mistake, the programmer never put an index.htm file in that folder, then all of its files will be listed on screen, and the crawler/spider can keep digging through them. But if you create a folder like www.yoursite.com/nombresinistro75crazyragazzo19/, put several files in there, and never publish a button to it or expose that folder's address anywhere on the net, keeping it only in your head, chances are that nobody will ever find that path, no matter how sophisticated their crawler or spider is.
Except, of course, if they can get into your FTP or access your site's control panel.
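To make the crawling step concrete: a spider just parses a page's HTML for links and queues each one for a later visit. A minimal sketch in Python (the page source and URLs here are made-up placeholders; a real crawler would fetch pages over HTTP and loop):

    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class LinkExtractor(HTMLParser):
        """Collect every href/src a page exposes, the way a spider would."""
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = []

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if name in ("href", "src") and value:
                    # Resolve relative paths such as contact/form.php
                    self.links.append(urljoin(self.base_url, value))

    # Hypothetical page source with a "contact us" link:
    html = '<a href="contact/form.php">Contact us</a>'
    parser = LinkExtractor("http://www.abc.com/")
    parser.feed(html)
    print(parser.links)  # ['http://www.abc.com/contact/form.php']

Nothing in that loop will ever surface a path that no fetched page mentions, which is why an unlinked folder name stays invisible to crawlers.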
If a website's directory does NOT have an "index...." file, AND .htaccess has NOT been used to block access to the directory itself, then Apache will generate an "Index of" page for that directory. You can save that page, and its icons, using "Save page as..." with the "Web page, complete" option (Firefox example). If you own the website, temporarily rename any "index...." file and browse to the directory itself to see the generated listing; then restore your "index...." file.
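If you want the opposite behavior, i.e. blocking those generated listings, a single directive does it (a minimal sketch, assuming Apache is configured to honor .htaccess overrides for that directory):

    # .htaccess in the directory to protect:
    # stop Apache from generating an "Index of" page
    # when no index file is present
    Options -Indexes

With that in place, a request for the bare directory returns 403 Forbidden instead of a file listing.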
DirBuster is exactly the kind of hacking script that guesses a bunch of common names, as nsanders mentioned. It literally brute-forces lists of common words and file endings (.html, .php) and, over time, maps out the directory structure of a site. This could discover the page you described, but it would also discover many others.
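The core idea fits in a few lines. Here is a minimal sketch in Python (the target URL, wordlist, and extensions are made-up placeholders; real tools ship wordlists with many thousands of entries and run requests in parallel):

    import itertools
    import urllib.error
    import urllib.request

    BASE = "http://www.abc.com/"    # hypothetical target
    WORDS = ["admin", "backup", "secret", "test"]
    EXTENSIONS = ["", "/", ".html", ".php"]

    for word, ext in itertools.product(WORDS, EXTENSIONS):
        url = BASE + word + ext
        try:
            # Any non-error response means the guess hit a real path.
            with urllib.request.urlopen(url) as resp:
                print("found:", url, resp.status)
        except urllib.error.URLError:
            pass  # 404, connection refused, etc.: keep guessing

Note that this needs no directory listing and no links at all; it works purely by asking the server about names it invents, which is why an obvious name like "secret.html" offers no real protection.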
If you have directory listing disabled in your webserver, then the only way somebody will find it is by guessing or by finding a link to it.
That said, I've seen hacking scripts attempt to "guess" a whole bunch of these common names. "secret.html" would probably be in such a guess list.
The more reasonable solution is to restrict access with a username/password via an .htaccess file (for Apache) or the equivalent setting for whatever webserver you're using.
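For Apache, that looks roughly like this (a sketch: the realm name and the path to the password file are placeholders, and the .htpasswd file itself is created with the htpasswd utility):

    # .htaccess in the directory to protect
    AuthType Basic
    AuthName "Restricted area"
    AuthUserFile /home/user/.htpasswd
    Require valid-user

Even if someone guesses the URL, the server then demands credentials before serving anything from that directory.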
Yes, you can, but you need a few tools first. You need to know a little about basic coding, FTP clients, port scanners, and brute-force tools, especially if the site has an .htaccess file.
If not, just try default file names and common directory names: tgp.linkurl.htm or .html, default.html, paths like www/home/siteurl/web/, or /wap/, /index/, /default/, /includes/, /main/, /files/, /images/, /pics/, /vids/ are all possible file locations on the server. Try all of them, e.g. www/home/siteurl/web/includes/.htaccess or default.html. You'll hit a file after a few tries, then work off that. Yahoo also has a site file viewer: you can try scanning a site's file indexes with it.
Alternatively, try Brutus AET, trin00, trinity.x, or the whiteshark airtool to crack the site's FTP login (but that's illegal and I do not condone it).
There are only two ways to find a web page: through a link or by listing the directory.
Usually, web servers disable directory listing, so if there is really no link to the page, then it cannot be found.
BUT: information about the page may get out in ways you don't expect. For example, if a user with the Google Toolbar installed visits your page, then Google may learn about the page, and it can appear in Google's index. That creates a link to your page.