In addition to the great stuff other people have said: to get a first pass at which files are actively being used, you can install an opcode cache like APC or eAccelerator on your dev server (or even on a production server; this shouldn't break anything). Then click around the web app on your dev server (or let the users do it on your production server).
Now look at the list of cached files in your cache admin page. If a file isn't listed as being cached by your opcode cache, there's a good chance it isn't being loaded by anything.
This isn't a complete solution, but if each directory has 10 index.php variants (e.g. index.php, index2.php, etc.), at least you'll know which one your app is actually using.
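If the list of cached files is long, comparing it against the files on disk by hand gets tedious. Here is a rough sketch of that comparison in Python. It assumes you have copied the cached file paths out of your cache admin page into a list, one path per entry; the function name `find_uncached_php_files` and its parameters are made up for illustration and are not part of APC or eAccelerator.

```python
import os


def find_uncached_php_files(app_root, cached_paths):
    """Return .php files under app_root that do not appear in cached_paths.

    cached_paths is assumed to be the list of file paths copied from the
    opcode cache's admin page (hypothetical input, one path per entry).
    """
    # Resolve symlinks so the same file is not missed because of a
    # differently spelled path.
    cached = {os.path.realpath(p) for p in cached_paths}

    uncached = []
    for dirpath, _dirnames, filenames in os.walk(app_root):
        for name in filenames:
            if not name.endswith(".php"):
                continue
            full_path = os.path.realpath(os.path.join(dirpath, name))
            if full_path not in cached:
                uncached.append(full_path)
    return sorted(uncached)
```

Anything this returns is only a candidate for removal, not proof that the file is dead: a file that nobody happened to hit while the cache was collecting data will show up here too.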