I haven't found an answer to this particular question; perhaps there isn't one. But I've been wondering about it for a while.
What exactly causes a binary file to
The display looks interesting, because a binary file can contain non-printable characters. It is up to the displaying program to replace such characters with something else.
This can be avoided by using a hex editor. Such a program displays each byte from the file as its hexadecimal value. That makes for a nice tabular view of the file, but it is not easy for the average person to decipher, because we are not used to looking at data that way.
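To make the tabular view concrete, here is a minimal sketch of what a hex editor does when it displays a file. The function name `hex_dump` and the 16-bytes-per-row layout are my own choices for illustration, not how any particular editor is implemented.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: offset, hex values, printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show visible ASCII (0x20-0x7e) as-is; replace everything else with '.'
        text_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad the hex column so the text column lines up on a short last row
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# Made-up sample: the start of a GIF header followed by a few raw bytes.
sample = b"GIF89a\x01\x00\x01\x00\x80\x00\x00"
print(hex_dump(sample))
```

To dump a real file, read it in binary mode first, e.g. `open(path, "rb").read()`, and pass the result to the function.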
There are a few ways to find out what program a file might belong to. You can look at the beginning of the file, and with some knowledge you might recognize the file type. Files of some types always begin with the same characters (RAR, GIF, etc.). For other types it might not be as easy.
In Linux you can use the "file" command to help you determine file type. There are probably programs for Windows that will do the same.
A text editor makes very few assumptions about the data coming into it, besides things like character encodings. Thus, it will (as you say) read the file's data as ASCII and display it that way. Since binary data doesn't always fall within the alphanumeric range, you get gibberish. As for showing the raw binary values, you need a hex editor like XVI32.
Binary files often have no context outside of the program that uses them. Some binary formats contain a 4-byte magic sequence at the beginning (for example, Java .class files start with the bytes CA FE BA BE, i.e. 0xCAFEBABE), but to recognize them without their program, you need a mapping of those 4-byte sequences. I believe some Linux distros contain this information for a wide variety of binary formats and will examine the beginning of the file to attempt to identify it. Other than that, there's not much you can do.
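The mapping idea can be sketched in a few lines. The table below is a deliberately tiny, hand-picked one, and the names `SIGNATURES`, `guess_type` and `guess_file_type` are hypothetical; real tools carry far larger databases and also look beyond the first few bytes.

```python
# Hypothetical, deliberately tiny signature table: (leading bytes, description).
SIGNATURES = [
    (b"\xca\xfe\xba\xbe", "Java class file"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"Rar!\x1a\x07", "RAR archive"),
    (b"MZ", "DOS/Windows executable"),
]


def guess_type(data: bytes) -> str:
    """Return a description if the data starts with a known signature."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def guess_file_type(path: str) -> str:
    """Read only the first few bytes of a file and guess from those."""
    with open(path, "rb") as f:
        return guess_type(f.read(16))


print(guess_type(b"\xca\xfe\xba\xbe\x00\x00\x00\x34"))
print(guess_type(b"just some text"))
```

A match only says the file starts like that format. Different formats can share a signature, and anyone can write those bytes at the start of an unrelated file, so treat the result as a guess.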
Binary files display as gibberish in standard text editors such as Notepad because these applications decode the data with a text encoding (e.g. ASCII or UTF-8). The bytes get mapped to characters for display, and the result generally makes as little sense to humans as the binary data being mapped, ergo the gibberish you see.
As previously mentioned these files make more sense when viewed in a different way such as with a hex editor.
Certain file types can be recognized by data present in all files of a given type. For example, all Windows executable files (*.exe) begin with the letters MZ.
Binary data often looks close to random, and encrypted data is designed to. Each byte can take one of 256 values (leaving Unicode out of the equation). ASCII only covers 128 of these, and only 94 of those are visible printable characters (95 if you count the space). Outside the ASCII range, you have a number of international characters and strange symbols. There are certainly more than 128 of these, so one must specify a codepage to select a specific set of symbols.
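Those counts are easy to check, and the codepage point can be shown with a single byte. This is just a quick sketch; the byte 0xE9 and the three codepages are arbitrary picks of mine for illustration.

```python
# Split the 128 ASCII codes into visible characters and control codes.
visible = [c for c in range(128) if 0x21 <= c <= 0x7E]
control = [c for c in range(128) if c < 0x20 or c == 0x7F]
print(len(visible), "visible,", len(control), "control, plus the space")

# Outside ASCII, the same byte value means different things per codepage.
raw = bytes([0xE9])
for codepage in ("cp1252", "cp437", "cp1251"):
    print(codepage, "->", raw.decode(codepage))
```

The three decoded characters differ even though the underlying byte is identical, which is why an editor has to assume some codepage before it can show anything at all.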
Anyway, since binary files can be represented as a very random assortment of familiar and unfamiliar characters, the file will look like gibberish if you open it in an editor.
You could always open a file (binary or text file, there really is no difference) in a hex editor, and look at the raw binary data.
There is no general way to tell which program created a specific file. In particular, if the program has encrypted its data, all hope is lost. Otherwise, it is often easy to recognize certain "signatures."
Yes, Wordpad and Notepad and many other text editors assume that any file you open with them is a text file and will try to display the ASCII characters represented by the bytes in the file.
Hex Editors are made to view and edit binary files. They usually display each byte as a pair of hexadecimal digits instead of "1s and 0s" because it's easier to read that way.
A binary file appears as gibberish because the data in it is designed for the machine to read and not for humans. Sadly, some of us get used to interpreting gibberish - albeit with somewhat specialized tools to help see the data better - but most people should not need to know.
Each byte in the file is treated as a character in the current code set (probably CP1252 on Windows). Byte value 65 is 'A', for example; you can find illustrative examples easily on the web. So, the bytes that make up the binary data are displayed according to the code set - as best as the text editor can. It doesn't try to convert the binary - it doesn't know how (only the original program does).
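Here is a rough sketch of that byte-to-character step, assuming the editor uses CP1252 and substitutes a replacement character for the few byte values that codepage leaves undefined. The helper name `as_text` is made up, and real editors may handle undefined bytes differently.

```python
def as_text(data: bytes, codepage: str = "cp1252") -> str:
    """Decode bytes the way a naive text editor might: one byte, one character."""
    # errors="replace" swaps undefined byte values for U+FFFD instead of failing
    return data.decode(codepage, errors="replace")


print(as_text(bytes([65])))          # byte value 65 decodes to 'A'
print(repr(as_text(b"\x89PNG")))     # 0x89 becomes a per mille sign in CP1252
print(repr(as_text(b"\x81\x00\xff")))
```

Nothing in this step knows what the bytes were meant to represent. It is a plain table lookup, which is why the result is only readable when the file really did contain text.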
As to how to detect what program created the file - you may be able to do that sometimes, but not easily and reliably. On Unix (or with Cygwin on Windows) the 'file' program may be able to help. This program looks at the first few bytes to try to guess the file type.
Encrypted data is supposed to look like gibberish. If it doesn't look like gibberish, then it probably isn't very well encrypted.
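One rough way to put a number on "looks like gibberish" is the Shannon entropy of the byte values, in bits per byte. The sketch below is only an illustration: the name `byte_entropy` is mine, and `os.urandom` stands in for well-encrypted data, which is an assumption rather than real ciphertext.

```python
import math
import os
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, from 0.0 to 8.0 bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


text = b"the quick brown fox jumps over the lazy dog " * 100
random_like = os.urandom(65536)  # stand-in for encrypted data (assumption)

print(f"plain text:  {byte_entropy(text):.2f} bits/byte")
print(f"random-like: {byte_entropy(random_like):.2f} bits/byte")
```

High entropy alone does not prove a file is encrypted, since compressed data scores high as well. A low score on supposedly encrypted data is a warning sign, though.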