I only have some very rudimentary theoretical knowledge about RSA.
While reading different sources about how to use it in practice, it seemed that PKCS#1 OAEP would
Your general understanding of interchanging the roles of public and private key is correct. In the end, RSA is based on the fact that
$$m^{ed} \equiv m \pmod{n}.$$
What is normally called RSA encryption is the operation
$$m^e \bmod n,$$
raising the message to the e-th power, where e is the public exponent (the public key is the pair (n, e)).
Decryption is then
$$(m^e)^d \bmod n,$$
raising the encrypted message to the d-th power, with d being the private exponent. Because of the rules of exponentiation and the fact that multiplication is commutative (both still hold in modular arithmetic), we have
$$m \equiv (m^e)^d \equiv m^{ed} \equiv m^{de} \equiv (m^d)^e \pmod{n},$$
and therefore you get the same result if you apply the operations in the reverse order.
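To make the symmetry concrete, here is a minimal sketch of textbook RSA in plain Python. It assumes the tiny toy primes p = 61 and q = 53, a common classroom example that is far too small for any real use. It shows that applying e then d, or d then e, both return the original message.

```python
# Textbook RSA with tiny toy primes (a common classroom example, insecure).
p, q = 61, 53
n = p * q                  # modulus, 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent: e*d = 1 (mod phi), needs Python 3.8+

m = 65                     # the message, an integer with 0 <= m < n

# Encrypt with e, then decrypt with d ...
c = pow(m, e, n)
assert pow(c, d, n) == m

# ... or apply the exponents in the reverse order: same result.
s = pow(m, d, n)
assert pow(s, e, n) == m

print(d, c)  # 2753 2790
```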
You were right in assuming that the reversal leads to digital signatures: everybody can verify ("decrypt") the signature with the public key e, and a signature only verifies if it was "encrypted" (signed) using the corresponding private key d.
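The reversed direction can be sketched the same way, again assuming the toy key p = 61, q = 53 (so n = 3233, e = 17, d = 2753). `sign` and `verify` are hypothetical helper names for illustration. This signs the raw message with no hashing or padding, so it is exactly the insecure "textbook" construction the rest of this answer warns about.

```python
# Textbook RSA signatures with the toy key p = 61, q = 53 (insecure, illustration only).
n, e, d = 3233, 17, 2753

def sign(m: int) -> int:
    """Only the holder of the private exponent d can compute this."""
    return pow(m, d, n)

def verify(m: int, s: int) -> bool:
    """Anyone can check a signature using only the public key (n, e)."""
    return pow(s, e, n) == m

sig = sign(65)
print(verify(65, sig))   # True
print(verify(66, sig))   # False
```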
As it turns out, PyCrypto is only trying to prevent you from mistaking one for the other here. OpenSSL and Ruby's OpenSSL bindings, for example, let you do both: public_encrypt/public_decrypt and private_encrypt/private_decrypt.
So much for the theory; now to why there is good reason for not letting you use them interchangeably. What I just described is often referred to as "textbook RSA", and it is still far from secure. Additional things need to be taken care of to make the result usable in practice. That is why there is a dedicated signature package in PyCrypto: it effectively does what you described, but also takes care of those additional things (for example, hashing the message and applying a padding scheme). While it is good for our understanding to know how these things work, we should always use such packages in practice, because their authors have already made and fixed the mistakes we would probably introduce when rolling our own.
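One concrete reason textbook RSA is far from secure is that it is deterministic: the same message always produces the same ciphertext. The sketch below assumes the toy public key (n = 3233, e = 17) and a hypothetical scenario where the plaintext is known to be a bid between 100 and 200. An attacker who holds only the public key can encrypt every candidate and compare, without ever touching d.

```python
# Why textbook RSA is not secure on its own: it is deterministic.
# Toy public key (n, e) from p = 61, q = 53; the attacker never needs d.
n, e = 3233, 17

def textbook_encrypt(m: int) -> int:
    return pow(m, e, n)

# Hypothetical scenario: the plaintext is known to be a bid between 100 and 200.
secret_bid = 137
c = textbook_encrypt(secret_bid)

# The attacker simply encrypts every candidate with the public key and compares.
recovered = next(m for m in range(100, 201) if textbook_encrypt(m) == c)
print(recovered)  # 137
```

Randomized padding such as OAEP is designed to defeat exactly this kind of guess-and-check, because encrypting the same message twice then yields different ciphertexts.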
As to why e is always 65537: it doesn't really have to be a fixed value, but it is commonly chosen to be a small number with as few 1's in its binary representation as possible (65537 = 2^16 + 1 is 10000000000000001 in binary, or 0x10001 in hex). In the past, e=3 or e=17 were also used, but they are considered unsafe in practice (at least without proper padding), because for short messages the ciphertext can be attacked by simply taking its 3rd or 17th root. If e=3 and m=3, then 3^3 is 27, and it takes no genius to figure out that m is 3 given that the ciphertext is 27, regardless of the modulus n (which is typically much larger). So the danger lies in the fact that the ciphertext, even after exponentiation, does not cross "the modulus boundary", which allows us to simply take the e-th root to arrive at the original message. With typical moduli of 1024 to 4096 bits, this is no longer an issue with e=65537.
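The root attack can be sketched with exact integer arithmetic. `iroot` is a hypothetical helper (Python's standard library only has `math.isqrt` for square roots). The first part reproduces the e=3, m=3 example with an assumed toy key p = 5, q = 11. The second part assumes an unpadded 14-byte message and uses `N_STANDIN`, a hypothetical 2048-bit value standing in for a real modulus. The attack never uses the factors of n, only the fact that m^3 < n.

```python
def iroot(x: int, k: int) -> int:
    """Largest integer r with r**k <= x, by binary search on exact integers."""
    lo, hi = 0, 1 << (x.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

# The example from the text: toy key p = 5, q = 11, e = 3 (valid, gcd(3, 4 * 10) = 1).
n, e = 5 * 11, 3
m = 3
c = pow(m, e, n)              # 27 < 55, so the reduction mod n never kicked in
print(c, iroot(c, e))         # 27 3

# Same idea at a realistic size. N_STANDIN is a hypothetical 2048-bit stand-in,
# not a real RSA modulus: the attack only relies on m**3 < n.
N_STANDIN = (1 << 2047) + 1
secret = int.from_bytes(b"attack at dawn", "big")   # 14 bytes, no padding
c = pow(secret, 3, N_STANDIN)
print(iroot(c, 3).to_bytes(14, "big"))              # b'attack at dawn'

# With e = 65537, only m = 0 or m = 1 satisfy m**e < n for a 2048-bit n.
print(iroot(N_STANDIN, 65537))                      # 1
```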
Few 1's in the binary representation also make computing m^e fast. Modular exponentiation is often implemented using the square-and-multiply algorithm, which needs roughly one squaring per bit of the exponent plus one extra multiplication per 1 bit, so performance is best for small e with few 1's. Why is it chosen this way and not the other way round, for example with a small d with few 1's? Well, for starters, d would be easier to guess that way; in fact, a small private exponent is known to be insecure (Wiener's attack recovers d efficiently when it is below roughly n^(1/4)). A second advantage is that with digital signatures, you typically sign a document once but verify it often. This means m^d is computed once but m^e often, so the common task performs best while the rare task is allowed to perform poorly.
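A minimal left-to-right square-and-multiply sketch that also counts its operations makes the cost of the 1 bits visible. `square_and_multiply` is a hypothetical helper, and real libraries use more elaborate variants (windowing, for example). The toy values n = 3233 and m = 65 are assumptions; only the counts matter here. It compares e = 65537 with e = 65535, which has the same size but sixteen 1 bits.

```python
def square_and_multiply(m: int, e: int, n: int):
    """Left-to-right binary exponentiation for e >= 1.

    Returns (m**e % n, number of squarings, number of extra multiplications).
    """
    bits = bin(e)[2:]                 # exponent bits, most significant first
    result = m % n                    # the leading 1 bit just loads m
    squarings = multiplies = 0
    for bit in bits[1:]:
        result = result * result % n  # one squaring per remaining bit
        squarings += 1
        if bit == "1":
            result = result * m % n   # one extra multiplication per 1 bit
            multiplies += 1
    return result, squarings, multiplies

n, m = 3233, 65                        # toy values, only the operation counts matter
for e in (65537, 65535):               # 0b10000000000000001 vs 0b1111111111111111
    value, sq, mul = square_and_multiply(m, e, n)
    assert value == pow(m, e, n)       # agrees with Python's built-in modular pow
    print(e, sq, mul)
# 65537 16 1
# 65535 15 15
```

By the same count, a random 2048-bit private exponent d would need about 2047 squarings plus, on average, around a thousand extra multiplications, which is why the expensive side is the one performed rarely.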
Edit:
You asked whether I could further explain what schemes like RSA-PSS do in order to be secure.
When comparing what OAEP does for encryption and what PSS does for signatures, the two look pretty similar. And in fact they are: both introduce randomization into the process, which allows for provable security of OAEP and PSS under certain assumptions. I also found this paper to be helpful. Provable security is a big advantage over old-school PKCS #1 v1.5 encryption and signatures, which do not come with such proofs under the same assumptions (key point for encryption: a deterministic scheme cannot hide which of two known candidate messages was encrypted, so randomization is essential). An obvious difference between the proposed signature and encryption schemes is that the signature schemes always mandate that the to-be-signed message be hashed first. This makes sense not only with regard to efficiency; it also prevents some attacks that would otherwise be possible, such as forgeries that exploit the multiplicative property of textbook RSA. And I guess that leads to the gist of why we should always use signature schemes for signatures and encryption schemes for encryption: the proposed schemes come with security proofs attached, and our handmade schemes don't.
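To see why hashing before signing matters, here is a sketch of the multiplicative forgery against unhashed textbook RSA signatures. It assumes the toy key p = 61, q = 53 (n = 3233, e = 17, d = 2753), and `raw_sign`/`raw_verify` are hypothetical helper names. Because (m1^d)(m2^d) = (m1*m2)^d mod n, two honest signatures combine into a valid signature on a message nobody signed.

```python
# Toy key from p = 61, q = 53 (insecure, illustration only).
n, e, d = 3233, 17, 2753

def raw_sign(m: int) -> int:
    """Textbook RSA signature on the raw message, with no hashing or padding."""
    return pow(m, d, n)

def raw_verify(m: int, s: int) -> bool:
    return pow(s, e, n) == m

# The attacker gets honest signatures on two harmless-looking messages ...
m1, m2 = 2, 5
s1, s2 = raw_sign(m1), raw_sign(m2)

# ... and multiplies them: (m1^d)(m2^d) = (m1*m2)^d mod n, so this is a valid
# signature on m1*m2 = 10, which the key holder never signed.
forged = s1 * s2 % n
print(raw_verify(m1 * m2 % n, forged))   # True
```

With hash-then-sign, the attacker would instead need a message whose hash equals h(m1) * h(m2) mod n, which should be infeasible for a cryptographic hash and a full-size modulus. With the toy 3233 modulus even hashing would not help, since all 3233 possible values could be brute-forced.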
Cryptographers invent those schemes to make the lives of us mere mortals easier: they give us tools that ideally allow for no abuse or misuse by reducing the number of options to a minimum. For example, even if you managed to come up with a good signature scheme using RSA-OAEP, somebody who uses it might not know why they should hash their messages before signing them. That kind of misuse is not even possible with RSA-PSS.
You also asked about some good reading material. Although this is a very subjective topic, I really enjoyed these:
The practical side:
Applied Cryptography - still a classic and worth reading. Some security people say it is dangerous because it leads people to believe they know enough to write their own crypto. But I guess we are all grown-ups, aren't we? It's still great for getting a feeling for "what's out there".
Cryptography Engineering - Has some good practical advice and also mentions the caveats when implementing cryptography code.
Handbook of Applied Cryptography - It's free and still has a lot of good advice especially with regard to implementations.
The theoretical side:
Modern Cryptography - It's a hybrid between theory and practice and has a lot of insight into how things can go wrong in practice.
Cryptography - Theory and Practice - this was a game changer for me, I love this book. If you only ever read one book, let it be this one :)
Introduction to Modern Cryptography - does a great job at explaining "the modern approach" and how the security proofs actually work and under which assumptions.
Foundations of Cryptography I&II - if after the previous book you still can't get enough of the theory of one-way functions and friends, this is your book. Very technical.
Security is not only cryptography:
Security Engineering - has numerous examples of how sound principles can go wrong in practice
Information Security - Similar to Security Engineering, illustrating security in a scope wider than just cryptography.
Apart from that, I try to keep up to date by reading recent papers about new attacks, technologies, etc. I found r/netsec very helpful, as well as following researchers and practitioners on Twitter, who regularly post interesting material.
Finally, if you have the time, take the Cryptography courses on Coursera and Udacity! I think they'll start over in the next few weeks. They are really great, and I'm sure you won't regret it. They have lots of practical exercises that are fun and nicely illustrate various ways to attack cryptography implementations.