I\'ve been given to understand that Python is an interpreted language...
However, when I look at my Python source code I see .pyc
files, w
These are created by the Python interpreter when a .py
file is imported, and they contain the "compiled bytecode" of the imported module/program, the idea being that the "translation" from source code to bytecode (which only needs to be done once) can be skipped on subsequent import
s if the .pyc
is newer than the corresponding .py
file, thus speeding startup a little. But it's still interpreted.
I've been given to understand that Python is an interpreted language...
This popular meme is incorrect, or, rather, constructed upon a misunderstanding of (natural) language levels: a similar mistake would be to say "the Bible is a hardcover book". Let me explain that simile...
"The Bible" is "a book" in the sense of being a class of (actual, physical objects identified as) books; the books identified as "copies of the Bible" are supposed to have something fundamental in common (the contents, although even those can be in different languages, with different acceptable translations, levels of footnotes and other annotations) -- however, those books are perfectly well allowed to differ in a myriad of aspects that are not considered fundamental -- kind of binding, color of binding, font(s) used in the printing, illustrations if any, wide writable margins or not, numbers and kinds of builtin bookmarks, and so on, and so forth.
It's quite possible that a typical printing of the Bible would indeed be in hardcover binding -- after all, it's a book that's typically meant to be read over and over, bookmarked at several places, thumbed through looking for given chapter-and-verse pointers, etc, etc, and a good hardcover binding can make a given copy last longer under such use. However, these are mundane (practical) issues that cannot be used to determine whether a given actual book object is a copy of the Bible or not: paperback printings are perfectly possible!
Similarly, Python is "a language" in the sense of defining a class of language implementations which must all be similar in some fundamental respects (syntax, most semantics except those parts of those where they're explicitly allowed to differ) but are fully allowed to differ in just about every "implementation" detail -- including how they deal with the source files they're given, whether they compile the sources to some lower level forms (and, if so, which form -- and whether they save such compiled forms, to disk or elsewhere), how they execute said forms, and so forth.
The classical implementation, CPython, is often called just "Python" for short -- but it's just one of several production-quality implementations, side by side with Microsoft's IronPython (which compiles to CLR codes, i.e., ".NET"), Jython (which compiles to JVM codes), PyPy (which is written in Python itself and can compile to a huge variety of "back-end" forms including "just-in-time" generated machine language). They're all Python (=="implementations of the Python language") just like many superficially different book objects can all be Bibles (=="copies of The Bible").
If you're interested in CPython specifically: it compiles the source files into a Python-specific lower-level form (known as "bytecode"), does so automatically when needed (when there is no bytecode file corresponding to a source file, or the bytecode file is older than the source or compiled by a different Python version), usually saves the bytecode files to disk (to avoid recompiling them in the future). OTOH IronPython will typically compile to CLR codes (saving them to disk or not, depending) and Jython to JVM codes (saving them to disk or not -- it will use the .class
extension if it does save them).
These lower level forms are then executed by appropriate "virtual machines" also known as "interpreters" -- the CPython VM, the .Net runtime, the Java VM (aka JVM), as appropriate.
So, in this sense (what do typical implementations do), Python is an "interpreted language" if and only if C# and Java are: all of them have a typical implementation strategy of producing bytecode first, then executing it via a VM/interpreter.
More likely the focus is on how "heavy", slow, and high-ceremony the compilation process is. CPython is designed to compile as fast as possible, as lightweight as possible, with as little ceremony as feasible -- the compiler does very little error checking and optimization, so it can run fast and in small amounts of memory, which in turns lets it be run automatically and transparently whenever needed, without the user even needing to be aware that there is a compilation going on, most of the time. Java and C# typically accept more work during compilation (and therefore don't perform automatic compilation) in order to check errors more thoroughly and perform more optimizations. It's a continuum of gray scales, not a black or white situation, and it would be utterly arbitrary to put a threshold at some given level and say that only above that level you call it "compilation"!-)
Python code goes through 2 stages. First step compiles the code into .pyc files which is actually a bytecode. Then this .pyc file(bytecode) is interpreted using CPython interpreter. Please refer to this link. Here process of code compilation and execution is explained in easy terms.
tldr; it's a converted code form the source code, which the python VM interprets for execution.
Bottom-up understanding: the final stage of any program is to run/execute the program's instructions on the hardware/machine. So here are the stages preceding execution:
Executing/running on CPU
Converting bytcode to machine code.
Machine code is the final stage of conversion.
Instructions to be executed on CPU are given in machine code. Machine code can be executed directly by CPU.
Converting Bytecode to machine code.
Converting Source code to bytcode.
Now the actual plot. There are two approaches when carrying any of these stages: convert [or execute] a code all at once (aka compile) and convert [or execute] the code line by line (aka interpret).
For example, we could compile a source code to bytcoe, compile bytecode to machine code, interpret machine code for execution.
Some implementations of languages skip stage 3 for efficiency, i.e. compile source code into machine code and then interpret machine code for execution.
Some implementations skip all middle steps and interpret the source code directly for execution.
Modern languages often involve both compiling an interpreting.
JAVA for example, compile source code to bytcode [that is how JAVA source is stored, as a bytcode], compile bytcode to machine code [using JVM], and interpret machine code for execution. [Thus JVM is implemented differently for different OSs, but the same JAVA source code could be executed on different OS that have JVM installed.]
Python for example, compile source code to bytcode [usually found as .pyc files accompanying the .py source codes], compile bytocde to machine code [done by a virtual machine such as PVM and the result is an executable file], interpret the machine code/executable for execution.
When we can say that a language is interpreted or compiled?
Therefore, JAVA and Python are interpreted languages.
A confusion might occur because of the third stage, that's converting bytcode to machine code. Often this is done using a software called a virtual machine. The confusion occurs because a virtual machine acts like a machine, but it's actually not! Virtual machines are introduced for portability, having a VM on any REAL machine will allow us to execute the same source code. The approach used in most VMs [that's the third stage] is compiling, thus some people would say it's a compiled language. For the importance of VMs, we often say that such languages are both compiled and interpreted.
Python's *.py file is just a text file in which you write some lines of code. When you try to execute this file using say "python filename.py"
This command invokes Python Virtual Machine. Python Virtual Machine has 2 components: "compiler" and "interpreter". Interpreter cannot directly read the text in *.py file, so this text is first converted into a byte code which is targeted to the PVM (not hardware but PVM). PVM executes this byte code. *.pyc file is also generated, as part of running it which performs your import operation on file in shell or in some other file.
If this *.pyc file is already generated then every next time you run/execute your *.py file, system directly loads your *.pyc file which won't need any compilation(This will save you some machine cycles of processor).
Once the *.pyc file is generated, there is no need of *.py file, unless you edit it.
Python (at least the most common implementation of it) follows a pattern of compiling the original source to byte codes, then interpreting the byte codes on a virtual machine. This means (again, the most common implementation) is neither a pure interpreter nor a pure compiler.
The other side of this is, however, that the compilation process is mostly hidden -- the .pyc files are basically treated like a cache; they speed things up, but you normally don't have to be aware of them at all. It automatically invalidates and re-loads them (re-compiles the source code) when necessary based on file time/date stamps.
About the only time I've seen a problem with this was when a compiled bytecode file somehow got a timestamp well into the future, which meant it always looked newer than the source file. Since it looked newer, the source file was never recompiled, so no matter what changes you made, they were ignored...