Internal Architecture of Java Compiler [closed]

Submitted by 心已入冬 on 2019-12-21 09:30:14

Question


I have been working with Java for more than 8 years.

Last week, in a small meeting at my company, one of my colleagues asked me how exactly the Java compiler works. I had no answer.

I tried explaining that the Java compiler takes statements one by one and converts them to bytecode that targets the JVM rather than any particular OS.

No one was satisfied with that answer, not even me.

Now the main question is how exactly the Java compiler works, i.e., how many steps, stages, or phases the compiler goes through when compiling a Java file.

What exactly is the Java compiler's architecture?

What if there are multiple Java classes in the same .java file? How many classes will be compiled?

What if there are imports pointing to uncompiled Java classes? Will those classes be compiled or ignored?

I googled for more than half a day, and every result gives the same answer I gave my colleagues.

But I finally found a useful tutorial here.

But the tutorial does not go into much depth either, and I could not visualize the process from it.

I am still not satisfied and am eager to learn more about this from you.

So if anyone knows more than I do and more than the above blog covers, something that would help me visualize the internal architecture of the Java compiler, please explain it to me.


Answer 1:


Some basic steps:

  1. parse: Reads a set of *.java source files and maps the resulting token sequence into AST (abstract syntax tree) nodes.
  2. enter: Enters symbols for the definitions into the symbol table.
  3. process annotations: If requested, processes annotations found in the specified compilation units.
  4. attribute: Attributes the syntax trees. This step includes name resolution, type checking and constant folding.
  5. flow: Performs dataflow analysis on the trees from the previous step. This includes checks for assignments and reachability.
  6. desugar: Rewrites the AST and translates away some syntactic sugar.
  7. generate: Generates source files or class files.
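
To see these steps in action, here is a minimal sketch that drives javac through the standard `javax.tools` API and the `com.sun.source.util.JavacTask` class, which exposes `parse()`, `analyze()` and `generate()` as separate calls. The class names `JavacPhasesDemo`, `StringSource`, `Main` and `Helper` are made up for the demo, and the mapping of the three calls onto the seven steps above is only approximate. It needs a JDK (not a bare JRE) and Java 11 or later.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class JavacPhasesDemo {

    /** A source file held in memory instead of on disk (hypothetical helper). */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Runs the phases one call at a time and returns the names of the generated class files. */
    static List<String> compile(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found (running on a JRE?)");
        }
        try {
            Path outDir = Files.createTempDirectory("javac-demo");
            JavacTask task = (JavacTask) compiler.getTask(
                    null, null, null,
                    List.of("-d", outDir.toString()),   // write class files to a temp directory
                    null,
                    List.of(new StringSource(className, code)));

            // parse: one AST (compilation unit) per source file
            for (CompilationUnitTree unit : task.parse()) {
                System.out.println("parsed " + unit.getTypeDecls().size() + " type declaration(s)");
            }
            // enter, attribute, flow: symbols, types, dataflow checks
            task.analyze();
            // desugar, generate: produce the class files
            List<String> generated = new ArrayList<>();
            for (JavaFileObject classFile : task.generate()) {
                generated.add(Path.of(classFile.toUri()).getFileName().toString());
            }
            return generated;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two top-level classes in one source file.
        String code = "public class Main { int x = 1 + 2; }\nclass Helper { }\n";
        System.out.println(compile("Main", code));
    }
}
```

This also touches one of the questions asked above: a single source file containing two top-level classes yields one compilation unit but two class files, one per class.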

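In more detail, the phases of a general compiler (javac stops at bytecode, so the machine-specific steps such as frame layout and register allocation do not apply to it):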

  1. Lex - Break the source file into individual words, or tokens.
  2. Parse - Analyze the phrase structure of the program.
  3. Semantic Actions - Build a piece of abstract syntax tree corresponding to each phrase.
  4. Semantic Analysis - Determine what each phrase means, relate uses of variables to their definitions, check types of expressions, request translation of each phrase.
  5. Frame Layout - Place variables, function-parameters, etc. into activation records (stack frames) in a machine-dependent way.
  6. Translate - Produce intermediate representation trees (IR trees), a notation that is not tied to any particular source language or target-machine architecture.
  7. Canonicalize - Hoist side effects out of expressions, and clean up conditional branches, for the convenience of the next phases.
  8. Instruction Selection - Group the IR-tree nodes into clumps that correspond to the actions of target-machine instructions.
  9. Control Flow Analysis - Analyze the sequence of instructions into a control flow graph that shows all the possible flows of control the program might follow when it executes.
  10. Dataflow Analysis - Gather information about the flow of information through variables of the program; for example, liveness analysis calculates the places where each program variable holds a still-needed value (is live).
  11. Register Allocation - Choose a register to hold each of the variables and temporary values used by the program; variables not live at the same time can share the same register.
  12. Code Emission - Replace the temporary names in each machine instruction with machine registers.

There is a nice book:

Modern Compiler Implementation in Java

You may want to look inside javac code:

Javac Documentation

OpenJDK source code

Hacker's guide to javac

Don't Panic! To help newcomers to javac navigate their way around the code base

JVM JLS




Answer 2:


A compiler goes through several steps; here are the most important:

Lexical analysis: The first step is lexical analysis. Basically, this step extracts tokens from the Java code (keywords, operators, separators, comments, variable names...).
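
As a rough illustration of this step, here is a small regex-based tokenizer. It is only a sketch: the class `TinyLexer`, the token kinds and the tiny keyword list are made up for the example, comments are not handled, and javac's real scanner is hand-written and far more complete. It uses records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TinyLexer {
    /** A token is a kind (KEYWORD, IDENT, ...) plus the matched text. */
    record Token(String kind, String text) { }

    private static final List<String> KINDS =
            List.of("KEYWORD", "IDENT", "NUMBER", "OPERATOR", "SEPARATOR");

    // Leading whitespace is skipped; the first alternative that matches wins,
    // so keywords are tried before general identifiers.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:(?<KEYWORD>\\b(?:int|return|if|else)\\b)"
            + "|(?<IDENT>[A-Za-z_][A-Za-z0-9_]*)"
            + "|(?<NUMBER>\\d+)"
            + "|(?<OPERATOR>[=+\\-*/<>])"
            + "|(?<SEPARATOR>[;(){}]))");

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                if (source.substring(pos).isBlank()) {
                    break; // only trailing whitespace is left
                }
                throw new IllegalArgumentException("Unexpected character at offset " + pos);
            }
            // Find out which named group matched and record it as the token kind.
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    tokens.add(new Token(kind, m.group(kind)));
                    break;
                }
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Five tokens: keyword, identifier, operator, number, separator.
        tokenize("int x = 42;").forEach(System.out::println);
    }
}
```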

Syntax analysis (parser): The second step is syntax analysis. Tokens from the lexical analysis are taken as input and combined to form expressions and statements.
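
To make the idea concrete, here is a sketch of a recursive-descent parser that combines tokens into a tree for simple arithmetic expressions. The grammar, the class `TinyParser` and the node types `Num`, `Name` and `BinOp` are assumptions made up for this example and cover only a tiny fraction of what a Java parser handles. It needs Java 16 or later because of the records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TinyParser {
    // AST node types (hypothetical, for illustration only).
    interface Node { }
    record Num(int value) implements Node { }
    record Name(String id) implements Node { }
    record BinOp(char op, Node left, Node right) implements Node { }

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    TinyParser(String source) {
        // Very crude tokenizer: anything that is not a number, name or operator is skipped.
        Matcher m = Pattern.compile("\\d+|[A-Za-z_]\\w*|[-+*/()]").matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }
    private String next() { return tokens.get(pos++); }

    // expr := term (('+' | '-') term)*
    Node parseExpr() {
        Node left = parseTerm();
        while (peek().equals("+") || peek().equals("-")) {
            char op = next().charAt(0);
            left = new BinOp(op, left, parseTerm());
        }
        return left;
    }

    // term := factor (('*' | '/') factor)*
    Node parseTerm() {
        Node left = parseFactor();
        while (peek().equals("*") || peek().equals("/")) {
            char op = next().charAt(0);
            left = new BinOp(op, left, parseFactor());
        }
        return left;
    }

    // factor := NUMBER | NAME | '(' expr ')'
    Node parseFactor() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("Unexpected end of input");
        }
        String t = next();
        if (t.equals("(")) {
            Node inner = parseExpr();
            if (!peek().equals(")")) {
                throw new IllegalStateException("Expected ')'");
            }
            next();
            return inner;
        }
        if (Character.isDigit(t.charAt(0))) {
            return new Num(Integer.parseInt(t));
        }
        if (Character.isLetter(t.charAt(0)) || t.charAt(0) == '_') {
            return new Name(t);
        }
        throw new IllegalStateException("Unexpected token: " + t);
    }

    static Node parse(String source) {
        TinyParser parser = new TinyParser(source);
        Node tree = parser.parseExpr();
        if (parser.pos < parser.tokens.size()) {
            throw new IllegalStateException("Trailing input: " + parser.peek());
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * x"));
    }
}
```

Because `parseTerm` is called from inside `parseExpr`, multiplication binds tighter than addition: in the tree for `1 + 2 * x`, the `*` node ends up as the right child of the `+` node.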

Optimization and conversion to byte code: The last major step converts the output of the previous step to bytecode. Here the code can be rewritten into a form that is equivalent to the original but more efficient.
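
A toy version of this step might look like the sketch below: it folds constant subexpressions (one simple optimization) and then walks the tree in post-order to emit instructions for a stack machine, which is the general shape of JVM bytecode. The class `FoldAndEmit`, the node types and the mnemonics `push`, `load`, `add` and `mul` are invented for the example and are not real JVM opcodes. It needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

class FoldAndEmit {
    // Expression tree (hypothetical node types).
    interface Expr { }
    record Const(int value) implements Expr { }
    record Var(String name) implements Expr { }
    record Add(Expr left, Expr right) implements Expr { }
    record Mul(Expr left, Expr right) implements Expr { }

    /** Constant folding: an operation on two constants is replaced by its result. */
    static Expr fold(Expr e) {
        if (e instanceof Add a) {
            Expr l = fold(a.left());
            Expr r = fold(a.right());
            if (l instanceof Const x && r instanceof Const y) {
                return new Const(x.value() + y.value());
            }
            return new Add(l, r);
        }
        if (e instanceof Mul m) {
            Expr l = fold(m.left());
            Expr r = fold(m.right());
            if (l instanceof Const x && r instanceof Const y) {
                return new Const(x.value() * y.value());
            }
            return new Mul(l, r);
        }
        return e; // constants and variables are already as simple as they get
    }

    /** Post-order walk: operands first, then the operator that consumes them from the stack. */
    static void emit(Expr e, List<String> out) {
        if (e instanceof Const c) {
            out.add("push " + c.value());
        } else if (e instanceof Var v) {
            out.add("load " + v.name());
        } else if (e instanceof Add a) {
            emit(a.left(), out);
            emit(a.right(), out);
            out.add("add");
        } else if (e instanceof Mul m) {
            emit(m.left(), out);
            emit(m.right(), out);
            out.add("mul");
        } else {
            throw new IllegalArgumentException("Unknown node: " + e);
        }
    }

    static List<String> compile(Expr e) {
        List<String> out = new ArrayList<>();
        emit(fold(e), out);
        return out;
    }

    public static void main(String[] args) {
        // x * (2 + 3): the addition is folded to 5 before any code is emitted.
        Expr e = new Mul(new Var("x"), new Add(new Const(2), new Const(3)));
        System.out.println(compile(e)); // prints [load x, push 5, mul]
    }
}
```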


Note: This process is not specific to Java; it is common to all compilers, including those that generate machine code directly rather than an intermediate bytecode (such as compilers for C or C++).

There are tools for generating lexical and syntax analyzers, because these steps have many parts in common across different languages.

An open-source lexical analyzer generator is flex; a useful parser generator is yacc.

Both work with C and C++, which are the languages most commonly used to write compilers (Java and others are used too), but there are similar alternatives for other programming languages (for writing a compiler in another language, not for another language). Basically, the language in which a compiler is written is unrelated to the language it compiles.



Source: https://stackoverflow.com/questions/32779189/internal-architecture-of-java-compiler
