Some basic steps:
- parse: Reads a set of *.java source files and maps the resulting token
sequence into AST (Abstract Syntax Tree) nodes.
- enter: Enters symbols for the definitions into the symbol table.
- process annotations: If requested, processes annotations found in
the specified compilation units.
- attribute: Attributes the syntax trees. This step includes name
resolution, type checking and constant folding.
- flow: Performs dataflow analysis on the trees from the previous step.
This includes checks for assignments and reachability.
- desugar: Rewrites the AST and translates away some syntactic sugar.
- generate: Generates source files or class files.
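The first few steps can be driven one at a time through the Compiler Tree API (`com.sun.source.util.JavacTask`), which exposes `parse()` and `analyze()` as separate calls. The following is only a minimal sketch, assuming a full JDK (so that `ToolProvider.getSystemJavaCompiler()` is not null); the class `JavacPhases`, the helper `StringSource` and the sample `Hello` source are made up for illustration. Annotation processing is switched off with `-proc:none`, and the generate step is left out so that nothing is written to disk.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.lang.model.element.Element;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class JavacPhases {

    /** Hypothetical helper: an in-memory source file, so nothing is needed on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Runs parse, then the analysis steps, and returns the simple names of the analyzed types. */
    static List<String> parseAndAnalyze(String className, String code) throws IOException {
        // Returns null on a runtime that ships without the compiler module.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null,
                List.of("-proc:none"),          // skip annotation processing in this sketch
                null,
                List.of(new StringSource(className, code)));

        // parse: source text -> tokens -> one CompilationUnitTree per source file
        for (CompilationUnitTree unit : task.parse()) {
            System.out.println("parsed " + unit.getTypeDecls().size() + " type declaration(s)");
        }

        // analyze: completes the analysis of the parsed trees (enter, attribute, flow)
        List<String> names = new ArrayList<>();
        for (Element element : task.analyze()) {
            names.add(element.getSimpleName().toString());
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String source = "public class Hello { int answer() { return 6 * 7; } }";
        System.out.println(parseAndAnalyze("Hello", source));
    }
}
```

`JavacTask` also has a `generate()` method for the last step; calling it would write class files, so a `-d` output directory would normally be added to the options first.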
In more detail:
- Lex - Break the source file into individual words, or tokens.
- Parse - Analyze the phrase structure of the program.
- Semantic Actions - Build a piece of abstract syntax tree corresponding to each phrase.
- Semantic Analysis - Determine what each phrase means, relate uses of variables to their definitions, check types of expressions, request translation of each phrase.
- Frame Layout - Place variables, function-parameters, etc. into activation records (stack frames) in a machine-dependent way.
- Translate - Produce intermediate representation trees (IR trees), a notation
that is not tied to any particular source language or target-machine architecture.
- Canonicalize - Hoist side effects out of expressions, and clean up conditional branches, for the convenience of the next phases.
- Instruction Selection - Group the IR-tree nodes into clumps that correspond to the actions of target-machine instructions.
- Control Flow Analysis - Analyze the sequence of instructions into a control flow graph that shows all the possible flows of control the program might
follow when it executes.
- Dataflow Analysis - Gather information about the flow of information through variables of the program; for example, liveness analysis calculates the places where each program variable holds a still-needed value (is live).
- Register Allocation - Choose a register to hold each of the variables and temporary values used by the program; variables not live at the same time
can share the same register.
- Code Emission - Replace the temporary names in each machine instruction with
machine registers.
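To make the first three phases (Lex, Parse, Semantic Actions) more concrete, here is a toy front end for arithmetic expressions with `+`, `*` and parentheses. It is only an illustrative sketch, not how javac is implemented: the names `TinyFrontEnd`, `Node`, `Num` and `BinOp` are invented for this example, and the tree is evaluated directly instead of being translated to IR and machine code.

```java
import java.util.ArrayList;
import java.util.List;

public class TinyFrontEnd {

    // Lex: break the source text into number, operator and parenthesis tokens.
    static List<String> lex(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if ("+*()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Abstract syntax tree nodes built by the parser's semantic actions.
    interface Node { int eval(); }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public int eval() { return value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
        public int eval() { return op == '+' ? left.eval() + right.eval() : left.eval() * right.eval(); }
        @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
    }

    // Parse: recursive descent over the token list, one method per grammar rule.
    static final class Parser {
        private final List<String> tokens;
        private int pos = 0;
        Parser(List<String> tokens) { this.tokens = tokens; }

        private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

        // expr := term ('+' term)*
        Node expr() {
            Node left = term();
            while (peek().equals("+")) { pos++; left = new BinOp('+', left, term()); }
            return left;
        }

        // term := factor ('*' factor)*
        Node term() {
            Node left = factor();
            while (peek().equals("*")) { pos++; left = new BinOp('*', left, factor()); }
            return left;
        }

        // factor := NUMBER | '(' expr ')'
        Node factor() {
            String t = peek();
            if (t.equals("(")) {
                pos++;
                Node inner = expr();
                if (!peek().equals(")")) throw new IllegalArgumentException("expected ')'");
                pos++;
                return inner;
            }
            if (t.isEmpty() || !Character.isDigit(t.charAt(0))) {
                throw new IllegalArgumentException("expected a number but found '" + t + "'");
            }
            pos++;
            return new Num(Integer.parseInt(t));
        }
    }

    static Node parse(String src) {
        Parser parser = new Parser(lex(src));
        Node tree = parser.expr();
        if (parser.pos < parser.tokens.size()) throw new IllegalArgumentException("trailing input");
        return tree;
    }

    public static void main(String[] args) {
        Node tree = parse("1 + 2 * (3 + 4)");
        System.out.println(tree);        // prints (1 + (2 * (3 + 4)))
        System.out.println(tree.eval()); // prints 15
    }
}
```

The grammar gives `*` a higher precedence than `+` simply by having `expr` call `term`, which in turn calls `factor`; a real compiler would hand the resulting tree to semantic analysis rather than evaluate it on the spot.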
There is a nice book:
Modern Compiler Implementation in Java
You may want to look inside the javac code:
- Javac Documentation
- OpenJDK source code
- Hacker's guide to javac: "Don't Panic!", written to help newcomers to javac navigate their way around the code base
- JVM and JLS specifications