I am playing around with smali and baksmali on a small Hello World Android application I have written. My source code is:
package com.hello;
import android.app.
What you're looking at is Dalvik bytecode. The manifest is a separate issue which I'll get to in a minute. When you build your Android application, javac first compiles your Java source to Java bytecode (.class files), just as it would for a standard JVM application, and the dx tool then converts that into Dalvik bytecode. Dalvik opcodes are one byte wide, so the instruction set has at most 256 of them.
For example, invoke-super is an opcode that instructs the DVM (Dalvik virtual machine) to invoke a method on the superclass. Similarly, invoke-interface instructs the DVM to invoke an interface method.
So you can see that
super.onCreate(savedInstanceState);
translates to
invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V
In this case, invoke-super takes two parameters: the {p0, p1} register group and the method reference Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V, which the VM uses to look up and resolve the method if necessary. Here p0 holds this and p1 holds the savedInstanceState argument. The trailing V is the return type (void).
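To make the shape of that method reference concrete, here is a minimal Java sketch that splits one into its parts. MethodRef is a hypothetical helper I made up for illustration; it is not part of smali or baksmali, and it does very little validation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a smali/baksmali API): splits a reference of the
// form  Lpkg/Class;->name(ParamTypes)ReturnType  into its components.
final class MethodRef {
    final String definingClass;
    final String name;
    final List<String> paramTypes;
    final String returnType;

    private MethodRef(String definingClass, String name,
                      List<String> paramTypes, String returnType) {
        this.definingClass = definingClass;
        this.name = name;
        this.paramTypes = paramTypes;
        this.returnType = returnType;
    }

    static MethodRef parse(String ref) {
        int arrow = ref.indexOf("->");
        int open = ref.indexOf('(', Math.max(arrow, 0));
        int close = ref.indexOf(')', Math.max(open, 0));
        if (arrow < 0 || open < 0 || close < 0) {
            throw new IllegalArgumentException("not a method reference: " + ref);
        }

        // Walk the parameter list one type descriptor at a time.
        List<String> params = new ArrayList<>();
        String args = ref.substring(open + 1, close);
        int i = 0;
        while (i < args.length()) {
            int start = i;
            while (args.charAt(i) == '[') {
                i++;                              // array dimensions
            }
            if (args.charAt(i) == 'L') {
                i = args.indexOf(';', i);         // class types end at ';'
                if (i < 0) {
                    throw new IllegalArgumentException("unterminated class type");
                }
            }
            i++;                                  // consume primitive letter or ';'
            params.add(args.substring(start, i));
        }

        return new MethodRef(ref.substring(0, arrow),
                             ref.substring(arrow + 2, open),
                             params,
                             ref.substring(close + 1));
    }
}
```

Parsing Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V this way gives the defining class Landroid/app/Activity;, the name onCreate, a single parameter type Landroid/os/Bundle;, and the return type V.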
Then there's the invoke-direct call in the constructor area.
invoke-direct {p0}, Landroid/app/Activity;-><init>()V
Every class has an <init> method, also known as the constructor, that is used to initialize the data members of a new instance. When you construct an object, the constructor of the superclass has to run as well, so the compiler inserts a call to it at the start of your constructor. This explains why the constructor for your class calls the Activity constructor.
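You can see the same constructor chaining from the Java side with a small standalone sketch. The class names here (Base, Derived, ConstructorChainDemo) are made up for illustration and have nothing to do with Android.

```java
import java.util.ArrayList;
import java.util.List;

class Base {
    // Records the order in which constructors run.
    static final List<String> LOG = new ArrayList<>();

    Base() {
        LOG.add("Base.<init>");
    }
}

class Derived extends Base {
    // There is no explicit super() call here. javac inserts one as the first
    // thing the constructor does, and that call is what appears in the
    // disassembly as an invoke-direct on the superclass <init>.
    Derived() {
        LOG.add("Derived.<init>");
    }
}

class ConstructorChainDemo {
    public static void main(String[] args) {
        new Derived();
        System.out.println(Base.LOG); // [Base.<init>, Derived.<init>]
    }
}
```

The superclass entry is logged first even though Derived never mentions super(), which is the same thing your disassembled constructor is doing with Activity.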
With regards to the manifest, what happens (this is all in the Dalvik specs if you check out the source code) is that the tooling that generates the apk file converts the manifest to a more compact format (binary XML) to save space. The manifest doesn't have anything to do with the code you posted; rather, it tells Android how to handle the application as a whole with regard to Activities, Services, etc. What you've posted is what actually gets executed.
That's a high-level answer to your question. If you need more, let me know and I'll do my best.
Edit: You're basically right. The decompiler reads the binary data as a byte stream from the dex file. It knows what the format should be and is able to pull out information like constants, classes, etc. With regards to the opcodes, that's exactly what it does. It understands what the byte value for each opcode is (or how it's represented in the dex file) and is able to convert that into a human-readable string. If you were going to implement this, aside from understanding the general basics of compilers, I would start with a deep understanding of the structure of a dex file. From there, you would need to construct a table that maps opcode values to their human-readable strings. With that information and some additional information regarding string constants, etc., you could construct a text-file representation of the compiled class. Does that make sense?
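As a rough illustration of that opcode table idea, here is a minimal Java sketch. MiniDisassembler is a hypothetical class of my own, not how baksmali is actually written. The opcode values and the three-unit layout of the invoke instructions (format 35c) are taken from the Dalvik bytecode documentation, and only a handful of opcodes are included.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a table-driven disassembler for a few opcodes.
final class MiniDisassembler {
    private static final Map<Integer, String> OPCODES = new HashMap<>();
    static {
        OPCODES.put(0x0e, "return-void");
        OPCODES.put(0x6e, "invoke-virtual");
        OPCODES.put(0x6f, "invoke-super");
        OPCODES.put(0x70, "invoke-direct");
        OPCODES.put(0x71, "invoke-static");
        OPCODES.put(0x72, "invoke-interface");
    }

    // The opcode is the low byte of the first 16-bit code unit.
    static String mnemonic(int codeUnit) {
        int opcode = codeUnit & 0xff;
        String name = OPCODES.get(opcode);
        return name != null ? name : String.format("unknown-0x%02x", opcode);
    }

    // Decodes one invoke-* instruction in format 35c, given its three
    // 16-bit code units:
    //   unit0 = A|G|op   (A = argument count, G = fifth register)
    //   unit1 = BBBB     (index into the method_ids table)
    //   unit2 = F|E|D|C  (first four registers, C in the lowest nibble)
    static String decodeInvoke(int unit0, int unit1, int unit2) {
        int argCount = (unit0 >> 12) & 0xf;
        int[] regs = {
            unit2 & 0xf,
            (unit2 >> 4) & 0xf,
            (unit2 >> 8) & 0xf,
            (unit2 >> 12) & 0xf,
            (unit0 >> 8) & 0xf
        };
        StringBuilder sb = new StringBuilder(mnemonic(unit0)).append(" {");
        for (int i = 0; i < argCount && i < regs.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('v').append(regs[i]);
        }
        sb.append("}, method@").append(String.format("%04x", unit1 & 0xffff));
        return sb.toString();
    }
}
```

For example, MiniDisassembler.decodeInvoke(0x206f, 0x0001, 0x0032) yields invoke-super {v2, v3}, method@0001. A real disassembler would go one step further and look up method index 1 in the dex file's method table to print the full Lclass;->name(args)ret reference, and baksmali additionally shows parameter registers under their p0, p1 aliases.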
The opcode specification only describes the instructions. The dex file format is more than that - it contains all the metadata needed for the Dalvik VM (and the disassembler) to interpret the file - strings, classes, types, methods and so on. See also the official opcode spec; it's more complete and verbose than the one you linked.
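To give an idea of that metadata, here is a minimal Java sketch that reads a few counts out of the dex header. DexHeader is a hypothetical class of my own. The field offsets follow the header_item layout in the dex format documentation, and the sketch assumes the usual little-endian file and ignores most of the header (checksum, signature, table offsets, map).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: pulls a few fields out of the 0x70-byte dex header.
final class DexHeader {
    final String version;
    final int fileSize;
    final int stringIdsSize;
    final int typeIdsSize;
    final int methodIdsSize;
    final int classDefsSize;

    private DexHeader(String version, int fileSize, int stringIdsSize,
                      int typeIdsSize, int methodIdsSize, int classDefsSize) {
        this.version = version;
        this.fileSize = fileSize;
        this.stringIdsSize = stringIdsSize;
        this.typeIdsSize = typeIdsSize;
        this.methodIdsSize = methodIdsSize;
        this.classDefsSize = classDefsSize;
    }

    static DexHeader parse(byte[] data) {
        if (data.length < 0x70) {
            throw new IllegalArgumentException("too short for a dex header");
        }
        // Magic is "dex\n", three ASCII version digits, then a NUL byte.
        if (data[0] != 'd' || data[1] != 'e' || data[2] != 'x'
                || data[3] != '\n' || data[7] != 0) {
            throw new IllegalArgumentException("bad dex magic");
        }
        String version = new String(data, 4, 3, StandardCharsets.US_ASCII);

        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        // Assumption: only the standard little-endian tag is handled.
        if (buf.getInt(0x28) != 0x12345678) {
            throw new IllegalArgumentException("unexpected endian tag");
        }
        return new DexHeader(
                version,
                buf.getInt(0x20),   // file_size
                buf.getInt(0x38),   // string_ids_size
                buf.getInt(0x40),   // type_ids_size
                buf.getInt(0x58),   // method_ids_size
                buf.getInt(0x60));  // class_defs_size
    }
}
```

Each of those counts has a matching offset field right after it in the header, pointing at the table of strings, types, methods or class definitions. Those tables are what a disassembler uses to turn the numeric indices in the instructions back into names.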
<plug>
BTW, the next version of IDA Pro will support disassembly of .dex files</plug>