Introduction to Apache BCEL
UCL
COMP0012 yue.jia@ucl.ac.uk
Java’s Execution Model
✤ Programs written in Java are compiled into a portable binary format called byte code.
✤ Every Java class is represented in a single class file containing class-related data and byte code instructions. These files are loaded dynamically into the Java Virtual Machine (JVM) and executed.
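To make the "one class, one class file" point concrete, here is a minimal sketch that reads the first eight bytes of a class file using only the standard library. The class name ClassFileHeader and the helper readHeader are hypothetical names chosen for this illustration, and the sketch assumes the class passed in is a top-level class whose .class file is visible as a resource.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileHeader {
    /** Returns {magic, minorVersion, majorVersion} read from the class file of cls. */
    static int[] readHeader(Class<?> cls) {
        // Assumption: cls is a top-level class whose .class file is visible as a resource.
        String resource = cls.getSimpleName() + ".class";
        try (InputStream raw = cls.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IOException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();            // u4 magic, 0xCAFEBABE in every class file
            int minor = in.readUnsignedShort();  // u2 minor_version
            int major = in.readUnsignedShort();  // u2 major_version
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader(Object.class);
        System.out.printf("magic=0x%08X, version=%d.%d%n", header[0], header[2], header[1]);
    }
}
```

The version numbers printed depend on the JDK that runs the sketch; only the magic number is fixed.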
COMP0012 f.sarro@ucl.ac.uk
JVM Architecture
JVM Memory Model
✤ The JVM operates on primitive values and references (i.e., pointers to objects).
✤ It has a garbage-collected heap for storing objects and arrays.
✤ It creates a stack frame for each method call and destroys the frame when that method exits.
✤ Each frame provides an operand stack and an array of local variables.
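The frame structure described above can be imitated in plain Java. The following is only a toy model, not how a real JVM is implemented: Frame and FrameSketch are hypothetical names, and the steps mirror the byte code that a static method `int add(int a, int b) { return a + b; }` would execute.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class FrameSketch {
    /** A toy stack frame: an array of local variables plus an operand stack. */
    static final class Frame {
        final int[] locals;
        final Deque<Integer> operands = new ArrayDeque<>();

        Frame(int maxLocals) {
            locals = new int[maxLocals];
        }
    }

    /** Simulates the frame of `static int add(int a, int b) { return a + b; }`. */
    static int add(int a, int b) {
        Frame frame = new Frame(2);              // frame created when the method is called
        frame.locals[0] = a;                     // arguments arrive in local variable slots
        frame.locals[1] = b;
        frame.operands.push(frame.locals[0]);    // iload_0
        frame.operands.push(frame.locals[1]);    // iload_1
        int right = frame.operands.pop();        // iadd pops two ints...
        int left = frame.operands.pop();
        frame.operands.push(left + right);       // ...and pushes their sum
        return frame.operands.pop();             // ireturn; the frame is then discarded
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```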
Byte Code Instruction Set
✤ An instruction consists of a one-byte opcode specifying the operation to be performed, followed by zero or more operands.
✤ Of the 256 possible opcodes, as of 2015, 202 are in use, 51 are reserved for future use, and 3 are permanently reserved for JVM implementations to use.
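The opcode-plus-operands layout can be seen by decoding a few raw bytes. The sketch below is a deliberately tiny disassembler that knows only a handful of opcodes (iconst_m1 to iconst_5, bipush, sipush, iadd, ireturn); OpcodeDecoder and disassemble are hypothetical names, and anything outside that subset is rejected rather than guessed.

```java
import java.util.ArrayList;
import java.util.List;

public class OpcodeDecoder {
    /** Decodes a small subset of JVM instructions from raw method code. */
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;        // the opcode is always one byte
            if (opcode >= 0x02 && opcode <= 0x08) {
                int value = opcode - 0x03;       // iconst_m1 .. iconst_5, no operands
                out.add(value < 0 ? "iconst_m1" : "iconst_" + value);
                pc += 1;
            } else if (opcode == 0x10) {         // bipush: one signed operand byte
                out.add("bipush " + code[pc + 1]);
                pc += 2;
            } else if (opcode == 0x11) {         // sipush: two operand bytes, big-endian
                short value = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                out.add("sipush " + value);
                pc += 3;
            } else if (opcode == 0x60) {         // iadd: no operands
                out.add("iadd");
                pc += 1;
            } else if (opcode == 0xAC) {         // ireturn: no operands
                out.add("ireturn");
                pc += 1;
            } else {
                throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] code = {0x10, 0x05, 0x08, 0x60, (byte) 0xAC};
        System.out.println(disassemble(code));   // prints [bipush 5, iconst_5, iadd, ireturn]
    }
}
```

Note how bipush 5 and iconst_5 push the same value but differ in length: the first carries its constant as an operand, the second encodes it in the opcode itself.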
Instruction Groups
✤ Load and store (e.g. aload_0, istore)
✤ Arithmetic and logic (e.g. ladd, fcmpl)
✤ Type conversion (e.g. i2b, d2i)
✤ Object creation and manipulation (e.g. new, putfield)
✤ Operand stack management (e.g. swap, dup2)
✤ Control transfer (e.g. ifeq, goto)
✤ Method invocation and return (e.g. invokespecial, areturn)
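To connect the groups above to source code, the sketch below pairs four small methods with the instructions that javac typically emits for them (as shown by `javap -c`). The class and method names are hypothetical, and the exact output can vary between compiler versions, so the comments are a guide rather than a guarantee.

```java
public class InstructionGroups {
    // Load, arithmetic, return:
    //   lload_0, lload_2, ladd, lreturn   (a long occupies two local variable slots)
    static long sum(long a, long b) {
        return a + b;
    }

    // Type conversion:
    //   dload_0, d2i, ireturn
    static int toInt(double d) {
        return (int) d;
    }

    // Object creation, operand stack management, method invocation:
    //   new, dup, invokespecial <init>, areturn
    static Object make() {
        return new Object();
    }

    // Control transfer:
    //   iload_0, ifne <skip>, iconst_1, ireturn, <skip>: iconst_0, ireturn
    static boolean isZero(int x) {
        if (x == 0) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(sum(2L, 3L) + " " + toInt(2.9) + " " + isZero(0));
    }
}
```

Compiling this file and running `javap -c InstructionGroups` is a quick way to check the comments against your own JDK.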
Byte Code Manipulation Libraries
✤ Apache Commons’ BCEL: https://commons.apache.org/proper/commons-bcel/
✤ ObjectWeb Consortium’s ASM: http://asm.ow2.org/
✤ Javassist: http://www.javassist.org/
Byte Code Engineering Library
✤ BCEL represents a given class as an object that contains all of its symbolic information: methods, fields and byte code instructions.
✤ It enables various activities without access to the source code: bug finding, dead code elimination, obfuscation, etc.
The BCEL API
✤ Abstracts the JVM and the interactions with Java class files
✤ Mainly consists of three parts:
    ✤ A package containing classes that reflect the class file format;
    ✤ A package to dynamically generate or modify JavaClass or Method objects;
    ✤ Various code examples and utilities like a class file viewer, a tool to convert class files into HTML, and a converter from class files to the Jasmin assembly language.
The classfile Package
package org.apache.bcel.classfile
[Figure: UML diagram for the JavaClass API]
JavaClass
✤ JavaClass: represents a Java byte code class
✤ Of its various parts, we are mostly interested in:
    ✤ ConstantPool: represents the collection of constants
    ✤ Method: represents a method (a list of byte code instructions)
✤ A JavaClass is parsed from a .class file by a ClassParser
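As a minimal sketch of the parsing step, the code below feeds the class file of java.lang.Object to a ClassParser and lists what the resulting JavaClass contains. It assumes BCEL 6.x is on the classpath; ParseDemo and its parse helper are hypothetical names, and the helper only handles top-level classes whose .class file is visible as a resource.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

import org.apache.bcel.classfile.ClassParser;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;

public class ParseDemo {
    /** Parses the class file of cls into a BCEL JavaClass. */
    static JavaClass parse(Class<?> cls) {
        // Assumption: cls is a top-level class whose .class file is visible as a resource.
        String resource = cls.getSimpleName() + ".class";
        try (InputStream in = cls.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("class file not found: " + resource);
            }
            return new ClassParser(in, resource).parse();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        JavaClass javaClass = parse(Object.class);
        System.out.println(javaClass.getClassName() + ": "
                + javaClass.getConstantPool().getLength() + " constant pool slots");
        for (Method method : javaClass.getMethods()) {
            // name followed by the JVM type signature, e.g. hashCode()I
            System.out.println("  " + method.getName() + method.getSignature());
        }
    }
}
```

When the class file is on disk, `new ClassParser("path/to/Some.class").parse()` does the same job without the resource lookup.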
The generic Package
package org.apache.bcel.generic
[Figure: UML diagram of the ClassGen API]
ClassGen
✤ ClassGen: generates a Java class from its parts, which include a ConstantPool and Methods
✤ ConstantPoolGen: generates the constant pool
✤ MethodGen: generates Java methods
✤ Eventually, ClassGen produces a JavaClass whose getBytes() returns a byte[], which is our .class file
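The pieces listed above fit together roughly as follows. This is a sketch under the assumption that BCEL 6.x is on the classpath: it builds a class with a single method `public static int five()` from scratch, turns it into a byte[], and loads those bytes to show that they form a valid class. GenDemo and GeneratedFive are hypothetical names chosen for the illustration.

```java
import org.apache.bcel.Const;
import org.apache.bcel.generic.ClassGen;
import org.apache.bcel.generic.ConstantPoolGen;
import org.apache.bcel.generic.ICONST;
import org.apache.bcel.generic.InstructionConst;
import org.apache.bcel.generic.InstructionList;
import org.apache.bcel.generic.MethodGen;
import org.apache.bcel.generic.Type;

public class GenDemo {
    /** Builds the class GeneratedFive with `public static int five()` and returns its bytes. */
    static byte[] generate() {
        ClassGen cgen = new ClassGen("GeneratedFive", "java.lang.Object", "<generated>",
                Const.ACC_PUBLIC | Const.ACC_SUPER, new String[0]);
        ConstantPoolGen cpgen = cgen.getConstantPool();

        InstructionList body = new InstructionList();
        body.append(new ICONST(5));              // push the int constant 5
        body.append(InstructionConst.IRETURN);   // return it

        MethodGen mgen = new MethodGen(Const.ACC_PUBLIC | Const.ACC_STATIC, Type.INT,
                Type.NO_ARGS, new String[0], "five", "GeneratedFive", body, cpgen);
        mgen.setMaxStack();                      // let BCEL compute the stack depth
        mgen.setMaxLocals();
        cgen.addMethod(mgen.getMethod());

        return cgen.getJavaClass().getBytes();   // the contents of GeneratedFive.class
    }

    /** Defines the generated bytes as a class and calls GeneratedFive.five() reflectively. */
    static int runGenerated() {
        byte[] bytes = generate();
        ClassLoader loader = new ClassLoader(GenDemo.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                if (!name.equals("GeneratedFive")) {
                    throw new ClassNotFoundException(name);
                }
                return defineClass(name, bytes, 0, bytes.length);
            }
        };
        try {
            return (Integer) loader.loadClass("GeneratedFive").getMethod("five").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runGenerated());
    }
}
```

Writing the same bytes to a file named GeneratedFive.class (for example with `cgen.getJavaClass().dump("GeneratedFive.class")`) gives a class file that `javap -c` can inspect.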
Example 1: CompilerString
Source code available on Moodle
✤ Changes all String constants into “Compiler”
// load the original class into a class generator
ClassGen cgen = new ClassGen(original);
ConstantPoolGen cpgen = cgen.getConstantPool();
// get the current constant pool
ConstantPool cp = cpgen.getConstantPool();
// get the constants in the pool
Constant[] constants = cp.getConstantPool();
for (int i = 0; i < constants.length; i++)
{
    // A string constant takes two entries in the pool:
    // the first is a ConstantString, which contains
    // an index to the second entry, a ConstantUtf8
    // (displayed as Asciz when disassembled by javap).
    //
    // ConstantUtf8 (Asciz) entries are also used to store method names, etc.,
    // whereas we are only interested in String constants.
    // So we first look for a ConstantString entry,
    // then retrieve the index of its ConstantUtf8 entry, which we then replace.
    if (constants[i] instanceof ConstantString)
    {
        ConstantString cs = (ConstantString) constants[i];
        cp.setConstant(cs.getStringIndex(), new ConstantUtf8("Compiler"));
    }
}
Example 2: Five
Source code available on Moodle
✤ Changes any integer constants pushed to the stack to 5
// load the original class into a class generator
ClassGen cgen = new ClassGen(original);
ConstantPoolGen cpgen = cgen.getConstantPool();
// Do your optimization here
Method[] methods = cgen.getMethods();
for (Method m : methods)
{
    optimizeMethod(cgen, cpgen, m);
}
InstructionList instList = new InstructionList(methodCode.getCode());
// InstructionHandle is a wrapper for actual Instructions
for (InstructionHandle handle : instList.getInstructionHandles())
{
    // if the instruction inside is iconst
    if (handle.getInstruction() instanceof ICONST)
    {
        // insert new one with integer 5, and...
        instList.insert(handle, new ICONST(5));
        try
        {
            // delete the old one
            instList.delete(handle);
        }
        catch (TargetLostException e)
        {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
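The fragment above leaves out how the instruction list is obtained and how the change is written back into the class. The following is one possible way to complete it, not the version distributed on Moodle, and it assumes BCEL 6.x on the classpath. It obtains the list from a MethodGen instead of from methodCode, redirects any branch that targeted a deleted instruction (the retargeting pattern described in the BCEL manual) instead of printing the stack trace, and replaces the method in the ClassGen. FiveSketch, sampleClass and firstPushedValue are hypothetical names; sampleClass only exists so that the sketch has something to rewrite.

```java
import org.apache.bcel.Const;
import org.apache.bcel.classfile.Method;
import org.apache.bcel.generic.ClassGen;
import org.apache.bcel.generic.ConstantPoolGen;
import org.apache.bcel.generic.ICONST;
import org.apache.bcel.generic.InstructionConst;
import org.apache.bcel.generic.InstructionHandle;
import org.apache.bcel.generic.InstructionList;
import org.apache.bcel.generic.InstructionTargeter;
import org.apache.bcel.generic.MethodGen;
import org.apache.bcel.generic.TargetLostException;
import org.apache.bcel.generic.Type;

public class FiveSketch {
    /** Replaces every iconst instruction in the method with iconst_5 and stores the result in cgen. */
    static void optimizeMethod(ClassGen cgen, ConstantPoolGen cpgen, Method method) {
        if (method.getCode() == null) {
            return;                              // abstract or native method: nothing to rewrite
        }
        MethodGen mgen = new MethodGen(method, cgen.getClassName(), cpgen);
        InstructionList instList = mgen.getInstructionList();
        for (InstructionHandle handle : instList.getInstructionHandles()) {
            if (handle.getInstruction() instanceof ICONST) {
                InstructionHandle replacement = instList.insert(handle, new ICONST(5));
                try {
                    instList.delete(handle);
                } catch (TargetLostException e) {
                    // Something (e.g. a branch) still pointed at the deleted handle:
                    // point it at the replacement instead.
                    for (InstructionHandle lost : e.getTargets()) {
                        for (InstructionTargeter targeter : lost.getTargeters()) {
                            targeter.updateTarget(lost, replacement);
                        }
                    }
                }
            }
        }
        instList.setPositions(true);             // recompute byte offsets
        mgen.setMaxStack();
        mgen.setMaxLocals();
        cgen.replaceMethod(method, mgen.getMethod());
    }

    /** Hypothetical input: an in-memory class with `static int three() { return 3; }`. */
    static ClassGen sampleClass() {
        ClassGen cgen = new ClassGen("Sample", "java.lang.Object", "<generated>",
                Const.ACC_PUBLIC | Const.ACC_SUPER, new String[0]);
        InstructionList body = new InstructionList();
        body.append(new ICONST(3));
        body.append(InstructionConst.IRETURN);
        MethodGen mgen = new MethodGen(Const.ACC_PUBLIC | Const.ACC_STATIC, Type.INT,
                Type.NO_ARGS, new String[0], "three", "Sample", body, cgen.getConstantPool());
        mgen.setMaxStack();
        mgen.setMaxLocals();
        cgen.addMethod(mgen.getMethod());
        return cgen;
    }

    /** Returns the constant pushed by the first instruction of the first method. */
    static int firstPushedValue(ClassGen cgen) {
        byte[] code = cgen.getMethods()[0].getCode().getCode();
        ICONST first = (ICONST) new InstructionList(code).getStart().getInstruction();
        return first.getValue().intValue();
    }

    public static void main(String[] args) {
        ClassGen cgen = sampleClass();
        for (Method m : cgen.getMethods()) {
            optimizeMethod(cgen, cgen.getConstantPool(), m);
        }
        System.out.println(firstPushedValue(cgen));
    }
}
```

Because every iconst instruction is one byte long, this rewrite does not shift any offsets. Note also that only constants pushed with iconst (the values -1 to 5) are affected; larger constants are pushed with bipush, sipush or ldc and would need their own cases.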
Resources
Manual: https://commons.apache.org/proper/commons-bcel/manual/manual.html
API doc: http://commons.apache.org/proper/commons-bcel/apidocs/index.html