程序代写代做代考 computer architecture compiler Java python javascript mips Compilers and computer architecture: Compiling OO language

Compilers and computer architecture: Compiling OO language
Martin Berger
November 2015

Recall the function of compilers

Recall the structure of compilers
Source program
Lexical analysis
Intermediate code generation
Optimisation
Syntax analysis
Semantic analysis, e.g. type checking
Code generation
Translated program

Introduction
The key ideas in object oriented programming are:

Introduction
The key ideas in object oriented programming are:
􏹩 Data(state)hidingthroughobjects,objectscarryaccess mechanisms (methods).

Introduction
The key ideas in object oriented programming are:
􏹩 Data(state)hidingthroughobjects,objectscarryaccess mechanisms (methods).
􏹩 Subtyping.IfBisasubclassofAthananyobjectthatisan instance of B can be used whenever and wherever an instance of A is expected.

Introduction
The key ideas in object oriented programming are:
􏹩 Data(state)hidingthroughobjects,objectscarryaccess mechanisms (methods).
􏹩 Subtyping.IfBisasubclassofAthananyobjectthatisan instance of B can be used whenever and wherever an instance of A is expected.
Let’s look at an example.

A Java example
interface A { int f () { … } }
class B implements A { int f () { … } }
class C implements A { int f () { … } }

public static void main ( String [] args ) {

A a = if ( userInput == 0 ) {
new B (); }
else {
new C (); } …
a.f() // Does the compiler know which f is used?

A Java example
interface A { int f () { … } }
class B implements A { int f () { … } }
class C implements A { int f () { … } }

public static void main ( String [] args ) {

A a = if ( userInput == 0 ) {
new B (); }
else {
new C (); } …
a.f() // Does the compiler know which f is used?
At compile time we don’t know exactly what objects we have to invoke methods on.

Introduction
The code generator must generate code such that access to an object that is an instance of A must work for any subclass of A.

Introduction
The code generator must generate code such that access to an object that is an instance of A must work for any subclass of A.
Indeed some subclasses of A might only become available at run-time.

Introduction
The code generator must generate code such that access to an object that is an instance of A must work for any subclass of A.
Indeed some subclasses of A might only become available at run-time.
So we have two questions to ask:

Introduction
The code generator must generate code such that access to an object that is an instance of A must work for any subclass of A.
Indeed some subclasses of A might only become available at run-time.
So we have two questions to ask:
􏹩 Howareobjectslaidoutinmemory?

Introduction
The code generator must generate code such that access to an object that is an instance of A must work for any subclass of A.
Indeed some subclasses of A might only become available at run-time.
So we have two questions to ask:
􏹩 Howareobjectslaidoutinmemory?
􏹩 Howismethodinvocation(dynamicdispatch,the characteristic features of using objects) implemented?

Object layout in memory
We solve these problems using the following ideas.

Object layout in memory
We solve these problems using the following ideas. 􏹩 Objectsarelaidoutincontiguousmemory.

Object layout in memory
We solve these problems using the following ideas.
􏹩 Objectsarelaidoutincontiguousmemory.
􏹩 Eachmember/attributeisatthesameplaceinthe contiguous memory representing an object, i.e. at a fixed offset, known at compile-time, from the start of the contiguous memory. Cf. to how variables etc are at fixed offset from frame pointer in AR.

Object layout in memory
We solve these problems using the following ideas.
􏹩 Objectsarelaidoutincontiguousmemory.
􏹩 Eachmember/attributeisatthesameplaceinthe contiguous memory representing an object, i.e. at a fixed offset, known at compile-time, from the start of the contiguous memory. Cf. to how variables etc are at fixed offset from frame pointer in AR.
􏹩 subclassmembersareadded’frombelow’.

Object layout in memory
We solve these problems using the following ideas.
􏹩 Objectsarelaidoutincontiguousmemory.
􏹩 Eachmember/attributeisatthesameplaceinthe contiguous memory representing an object, i.e. at a fixed offset, known at compile-time, from the start of the contiguous memory. Cf. to how variables etc are at fixed offset from frame pointer in AR.
􏹩 subclassmembersareadded’frombelow’.
Instance of A Instance of B
32
Header 120 Header 36 a=0 124 a=0 40 a2 = 1 128 a2 = 1
132 b = 999

Object layout in memory
Note that the the number and types of members/attributes (i.e. size in memory) are available to the compiler at compile time.
class A {
int a = 0;
int a2 = 1;
int f () {
a = a + a2;
return a; } }
class B extends A {
int b = 2;
int f () { return a; }
int g () {
a = a – b + a2;
return a; } }
Instance of A
32
36 a=0 40 a2 = 1
Instance of B
120 Header 124 a=0 128 a2 = 1 132 b = 999
Header

Object layout in memory
Instance of A Instance of B Another instance of B
32
Header 120 Header 1600 Header 36 a=0 124 a=0 1604 a=7 40 a2 = 1 128 a2 = 1 1608 a2 = 12
132 b = 999 1612 b = 44
The compiler uses the same layout for every instance of a class. So if the size of the header is 4 bytes, and integers are 4 bytes, then a is always at offset 8 from the beginning of the object, and a2 is always at offset 12, both in instances of A and B, and likewise for other subclasses of A, or other header and field sizes

Object layout in memory
Instance of A Instance of B Another instance of B
32
Header 120 Header 1600 Header 36 a=0 124 a=0 1604 a=7 40 a2 = 1 128 a2 = 1 1608 a2 = 12
132 b = 999 1612 b = 44
The compiler uses the same layout for every instance of a class. So if the size of the header is 4 bytes, and integers are 4 bytes, then a is always at offset 8 from the beginning of the object, and a2 is always at offset 12, both in instances of A and B, and likewise for other subclasses of A, or other header and field sizes
This ensures that every instance of B can be used where an instance of A is expected.

Object layout in memory
This also works with deeper inheritance hierarchies.
class A {
int a = 0; }
class B extends A {
int b = 2; }
class C extends B {
int c = 9;
int d = 9; }
class D extends C {
int e = 5; }
1600
1604
1608
1612
1616
1620

Object layout in memory
This also works with deeper inheritance hierarchies.
class A {
int a = 0; }
class B extends A {
int b = 2; }
class C extends B {
int c = 9;
int d = 9; }
class D extends C {
int e = 5; }
Header 1604 a=0
1608
1612
1616
1620
1600
A

Object layout in memory
This also works with deeper inheritance hierarchies.
class A {
int a = 0; }
class B extends A {
int b = 2; }
class C extends B {
int c = 9;
int d = 9; }
class D extends C {
int e = 5; }
Header 1604 a=0 1608 b=1
1612
1616 1620
1600
B A

Object layout in memory
This also works with deeper inheritance hierarchies.
class A {
int a = 0; }
class B extends A {
int b = 2; }
class C extends B {
int c = 9;
int d = 9; }
class D extends C {
int e = 5; }
1600 Header 1604 a=0 1608 b=1 1612 c=9 1616 d=9 1620
A
C B

Object layout in memory
This also works with deeper inheritance hierarchies.
class A {
int a = 0; }
class B extends A {
int b = 2; }
class C extends B {
int c = 9;
int d = 9; }
class D extends C {
int e = 5; }
Header 1604 a=0 1608 b=1 1612 c=9 1616 d=9 1620 e=5
1600
B A
D C

Object layout in memory
This also works with deeper inheritance hierarchies.
class A {
int a = 0; }
class B extends A {
int b = 2; }
class C extends B {
int c = 9;
int d = 9; }
class D extends C {
int e = 5; }
No matter what object we create, we can always find the visible fields at the same offset from the ’top’ of the object.
Header 1604 a=0 1608 b=1 1612 c=9 1616 d=9 1620 e=5
1600
B A
D C

We’ve overlooked one subtle issue
In Java and other languages you can write this:
class A { public int a = 0; }
class B extends A { public int a = 1; }
class Main {
public static void main ( String [] args ) {
A a = new A ();
B b = new B ();
A ab = new B ();
System.out.println ( “a.a = ” + a.a );
System.out.println ( “b.a = ” + b.a );
System.out.println ( “ab.a = ” + ab.a ); } }
What do you think this program outputs? Why? (prog/ex3.java)

Shadowing of members/attributes
The solution is twofold:
􏹩 Todeterminewhatmember/attributedtoaccess,thecode generator looks at the static type of the variable (available at compile-time). Note that the type of the object at run-time might be different (e.g. A ab = new B (); in the example on the last slide).
􏹩 Ifthereismorethanonemember/attributewiththesame name, we choose the one that is closest up the inheritance hierarchy.

Shadowing of members/attributes (example)
class A1 { a …} // defines a
class A2 extends A1 { a …} // defines a
class A3 extends A2 { a …} // defines a
class A4 extends A3 {…} // doesn’t define a
class A5 extends A4 { a …} // defines a
class A6 extends A5 {…} // doesn’t define a
class A7 extends A6 {…} // doesn’t define a
class A8 extends A7 {…} // doesn’t define a
class A9 extends A8 {…} // doesn’t define a
class A10 extends A9 { a …} // defines a

A7 x = new A10 ()

print ( x.a ) // prints A5’s a
(prog/ex5.java)

Multiple inheritance
Some OO language (e.g. C++, but not Java) allow multiple inheritance.
class A {
int a = 0; }
class B {
int b = 2; }
class C extends A, B {
int c = 9; }
Now we have two possibilities for laying out objects that are instances of C in memory.

Multiple inheritance
Now we have two possibilities for laying out objects that are instances of C in memory.
1600 Header 1600 Header 1604 a=0 1604 b=1 1608 b=1 1608 a=0 1612 c=9 1612 c=9

Multiple inheritance
Now we have two possibilities for laying out objects that are instances of C in memory.
Either way is fine, as long as we always use the same choice! (Cf. driving on the left vs driving on the right.)
1600 Header 1600 Header 1604 a=0 1604 b=1 1608 b=1 1608 a=0 1612 c=9 1612 c=9

Multiple inheritance: diamond inheritance
However with multiple inheritance the compiler must must be careful because attributes/members and methods can be inherited more than once:
class A { int a = 0; }
class B extends A{
int a = 2; }
class C extends A {
int a = 9; }
class D extends B, C { int a = 11; … }
A
BC
D

Multiple inheritance: diamond inheritance
However with multiple inheritance the compiler must must be careful because attributes/members and methods can be inherited more than once:
class A { int a = 0; }
class B extends A{
int a = 2; }
class C extends A {
int a = 9; }
class D extends B, C { int a = 11; … } Should D contain a once, twice, thrice, four or five times?
A
BC
D

Multiple inheritance: diamond inheritance
However with multiple inheritance the compiler must must be careful because attributes/members and methods can be inherited more than once:
class A { int a = 0; }
class B extends A{
int a = 2; }
class C extends A {
int a = 9; }
class D extends B, C { int a = 11; … }
Should D contain a once, twice, thrice, four or five times? To avoid such complications, Java and other languages prohibit multiple inheritance.
A
BC
D

Quick question
Language like Java have visibility restrictions (private, protected, public).

Quick question
Language like Java have visibility restrictions (private, protected, public).
How does the code generator handle those?

Quick question
Language like Java have visibility restrictions (private, protected, public).
How does the code generator handle those?
Answer: not at all, they are enforced by semantic analysis (type checking).

Summary
Inheritance relationships
class A { a … }
class B extends A { b … }
class C extends A { c … }
give rise to the following object layouts.
Note that we can access a in the same way in instances of A, B and C just by using the offset from the top of the (contiguous memory region representing the) object.
Instance of A Instance of B Instance of C
Header Header Header aaa bc

Methods
We have now learned how to deal with object members/attributes, what about methods?

Methods
We have now learned how to deal with object members/attributes, what about methods? We need to deal with two questions:
􏹩 Howtogeneratethecodeforthemethodbody?
􏹩 Where/howtostoremethodcodetoensuredynamic dispatch works?

Methods
We have now learned how to deal with object members/attributes, what about methods? We need to deal with two questions:
􏹩 Howtogeneratethecodeforthemethodbody?
􏹩 Where/howtostoremethodcodetoensuredynamic
dispatch works?
We begin with the former.

Compilation of method bodies
We have already learned how to generate code for procedures (static methods). Clearly (non-static) methods are very similar to procedures.

Compilation of method bodies
We have already learned how to generate code for procedures (static methods). Clearly (non-static) methods are very similar to procedures.
Why don’t we reuse the code generator for methods?

Compilation of method bodies
We have already learned how to generate code for procedures (static methods). Clearly (non-static) methods are very similar to procedures.
Why don’t we reuse the code generator for methods? Can you imagine how this works?

Compilation of methods by reduction to procedures
Consider the following Java definition:
class A {
int n = 10;
int f ( int x ) = { n = n+1; return x+n; } }

Compilation of methods by reduction to procedures
Consider the following Java definition:
class A {
int n = 10;
int f ( int x ) = { n = n+1; return x+n; } }
We see an invocation a.f(7) as a normal procedure invocation taking two arguments, with the additional argument being the object a that we invoke the method on:
int f_A ( A this, int x ) = {
this.n = this.n + 1;
return x + this.n }
So ’under the hood’ the compiler generates a procedure f_A for each method f in each class A. The object (this in Java) becomes nothing but normal a procedure parameter in f_A. Each access to a member n in the body of f is converted to an access a.n to the field holding b in the contiguous memory representing the object.

Compilation of methods by reduction to procedures
Consider the following Java definition:
class A {
int n = 10;
int f ( int x ) = { n = n+1; return x+n; } }
We see an invocation a.f(7) as a normal procedure invocation taking two arguments, with the additional argument being the object a that we invoke the method on:
int f_A ( A this, int x ) = {
this.n = this.n + 1;
return x + this.n }
So ’under the hood’ the compiler generates a procedure f_A for each method f in each class A. The object (this in Java) becomes nothing but normal a procedure parameter in f_A. Each access to a member n in the body of f is converted to an access a.n to the field holding b in the contiguous memory representing the object. Now we can reuse the code generator for procedures.

Translation of method invocation
If we translate method declarations into procedure declarations, what do we do with method invocations?

Translation of method invocation
If we translate method declarations into procedure declarations, what do we do with method invocations? Does this work?
a.f( arg_1, arg_2, …, arg_n )
to a procedure call
f_A ( a, arg_1, arg_2, …, arg_n )

Translation of method invocation
If we translate method declarations into procedure declarations, what do we do with method invocations? Does this work?
a.f( arg_1, arg_2, …, arg_n )
to a procedure call
f_A ( a, arg_1, arg_2, …, arg_n )
Almost, except that we don’t actually know at compile-time what the precise run-time type of a is. This is because of sub-typing. But apart from this it works.

Where does the method body code go?
The only two issue left to resolve are
􏹩 Howtofindtheactualmethodbody, 􏹩 wheretostoremethodbodies.
Any ideas?

Where does the method body code go?
The only two issue left to resolve are
􏹩 Howtofindtheactualmethodbody, 􏹩 wheretostoremethodbodies.
Any ideas?
Finding methods is easy: just access them (like fields) at fixed offset from the header, known at compile-time.

Where does the method body code go? First idea
Put them all in the contiguous memory with the members/attributes.
class A {
int a = 0;
int b = 1;
int f () = …
int g ( int x ) =

Note that f_A and g_A are normal procedures with an additional argument as described above.
Instance of A
Header a
b
is really
Instance of A
Other header data Code for f_A Code for g_A
a
b

Where does the method body code go? First idea
Put them all in the contiguous memory with the members/attributes.
class A {
int a = 0;
int b = 1;
int f () = …
int g ( int x ) =

Note that f_A and g_A are normal procedures with an additional argument as described above.
Can you see the problem with this solution?
Instance of A
Header a
b
is really
Instance of A
Other header data Code for f_A Code for g_A
a
b

Where does the method body code go? Second idea
The problem with the previous approach was massive code duplication! But methods are the same for each object of the same class.

Where does the method body code go? Second idea
The problem with the previous approach was massive code duplication! But methods are the same for each object of the same class.
Instead we could share the method bodies between objects, and let the instances (at fixed offset) contain pointers the shared method bodies.
Instance of A
Other header data Pointer to f_A Pointer to g_A
a
b
Method table
Code for f_A Code for g_A
Instance of A
Other header data
Pointer to f_A Pointer to g_A a
b

Where does the method body code go? Second idea
The problem with the previous approach was massive code duplication! But methods are the same for each object of the same class.
Instead we could share the method bodies between objects, and let the instances (at fixed offset) contain pointers the shared method bodies.
Instance of A
Other header data Pointer to f_A Pointer to g_A
a
b
Method table
Code for f_A Code for g_A
Instance of A
Other header data
Pointer to f_A Pointer to g_A a
b
Can you see the problem with this solution?

Where does the method body code go? Third idea
The problem with the second approach is that we are still wasting memory: imagine a class with 100 methods, and 1.000.000 instances. With 32 bit pointers (MIPS) that’s a whopping 381 MBytes! Wasteful.
A much better solution is to add a layer of indirection, and have just one pointer (at fixed offset) per object to a table that contains (at fixed offset) a pointer to each method’s code (remaining headers omitted for readability).
Instances of A
mtptr
a mtptr
ba mtptr
ba b
Method table
Pointer to f_A Pointer to g_A
Method bodies
Code for f_A Code for g_A

Where does the method body code go? Third idea
This is the choice taken in practice as far as I know, because it is much more memory efficient.

Where does the method body code go? Third idea
This is the choice taken in practice as far as I know, because it is much more memory efficient.
The main disadvantage is that the ’pointer chasing’ in the invocation of a method is (slightly) more time-consuming than with the first two ideas.
Just-in-time compilers (e.g. the JVM, Microsoft’s CLR, Chrome’s V8) employ clever tricks to ameliorate this shortcoming.

Inheritance and the third idea
Does this also work when inheritance is involved?
class A {
int a = 0;
int f ( int x ) {…}
int g ( int x ) {…} }
class B extends A {
int b = 7;
int f ( int x ) {…} }
Instances of A
mtptr
dptr mtptr
a mtptr
a Instances of B
mtptr
b mtptr
b
Method table for A
Pointer to f_A Pointer to g_A
Method table for B
Pointer to f_B Pointer to g_A
Method bodies
Code for f_A Code for g_A
Method bodies Code for f_B

Inheritance and the third idea
Instances of A
mtptr
dptr mtptr
a mtptr
a Instances of B
mtptr
b mtptr
ba b
Method table for A
Pointer to f_A Pointer to g_A
Method table for B
Pointer to f_B Pointer to g_A
Method bodies
Code for f_A Code for g_A
Method bodies Code for f_B
The key insight (as with the layout of the object in contiguous memory) is that the pointer to the method table is always at fixed offset from the top of the object, and pointers to method bodies are always at fixed offset in the method table, e.g.:
􏹩 fisalwaysatoffset0inthemethodtable.
􏹩 gisalwaysatoffset4inthemethodtable(assuming32bit pointers.

Translation of method invocation
Instances of A mtptr
dptr
amtptr a
Instances of B mtptr
b
Method table for A Pointer to f_A Pointer to g_A
Method table for B
Pointer to f_B Pointer to g_A
Method bodies Code for f_A Code for g_A
Method bodies Code for f_B
mtptr
mtptr
b
b
a
To translate a.f(3) where, for example, a has type A at compile time, but points to an instance of B at run-time, we do the following.
􏹩 Getthepointerdptrtothemethodtableforthecurrent instance of a (find by fixed offset available at compile time).
􏹩 Inthemethodtablefindthepointerptothebodyoff_B (also by fixed offset available at compile time).
􏹩 CreatetheARjustlikeforanyotherprocedure,supplyinga pointer to the object a as first argument.
􏹩 Invokejal(jumpandlink)topthebodyoftheprocedure f_B.

Example
What does this program print out (prog/ex4.java)?
class A {
void f () { System.out.println ( “A::f” ); }
void g () { System.out.println ( “A::g” ); } }
class B extends A {
void f () { System.out.println ( “B::f” ); } }
class Main {
public static void main ( String [] args ) {
A a = new A ();
B b = new B ();
A ab = new B ();
a.f ();
a.g ();
b.f ();
b.g ();
ab.f ();
ab.g (); } }

Reflection
Languages like Java and Python enable reflection, that gives you information about the type of an object at run-time. You can do funky things like this:
import java.lang.reflect.Method;
class A {}
class B extends A {}
class Main {
public static void main ( String [] args ) {
A a = new A ();
B b = new B ();
A ab = new B ();
try {
System.out.println ( a.getClass().getName() );
System.out.println ( b.getClass().getName() );
System.out.println ( ab.getClass().getName() ); }
catch (Exception ioe){ System.out.println(ioe); } } } What does this return? (prog/refl.java)

Reflection
This can be implemented by creating a memory area at fixed offset, containing a description of each class, and each methods for reflection containing pointers to that description.
For languages that don’t have reflection, this need not be implemented.
Instance of A
Pointer to method table …
Description of A Class name “A”
Method descriptions …
Body of f_A
Uses
f_A … getClass

OO a la Javascript
We have learned how object oriented languages like Java are translated. They exhibit class-based object orientation.

OO a la Javascript
We have learned how object oriented languages like Java are translated. They exhibit class-based object orientation.
There is also prototype-based object orientation. This is used in Javascript. The ideas seen here can be adapted to prototype-based object orientation.