程序代写代做代考 compiler computer architecture C javascript RISC-V Java Compilers and computer architecture: Compiling OO language

Compilers and computer architecture: Compiling OO language
Martin Berger 1 November 2019
1Email: M.F.Berger@sussex.ac.uk, Office hours: Wed 12-13 in Chi-2R312
1/1

Recall the function of compilers
2/1

Recall the structure of compilers
Source program
Lexical analysis
Intermediate code generation
Optimisation
Syntax analysis
Semantic analysis, e.g. type checking
Code generation
Translated program
3/1

Introduction
The key ideas in object oriented programming are:
􏰉 Data(state)hidingthroughobjects,objectscarryaccess mechanisms (methods).
􏰉 Subtyping.IfBisasubclassofAthananyobjectthatisan instance of B can be used whenever and wherever an instance of A is expected.
Let’s look at an example.
4/1

A Java example
interface A { int f () { … } }
class B implements A { int f () { … } }
class C implements A { int f () { … } }

public static void main ( String [] args ) {

A a = if ( userInput == 0 ) {
new B (); }
else {
new C (); } …
a.f() // Does the compiler know which f is used?
At compile time we don’t know exactly what objects we have to invoke methods on.
5/1

Introduction
The code generator must generate code such that access (methods and instance variables ) to an object that is an instance of A must work for any subclass of A.
Indeed some subclasses of A might only become available at run-time.
So we have two questions to ask:
􏰉 Howareobjectslaidoutinmemory?
􏰉 Howismethodinvocationimplemented?
6/1

Object layout in memory
We solve these problems using the following ideas.
􏰉 Objectsarelaidoutincontiguousmemory,withpointer pointing to that memory giving us access to object.
􏰉 Eachinstancevariableisatthesameplaceinthe contiguous memory representing an object, i.e. at a fixed offset, known at compile-time, from the top of the contiguous memory representing the offset.
􏰉 Subclassinstancevariablesareadded’frombelow’.
Instance of A Instance of B
32
Header 120 Header 36 a=0 124 a=0 40 a2 = 1 128 a2 = 1
132 b = 999
7/1

Object layout in memory
Note that the the number and types of instance variables/attributes (i.e. size in memory) are available to the compiler at compile time.
class A {
int a = 0;
int a2 = 1;
int f () {
a = a + a2;
return a; } }
class B extends A {
int b = 999;
int f () { return a; }
int g () {
a = a – b + a2;
return a; } }
Instance of A
32
36 a=0 40 a2 = 1
Instance of B
120 Header 124 a=0 128 a2 = 1 132 b = 999
Header
8/1

Object layout in memory
Instance of A Instance of B Another instance of B
32
Header 120 Header 1600 Header 36 a=0 124 a=0 1604 a=7 40 a2 = 1 128 a2 = 1 1608 a2 = 12
132 b = 999 1612 b = 44
The compiler uses the same layout for every instance of a class. So if the size of the header is 4 bytes, and integers are 4 bytes, then a is always at offset 8 from the beginning of the object, and a2 is always at offset 12, both in instances of A and B, and likewise for other subclasses of A, or other header and field sizes
This ensures that every instance of B can be used where an instance of A is expected.
9/1

Object layout in memory
This also works with deeper inheritance hierarchies.
class A {
int a = 0; }
class B extends A {
int b = 1; }
class C extends B {
int c = 9;
int d = 9; }
class D extends C {
int e = 5; }
No matter what object we create, we can always find the visible fields at the same offset from the ’top’ of the object.
10/1
1600
1604
1608
1612
1616
1620
Header 1604 a=0
1608
1612
1616
1620
1600
1 1 1 1
1
A

We’ve overlooked one subtle issue
In Java and other languages you can write this:
class A { public int a = 0; }
class B extends A { public int a = 1; }
class Main {
public static void main ( String [] args ) {
A a = new A ();
B b = new B ();
A ab = new B ();
System.out.println ( “a.a = ” + a.a );
System.out.println ( “b.a = ” + b.a );
System.out.println ( “ab.a = ” + ab.a ); } }
What do you think this program outputs? Why? (Example: prog/ex3.java)
11/1

Shadowing of instance variables/attributes
The solution is twofold:
􏰉 Todeterminewhatinstancevariable/attributetoaccess, the code generator looks at the static type of the variable (available at compile-time). Note that the type of the object at run-time might be different (e.g. A ab = new B (); in the example on the last slide).
􏰉 Ifthereismorethanoneinstancevariable/attributewith the same name, we choose the one that is closest up the inheritance hierarchy.
12/1

Shadowing of instance variables/attributes (bigger example)
class A1 { a …} // defines a
class A2 extends A1 { a …} // defines a
class A3 extends A2 { a …} // defines a
class A4 extends A3 {…} // doesn’t define a
class A5 extends A4 { a …} // defines a
class A6 extends A5 {…} // doesn’t define a
class A7 extends A6 {…} // doesn’t define a
class A8 extends A7 {…} // doesn’t define a
class A9 extends A8 {…} // doesn’t define a
class A10 extends A9 { a …} // defines a

A7 x = new A10 ()

print ( x.a ) // prints A5’s a
(Example: ex5.java)
13/1

Shadowing of instance variables/attributes
Do you think Java’s shadowing is a good idea?
What alternative approaches would you recommend?
14/1

Multiple inheritance
Some OO language (e.g. C++, but not Java) allow multiple inheritance.
class A {
int a = 0; }
class B {
int b = 2; }
class C extends A, B {
int c = 9; }
Now we have two possibilities for laying out objects that are instances of C in memory.
15/1

Multiple inheritance
Now we have two possibilities for laying out objects that are instances of C in memory.
Either way is fine, as long as we always use the same choice!
1600 Header 1600 Header 1604 a=0 1604 b=1 1608 b=1 1608 a=0 1612 c=9 1612 c=9
16/1

Multiple inheritance: diamond inheritance
However with multiple inheritance the compiler must must be careful because attributes/instance variables and methods can be inherited more than once:
class A { int a = 0; }
class B extends A{
int a = 2; }
class C extends A {
int a = 9; }
class D extends B, C { int a = 11; … }
Should D contain a once, twice, thrice, four or five times? To avoid such complications, Java and other languages prohibit multiple inheritance.
A
BC
D
17/1

Quick question
Language like Java have visibility restrictions (private, protected, public).
How does the code generator handle those?
Answer: not at all, they are enforced by semantic analysis (type checking).
18/1

Summary
Inheritance relationships
class A { a … }
class B extends A { b … }
class C extends A { c … }
give rise to the following object layouts.
Note that we can access a in the same way in instances of A, B and C just by using the offset from the top of the (contiguous memory region representing the) object.
Instance of A Instance of B Instance of C
Header Header Header aaa bc
19/1

Methods
We have now learned how to deal with object instance variables/attributes, what about methods? We need to deal with two questions:
􏰉 Howtogeneratethecodeforthemethodbody?
􏰉 Where/howtostoremethodcodetoensuredynamic
dispatch works?
We begin with the former.
20/1

Compilation of method bodies
We have already learned how to generate code for procedures (static methods). Clearly (non-static) methods are very similar to procedures.
Why don’t we reuse the code generator for methods? Can you imagine how this works?
21/1

Compilation of methods by reduction to procedures
Consider the following Java definition:
class A {
int n = 10;
int f ( int x ) = { n = n+1; return x+n; } }
What’s the difference between
a.f(7) f(a, 7)
22/1

Compilation of methods by reduction to procedures
We see an invocation a.f(7) as a normal procedure invocation taking two arguments, with the additional argument being (a pointer to) the object a that we invoke the method on.
int f_A ( A this, int x ) = {
this.n = this.n + 1;
return x + this.n }
So ’under the hood’ the compiler generates a procedure f_A for each method f in each class A. The object (this in Java) becomes nothing but normal a procedure parameter in f_A. Each access to a instance variable n in the body of f is converted to an access a.n to the field holding b in the contiguous memory representing the object. Now we can reuse the code generator for procedures, with one caveat.
23/1

Where does the method body code go?
The only two issues left to resolve are
􏰉 Howtofindtheactualmethodbody? 􏰉 Wheretostoremethodbodies?
Any ideas?
Finding methods is easy: just access them (like fields) at fixed offset from the header, known at compile-time.
24/1

Where does the method body code go? First idea
Put them all in the contiguous memory with the instance variables/attributes.
class A {
int a = 0;
int b = 1;
int f () = …
int g ( int x ) =

Note that f_A and g_A are normal procedures with an additional argument as described above.
Can you see the problem with this solution?
Instance of A
Header a
b
is really
Instance of A
Other header data Code for f_A Code for g_A
a
b
25/1

Where does the method body code go? Second idea
Problem: massive code duplication! But methods are the same for each object of the same class.
Instead we could share the method bodies between objects, and let the instances (at fixed offset) contain pointers the shared method bodies.
Instance of A
Other header data Pointer to f_A Pointer to g_A
a
b
Method table
Code for f_A Code for g_A
Instance of A
Other header data
Pointer to f_A Pointer to g_A a
b
Can you see the problem with this solution?
26/1

Where does the method body code go? Third idea
The problem with the second approach is that we are still wasting memory: imagine a class with 100 methods, and 1.000.000 instances. With 32 bit pointers (RISC-V) that’s a whopping 381 MBytes! Wasteful.
A much better solution is to add a layer of indirection, and have just one pointer (at fixed offset) per object to a table that contains (at fixed offset) a pointer to each method’s code (remaining headers omitted for readability).
27/1
Instances of A
mtptr
a mtptr
ba mtptr
ba b
Method table
Pointer to f_A Pointer to g_A
Method bodies
Code for f_A Code for g_A

Where does the method body code go? Third idea
This is the choice taken in practice as far as I know, because it is much more memory efficient.
The main disadvantage is that the ’pointer chasing’ in the invocation of a method is (slightly) more time-consuming than with the first two ideas.
Just-in-time compilers (e.g. the JVM, Microsoft’s CLR, Chrome’s V8) employ clever tricks to ameliorate this shortcoming.
28/1

Inheritance and the third idea
Does this also work when inheritance is involved?
class A {
int a = 0;
int f ( int x ) {…}
int g ( int x ) {…} }
class B extends A {
int b = 7;
int f ( int x ) {…} }
29/1
Instances of A
mtptr
dptr mtptr
a mtptr
a Instances of B
mtptr
b mtptr
b
Method table for A
Pointer to f_A Pointer to g_A
Method table for B
Pointer to f_B Pointer to g_A
Method bodies
Code for f_A Code for g_A
Method bodies Code for f_B

Inheritance and the third idea
Instances of A
mtptr
dptr mtptr
a mtptr
a Instances of B
mtptr
b mtptr
ba b
Method table for A
Pointer to f_A Pointer to g_A
Method table for B
Pointer to f_B Pointer to g_A
Method bodies
Code for f_A Code for g_A
Method bodies Code for f_B
The key insight (as with the layout of the object in contiguous memory) is that the pointer to the method table is always at fixed offset from the top of the object, and pointers to method bodies are always at fixed offset in the method table, e.g.:
􏰉 fisalwaysatoffset0inthemethodtable.
􏰉 gisalwaysatoffset4inthemethodtable(assuming32bit
pointers.
30/1

Translation of method invocation
Instances of A mtptr
dptr
amtptr a
Instances of B mtptr
b
Method table for A Pointer to f_A Pointer to g_A
Method table for B
Pointer to f_B Pointer to g_A
Method bodies Code for f_A Code for g_A
Method bodies Code for f_B
mtptr
mtptr
b
b
a
To translate a.f(3) where, for example, a has type A at compile time, but points to an instance of B at run-time, we do the following.
􏰉 Getthepointermptrtothemethodtableforthecurrent instance of a (find by fixed offset available at compile time).
􏰉 Inthemethodtablefindthepointerptothebodyoff_B (also by fixed offset available at compile time).
􏰉 CreatetheARjustlikeforanyotherprocedure,supplyinga pointer to the object a as first argument.
􏰉 Jumptopthebodyoftheproceduref_B.
31/1

Example
What does this program print out?
class A {
void f () { System.out.println ( “A::f” ); }
void g () { System.out.println ( “A::g” ); } }
class B extends A {
void f () { System.out.println ( “B::f” ); } }
class Main {
public static void main ( String [] args ) {
A a = new A ();
B b = new B ();
A ab = new B ();
a.f ();
a.g ();
b.f ();
b.g ();
ab.f ();
ab.g (); } }
(Example: prog/ex4.java)
32/1

Reflection
Languages like Java and Python enable reflection, that gives you information about the type of an object at run-time. You can do things like this:
import java.lang.reflect.Method;
class A {}
class B extends A {}
class Main {
public static void main ( String [] args ) {
A a = new A ();
B b = new B ();
A ab = new B ();
try {
System.out.println ( a.getClass().getName() );
System.out.println ( b.getClass().getName() );
System.out.println ( ab.getClass().getName() ); }
catch (Exception ioe){ System.out.println(ioe); } } }
What does this return? (Example: prog/refl.java)
33/1

Reflection
This can be implemented by creating a memory area at fixed offset, containing a description of each class, and each methods for reflection containing pointers to that description.
Alternatively, make all classes inherit from base class with suitable getClassName method or similar. Overwrite the getClassName method for each class (ideally automatically).
Instance of A
Pointer to method table Pointer to description of A …
Description of A Class name “A”
Method descriptions …
Body of f_A
Uses
f_A … getClass
34/1

OO a la Javascript
We have learned how object oriented languages like Java are translated. They exhibit class-based object orientation.
There is also prototype-based object orientation. This is used in Javascript. The ideas seen here can be adapted to prototype-based object orientation.
35/1