Assignment 2
cpe 453 Winter 2020
In the beginning there was data. The data was without form and
null, and darkness was upon the face of the console; and the Spirit of IBM was moving over the face of the market. And DEC said, Let there be registers; and there were registers. And DEC saw that they carried; and DEC separated the data from the instructions. DEC called the data Stack, and the instructions they called Code. And there was evening and there was morning, one interrupt.
Rico Tudor, The Story of Creation or, The Myth of Urk
usrgamesfortune
Due by 11:59:59pm, Wednesday, January 29th. This assignment may be done with a partner.
Program: Support for Lightweight Processes liblwp.so
This assignment requires you to implement support for lightweight processes threads under linux using the GNU C Compilergcc. A lightweight process is an independent thread of control sequence of executed instructionsexecuting in the same address space as other lightweight pro cesses. Here you will implement a nonpreemptive userlevel thread package.
This comes down to writing nine functions, described briefly in Table 1, and in more detail below.
lwp createfunction,argument,stacksize create a new LWP
lwp gettidvoid return thread ID of the calling LWP
lwp exitvoid terminates the calling LWP
lwp yieldvoid yield the CPU to another LWP
lwp startvoid start the LWP system
lwp stopvoid stop the LWP system
lwp set schedulerscheduler install a new scheduling function
lwp get schedulervoid find out what the current scheduler is
tid2threadtid map a thread ID to a context
Table 1: The functions necessary to support threads
The Big Picture
Most of the magic will be in lwp create. The job of lwp create is to set up a threads context so that when it is selected by the scheduler to run and one of lwp yield, lwp start, lwp exit returns1 to it, it will start executing where you want it to.
With a few minor differences, youll find that lwp yield, lwp start, lwp stop, and lwp exit all more or less do the same thing: save one context, pick a thread to run, and load that threads context.
1 This is important: none of these thread functionsthe ones that are passed to lwp create to form the program of the new threadare ever called. They are returned to.
1
Things to know
Everything in the rest of this document is intended to provide information needed to implement a lightweight processing package for a 64bit Intel x86 64 CPU compiling with gcc. This is the environment found on the Linux desktop machines in the CSL and unix16.csc.calpoly.edu.
Context: What defines a thread
Before we build a thread support library, we need to consider what defines a thread. Threads exist in the same memory as each other, so they can share their code and data segments, but each thread needs its own registers and stack to hold local data, function parameters, and return addresses.
Registers
The x86 64 CPU doing only integer arithmetic2 has sixteen registers of interest, shown in Table 2.
Table 2: Integer registers of the x86 64 CPU
Since C has no way of naming registers, I have provided some useful tools below that will allow you to access these registers.
First there are two macros, Get SP and Set SP, which will allow you to get or set the rsp register directly.
Second, is an assembly language file, magic64.S3 which contains a function void swap rfilesrfile old, rfile new. This does two things:
1. if old ! NULL it saves the current values of all 16 registers to the struct registers pointed to by old.
2. if new ! NULL it loads the 16 register values contained in the struct registers pointed to by new into the registers.
For convenience, there are also two macros, load contextc and save contextc that do half of a swap should you find that desireable.
To assemble magic64.S, use gcc:
gcc o magic64.o c magic64.S
The whole function can be seen in Figure 3.
2As well as a bunch more for floating point, but we arent going to talk about those here. Swaprfilessaves them, though.
3For what its worth, if an assembly file ends in .S, the compiler will run it through the C preprocesser. If its .s, it wont.
rax General Purpose A rbx General Purpose B rcx General Purpose C rdx General Purpose D rsi Source Index
rdi Destination Index rbp Base Pointer
rsp Stack Pointer
r8 General Purpose 8
r9 General Purpose 9
r10 General Purpose 10
r11 General Purpose 11
r12 General Purpose 12
r13 General Purpose 13
r14 General Purpose 14
r15 General Purpose 15
2
Floating Point State
As we said above, in addition to the registers, swap rfiles also preserves the state of the x87 Floating Point UnitFPU. This is stored in the last element of the struct rfile, the struct fxsave called fxsave. This structure holds all the FPU state. Important: when you initialize your threads register file, you will have to initialize this structure to the predefined value FPU INIT like so:
newthreadstate.fxsaveFPU INIT;
Stack structure: The gcc calling convention
The extra registers available to the x86 64 allow it to pass some parameters in registers. This makes the overall calling convention a little more complicated, but, in practice, it will be easier for your program since you wont be passing enough parameters to push you out of the registers onto the stack.
The steps of the convention are as follows illustrated in Figures 1af:
a. Before the call Caller places the first six integer arguments into registers rdi, rsi, rdx, rcx, r8, and r9. If there are more, they are pushed onto the stack in reverse order. This is shown in the figure, but you wont encounter more in this assignment.
b. After the call The call instruction has pushed the return address onto the stack.
c. Before the function body Before the body of a function executes it needs to set up its stack frame that will hold any parameters and local variables that will fit into the registers. To do this, it will execute the following two instructions to set up its frame:
pushq rbp
movq rsp,rbp
Then, it may adjust the stack pointer to leave room for any locals it may need.
d. Before the return Before returning, the function needs to clean up after itself. To do this,
before returning it executes a leave instruction. This instruction is equivalent to: movq rbp,rsp
popq rbp
The effect is to rewind the stack back to its state right after the call.
e. After the return After the return, the Return address has been popped off the stack, leaving it looking just like it did before the call.
Remember, the ret instruction, while called return, really means pop the top of the stack into the program counter.
f. After the cleanup Finally, the caller pops off any parameters on the stack and leaves the stack is just like it was before.
LWP system architecture
Everything you need is defined in lwp.h, fp.h and magic64.S, two of which are included in Figures 2 and 3 for the third, see Supplied Code later on.
At the heart of lwp.h is the definition of a struct threadinfo st which defines a threads context. This contains:
3
Arg 6 caller push
Arg 7 caller push
.
Arg n caller push
rsp 00 08
.
original rsp Original TOS rbp somewhere
a Before the call
Return Address
Arg 6 caller push
Arg 7 caller push
.
Arg n caller push
rsp
00 08 16
.
Original TOS rbp somewhere
b After the call
local variables .
somewhereold BP
Return Address
Arg 6 caller push
Arg 7 caller push
.
Arg n caller push
rsp rbp
k .
00 08 16 24
.
Original TOS
c Before function body
Return Address
Arg 6 caller push
Arg 7 caller push
.
Arg n caller push
rsp
00 08 16
.
Original TOS rbp somewhere
d Before the return
Arg 6 caller push
Arg 7 caller push
.
Arg n caller push
rsp
original rsp Original TOS rbp somewhere
e After the return
rsp Original TOS rbp somewhere
f After the cleanup
Figure 1: Stack development Remember that the real stack is upsidedown
4
5
ifndef LWPH
define LWPH include systypes.h
context;
typedef void lwpfunvoid ; type for lwp function
ifndef TRUE define TRUE 1 endif
ifndef FALSE define FALSE 0 endif
Tuple that describes a scheduler typedef struct scheduler
if defined x86 64 include fp.h
scheduler;
lwp functions
typedef struct
attribute
aligned16
attribute
packed
extern tid t lwp createlwpfun,void ,size t; extern void lwp exitvoid;
extern tid t lwp gettidvoid;
extern void lwp yieldvoid;
registers
unsigned long rax; unsigned long rbx; unsigned long rcx; unsigned long rdx; unsigned long rsi; unsigned long rdi; unsigned long rbp; unsigned long rsp; unsigned long r8; unsigned long r9; unsigned long r10; unsigned long r11; unsigned long r12; unsigned long r13; unsigned long r14; unsigned long r15; struct fxsave fxsave;
the sixteen architecturallyvisible regs.
space to save floating point state error This only works on x8664 for now
if defined x86 64 X86 only code
define BAILSIGNAL SIGSTKFLT
define GetSPsp asmmovq rsp,0: r sp :
define SetSPsp asmmovq 0,rsp: : r sp
rfile; else
endif
typedef unsigned long tid t;
else END x86 only code
error This stack manipulation code can only be compiled on an x86
define NO THREAD 0
an always invalid thread id
40
if defined APPLE
undef BAILSIGNAL
define BAILSIGNAL SIGABRT
typedef struct threadinfo st thread; typedef struct threadinfo st
endif
tid t tid; unsigned long stack;
lightweight process id
Base of allocated stack
prototypes for asm functions
define load contextc swap rfilesNULL,c define save contextc swap rfilesc,NULL void swap rfilesrfile , rfile to;
size t rfile thread thread thread thread
stacksize; state;
Size of allocated stack saved registers Two pointers reserved
lib one; lib two; sched one; sched two;
for use by the library
100
Two more for schedulers to use
50
endif
Figure 2: Definitions and prototypes for LWP: lwp.h
10
20
30
Sets the stack pointer to the current value of the 80 given variable.
extern void lwp startvoid;
extern void lwp stopvoid;
extern void lwp set schedulerscheduler fun; extern scheduler lwp get schedulervoid; extern thread tid2threadtid t tid;
70
Macros for stack pointer manipulation:
GetSPvar SetSPvar
Sets the given variable to the current value of the stack pointer.
endif
initialize any structures
tear down any structures
void initvoid;
void shutdownvoid;
void admitthread new;
void removethread victim; remove a thread from the pool
add a thread to the pool thread nextvoid; select a thread to schedule
60
90
.text
.globl swap rfiles
.type swap rfiles, function
swap rfiles:
void swap rfilesrfile old, rfile new
old will be in rdi
new will be in rsi
load:
cmpq 0,rsi 40 je done
pushq rbp movq rsp,rbp
set up a frame pointer 10
save the old context if old ! NULL
cmpq 0,rdi je load
movq rax, rdi store rax into oldrax so we can use it Now store the Floating Point State
leaq 128rdi,rax fxsave rax
movq rbx, 8rdi movq rcx, 16rdi movq rdx, 24rdi movq rsi, 32rdi movq rdi, 40rdi movq rbp, 48rdi movq rsp, 56rdi movq r8, 64rdi movq r9, 72rdi movq r10, 80rdi movq r11, 88rdi movq r12, 96rdi movq r13,104rdi movq r14,112rdi movq r15,120rdi
get the address 20 now the rest of the registers
load the new one if new ! NULL
First restore the Floating Point State
leaq 128rsi,rax fxrstor rax
get the address
retreive rax from newrax etc.
must do rsi last, since its our pointer
movq movq movq movq movq movq movq movq movq movq movq movq movq movq movq movq
rsi,rax 8rsi,rbx 16rsi,rcx 24rsi,rdx 40rsi,rdi 48rsi,rbp 56rsi,rsp
64rsi,r8 72rsi,r9 80rsi,r10 88rsi,r11 96rsi,r12
104rsi,r13 112rsi,r14 120rsi,r15 32rsi,rsi
50
60
done: leave ret
etc.
Figure 3: magic64.S: Store one register file and load another
6
30
The threads thread ID. This must be a unique integer that stays the same for the lifetime of the thread. Its what a thread may use to identify itself. NO THREAD is defined to be 0 and is always invalid. You may assume that there will never be more than 264 2 threads, so a counter is just fine.
A pointer to the base of the threads allocated stack space so that it can later be freed.
A struct registers that contains a copy of all the threads stored registers.
Four pointers:
lib one and lib two are reserved for the use of the library internally. E.g., to maintain a global linked list of all threads for implementing tid2thread.
sched one and sched two are reserved for use by schedulers. Together, these hold all the state we need for each thread.
Scheduling
The lwp librarys default scheduling policy is round robin, but client code can install its own scheduler with lwp set scheduler. The lwp scheduler type is a pointer to a structure that holds pointers to five functions. These are:
void initvoid This is to be called before any threads are admitted to the scheduler. Its to allow the scheduler to set up. This one is allowed, to be NULL, so dont call it if it is.
void shutdownvoid This is to be called when the lwp library is done with a scheduler to allow it to clean up. This, too, is allowed, to be NULL, so dont call it if it is.
void admitthread new Add the passed context to the schedulers scheduling pool.
void removethread victim Remove the passed context from the schedulers scheduling pool. thread next Return the next thread to be run or NULL if there isnt one.
Changing schedulers will involve initializing the new one, pulling out all the threads from the old one using next and remove and admitting them to the new one with admit, then shutting down the old scheduler.
A note on function pointers:
Remember, the name of a function is its address, so you can pass a pointer to a function just by using its name. For example, my round robin scheduler is defined like so:
static struct scheduler rr publish NULL, NULL, rr admit, rr remove, rr next; scheduler RoundRobin rr publish;
Calling a function pointer is just a matter of dereferencing it and applying it to an argument. E.g.:
thread nxt;
nxt RoundRobinnext
7
How to get started
1. Write the default round robin scheduler. This consists almost entirely of keeping a list, and then you will have a scheduler.
2. Then, in lwp create:
a Allocate a stack for each LWP.
b Build a stack frame on the stack so that when that context is loaded it will properly return to the lwps function with the stack and registers arranged as it will expect. This involves making the stack look as if the thread called you and was suspended.
3. When lwp start is called:
a save the real context somewhere where lwp stop can find it,
b pick one of the lightweight processes to run by calling the scheduler.
c Load its context with swap rfiles and you should be off and running.
Remember, what you are trying to do is to build a context so that when lwp yield selects it, loads its registers, and returns, it starts executing its very first instruction with the stack pointer pointing to a stack that looks like it had just been called.
If the arguments fit into registers, this will simply be:
rsp 00
Original TOS 08 rbp somewhere
But what is this return address? Its supposed to be the place where the thread function should go back to after its done, but it didnt come from anywhere. Use lwp exit. That way either it calls lwp exit or it returns there, but one way or the other when its done, lwp exit will be called.
The semantics of the individual functions are defined in Table 3.
Tricks, Tools, and Useful Notes
a segmentation violation may mean
a stack overflow.
stack corruption
all the other usual causes
Use the CSL linux machines
But I really want to use my Mac.
Ok. . . but there are a few things that are different about doing this under OSX:
OSX requires all stack frames to be 16byte aligned.
Dynamic libraries have the suffix .dylib
The path the loader searches for dynamic libraries is DYLD LIBRARY PATH.
It is possible to compile multiple architectures of library into a single .dylib file. See lido1 for details.
Return Address
8
lwp createfunction,argument,stacksize;
Creates a new lightweight process which executes the given func tion with the given argument. The new processess stack will be stacksize words.
lwp create returns the lightweight thread id of the new thread or 1 if the thread cannot be created.
lwp gettidvoid;
Returns the tid of the calling LWP or NO THREAD if not called by a
LWP.
lwp yieldvoid;
Yields control to another LWP. Which one depends on the sched uler. Saves the current LWPs context, picks the next one, restores that threads context, and returns.
lwp exitvoid;
Terminates the current LWP and frees its resources. Calls schednext to get the next thread. If there are no other threads, restores the original system thread.
lwp startvoid;
Starts the LWP system. Saves the original context for lwp stop to use later, picks a LWP and starts it running. If there are no LWPs, returns immediately.
lwp stopvoid;
Stops the LWP system, restores the original stack pointer and re turns to that context. Wherever lwp start was called from. lwp stop does not destroy any existing contexts, and thread processing will be restarted by a call to lwp start.
lwp set schedulerscheduler;
Causes the LWP package to use the given scheduler to choose the
next process to run. Transfers all threads from the old scheduler to the new one in next order. If scheduler is NULL the library should return to roundrobin scheduling.
lwp get schedulervoid;
Returns the pointer to the current scheduler.
tid2threadtid;
Returns the thread corresponding to the given thread ID, or NULL if the ID is invalid
Table 3: The LWP functions
9
Finally, youll need to be sure it compiles and runs on Linux, since thats where itll be graded.
If you want to find out what your compiler is really doing, use the gcc S switch to dump the assembly output.
gcc S foo.c will produce foo.s containing all the assembly.
lwp exit is probably the second trickiest part of this assignment because you must be careful not to free a stack that youre still using.
Remember that stacks start in high memory and grow towards low memory.
Also remember that pointer arithmetic is done in terms of the size of the thing pointedto.
Instructions for building and using shared libraries are included in Asgn1 if you need to review.
Note that lwp stop does not necessarily mean stop forever. It should be possible to call lwp stop then later call lwp start to restart.
Finally, remember that there doesnt have to be a next thread. If schednext returns NULL, lwp yield, lwp exit, and lwp start should restore the original system context. as lwp stop does.
Supplied Code
There are several pieces of supplied code along with this assignment, all available on the CSL machines in pncs453GivenAsgn2.
File DescriptionLocation
Note: When linking with libsnakes.a it is also necessary to link with the standard library ncurses using lncurses on the link line. Ncurses is a library that supports text terminal manip ulation.
Coding Standards and Make
See the pages on coding standards and make on the cpe 453 class web page.
lwp.h
fp.h
libPLN.a libsnakes.a magic64.S snakes.h hungrymain.c snakemain.c numbersmain.c
Header file for lwp.c
Header file for preserving floating point state precompiled library of lwp functions for testing precompiled library of snake functions
ASM source for swap rfiles
header file for snake functions
demo program for hungry snakes
demo program for wandering snakes
demo program with indented numbers
10
What to turn in
Submit via handin to the asgn2 directory of the pncs453 account:
your welldocumented source files.
Your header file, lwp.h, suitable for inclusion with other programs. This must be compatabile with the distributed one, but you may extend it.
A makefile called Makefile that will build liblwp.so from your source when invoked with no target or with the target liblwp.so.
A README file that contains:
Your names, including your login names in parentheses e.g. pnico. Any special instructions.
Any other thing you want me to know while I am grading it.
The README file should be plain text, i.e, not a Word document, and should be named README, all capitals with no extension.
Sample runs
We did these in class. If you want, though, you can use the provided libPLN.a to build your own samples.
11