CSCI-GA.2250-001
Operating Systems
Processes and Threads
Details of Lecture
• Process Model
• Process Creation ( fork , exec )
• Signals
• Process State / Transition Models
• Multi-programming
• Threads
OS Management of Application Execution
• Resources are made available to multiple applications
• A “processor” can only run one unit of execution
(process/thread) at a time.
• The processor is switched among multiple applications so all will appear to be progressing (albeit potentially at reduced speed)
[ this is called a “context switch”, more later]
• The processor and I/O devices can be used efficiently
– When application performs I/O, the processor can be used for a different application.
Program code
Resides on storage
(just a sequence of bytes [ code , data ])
Processor executes the code
When the processor begins to execute the program code, we refer to this executing entity as a process
What exactly is a “processor” from an OS perspective?
OS can schedule an independent unit of execution onto each Hardware-Thread (HW-Thread)
Each HW thread exposes a register set (typically 32 integer regs, 32 floating-point regs, and special regs) [see ABI]
[Figure: two CPUs (CPU-0 and CPU-1), each with two cores (Core-0 and Core-1 on CPU-0, Core-2 and Core-3 on CPU-1); each core has two hardware threads (HW thread0, HW thread1) and execution units (FXU, IXPU)]
What Is a Process?
An abstraction of a running program
Program Counter (points at current instruction)
The Process Model
• A process has a program, input, output, and state (data).
• A process is an instance of an executing program and includes
– Variables ( memory )
– Code
– Program counter (really a hardware resource)
– Registers (also really a hardware resource)
–…
If a program is running twice, does it count as two processes? or one?
Process: a running program
• A process includes
– Process state (state, register save areas)
• Open files, thread(s) state, resources held
– Address space (view of its private memory)
All state is accessible as an entry in the process (entry) table
• A process tree
– A created two child processes, B and C
– B created three child processes, D, E, and F
Address Space
• Defines where sections of data and code are located in a 32- or 64-bit address space
• Defines protection of such sections
• ReadOnly, ReadWrite, Execute
• Confined “private” addressing concept
[Figure: address-space layout showing Text/Code, Data, and mmap regions]
➔ requires a form of address virtualization (we will get to it during memory management)
[Figure: the aspects of a process covered next: Creation, Termination, Implementation, State]
Process Creation
• System initialization
– At boot time
– Foreground
– Background (daemons)
• Execution of a process creation system call by a running process
• A user request
• A batch job
• Created by OS to provide a service
• Interactive logon
Process Termination
• Normal exit (voluntary)
• Error exit (voluntary)
• Fatal error (involuntary)
• Killed by another process (involuntary)
Process Termination: More Scenarios
Implementation of Processes
• OS maintains a process table Process procs[];
• An array (or a hash table) of structures
• One entry per process (pid is the unique ID)
procs[pid];
Implementation of Processes: Process Control Block (PCB)
▪ Contains the process elements
▪ It is possible to interrupt a running process and later resume execution as if the interruption had not occurred → state
▪ Created and managed by the operating system
▪ Key tool that allows support for multiple processes
OS objects related to process
Conceptual view of the tables that OS maintains
in order to manage execution of processes on resources.
Though there is one PCB per process, there is a “mesh” of multiple objects to allow
for sharing between processes where appropriate
– Memory tables, io-tables, ….
Linux Process (Task) object
• Linux process ~= struct task_struct
– ~1.5KB + 2 stacks (user + kernel)
– The two stacks need to be separate to ensure isolation between kernel and user (recall system calls)
– User stack: grows until out of space
– Kernel stack: 4KB or 8KB
Some Unix Details
• Creation of a new process: fork()
• Executing a program in that new process: exec()
• Signal notifications
• At boot, the kernel manually creates ONE process (the init process, pid=1; pid 0 is reserved for the kernel's own idle task); all other processes are created by fork().
fork()
fork() only LOGICALLY duplicates the whole process (will cover in MemMgmt)!
[Figure: before fork() there is only the parent; after fork() there are the parent and the child]
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    // fork(): syscall that creates a new PCB and duplicates the address space
    pid_t pid = fork();
    if (pid == 0)
    {
        // child process: the child starts running here
    }
    else if (pid > 0)
    {
        // parent process: the parent continues here
    }
    else
    {
        // fork failed
        printf("fork() failed!\n");
        return 1;
    }
    return 0;
}
execv()
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;                   // assumption: program path and its args come from the command line
    pid_t pid = fork();             // creates new PCB and address space
    if (pid == 0)
    {
        // child starts the new program image in the new process;
        // execv() will NEVER return except on error
        execv(argv[1], &argv[1]);
        return 1;
    }
    else if (pid > 0)
    {
        int status;
        waitpid(pid, &status, 0);   // parent waits for the child process to finish
    }
    else
    {
        // fork failed
        printf("fork() failed!\n");
        return -1;
    }
    return 0;
}
fork, exec, and wait (and their variants) are system calls.
The pid is the object handle that the operating system returns for a process.
clone()
See: http://man7.org/linux/man-pages/man2/clone.2.html
for what can be cloned.
This is the basis of how threads are created.
Linux/Unix internally uses objects to create processes / threads
How these objects interrelate depends on fork/clone calls
Signal()
• Means to “signal a process” (i.e. get its attention).
• There is a set of signals that can be sent to a process (some require permissions).
• The process indicates which signal it wants to catch and provides a callback function
• When the signal is to be sent (event or “kill
Signal()
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void sig_handler(int signo)
{
    // note: printf is not async-signal-safe; acceptable only for a demo
    if (signo == SIGINT)
        printf("received SIGINT\n");
}

int main(void)
{
    // install the handler
    if (signal(SIGINT, sig_handler) == SIG_ERR)
        printf("\ncan't catch SIGINT\n");
    // A long, long wait so that we can easily issue a signal to this process
    while (1)
        sleep(1);
    return 0;
}

When a signal is delivered:
* kernel stops the threads in the process
* kernel “adds a stack frame”
* kernel “switches the instruction pointer (PC)” to sig_handler
* kernel continues the process
* process continues with sig_handler
* on completion, the process calls back into the kernel
A few basics to remember
• fork(), exec(), wait() and their variants are OS system calls to create and manipulate processes
• PIDs are the handles by which the operating system identifies live processes
• There are user processes and system processes
Process State Model
• Depending on the implementation, there can be several possible state models.
• The Simplest one: Two-state diagram
Process State Model: Three-State Model
Process State Model: Five-State Model
Using Queues to Manage Processes
One Extra State!
Swapped to disk
The whole process, including its “address space”, is moved to disk
Multiprogramming
• One CPU and several processes
• CPU switches from process to process quickly
What Really Happens
What We Think Happens
Running the same program several times will not result in the same execution times due to:
– interrupts
– multi-programming
Simple Modeling of Multiprogramming
• A process spends fraction p waiting for I/O
• Assume n processes in memory at once
• The probability that all n processes are waiting for I/O at once is p^n
• So CPU utilization = 1 - p^n
Degree of Multiprogramming
Multiprogramming lets processes use the CPU when it would otherwise become idle.
How to do multiprogramming
• Really a question of how to increase concurrency (e.g. multi-core system) and overlap I/O with computation.
• Example Webserver:
– If single process, every system call that blocks will block forward progress
– Let’s discuss!
Solution #1
• Multiple processes
• What’s the issue?
– Resource consumption
• Each process has its own address space
( code, stack, heap, files, ….. )
– Who owns perceived single resource:
• E.g. webserver port 80 / 1080 / 8080 ????
Threads
• Multiple threads of control within a process
– unique execution
• All threads of a process share the same address space and resources
(with exception of stack)
Why Threads?
• For some applications many activities can happen at once
– With threads, programming becomes easier
• Otherwise application needs to actively manage different
logical executions in the process
• This requires significant state management
– Benefit applications with I/O and processing that can overlap
• Lighter weight than processes
– Faster to create and restore
(we really just need a stack and an execution unit, but don’t have to create a new address space etc.)
Example : Multithreaded Web Server
Processes vs. Threads
OS internal Data Structure implications:
Processes vs Threads
• Process groups resources – (Address Space, files)
• Threads are entities scheduled for execution on CPU
• Threads can be in any of several states: running, blocked, ready, and terminated ( remember the process state model ? )
• No protections among threads (unlike processes) [Why?] → this is important
Processes vs Threads
• The unit of dispatching is referred to as a thread or lightweight process (lwp)
• The unit of resource ownership is referred to as a process or task
(unfortunately, in Linux, struct task_struct represents both a process and a thread)
• Multithreading – The ability of an OS to support multiple, concurrent paths of execution within a single process
Processes vs Threads
• Process is the unit for resource allocation and a unit of protection.
• Process has its own (one) address space.
• A thread has:
– an execution state (Running, Ready, etc.)
– saved thread context when not running
– an execution stack
– some per-thread static storage for local variables
– access to the memory and resources of its process (all threads of a process share this)
Multithreading on Uniprocessor System
Where to Put The Thread Implementation / Package?
User space Kernel space
Discussed in previous slides
Kernel-Level Threads (KLTs)
• Thread management is done by the kernel
• no thread management is done by the application
Kernel-Level Threads (KLTs)
Advantages
• The kernel can simultaneously schedule multiple threads from the same process on multiple processors
• If one thread in a process is blocked, the kernel can schedule another thread of the same process
• Kernel routines can be multithreaded
Disadvantages
• The transfer of control from one thread to another within the same process requires a mode switch to the kernel
Implementing Threads in Kernel Space
• Kernel knows about and manages the threads
• No runtime is needed in each process
• Creating or destroying a thread (and other thread-related operations) involves a system call
Implementing Threads in Kernel Space
Advantages
• When a thread blocks (due to a page fault or a blocking system call), the OS can execute another thread from the same process
Disadvantages
• Scalability (operating systems had limited memory dedicated to them)
• Cost of a system call is very high
Disagree: if you want to implement interruption to do thread scheduling in user space, you have to use “signal(SIGVTALRM)”, which is much more expensive.
User-Level Threads (ULT)
• All thread management is done by the application
• Initially developed to run on kernels that are not multithreading capable.
• The kernel is not aware of the existence of threads
Implementing Threads in User Space
• Threads are implemented by a library
• Kernel knows nothing about threads
• Each process needs its own private thread table in userspace
• Thread table is managed by the runtime system
User-Level Threads (ULTs)
• The kernel continues to schedule the process as a unit and assigns a single execution state.
User-Level Threads (ULTs)
Advantages
• Thread switch does not require kernel-mode.
• Scheduling (of threads) can be application specific.
• Can run on any OS.
• Scales better.
Disadvantages
• A system-call by one thread can block all threads of that process.
• Page fault blocks the whole process.
• In pure ULT, multithreading cannot take advantage of multiprocessing.
Combined (Hybrid) Approach
• Thread creation is done completely in user space.
• Bulk of scheduling and synchronization of threads is by the application (i.e. user space).
• Multiple ULTs from a single application are mapped onto (smaller or equal) number of KLTs.
• Solaris is an example
PCB vs TCB
• Process Control Block handles global process resources
• Thread Control Block handles thread execution resources
• pid vs. tid
Different Naming Conventions
• Thread models are also known by the general ratio of user threads to kernel threads
• 1:1 : each user thread == kernel thread
• M:1 : user-level thread model
• M:N : hybrid model
How are threads created ?
int pthread_create(pthread_t *thread,
const pthread_attr_t *attr,
void *(*start_routine) (void *), void *arg );
Assuming 1:1 model:
a) Allocates a new stack via malloc
b) Calls clone() to create a new schedulable thread
c) Sets the thread's stack pointer to (a)
d) Calls (*start_routine)(arg)
Thread State Model:
• What changes to the state model in a kernel based thread model ?
• Really replace “process” with “thread” and you are basically there.
• Often we interchangeably use thread and process scheduling.
Thread
#> sed -e "s/[Pp]rocess/thread/g"
Context Switch
• Scenarios:
1) Current process (or thread) blocks, OR
2) Preemption.
• Operation(s) to be done
– Must release CPU resources (registers)
– Requires storing “all” non-privileged registers to the PCB or TCB save area
– Tricky as you need registers to do this
– All written in assembler
– Typically an architecture has a few privileged registers so the kernel can accomplish this
CtxSwitch: Process Blocks
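#include <unistd.h>

int call_io() {
    char buffer[128];
    return read(0, buffer, 128);
}

int main() {
    call_io();
}

[Figure: context switch when a process blocks, with numbered steps 1 to 5. User space (address space up to 0xFFFFFFFF): the process's user stack holds main, the I/O function (labeled get_buf in the figure), and glibc read, which marshals the arguments plus the read system-call number and executes asm(“trap”). Kernel space: the interrupt service uses the kernel stack and the syscall_table to reach sys_read(), which calls wait_for_comp(); the kernel then saves registers, calls the scheduler, and restores registers, using the Reg_save_area in the task struct (prio, proc, memmaps).]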
CtxSwitch: Preemption
[Figure: context switch on preemption, with numbered steps. The interrupt service runs intr_timer on the timer stack, which saves registers, calls the scheduler, and restores registers. Two task structs are shown, each with prio, proc, memmaps, and a Reg_save_area.]
Conclusions
• Process is one of the most central concepts in Operating Systems
• Process vs. Thread (understand the difference)
– Process is a resource container with at least one thread of execution
– Thread is a unit of execution that lives in a process (no thread without a process)
– Threads share the resources of the owning process.
• Multiprogramming vs. multithreading