CS代考 ECE391- Computer System Engineering

ECE391- Computer System Engineering
University of Illinois at Urbana- Champaign
Fall 2021
Lecture 17 Processes

Announcements
• MP3
• Checkpoint 1: 5:59PM on Tuesday 10/19
• Exam 2
• 10/28/2021, 7-9 PM, ECEB 1002 • Conflicts by 10/22/2021
• No Class 10/28/2021

Virtualization
• Virtualization is the process of creating a software- based, or virtual, representation of something, such as virtual applications, servers, [memory, cpu, ] storage and networks.
• How do we virtualize?
https://www.vmware.com/solutions/virtualization.html

Sharing
• What did we just share last lecture? How? What else might we virtualize?

Goals
• Gain familiarity with the concept of Multitasking, Context switching
• Understand the User/Application view of processes using LINUX as an example
• Application functions (e.g., fork, exit, wait, execve) • Familiarize yourself with shells
• Understand the System Software view of processes using LINUX as an example
• Basic data structures (e.g., task struct)
• Organization (e.g., double-linked lists, hashes) • Function calls
• Appreciate the x86 hardware view of tasks and their use in LINUX

Processes
User/Application View

Processes
• Definition: A process is an instance of a running program.
• One of the most profound ideas in computer science • Not the same as “program” or “processor”
• Process provides each program with two key abstractions:
• Logical control flow
• Each program seems to have exclusive use of the CPU
• Private virtual address space
• Each program seems to have exclusive use of main memory
• How are these Illusions maintained?
• Process executions interleaved (multitasking) or run on
separate cores
• Address spaces managed by virtual memory system
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

The World of Multitasking
• System runs many processes concurrently
• Process: executing program
• State includes memory image + register values + program counter
• Regularly switches from one process to another
• Suspend process when it needs I/O resource or timer event occurs • Resume process when I/O available or given scheduling priority
• Appears to user(s) as if all processes executing simultaneously
• Even though most systems can only execute one process at a time • Except possibly with lower performance than if running alone
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Concurrent Processes
• Two processes run concurrently (are concurrent) if their flows overlap in time
• Otherwise, they are sequential
• Examples (running on single core): • Concurrent:A&B,A&C
• Sequential: B & C
Process A Process B Process C
Time
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

User View of Concurrent Processes
• Control flows for concurrent processes are physically disjoint in time
• However, we can think of concurrent processes as running in parallel with each other
Process A Process B Process C
Time
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Context Switching
• Processes are managed by a shared chunk of OS code called the kernel
• Important: the kernel is not a separate process, but rather runs as part of some user process
• Control flow passes from one process to another via a context switch
Process A
Process B
user code
kernel code
user code
kernel code
user code
Time
context switch context switch
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

fork: Creating New Processes • int fork(void)
• creates a new process (child process) that is identical to the calling process (parent process)
• returns 0 to the child process
• returns child’s pid (process id) to the parent process
pid_t pid = fork();
if (pid == 0) {
printf(“hello from child\n”);
} else {
printf(“hello from parent\n”);
}
• Fork is interesting (and often confusing) because it is called once but returns twice
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Understanding fork
Process n
Child Process m
pid_t pid = fork();
if (pid == 0) {
printf(“hello from child\n”);
} else {
printf(“hello from parent\n”);
}
pid_t pid = fork();
if (pid == 0) {
printf(“hello from child\n”);
} else {
printf(“hello from parent\n”);
}
pid_t pid = fork();
if (pid == 0) {
printf(“hello from child\n”);
} else {
printf(“hello from parent\n”);
}
pid_t pid = fork();
if (pid == 0) {
printf(“hello from child\n”);
} else {
printf(“hello from parent\n”);
}
pid = m
pid = 0
pid_t pid = fork();
if (pid == 0) {
printf(“hello from child\n”);
} else {
printf(“hello from parent\n”);
}
pid_t pid = fork();
if (pid == 0) {
printf(“hello from child\n”);
} else {
printf(“hello from parent\n”);
}
hello from parent
Which one is first? hello from child
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Fork Example #1
• Parent and child both run same code
• Distinguish parent from child by return value from fork
• Start with same state, but each has private copy • Including shared output file descriptor
• Relative ordering of their print statements undefined
void fork1()
{
int x = 1;
pid_t pid = fork();
if (pid == 0) {
printf(“Child has x = %d\n”, ++x);
} else {
printf(“Parent has x = %d\n”, –x);
}
printf(“Bye from process %d with x = %d\n”, getpid(), x);
}
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Fork Example #2
• Two consecutive forks
void fork2()
{
printf(“L0\n”);
fork();
printf(“L1\n”);
fork();
printf(“Bye\n”);
}
Bye L1 Bye
Bye L0 L1 Bye
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Fork Example #3
• Three consecutive forks
void fork3()
{
printf(“L0\n”);
fork();
printf(“L1\n”);
fork();
printf(“L2\n”);
fork();
printf(“Bye\n”);
}
Bye L2 Bye
Bye L1 L2 Bye Bye L2 Bye
Bye L0 L1 L2 Bye
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Fork Example #4
• Nested forks in parent
void fork4()
{
printf(“L0\n”);
if (fork() != 0) {
printf(“L1\n”);
if (fork() != 0) {
printf(“L2\n”);
fork(); }
}
printf(“Bye\n”);
}
Bye
Bye
Bye L0 L1 L2 Bye
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Fork Example #5
• Nested forks in children
void fork5()
{
printf(“L0\n”);
if (fork() == 0) {
printf(“L1\n”);
if (fork() == 0) {
printf(“L2\n”);
fork(); }
}
printf(“Bye\n”);
}
Bye L2 Bye
L1 Bye L0 Bye
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

exit: Ending a process • void exit(int status)
• exits a process
• Normally return with status 0
• atexit() registers functions to be executed upon exit
void cleanup(void) {
printf(“cleaning up\n”);
}
void fork6() {
atexit(cleanup);
fork();
exit(0);
}
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Zombies
• Idea
• When process terminates, still consumes system resources • Various tables maintained by OS
• Called a “zombie”
• Living corpse, half alive and half dead
• Reaping
• Performed by parent on terminated child (using wait or
waitpid)
• Parent is given exit status information
• Kernel discards process
• What if parent doesn’t reap?
• If any parent terminates without reaping a child, then child
will be reaped by init process (pid == 1)
• So, only need explicit reaping in long-running processes
• e.g., shells and servers https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Zombie Example
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/
• ps shows child process as “defunct”
• Killing parent allows child to be reaped by init
void fork7()
{
if (fork() == 0) {
/* Child */
printf(“Terminating Child, PID = %d\n”,
getpid());
exit(0);
} else {
printf(“Running Parent, PID = %d\n”,
getpid());
while (1)
; /* Infinite loop */
} }
linux> ./forks 7 &
[1] 6639
Running Parent, PID = 6639 Terminating Child, PID = 6640 linux> ps
PID TTY
6585 ttyp9
6639 ttyp9
6640 ttyp9
6641 ttyp9
TIME CMD
00:00:00 tcsh
00:00:03 forks
00:00:00 forks
00:00:00 ps
linux> kill 6639 [1] Terminated linux> ps
PID TTY
6585 ttyp9
6642 ttyp9
TIME CMD
00:00:00 tcsh
00:00:00 ps

wait: Synchronizing with Children • Parent reaps child by calling the wait function
• int wait(int *child_status)
• suspends current process until one of its children terminates
• return value is the pid of the child process that terminated
• if child_status != NULL, then the object it points to will
be set to a status indicating why the child process terminated
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

wait: Synchronizing with Children
void fork9() {
int child_status;
if (fork() == 0) {
printf(“HC: hello from child\n”);
}
else {
printf(“HP: hello from parent\n”);
wait(&child_status);
printf(“CT: child has terminated\n”);
}
printf(“Bye\n”);
exit();
}
HC Bye
HP CT Bye
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

execve: Loading and Running Programs
• int execve(
char *filename, char *argv[], char *envp[]
)
• Loads and runs in current process: • Executablefilename
• With argument list argv
• And environment variable list envp
• Does not return (unless error)
• Overwrites code, data, and stack
• keeps pid, open files and signal context
• Environment variables:
• “name=value” strings
• getenv and putenv
Stack bottom
unused
envp[n] == NULL
envp[n-1]

envp[0]
environ
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/
Stack top
Null-terminated env var strings
Null-terminated cmd line arg strings
argv[argc] == NULL
argv[argc-1]

argv[0]
Linker vars
envp
argv
argc
Stack frame for
main

execve Example
if ((pid = Fork()) == 0) { /* Child runs user job */ if (execve(argv[0], argv, environ) < 0) { printf("%s: Command not found.\n", argv[0]); exit(0); } } argv[argc] = NULL argv[argc-1] ... argv[0] argv “/usr/include” “-lt” “ls” “PWD=/usr/droh” “PRINTER=iron” “USER=droh” envp[n] = NULL envp[n-1] ... envp[0] environ https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/ Unix Process Hierarchy [0] init [1] Login shell Daemon e.g. httpd Child Grandchild Child Child Grandchild https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/ Shell Programs • A shell is an application program that runs programs on behalf of the user. • sh Original Unix shell ( , AT&T Bell Labs, 1977) • csh BSD Unix C shell (tcsh: enhanced csh at CMU and elsewhere) • bash “Bourne-Again” Shell Execution is a sequence of read/evaluate steps int main() { char cmdline[MAXLINE]; while (1) { /* read */ printf("> “);
Fgets(cmdline, MAXLINE, stdin);
if (feof(stdin))
exit(0);
/* evaluate */
eval(cmdline);
}
}
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Simple Shell eval Function
void eval(char *cmdline) {
char *argv[MAXARGS]; /* argv for execve() */
int bg; /* should the job run in bg or fg? */
pid_t pid; /* process id */
bg = parseline(cmdline, argv);
if (!builtin_command(argv)) {
if ((pid = Fork()) == 0) { /* child runs user job */
if (execve(argv[0], argv, environ) < 0) { printf("%s: Command not found.\n", argv[0]); exit(0); } } if (!bg) { /* parent waits for fg job to terminate */ int status; if (waitpid(pid, &status, 0) < 0) unix_error("waitfg: waitpid error"); } else /* otherwise, don’t wait for bg job */ printf("%d %s", pid, cmdline); } } https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/ What Is a “Background Job”? • Users generally run one command at a time • Type command, read output, type another command • Some programs run “for a long time” • Example: “delete this file in two hours” unix> sleep 7200; rm /tmp/junk # shell stuck for 2 hours
• A “background” job is a process we don’t want to wait
for
unix> (sleep 7200 ; rm /tmp/junk) &
[1] 907
unix> # ready for next command
https://www.cs.cmu.edu/afs/cs/academic/class/15213-f12/www/lectures/

Processes
OS/System Software View

Processes
• Reality: each CPU can only run one program at a time
• Fiction to user: many people getting short (~10-100 ms) time slices
• pseudo-parallelismàmultiprogramming • modeled as sequential processes
• context switch
https://www.cs.columbia.edu/~hgs/teaching/ap/slides/process_threads.ppt

What’s a process?
• timesharing system: alternate between processes, interrupted by OS:
• run on CPU
• clock interrupt happens
• save process state (context switch) • registers (PC, SP, numeric)
• memorymap
• continue some other process
https://www.cs.columbia.edu/~hgs/teaching/ap/slides/process_threads.ppt

Process creation
• Processes are created: • system initialization
• by another process
• Foreground processes interact with user • Background processes (daemons)
https://www.cs.columbia.edu/~hgs/teaching/ap/slides/process_threads.ppt

Linux: Processes or Threads?
• Linux uses a neutral term: tasks
• Tasks represent both processes and threads
• Linux view
• Threads: processes that share address space
• Linux “threads” (tasks) are really “kernel threads“
• Lighter-weight than traditional processes
• File descriptors, VM mappings need not be copied
• Implication: file table and VM table not part of process descriptor
http://www.cs.columbia.edu/~nahum/w4118/lectures/Processes.ppt

Stacks and task-descriptors
• To manage multitasking, the OS needs to use a data-structure which can keep track of every task’s progress and usage of the computer’s available resources (physical memory, open files, pending signals, etc.)
• Such a data-structure is called a ‘process descriptor’ – every active task needs one
• Every task needs its own ‘private’ stack
• So every task, in addition to having its own code and data, will also have a stack-area that is located in user-space, plus another stack- area that is located in kernel-space
• Each task also has a process-descriptor which is accessible only in kernel-space
http://www.cs.columbia.edu/~nahum/w4118/lectures/Processes.ppt

Process Descriptor
• Process – dynamic, program in motion • Kernel data structures to maintain “state”
• Descriptor, PCB (control block), task_struct • Larger than you think! (about 1K)
• Complex struct with pointers to others
• Type of info in task_struct
• state, id, priorities, locks, files, signals, memory maps, locks,
• Some details
• Address of first few fields hardcoded in asm • Careful attention to cache line layout
queues, list pointers, …
http://www.cs.columbia.edu/~nahum/w4118/lectures/Processes.ppt

The Linux process descriptor
task_struct
mm_struct contains many *pgd
pagedir[]
Each process descriptor
fields
and some are pointers to other kernel
structures
which may themselves
include fields that point to
structures
user_struct
files_struct
signal_struct
http://www.cs.columbia.edu/~nahum/w4118/lectures/Processes.ppt
state
*stack
flags
*mm
exit_code
*user
pid
*files
*parent
*signal

What is a process/task in Linux?
• unit of scheduling in Linux
• also name of data structure (struct task_struct) (see linux/sched.h)

User-level view
• each execution context that can be independently scheduled must have its own process descriptor
• traditional process id (pid, a field in task structure/process descriptor)
• from 1 to 32,767 in Linux, used as task-unique identifier
• tgid (thread group id) plays process id role for multithreaded applications (common id for all threads in process)
• most processes belong to a thread group consisting of a single member

Kernel view
• keeps two data structures in a single per-process area (8kB)
• thread_info structure (keeps pointer to task structure or process descriptor)
thread info
stack
• kernel stack
• both dynamically allocated
8 kB
• architecture-dependent thread info shares space with kernel stack
• Remember: DO NOT USE RECURSION IN THE KERNEL!

Task structures
• placed in a cyclic, doubly-linked list
• list starts with sentinel: init_task
• first task created by kernel at boot time
• persists until machine shut down/rebooted
• used for lazy page table updates (discussed later)
init_task
tasks.next
tasks.prev

For system calls
• must translate pid to struct pid* (small structure referencing task)
• find_pid (kernel/pid.c) uses a hash table
• map large, sparse space into small, dense space
• O(1) access time if # buckets similar to # of elements (really O(n))
• NOT dynamically grown
• doubly-linked list per bin, but back links only for deletion • hash function = ((pid * big prime #) >> shift)
pid_hash (array of struct pid*)
bound as small as 15 on machines with less memory
0
pidchain.next (*) pidchain.pprev (**)
4095

parent/child relationships
• a process that creates a second process
• is said to be the latter’s parent
• example
• shell from which you run program • is the parent for those programs
• a tree relationship with unbounded degree • root of the tree is the init_task
how can such a structure be contained in constant space per node?

Sibling List
• include list of children as part of each childʼs structure…
parent children.next children.prev sibling.prev sibling.next real_parent
parent pointer
oldest child pointer
youngest child pointer
older sibling pointer
younger sibling pointer original parent (for debugging)
(sibling list is cyclic)
younger sibling older sibling
parent
oldest child

Creating Processes/Tasks
• User-level:
• fork, vfork, and clone system calls

Creating Processes/Tasks
• In kernel: all map into do_fork
• prototype in linux/sched.h; implementation in kernel/fork.c
int do_fork (unsigned long clone_flags, unsigned long stack_start,
struct pt_regs* regs, unsigned long stack_size, int __user* parent_tidptr, int __user* child_tidptr);
• __user means not to be dereferenced
• returns new pid or negative value on error

Parameters
• clone_flags – control flags, to be discussed later; for now, just CLONE_PARENT, which creates a sibling task (like a thread) instead of a child task
• stack_start – the new task’s ESP (in user space; 0 for kernel threads)
• regs – register values for the new task
• stack_size – ignored on x86, but needed on some
ISAs
• parent/child_tidptr – filled in as part of system call (sys_clone only)

Creating Processes/Tasks (cont.)
• do_fork calls copy_process (same file) to set up the process descriptor and any other kernel data structures necessary for child execution
• mostly relies on an architecture-dependent version, copy_thread (see arch/i386/kernel/process.c)
• only stack_start and regs are used by x86 version

Creating Kernel Threads
• What is a kernel thread?
• a task without an associated address space
• inherits address space from last user task to execute
• (all address spaces map the kernel pages and data structures)

Creating Kernel Threads (cont.)
• The interface
• prototype in asm/processor.h; implementation in
arch/i386/kernel/process.c
int kernel_thread (int (*fn)(void*), void* arg, unsigned long flags);
• returns new pid or negative value on error

Creating Kernel Threads (cont.)
• to identify a kernel thread, check the memory map fields of task
• mm = memory map owned by the task
• active_mm = memory map in use by task
• these are identical for user tasks; for kernel threads, mm is NULL

Creating Kernel Threads (cont.)
• init_task is a kernel thread
• created during boot (in start_kernel); persists until
shutdown • pid = 1
init_mm
init_fs
init_files init_signals/init_sighand
init task (pid=1)
• Other kernel threads
• ksoftirqd – the soft interrupt daemon (periodically try to execute tasklets)

Processes
ISA/Hardware View

IA-32 architecture’s task management facilities
• A task is a unit of work that a processor can dispatch, execute, and suspend.
• It can be used to execute a program, a task or process, an operating-system service utility,
an interrupt or exception handler, or a kernel or executive utility.
• The IA-32 architecture provides a mechanism for saving the state of a task, for dispatching tasks for execution, and for switching from one task to another.
• When operating in protected mode, all processor execution takes place from within a task. Even simple systems must define at least one task.
• More complex systems can use the processor’s task management facilities to support multitasking applications.
• A task is made up of two parts: a task execution space and a task-state segment (TSS).
• The task execution space consists of a code segment, a stack segment, and one or more data segments (see Figure 6-1).
• The TSS specifies the segments that make up the task execution space and provides a storage place for task state information.
• Use of task management facilities for handling multitasking applications is optional. Multitasking can be handled in software, with each software defined task executed in the context of a single IA-32 architecture task.
IA-32 Intel® Architecture Software Developer’s Manual Volume 3: System Programming Guide

Use of TSS in Linux
• Although a TSS could be created for each task running on the computer, Linux kernel only creates one TSS for each CPU and uses them for all tasks.
• This approach was selected as it provides easier portability to other architectures (for example,
the AMD64 architecture does not support hardware task switches), and improved performance and flexibility.
• Linux only uses the I/O port permission bitmap and inner stack features of the TSS; the other features are only needed for hardware task switches, which the Linux kernel does not use.
https://en.wikipedia.org/wiki/Task_state_segment

x86 Task State Segment

Task State Segment Transparency
• TSS contains three parts
• first part is shown in diagram
• second two parts are variable-length
• interrupt redirection (for emulation of earlier ISA generations) • I/O permission bitmap (for individual ports)
• I/O Map Base Address gives pointer to start of the last part
• Segment limit (in TSS descriptor) defines I/O bitmap length
• T field – debug trap flag
• if set, execution stops (exception thrown) when task executes

Task State Segment Transparency (cont.)
• SS/ESP fields are saved values for user privilege level • Other privilege levels read from SS2/ESP2, SS1/ESP1,
SS0/ESP0
• I/O bitmap privilege checking
• IOPL (I/O Privilege Level field in EFLAGS register) specifies level needed for arbitrary access
• if CPL > IOPL
• processorchecksI/ObitmapinTSS • exceptiongeneratedunless
• bitmap contains enough bits to represent port as a bit
• bit representing port (I/O port for in/out instruction) is set to 0
• Linux keeps only the first 1024 ports in bitmap to conserve space