CITS2002 Systems Programming
Welcome to CITS2002 Systems Programming
The unit explores the role of contemporary operating systems and their support for high-level programming languages, how they manage efficient access to computer hardware, and how a computer’s resources may be accessed and controlled by the C programming language.
The unit will be presented by Prof. (lectures) and Dr Chris McDonald (workshops).
Our UWA Handbook entry
Understanding the relationship between a programming language and the contemporary operating systems on which it executes is central to developing many skills in Computer Science. This unit introduces the standard C programming language, on which many other programming languages and systems are based, through a study of core operating system services including processes, input and output, memory management, and file systems.
The C language is introduced through discussions on basic topics like data types, variables, expressions, control structures, scoping rules, functions and parameter passing. More advanced topics like C’s run-time environment, system calls, dynamic memory allocation, and pointers are presented in the context of operating system services related to process execution, memory management and file systems. The importance of process scheduling, memory management and interprocess communication in modern operating systems is discussed in the context of operating system support for multiprogramming. Laboratory and tutorial work place a strong focus on the practical application of fundamental programming concepts, with examples designed to compare and contrast many key features of contemporary operating systems.
The UWA Handbook entry for this unit strongly recommends that you take one of three units of Advisable Prior Study before taking this unit – CITS1001, CITS1401, or CITS2401. Students who took this unit in recent years, and had chosen not to take one of the units of Advisable Prior Study, found the material in this unit difficult. This unit is not suitable for first-time programmers.
CITS2002 Systems Programming, Lecture 1, p1, 26th July 2021.
Topics to be covered in CITS2002 Systems Programming
It’s important to know where we’re heading, so here’s a list of topics that we’ll be covering:
An introduction to the ISO-C programming language
The structure of a C program, basic datatypes and variables, compiling and linking. We will focus on the C11 language standard.
An introduction to Operating Systems
A brief history of operating systems, the role of contemporary operating systems, the relationship between programming languages, programs, and operating systems.
An overview of computer hardware components
The processor and its registers, the memory hierarchy, input and output (I/O) and storage components.
C programs in greater detail
Arrays and character strings, user-defined types and structures, how the computer hardware represents data, functions, parameter passing and return values.
Operating system services
Creating and terminating processes, a program’s runtime environment, command-line arguments, accessing operating system services from C.
Managing memory
Allocating physical memory to processes, sharing memory between multiple processes, allocating and managing memory in C programs.
Files and their use in programs
The file management system, file allocation methods, file and directory operations and attributes, file input and output (I/O), raw and formatted I/O, unbuffered and buffered I/O functions.
By the end of this unit you’ll have this knowledge – it just won’t all be presented strictly in this order.
Here is our unit’s schedule.
Systems-focussed Standards
In this unit we’ll introduce a number of standards relevant to systems programming. Formal standards are used to define nearly all aspects of computing, notably data representations, file formats, programming languages, networking protocols, web (communication) interfaces, and encryption and authentication.
Formal standards in computing are often very large. For example, the formal standard for the C11 programming language (used in this unit) is 660 pages. You are not expected to understand these standards in depth (they will not be examined), but as part of professional development you’re encouraged to skim them for an appreciation of their role in computing.
Standards discussed in this unit
C11 – the ISO/IEC 9899:2011 programming language standard standardizes a set of features supported by common contemporary compilers, such as gcc and clang. In this unit we focus on C11, despite it being superseded by C17 (standard ISO/IEC 9899:2018), because C11 is widely supported in the computing environments you’ll use (and C17 is not yet widely supported).
POSIX – the Portable Operating System Interface is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming interfaces (APIs), along with command line shells and utility interfaces [Wikipedia].
While POSIX is often associated with open-source systems (such as Linux), the first POSIX-certified system was Microsoft’s Windows-NT v3.5 in 1999!
Why teach C?
Since its beginnings in early 1973, the C programming language has evolved to become one of the world’s most popular and widely deployed programming languages. The language has undergone extensive formal standardization to produce the ANSI-C standard in 1989, the ISO-C99 standard in 1999, ISO-C11 (revision) in Dec 2011, and ISO-C18 (also known as C17) in June 2018 (which introduces no new language features, only technical corrections and clarifications to defects in C11).
C is the programming language of choice for most systems-level, engineering, and scientific programming:
most of the world’s popular operating systems, Linux, Windows and macOS, their interfaces and file-systems, are written in C,
the infrastructure of the Internet, including most of its networking protocols, web servers, and email systems, is written in C,
software libraries providing graphical interfaces and tools, and efficient numerical, statistical, encryption, and compression algorithms, are written in C,
the software for most embedded devices, including those in cars, aircraft, robots, smart appliances, sensors, mobile phones, and game consoles, is written in C,
the software on the Lander is written in C,
much of the safety-critical software on the F-35 joint strike fighter, is written in C, but
C was not used on the Apollo-11 mission!
Is C still relevant? [ref: Tiobe survey]
(The Tiobe survey is based on search-engine queries – it is not about the best programming language or the language in which most lines of code have been written).
Of note, in July 2021, the Tiobe survey rates C, Java, and Python as almost identical in ‘popularity’.
Though, of course, popularity is a poor measure of quality – otherwise, McDonald’s Restaurants would receive Michelin stars.
So we’ll not focus on popularity, but on the relevance and appropriate uses of C. Other interesting surveys:
Stackoverflow’s Developer Survey Results 2020 (the 2021 survey is now open).
Jetbrains’ The State of Developer Ecosystem in 2021
HackerRank’s 2020 Developer Skills Report
So what is C?
“A programming language that doesn’t affect the way you think about programming isn’t worth knowing.”
– Alan Perlis, 1st Turing Award winner
In one breath, C is often described as a good general purpose language, an excellent systems programming language, and just a glorified assembly language. So how can it be all three?
C can be correctly described as a general purpose programming language – a description also given to Java, Python, Visual-Basic, C++, and C#.
C is a procedural programming language, not an object-oriented language like Java, (parts of) Python, Objective-C, or C#.
C programs can be “good” programs, if they are:
well designed,
clearly written,
written for portability,
well documented,
use high level programming practices, and
well tested.
Of course, the above properties are independent of C, and are offered by many high level languages.
C has programming features provided by most procedural programming languages – strongly typed variables, constants, standard (or base) datatypes, enumerated types, user-defined types, aggregate structures, standard control flow, recursion, and program modularization.
C does not offer tuples or sets, Java’s concept of classes or objects, nested functions, subrange types, and has only recently added a Boolean datatype.
C does have, however, separate compilation, conditional compilation, bitwise operators, pointer arithmetic, and language independent input and output.
A Systems Programming Language
C is frequently, and correctly, described as an excellent systems programming language.
C also provides an excellent operating system interface through its well defined, hardware and operating system independent, standard library.
The C language began its development in 1972, as a programming language in which to re-write significant portions of the Unix operating system:
Unix was first written in assembly languages for PDP-7 and PDP-11 computers.
In 1973 Dennis Ritchie was working on a programming language for operating system development. Basing his ideas upon BCPL, he developed B and finally created one called C.
(Yes, there is a language named ‘D’, but it’s not a descendant of C)
By the end of 1973, the UNIX kernel was 85% written in C which enabled it to be ported to other machines for which a C compiler could be fashioned.
This was a great step because it no longer tied the operating system to the PDP-7 as it would have been if it remained in assembly language. In 1976 Dennis Ritchie and Stephen Johnson ported Unix to an Interdata 8/32 machine. Since then, Unix and Linux have been ported to over 260 different processor architectures.
Today, well in excess of 95% of the Unix, Linux, macOS, and Windows operating system kernels and their standard library routines are all written in the C programming language – it’s extremely difficult to find an operating system not written in either C or its descendants C++ or Objective-C.
Portability on different architectures
C compilers have been both developed for, and ported to, a large number and variety of computer architectures:
from 4-bit and 8-bit microcontrollers,
through traditional 16-, 32-, and 64-bit virtual memory architectures in most PCs and workstations,
to larger 64- and 128-bit supercomputers.
Compilers have been developed for:
traditional complex instruction set architectures (CISC), such as Intel x86, AMD, and Motorola 680×0,
newer reduced instruction set architectures (RISC), such as ARM, Sun SPARC, DEC-Alpha, SGI MIPS, and IBM/Motorola PowerPC,
mobile phones, home theatre equipment, routers and access-points, and
parallel and pipelined architectures.
All it requires is a ported C compiler
Once a C compiler has been developed for a new architecture, the terabytes of C programs and libraries available on other C-based platforms can also be ported to the new architecture.
What about assembly languages?
It is often quoted that a compiled C program will run only 1-2% slower than the same program hand-coded in the native assembly language for the machine.
But having the program coded in a readable, high level language provides the overwhelming advantages of maintainability and portability.
Very little of an operating system, such as Windows, macOS, or Linux, is written in an assembly language – in most cases the majority is written in C.
Even an operating system’s device drivers, often considered the most time-critical code in an operating system kernel, today contain assembly language numbered in only the hundreds of lines.
The unreadability of C programs
C is described as nothing more than a glorified assembly language, meaning that C programs can be written in such an unreadable fashion that they look like your monitor is set at the wrong speed.
(In fact there’s a humorous contest held each year, The International Obfuscated C Code Contest, to design fully working but indecipherable code, and the Underhanded C Contest, whose goal is to write code that is as readable, clear, innocent and straightforward as possible, and yet it must fail to perform at its apparent function.)
Perhaps C’s biggest problem is that the language was designed by programmers who, folklore says, were not very proficient typists.
C makes extensive use of punctuation characters in the syntax of its operators and control flow. In fact, only the punctuation characters @, $, and ` are not used in C’s syntax! (and DEC-C once used the $ character, and Objective-C now uses the @).
It is not surprising, then, that if C programs are not formatted both consistently and with sufficient white space between operators, and if very short identifier names are used, a C program will be very difficult to read.
To partially address these problems, a number of text-editors, integrated development environments (IDEs), and beautification programs (such as indent) can automatically reformat our C code according to consistent specifications.
Criticisms of C’s execution model
C is criticized for being too forgiving in its type-checking at compile time.
It is possible to cast an instance of some types into other types, even if the two instances have considerably different types.
A pointer to an instance of one type may be coerced into a pointer to an instance of another type, thereby permitting the item’s contents to be interpreted differently.
Badly written C programs make incorrect assumptions about the size of items they are managing. Integers of 8, 16, and 32 bits can hold different ranges of values. Poor choices, or underspecification, can easily lead to errors.
C provides no runtime protection against arithmetic errors.
There is no exception handling mechanism, and errors such as division-by-zero and arithmetic overflow and underflow are not caught and reported at run-time.
C offers no runtime checking of popular and powerful constructs like pointer variables and array indices.
Subject to constraints imposed by the operating system’s memory management routines, a pointer may point almost anywhere in a process’ address space and seemingly random addresses may be read or written to.
Although all array indices in C begin at 0, it is possible to access an array’s elements with negative indices or indices beyond the declared end of the array.
There are occasions when each of these operations makes sense, but they are rare.
C does not hold the hand of lazy programmers.
We avoid all of these potential problems by learning the language well, and employing safe programming practices.
What is the best programming language?
Questions, even arguments, about whether C, Java, Visual-Basic, C++, or C# is the best general purpose programming language are pointless.
“C and C++ are only the foundation because they happened to become popular due to a bunch of miscellaneous factors, not because they are inherently great inventions in themselves. Also, they (and their standard libraries) evolved over time to their current state.
It’s like saying English and Spanish are the most important languages because they are fundamentally the “best-invented” ones, not because of the accidents of fate that were colonial expansion, WWII, and the Internet.”
The important question is:
“which language is most suited for the task at hand?”
This unit will answer the questions:
“when is C the best language to use?” and
“how do we best use C’s features for systems programming?”
Through a sequence of units offered by Computer Science & Software Engineering you can become proficient in a wide variety of programming languages – procedural, object-oriented, functional, logic, set-based, and formal – and know the most appropriate one to select for any project.
The Standardization of C – K&R C
Despite C’s long history, being first designed in the early 1970s, it underwent comparatively little change until the late 1980s.
This is a very lengthy period of time when talking about a programming language’s evolution.
The original C language was mostly designed by Dennis Ritchie and then described by Brian Kernighan and Dennis Ritchie in their imaginatively titled book The C Programming Language.
The language described in this seminal book, known as the “K&R” book, is now referred to as “K&R” or “old” C.
228 pages.
The Standardization of C – ANSI-C (K&R-2)
In the late 1980s, a number of standards-forming bodies, and in particular the American National Standards Institute (ANSI) X3J11 Committee, commenced work on rigorously defining both the C language and the commonly provided standard C library routines. The results of their lengthy meetings are termed the ANSI-X3J11 standard, or informally ANSI-C, C89, or C90.
The formal definition of ANSI-C introduced surprisingly few modifications to the old “K&R” language and only a few additions.
Most of the additions were the result of similar enhancements that were typically provided by different vendors of C compilers, and these had generally been considered as essential extensions to old C. The ANSI-C language is extremely similar to old C. The committee only introduced a new base datatype, modified the syntax of function prototypes, added functionality to the preprocessor, and formalized the addition of constructs such as constants and enumerated types.
272 pages.
The Standardization of C – ANSI/ISO-C99 and ISO/IEC 9899:2011 (C11)
A new revision of the C language, named ANSI/ISO-C99 (known as C99), was completed in 1999.
Many features were “cleaned up”, including the addition of Boolean and complex datatypes, single line comments, and variable length arrays, and the removal of many unsafe features, and ill-defined constructs.
753 pages.
A revision of C99, ISO/IEC 9899:2011 (known as C11), was completed in December 2011.
In this unit we will focus exclusively on C11, and only mention other versions of C when the differences are significant.
If the C compiler on