2021/10/29 6:23 PM CITS2002 Systems Programming, Lecture 21
CITS2002 Systems Programming
Systems Programming and Portability
In this unit we’ve focused on system programming – understanding the interface between the operating system and application programs.
Operating systems are the best examples of programs that need to be aware of the hardware’s specifications and limitations, and that must hide as much of this detail as possible from applications through good software engineering practices.
If the operating system, itself, has any chance of being ported to different architectures, its own implementation must identify and isolate its hardware dependencies.
Unix, the historic forefather of Linux and macOS (and many others), was the first portable operating system, reimplemented in C to support its migration from early Digital Equipment Corp (DEC) minicomputers. C itself was invented specifically for the purpose of enabling Unix to be portable.
“We here at Bell Laboratories were truly dumfounded when this visitor from an unknown school in Australia reported his elegant procedure.” (Doug McIlroy, Head, Unix Research Group, Bell Laboratories)
UNIX: a portable operating system? [Miller, 1978].
Unix portability: underutilized in embedded development [Crooks, 2002].
The First Port of UNIX [Reinfelds, 1978].
Today, of course, we see Linux ported to nearly every form of contemporary architecture because:
hardware-dependent code has been identified and isolated,
software abstractions and application-programming interfaces hide hardware characteristics from applications, and
successful applications do not introduce, or depend upon, any hardware dependencies.
CITS2002 Systems Programming, Lecture 21, p1, 12th October 2021.
What is portability?
A program may be considered portable if it can be ‘moved’, migrated, to different computing environments.
These environments do not just include different operating systems, running on different forms of hardware, but can include different (human) interfaces and natural languages.
Many operating systems are written in C and are, in theory, portable. This is possible because C toolchains (the pre-processor, compiler, and linker) are supported by header files and libraries that have ‘extended’ the language without requiring the language, itself, to be changed.
(The above paragraph is not strictly correct, as C11 added new features aiding portability, such as in-language support for Unicode.)
https://teaching.csse.uwa.edu.au/units/CITS2002/lectures/lecture21/singlepage.html 1/7
C is portable at the level of its source-code
C programs must be compiled in their new computing environment, or cross-compiled in an existing environment that has knowledge of the destination hardware architecture and can provide the necessary libraries. Examples include developing programs on an Intel-based Linux platform destined for an ARM-based platform (also running Linux), or developing a program under Apple’s macOS destined for an iPhone. In both cases the compiled program is then uploaded (by network or cable) to the new environment.
C’s source-level portability is in contrast to:
Java’s use of an architecture-independent bytecode. Java’s source code is compiled on one platform, and the resulting bytecode copied to a destination platform with a platform-specific implementation of a Java Virtual Machine (JVM) to interpret the bytecode.
Python’s portability coming from its interpretation of its source code with a platform-specific Python interpreter.
Your C compiler’s version and default language standard
It is now a decade since C11 was released, and contemporary compilers, such as gcc and clang, support all C11 features (on hosted platforms), and support requests for backward compatibility with earlier standards (cc -std=cXX …).
While it is easy to determine the version of the compiler being used:
mac-prompt> cc --version
Apple clang version 11.0.3 (clang-1103.0.32.59)
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
compiler front-ends support many languages and versions, so knowing the compiler’s version is not much use. Instead, we need to know how our source code is being compiled, at compile time. We can test against the __STDC_VERSION__ preprocessor token, and then (possibly) compile different code/functions in our program:
#include <stdio.h>

int main(int argc, char *argv[])
{
#if __STDC_VERSION__ >= 201710L
    printf("hello from C18!\n");
#elif __STDC_VERSION__ >= 201112L
    printf("hello from C11!\n");
#elif __STDC_VERSION__ >= 199901L
    printf("hello from C99!\n");
    /* alternatively, refuse to compile with:
       #error This program demands features from C11 or later */
#else
    printf("hello from (ANSI-C) C89/C90!\n");
#endif
    return 0;
}
This assists our goal of portable programming by ensuring that a program’s required features are supported by the local compiler, and its default compilation arguments.
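As a minimal sketch of compiling different code under different language standards, the following uses C11's _Static_assert when it is available and falls back to an older negative-array-size trick otherwise. The names STATIC_CHECK and c_standard_version are hypothetical, chosen only for this illustration; they are not part of any standard header.

```c
#include <limits.h>

/* STATIC_CHECK is a hypothetical macro name used only in this sketch. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    /* C11 and later: use the language's own compile-time assertion */
    #define STATIC_CHECK(cond, name)  _Static_assert(cond, #name)
#else
    /* Earlier standards: an array of size -1 is a compile-time error */
    #define STATIC_CHECK(cond, name)  \
        typedef char static_check_##name[(cond) ? 1 : -1]
#endif

/* Refuse to compile where these assumptions do not hold */
STATIC_CHECK(CHAR_BIT == 8, char_is_8_bits);
STATIC_CHECK(sizeof(int) >= 4, int_is_at_least_32_bits);

/* Returns the value of __STDC_VERSION__, or 0 if it is not defined
   (as is the case for C89/C90 compilers) */
long c_standard_version(void)
{
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}
```

Compiling the same file with, say, cc -std=c99 and cc -std=c11 selects a different definition of the macro each time, without any change to the source.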
Detecting the target operating system platform
Similarly, at compile-time we can determine the operating system platform for which we’re compiling (note, if we’re cross-compiling, this will not be our native platform).
Based on this information we can conditionally report an inability to support specific platforms, or can include our own implementation of functions not otherwise available.
#ifdef _WIN64
    // define something for Windows (64-bit)
#elif _WIN32
    // define something for Windows (32-bit)
#elif __APPLE__
    #include "TargetConditionals.h"
    #if TARGET_OS_IPHONE && TARGET_IPHONE_SIMULATOR
        // define something for simulator
    #elif TARGET_OS_IPHONE
        // define something for iphone
    #else
        #define TARGET_OS_OSX 1
        // define something for OSX
    #endif
#elif __linux
    // define something for Linux
#elif __unix    // all Unix-derived systems not detected above
    // define something for Unix
#elif __posix
    // define something for POSIX
#else
    #error unrecognized operating system platform
#endif
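The same compile-time tests can select real code rather than just comments. The sketch below is one possible arrangement: platform_name is a hypothetical helper (not a standard function), and the strings it returns are chosen only for illustration. It deliberately returns "unknown" instead of using #error, so that it compiles everywhere.

```c
#include <stdio.h>

/* platform_name() is a hypothetical helper used only in this sketch.
   It reports the platform the program was compiled FOR, which, when
   cross-compiling, differs from the platform it was compiled ON. */
const char *platform_name(void)
{
#if defined(_WIN64)
    return "Windows (64-bit)";
#elif defined(_WIN32)
    return "Windows (32-bit)";
#elif defined(__APPLE__)
    return "Apple (macOS or iOS)";
#elif defined(__linux__)
    return "Linux";
#elif defined(__unix__)
    return "Unix";
#else
    return "unknown";
#endif
}

/* Prints a one-line report of the target platform */
void report_platform(void)
{
    printf("compiled for: %s\n", platform_name());
}
```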
Pre-defined preprocessor tokens
The recent examples, enabling detection of the C language standard and the operating system platform, are a small but important sample of the information available when compiling programs.
We can see the pre-processor’s pre-defined tokens with:
prompt> cc -dM -E - < /dev/null
Some of the following examples (not specifically related to portability) are taken from gcc's Standard Predefined Macros:
The standard predefined macros are specified by the relevant language standards, so they are available with all compilers that implement those standards.
__FILE__
This macro expands to the name of the current input file, in the form of a C string constant.
__LINE__
This macro expands to the current input line number, in the form of a decimal integer constant.
__FILE__ and __LINE__ are useful in generating an error message to report an inconsistency detected by the program. C99 also introduced __func__, and GCC has provided __FUNCTION__ for a long time.
__STDC__
In normal operation, this macro expands to the constant 1, to signify that this compiler conforms to ISO Standard C.
__STDC_VERSION__
This macro expands to the C Standard’s version number, a long integer constant of the form yyyymmL where yyyy and mm are the year and month of the Standard version.
__STDC_HOSTED__
This macro is defined, with value 1, if the compiler’s target is a hosted environment. A hosted environment has the complete facilities of the standard C library available.
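A short sketch of how __FILE__, __LINE__, and __func__ might be combined to report an inconsistency. REPORT_ERROR and checked_divide are hypothetical names invented for this illustration, and __func__ assumes a C99 or later compiler.

```c
#include <stdio.h>

/* REPORT_ERROR is a hypothetical macro used only in this sketch.
   It must be a macro, not a function, so that __FILE__, __LINE__, and
   __func__ describe the place where the error was detected. */
#define REPORT_ERROR(msg) \
    fprintf(stderr, "%s:%d: in %s(): %s\n", __FILE__, __LINE__, __func__, msg)

/* Divides numerator by denominator, storing the quotient in *result.
   Returns 0 on success, or -1 (after reporting) on division by zero. */
int checked_divide(int numerator, int denominator, int *result)
{
    if (denominator == 0) {
        REPORT_ERROR("division by zero");
        return -1;
    }
    *result = numerator / denominator;
    return 0;
}
```

The message is written to stderr rather than stdout, so that it does not contaminate the program's normal output.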
Employing the correct sized integers for portability
In most of our C programming (laboratories and projects) we have employed the standard int datatype whenever we have simply wished to count something, or to loop a small number of times.
We have not cared (probably not even thought about) whether the host architecture supported integers of 16, 32, or 64 bits, but have been confident (on laptops and desktops) that integers were at least 32 bits long, meeting our typical requirements.
For different applications, the actual storage size of an integer may be significant, and a portable program should enforce its requirements. For example, if we required an array to store temperature samples on, say, an Internet-of-Things (IoT) device, then an 8-bit integer may be sufficient, and even necessary if we needed to store a million of them.
C99 introduced the standard header file <stdint.h>, which defines integer types of exact sizes together with their limits:
typedef signed char             int8_t;
typedef short int               int16_t;
typedef int                     int32_t;
# if __WORDSIZE == 64
typedef long int                int64_t;
# else
typedef long long int           int64_t;
# endif

/* Minimum of signed integral types.  */
# define INT8_MIN               (-128)
# define INT16_MIN              (-32767-1)
# define INT32_MIN              (-2147483647-1)
# define INT64_MIN              (-__INT64_C(9223372036854775807)-1)
/* Maximum of signed integral types.  */
# define INT8_MAX               (127)
# define INT16_MAX              (32767)
# define INT32_MAX              (2147483647)
# define INT64_MAX              (__INT64_C(9223372036854775807))
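Similar support is provided for unsigned integers (uint8_t through uint64_t). Floating-point numbers also come in different lengths (commonly 32-, 64-, and 128-bits), though their characteristics are described by <float.h> rather than <stdint.h>.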
Employing the correct form of these datatypes is critical in many application domains demanding portable software – including networking protocols, cryptography, and image processing.
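Returning to the temperature-sample example, a minimal sketch of enforcing a storage requirement with <stdint.h> might look as follows. The function names clamp_to_int8 and sum_samples are hypothetical, and storing whole degrees in one byte is an assumption made only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: limits a reading to the range of int8_t
   (INT8_MIN..INT8_MAX) so that it can be stored in a single byte */
int8_t clamp_to_int8(int reading)
{
    if (reading < INT8_MIN) {
        reading = INT8_MIN;
    }
    if (reading > INT8_MAX) {
        reading = INT8_MAX;
    }
    return (int8_t)reading;
}

/* Hypothetical helper: sums n one-byte samples. The accumulator is
   explicitly 64 bits wide, so that even millions of samples cannot
   overflow it, whatever the size of int on this platform. */
int64_t sum_samples(const int8_t *samples, size_t n)
{
    int64_t total = 0;

    for (size_t i = 0; i < n; ++i) {
        total += samples[i];
    }
    return total;
}
```

An array declared as int8_t samples[1000000] then occupies exactly one million bytes on every platform that provides int8_t, whereas an array of int would typically need four times as much.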
Employing the correct sized integers for portability, continued
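While the C99 and C11 standards provide datatypes of known sizes, the format specifiers needed to read and print them (such as %i, %li, or %lli) still depend on the platform.
When using standard C functions like printf() and sscanf(), we can employ the compiler's ability to concatenate adjacent string constants (at compile-time, not run-time). The <inttypes.h> header file defines macros, such as PRIi64, that expand to the correct format specifier for each type: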
#include <inttypes.h>

int64_t nbytes;
// PRIi64 expands to the printf() conversion specifier for an int64_t
printf("%" PRIi64 "MB\n", nbytes / (1 << 20));
Similar support exists within the C99 and C11 standards for varying sized pointers (typically 32- or 64-bits), the ability to perform I/O on their character (string) representations, and to select the appropriate sized integer so that it may hold a pointer value.
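A small sketch of the pointer-related support follows, assuming the platform provides the optional type uintptr_t (an unsigned integer able to hold a pointer value) and its matching format macro PRIxPTR. The helper names is_aligned and format_address are hypothetical, and treating the converted integer as a memory address is typical but implementation-defined.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: returns nonzero if ptr is a multiple of alignment,
   which is assumed to be a power of two */
int is_aligned(const void *ptr, uintptr_t alignment)
{
    uintptr_t address = (uintptr_t)ptr;

    return (address & (alignment - 1)) == 0;
}

/* Hypothetical helper: writes the pointer's integer value, in hexadecimal,
   into buf, and returns the value reported by snprintf(). PRIxPTR selects
   the correct specifier whether pointers are 32 or 64 bits wide. */
int format_address(char *buf, size_t size, const void *ptr)
{
    return snprintf(buf, size, "0x%" PRIxPTR, (uintptr_t)ptr);
}
```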
Portable programs are 'team-players'
Simply porting a program to a different computing environment does not guarantee that the program will be able to operate successfully, or be accepted by users, in the new environment.
Systems-focused programs also need to 'fit in' with the new computing environment, to both interoperate with existing utilities, and also contribute something new.
This requires programs to make use of existing operating system supported runtime features and interfaces in a consistent manner. This makes it easier for users to quickly understand and benefit from the newly ported program.
An excellent introduction to this topic is The Art of Unix Programming, by Raymond, 2003:
Chapters 1 and 5 are the most relevant to the material discussed in this lecture.
In addition, some other good Chapters/Sections that are not too long or dry (in order of relevance), are:
Chapter 10 - Configuration: What Should be Configurable?; Environment Variables; Command-Line Options
Chapter 11 - Unix Interface Design Patterns: The Filter Pattern -> The ed Pattern
Chapter 19 - Open Source
Chapter 16 - Reuse
An example of ‘team-players’ – filters
One of the most successful ideas introduced in early Unix systems was the interprocess communication mechanism termed a pipe. Pipes enable shells (or other programs) to connect the output of one program to the input of another, and for arbitrary sequences of pipes – a pipeline – to filter a data-stream with a number of transformations.
A great pipeline example, providing a rudimentary spell-checker:
prompt> tr -cs 'A-Za-z' '\n' < inputfilename | sort -u | comm -23 - /usr/share/dict/words
Programs typically used in pipelines are termed filters, and they work in combination because of their simple communication schemes which do not add 'unexpected detail' to their output, so that programs reading that output as their input only have the expected data-stream to process.
It's for this reason that programs don't produce verbose natural-language descriptions of their output, or headings for tables of data, unless a specific command-line option requests it. Just the facts.
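A filter of our own can be very small. The sketch below, using the hypothetical function name lowercase_filter, copies one stream to another while converting letters to lower case, and adds nothing else to its output.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical filter: copies every byte from in to out, converting
   upper-case letters to lower case. Returns the number of bytes copied.
   No headings, prompts, or summaries are written to out. */
long lowercase_filter(FILE *in, FILE *out)
{
    long count = 0;
    int  ch;

    while ((ch = getc(in)) != EOF) {
        putc(tolower(ch), out);
        ++count;
    }
    return count;
}
```

A main() function that simply calls lowercase_filter(stdin, stdout) would turn this into a program that could be placed between any two stages of a pipeline, such as before sort -u in the spell-checker above.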
Unicode support in C11
One of the long-overdue features added to the C11 standard is support for Unicode character sets, through UTF-8, UTF-16, and UTF-32 encodings.
C was missing this feature for a long time, and C programmers had to use third-party libraries such as IBM's International Components for Unicode (ICU).
Before C11, we mostly relied on the char and unsigned char types, 8-bit integer variables used to store ASCII and Extended-ASCII characters (the older wchar_t type has a size that varies between platforms). By creating arrays of these ASCII characters, we could create ASCII strings.
Portable programs should not be limited to communicating only in English, or ISO-Latin languages. There are thousands of other natural languages, employing character sets other than the English alphabet. Portable programs should support these without requiring a different program, or source-code base, for each language.
ASCII and Extended-ASCII - 8-bit character sets
The ASCII standard has 128 characters each stored in 7 bits. Extended-ASCII adds another 128 characters to total 256 characters; an 8-bit or one-byte variable is sufficient. See man ascii.
Support for ASCII characters and strings is fundamental, and will never be removed from C. C11 adds support for new character sets and, therefore, new strings require a different number of bytes, not just one byte, for each character.
Suddenly, characters may be of different lengths (1-, 2-, or 4-bytes long), and it's the value of the character that determines its length. Consider how this would affect an implementation of, say, the standard strlen() function, which just counts the bytes found before the terminating null byte!
Unicode support in C11, continued
The Unicode standard introduced mechanisms supporting more than one byte to encode all characters in ASCII, Extended-ASCII, and 'wide' characters in thousands of different natural languages. These methods are termed encodings.
Unicode defines 3 well-known encodings: UTF-8, UTF-16, and UTF-32:
UTF-8 uses a single byte to store the first half of the Extended-ASCII characters (the 128 ASCII characters), and sequences of additional bytes, up to 4 in total, for the other half together with all other wide characters. Hence, UTF-8 is considered a variable-sized encoding.
Like UTF-8, UTF-16 is a variable-sized encoding: it uses one or two words (each word occupying 16 bits) to store each character. In both UTF-8 and UTF-16, fewer bytes are used for more frequent characters, and in UTF-16 most characters require just two bytes.
UTF-32 uses exactly 4 bytes for storing the values of all characters; therefore, it is a fixed-sized encoding. UTF-32 uses a fixed number of bytes (4) even for ASCII characters, but does restore our idea of 'counting' characters, and enables individual characters to be located by array indices.
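The fixed-size property of UTF-32 can be sketched with C11's char32_t type and U"..." string literals, assuming the platform provides <uchar.h> and that its char32_t values are UTF-32 encoded (as with gcc and clang). The function name utf32_strlen is hypothetical.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: counts the characters in a zero-terminated
   UTF-32 string. Because every character occupies exactly one
   char32_t element, counting elements is counting characters. */
size_t utf32_strlen(const char32_t *s)
{
    size_t n = 0;

    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

For a string declared as const char32_t *greek = U"\u03B1\u03B2\u03B3" (alpha, beta, gamma), the function returns 3 and greek[1] is simply the second character, 0x03B2, with no decoding required.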
Note that C11 does not define new standard functions to operate on Unicode strings; we therefore have to write a new strlen() function for them.
However, many Unicode conversion functions are defined in the new <uchar.h> header file.
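One possible way to write such a strlen() replacement for UTF-8 is sketched below. It relies on the fact that every continuation byte of a multi-byte UTF-8 sequence has the bit pattern 10xxxxxx, so counting all other bytes counts the characters. The name utf8_strlen is hypothetical, and the function assumes its input is valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters (code points) in a
   null-terminated string that is assumed to hold valid UTF-8.
   Bytes of the form 10xxxxxx continue a previous character and
   are therefore not counted. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For the 6-byte string "na\xC3\xAFve" (the word naïve, whose third letter needs two bytes), the standard strlen() reports 6 while utf8_strlen() reports 5.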
An excellent introduction to Unicode: unicodebook.readthedocs.io/unicode_encodings.html
Some thoughts on their support in C11: Unicode operators for C
Some example code: Unicode in C11