Microsoft PowerPoint – 2_Unix_Linux_Intro_to_C
OSU CSE 2421
J.E.Jones
Developed from 1969 to 1971 at AT&T Bell Laboratories (Ken
Thompson, Dennis Ritchie, Brian Kernighan, Douglas
McIlroy, Joe Ossanna)
Written largely in C (some assembly language code as well)
C was originally developed as a programming language to
write the Unix OS, which was a multi-user, multi-tasking OS.
Proprietary (requires a license for use)
AT&T sold Unix to Novell in the early 1990s, which then sold
its Unix business to the Santa Cruz Operation (SCO) in 1995
UNIX trademark passed to the industry standards consortium
The Open Group, which allows the use of the mark for certified
operating systems compliant with the Single UNIX
Specification (SUS). Among these is Apple’s macOS, which is
the Unix version with the largest installed base as of 2014.
“Modular design”: The OS provides a collection of
simple tools that each implement a limited, well-
defined function. (This was very forward thinking at
the time.)
More complex functionality is provided by
combining the simple tools.
A unified file system is the main means of
communication; for example, devices (e.g., disk
drives, keyboards) are treated as files.
First “portable” operating system
Developed in 1991 by Linus Torvalds, a graduate student in
Computer Science at the University of Helsinki.
Linux is a Unix “clone,” we can say, because it does not use Unix
code (which is proprietary, and therefore could not be freely
distributed), but it provides the same functionality and features as
Unix generally, and follows the Unix OS philosophy.
Open source, so it is available in versions without cost (There are
also versions which are licensed for a fee, often with support, as well
as the OS itself)
Various “distributions,” which are all broadly similar, but also
exhibit various differences (If you would like to get a Linux
distribution, ask me, and I’ll be glad to point you to some!).
It is worth your time and effort to learn as much as you can about
Linux, because you are likely to encounter Linux/Unix in the work
world.
“Unix is used by 75.0% (up from 72.4% last January) of
all the websites whose operating system we know.”
Linux is used by 57.4% (up from 38.7% last January) of
all the websites whose operating system we know. (This
57.4% is a part of the 75.0% aggregate above.)
http://w3techs.com/technologies/details/os-unix/all/all
Linux also runs on 96-98% of the top 500 supercomputers!
Has anyone used a Raspberry Pi??? Then you have used Linux
• C is procedural, not object-oriented
• C is fully compiled (to machine code), not to byte-code
• C allows direct manipulation of memory (via pointers)
• C does not have garbage collection; the software writer
must do explicit memory management when it is required,
and failure to do so can result in significant problems (e.g.,
memory leaks, where the memory used by the program
may grow potentially without bound)
• Many of the basic language constructs in C act similarly to
the way they work in Java; nonetheless, there are
sometimes important differences which need to be
understood
• C has many nuanced, yet important details
• C does not support the notion of Classes or Objects
• We will talk about “classes” in C (storage classes), but they have no
relationship to classes in C++
• C does not itself support Encapsulation, as all memory is
technically accessible and modifiable by any instruction of an
executable
• C does not itself support polymorphism, virtual function,
inheritance, operator overloading, namespace concepts
• C is not an object-oriented language, it is procedural.
• This means development is usually top-down rather than bottom-
up
• C breaks down to functions while Java breaks down to objects
• ANSI C does not have a Boolean type
In this class, we will learn ANSI C, which was originally
standardized in 1989 by the American National Standards Institute.
This version of C is sometimes referred to as “C89” (also
sometimes referred to as “Standard C”). The ISO, or International
Organization for Standardization, also adopted an equivalent version of C in
1990, which is often referred to as C90. Therefore, C89 and C90
are, in effect, equivalent.
There are later versions of C, such as C99, which differ from
ANSI C; they may support features that ANSI C does
not support (such as just-in-time declaration).
There are a few words that you will see in different contexts as we
go through this semester
The words Dynamic and Static are two of the worst offenders.
To determine context, check out the word these two adjectives
describe:
◦ Dynamic memory
◦ Dynamic memory allocation
◦ Dynamic error
◦ Dynamic linking
◦ Static error
◦ Static memory
◦ Static memory allocation
◦ Static identifier class
◦ Static linking
I can’t change these things. I can’t tell you why it is this way. We
all must live with it – even though it can get confusing. So be
aware!
In computer science, we distinguish events that can occur when a program is being
compiled or built, called compile time, from events which can occur when the
program is being executed, or running, called run time.
Certain kinds of errors can be identified (or occur) only at compile time, while
others can be identified (or occur) only at run time.
Errors which occur at compile/build time: syntax errors/static errors [these make a
program invalid, i.e., in the case of C, not a valid C program], or possibly linkage
errors – more on this in a few weeks. In this case, the compiler may or may not
generate an executable depending upon the severity of the error.
Compile/build time events are often referred to as static and run time events as
dynamic.
Errors which occur at run time are semantic errors/dynamic errors [these do not
make a program invalid, because only a valid program can be built and then
executed]. An example of such an error would be division by zero or some other
logic error. The compiler can not discover such an error, but when the program
runs, an exception will sometimes be generated (e.g., segmentation fault), and the
operating system will terminate the program.
Here’s the 1st occurrence!
Four General Categories of Statements in Computer Languages
◦ Declarations (optional in some languages like Python)
◦ Data Movement
• Memory to function variables
• Function variables to function variables
• Function variables to memory
◦ Arithmetic/Logical Operations
• Compare something
• Calculate something
◦ Control-Flow
• Procedure/function calls
• Looping
• Conditionals
/* Version 1 */
#include <stdio.h>
int main (int argc, char **argv) {
    printf("Hello, ");
    printf("World!\n");
    return(0);
}
/* Version 2 */
#include <stdio.h>
int main (void) {
    printf("Hello, World!\n");
    return (0);
}
/* Version 3 */
#include <stdio.h>
#include <stdlib.h>
int main () {
    printf("Hello, World!\n");
    return (EXIT_SUCCESS);
}
% gcc -o hello hello.c
% hello [executes the program]
Source code (hello.c)
Type in program source code (file.c) using an editor of your choice; plain text.
Preprocessor: hello.c → hello.i
.c + .h = .i, which is the ultimate source code, i.e., #includes expanded,
#defines replaced, comments removed.
Compiler: hello.i → hello.s (assembly code)
C syntax parser; .i → .s, which is assembler source code.
Assembler: hello.s → hello.o (object code)
Assembler code parser; .s → .o, which is an object file: fragments of
machine code with unresolved symbols, i.e., some addresses not yet known (vars/functions).
Link Editor/Linker: hello.o + libraries (e.g., printf() code) → hello (executable code)
.o + library links to a.out (default name); resolves symbols, generates an executable.
Notice the lines at the top which begin with the #
character. These are known as preprocessor directives
and are used by a translation program called the
preprocessor, which is the first program called when
you build C source code (in our case, with gcc).
Required Reading: Pointers on C, Chapter 14, The
Preprocessor
◦ You can omit section 14.5
◦ The engineering library has a copy if you didn’t purchase the
book.
◦ 20 pages, but the print is large. By far, the largest reading
assignment all semester, but worth it.
◦ You must be able to answer questions on the content by next
week.
Prepares a .c file for the compiler
Input is from sourcecode.c and output goes to
sourcecode.i
Strips all comments from sourcecode.c
Using any #include file preprocessor directives in the
source code file, copy the entire contents of the file in
to sourcecode.i
Replace any MACROs (Chapter 14 will tell you what this is)
One of the first things the preprocessor does is find, in the
area on disk where library files are kept, the file with the
name stdio.h or stdlib.h and copy the contents of that file
into the source file. The text in the preprocessor directive
(for example, “#include <stdio.h>”) is removed from the
source file by the preprocessor also.
The header file may contain various things, but one thing it
usually contains is one or more function prototypes
(function declarations – see below), which tell the
compiler information about a function (return type and
parameters) that it needs to be able to do error checking
while compiling the source code.
The header file does not contain the code for any of the
functions…only the prototype.
The file accessed when #include <stdio.h> is used
is /usr/include/stdio.h; similarly, #include <stdlib.h>
references the file /usr/include/stdlib.h.
The preprocessor also removes comments from the
source file.
Comments must be enclosed between /* and */
◦ [Note: no single line comments with // are permitted in ANSI C].
◦ Also, the preprocessor for ANSI C is not written to find nested
comments, so they are disallowed. /* comment1 */ /*comment2*/
When you write comments in code on
exams/homework/etc, correct format is expected.
Another thing the preprocessor does is to replace macros,
which are fragments of code, defined in the source file, or in a
header file, with the code they are defined to represent
◦ Keep in mind that code in a source file is just text, so macros are just
chunks of text.
These macros, or code fragments, are defined with the define
preprocessor directive, as follows:
#define string1 string2
◦ string1 cannot contain any white space characters (spaces, tabs, new
lines), but string2 can.
An example of this is in Version 3 of the Hello program,
where you see EXIT_SUCCESS in the return statement at the
end of the function. This macro is defined in the stdlib.h file,
and that is why it has been included, using a preprocessor
directive, in this version of the program.
.c file:
#include <stdio.h>
#include "local.h"
/* comment */
main(){
.
.
return(EXIT_SUCCESS);
}
.i file:
Contents of stdio.h
Contents of local.h
main(){
.
.
return(0);
}
In C, as in Java, statements must generally end with a
semi-colon.
Remember, though, that preprocessor directives are
NOT statements, and are not terminated with a semi-
colon!
C is also case sensitive, so if we have two variables,
named num and Num in a C program, the compiler will
treat them as two distinct variables.
C programs consist of zero or more:
◦ preprocessor directives,
◦ declarations and
◦ definitions (see explanation on a future slide) of:
One or more functions (which may contain variable
declarations or definitions), and
Zero or more variables declared or defined outside of any
function
Notice that all three versions of the Hello program have a
main() function. Every C program must have exactly one
main function, and program execution always begins in this
function. main() does not necessarily have to be the first
function defined in the program file.
Also, notice that, although Java also has a main method, it is
in a class, but C has no classes!
In C, technically, we have no methods, only functions. Please
pay attention to this distinction. It makes you look bad to
other computer science professionals to talk about a “method”
in a C program (remember the correct programming model
and terminology for the language being discussed)! One of
the fastest ways to lose the respect of your work-life peers is
to use the wrong terminology. It tells them you don’t know
what you are doing.
A number of statements grouped into a single logical unit is called a
function.
REMEMBER: It is required to have a single function ‘main’ in
every C program
A function prototype is a function declaration or definition which
includes:
◦ Information about the number of arguments
◦ Information about the types of arguments
◦ What type of value the function returns
◦ NO CODE!
◦ Example: float add_floats(float a, float b);
note that there is a ‘;’ at the end rather than {}
Although you are allowed not to specify any information about a
function’s arguments in a declaration, it is purely because of
backwards compatibility with Old C and should be avoided (poor
coding style).
◦ A declaration without any information about the arguments is not a
prototype.
C passes arguments to functions by value
◦ For example, int mult_values (int a, int b);
◦ This ensures that the values passed to the function can not be
changed in the calling function by the called function.
But what if that’s what we would like to have happen?
◦ That’s when pointers get involved and we use what is called
pass by reference. This is somewhat of a misnomer since we
are still passing a value to the function; it’s just that the value
we’re passing to the function is an address to some other
variable.
◦ We’ll look at this when we look at pointers in a week or so.
Declaration (type information only, no value(s)):
◦ Variable
it tells the compiler the type of a variable, but not its value
can only be declared once in a given block in a C program;
◦ Function
it tells the compiler the return type, and the number and types of its parameters
(parameter names are optional), this is called a prototype.
can be declared multiple times in a C program, if all the declarations are consistent
(that is, identical with respect to types/# parameters).
Definition (type information and value):
◦ Variable
it tells the compiler the type of the variable and the initial value.
◦ Function
it tells the compiler the return type, parameter types, parameter names and the code
that should be executed (i.e., the statements) when the function is called.
◦ A given variable or function can only be defined once in a C program.
◦ Note that a definition is also a declaration, since it contains type information.
In C:
◦ A variable must be declared (but not necessarily defined)
before it can be referenced in a non-declarative statement.
◦ Examples: int a; (declaration) and int a=7; (definition)
◦ a=7; (reference in a non-declarative statement)
◦ A function must be declared (but not necessarily defined)
before it can be referenced in a non-declarative statement
(that is, before it can be called or invoked) . We declare a
function with a prototype statement.
◦ Examples: int functionA(int a); (declaration) and int functionA(int a){return(a+1);} (definition)
/* Alternative 1: prototype before main(), definition after it */
int funcA(int a);       /* Prototype (i.e., declared) */
main(){
    int c=3;
    c=funcA(c);
}
int funcA(int a){       /* Function (i.e., defined, because it contains code) */
    return(a+1);
}
/* Alternative 2: definition before main() */
int funcA(int a){
    return(a+1);
}
main(){
    int c=3;
    c=funcA(c);
}
Note that the 2nd alternative doesn’t need a prototype because of the location of the definition.
Functions consist of one or more blocks (blocks can legally be empty).
A block in ANSI C has this form:
{ /* left curly brace */
Zero or more variable declarations/definitions
Zero or more non-declarative statements (i.e., statements that DO something)
} /* right curly brace */
IMPORTANT: In ANSI C, all variable declarations in the block must precede the first non-
declarative statement (i.e., no just-in-time declaration in ANSI C).
Yes, I know just-in-time declarations are more efficient.
Nested blocks are valid in C, but nested functions are not valid (that is, the compiler will
generate errors, and will not produce an executable, if your source code contains one or more
nested functions).
◦ If you tried to “nest” functions, then you would be defining one function within a block within another
function.
◦ Other languages (e.g., ALGOL, MATLAB, C# (after 7.0), Scala) do support nested functions
In C, all functions have file scope. This means that any function declared in a file can be called
from anywhere in the same file, after the point at which the function is declared.
In C, variables can have either file scope or block scope depending upon where they are
declared. Block scope means that the variable is only accessible by the variable name within the
block where it was declared.
◦ The caveat “by the variable name” is important. We will revisit this when we cover pointers.
#include <stdio.h>
#include <stdlib.h>
/* The variable function_level can be used anywhere within this file from
   this point down. Its initial value is 0. */
int function_level=0;
/* Function prototypes for functionA() and functionB(). They can be used
   anywhere within this file from this point down. */
int functionA(int a, float b);
int functionB(int a, float b);
main(){
.
.
function_level = 'M';
variable2 = functionA(variable1, float_val1);
variable3 = functionB(variable1, float_val1);
}
/* Function prototype for functionC(). It can be used anywhere within this
   file from this point down (i.e., main() can't call functionC(), but
   functionA() or functionB() or functionC() could call it). */
float functionC(int a, float b);
/* The variable new_float can be used anywhere within this file from this
   point down (i.e., main() can't reference new_float, but functionA() or
   functionB() or functionC() can). */
float new_float;
int functionA(int a, float b){
/* code within not germane */
}
int functionB(int a, float b){
.
.
function_level='B';
new_float=functionC(a,b);
}
/* The return type must match the prototype above (float). */
float functionC(int a, float b){
/* code within not germane */
}
#include <stdio.h>
#include <stdlib.h>
int main() {
    int total = 0; /* variable declaration and definition */
    int i; /* variable declaration */
    int values[4] = {12, 14, 18, 20}; /* array variable
                                         declaration and definition */
    for (i = 0; i < 4; i++) { /* nested block */
        total = total + values[i];
    }
    printf("The total is: %i\n", total);
    return (EXIT_SUCCESS);
}
Only file scope identifier here is main()
Required Reading:
Computer Systems: A Programmer’s Perspective, 3rd Edition,
Chapter 1 thru Section 1.3
Pointers on C,
Chapter 5 thru Section 5.1.3, 5.3 through the end of the chapter
• Integer Types
• char – smallest addressable unit, *always* 1 byte (at least 8 bits, and exactly 8 bits on
virtually all current systems); each byte has its own address. This data type IS NOT just ASCII characters.
• short – not used as much; typically, is 16 bits (2 bytes)
• int – default type for an integer constant value; typically, is 32 bits (4 bytes)
• long – do you really need it?; at least 32 bits; typically, is 64 bits (8 bytes) on 64-bit Linux
• long long – at least 64 bits, sometimes 128 bits, (only supported in C99 and after)
• Floating Point Types – (these are usually “inexact”, we’ll see why later)
• float – single precision (about 6 decimal digits of precision), (4 bytes)
• double – double precision (about 15 decimal digits of precision) (8 bytes)
• long double – extended precision (part of ANSI C); size and precision vary by implementation
(e.g., 16 bytes of storage and about 18 decimal digits with gcc on x86-64 Linux)
• A floating-point constant has type double by default; for a 4-byte float value use the ‘f’ suffix
• Note that variables of type char are guaranteed to always be one byte. All others can
differ depending upon the processor being used.
• There is no fixed or maximum size for a type in C (except for char; otherwise, size depends on
implementation), but the following relationships must hold:
• sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)<=sizeof(long long)
• sizeof(float) <= sizeof(double) <= sizeof(long double)
• True size of data types is dependent upon the size of the processor being used.
• Beside the basic types, there is a conceptually infinite
class of derived types constructed from the
fundamental types in the following ways:
• arrays of objects (variables or derived types) of a given type;
• pointers to objects of a given type;
• structures containing a sequence of objects (variables or derived types)
of various types;
• unions capable of containing any one of several objects of various
types.
• In general, these methods of constructing data types
(variables or derived types) can be applied recursively
• An array of pointers to some type
• An array of characters (i.e., a string)
• Structures that contain pointers
• And so on.
• Special characters
• Not convenient to type on a keyboard
• Use single quotes (e.g., ‘\n’)
• Looks like two characters but is really only one
\a alert (bell) character \\ backslash
\b backspace \? question mark
\f formfeed \’ single quote
\n newline \" double quote
\r carriage return \ooo octal number
\t horizontal tab \xhh hexadecimal number
\v vertical tab
char ‘A’, ‘b’, 0x42 (hexadecimal), 127, -7
unsigned char ‘©’, 255u, 127
int 123, -1, 2147483647, 040 (octal), 0xab (hexadecimal)
unsigned int 123u, 2107433648, 040U (octal), 0xc2 (hexadecimal)
long 123L, 0x1FFFl (hexadecimal)
unsigned long 123ul, 0777UL (octal)
float 1.23F, 3.14e+0f
double 1.23, 2.718281828
long double 1.23L, 9.99E-9L
• 2 ways (both legal)
• Put the const keyword after the type keyword, or before the type keyword
• Note: The compiler treats these as variables to which any assignment is invalid.
• This means the declared const must be initialized with its (constant) value as part of
the declaration, because the compiler will not allow statements which make
assignments to it later! Treated as a read-only variable.
• For program readability, pick one of the two ways and use it exclusively. Be
consistent!
• Examples:
float const PI = 3.141593f; (f is used after value because double is default)
const float PI = 3.141593f;
• Convention is to use uppercase for declared constants, also for those with #define (see
below).
• Symbolic constants (with the #define directive - below) can be used anywhere a literal
constant can be used, but constants defined with the const keyword can only be used where
variables can be used. More on this later (with examples).
• We will say more about constants as function parameters, pointers to constants, and
constant pointers later.
• A name that substitutes for a value that cannot be changed
• Can be used to define:
• Constant
• Statement
• Mathematical expression
• Uses a preprocessor directive: #define
•
•
• for example, (3.1415927 * r * r)
• REMINDER: No semi-colon is used for preprocessor directives.
• Coding convention is to use all capital letters for the name:
• #define AREA(r) (3.141593 * r * r)
• #define AREA (PI*r*r)
• What might happen if parentheses aren’t included in these define statements?
• What if these statements used addition/subtraction rather than multiplication?
• Can be used any place you would use the actual value
• All occurrences are replaced by the preprocessor before the program is compiled by the
compiler.
• Examples:
• The use of EXIT_SUCCESS in hello.c code
• #define PI 3.141593
• #define TRUE 1
• Purpose: define a variable (can also be a constant) before it is used.
• Format: type identifier (, identifier) ; Note: the parentheses here indicate
any number of identifiers, each preceded by a comma
• Initial value: can be assigned, but is not required (unless it is a constant)
• int i, j = 5, k;
• char code, category;
• int i = 123;
• const float PI = 3.1415926535f;
• double const PI = 3.1415926535;
• Type conversion: aka, type casting
• Directing the compiler to use a variable as a different type than the one
used in the declaration.
• Casting “larger” types to “smaller” types is dangerous (truncation
occurs) and should be done with extreme caution!!!
• To cast a variable to a different type explicitly, use: (type) identifier
• int i = 65;
• char ch; /* range -128 to 127 */
• ch = (char) i; /* What is the value of ch? */
• What happens if we change the initial value of i to 165?
• Identifier Naming Rules: names for variables, constants, types and functions.
• Can use a-z, A-Z, 0-9, and _ (i.e., letters, digits, and underscore)
• No other characters can be used
• Case sensitive
• The first character must be a letter or _ (Usually don’t use a leading _ , though,
because such names are reserved for the compiler, the libraries, and the operating system).
• Keywords are reserved words, and may not be used as identifiers
• (See the following slide for C keywords)
• No guarantee that any value past the 31st character will be recognized. (i.e.,
will let you use more characters, but no guarantee that it will parse it.)
• Identifier Naming Style (the grader will enforce these)
• Separate words with ‘_’ (this is the original style in C) OR capitalize the first
character of each word after the first (e.g., char_count or charCount)
• Use all UPPERCASE for symbolic constants or macro (code chunk)
definitions.
• Be consistent. Be consistent. Be consistent.
• Be meaningful: Write “self-documenting code”; i.e., identifiers should give a
clear idea of what a variable, constant, type or function is being used for.
• Sample Identifiers
• i0, j1, student_name, studentName, student_score, studentScore…
[Figure: character positions 1 through 32 of an identifier. Positions 1 through 31 hold
US_Social_Security_Identificati and position 32 holds the terminating \0.]
char US_Social_Security_Identification_LastName[50];
char US_Social_Security_Identification_FirstName[50];
Declares 2 50-character arrays
Symbol table for identifier names might only be 32 long
(strings must be null-terminated in C)
Both names share the same first 31 characters, so a compiler that keeps only 31
significant characters may not be able to tell them apart.
• Purpose: reserves a word or identifier to have a particular meaning
• The meanings of keywords (and, indeed, the meaning of the notion of
keyword) differ widely from language to language.
• You shouldn’t use them for any other purpose in a C program. They are
allowed, of course, within double quotation marks (as part of a string to
be assigned or printed, for example; this is not using an identifier,
actually).
This chart will be supplied to you as a reference for the midterm exam