1 A Quick Start
1.1 Introduction
It is always difficult to start describing a programming language because little details do not make much sense until one knows enough to understand the big picture. In this chapter, I try to give you a glimpse of the big picture by looking at a sample program and explaining its workings line by line. This sample program also shows you how familiar procedures are accomplished in C. This information plus the other topics discussed in the chapter introduce you to the basics of the C language so that you can begin writing useful programs.
The program we dissect reads text from the standard input, modifies it, and writes it to the standard output. Program 1.1 first reads a list of column numbers. These numbers are pairs and indicate ranges of columns in the input line. The list is terminated with a negative number. The remaining input lines are read and printed, then the selected columns from the input lines are extracted and printed. Note that the first column in a line is number zero. For example, if the input is
4 9 12 20 -1
abcdefghijklmnopqrstuvwxyz
Hello there, how are you?
I am fine, thanks.
See you!
Bye
then the program would produce:
Original input : abcdefghijklmnopqrstuvwxyz
Rearranged line: efghijmnopqrstu
Original input : Hello there, how are you?
Rearranged line: o ther how are
Original input : I am fine, thanks.
Rearranged line:  fine,hanks.
Original input : See you!
Rearranged line: you!
Original input : Bye
Rearranged line:
The important point about this program is that it illustrates most of the basic techniques you need to know to begin writing C programs.
/*
** This program reads input lines from standard input and prints
** each input line, followed by just some portions of the line, to
** the standard output.
**
** The first input is a list of column numbers, which ends with a
** negative number. The column numbers are paired and specify
** ranges of columns from the input line that are to be printed.
** For example, 0 3 10 12 -1 indicates that only columns 0 through 3
** and columns 10 through 12 will be printed.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_COLS 20 /* max # of columns to process */
#define MAX_INPUT 1000 /* max len of input & output lines */

int read_column_numbers( int columns[], int max );
void rearrange( char *output, char const *input,
    int n_columns, int const columns[] );

int
main( void )
{
    int n_columns; /* # of columns to process */
    int columns[MAX_COLS]; /* the columns to process */
    char input[MAX_INPUT]; /* array for input line */
    char output[MAX_INPUT]; /* array for output line */

    /*
    ** Read the list of column numbers
    */
    n_columns = read_column_numbers( columns, MAX_COLS );

    /*
    ** Read, process and print the remaining lines of input
    */
    while( gets( input ) != NULL ){
        printf( "Original input : %s\n", input );
        rearrange( output, input, n_columns, columns );
        printf( "Rearranged line: %s\n", output );
    }

    return EXIT_SUCCESS;
}

/*
** Read the list of column numbers, ignoring any beyond the specified
** maximum.
*/
int
read_column_numbers( int columns[], int max )
{
    int num = 0;
    int ch;

    /*
    ** Get the numbers, stopping at eof or when a number is < 0.
    */
    while( num < max && scanf( "%d", &columns[num] ) == 1
        && columns[num] >= 0 )
        num += 1;

    /*
    ** Make sure we have an even number of inputs, as they are
    ** supposed to be paired.
    */
    if( num % 2 != 0 ){
        puts( "Last column number is not paired." );
        exit( EXIT_FAILURE );
    }

    /*
    ** Discard the rest of the line that contained the final
    ** number.
    */
    while( (ch = getchar()) != EOF && ch != '\n' )
        ;

    return num;
}

/*
** Process a line of input by concatenating the characters from
** the indicated columns. The output line is then NUL terminated.
*/
void
rearrange( char *output, char const *input,
    int n_columns, int const columns[] )
{
    int col; /* subscript for columns array */
    int output_col; /* output column counter */
    int len; /* length of input line */

    len = strlen( input );
    output_col = 0;

    /*
    ** Process each pair of column numbers.
    */
    for( col = 0; col < n_columns; col += 2 ){
        int nchars = columns[col + 1] - columns[col] + 1;

        /*
        ** If the input line isn't this long or the output
        ** array is full, we're done
        */
        if( columns[col] >= len ||
            output_col == MAX_INPUT - 1 )
            break;

        /*
        ** If there isn't room in the output array, only copy
        ** what will fit.
        */
        if( output_col + nchars > MAX_INPUT - 1 )
            nchars = MAX_INPUT - output_col - 1;

        /*
        ** Copy the relevant data.
        */
        strncpy( output + output_col, input + columns[col],
            nchars );
        output_col += nchars;
    }

    output[output_col] = '\0';
}
Program 1.1 Rearrange characters rearrang.c
1.1.1 Spacing and Comments
Now, let's take a closer look at this program. The first point to notice is the spacing of the program: the blank lines that separate different parts from one another, the use of tabs to indent statements to display the program structure, and so forth. C is a free form language, so there are no rules as to how you must write statements. However, a little discipline when writing the program pays off later by making it easier to read and modify. More on this issue in a bit.
While it is important to display the structure of the program clearly, it is even more important to tell the reader what the program does and how it works. Comments fulfill this role.
/*
** This program reads input lines from the standard input and prints
** each input line, followed by just some portions of the line, to
** the standard output.
**
** The first input is a list of column numbers, which ends with a
** negative number. The column numbers are paired and specify
** ranges of columns from the input line that are to be printed.
** For example, 0 3 10 12 -1 indicates that only columns 0 through 3
** and columns 10 through 12 will be printed.
*/
This block of text is a comment. Comments begin with the /* characters and end with the */ characters. They may appear anywhere in a C program in which white space may appear. However, comments cannot contain other comments; that is, the first */ terminates the comment no matter how many /* s have appeared earlier.
Comments are sometimes used in other languages to comment out code, thus removing the code from the program without physically deleting it from the source file. This practice is a bad idea in C, because it won't work if the code you're trying to get rid of has any comments in it. A better way to logically delete code in a C program is the #if directive. When used like this:
#if 0
statements
#endif
the program statements between the #if and the #endif are effectively removed from the program. Comments contained in the statements have no effect on this construct, thus it is a much safer way to accomplish the objective. There is much more that you can do with this directive, which I explain fully in Chapter 14.
1.1.2 Preprocessor Directives
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_COLS 20 /* max # of columns to process */
#define MAX_INPUT 1000 /* max len of input & output lines */
These five lines are called preprocessor directives, or just directives, because they are interpreted by the preprocessor. The preprocessor reads the source code, modifies it as indicated by any preprocessor directives, and then passes the modified code to the compiler.
In our sample program, the preprocessor replaces the first #include statement with the contents of the library header named stdio.h; the result is the same as if the contents of stdio.h had been written verbatim at this point in the source file. The second and third directives do the same with stdlib.h and string.h.
The stdio.h header gives us access to functions from the Standard I/O Library, a collection of functions that perform input and output; stdlib.h defines the EXIT_SUCCESS and EXIT_FAILURE symbols. We need string.h to use the string manipulation functions.
This technique is also a handy way to manage your declarations if they are needed in several different source files: you write the declarations in a separate file and then use #include to read them into each relevant source file. Thus there is only one copy of the declarations; they are not duplicated in many different places, which would be more error prone to maintain.
The other directive is #define, which defines the name MAX_COLS to be the value 20, and MAX_INPUT to be the value 1000. Wherever either name appears later in the source file, it is replaced by the appropriate value. Because they are defined as literal constants, these names cannot be used in some places where ordinary variables can be used (for example, on the left side of an assignment). Making their names uppercase serves as a reminder that they are not ordinary variables. #define directives are used for the same kinds of things as symbolic constants in other languages and for the same reasons. If we later decide that 20 columns are not enough, we can simply change the definition of MAX_COLS. There is no need to hunt through the program looking for 20's to change and possibly missing one or changing a 20 that had nothing to do with the maximum number of columns.
int read_column_numbers( int columns[], int max );
void rearrange( char *output, char const *input,
int n_columns, int const columns[] );
These declarations are called function prototypes. They tell the compiler about the characteristics of functions that are defined later in the source file. The compiler can then check calls to these functions for accuracy. Each prototype begins with a type name that describes the value that is returned. The type name is followed by the name of the function. The arguments expected by the function are next, so read_column_numbers returns an integer and takes two arguments, an array of integers and an integer scalar. The argument names are not required; I give them here to serve as a reminder of what each argument is supposed to be.
The rearrange function takes four arguments. The first and second are pointers. A pointer specifies where a value resides in the computer's memory, much like a house number specifies where a particular family resides on a street. Pointers are what give the C language its power and are covered in great detail starting in Chapter 6. The second and fourth arguments are declared const, which means that the function promises not to modify the caller's arguments. The keyword void indicates that the function does not return any value at all; such a function would be called a procedure in other languages.
If the source code for this program were contained in several source files, function prototypes would have to be written in each file using the function. Putting the prototypes in header files and using a #include to access them avoids the maintenance problem caused by having multiple copies of the same declarations.
1.1.3 The Main Function
int
main( void )
{
These lines begin the definition of a function called main. Every C program must have a main function, because this is where execution begins. The keyword int indicates that the function returns an integer value; the keyword void indicates that it expects no arguments. The body of the function includes everything between this opening brace and its matching closing brace.
Observe how the indentation clearly shows what is included in the function.
    int n_columns; /* # of columns to process */
    int columns[MAX_COLS]; /* the columns to process */
    char input[MAX_INPUT]; /* array for input line */
    char output[MAX_INPUT]; /* array for output line */
These lines declare four variables: an integer scalar, an array of integers, and two arrays of characters. All four of these variables are local to the main function, so they cannot be accessed by name from any other functions. They can, of course, be passed as arguments to other functions.
/*
** Read the list of column numbers
*/
n_columns = read_column_numbers( columns, MAX_COLS );
This statement calls the function read_column_numbers. The array columns and the constant represented by MAX_COLS (20) are passed as arguments. In C, array arguments behave as though they are passed by reference, and scalar variables and constants are passed by value (like var parameters and value parameters, respectively, in Pascal or Modula). Thus, any changes made by a function to a scalar argument are lost when the function returns; the function cannot change the value of the calling program's argument in this manner. When a function changes the value of an element of an array argument, however, the array in the calling program is actually modified.
The rule about how parameters are passed to C functions actually states:
All arguments to functions are passed by value.
Nevertheless, an array name as an argument produces the call by reference behavior described above. The reason for this apparent contradiction between the rule and the actual behavior is explained in Chapter 8.
    /*
    ** Read, process and print the remaining lines of input
    */
    while( gets( input ) != NULL ){
        printf( "Original input : %s\n", input );
        rearrange( output, input, n_columns, columns );
        printf( "Rearranged line: %s\n", output );
    }
    return EXIT_SUCCESS;
}
The comment describing this piece of code might seem unnecessary. However, the major expense of software today is not writing it but maintaining it. The first problem in modifying a piece of code is figuring out what it does, so anything you can put in your code that makes it easier for someone (perhaps you!) to understand it later is worth doing. Be sure to write accurate comments when you change the code. Inaccurate comments are worse than none at all!
This piece of code consists of a while loop. In C, while loops operate the same as they do in other languages. The expression is tested. If it is false, the body of the loop is skipped. If the expression is true, the body of the loop is executed and the whole process begins again.
This loop represents the main logic of the program. In brief, it means:
while we were able to read another line of input
    print the input
    rearrange the input, storing it in output
    print the output
The gets function reads one line of text from the standard input and stores it in the array passed as an argument. A line is a sequence of characters terminated by a newline character; gets discards the newline and stores a NUL byte at the end of the line.[1] (A NUL byte is one whose bits are all 0, written as a character constant like this: '\0'.) gets then returns a value that is not NULL to indicate that a line was successfully read.[2] When gets is called but there is no more input, it returns NULL to indicate that it has reached the end of the input (end of file).
[1] NUL is the name given in the ASCII character set to the character '\0', whose bits are all zero. NULL refers to a pointer whose value is zero. Both are integers and have the same value, so they could be used interchangeably. However, it is worth using the appropriate constant because this tells a person reading the program not only that you are using the value zero, but what you are using it for.
[2] The symbol NULL is defined in the header stdio.h. On the other hand, there is no predefined symbol NUL, so if you wish to use it instead of the character constant '\0' you must define it yourself.
Dealing with character strings is a common task in C programs. Although there is no string data type, there is a convention for character strings that is observed throughout the language: a string is a sequence of characters terminated by a NUL byte. The NUL is considered a terminator and is not counted as a part of the string. A string literal is a sequence of characters enclosed in quotation marks in the source program.[3] For example, the string literal
"Hello"
occupies six bytes in memory, which contain (in order) H, e, l, l, o, and NUL.
[3] This symbol is a quotation mark: ", and this symbol is an apostrophe: '. The penchant of computer people to call them single quote and double quote when their existing names are perfectly good seems unnecessary, so I will use their everyday names.
The printf function performs formatted output. Modula and Pascal users will be delighted with the simplicity of formatted output in C. printf takes multiple arguments; the first is a character string that describes the format of the output, and the rest are the values to be printed. The format is often given as a string literal.
The format string contains format designators interspersed with ordinary characters. The ordinary characters are printed verbatim, but each format designator causes the next argument value to be printed using the indicated format. A few of the more useful format designators are given in Table 1.1.
Format  Meaning
%d      Print an integer value in decimal.
%o      Print an integer value in octal.
%x      Print an integer value in hexadecimal.
%g      Print a floating point value.
%c      Print a character.
%s      Print a character string.
\n      Print a newline.
Table 1.1 Common printf format codes
If the array input contains the string Hi friends!, then the statement
printf( "Original input : %s\n", input );
will produce
Original input : Hi friends!
terminated with a newline.
The next statement in the sample program calls the rearrange function. The last three arguments are values that are passed to the function, and the first is the answer that the function will construct and pass back to the main function. Remember that it is only possible to pass the answer back through this argument because it is an array. The last call to printf displays the result of rearranging the line.
Finally, when the loop has completed, the main function returns the value EXIT_SUCCESS. This value indicates to the operating system that the program was successful. The closing brace marks the end of the body of the main function.
1.1.4 The read_column_numbers Function
/*
** Read the list of column numbers, ignoring any beyond the specified
** maximum.
*/
int
read_column_numbers( int columns[], int max )
{
These lines begin the definition of the read_column_numbers function. Note that this declaration and the function prototype that appeared earlier in the program match in the number and types of arguments and in the type returned by the function. It is an error if they disagree.
There is no indication of the array size in the array parameter declaration to the function. This format is correct, because the function will get whatever size array the calling program passed as an argument. This is a great feature, as it allows a single function to manipulate one-dimensional arrays of any size. The downside of this feature is that there is no way for the function to determine the size of the array. If this information is needed, the value must be passed as a separate argument.
When the read_column_numbers function is called, the name of one of the arguments that is passed happens to match the name of the formal parameter given above. However, the name of the other argument does not match its corresponding parameter. As in most other languages, the formal parameter name and the actual argument name have no relationship to one another; you can make them the same if you wish, but it is not required.
int num = 0;
int ch;
Two variables are declared; they will be local to this function. The first one is initialized to zero in the declaration, but the second one is not initialized. More precisely, its initial value will be some unpredictable value, which is probably garbage. The lack of an initial value is not a problem in this function because the first thing done with the variable is to assign it a value.
    /*
    ** Get the numbers, stopping at eof or when a number is < 0.
    */
    while( num < max && scanf( "%d", &columns[num] ) == 1
        && columns[num] >= 0 )
        num += 1;
This second loop reads in the column numbers. The scanf function reads characters from the standard input and converts them according to a format string, sort of the reverse of what printf does. scanf takes several arguments, the first of which is a format string that describes the type of input that is expected. The remaining arguments are variables into which the input is stored. The value returned by scanf is the number of values that were successfully converted and stored into the arguments.
You must be careful with this function for two reasons. First, because of the way scanf is implemented, all of its scalar arguments must have an ampersand in front of them. For reasons that I make clear in Chapter 8, array arguments do not require an ampersand.[4] However, if a subscript is used to identify a specific array element, then an ampersand is required. I explain the need for the ampersands on the scalar arguments in Chapter 15. For now, just be sure to put them in, because the program will surely fail without them.
[4] There is no harm in putting an ampersand in front of an array name here, however, so you may use one if you wish.
The second pitfall is the format codes, which are not identical to those in printf but similar enough to be confusing. Table 1.2 informally describes a few of the format designators that you may use with scanf. Note that the first five values are scalars, so the variable given as the argument must be preceded with an ampersand. With all of these format codes (except %c), white space (spaces, tabs, newlines, etc.) in the input is skipped until the value is encountered, and subsequent white space terminates the value. Therefore, a character string read with %s cannot contain white space. There are many other format designators, but these will be enough for our current needs.
We can now explain the expression
scanf( "%d", &columns[num] )
The format code %d indicates that an integer value is desired. Characters are read from the standard input; any leading white space found is skipped. Then digits are converted into an integer, and the result is stored in the specified array element. An ampersand is required in front of the argument because the subscript selects a single array element, which is a scalar.
The test in the while loop consists of three parts.
num < max
makes sure that we do not get too many numbers and overflow the array. scanf returns the value one if it converted an integer. Finally,
columns[num] >= 0
checks that the value entered was not negative. If any of these tests is false, the loop stops.
Format  Meaning                                   Type of Variable
%d      Read an integer value.                    int
%ld     Read a long integer value.                long
%f      Read a real value.                        float
%lf     Read a double precision real value.       double
%c      Read a character.                         char
%s      Read a character string from the input.   array of char
Table 1.2 Common scanf format codes
The Standard does not require that C compilers check the validity of array subscripts, and the vast majority of compilers don't. Thus, if you need subscript validity checking, you must write it yourself. If the test for num < max were not here and the program read a file containing more than 20 column numbers, the excess values would be stored in the memory locations that follow the array, thus destroying whatever data was formerly in those locations, which might be other variables or the function's return address. There are other possibilities too, but the result is that the program will probably not perform as you had intended.
The && is the logical and operator. For this expression to be true, the expressions on both sides of the && must evaluate to true. However, if the left side is false, the right side is not evaluated at all, because the result can only be false. In this case, if we find that num has reached the maximum value, the loop breaks and the expression
columns[num]
is never evaluated.[5]
Be careful not to use the & operator when you really want &&; the former does a bitwise AND, which sometimes gives the same result that && would give but in other cases does not. I describe these operators in Chapter 5.
Each call to scanf reads a decimal integer from the standard input. If the conversion fails, either because end of file was reached or because the next input characters were not valid input for an integer, a value other than 1 is returned (EOF at end of file, 0 for invalid input), which breaks the loop. If the characters are legal input for an integer, the value is converted to binary and stored in the array element columns[num]. scanf then returns the value 1.
Beware: The operator that tests two expressions for equality is ==. Using the = operator instead results in a legal expression that almost certainly will not do what you want it to do: it does an assignment rather than a comparison! It is a legal expression, though, so the compiler won't catch this error for you.[6] Be extremely careful to use the double equal sign operator for comparisons. If your program is not working, check all of your comparisons for this error. Believe me, you will make this mistake, probably more than once, as I have.
[5] The phrase "the loop breaks" means that it terminates, not that it has suddenly become defective. This phrase comes from the break statement, which is discussed in Chapter 4.
[6] Some newer compilers will print a warning about assignments in if and while statements on the theory that it is much more likely that you wanted a comparison than an assignment in this context.
The next && makes sure that the number is tested for a negative value only if scanf was successful in reading it. The statement
num += 1;
adds 1 to the variable num. It is equivalent to the statement
num = num + 1;
I discuss later why C provides two different ways to increment a variable.[7]
[7] With the prefix and postfix ++ operators, there are actually four ways to increment a variable.
    /*
    ** Make sure we have an even number of inputs, as they are
    ** supposed to be paired.
    */
    if( num % 2 != 0 ){
        puts( "Last column number is not paired." );
        exit( EXIT_FAILURE );
    }
This test checks that an even number of integers were entered, which is required because the numbers are supposed to be in pairs. The % operator performs an integer division, but it gives the remainder rather than the quotient. If num is not an even number, the remainder of dividing it by two will be nonzero.
The puts function is the output version of gets; it writes the specified string to the standard output and appends a newline character to it. The program then calls the exit function, which terminates its execution. The value EXIT_FAILURE is passed back to the operating system to indicate that something was wrong.
    /*
    ** Discard the rest of the line that contained the final
    ** number.
    */
    while( (ch = getchar()) != EOF && ch != '\n' )
        ;
scanf only reads as far as it has to when converting input values. Therefore, the remainder of the line that contained the last value will still be out there, waiting to be read. It may contain just the terminating newline, or it may contain other characters too. Regardless, this while loop reads and discards the remaining characters to prevent them from being interpreted as the first line of data.
The expression
(ch = getchar()) != EOF && ch != '\n'
merits some discussion. First, the function getchar reads a single character from the standard input and returns its value. If there are no more characters in the input, the constant EOF (which is defined in stdio.h) is returned instead to signal end of file.
The value returned by getchar is assigned to the variable ch, which is then compared to EOF. The parentheses enclosing the assignment ensure that it is done before the comparison. If ch is equal to EOF, the expression is false and the loop stops. Otherwise, ch is compared to a newline; again, the loop stops if they are found to be equal. Thus, the expression is true (causing the loop to run again) only if end of file was not reached and the character read was not a newline. Thus, the loop discards the remaining characters on the current input line.
Now let’s move on to the interesting part. In most other languages, we would have written the loop like this:
ch = getchar();
while( ch != EOF && ch != '\n' )
    ch = getchar();
Get a character, then if we've not yet reached end of file or gotten a newline, get another character. Note that there are two copies of the statement
ch = getchar();
The ability to embed the assignment in the while statement allows the C programmer to eliminate this redundant statement.
The loop in the sample program has the same functionality as the one shown above, but it contains one fewer statement. It is admittedly harder to read, and one could make a convincing argument that this coding technique should be avoided for just that reason. However, most of the difficulty in reading is due to inexperience with the language and its idioms; experienced C programmers have no trouble reading (and writing) statements such as this one. You should avoid making code harder to read when there is no tangible benefit to be gained from it, but the maintenance advantage in not having multiple copies of code more than justifies this common coding idiom.
A question frequently asked is why ch is declared as an integer when we are using it to read characters? The answer is that EOF is an integer value that requires more bits than are available in a character variable; this fact prevents a character in the input from accidentally being interpreted as EOF. But it also means that ch, which is receiving the characters, must be large enough to hold EOF too, which is why an integer is used. As discussed in Chapter 3, characters are just tiny integers anyway, so using an integer variable to hold character values causes no problems.
One final comment on this fragment of the program: there are no statements in the body of the while statement. It turns out that the work done to evaluate the while expression is all that is needed, so there is nothing left for the body of the loop to do. You will encounter such loops occasionally, and handling them is no problem. The solitary semicolon after the while statement is called the empty statement, and it is used in situations like this one where the syntax requires a statement but there is no work to be done. The semicolon is on a line by itself in order to prevent the reader from mistakenly assuming that the next statement is the body of the loop.
    return num;
}
The return statement is how a function returns a value to the expression from which it was called. In this case, the value of the variable num is returned to the calling program, where it is assigned to the main program's variable n_columns.
1.1.5 The rearrange Function
/*
** Process a line of input by concatenating the characters from
** the indicated columns. The output line is then NUL terminated.
*/
void
rearrange( char *output, char const *input,
    int n_columns, int const columns[] )
{
    int col; /* subscript for columns array */
    int output_col; /* output column counter */
    int len; /* length of input line */
These statements define the rearrange function and declare some local variables for it. The most interesting point here is that the first two parameters are declared as pointers but array names are passed as arguments when the function is called. When an array name is used as an argument, what is passed to the function is a pointer to the beginning of the array, which is actually the address where the array resides in memory. The fact that a pointer is passed rather than a copy of the array is what gives arrays their call by reference semantics. The function can manipulate the argument as a pointer, or it can use a subscript with the argument just as with an array name. These techniques are described in more detail in Chapter 8.
Because of the call by reference semantics, though, if the function modifies elements of the parameter array, it actually modifies the corresponding elements of the argument array. Thus, declaring columns to be const is useful in two ways. First, it states that the intention of the function's author is that this parameter is not to be modified. Second, it causes the compiler to verify that this intention is not violated. Thus, callers of this function need not worry about the possibility of elements of the array passed as the fourth argument being changed.
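The following sketch illustrates the two cases just described. The functions sum and fill are hypothetical and do not appear in the sample program; they only show that a const array parameter is read-only inside the function, whereas writes through a non-const parameter change the caller's array.

```c
/*
** Hypothetical helpers illustrating array arguments.
*/

/* values is const: the compiler rejects any attempt to modify its elements. */
int
sum( int const values[], int n )
{
    int total = 0;
    int i;

    for( i = 0; i < n; i += 1 )
        total += values[i];
    return total;
}

/* No const here, so writes through values change the caller's array. */
void
fill( int *values, int n, int new_value )
{
    int i;

    for( i = 0; i < n; i += 1 )
        values[i] = new_value;
}
```

If data is an array of three integers, the call fill( data, 3, 7 ) changes data itself, because only a pointer to its first element was passed.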
len = strlen( input );
output_col = 0;
/*
** Process each pair of column numbers.
*/
for( col = 0; col < n_columns; col += 2 ){
The real work of the function begins here. We first get the length of the input string, so we can skip column numbers that are beyond the end of the input. The for statement in C is not quite like that of other languages; it is more of a shorthand notation for a commonly used style of while statement. The for statement contains three expressions (all of which are optional, by the way). The first expression is the initialization and is evaluated once before the loop begins. The second is the test and is evaluated before each iteration of the loop; if the result is false the loop terminates. The third expression is the adjustment, which is evaluated at the end of each iteration just before the test is evaluated. To illustrate, the for loop that begins above could be rewritten as a while loop:
col = 0;
while( col < n_columns ){
    body of the loop
    col += 2;
}
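As a self-contained illustration of the same equivalence, the two hypothetical functions below add up the elements at even subscripts of an array, once with a for statement and once with the corresponding while statement. The names are assumptions made for this sketch only.

```c
/* Both functions add up the elements at even subscripts of values. */
int
sum_even_for( int const values[], int n )
{
    int total = 0;
    int i;

    for( i = 0; i < n; i += 2 )
        total += values[i];
    return total;
}

int
sum_even_while( int const values[], int n )
{
    int total = 0;
    int i;

    i = 0;                    /* initialization */
    while( i < n ){           /* test */
        total += values[i];   /* body */
        i += 2;               /* adjustment */
    }
    return total;
}
```

Given the same array and length, the two functions should always return the same result.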
int nchars = columns[col + 1] - columns[col] + 1;
/*
** If the input line isn't this long or the output
** array is full, we're done
*/
if( columns[col] >= len ||
    output_col == MAX_INPUT - 1 )
    break;
/*
** If there isn't room in the output array, only copy
** what will fit.
*/
if( output_col + nchars > MAX_INPUT - 1 )
    nchars = MAX_INPUT - output_col - 1;
/*
** Copy the relevant data.
*/
strncpy( output + output_col, input + columns[col],
    nchars );
output_col += nchars;
Here is the body of the for loop, which begins by computing the number of characters in this range of columns. Then it checks whether to continue with the loop. If the input line is shorter than this starting column, or if the output line is already full, there is no more work to be done and the break statement exits the loop immediately.
The next test checks whether all of the characters from this range of columns will fit in the output line. If not, nchars is adjusted to the number that will fit.
TIP
It is common in throwaway programs that are used only once to not bother checking things such as array bounds and to simply make the array big enough so that it will never overflow. Unfortunately, this practice is sometimes used in production code, too. There, most of the extra space is wasted, but it is still possible to overflow the array, leading to a program failure.8
}
Finally, the strncpy function copies the selected characters from the input line to the next available position in the output line. The first two arguments to strncpy are the destination and source, respectively, of a string to copy. The destination in this call is the position output_col columns past the beginning of the output array. The source is the position columns[col] past the beginning of the input array. The third argument specifies the number of characters to be copied9. The output column counter is then advanced nchars positions.
output[output_col] = '\0';
}
After the loop ends, the output string is terminated with a NUL character; note that the body of the loop takes care to ensure that there is space in the array to hold it. Then execution reaches the bottom of the function, so an implicit return is executed. With no explicit return statement, no value can be passed back to the expression from which the function was called. The missing return value is not a problem here because the function was declared void (that is, returning no value) and there is no assignment or testing of the function's return value where it is called.
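The steps discussed above (limit the count to what fits, copy with strncpy, then terminate the result) can be sketched in isolation. The function copy_range and its parameters are hypothetical; it handles a single range rather than the list of column pairs that rearrange processes.

```c
#include <string.h>

/*
** Copy up to count characters of input, starting at position start,
** into output, which has room for out_size characters including the
** terminating NUL.  (copy_range is a hypothetical helper.)
*/
void
copy_range( char *output, size_t out_size,
    char const *input, size_t start, size_t count )
{
    size_t len = strlen( input );

    if( out_size == 0 )
        return;

    /* If the input line isn't this long, the result is empty. */
    if( start >= len ){
        output[0] = '\0';
        return;
    }

    /* If there isn't room in the output array, only copy what will fit. */
    if( count > out_size - 1 )
        count = out_size - 1;

    strncpy( output, input + start, count );

    /*
    ** strncpy does not add a NUL when the source holds count or more
    ** characters, so terminate the string explicitly.
    */
    output[count] = '\0';
}
```

When the source is shorter than count, strncpy pads the destination with NUL bytes, which is harmless here because count has already been limited to the space available.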
1.2 Other Capabilities
The sample program illustrated many of the C basics, but there is a little more you should know before you begin writing your own programs. First is the putchar function, which is the companion to getchar. It takes a single integer argument and prints that character on the standard output.
Also, there are many more library functions for manipulating strings. I’ll briefly introduce a few of the most useful ones here. Unless otherwise noted, each argument to these functions may be a string literal, the name of a character array, or a pointer to a character.
8 The astute reader will have noticed that there is nothing to prevent gets from overflowing the input array if an extremely long input line is encountered. This loophole is really a shortcoming of gets, which is one reason why fgets (described in Chapter 15) should be used instead.
9 If the source of the copy contains fewer characters than indicated by the third argument, the destination is padded to the proper length with NUL bytes.
strcpy is similar to strncpy except that there is no specified limit to the number of characters that are copied. It takes two arguments: the string in the second argument is copied into the first, overwriting any string that the first argument might already contain. strcat also takes two arguments, but this function appends the string in the second argument to the end of the string already contained in the first. A string literal may not be used as the first argument to either of these last two functions. It is the programmer's responsibility with both functions to ensure that the destination array is large enough to hold the result.
For searching in strings, there is strchr, which takes two arguments: the first is a string, and the second is a character. It searches the string for the first occurrence of the character and returns a pointer to the position where it was found. If the first argument does not contain the character, a NULL pointer is returned instead. The strstr function is similar. Its second argument is a string, and it searches for the first occurrence of this string in the first argument.
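Here is a brief sketch that exercises the four functions just introduced. The helper names join_words, index_of, and contains are hypothetical; only strcpy, strcat, strchr, and strstr come from the library.

```c
#include <string.h>

/*
** Build "first second" in dest.  The caller must make dest large
** enough for both strings, the space, and the terminating NUL.
*/
char *
join_words( char *dest, char const *first, char const *second )
{
    strcpy( dest, first );     /* overwrites whatever dest contained */
    strcat( dest, " " );       /* appends to the end of dest */
    strcat( dest, second );
    return dest;
}

/*
** Return the subscript of the first occurrence of ch in string,
** or -1 if strchr returns NULL because the character is absent.
*/
int
index_of( char const *string, int ch )
{
    char const *where = strchr( string, ch );

    return where == NULL ? -1 : (int)( where - string );
}

/* Nonzero if pattern occurs anywhere in string. */
int
contains( char const *string, char const *pattern )
{
    return strstr( string, pattern ) != NULL;
}
```

Note that the first argument to join_words must be a character array or a pointer to writable memory, never a string literal.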
1.3 Compiling
The way you compile and run C programs depends on the kind of system you’re using. To compile a program stored in the file testing.c on a UNIX machine, try these commands:
cc testing.c
a.out
On PCs, you need to know which compiler you are using. For Borland C++, try these commands in an MS-DOS window:
bcc testing.c
testing
1.4 Summary
The goal of this chapter was to describe enough of C to give you an overview of the language. With this context, it will be easier to understand the topics in the next chapters.
The sample program illustrated numerous points. Comments begin with /* and end with */, and are used to include descriptions in the program. The preprocessor directive #include causes the contents of a library header to be processed by the compiler, and the #define directive allows you to give symbolic names to literal constants.
All C programs must have a function called main in which execution begins. Scalar arguments to functions are passed by value, and array arguments have call by reference semantics. Strings are sequences of characters terminated with a NUL byte, and there is a library of functions to manipulate strings in various ways. The printf function performs formatted output, and the scanf function is used for formatted input; getchar and putchar perform unformatted character input and output, respectively. if and while statements work much the same in C as they do in other languages.
Having seen how the sample program works, you may now wish to try writing some C programs of your own. If it seems like there ought to be more to the language, you are right: there is much more, but this sampling should be enough to get you started.
1.5 Summary of Cautions
1. Not putting ampersands in front of scalar arguments to scanf (page 12).
2. Using printf format codes in scanf (page 13).
3. Using & for a logical AND instead of && (page 14).
4. Using = to compare for equality instead of == (page 14).
1.6 Summary of Programming Tips
1. Using #include files for declarations (page 6).
2. Using #define to give names to constant values (page 7).
3. Putting function prototypes in #include files (page 7).
4. Checking subscript values before using them (page 14).
5. Nesting assignments in a while or if expression (page 16).
6. How to write a loop with an empty body (page 17).
7. Always check to be sure that you don’t go out of the bounds of an array (page 19).
1.7 Questions
1. C is a free form language, which means that there are no rules regarding how programs must look10. Yet the sample program followed specific spacing rules. Why do you think this is?
2. What is the advantage of putting declarations, such as function prototypes, in header files and then using #include to bring the declarations into the source files where they are needed?
3. What is the advantage of using #define to give names to literal constants?
4. What format string would you use with printf in order to print a decimal integer, a string, and a floating point value, in that order? Separate the values from one another with a space, and end the output with a newline character.
5. Write the scanf statement needed to read two integers, called quantity and price, followed by a string, which should be stored in a character array called department.
6. There are no checks made on the validity of an array subscript in C. Why do you think this obvious safety measure was omitted from the language?
7. The rearrange program described in the chapter contains the statement
strncpy( output + output_col,
input + columns[col], nchars );
The strcpy function takes only two arguments, so the number of characters it copies is determined by the string specified by the second argument. What would be the effect of replacing the strncpy function call with a call to strcpy in this program?
8. The rearrange program contains the statement
while( gets( input ) != NULL ){
What might go wrong with this code?
1.8 Programming Exercises
1. The Hello world! program is often the first C program that a student of C writes. It prints Hello world! followed by a newline on the standard output. This trivial program is a good one to use when figuring out how to run the C compiler on your particular system.
10 Other than for the preprocessor directives.
2. Write a program that reads lines from the standard input. Each line is printed on the standard output preceded by its line number. Try to write the program so that it has no built-in limit on how long a line it can handle.
3. Write a program that reads characters from the standard input and writes them to the standard output. It should also compute a checksum and write it out after the characters.
The checksum is computed in a signed char variable that is initialized to -1. As each character is read from the standard input, it is added to the checksum. Any overflow from the checksum variable is ignored. When all of the characters have been written, the checksum is then written as a decimal integer, which may be negative. Be sure to follow the checksum with a newline.
On computers that use ASCII, running your program on a file containing the words Hello world! followed by a newline should produce the following output:
Hello world!
102
4. Write a program that reads input lines one by one until end of file is reached, determines the length of each input line, and then prints out only the longest line that was found. To simplify matters, you may assume that no input line will be longer than 1000 characters.
5. The statement
if( columns[col] >= len … )
break;
in the rearrange program stops copying ranges of characters as soon as a range is encountered that is past the end of the input line. This statement is correct only if the ranges are entered in increasing order, which may not be the case. Modify the rearrange function so that it will work correctly even if the ranges are not entered in order.
6. Modify the rearrange program to remove the restriction that an even number of column values must be read initially. If an odd number of values is read, the last value indicates the start of the final range of characters. Characters from here to the end of the input string are copied to the output string.
2 Basic Concepts
There is no doubt that learning the fundamentals of a programming language is not as much fun as writing programs. However, not knowing the fundamentals makes writing programs a lot less fun.
2.1 Environments
In any particular implementation of ANSI C, there are two distinct environments that are of interest: the translation environment, in which source code is converted into executable machine instructions; and the execution environment, in which the code actually runs. The Standard makes it clear that these environments need not be on the same machine. For example, cross compilers run on one machine but produce executable code that will be run on a different type of machine. Nor is an operating system a requirement: the Standard also discusses freestanding environments in which there is no operating system. You might encounter this type of environment in an embedded system such as the controller for a microwave oven.
2.1.1 Translation
The translation phase consists of several steps. First, each of the (potentially many) source files that make up a program is individually converted to object code via the compilation process. Then, the various object files are tied together by the linker to form a single, complete executable program. The linker also brings in any functions from the standard C libraries that were used in the program, and it can search personal program libraries as well. Figure 2.1 illustrates this process.
[Figure: Source code -> Compiler -> Object code (one path for each source file); Object code + Libraries -> Linker -> Executable]
Figure 2.1 The compilation process
The compilation process itself consists of several phases, with the first being the preprocessor. This phase performs textual manipulations on the source code, for example, substituting the text of identifiers that have been #define'd and reading the text of files that were #include'd.
The source code is then parsed to determine the meanings of its statements. This second stage is where most error and warning messages are produced. Object code is then generated. Object code is a preliminary form of the machine instructions that implement the statements of the program. If called for by a command-line option, an optimizer processes the object code in order to make it more efficient. This optimization takes extra time, so it is usually not done until the program has been debugged and is ready to go into production. Whether the object code is produced directly or is in the form of assembly language statements that must then be assembled in a separate phase to form the object file is not important to us.
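The textual substitution performed by the preprocessor phase described above can be seen in a small sketch. The name MAX_COLS, its value, and the function column_capacity are chosen here for illustration only.

```c
#define MAX_COLS 20     /* a symbolic name for a literal constant */

/*
** After preprocessing, the later phases of the compiler see
**     static int columns[20];
** because the text 20 has been substituted for MAX_COLS.
*/
static int columns[MAX_COLS];

/* Number of elements in the array, computed at compile time. */
int
column_capacity( void )
{
    return (int)( sizeof columns / sizeof columns[0] );
}
```

Changing the #define and recompiling is all that is needed to change the size of the array everywhere the name is used.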
Filename Conventions
Although the Standard does not have any rules governing the names used for files, most environments have filename conventions that you must follow. C source code is usually put in files whose names end with the .c extension. Files that are #include'd into other C source code are called header files and usually have names ending in .h.
Different environments may have different conventions regarding object file names. For example, they end with .o on UNIX systems but with .obj on MS-DOS systems.
Compiling and Linking
The specific commands used to compile and link C programs vary from system to system, but many work the same as the two systems described here. The C compiler on most UNIX systems is called cc, and it can be invoked in a variety of ways.
1. To compile and link a C program that is contained entirely in one source file:
cc program.c
This command produces an executable program called a.out. An object file called program.o is produced, but it is deleted after the linking is complete.
2. To compile and link several C source files:
cc main.c sort.c lookup.c
The object files are not deleted when more than one source file is compiled. This fact allows you to recompile only the file(s) that changed after making modifications, as shown in the next command.
3. To compile one C source file and link it with existing object files:
cc main.o lookup.o sort.c
4. To compile a single C source file and produce an object file (in this case, called program.o) for later linking:
cc -c program.c
5. To compile several C source files and produce an object file for each:
cc -c main.c sort.c lookup.c
6. To link several object files:
cc main.o sort.o lookup.o
The -o name option may be added to any of the commands above that produce an executable program; it causes the linker to store the executable program in a file called name rather than a.out. By default, the linker searches the standard C library. The -lname flag tells the linker to also search the library called name; this option should appear last on the command line. There are other options as well; consult your system documentation.
Borland C/C++ 5.0 for MS-DOS/Windows has two interfaces that you can use. The Windows Integrated Development Environment is a complete, self-contained programming tool that contains a source code editor, debuggers, and compilers. Its use is beyond the scope of this book. The MS-DOS command line interface, though, works much the same as the UNIX compilers, with the following exceptions:
1. its name is bcc;
2. the object files are named file.obj;
3. the compiler does not delete the object file when only a single source file is compiled and linked; and
4. by default, the executable file is named after the first source or object file named on the command line, though the -ename option may be used to put the executable program in name.exe.
2.1.2 Execution
The execution of a program also goes through several phases. First, the program must be loaded into memory. In hosted environments (those with an operating system), this task is handled by the operating system. It is at this point that preinitialized variables that are not stored on the stack are given their initial values. Program loading must be arranged manually in freestanding environments, perhaps by placing the executable code in read-only memory (ROM).
Execution of the program now begins. In hosted environments, a small startup routine is usually linked with the program. It performs various housekeeping chores, such as gathering the command line arguments so that the program can access them. The main function is then called.
Your code is now executed. On most machines, your program will use a runtime stack, where variables local to functions and function return addresses are stored. The program can also use static memory; variables stored in static memory retain their values throughout the program’s execution.
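The difference between the two kinds of storage can be sketched with a pair of hypothetical functions. This assumes a typical implementation in which ordinary local variables are kept on the runtime stack; the names next_ticket and always_one are illustrative only.

```c
/*
** count is kept in static memory: it receives its initial value once,
** before the function is ever called, and keeps its value between calls.
*/
int
next_ticket( void )
{
    static int count = 0;

    count += 1;
    return count;
}

/*
** count here is an ordinary local variable (on most machines, stored
** on the runtime stack): it is created and initialized on every call.
*/
int
always_one( void )
{
    int count = 0;

    count += 1;
    return count;
}
```

Successive calls to next_ticket return 1, 2, 3, and so on, whereas always_one returns 1 no matter how often it is called.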
The final phase is the termination of the program, which can result from several different causes. Normal termination is when the main function returns.11 Some execution environments allow the program to return a code that indicates why the program stopped executing. In hosted environments, the startup routine receives control again and may perform various housekeeping tasks, such as closing any files that the program may have used but did not explicitly close. The program might also have been interrupted, perhaps due to the user pressing the break key or hanging up a telephone connection, or it might have interrupted itself due to an error that occurred during execution.
11 Or when some function calls exit, described in Chapter 16.
2.2 Lexical Rules
The lexical rules, like spelling rules in English, govern how you form the individual pieces, called tokens, of a source program.
An ANSI C program consists of declarations and functions. The functions define the work to be performed, whereas the declarations describe the functions and/or the kind of data (and sometimes the data values themselves) on which the functions will operate. Comments may be interspersed throughout the source code.
2.2.1 Characters
The Standard does not require that any specific character set be used in a C environment, but it does specify that the character set must have the English alphabet in both upper and lowercase, the digits 0 through 9, and the following special characters.
! " # % & ' ( ) * + , - . / : ; < > = ? [ ] \ ^ _ { } | ~
The newline character is what marks the end of each line of source code and, when character input is read by the executing program, the end of each line of input. If needed by the runtime environment, the newline can be a sequence of characters, but they are all treated as if they were a single character. The space, tab, vertical tab, and form feed characters are also required. These characters and the newline are often referred to collectively as white space characters, because they cause space to appear rather than making marks on the page when they are printed.
The Standard defines several trigraphs. A trigraph is a sequence of characters that represents another character. Trigraphs are provided so that C environments can be implemented with character sets that lack some of the required characters. Here are the trigraphs and the characters that they represent.
??(  [        ??<  {        ??=  #
??)  ]        ??>  }        ??/  \
??!  |        ??'  ^        ??-  ~
There is no special significance to a pair of question marks followed by any other character.
CAUTION!
Although trigraphs are vital in a few environments, they are a minor nuisance for nearly everyone else. The sequence ?? was chosen to begin each trigraph because it does not often occur naturally, but therein lies the danger. You never think about trigraphs because they are usually not a problem, so when one is written accidentally, as in
printf( "Delete file (are you really sure??): " );
the resulting ] in the output is sure to surprise you.
There are a few contexts in writing C source code where you would like to use a particular character but cannot because that character has a special meaning in that context. For example, the quotation mark " is used to delimit string literals. How does one include a quotation mark within a string literal? K&R C defined several escape sequences or character escapes to overcome this difficulty, and ANSI C has added a few new ones to the list. Escape sequences consist of a backslash followed by one or more other characters. Each of the escape sequences in the list below represents the character that follows the backslash but without the special meaning usually attached to the character.
\?      Used when writing multiple question marks to prevent them from being interpreted as trigraphs.
\"      Used to get quotation marks inside of string literals.
\'      Used to write a character literal for the character '.
\\      Used when a backslash is needed to prevent its being interpreted as a character escape.
There are many characters that are not used to express source code but are very useful in formatting program output or manipulating a terminal display screen. Character escapes are also provided to simplify their inclusion in your program. These character escapes were chosen for their mnemonic value.
K&R C
The character escapes marked with † are new to ANSI C and are not implemented in K&R C.
\a      † Alert character. This rings the terminal bell or produces some other audible or visual signal.
\b      Backspace character.
\f      Formfeed character.
\n      Newline character.
\r      Carriage return character.
\t      Horizontal tab character.
\v      † Vertical tab character.
\ddd    ddd represents from one to three octal digits. This escape represents the character whose representation has the given octal value.
\xddd   † Like the above, except that the value is specified in hexadecimal.
Note that any number of hexadecimal digits may be included in a \xddd sequence, but the result is undefined if the resulting value is larger than what will fit in a character.
2.2.2 Comments
C comments begin with the characters /*, end with the characters */, and may contain anything except */ in between. Whereas comments may span multiple lines in the source code, they may not be nested within one another. Note that these character sequences do not begin or end comments when they appear in string literals.
Each comment is stripped from the source code by the preprocessor and replaced by a single space. Comments may therefore appear anywhere that white space characters may appear.
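Two of the rules above are easy to check with a short sketch: a comment behaves like a single space between tokens, and the comment delimiters are ordinary characters inside a string literal. The function names below are hypothetical.

```c
#include <string.h>

/*
** The comment between int and x is replaced by a single space, so
** the declaration is the same as "int x = 5;".
*/
int
declared_with_comment( void )
{
    int/* acts as white space */x = 5;

    return x;
}

/*
** Inside a string literal the comment delimiters do not begin or end
** a comment, so every character of the literal is kept and counted.
*/
size_t
literal_length( void )
{
    return strlen( "/* not a comment */" );
}
```

The second function returns the full length of the literal, delimiters included, because nothing inside the quotation marks was treated as a comment.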
CAUTION!
A comment begins where it begins and ends where it ends, and it includes everything on all the lines in between. This statement may seem obvious, but it wasn't to the student who wrote this innocent-looking fragment of code. Can you see why only the first variable is initialized?
x1 = 0;    /***********************
x2 = 0;    ** Initialize the     **
x3 = 0;    ** counter variables. **
x4 = 0;    ***********************/
CAUTION!
Take care to terminate comments with */ rather than *?. The latter can occur if you are typing rapidly or hold the shift key down too long. This mistake looks obvious when pointed out, but it is deceptively hard to find in real programs.
2.2.3 Free Form Source Code
C is a free form language, meaning that there are no rules governing where statements can be written, how many statements may appear on a line, where spaces should be put, or how many spaces can occur.12 The only rule is that one or more white space characters (or a comment) must appear between tokens that would be interpreted as a single long token if they were adjacent. Thus, the following statements are equivalent:
y = x + 1;
y=x+1;
y=x +
1;
Of the next group of statements, the first three are equivalent, but the last is illegal.
int
x;
int     x;
int/*comment*/x;
intx;
This freedom is a mixed blessing; you will hear some soapbox philosophy about this issue shortly.
2.2.4 Identifiers
Identifiers are the names used for variables, functions, types, and so forth. They are composed of upper and lowercase letters, digits, and the underscore character, but they may not begin with a digit. C is a case sensitive language, so abc, Abc, abC, and ABC are four different identifiers. Identifiers may be any length, though the Standard allows the compiler to ignore characters after the first 31. It also allows an implementation to restrict identifiers for external names (that is, those that the linker manipulates) to six monocase characters.
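A tiny sketch makes the point about case sensitivity concrete. The function name sum_of_four is hypothetical; the four variable names are the ones mentioned above.

```c
/* Four distinct variables: identifiers that differ only in case are different. */
int
sum_of_four( void )
{
    int abc = 1;
    int Abc = 2;
    int abC = 3;
    int ABC = 4;

    return abc + Abc + abC + ABC;
}
```

Were the language not case sensitive, the four declarations would collide and the function would not compile.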
12 Except for preprocessor directives, described in Chapter 14, which are line oriented.
The following C keywords are reserved, meaning that they cannot also be used as identifiers.
auto        double      int         struct
break       else        long        switch
case        enum        register    typedef
char        extern      return      union
const       float       short       unsigned
continue    for         signed      void
default     goto        sizeof      volatile
do          if          static      while
2.2.5 Form of a Program
A C program may be stored in one or more source files. Although one source file may contain more than one function, every function must be completely contained in a single source file.13 There are no rules in the Standard governing this issue, but a reasonable organization of a C program is for each source file to contain a group of related functions. This technique has the side benefit of making it possible to implement abstract data types.
2.3 Program Style
A few comments on program style are in order. Free-form languages such as C will accept sloppy programs, which are quick and easy to write but difficult to read and understand later. We humans respond to visual clues, so putting them in your source code will aid whoever must read it later. (This might be you!) Program 2.1 is an example that, although admittedly extreme, illustrates the problem. This is a working program that performs a marginally useful function. The question is, what does it do?14 Worse yet, suppose you had to make a modification to this program! Although experienced C programmers could figure it out given enough time, few would bother. It would be quicker and easier to just toss it out and write a new program from scratch.
13 Technically, a function could begin in one source file and continue in another if the second were #include'd into the first. However, this procedure is not a good use of the #include directive.
14 Believe it or not, it prints the lyrics to the song The Twelve Days of Christmas. The program is a minor modification of one written by Ian Phillipps of Cambridge Consultants Ltd. for the International Obfuscated C Code Contest (see http://reality.sgi.com/csp/ioccc). Reprinted by permission. Copyright © 1988, Landon Curt Noll & Larry Bassel. All Rights Reserved. Permission for personal, educational or non profit use is granted provided this copyright and notice is included in its entirety and remains unaltered. All other users must receive prior permission in writing from both Landon Curt Noll and Larry Bassel.
#include
main(t,_,a)
char *a;
{return!0