10. Strings in C
OSU CSE 2421
J. E. Jones
Required reading: Pointers on C, Chapter 9 through 9.4, 9.11 and 9.12
ANSI C does not have a string data type.
In ANSI C, strings are stored in memory as sequences of ASCII
characters, declared as arrays of char (i.e., character arrays).
The individual characters are always stored in contiguous bytes
in memory.
When your code calls string library functions, include the header
file string.h, which contains the prototypes and declarations they
need, in every .c file that uses them. (String literals passed to
printf() or scanf() need only stdio.h.)
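To make the "array of small integers in consecutive bytes" idea concrete, the sketch below copies the numeric code of each character into an int array. The function name char_codes() and its parameters are made up for this illustration, and the values mentioned afterwards assume an ASCII system.

```c
#include <stddef.h>

/* Stores the numeric code of each char of s (at most max of them) in
   codes and returns how many were stored. Hypothetical helper written
   for this example; it is not a library function. */
size_t char_codes(const char *s, int codes[], size_t max)
{
    size_t i;

    for (i = 0; i < max && s[i] != '\0'; i++)
        codes[i] = (unsigned char)s[i];   /* s[i] is the byte at address s + i */
    return i;
}
```

For the string "Hi" on an ASCII system, codes receives 72 and 105, and the function returns 2.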
[Figure: how the strings are stored in memory]
There are two different kinds of “strings” in C:
1. Strings which are stored as string literals (“read-only”
strings): anything within double quotes that is not used to
initialize a char array variable:
    /* string literals in scanf() or printf() */
    "%s %s\n"
    "The value of x is %d and value of y is %4.2f\n"
    /* a string literal whose address is assigned to a pointer */
    /* the declaration allocates space for the pointer only; */
    /* the literal itself is stored separately by the compiler */
    char *title = "A Tale of Two Cities";
String literals are terminated with a null character ('\0').
The compiler inserts the null terminator whenever we write a
string with "".
There are two different kinds of “strings” in C:
2. Strings which are arrays of characters (read-write
strings): anything declared within your program as an
array of data type char where the binary values in each
byte are interpreted as ASCII characters:
    /* string declared as a single string */
    char title[] = "Top Hat";
    /* string declared as a series of ASCII chars */
    char title[8] = {'T', 'o', 'p', ' ', 'H', 'a', 't', '\0'};
Char array strings must be terminated with a null character
('\0'). The compiler inserts the null terminator when we
define a string with "", but not when we define it ASCII char by
ASCII char. In the latter case, we must explicitly place the
null character at the end of the string.
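One way to see the difference between the two initialization styles is to compare array sizes with sizeof. The two function names below are invented for this sketch; each simply reports how many bytes the compiler reserved.

```c
#include <stddef.h>

/* Array initialized from a string literal: 7 characters plus the
   '\0' that the compiler appends. */
size_t size_from_literal(void)
{
    char title[] = "Top Hat";
    return sizeof title;
}

/* Array initialized char by char with no explicit '\0': exactly the
   7 characters listed, so it is NOT a valid string. */
size_t size_from_chars(void)
{
    char title[] = {'T', 'o', 'p', ' ', 'H', 'a', 't'};
    return sizeof title;
}
```

The first function returns 8 and the second returns 7; the missing byte is the null terminator.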
Pointers are closely related to strings in C and, whenever we use a pointer to a
string, we must be careful to distinguish between the two types of strings.
    char *string1 = "Go Bucks!";     /* string1 can change, but "Go Bucks!" can't */
    char string2[10] = "Go Bucks!";  /* string2 can't change, but "Go Bucks!" can */
string1:
◦ The individual chars that string1 points to make up a string literal (a string constant).
◦ string1 (the identifier) is a pointer to char and it is a variable (string1 can point
to any string, not just "Go Bucks!").
◦ "Go Bucks!" is typically stored in read-only memory; writing to it is undefined behavior.
string2:
◦ The individual chars in string2 are variables (we can change the chars in the
string, so the elements in string2 can be treated as they are in other arrays).
◦ string2 (the identifier) behaves like a constant pointer to char (the name of any
array in ANSI C normally converts to a pointer to its first element and cannot be
assigned to).
◦ "Go Bucks!" is copied into the 10 elements of the string2[] array (9 characters
plus the null terminator), which is read-write memory.
In either case (RO or RW memory), each character is stored as the ASCII value of
the letter, digit, or symbol, and the string ends with a null character.
Consider this code:
    char *string1 = "warehorse";
    char string2[] = "conteiner";
    string1[6] = 'u';       /* Invalid: chars in a string literal, i.e.,
                               a read-only string, cannot be changed;
                               this is undefined behavior and typically
                               causes a segmentation fault */
    string1 = "warehouse";  /* Valid: a pointer to char can be
                               made to point to a different string; the
                               string which string1 points to is still a
                               read-only string, however. The address of
                               "warehorse" is lost in this case */
    string2[4] = 'a';       /* Valid: any ASCII char in a char array string
                               can be changed; they are variables */
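When a misspelled word has to be corrected in place, the usual approach is to keep it in a char array and write to the array. The helper below is a sketch with an invented name, set_char(); it assumes the caller passes writable memory, because the function itself has no way to check that.

```c
#include <stddef.h>
#include <string.h>

/* Overwrites s[index] with c if index lies inside the string.
   s must point to writable memory (a char array, not a string literal).
   Returns 1 if the char was changed, 0 if index was out of range.
   Hypothetical helper for illustration only. */
int set_char(char *s, size_t index, char c)
{
    if (index >= strlen(s))
        return 0;
    s[index] = c;
    return 1;
}
```

Given char word[] = "warehorse";, the call set_char(word, 6, 'u') leaves "warehouse" in word. Passing a pointer to a string literal would compile, but the write is undefined behavior.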
A string is a sequence of zero or more characters which must
be terminated by a null character (null byte), '\0'; the ASCII
character code for '\0' has the value 0. Thus, no string can
contain this character except at its end.
The null character termination is one of the fundamental
differences between strings (arrays of characters) and
other arrays.
A string that contains nothing, "", still occupies one character:
the null byte ('\0').
The null character termination is important, because library functions such as
printf() and scanf(), among many others, use it to mark and determine where the
ASCII character string ends (i.e., the length of the string).
Because of the null character termination, we do not have to pass the size or length
of a character string to functions.
◦ even though we must do so for all other array types or char arrays that do not
contain ASCII strings
If the null character termination is not present, library functions that
manipulate strings read past the end of the array. This is undefined behavior;
in practice you will often see garbage output or a segmentation fault.
When you declare read only or read-write strings using “. . .”, the compiler stores
ASCII representations of each character in your string and will add the null
character termination at the end.
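The following sketch shows how a function can find the length of a string using nothing but the null terminator. It is written in the spirit of the library's strlen(), not copied from any real implementation, and the name my_strlen() is invented here.

```c
#include <stddef.h>

/* Counts chars up to, but not including, the first '\0'.
   If s is not null terminated, the loop runs off the end of the
   array, which is undefined behavior. */
size_t my_strlen(const char *s)
{
    size_t len = 0;

    while (s[len] != '\0')
        len++;
    return len;
}
```

Note that no size parameter is needed: my_strlen("Go Bucks!") returns 9 and my_strlen("") returns 0.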
Example:
    #include <stdio.h>

    int main(void) {
        char string[] = {'e', 'x', 'a', 'm', '\0', 'p', 'l', 'e'};
        printf("%s\n", string);
        return 0;
    }
What will be printed?
exam
The string appears to consist of 8 characters, but, because printf()
treats the null character (‘\0’) as the end of the string, the final three
characters that were stored in the char array, when it was initialized,
are not printed by printf().
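The same array can be measured two ways, which makes the gap between "what is stored" and "what string functions see" explicit. The names below are invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* The array from the example above: 8 elements, '\0' in the fifth. */
static const char example[] = {'e', 'x', 'a', 'm', '\0', 'p', 'l', 'e'};

/* Length as string functions see it: counts up to the first '\0'. */
size_t visible_length(void)
{
    return strlen(example);
}

/* Number of bytes the array actually occupies. */
size_t stored_bytes(void)
{
    return sizeof example;
}
```

visible_length() returns 4 while stored_bytes() returns 8.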
The length of a string is the number of characters it contains not including
the terminating null character, if one is present (it always should be).
Thus, when you use an array of characters as a string, you must make the
size of the array one greater than the string length, or the maximum
number of characters which the string can contain before the null character
termination.
◦ Example:
    char label[10] = {'c', 'o', 'n', 't', 'a', 'i', 'n', 'e', 'r'};
A string stored in label can have a maximum length of 9, so that there is
space for the null character termination.
Note that since we listed only 9 ASCII characters above, the compiler
initializes all remaining array elements to zero (i.e., the null character).
However, if you do not leave space for the null character, it cannot be added.
This will lead to errors.
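The "one greater than the string length" rule can be written as a one-line check. This is a sketch; fits_in() is a name chosen for the example.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if s, together with its null terminator, fits in an
   array of capacity chars; returns 0 otherwise. */
int fits_in(const char *s, size_t capacity)
{
    return strlen(s) + 1 <= capacity;
}
```

"container" has length 9, so it fits in an array of 10 chars but not in an array of 9.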
The following two declarations are equivalent:
    char label[10] = {'c', 'o', 'n', 't', 'a', 'i', 'n', 'e', 'r'};
    char label[10] = "container"; /* More usual declaration */
In both cases, the array ends up null terminated: the compiler zero-fills the
unused tenth element in the first form and stores the literal's null terminator
in the second (provided the array is large enough to hold it).
Also notice that a character enclosed in single quotation marks is treated as
char, whereas zero or more characters enclosed in double quotation marks
are treated as a string (a sequence of contiguous chars).
Simple I/O examples:
◦ scanf("%9s", label); /* reads an ASCII string until whitespace is found; the
                           width 9 keeps scanf() from overflowing label[10] */
◦ printf("%s\n", label); /* prints an ASCII string until the null character is found */
NOTE: we pass label, not *label. That is probably what you expect for scanf(), but
the same must be done for printf(), too. Later slides in this deck will further
explain this.
Refer to the I/O format slides for more complex options.
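A field width in the %s conversion limits how many characters are stored, which protects a fixed-size array. The sketch below uses sscanf(), the variant of scanf() that reads from a string, so it can be tried without typing input. The name read_word() is invented for this example.

```c
#include <stdio.h>

/* Reads one whitespace-delimited word of at most 9 chars from input
   into out, which must have room for at least 10 chars.
   Returns 1 if a word was read, 0 otherwise. */
int read_word(const char *input, char out[10])
{
    return sscanf(input, "%9s", out) == 1;
}
```

With the input "  container ship", out receives "container"; a longer word such as "abcdefghijklmnop" is cut off after 9 characters, and the terminator still fits.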
Here are 4 possible ways to declare a string. In all but
one of these options, the compiler will place a null
character at the end of the string. Can you pick the one
that is not null terminated?
    char label1[10] = {'c', 'o', 'n', 't', 'a', 'i', 'n', 'e', 'r'};
    char label2[] = {'c', 'o', 'n', 't', 'a', 'i', 'n', 'e', 'r'};
    char label3[10] = "container";
    char label4[] = "container";
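All declarations will be null terminated except the label2
array: with no size given and 9 chars listed, the compiler makes
label2 exactly 9 elements long, leaving no room for a null character.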
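Whether an array holds a terminator can be checked safely by searching only the bytes that belong to it. The sketch below uses the standard memchr() function; the wrapper name has_terminator() is invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if a '\0' byte occurs within the first n bytes of buf,
   0 otherwise. The caller passes sizeof the array as n, so the
   search never reads past the end of the array. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}
```

Calling has_terminator(label, sizeof label) on each of the four declarations returns 1 for label1, label3, and label4, and 0 for label2.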
Since strings are stored as arrays in memory, individual chars can be
accessed using an index:
    char *string1 = "quit";     /* string1 can change, but "quit" can't */
    char string2[12] = "slop";  /* string2 can't change, but "slop" can */
    if (string1[0] == 'q') {
        string2[1] = 't';       /* string2 changes from "slop" to "stop" */
    }
Remember: since string1 points to a read-only string, you may read
individual chars in the string, but you cannot write to them. If you attempt
to write to individual chars of a string which is in read-only memory, the
behavior is undefined and you will typically get a segmentation fault. The
only way to change the string that string1 points to is to make it point to
a different string: string1 = "quip";
You can assign a read-write string to a char * variable; then the string
reached through the char * is read-write. (Remember, a char * is just a
pointer and, in this case, it points to a string in read-write memory.)
Example:
    char *string1;
    char string2[] = "Go Ducks!";
    string1 = string2;
    string1[3] = 'B';  /* No seg fault, because string1 */
                       /* points to a read-write string */
    printf("string1 is: %s\n", string1);
    /* NOTE: not *string1. WHY???? */
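A pointer returned by a function is read-write or read-only according to the memory it points into, not according to its type. The sketch below returns a pointer into the middle of the caller's array; skip_spaces() is a name invented for this example.

```c
/* Returns a pointer to the first non-space character of s.
   The result points into the same memory as s, so it is writable
   exactly when s is (a char array: yes; a string literal: no). */
char *skip_spaces(char *s)
{
    while (*s == ' ')
        s++;
    return s;
}
```

If buf is declared as char buf[] = "  Go Ducks!"; and p = skip_spaces(buf);, then p[3] = 'B'; changes buf itself to "  Go Bucks!".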
HOWEVER, you cannot assign a char * type of string to a
char array type of string.
    char *string1 = "Go Bucks";
    char string2[11] = "Touchdown!";
    string2 = string1; /* INVALID. WHY? */
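The compiler will refuse to compile this, and will output an error along
these lines (the exact wording varies by compiler and version):
    error: incompatible types when assigning to type 'char[11]'
    from type 'char *'
An array name is not a variable that can be assigned to. To put a different
string into string2, copy the characters, for example with strcpy().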
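Since an array cannot be assigned, its contents have to be copied. The sketch below is one bounded way to do that; copy_string() is an invented name, not a library function, and truncating on overflow is a design choice made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst, which has room for dstsize chars. The result
   is always null terminated (if dstsize > 0) and is truncated when
   src does not fit. Returns 1 if the whole string fit, 0 otherwise. */
int copy_string(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);

    if (dstsize == 0)
        return 0;
    if (len >= dstsize) {
        memcpy(dst, src, dstsize - 1);   /* copy what fits */
        dst[dstsize - 1] = '\0';         /* terminate by hand */
        return 0;
    }
    memcpy(dst, src, len + 1);           /* includes the '\0' */
    return 1;
}
```

With char string2[11] = "Touchdown!";, the call copy_string(string2, sizeof string2, "Go Bucks") returns 1 and leaves "Go Bucks" in string2.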
Pointer arithmetic may be used with strings which are arrays
of char (and with read-only strings which are char * and point
to string literals, but only for reading):
    char string1[8] = "quick";
    char string2[5] = "dime";
    if (*(string1 + 0) == 'q')
        *(string2 + 1) = 'o';
    printf("%s\n", string2);
What prints, or is there a seg fault?
dome is printed. There is no seg fault, because string2 is a read-write array.
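Pointer arithmetic is also the usual way to walk through a string from its first char to the null terminator. The sketch below only reads through the pointer, so it works for char arrays and string literals alike; count_char() is a name invented for this example.

```c
#include <stddef.h>

/* Counts how many times target occurs in s by advancing a pointer
   one char at a time until it reaches the '\0'. */
size_t count_char(const char *s, char target)
{
    size_t count = 0;
    const char *p;

    for (p = s; *p != '\0'; p++) {
        if (*p == target)
            count++;
    }
    return count;
}
```

For example, count_char("container", 'n') returns 2.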
Normally, printf() is passed the values that it should print.
◦ (%c, %i, %d, %f).
When working with strings, however, it is passed an address:
    char *string1 = "Bucks";
    printf("%s\n", string1);
Since string1 is a pointer, printf() gets passed the address where the first char in
the string is stored, and it will output a string of characters starting with the character
at that address and continuing through subsequent addresses until it encounters a null
character.
Notice that the pointer is not dereferenced here. This may seem surprising, but since
we used the string format specifier (%s), printf() “knows” that we want to print a
sequence of chars, and it expects a pointer to the first char. Passing *string1 instead
hands printf() a single char where it expects an address; this is undefined behavior
and typically produces a segmentation fault.
If we wanted to print only a single char from string1, THEN we must dereference
so that we are passing the char's value: printf("%c", *string1); or
printf("%c", string1[0]);
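The contrast between %s (takes an address) and %c (takes a value) can be seen in a single format string. The sketch below uses sprintf(), which formats into a char array instead of printing, so that the result can be inspected. The name describe() and the 20-character limit are choices made for this example, and the string passed in is assumed to be non-empty.

```c
#include <stdio.h>

/* Writes "<s> starts with <first char>" into out.
   %.20s prints at most 20 chars of s, so out needs at most 35 bytes:
   20 for s, 13 for " starts with ", 1 for the char, 1 for '\0'. */
void describe(const char *s, char out[35])
{
    sprintf(out, "%.20s starts with %c", s, *s);   /* s for %s, *s for %c */
}
```

Calling describe("Bucks", out) leaves "Bucks starts with B" in out.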
The standard library has two types of character operation
functions which operate on individual characters.
These characters can be contained in strings, and usually are.
The first type of function is used to classify characters
according to certain characteristics, and the second type of
function is used to transform them to a related character.
ctype.h is the header file where these functions are prototyped.
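Each of these functions takes an integer argument which
contains an ASCII character value (more precisely, a value in the
range of unsigned char, or EOF).
All these functions return an integer value which is used as a
Boolean: nonzero means true and zero means false.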
They are listed on the next slide.
You can get a clue about what the function does from its name,
but you can look up the details if you think one of these may
be useful in some software you are writing.
iscntrl()
isspace()
isdigit() (also isxdigit() for hexadecimal digits)
islower(), isupper()
isalpha()
isalnum()
ispunct()
isgraph()
isprint()
isblank()
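As an illustration of how the classification functions are typically used on the chars of a string, the sketch below counts decimal digits with isdigit(). The name count_digits() is invented for this example. The cast to unsigned char matters on systems where char is signed, because the ctype.h functions are only defined for values in the unsigned char range (or EOF).

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the number of decimal digit characters in s. */
size_t count_digits(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s))   /* nonzero means "is a digit" */
            n++;
    }
    return n;
}
```

For example, count_digits("CSE 2421") returns 4.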
These functions transform a character value, which is passed as an
integer argument, to a related character value, which is returned as
an integer.
    int tolower(int ch); /* returns lower case if ch is upper case; otherwise ch */
    int toupper(int ch); /* returns upper case if ch is lower case; otherwise ch */
From the manual page for these functions:
BUGS
The details of what constitutes an uppercase or lowercase letter depend on the
current locale. For example, the default “C” locale does not know about umlauts,
so no conversion is done for them.
In some non-English locales, there are lowercase letters with no corresponding
uppercase equivalent; the German sharp s is one example.
So, this is not an issue unless you are programming an application for an
international environment, but some of you will likely be doing so.
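Converting a whole string is a matter of applying toupper() to each char in turn. The sketch below does this in place, so the argument must be a read-write string; upcase_in_place() is a name invented for this example, and the result described afterwards assumes the default "C" locale.

```c
#include <ctype.h>

/* Converts every lowercase letter in s to uppercase, in place.
   s must point to writable memory (a char array, not a literal). */
void upcase_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

Applied to an array holding "Go Bucks 2421!", it leaves "GO BUCKS 2421!"; digits, spaces, and punctuation are returned unchanged by toupper().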
strcat()      Appends one string to another
strchr()      Finds the first occurrence of a given character
strcmp()      Compares two strings
strcasecmp()  Compares two strings, ignoring case (POSIX; declared in strings.h)
strstr()      Finds a substring in a string
strcpy()      Copies one string to another
strlen()      Finds the length of a string
strlwr()      Converts a string to lowercase (non-standard extension; not provided on every system)
All these functions rely on strings being terminated with a null character for
error-free operation.
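A short sketch of several of these functions working together is given below. The helper names join_words() and contains() are invented for this example; strlen(), strcpy(), strcat(), and strstr() are the standard functions from string.h. Because strcpy() and strcat() do not check the size of the destination, the helper checks it first.

```c
#include <stddef.h>
#include <string.h>

/* Writes "first second" into out if it fits in outsize chars,
   including the null terminator. Returns 1 on success, 0 if out is
   too small (in which case out is left untouched). */
int join_words(char *out, size_t outsize, const char *first, const char *second)
{
    if (strlen(first) + 1 + strlen(second) + 1 > outsize)
        return 0;
    strcpy(out, first);    /* out = first            */
    strcat(out, " ");      /* out = first + " "      */
    strcat(out, second);   /* out = first + " " + second */
    return 1;
}

/* Returns 1 if needle occurs somewhere in haystack, 0 otherwise. */
int contains(const char *haystack, const char *needle)
{
    return strstr(haystack, needle) != NULL;
}
```

With a 16-char buffer, join_words(buf, sizeof buf, "Go", "Bucks!") produces "Go Bucks!", and contains(buf, "Buck") then returns 1.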
strncat()     Appends at most n characters of a string
strncmp()     Compares at most n characters of two strings
strncpy()     Copies at most n characters of one string to another (the result is
              not null terminated if the source has n or more characters)
strnset()     Sets n characters of a string to a given character (non-standard)
strrchr()     Finds the last occurrence of a given character in a string
strrev()      Reverses a string (non-standard)
strset()      Sets all characters of a string to a given character (non-standard)
strspn()      Returns the length of the initial part of a string that consists only
              of characters from a given set
strstr(s1,s2) Returns a pointer to the first occurrence of string s2 in
              string s1
strupr()      Converts a string to uppercase (non-standard)
See http://man7.org/linux/man-pages/man3/string.3.html for a complete list or, on
stdlinux, type the shell command man string.
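Some functions in these lists, such as strrev(), are extensions that are not part of the standard library, so they may be missing on a given system. A portable replacement is short to write. The sketch below is one possible version; reverse_in_place() is a name invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Reverses the chars of s in place; the null terminator stays at
   the end. s must point to writable memory. */
void reverse_in_place(char *s)
{
    size_t i = 0;
    size_t j = strlen(s);   /* one past the last char */

    while (i + 1 < j) {
        char tmp;

        j--;                /* j now indexes the char to swap with s[i] */
        tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
        i++;
    }
}
```

Applied to an array holding "stop", it leaves "pots".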