OSU CSE 2421
Required reading: Pointers on C, Chapter 9 through 9.4, 9.11 and 9.12
J.E.Jones
ANSI C does not have a string data type.
In ANSI C, strings are stored as sequences of ASCII characters in memory declared as an array of char data type values (i.e. a character array).
The individual characters will always be stored in contiguous bytes in memory.
When using strings in ANSI C (other than string literals passed to printf() or scanf()), include the header file string.h, which contains the prototypes and declarations needed to use the string library functions, in any .c file you create.
Note again how the strings are stored here:
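A minimal sketch of this layout in code, assuming an ASCII system: the helper below prints the numeric value stored in each byte of a string, including the terminating '\0', and returns the number of bytes the string occupies. The name dump_string() is hypothetical; it is not a library function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a library function): prints every byte of
   the string s, including the terminating '\0', and returns the
   number of bytes the string occupies in memory. */
size_t dump_string(const char *s)
{
    size_t n = strlen(s) + 1;      /* +1 counts the '\0' terminator */
    size_t i;

    for (i = 0; i < n; i++) {
        /* print the offset from the start and the value stored there */
        printf("byte %lu holds %d\n", (unsigned long)i,
               (int)(unsigned char)s[i]);
    }
    return n;
}
```

Calling dump_string("Top Hat") prints eight lines, one per byte, with the values 84, 111, 112, 32, 72, 97, 116 and finally 0 for the terminator.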
There are two different kinds of data “strings” in C:
1. Strings which are stored as string literals (“read-only” strings): anything within double quotes that is not allocated as an auto or static variable:
/* string literals in scanf() or printf() */
"%s %s\n"
"The value of x is %d and value of y is %4.2f\n"
/* string literals that are assigned to a pointer */
/* the pointer is allocated space; the string is not */
char *title = "A Tale of Two Cities";
String literals are terminated with a null character ('\0'). The compiler will insert the null terminator when we define any type of string with "".
There are two different kinds of data “strings” in C:
2. Strings which are arrays of characters (read-write strings): anything declared within your program as an array of data type char where the binary values in each byte are interpreted as ASCII characters:
/* string declared as a single string */
char title[] = "Top Hat";
/* string declared as a series of ASCII chars */
char title[8] = {'T','o','p',' ','H','a','t','\0'};
Char array strings must be terminated with a null character ('\0'). The compiler will insert the null terminator when we define a string with "", but not when we define it ASCII char by ASCII char. In the latter case, we must explicitly place the '\0' at the end of the string.
Pointers are closely related to strings in C and, whenever we use a pointer to a string, we must be careful to distinguish between the two types of strings.
char *string1 = "Go Bucks!"; /* string1 can change, but "Go Bucks!" can't */
char string2[10] = "Go Bucks!"; /* string2 can't change, but "Go Bucks!" can */
string1:
◦ The individual chars that string1 points to make up a string literal (a string constant).
◦ string1 (the identifier) is a pointer to char and it is a variable (string1 can point to any string, not just “Go Bucks!”).
◦ “Go Bucks!” will typically be stored in read-only memory set up by the OS.
string2:
◦ The individual chars in string2 are variables (we can change the chars in the string, so the elements in string2 can be treated as they are in other arrays).
◦ string2 (the identifier) acts as a constant pointer to char (just as the name of any array in ANSI C is normally treated as a constant pointer to the first element).
◦ “Go Bucks!” is stored in the (up to) 10 elements of the string2[] array, which is read-write memory.
In either case (RO or RW memory), the value stored in each individual char is the ASCII value of the letter, digit, or symbol, and the string ends with a null character ('\0').
Consider this code:
char *string1 = "warehorse";
char string2[] = "conteiner";
string1[6] = 'u';
/* Invalid - chars in a string literal, i.e., a read-only string, cannot be changed; the behavior is undefined and a segmentation fault typically occurs */
string1 = "warehouse";
/* Valid - a pointer to char can be made to point to a different string; the string which string1 points to is still a read-only string, however. The address of "warehorse" is lost in this case */
string2[4] = 'a';
/* Valid - any ASCII char in a char array string can be changed; they are variables */
A string is a sequence of zero or more characters which must be terminated by a null character (null byte), '\0'; the ASCII character code for '\0' has the value 0. Thus, no string can contain this character except at its end.
The null character termination is one of the fundamental differences between strings (arrays of characters) and other arrays.
A string that contains nothing, "", will contain one character: the null byte ('\0').
The null character termination is important, because library functions such as printf() and scanf(), among many others, use it to mark and determine where the ASCII character string ends (i.e. the length of the string).
Because of the null character termination, we do not have to pass the size or length of a character string to functions.
◦ even though we must do so for all other array types or char arrays that do not contain ASCII strings
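If the null character termination is not present, library functions that manipulate strings will not behave correctly: they keep reading memory past the end of the array until they happen to find a byte with the value 0. In such cases, you will likely see garbage output or segmentation faults when your code runs.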
When you declare read only or read-write strings using “. . .”, the compiler stores ASCII representations of each character in your string and will add the null character termination at the end.
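To illustrate why no size argument is needed, here is a minimal sketch of how a function such as strlen() can find the end of a string. The name my_strlen() is hypothetical, and the real library implementation may differ; the sketch only shows the idea of scanning for the terminator.

```c
#include <stddef.h>

/* Hypothetical re-implementation of the idea behind strlen(): walk
   the array until the '\0' terminator is found. No size argument is
   needed, but the terminator MUST be present. */
size_t my_strlen(const char *s)
{
    size_t len = 0;

    while (s[len] != '\0') {   /* stop at the first null character */
        len++;
    }
    return len;                /* the terminator is not counted */
}
```

For example, my_strlen("Go Bucks!") returns 9 and my_strlen("") returns 0.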
Example:
#include <stdio.h>
int main () {
    char string[] = {'e', 'x', 'a', 'm', '\0', 'p', 'l', 'e'};
    printf("%s\n", string);
    return (0);
}
What will be printed?
exam
The string appears to consist of 8 characters, but, because printf() treats the null character (‘\0’) as the end of the string, the final three characters that were stored in the char array, when it was initialized, are not printed by printf().
The length of a string is the number of characters it contains not including the terminating null character, if one is present (it always should be).
Thus, when you use an array of characters as a string, you must make the size of the array one greater than the string length, or the maximum number of characters which the string can contain before the null character termination.
◦ Example:
char label[10] = {'c', 'o', 'n', 't', 'a', 'i', 'n', 'e', 'r'};
A string stored in label can have a maximum length of 9, so that there is space for the null character termination.
Note that since we only defined 9 ASCII characters above, the compiler initializes all remaining array elements to zero (i.e. '\0'). However, if you do not leave space for the null character, it cannot be added. This will lead to errors.
The following two declarations are equivalent:
char label[10] = {'c', 'o', 'n', 't', 'a', 'i', 'n', 'e', 'r'};
char label[10] = "container"; /* More usual declaration */
In both cases, the compiler stores the terminating null character after the characters of the string in memory (provided the array is large enough to hold it).
Also notice that a character enclosed in single quotation marks is treated as char, whereas zero or more characters enclosed in double quotation marks are treated as a string (a sequence of contiguous chars).
Simple I/O examples:
◦ scanf("%s", label); /* will read an ASCII string until whitespace is found */
◦ printf("%s\n", label); /* will print an ASCII string until null is found */
NOTE: label is passed, not *label. This is likely what you expect for scanf(), but it is required for printf(), too. Later slides in this deck will further explain this.
Refer to the I/O format slides for more complex options.
Here are 4 possible ways to declare a string. In all but one of these options, the compiler will place a null character at the end of the string. Can you correctly pick which one?
char label1[10] = {'c','o','n','t','a','i','n','e','r'};
char label2[] = {'c','o','n','t','a','i','n','e','r'};
char label3[10] = "container";
char label4[] = "container";
All declarations will be null terminated except the label2 array.
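One way to check this yourself, as a sketch: label2 gets exactly 9 elements (sizeof label2 is 9), so there is no room for a terminator, while the other three arrays have 10. The helper below, whose name has_terminator() is hypothetical, looks for a '\0' inside the bytes of an array without reading past its end.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if a '\0' appears somewhere in the
   first size bytes of buf, and 0 otherwise. Pass sizeof of the array
   as size so that no byte outside the array is examined. */
int has_terminator(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

With the declarations above, has_terminator(label1, sizeof label1) returns 1, and has_terminator(label2, sizeof label2) returns 0.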
Since strings are stored as arrays in memory, individual chars can be accessed using an index:
char *string1 = "quit"; /* string1 can change, but "quit" can't */
char string2[12] = "slop"; /* string2 can't change, but "slop" can */
if (string1[0] == 'q') {
    string2[1] = 't'; /* string2 changes from "slop" to "stop" */
}
Remember: Since string1 points to a read-only string, it is valid to read individual chars in the string, but you cannot write to them. If you attempt to write individual char elements of a string which is in read-only memory, you will typically get a segmentation fault. The only way to change the string that string1 points to is to make it point to a different string: string1 = "quip";
You can assign a read-write string to a char * type of string variable; then, the char * string is read-write (Remember, a char * is just a pointer, and in this case, points to a string in read-write memory).
Example:
char *string1;
char string2[] = "Go Ducks!";
string1 = string2;
string1[3] = 'B'; /* No seg fault, because string1 */
                  /* points to a read-write string */
printf("string1 is: %s\n", string1);
/* NOTE: not *string1 - WHY???? */
HOWEVER, you cannot assign a char * type of string to a char array type of string.
char *string1 = "Go Bucks";
char string2[11] = "Touchdown!";
string2 = string1; /* INVALID - WHY? */
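The compiler will refuse to compile, and will output a message such as:
error: incompatible types when assigning to type 'char[11]' from type 'char *'
The name of an array is not a variable that can be assigned to, so string2 cannot be made to refer to different memory; the characters must be copied into the array instead.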
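To get the characters of one string into a char array, they have to be copied one by one, which is what strcpy() does. The sketch below wraps strcpy() with a size check; the name copy_string() and its return convention are assumptions made for this example, not part of the standard library.

```c
#include <string.h>

/* Hypothetical helper: copies src into the char array dst, which has
   room for dst_size bytes. Returns 0 on success, or -1 (leaving dst
   unchanged) if src plus its '\0' terminator would not fit. */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    if (strlen(src) + 1 > dst_size) {
        return -1;          /* not enough room for the terminator */
    }
    strcpy(dst, src);       /* copies the chars and the '\0' */
    return 0;
}
```

With the declarations above, copy_string(string2, sizeof string2, string1) returns 0 and leaves "Go Bucks" in string2.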
Pointer arithmetic may be used with strings which are arrays of char (and with read-only strings which are char * and point to string literals, but only for reading):
char string1[8] = "quick";
char string2[5] = "dime";
if (*(string1 + 0) == 'q')
    *(string2 + 1) = 'o';
printf("%s\n", string2);
What prints or is there a seg fault?
dome is printed
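Pointer arithmetic is also a common way to walk through a whole string. As a sketch, the hypothetical helper count_char() below advances a pointer one char at a time until it reaches the null character. It only reads the chars, so it works on both read-only and read-write strings.

```c
#include <stddef.h>

/* Hypothetical helper: counts how many times c occurs in s by
   advancing a pointer one char at a time until it reaches '\0'. */
size_t count_char(const char *s, char c)
{
    const char *p;
    size_t count = 0;

    for (p = s; *p != '\0'; p++) {   /* p++ moves to the next char */
        if (*p == c) {
            count++;
        }
    }
    return count;
}
```

For example, count_char("container", 'n') returns 2.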
Normally, printf() requires that the values to be printed are passed by value.
◦ (%c, %i, %d, %f).
When working with strings, however, this is not the case:
char *string1 = "Bucks";
printf("%s\n", string1);
Since string1 is a pointer, printf() gets passed the address where the first char in string1 is stored, and it will output a string of characters starting with the character at that address and continuing through subsequent addresses, until it encounters a null character.
Notice that dereferencing of the pointer is not used here. This may seem surprising, but since we used a string format specifier (%s), printf() “knows” that we want to print a sequence of chars, and it expects a pointer to the first char, so the dereference operator is not necessary (and will likely produce a segmentation fault if used).
If we wanted to print only a single char from string1, THEN we must dereference so that we are passing by value: printf("%c", *string1); or printf("%c", string1[0]);
The standard library has two types of character operation functions which operate on individual characters.
These characters can be contained in strings, and usually are.
The first type of function is used to classify characters according to certain characteristics, and the second type of function is used to transform them to a related character.
ctype.h is the header file where these functions are prototyped.
Each of these functions takes an integer argument which contains an ASCII character value.
All these functions return an integer value which is used as a Boolean.
They are listed on the next slide.
You can get a clue about what the function does from its name, but you can look up the details if you think one of these may be useful in some software you are writing.
iscntrl()
isspace()
isdigit() (also isxdigit() for hexadecimal digits)
islower(), isupper()
isalpha()
isalnum()
ispunct()
isgraph()
isprint()
isblank()
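As a small sketch of how a classification function is typically used, the hypothetical helper count_digits() below applies isdigit() to each char of a string. The cast to unsigned char is a common precaution, because these functions expect a value that fits in an unsigned char (or EOF).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the decimal digits in the string s. */
size_t count_digits(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        /* isdigit() returns nonzero (true) for '0' through '9' */
        if (isdigit((unsigned char)*s)) {
            count++;
        }
        s++;
    }
    return count;
}
```

For example, count_digits("CSE 2421") returns 4.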
These functions are used to transform a character value, which is passed as an integer argument, to a related character value, which is returned as an integer.
int tolower(int ch); /* returns lower case if ch is upper case; otherwise returns ch unchanged */
int toupper(int ch); /* returns upper case if ch is lower case; otherwise returns ch unchanged */
From the manual page for these functions:
BUGS
The details of what constitutes an uppercase or lowercase letter depend on the current locale. For example, the default “C” locale does not know about umlauts, so no conversion is done for them.
In some non-English locales, there are lowercase letters with no corresponding uppercase equivalent; the German sharp s is one example.
So, no issue unless you might be programming an application for an international environment, but some of you will likely be doing so.
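A minimal sketch of a transformation function in use, assuming the default "C" locale: the hypothetical helper upcase() converts a string to upper case in place with toupper(). It writes to the chars, so it must only be called on a read-write string (a char array), never on a string literal.

```c
#include <ctype.h>

/* Hypothetical helper: converts every lower case letter in s to
   upper case, in place, and returns s. The string must be stored
   in read-write memory. */
char *upcase(char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++) {
        /* toupper() takes and returns an int, so cast both ways */
        *p = (char)toupper((unsigned char)*p);
    }
    return s;
}
```

For example, after char s[] = "Go Bucks!"; upcase(s); the array s holds "GO BUCKS!".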
strcat()       Appends a string
strchr()       Finds first occurrence of a given character
strcmp()       Compares two strings
strcasecmp()   Compares two strings, non-case sensitive
strcpy()       Copies one string to another
strlen()       Finds length of a string
strlwr()       Converts a string to lowercase
All these functions rely on strings terminating with a null char for error free operation.
Note: strcasecmp() is a POSIX function declared in strings.h, and strlwr() is a non-standard extension that is not available on every system.
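Two of these functions have return values that are easy to misread: strchr() returns a pointer (or NULL if the character is not found), and strcmp() returns 0 when the strings are equal. The sketch below wraps each one; the names index_of() and same_string() are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: returns the index of the first occurrence of
   c in s, or -1 if c does not occur. strchr() itself returns a
   pointer, so the index is the distance from the start of s. */
long index_of(const char *s, char c)
{
    const char *hit = strchr(s, c);

    return hit == NULL ? -1L : (long)(hit - s);
}

/* Hypothetical helper: returns 1 if a and b hold the same chars,
   and 0 otherwise. strcmp() returns 0 for equal strings. */
int same_string(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

For example, index_of("Go Bucks!", 'B') returns 3, and same_string("slop", "stop") returns 0.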
strncat()        Appends n characters of string
strncmp()        Compares n characters of two strings
strncpy()        Copies n characters of one string to another
strnset()        Sets n characters of string to a given character
strrchr()        Finds last occurrence of given character in string
strrev()         Reverses string
strset()         Sets all characters of string to a given character
strspn()         Finds length of the initial substring made up only of characters from a given character set
strstr(s1,s2)    Returns a pointer to the first occurrence of string s2 in string s1
strupr()         Converts string to uppercase
Note: strnset(), strrev(), strset() and strupr() are non-standard extensions and are not available on every system.
See http://man7.org/linux/man-pages/man3/string.3.html for a complete list or on stdlinux, type the shell command man string.
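One detail of strncpy() deserves a sketch: when the source string has n or more characters, strncpy() copies n characters and does not add a null character. The hypothetical helper bounded_copy() below shows one common way to make sure the destination is always terminated. It assumes size is at least 1.

```c
#include <string.h>

/* Hypothetical helper: copies at most size - 1 chars of src into
   dst and always terminates dst. strncpy() alone does NOT add a
   '\0' when src has size - 1 or more chars. size must be >= 1. */
void bounded_copy(char *dst, const char *src, size_t size)
{
    strncpy(dst, src, size - 1);   /* leave room for the terminator */
    dst[size - 1] = '\0';          /* terminate explicitly */
}
```

For example, with char buf[5]; the call bounded_copy(buf, "container", sizeof buf); leaves the string "cont" in buf.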