Strings in C
CSE 2421
Required reading: Pointers on C, Chapter 9 through 9.4, 9.11 and 9.12
Strings in C – Intro
ANSI C does not have a string data type.
In ANSI C, a string is stored in memory as a sequence of ASCII characters held in an array of char (i.e., a character array).
The individual characters will always be stored in contiguous bytes in memory.
When you use string library functions in ANSI C, include the header file string.h, which contains the prototypes and declarations those functions need, in every .c file that calls them. (Passing string literals to printf() or scanf() requires only stdio.h.)
Strings in C
Note again how the strings are stored here:
Strings in C – Intro
There are two different kinds of data “strings” in C:
1. Strings which are stored as string literals (“read-only” strings): anything within double quotes that is not used to initialize a char array variable:
"%s %s\n" /* string literals in scanf() or printf() */
"The value of x is %d and value of y is %4.2f\n"
char *title = "A Tale of Two Cities"; /* string literal whose address is assigned to a pointer */
/* the pointer is allocated space as a variable; the string is not */
2. Strings which are arrays of characters (read-write strings): anything declared within your program as an array of data type char, where the binary values in each byte are interpreted as ASCII characters:
char title[] = "Top Hat"; /* string declared as a single string */
char title[8] = {'T','o','p',' ','H','a','t','\0'}; /* string declared as a series of ASCII chars */
Both string literals and char array strings must be terminated with a null character (‘\0’), which is not the same thing as the NULL pointer constant. The compiler inserts the null terminator when we define either type of string with double quotes, but not when we define it ASCII char by ASCII char. In the latter case, we must explicitly add the null character at the end of the string.
Example declarations
Pointers are closely related to strings in C and, whenever we use a pointer to a string, we must be careful to distinguish between the two types of strings.
char *string1 = "Go Bucks!"; /* string1 can change, but "Go Bucks!" can't */
char string2[10] = "Go Bucks!"; /* string2 can't change, but "Go Bucks!" can */
string1:
The individual chars in string1 make up a string literal (a string constant).
string1 (the identifier) is a pointer to char and it is a variable (string1 can point to any string, not just “Go Bucks!”).
“Go Bucks!” will be stored in read-only memory on most systems; modifying a string literal is undefined behavior.
string2:
The individual chars in string2 are variables (we can change the chars in the string so the elements in string2 can be treated as they are in other arrays).
string2 (the identifier) acts as a constant pointer to char (just as the name of any array in ANSI C normally behaves like a constant pointer to its first element); it cannot be made to refer to different storage.
“Go Bucks!” is stored in the (up to) 10 elements of the string2[] array which is Read-Write memory
In either case (RO or RW memory), each individual character is stored as the ASCII value of the corresponding letter, digit, or symbol, and the string ends with a null character (‘\0’).
Examples
Consider this code:
char *string1 = "warehorse";
char string2[] = "conteiner";
string1[6] = 'u'; /* Invalid – chars in a string literal, i.e.,
                     a read-only string, cannot be changed –
                     a segmentation fault typically occurs */
string1 = "warehouse"; /* Valid – a pointer to char can be made to
                          point to a different string; the string
                          which string1 points to is still a
                          read-only string, however. The address of
                          "warehorse" is lost in this case. */
string2[4] = 'a'; /* Valid – any ASCII char in a char array string
                     can be changed – they are variables */
String Basics
A string is a sequence of zero or more characters which must be terminated by a null character (NULL byte), ‘\0’; the ASCII character code for ‘\0’ has the value 0. Thus, no string can contain this character except at its end.
The null character termination is one of the fundamental differences between arrays of characters (e.g. strings) and other arrays.
A string that contains nothing, “”, will actually contain one character: the NULL byte (‘\0’).
String Basics
The null character termination is important, because library functions such as printf() and scanf(), among many others, use it to mark and determine where the string ends (i.e. the length of the string).
Because of the null character termination, we do not have to pass the size or length of a character string to functions, even though we must do so for all other array types and for char arrays that do not contain ASCII strings.
If the null character termination is not present, various library functions that manipulate strings will not behave correctly. In such cases, you will likely get segmentation faults when your code runs.
When you declare read only or read-write strings using “. . .”, the compiler stores ASCII representations of each character in your string and will add the null character termination at the end.
Null Character Termination Example
Example:
#include <stdio.h>

int main(void) {
    char string[] = {'e', 'x', 'a', 'm', '\0', 'p', 'l', 'e'};
    printf("%s\n", string);
    return 0;
}
What will be printed?
exam
The string appears to consist of 8 characters, but, because printf() treats the null character (‘\0’) as the end of the string, the final three characters that were stored in the char array, when it was initialized, are not printed by printf().
String Length and ‘\0’
The length of a string is the number of characters it contains not including the terminating null character, if one is present (it always should be).
Thus, when you use an array of characters as a string, you must make the size of the array one greater than the string length, or the maximum number of characters which the string can contain before the null character termination.
Example:
char label[10] = {'c', 'o', 'n', 't', 'a', 'i', 'n', 'e', 'r'};
a string stored in label can have a maximum length of 9, so that there is space
for the null character termination.
Note that since we only defined 9 ASCII characters above, the compiler initializes all subsequent array elements to zero (i.e., the null character). However, if you do not leave space for the null character, it cannot be added. This will lead to errors.
Another Example
The following two declarations are equivalent:
char label[10] = {'c', 'o', 'n', 't', 'a', 'i', 'n', 'e', 'r'};
char label[10] = "container"; /* More usual declaration */
In both cases, the array ends up holding the nine characters followed by a terminating null character: the string literal supplies it, and the brace-enclosed list leaves the tenth element zero-initialized. This works only as long as the array is large enough to hold the characters plus the terminator.
Also notice that a character enclosed in single quotation marks is treated as char, whereas zero or more characters enclosed in double quotation marks are treated as a string (a sequence of contiguous chars).
Review the slides on I/O in C to see how to read strings from input with scanf() if needed.
Dealing with Individual Chars in Strings
Since strings are stored as arrays in memory, individual chars can be accessed using an index:
char *string1 = "quit"; /* string1 can change, but "quit" can't */
char string2[12] = "slop"; /* string2 can't change, but "slop" can */
if (string1[0] == 'q') {
    string2[1] = 't'; /* string2 changes from "slop" to "stop" */
}
Remember: Since string1 points to a read-only string, it is valid to read individual chars in the string, but you cannot write to them. If you attempt to write to individual char elements of a string which is in read-only memory, you will likely get a segmentation fault.
Assignments with strings
You can assign a read-write string (a char array) to a char * variable; the pointer then refers to read-write memory, so the string can be modified through it. (Remember, a char * is just a pointer, and in this case it points to a string in read-write RAM.)
Example:
char *string1;
char string2[] = "Go Ducks!";
string1 = string2;
string1[3] = 'B'; /* No seg fault, because string1 */
                  /* points to a read-write string */
printf("string1 is: %s\n", string1);
Assignments with strings
HOWEVER, you cannot assign a char * type of string to a char array type of string.
char *string1 = "Go Bucks";
char string2[11] = "Touchdown!";
string2 = string1; /* INVALID – WHY? */
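The compiler will refuse to compile, because an array name is not a modifiable lvalue and cannot appear on the left side of an assignment. Depending on the compiler and version, it will output a message such as:
error: incompatible types when assigning to type ‘char[11]’ from type ‘char *’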
Pointer Arithmetic with Strings
Pointer arithmetic may be used with strings which are arrays of char (and also with read-only strings which are char * and point to string literals, but only for reading):
char string1[8] = "quick";
char string2[5] = "dime";
if (*(string1 + 0) == 'q')
    *(string2 + 0) = 't';
printf("%s\n", string2);
What prints or is there a seg fault?
“time” is printed. There is no seg fault, because string2 is a read-write char array.
A note on printf() and strings
Normally, printf() requires that the values to be printed are passed by value (%c, %i, %d, %f).
When working with strings, however, this is not the case:
char *string1 = "Bucks";
printf("%s\n", string1);
Since string1 is a pointer, printf() gets the address where the first char in string1 is stored, and it will output a string of characters starting with the character at that address, until it encounters a NULL character.
Notice that the pointer is not dereferenced here. This may seem surprising, but since we used the string format specifier (%s), printf() “knows” that we want to print a sequence of chars, and it expects a pointer to the first char. The dereference operator is therefore not necessary; using it would pass a single char where a pointer is expected, which is undefined behavior and will likely cause a segmentation fault.
If we wanted to print only a single char from string1, THEN we must dereference so that we are passing the char by value: printf("%c", *string1); or printf("%c", string1[0]);
Character Operations
The standard library has two types of character operation functions which operate on individual characters.
These characters can be contained in strings, and usually are.
The first type of function is used to classify characters according to certain characteristics, and the second type of function is used to transform them to a related character.
ctype.h is the header file where these functions are prototyped.
Character Classification
Each of these functions takes an integer argument which contains an ASCII character value.
All of these functions return an integer value which is used as a Boolean.
They are listed on the next slide.
You can get a clue about what the function does from its name, but you can look up the details if you think one of these may be useful in some software you are writing.
Character Classification Functions
iscntrl()
isspace()
isdigit() (also isxdigit() for hexadecimal digits)
islower(), isupper()
isalpha()
isalnum()
ispunct()
isgraph()
isprint()
Character Modification Functions
These functions are used to transform a character value, which is passed as an integer argument, to a related character value, which is returned as an integer.
int tolower(int ch); /* returns lower case if ch is upper case; otherwise returns ch unchanged */
int toupper(int ch); /* returns upper case if ch is lower case; otherwise returns ch unchanged */
String Library Functions in string.h
strcat() Appends one string to the end of another
strchr() Finds first occurrence of a given character
strcmp() Compares two strings
strcasecmp() Compares two strings, ignoring case (POSIX; declared in strings.h)
strcpy() Copies one string to another
strlen() Finds length of a string
strlwr() Converts a string to lowercase (non-standard; not available in all C libraries)
All of these functions rely on strings terminating with a null character for error-free operation.
String Library Functions in string.h cont.
strncat() Appends up to n characters of one string to another
strncmp() Compares up to n characters of two strings
strncpy() Copies up to n characters of one string to another (the result is not null-terminated if the source has n or more characters)
strnset() Sets n characters of string to a given character (non-standard)
strrchr() Finds last occurrence of given character in string
strrev() Reverses string (non-standard)
strset() Sets all characters of string to a given character (non-standard)
strspn() Finds the length of the initial part of a string made up only of characters from a given set
strstr(s1,s2) Returns a pointer to the first occurrence of string s2 in string s1
strupr() Converts string to uppercase (non-standard)
See http://man7.org/linux/man-pages/man3/string.3.html for a complete list