4CCE1PHC
Files & Strings
October 18, 2021
1 Introduction
This lab introduces you to string manipulation techniques in C and how to use standard library functions to read from/write to text files.
At the end of the lab, you should be able to:
• Perform basic string manipulation.
• Read in and process a text file.
Before starting work you should read through this document and complete the preparatory tasks detailed
in §2.
2 Preparation
Materials
Make sure you have everything you need to complete this lab.
Ensure that you have the following software installed on your computer: (i) a terminal program (e.g., xterm or similar on Linux/Mac OS, cygwin on Windows), (ii) a text editor (e.g., gedit or similar on Linux/Mac OS, notepad++ on Windows), (iii) the GNU compiler collection (i.e., gcc).
Self-study
Read §3 below and read up on any new material (in textbooks and/or online) that you need in order to accomplish the tasks. Make a note of where you found the information, so that you can quickly return to it if something does not work as expected.
Answer the following questions:
1. Look up the definitions for FILE, fopen, fclose, fgetc and EOF and describe their purpose.
• FILE is a special data type that identifies a stream to a file (i.e., it allows you to read from/write to files on your computer),
• fopen opens a file for reading/writing,
• fclose closes a file that was previously opened with fopen,
• fgetc gets the next character from a file, and
• EOF is a special value (a negative integer constant) that fgetc returns to indicate the end of a file or a read error.
2. Which header file do you need to include to use the functions isalpha() and toupper()? What do they do?
isalpha() and toupper() are declared in the header ctype.h.
• isalpha() checks whether the character passed to it is alphabetic (i.e., a letter, not a digit or punctuation mark).
• toupper() converts lowercase letters to uppercase ones.
Dr Matthew Howard, Department of Engineering, © 2020 King’s College London

#include <stdio.h>  /* Include the stdio library, for printing, I/O. */
#include <string.h> /* Include the string library, for handling strings. */
#include <errno.h>  /* Optional: Include library for handling errors and displaying messages. */

#define SUCCESS 0   /* Define what to return if program runs successfully ... */
#define FAILURE -1  /* ... and what to return if not. */

int main(void) {

  char filename[] = "test.txt"; /* Assume the text file is called test.txt, and it is in the same directory the program is in. */

  int return_value; /* Declare a variable to hold the return value. */

  int ch = 0; /* Declare and initialise ch, of type int, for holding characters. */

  FILE * file_p; /* Declare file pointer for accessing the file. */

  file_p = fopen(filename, "r"); /* Open the file. */
  if(file_p == NULL) /* Check if there was an error in opening the file (e.g., file not found, in which case, fopen will return the NULL pointer). */
  {
    printf("%d %s\n", errno, strerror(errno)); /* Use the errno library to print an error message. */
    return_value = FAILURE; /* Set the return value to signal failure. */
  }
  else /* Otherwise, if we opened the file successfully ... */
  {
    ch = fgetc(file_p); /* Get the first character from the file. */
    while(ch != EOF) /* Loop, until we reach the end of file special symbol. */
    {
      printf("%c", ch); /* Print the current character to stdout. */
      ch = fgetc(file_p); /* Get the next character from the file. */
    }
    return_value = SUCCESS; /* Set the return value to signal success. */
    fclose(file_p); /* Close the file, when done. Only a successfully opened file may be closed. */
  }

  return return_value; /* Return the return_value to indicate success or failure. */
}
Listing 1: Sample program for §3.1.
3. Characters in C are represented in ASCII format. What value is stored in memory to represent the character A? 65.
4. Draft a short program to open a text file for reading and display its contents on stdout. See sample code in Listing 1.
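As a quick check of the answers to questions 2 and 3, the following sketch wraps isalpha() and toupper() in a small helper. The helper name shout() is hypothetical and not part of the lab code. The cast to unsigned char is a common precaution, since passing a negative char value to the ctype.h functions is undefined.

```c
#include <ctype.h>  /* isalpha(), toupper() */
#include <stddef.h> /* size_t */

/* Hypothetical helper (not part of the lab code): converts every
 * alphabetic character of s to uppercase, in place, and returns
 * the number of alphabetic characters found. */
size_t shout(char * s)
{
  size_t letters = 0; /* Number of letters seen so far. */
  size_t i;

  for(i = 0; s[i] != '\0'; i++)
  {
    unsigned char c = (unsigned char)s[i]; /* Avoid negative values. */

    if(isalpha(c)) /* Digits, spaces and punctuation are skipped. */
    {
      s[i] = (char)toupper(c);
      letters++;
    }
  }

  return letters;
}
```

For example, applying shout() to a writable array holding "dna 4u" leaves "DNA 4U" in the array and returns 4. On an ASCII system the character 'A' compares equal to the integer 65, which is the value asked for in question 3.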
3 Laboratory Work
3.1 Reading from files
1. Take your code from §2 and produce a program which reads in a text file and prints the contents to stdout. Create a test file in the same folder as your program and name it something like test.txt. Test whether your program successfully prints out the text of the test file. Test what happens if the test file is not in the same folder.
2. Add a function in your program called words() which takes a filename and calculates the number of words in the text file.
Hint: Remember that in the English language, words can be delimited by spaces and punctuation. See sample code in Listing 2.
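The hint about punctuation can be sketched as follows. This is only one possible approach and rests on an assumption that is not in the lab sheet: a word is taken to be a maximal run of alphabetic characters, so anything else (spaces, digits, punctuation, newlines) acts as a delimiter. The function name count_words() is hypothetical, and it takes an already opened stream rather than a filename.

```c
#include <ctype.h> /* isalpha() */
#include <stdio.h> /* FILE, fgetc(), EOF */

/* Hypothetical helper: counts the words in an already opened stream.
 * Assumption: a word is a maximal run of alphabetic characters. */
int count_words(FILE * file_p)
{
  int wc = 0;      /* Word count so far. */
  int in_word = 0; /* 1 while we are inside a word, 0 otherwise. */
  int ch;          /* int, so that EOF can be told apart from characters. */

  while((ch = fgetc(file_p)) != EOF)
  {
    if(isalpha(ch))
    {
      if(!in_word)
      {
        wc++; /* A new word starts at the first letter of a run. */
      }
      in_word = 1;
    }
    else
    {
      in_word = 0; /* Any non-letter ends the current word. */
    }
  }

  return wc;
}
```

Under this definition repeated spaces do not inflate the count and an empty file contains zero words. A contraction such as "It's" is counted as two words, which may or may not be what you want.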
3.2 String Similarity
Many interesting problems require the comparison of different (sets of) strings, from DNA analysis (e.g., testing whether two strings of DNA are similar) to document classification (e.g., spam detection and filtering). In this part of the lab, you will explore some simple techniques for this in the context of matching DNA fragments containing the bases 'A', 'C', 'G', 'T' and 'U'. Do the following.
1. Extend your program from §3.1 to include a function called contains() which takes a string and a 5-element array as arguments and checks whether the letters ’A’, ’C’, ’G’, ’T’, ’U’ appear in the string. If a letter appears, the corresponding element of the array should be set to 1, otherwise the element should be set to zero (so, for example, if the string is “ACCG”, the array should contain the elements {1,1,1,0,0}). Test your function on the strings “AAAA”, “ACTU” and “GGUU”, and verify that it works as expected. Sample code implementing the contains() function is given in Listing 3.
2. Use your array.h library from the Arrays & Pointers lab to create a function d_euclid() that takes a pair of arrays, as computed by the contains() function, treats them as vectors, and computes the (Euclidean) distance between them, i.e.,
$$d = \sqrt{(a - b)^{\top}(a - b)}. \tag{1}$$
To test your function, use it to compute the similarity (i.e., distance) between the following pairs of strings: “AAAA” and “AAAA”, “AAAA” and “AAAC”, “AAAA” and “GGGG”. Verify that it works as expected. Sample code implementing the d_euclid() function is given in Listing 4. This implementation can be tested by calling the following in main():
int a[5], b[5];      /* Set up arrays to hold data about bases. */
contains(a, "AAAA"); /* Determine which bases are in string a. */
contains(b, "AAAA"); /* Determine which bases are in string b. */
printf("Similarity between string a and string b = %f\n",
       d_euclid(a, b)); /* Display the similarity score. */

contains(a, "AAAA"); /* We can reuse the arrays a[] and b[] */
contains(b, "AAAC"); /* for these tests too ... */
printf("Similarity between string a and string b = %f\n",
       d_euclid(a, b));

contains(a, "AAAA");
contains(b, "GGGG");
printf("Similarity between string a and string b = %f\n",
       d_euclid(a, b));
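As a sanity check on what the three test calls should print, equation (1) can be evaluated by hand. This assumes the element ordering (A, C, G, T, U) used by contains(), so that “AAAA” maps to (1,0,0,0,0), “AAAC” to (1,1,0,0,0) and “GGGG” to (0,0,1,0,0):

```latex
\begin{align*}
d(\text{AAAA},\text{AAAA}) &= \sqrt{0^2 + 0^2 + 0^2 + 0^2 + 0^2} = 0, \\
d(\text{AAAA},\text{AAAC}) &= \sqrt{(1-1)^2 + (0-1)^2 + 0^2 + 0^2 + 0^2} = 1, \\
d(\text{AAAA},\text{GGGG}) &= \sqrt{(1-0)^2 + 0^2 + (0-1)^2 + 0^2 + 0^2} = \sqrt{2} \approx 1.414.
\end{align*}
```

Note that the distance depends only on which bases are present, not on how often or where they occur, so two different strings can still have a distance of zero.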

#include <stdio.h>  /* Include the stdio library, for printing, I/O. */
#include <string.h> /* Include the string library, for handling strings. */
#include <errno.h>  /* Optional: Include library for handling errors */

#define SUCCESS 0   /* Define what to return if program runs successfully ... */
#define FAILURE -1  /* ... and what to return if not. */

/* Function to count number of words in a file based purely on
 * the number of spaces in the text. */
int words(const char * filename)
{

  int return_value; /* Declare a variable to hold the return value. */

  int ch = 0; /* Declare and initialise ch, of type int, for holding characters. */

  int wc = 0; /* Declare and initialise wc, of type int, for holding the word count. */

  FILE * file_p; /* Declare file pointer for accessing the file. */

  file_p = fopen(filename, "r"); /* Open the file. */
  if(file_p == NULL) /* Check if there was an error in opening the file (e.g., file not found, in which case, fopen will return the NULL pointer). */
  {
    printf("%d %s\n", errno, strerror(errno)); /* Use the errno library to print an error message. */
    return_value = FAILURE; /* Set the return value to signal failure. */
  }
  else /* Otherwise, if we opened the file successfully ... */
  {
    wc = 1; /* Assume at least one word (until we find our first space). */

    do /* Loop, until we reach the end of file special symbol. */
    {
      ch = fgetc(file_p); /* Get the next character from the file. */
      if(ch == ' ') /* If the character is a space, then add one to the word count. */
      {
        wc++;
      }
    }
    while(ch != EOF);

    return_value = wc; /* Set the return value to the word count. */

    fclose(file_p); /* Close the file, when done. Only a successfully opened file may be closed. */
  }

  return return_value; /* Return the return_value (either the number of words, if successful, or the failure code if not). */
}

int main(void) {

  char filename[] = "test.txt"; /* Assume the text file is called test.txt, and it is in the same directory the program is in. */

  int wc = words(filename); /* Call the function to count the words. */
  if(wc > 0) /* If the count is positive (i.e., there were no errors), print the number of words. */
  {
    printf("Number of words in %s = %d\n", filename, wc);
  }
  else
  {
    printf("Couldn't determine the number of words in %s.\n", filename);
  }

  return SUCCESS;
}
Listing 2: Sample program for counting the words in a text file.

/* Function to test if a string contains one of the characters
 * A, C, G, T or U. */
void contains(int a[5], const char * dna)
{
  int i; /* Declare some temporary storage space for the loop counter. */

  for(i = 0; i < 5; i++) /* Initialise a[] to make sure all elements are zero to begin with. */
  {
    a[i] = 0; /* Set the ith element of a[] to zero. */
  }

  i = 0; /* As we are reusing i in the while loop, we need to re-initialise it to zero. */

  /* The length of the dna string is not specified, so we iterate through
   * each character until we reach the string termination character.
   * Checking before the first access also makes an empty string safe. */
  while(dna[i] != '\0')
  {
    if(dna[i] == 'A') {      /* If the ith character of the DNA */
      a[0] = 1;              /* string is 'A', set a[0] to one. */
    }
    else if(dna[i] == 'C') { /* Or if the ith character of the */
      a[1] = 1;              /* string is 'C', set a[1] to one. */
    }
    else if(dna[i] == 'G') { /* ... and so on for G, T and U ... */
      a[2] = 1;
    }
    else if(dna[i] == 'T') {
      a[3] = 1;
    }
    else if(dna[i] == 'U') {
      a[4] = 1;
    }
    i++; /* Increment counter so we move to the next character of the string. */
  }

  return;
}
Listing 3: Sample contains() function.

#include <math.h> /* Needed for pow() and sqrt(). */

/* Function to find the Euclidean distance between two integer
 * arrays of length 5. */
float d_euclid(const int a[5], const int b[5])
{
  int i; /* Declare some temporary storage space for the loop counter. */
  float d = 0; /* Declare a floating point variable for storing the calculated distance. Initialise it to zero. */

  for(i = 0; i < 5; i++) /* Iterate through each element of the arrays. */
  {
    d = d + pow(a[i] - b[i], 2); /* Add the square of the difference of the ith elements. */
  }

  return sqrt(d); /* Take the square root of the result and return it. */
}
Listing 4: Sample d_euclid() function.

3. Consider the case that you are given DNA fragments from two people. The fragment from person A is "ACCG" and that from person B is "GTUU". Moreover, you have a fragment "ATUU" from an unknown source. Use your program to determine whose DNA is the closest match to the unknown fragment. A sample of what to call in main() is the following:
int person_a[5], /* Set up arrays to hold the data */
    person_b[5], /* about bases contained in the */
    fragment[5]; /* DNA fragments. */

contains(person_a, "ACCG"); /* Determine which bases are in */
contains(person_b, "GTUU"); /* the three fragments. */
contains(fragment, "ATUU");

float d_a = d_euclid(person_a, fragment), /* Compute similarity */
      d_b = d_euclid(person_b, fragment); /* scores. */

if(d_a < d_b) /* A smaller distance to person A means a closer match to A. */
{
  printf("Fragment is more similar to DNA from person A.\n");
}
else if(d_a > d_b) /* Otherwise, it could be from person B. */
{
  printf("Fragment is more similar to DNA from person B.\n");
}
else /* Don't forget the possibility that they are equally similar! */
{
  printf("Fragment is equally similar to DNA from both people.\n");
}
4 Optional Additional Work
4.1 Simple Translator
Adapt your program from §3.2 to work with the full 26 letters of the English alphabet. Add a function called hamming() that computes the Hamming distance between two strings.¹ Use your program to detect which of the following words from different languages means 'brother' in English:
“ama”, “bror”, “madre”, “brudder”, “birodar”, “brawd”, “matka”
4.2
Adapt your program from §3.1 to encrypt your text file using the Caesar cipher.² Your program should take the cipher key (i.e., offset) as an input, and write a new file containing the encrypted version.
¹ https://en.wikipedia.org/wiki/Hamming_distance
² https://en.wikipedia.org/wiki/Caesar_cipher
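For §4.1, a hamming() function might be sketched as follows. The Hamming distance is strictly defined only for strings of equal length, while the candidate words differ in length, so this sketch makes an assumption that is not in the lab sheet: every position beyond the end of the shorter string counts as one mismatch. The signature is also an assumption.

```c
#include <stddef.h> /* size_t */

/* Sketch of hamming(): counts the positions at which s and t differ.
 * Assumption (not from the lab sheet): characters beyond the end of
 * the shorter string each count as one mismatch. */
int hamming(const char * s, const char * t)
{
  int d = 0;    /* Number of mismatches found so far. */
  size_t i = 0; /* Index into the part common to both strings. */
  size_t j;     /* Index into the leftover part of the longer string. */

  while(s[i] != '\0' && t[i] != '\0')
  {
    if(s[i] != t[i])
    {
      d++;
    }
    i++;
  }

  /* At most one of these loops runs: it counts the extra characters
   * of whichever string is longer. */
  for(j = i; s[j] != '\0'; j++)
  {
    d++;
  }
  for(j = i; t[j] != '\0'; j++)
  {
    d++;
  }

  return d;
}
```

With this definition, hamming("karolin", "kathrin") returns 3 and identical strings give 0. The comparison is case-sensitive, so you may want to normalise case first (e.g., with toupper()).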
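The per-character step of the encryption task in §4.2 could look like the sketch below. The function name caesar_shift() and the choice to leave non-letters unchanged are assumptions, and the arithmetic relies on the letters being contiguous, as they are in ASCII. The argument is expected to be a value as returned by fgetc().

```c
#include <ctype.h> /* isupper(), islower() */

/* Hypothetical helper: shifts one letter by key positions in the
 * alphabet, wrapping around at the end. Non-letters are returned
 * unchanged. Assumes an ASCII-like contiguous alphabet. */
int caesar_shift(int ch, int key)
{
  int k = ((key % 26) + 26) % 26; /* Normalise the key to 0..25, so negative keys work too. */

  if(isupper(ch))
  {
    return 'A' + (ch - 'A' + k) % 26;
  }
  if(islower(ch))
  {
    return 'a' + (ch - 'a' + k) % 26;
  }

  return ch; /* Digits, spaces and punctuation pass through. */
}
```

To encrypt a whole file, one option is to read characters with fgetc() as in Listing 1 and write each shifted character with fputc() to a second stream opened with mode "w". Shifting by the negated key undoes the encryption.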