School of Computer Science
Third Year Undergraduate Papers
January Examinations 2021
1
Contents
30209LHAdvancedNetworking ……………. 3 30229 LH Machine Learning and Intelligent Data Analysis . . 7 32167LHNeuralComputation …………….. 9 30230 LH Programming Language Principles, Design, and
Implementation…………………. 12 30231LHSecurityofReal-WorldSystems . . . . . . . . . . . 15
2
30209 LH Advanced Networking
30209 LH Advanced Networking
Advanced Networking
Question 1
Ethernet was one of many candidates for local area networking, but is now the dominant technology. There is always some amount of luck and contingency in market success, but Ethernet’s dominance is now so total that it is likely there were also sound technical reasons.
(a) One of the technologies that Ethernet succeeded against was IBM Token Ring. How does Ethernet differ in operation from Token Ring? Why do you think that Ethernet proved more successful? Justify your answers. [4 marks]
(b) Ethernet has increased in bitrate from 3Mbps through 10, 100 and 1000 Mbps and is now pushing to 10Gbps, 40GBps and soon 100GBps. Explain how switching and full-duplex operation enabled this increase. Your answer should pay close attention to the practicality of collision detection in networks with bitrates in excess of 1Gbps.
[4 marks]
(c) Ethernet was originally very exposed: a passive observer could read all traffic, and an active attacker could send traffic posing as any other station. Explain how ethernet switches can be used to provide some protection against these attacks. How effective are these mechanisms, and to what extent are they useful? Justify your answer.
[12 marks]
2 3
Question 2
TCP is approaching its fortieth birthday. While modern implementations still interwork with older ones, many additional features have been added. These are largely in response to changes in the functionality and performance of underlying networks, and take advantage of the increased performance of modern computers.
For reference, the TCP header is shown here, taken from RFC793:
0123 01234567890123456789012345678901
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
— Source Port — Destination Port —
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
— Sequence Number —
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
— Acknowledgment Number —
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
— Data — —U—A—P—R—S—F—
— Offset— Reserved —R—C—S—S—Y—I—
— — —G—K—H—T—N—N—
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
— Checksum — Urgent Pointer —
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
— Options — Padding —
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
— data —
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
In this question, we are going to design a hypothetical “TCP version 2” which, which being recognisably TCP and therefore leveraging the extensive experience with this pro- tocol, can:
• fully utilise the performance available on high-speed, long-distance networks.
• replace UDP for fast “one question and one answer” protocols such as DNS and
SNMP.
• take advantage of modern computers’ increased memory and processing power as compared to 16-bit computers of the early 1980s.
• deliver high-quality streaming video and audio, including low-latency video confer- encing.
(a) Describe and justify the changes you would make to TCP in order to solve the problems listed above. You can change the header format, the state machine and
— Window — —
3 4
the typical libraries used to access network facilities. Your answer should include a new packet header format, showing your improvements, as well as some form of diagram to show the flow of packets in the areas you have changed. [12 marks]
(b) Explain the ways in which your new version of TCP will operate more effectively on modern networks. You can assume these networks have higher bitrates, lower loss and error rates and similar latency to the networks for which TCP was originally designed. [8 marks]
Marks will be awarded based upon:
• How well your proposals cover the four listed issues; • How technically plausible your solutions are;
• The clarity and precision of your answer.
4 5
Question 3
Wireless networks are increasingly common, and in many organisations have almost com- pletely replaced wired networks for client-device access.
(a) Explain the difference between WPA2 PSK (pre-shared key, also called WPA2 Per- sonal) and WPA2 Enterprise. What criteria should one use to inform the choice between these two systems? Why might some companies be tempted to use WPA2 PSK instead of WPA2 Enterprise? [6 marks]
(b) You are the network manager of a large shared office building which has many tenants from different organisations, all using a single shared wireless network secured with WPA2 PSK. You have been approached by one of the tenant organisations, who has concerns about the security of the wireless network, given the fact that everyone is using the same pre-shared key. They have asked you the following questions:
(i) Can other tenants within the building read the emails that we send and receive?
(ii) Can other tenants send emails that look like they come from our organisation?
(iii) Can other tenants see what URLs we visit when web browsing? What about the content of the web pages?
(iv) Can other tenants access the shared drive we have in the building (it is meant to be accessible only to members of our organisation).
Answer the questions as best you can. Mention any clarifications you would need to obtain, in order to give more precise answers. [4 marks]
(c) The tenant organisation that approached you suggests you should upgrade the whole network to WPA2 Enterprise. You have been asked to respond to the CEO of the tenant organisation, advising her of the best course of action.
Write a memo, of approximately half a page, briefing her on the issues. You should consider:
• What are the problems that using a single shared-key network for multiple enterprises will cause?
• What will need to be done to deploy WPA2 enterprise, and what will the costs be?
• What will be the benefits?
• Someone has suggested having both WPA2 PSK and WPA2 Enterprise at the same time. Could that work?
[10 marks]
5 6
30229 LH Machine Learning and Intelligent Data Analysis
30229 LH Machine Learning and Intelligent Data Analysis
Machine Learning and Intelligent Data Analysis
Question 1 Dimensionality Reduction
(a) Explain what is meant by “dimensionality reduction” and why it is sometimes nec- essary. [4 marks]
(b) Consider the following dataset of four sample points x(i)4i=1 with x(i) ∈ R2 ∀i: 4 1
X = 2 3 5 4
10
Explain how to calculate the principal components of this dataset, outlining each step and performing all calculations up to (but not including) the computation of eigenvectors and eigenvalues. [6 marks]
(c) What does principal component analysis (PCA) tell you about the nature of a multi- variate dataset? Explain how it can be used for dimensionality reduction? [4 marks]
(d) What are the limitations of PCA and what other dimensionality reduction techniques may be used instead? [2 marks]
(e) You are given a dataset consisting of 100 measurements, each of which has 10 variables. The eigenvalues of the covariance matrix are shown in the following table:
What can you say about the underlying nature of this dataset? [4 marks]
Question 2 Classification
(a) Consider the Soft Margin Support Vector Machine learnt in Lecture 4e. Consider also that C = 100 and that we are adopting a linear kernel, i.e., k(x(i),x(j)) = x(i)T x(j). Assume an illustrative binary classification problem with the following training ex- amples:
x(1) =(0.3,0.3)T,y(1) =1 x(2) =(0.6,0.6)T,y(2) =1 x(3) =(0.6,0.3)T,y(3) =−1 x(4) =(0.9,0.6)T,y(4) =−1
Which of the Lagrange multipliers below is(are) a plausible solution(s) for this prob- lem? Justify your answer.
Eigenvalue number
1
2
3
4
5
6
7
8
9
10
Eigenvalue
1382.0
508.4
187.0
68.8
25.3
9.3
3.4
1.3
0.46
0.17
2 7
(i) a(1) =0,a(2) =2,a(3) =2,a(4) =10 (ii) a(1) =0,a(2) =44,a(3) =22,a(4) =22
(iii) a(1) = 0, a(2) = 200, a(3) = 100, a(4) = 100
(b) Consider a binary classification problem where around 5% of the training examples are likely to have their labels incorrectly assigned (i.e., assigned as -1 when the true label was +1, and vice-versa). Which value of k for k-Nearest Neighbours is likely to be better suited for this problem: k = 1 or k = 3? Justify your answer.
[6 marks]
(c) Consider a binary classification problem where you wish to predict whether a piece of machinery is likely to contain a defect. For this problem, 0.5% of the training examples belong to the defective class, whereas 99.5% belong to the non-defective class. When adopting Na ̈ıve Bayes for this problem, the non-defective class may almost always be the predicted class, even when the true class is the defective class. Explain why and propose a method to alleviate this issue. [8 marks]
Question 3 Document Analysis
(a) In a small universe of five web pages, one page has a PageRank of 0.4. What does this tell us about this page? [2 marks]
(b) Compare and contrast the TF-IDF and word2vec approaches to document vectori- sation. You should explain the essential principles of each method, and highlight their respective advantages and disadvantages. [8 marks]
(c) One possible approach to searching a large linked set of documents is to combine a measure of document similarity such as TF-IDF similarity with a measure of a page’s importance such as that provided by PageRank. Suggest three ways in which this could be done and discuss the advantages and disadvantages of each of them.
[10 marks]
[6 marks]
3 8
Question 1
32167 LH Neural Computation
32167 LH Neural Computation
Neural Computation
Consider the neural network above where the units are defined for j ∈ {1, 2} as follows: zj =wj1x1+wj2x2+bj,
ezj z1 z2 pj = Q, whereQ=e +e .
The network is trained using stochastic gradient descent (i.e., minibatch size 1). For a training example with inputs x1,x2 ∈ R and correct output y ∈ {1,2}, the cost C is defined as
C = log(Q)−zy.
(a) How does the sum p1 + p2 relate to the inputs x1 and x2?
(b) Compute the partial derivatives ∂ C and ∂ C ∂z1 ∂z2
[4 marks]
[6 marks]
(c) Suppose that the gradient step increases the weight parameter w11, and that x1 > x2 > 0. What can you say about the correct output y ? Justify your answer.
[10 marks]
2 9
Question 2
Consider a set of two input-output pairs 1, 1, −1, −1. Let the loss function be the 22
least square and we wish to find a linear model x → x⊤w, where x⊤ is the transpose of x ∈ R2 and w = w1 ∈ R2. Then the objective function becomes
w2
11 2 1 2
J(w)=2 2 (1,2)w−1 +2 (−1,2)w+1 .
Let us build our prediction model by minimizing the above objective function.
(a) Simplify the objective function to the form of
J(w) = c1w12 + c2w2 + c3w1 + c4w2 + c5w1w2 + c6,
where ck ∈ R are coefficients, and wi is the i-th coordinate of w ∈ R2. After that,
compute the gradient of J(w) in terms of ck.
[8 marks]
(b) Consider gradient descent with the initial point w(0) = 31 and learning rate ε = 0.5. Write down the process of calculating w(1) and w(2). After that, calculate J(w(1)) and J(w(2)).
[6 marks]
(c) Consider gradient descent with momentum. Assume w(0) = 31, learning rate ε = 0.5 and momentum rate α = 0.5. Write down the process of calculating w(1) and w(2). After that, calculate J(w(1)) and J(w(2)).
[6 marks]
3 10
Question 3
A convolutional network is shown in the figure below. We use the nonlinear ReLU activa- tion function in the first layer and the linear activation function in the output layer. Note that for better visualisation, in the images we use white regions to denote 0 and darker regions to denote larger values.
(a) Design appropriate convolution kernels of size 3 × 3 for the first layer such that the feature maps 1 and 2 are these displayed in the figure. Justify your answer.
[5 marks]
(b) Design appropriate convolution kernels of size 3 × 3 for the output layer such that the output is that displayed in the figure. Justify your answer.
[5 marks]
(c) How many (free) parameters (weights) in the convolutional network do we need to train? Justify your answer.
[4 marks]
(d) In convolutional networks, apart from ReLU there exist other nonlinear activation functions such as Sigmoid and tanh. For the collection of neurons in a single layer, what would be the Jacobian matrices of the Sigmoid and tanh functions, respec- tively?
[6 marks]
4 11
30230 LH Programming Language Principles, Design, and Imple- mentation
Question 1
(a) Consider the following function, which we call M:
λf : B → B.λg : B → B.λx : B.λy : B.if x then f y else g y
(i) Prove (using a proof tree) that M is a well-typed expression of the Simply Typed λ-Calculus.
(ii) Provide examples of expressions F and G such that (M F G) is the exclusive or function (i.e., it returns true iff its two arguments are different).
(iii) The expression (M F G false true) has type B. Describe the safety guaran- tee we can deduce about this expression from the fact that it is a well-typed expression of the Simply Typed λ-Calculus.
(iv) Prove (using proof trees) that (M F G false true) computes to a value. [10 marks]
(b) Similar to the encoding of numbers in System F, we can define stacks as follows (we only define stacks of numbers here). The stack type is defined as follows:
• Stack=∀α.(N→α→α)→α→α
For example, the empty stack is s0 = λα.λf : N → α → α.λx : α.x; the stack s1 =λα.λf :N→α→α.λx:α.f nxisthestackthatcontainsonlythevalue n;andthestacks2 =λα.λf :N→α→α.λx:α.f n(f mx)isthestackthat contains the value m on top of the value n. More generally, a function of the form λα.λf : N → α → α.λx : α.f n1 (…(f nk x)…) encodes a stack where nk is stacked on top of nk−1, which is stacked on top of nk−2 etc., and where n1 is at the bottom of the stack. The push, peek, and pop operations are defined as follows:
• push=λn:N.λs:Stack.λα.λf :N→α→α.λx:α.s{α}f (f nx) • peek=λd:N.λs:Stack.s{N→N}GId,where
– G=λn:N.λg:N→N.λx:N.gn – I = λx : N.x
• pop=λs:Stack.λα.λf :N→α→α.λx:α.s{(α→α)→α}GCI,where
– G=λn:N.λg:(α→α)→α.λh:α→α.h(g(f n)) – C=λh:α→α.x
– I = λx : α.x
30230 LH Programming Language Principles, Design, and
Programming Language Principles, Design, and Implementation
Implementation
2 12
The function push pushes a number onto a stack, peek returns the last number pushed onto a stack, and pop removes the last element pushed onto a stack, i.e., the furthest one to the right. Note that peek takes a default value d as argument, which is returned when the stack is empty, i.e., (peek d s0) computes to d, while (peek d s1) computes to n, and (peek d s2) computes to m.
(i) Provide an example of a function of type Stack that encodes the stack where 1 is stacked on top of 0, which is stacked on top of another 0, which is at the bottom of the stack.
(ii) Prove (using a proof tree) that peek is a well-typed expression of System F.
(iii) Prove (using proof trees) that (peek d s2) computes to m, where d is an
expression of type N, and s2 is defined above.
(iv) Using the extension of System F with existential types we saw in the lectures, define a stack abstract datatype that hides this implementation and provides the following interface: the empty stack, push, peek, pop.
Question 2
Consider the following recursion combinator:
Y =λt.(λf.t(λz.ffz))(λf.t(λz.ffz))
(a) Express Y as an ASG.
[10 marks]
[5 marks]
(b) Explain, using an ASG abstract machine, how the following expressions are evaluated, given that the multiplication operator (∗) is a “shortcut” operator which does not evaluate its second argument if the first argument is 0.
(i) Y(λf.λx.0)1
(ii) Y(λf.λx.f(x)∗0)1
(iii) Y(λf.λx.0∗f(x))1
If an expression evaluates to a constant value show clearly the result and also the key intermediate steps in the ASG rewrite. If an expression diverges make a clear argument as to why it diverges and what the runtime behaviour will be. [10 marks]
(c) Furthermore:
(i) How would your answers above differ if multiplication ∗ always needed to eval- uate both arguments?
3 13
(ii) Mathematically, for any x, x ∗ 0 = 0 ∗ x = 0. Which one of the second and third expressions above can be optimised into the first expression? Explain your answer.
(iii) Can the first expression be optimised just to the constant value 0? Justify your answer.
[5 marks]
Hint: You may want to use a drawing tool to allow you to evaluate ASGs by copying-and-
pasting then editing rather than drawing every stage of the rewrite from scratch.
Question 3
Consider the labelled transition system (LTS) below:
(a) Below are several properties that could be verified on this LTS. For each one, state which classes of property it belongs to (e.g., invariant, safety, liveness), justifying your answer, and translate it into LTL. Assume that a and b are atomic propositions.
(i) a is true whenever b is true;
(ii) a and b are simultaneously true only a finite number of times;
(iii) every b is immediately followed by a;
(iv) exactly one of a or b (but not both) is eventually true;
(v) if a ever becomes true, then it remains true forever, and this is immediately preceded by a state where b was true.
[6 marks]
(b) Illustrate the LTL model checking procedure for verifying property (iii) from above against the LTS. [6 marks]
(c) Let us now relax property (iii) to “every a is followed by b within at most k steps”. Assume a new LTL operator ♦≤kψ, which states that ψ becomes true within k steps (in other words, for a trace σ = A0A1A2 …, we have σ |= ♦≤kψ if and only if AiAi+1Ai+2 … |= ψ for some i ≤ k). Write the new property in this extension of LTL and discuss how it could be model checked on the LTS above. Then discuss how general LTL model checking could be extended to include this new ♦≤k operator.
[8 marks]
s0 s1 s2 {a,b} {a,b}
4 14
30231 LH Security of Real-World Systems
30231 LH Security of Real-World Systems
Security of Real-World Systems
Question 1
Download the binary at: https://www.cs.bham.ac.uk/ ̃tpc/RWS2020/binary
(a) The binary includes a password check, what is the password? Describe how it is
checked, write no more than one paragraph. [5 marks]
(b) There is a possible buffer overflow attack against this binary, where is it? What
input will trigger it and what values can be overflowed? [5 marks]
(c) Describe the most common defences that stop buffer overflow attacks, and how (if at all) advanced attacks can be used to overcome these defences. Write no more
than 1 page.
Question 2
The station to station protocol runs as follows:
A→B : gx
B→A : gy,{SignB(gx,gy)}gxy A→B : {SignA(gx , gy )}gxy
[10 marks]
(a) What would be the effect (in terms of secrecy and authentication of A and B) of replacing the symmetric encryption with public key encryption for A and B, i.e.,
A→B : gx
B→A : gy , EA(SignB(gx , gy )) A→B : EB(SignA(gx,gy))
If there are any attacks write them in Alice and Bob notation and describe them.
[8 marks]
(b) How would you add “post compromise security” to this protocol? State the security goals you aim to achieve, write your protocol extension in Alice and Bob notation, and explain how it works.
[12 marks]
2 15
Question 3
A smartcard allows you to compute the RSA signature of a chosen value, i.e. given input x, it computes sigsk (f (x)), where f () is a deterministic, known padding function, and sk is the 2048-bit private key. The corresponding public key pk is also known.
In addition, the smartcard has a function to update the private key sk. This function does not require authentication to be used. More precisely, you can send the message update sk(i, v), which sets the i’th 16-bit word in the private key to the given 16-bit value v. The remainder of the key will not be changed.
(a) Why is the described key update function update sk () insecure, i.e. allows an adver- sary to recover the key? Briefly explain. Hint: We are concerned with confidentiality of the key, i.e. simply stating that updates are not authenticated will not be accepted.
[3 marks]
(b) Describe an algorithm that uses the weakness in update sk () to recover the complete secret key sk. [11 marks]
(c) How many invocations of sigsk () and update sk() are required in the worst case to recover the first 16-bit word of sk? [2 marks]
(d) How can the security issue that you identified be addressed? [4 marks]
3 16