Question 2 [15 marks]
A tech company has just invented an AI algorithm for tracking engagement during Zoom sessions. This
is hypothetical of course, though it is not too far fetched to have such feature in the near future. In
this question, we are interested in student and instructor engagement during CS lectures. Engagement
is measured by a complicated function. Within the specific context of a CS course, engagement
calculation takes into account how often students ask questions, the relevance of chat messages to
lecture content, the professor’s voice tone, poll response rate, number of attendees, and the facial
expressions of those with video on. The function finds the engagement score over a time period (not
at a single point in time) and returns a value between 0 and 100 for each time period.
Each time period is represented as an integer t rather than an interval or tuple. For example, t1
refers to the first, i.e. earliest, time period measured in that lecture (which could be minute 1, the
first 180 seconds, from 00:00:00 to 00:04:59, or the first half of the lecture). Analogously, t2 would be
the second time period measured in that lecture: minute 2, the second 180 seconds, from 00:05:00 to
00:09:59, or the second half of the lecture. Since a lecture is made up of a bunch of time periods, we
add the subscript i to indicate an ordering relationship between periods. A relationship between ti
and tj where i < j indicates that time period ti is temporally earlier than tj and there might be
other periods between them. You do not need to worry about the specific units of the time periods,
just that the periods are consecutive, are represented as integers, do not overlap, and are measured in
a consistent way within a single instance of the ADT.
Let L be the ADT storing the time periods (and any other necessary data) for a single lecture. We
wish to implement a data-structure that supports the following operations over L:
� Engagement(L,t): Given a single time period t in L, return the engagement score for that
time period.
� AverageEngagement(L,ti,tj): Given two time periods ti and tj from L where i ≤ j, return
the average engagement score (per time period) for that interval. This average should include
both endpoint time periods in the interval if i 6= j. For example, if t1 has engagement score
50, t2 has engagement score 67, t3 has 80, and t4 has 40, then the average engagement over the
interval t2 to t4 is 62.33.
� Update(L,t,e): Update the engagement score for time period t to e. You can assume that t
is already in L.
Your implementation will be based on an augmented AVL tree and you should make the following
assumptions:
1. n is the number of time periods in a zoom lecture. A lecture is broken up into a collection of time
periods t1, t2, ...., tn, all stored in L. Your algorithm must work for any size n. What t means as
a time period will always be consistent within a single L.
2. A tree L already exists for each lecture. (L.root is the root of the AVL tree.)
3. A node already exists for every time period ti in L where 1 ≤ i ≤ n. Each node has the
associated engagement score so you do not need to implement or use Insert.
4. The nodes in L have the in-order traversal t1, t2, ...., tn.
5. All BST and AVL operations covered in class can be used
Here is what you need to do.
(a) Describe any additional information that you will need to store in your AVL tree to implement the
ADT operations in worst-case running time of O log(n). To get full marks, your data structure
must not waste space by storing unncessary data.
1
(b) - (d) For each of the three operations (Engagement, AverageEngagement, and Update), explain the
algorithm used to implement the operation (in plain English) and justify that the operation
runs in the required worst-case time. For AverageEngagement, provide pseudocode as part
of your explanation.
(e) Assume that we drop assumption 3. That means L could have some of the ti for 1 ≤ i ≤ n but not
necessarily all. For example, if L only has the following time periods: t2 with engagement score 67,
t4 with engagement score 40, t5 with engagement score 50, and t7 with engagement score 80, then
the average engagement for the interval t2 to t7 is 59.25. The average here is the sum of period
scores that we have in the interval / the number of periods (again, that we have in the interval).
There is no need for interpolation or estimating the missing data. Describe briefly what you
would have to do and what extra information you have to store so that AverageEngagement
will calculate the average within the worst case asymptotic bound specified earlier. Describe any
changes you would make to the algorithm for AverageEngagement. You do not need to write
pseudocode.
2