MGTS7526 Assignment – Risk Modelling Assignment Sheet
1. Harry goes driving (10 marks)
Harry is going to go for a drive and wants to know the risk in doing so. He is particularly concerned about the driving habits of other drivers.
He looks out the window and sees that it is raining. He knows that this makes a car accident more likely, because weather is a known contributor to the incidence of car crashes: more crashes are expected when it rains than when it is fine, because of the slippery conditions that ensue.
Based on annual statistics, Harry knows that the chance that it will rain on any given day is 1/20. Harry is also aware of key statistics on car accidents: he knows that another factor in the risk of a car accident is whether a driver on the road is a careful driver (drives to the conditions) or a risky driver (drives quickly regardless of the conditions). Most drivers are careful; only 10% are risky.
Harry also knows (because he is a careful driver himself) that the probability of a careful driver having a car accident in dry conditions (i.e. not raining) is 2%, whereas under the same conditions the probability of a risky driver having an accident is 9%. When it rains, the chance of a careful driver having an accident increases to 1/20, whereas the chance of a risky driver having an accident when it rains is four times higher than for a careful driver.
Finally, Harry knows that he can turn on the radio and listen to the traffic report, which is accurate in reporting a traffic accident (when there is one) 95% of the time, and only reports a traffic accident (when there isn’t one) 5% of the time. Harry does turn on the radio and the traffic report states that there has indeed been a traffic accident.
Construct a Bayesian network to show the probability that there are risky drivers on the road given what Harry knows. Explain your Bayesian network and how you obtained your answer.
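A hand calculation for a network like this can be checked with inference by enumeration. The sketch below is only one possible reading of the question, and it rests on assumptions that the text does not state: Rain and the driver type are independent parents of an Accident node, Accident is the only parent of the radio Report node, and a single "other driver" stands in for the drivers on the road. The function and variable names are illustrative.

```python
# Priors taken from the question text.
P_RAIN = 1 / 20
P_RISKY = 0.10

# P(accident | driver type, rain) as stated in the question.
P_ACCIDENT = {
    ("careful", False): 0.02,
    ("risky", False): 0.09,
    ("careful", True): 1 / 20,
    ("risky", True): 4 * (1 / 20),  # four times the careful-driver rate in rain
}

# P(report says "accident" | whether an accident actually happened).
P_REPORT = {True: 0.95, False: 0.05}


def joint(driver, rain, accident, report):
    """Joint probability of one full assignment under the assumed structure."""
    p = P_RISKY if driver == "risky" else 1 - P_RISKY
    p *= P_RAIN if rain else 1 - P_RAIN
    p_acc = P_ACCIDENT[(driver, rain)]
    p *= p_acc if accident else 1 - p_acc
    p_rep = P_REPORT[accident]
    p *= p_rep if report else 1 - p_rep
    return p


def posterior_risky(rain=True, report=True):
    """P(driver = risky | rain, report), summing out the hidden Accident node."""
    scores = {
        driver: sum(joint(driver, rain, acc, report) for acc in (True, False))
        for driver in ("careful", "risky")
    }
    return scores["risky"] / sum(scores.values())
```

Calling `posterior_risky()` uses Harry's two observations (it is raining, and the radio reports an accident) as evidence; a different network structure would need a different `joint` function.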
2. Horse Race (10 marks)
Let’s assume that there is a race between two horses, Sea Biscuit and Caviar, and you want to determine which horse to bet on. Sea Biscuit and Caviar have raced against each other on 25 previous occasions, all two-horse races. Of these last 25 races, Sea Biscuit won 18. Therefore, all other things being equal, the probability of Sea Biscuit winning the next race can be estimated as 18/25, or 0.72, or 72%. However, on five of Caviar’s previous wins, it had rained before the race. It had also rained on three of the days that Caviar lost. On the day of the race in question, it is raining.
Construct a Bayesian network to show the probability of Caviar winning the race. Explain your Bayesian network and how you obtained your answer.
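The counts above map directly onto Bayes' rule. The sketch below assumes a two-node network in which the race winner is the parent of a "rained before the race" node, and it assumes that every race had a winner (no dead heats), so Caviar won 25 - 18 = 7 races. Exact fractions are used so that the arithmetic can be compared with a hand calculation; the names are illustrative.

```python
from fractions import Fraction

RACES = 25
SEA_BISCUIT_WINS = 18
CAVIAR_WINS = RACES - SEA_BISCUIT_WINS  # assumes every race had a winner
RAIN_AND_CAVIAR_WON = 5
RAIN_AND_CAVIAR_LOST = 3


def p_caviar_wins_given_rain():
    """P(Caviar wins | it rained before the race) by Bayes' rule."""
    prior_win = Fraction(CAVIAR_WINS, RACES)
    prior_lose = 1 - prior_win
    p_rain_given_win = Fraction(RAIN_AND_CAVIAR_WON, CAVIAR_WINS)
    p_rain_given_lose = Fraction(RAIN_AND_CAVIAR_LOST, SEA_BISCUIT_WINS)
    numerator = prior_win * p_rain_given_win
    evidence = numerator + prior_lose * p_rain_given_lose
    return numerator / evidence
```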
3. Airplane Crash (10 Marks)
Airplane travel is one of the safest modes of transport there is – crashes are thankfully rare. However, they do occur and this is linked to four determining factors: Bad_Weather, Human_Error, System_Failure and Terrorism. Each of these four factors is fairly rare, which is characterised in the following prior probabilities:
• P(Bad_Weather = True) = 3/10,000
• P(System_Failure = True) = 0.02%
• P(Human_Error = True) = 2/10,000
• P(Terrorism = True) = 1.9/10,000
Construct a Boolean model and use the OR function to parameterise a Boolean Airplane_Crash node using these four factors as inputs. Use this model to estimate the probability that Terrorism was the cause of a plane crash after Bad_Weather and System_Failure have been ruled out. Show all your working including the function used.
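A Boolean OR node can be checked by enumerating every combination of the four causes. The sketch below assumes that the four factors are independent root nodes, that Airplane_Crash is True exactly when at least one factor is True, and that the crash itself is observed. The `posterior` helper is hypothetical and is not a function of any particular Bayesian network tool.

```python
from itertools import product

# Prior probabilities from the question text.
PRIORS = {
    "Bad_Weather": 3 / 10_000,
    "System_Failure": 0.02 / 100,  # 0.02%
    "Human_Error": 2 / 10_000,
    "Terrorism": 1.9 / 10_000,
}


def posterior(query, evidence):
    """P(query = True | evidence, Airplane_Crash = True) by enumeration."""
    names = list(PRIORS)
    num = den = 0.0
    for values in product((True, False), repeat=len(names)):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # inconsistent with the ruled-out factors
        if not any(world.values()):
            continue  # OR function: a crash needs at least one cause
        p = 1.0
        for name in names:
            p *= PRIORS[name] if world[name] else 1 - PRIORS[name]
        den += p
        if world[query]:
            num += p
    return num / den


p_terrorism = posterior(
    "Terrorism", {"Bad_Weather": False, "System_Failure": False}
)
```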
4. Flower Breeding (15 marks)
You are a flower breeder. The plant you are breeding can have either red flowers or white flowers. You know that the colour of a flower depends on the genotype of the plant. The gene for red flowers (represented by R) is a dominant gene and the gene for white flowers (represented by r) is a recessive gene. Therefore, a plant with the genotype RR or Rr has red flowers, while a plant with the genotype rr has white flowers. Hence, the colour of a plant’s flowers is influenced by its genotype (as shown in Figure 1) and the probability of a plant having red or white flowers, given its genotype, is shown in Table 1.
[Figure 1: the node Plant_1_Genotype (states RR, Rr, rr) is the parent of the node Plant_1_Flower_Colour (states Red, White).]
Figure 1: Diagram showing that a plant’s genotype influences the colour of its flowers.
Table 1: Probability of a plant having red or white flowers given its genotype.
Genotype    Probability of flower colour (%)
            Red     White
RR          100     0
Rr          100     0
rr          0       100
When breeding flowers you know that the genotype (and therefore flower colour) of an offspring is influenced by the genotype of its parents (as shown in Figure 2).
[Figure 2: the nodes Plant_1_Genotype and Plant_2_Genotype are the parents of the node Offspring_Genotype; each node has the states RR, Rr and rr.]
Figure 2: Diagram showing that the genotype of an offspring is influenced by the genotype of its parents.
You also know that the following parent crosses are possible:
• If two plants of genotype RR are mated, then the offspring will always be RR.
• If two plants of genotype rr are mated, then the offspring will always be rr.
• If a plant of genotype RR is mated with a plant of genotype Rr, then the offspring will
always get an R from one parent and may get an R or an r from the other parent,
which means it could be of genotype RR or Rr.
• If a plant of genotype RR is mated with a plant of genotype rr, then the offspring will
always get an R from one parent and will always get an r from the other parent,
which means it will always be of genotype Rr.
• If a plant of genotype Rr is mated with a plant of genotype Rr, then the offspring may
get an R or r from one parent and an R or r from the other parent, which means it
could be of genotype RR, Rr or rr.
• If a plant of genotype Rr is mated with a plant of genotype rr, then the offspring may
get an R or r from one parent and will always get an r from the other parent, which means it could be of genotype Rr or rr.
For the above crosses, the probability of offspring being a particular genotype is given in Table 2.
Table 2: Probability of offspring genotypes given the genotypes of the parents.
Parent genotypes          Probability of offspring genotype (%)
Parent 1    Parent 2      RR      Rr      rr
RR          RR            100     0       0
rr          rr            0       0       100
RR          Rr            50      50      0
RR          rr            0       100     0
Rr          Rr            25      50      25
Rr          rr            0       50      50
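The entries of Table 2 follow from the crosses listed above if each parent passes on one of its two alleles with equal chance. The sketch below makes that assumption explicit and regenerates a row of the table for any pair of parent genotypes; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Offspring genotype probabilities when each parent passes on one of
    its two alleles with equal chance."""
    counts = Counter()
    for a, b in product(parent1, parent2):
        # Sorting puts 'R' before 'r', so 'rR' and 'Rr' count as one genotype.
        counts["".join(sorted(a + b))] += 1
    total = sum(counts.values())
    return {g: Fraction(counts[g], total) for g in ("RR", "Rr", "rr")}
```

For example, `offspring_distribution("Rr", "Rr")` gives the 25/50/25 split in the fifth row of the table. The same function also covers the crosses with the parents swapped, which the table does not list separately.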
Finally, for plants with unknown parent genotypes, you know that the probability of them being of genotype Rr is 50%, while the probability of them being of genotype RR is 25% and the probability of them being of genotype rr is 25%.
Now suppose you have two plants. The genotypes of their parents are unknown; however the flowers of both plants are red. You mate these two plants to produce a first generation offspring. This offspring is then mated with a third plant, with white flowers, to produce a second generation offspring.
Construct a Bayesian Network and use it to determine the probability that the second generation offspring will have red flowers. Explain your Bayesian network and how you obtained your answer.
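The chain of reasoning in this question (condition each parent on its flower colour, cross, then cross again) can be sketched with exact fractions. The code below is one possible reading and assumes that the three plants of unknown parentage are independent draws from the 25/50/25 prior, that each parent passes on either allele with equal chance, and that the colour of the first generation offspring is not observed. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ("RR", "Rr", "rr")
PRIOR = {"RR": Fraction(1, 4), "Rr": Fraction(1, 2), "rr": Fraction(1, 4)}


def is_red(genotype):
    """R is dominant, so any genotype containing R has red flowers."""
    return "R" in genotype


def condition_on_colour(dist, red):
    """Posterior over genotypes after observing the flower colour."""
    kept = {g: p for g, p in dist.items() if is_red(g) == red}
    total = sum(kept.values())
    return {g: kept.get(g, Fraction(0)) / total for g in GENOTYPES}


def cross(dist1, dist2):
    """Offspring genotype distribution for parents with uncertain genotypes."""
    child = {g: Fraction(0) for g in GENOTYPES}
    for (g1, p1), (g2, p2) in product(dist1.items(), dist2.items()):
        for a, b in product(g1, g2):  # each parent passes one allele, 50/50
            child["".join(sorted(a + b))] += p1 * p2 * Fraction(1, 4)
    return child


red_parent = condition_on_colour(PRIOR, red=True)
white_parent = condition_on_colour(PRIOR, red=False)
first_generation = cross(red_parent, red_parent)
second_generation = cross(first_generation, white_parent)
p_red = sum(p for g, p in second_generation.items() if is_red(g))
```

The intermediate distributions (`red_parent`, `first_generation`) correspond to the beliefs a Bayesian network tool would show on the parent and first generation nodes once the flower colours are entered as evidence.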
5. Who is Guilty? (25 Marks)
You are the Judge presiding over a case where a house has been burgled. At the crime scene, it was found that a window had been broken and the fingerprints found on this window were matched to a local (Mel). The items stolen from the burgled property were then found at the home of another local (Kim).
Both Mel and Kim protested their innocence, and both have produced alibis.
Kim said that she had no idea how the items from the burglary had turned up in her home as she had been away for the last two days camping in the forest with her friend (Liam) and that she had only just returned home. Liam says that he will testify that this is true.
Mel explained the presence of her fingerprints on the broken window as being the result of an incident that happened a few days prior to the burglary. Mel and her friend (Noel) were walking home from a restaurant when she thought she heard someone yell her name out from this property. Mel said that she responded to this by climbing on the same window that was now broken. Noel says that he will testify that this is true.
It is noted that the windows of the burgled property, including the window that was broken in the burglary, were reportedly cleaned on the day of, but prior to, the burglary taking place, which would have almost certainly removed any earlier fingerprints.
Finally, a previous conviction supposedly showed that Mel and Kim had worked together in the past, which led to the conviction of Mel as an accessory to robbery. This suggests that Mel and Kim work as a team.
Guilty Hypotheses
Based on this information you determine that there are three, and only three, distinct causal pathways that need to be considered:
• Mel and Kim are guilty of the burglary and worked together
• Mel is guilty of the burglary and worked alone
• Kim is guilty of the burglary and worked alone
Alibis
As the Judge, you consider that the testimony of the alibis for Mel (Noel) and Kim (Liam) should be taken into account in determining which of these three guilty hypotheses is most likely. However, you are reasonably uncertain about the likelihood that Kim went camping and that Mel climbed on the now broken window a few days before the burglary occurred, along with the validity (accuracy) of the evidence provided by the alibis. You therefore apply the following priors:
• P(Accuracy of Noel’s evidence = LOW [i.e. low quality]) = 45%
• P(Kim went camping = FALSE) = 60%
• P(Accuracy of Liam’s evidence = LOW [i.e. low quality]) = 45%
• P(Mel climbed onto the window 2 days before the robbery = FALSE) = 60%
• You consider that there is a 50% probability that Mel is the Burglar (i.e. is True) and
acted on her own.
You also reasoned that if Kim went camping (True) and Liam’s evidence was accurate (High), then the probability that testimony that Kim went camping is True is 80%. Conversely if the accuracy of Liam’s evidence is low and Kim still went camping, then the probability that the testimony that Kim went camping is True falls to 40%. If Kim didn’t in fact go camping then the probability that the Testimony that Kim went camping is considered True is 20% if Liam’s evidence appears of high accuracy and 5% if Liam’s evidence appears of Low accuracy.
Finally, you reason that if Mel did climb onto the window then if the accuracy of Noel’s evidence is High, the probability that the testimony that Mel climbed onto the window is True = 95%. If the accuracy of Noel’s evidence is Low, then the probability that this testimony is False = 30%.
If Mel did not climb onto the window, then if the accuracy of Noel’s evidence is High, the probability that the testimony is False = 95%; and if the accuracy of Noel’s evidence is Low, the probability that the testimony is False drops to 40%.
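The two testimony nodes described above each have two parents: the underlying event and the accuracy of the witness. The sketch below writes both conditional probability tables in terms of P(testimony = True), taking complements where the text states P(testimony = False), and marginalises over the parents to give the prior belief in each testimony before any evidence is entered. It assumes that the event node and the accuracy node are independent root nodes; the names are illustrative.

```python
P_ACCURACY_HIGH = 0.55  # 1 - P(accuracy = LOW), the same for Liam and Noel
P_EVENT_TRUE = 0.40     # 1 - P(event = FALSE), for camping and for climbing

# P(testimony = True | event, accuracy); keys are (event, accuracy).
LIAM_TESTIMONY = {
    (True, "High"): 0.80,
    (True, "Low"): 0.40,
    (False, "High"): 0.20,
    (False, "Low"): 0.05,
}
# Noel's table is stated partly as P(testimony = False), so take complements.
NOEL_TESTIMONY = {
    (True, "High"): 0.95,
    (True, "Low"): 1 - 0.30,
    (False, "High"): 1 - 0.95,
    (False, "Low"): 1 - 0.40,
}


def p_testimony_true(cpt, p_event=P_EVENT_TRUE, p_high=P_ACCURACY_HIGH):
    """Marginal P(testimony = True), summing over both parent nodes."""
    total = 0.0
    for (event, accuracy), p_true in cpt.items():
        p_parents = (p_event if event else 1 - p_event) * (
            p_high if accuracy == "High" else 1 - p_high
        )
        total += p_parents * p_true
    return total
```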
Other evidence
You see the reported presence of Mel’s fingerprints on the broken window as important evidence. You also believe that the accuracy of the fingerprint evidence and of the evidence provided by the window cleaning company should be considered. The following priors apply to the accuracy of the fingerprint testing and to the evidence that the window was cleaned just before the burglary:
• P(Accuracy of fingerprinting = LOW) = 10%
• P(Accuracy of evidence of window cleaner = LOW) = 15%
• P(Window was cleaned = TRUE) = 90%
You reasoned that the following conditional probabilities existed for the evidence:
Window was cleaned    Accuracy of evidence of window cleaner    Testimony that the window was cleaned (%) [True | False]
False                 High                                      10
False                 Low                                       45
True                  High                                      90
True                  Low                                       55
Fingerprints on window    Accuracy of fingerprinting    Fingerprints match Mel’s (%) [True | False]
True                      High                          10
True                      Low                           50
False                     High                          90
False                     Low                           40
You consider the following conditional probabilities:
• P(Kim broke the window = TRUE | Kim is the burglar = YES) = 100%
• P(Kim broke the window = TRUE | Kim is the burglar = NO) = 0%
• P(Mel & Kim are the burglars = TRUE | Testimony that Kim went camping = TRUE) = 5%
• P(Mel & Kim are the burglars = TRUE | Testimony that Kim went camping = FALSE) = 95%
• P(Kim is the burglar = TRUE | Testimony that Kim went camping = TRUE) = 5%
• P(Kim is the burglar = TRUE | Testimony that Kim went camping = FALSE) = 95%
• P(Mel broke the window = TRUE) is 100% if ‘Mel’ and/or ‘Mel and Kim’ are guilty. Otherwise the probability that this child node = TRUE is 0%.
• P(Mel’s fingerprints are on the window = TRUE) is 100% for the following scenarios:
o ‘Mel broke the window’ is TRUE regardless of Noel’s or the Cleaner’s testimony;
o Noel’s ‘Testimony that Mel climbed window’ is TRUE and the ‘Testimony that the window was cleaned’ is FALSE.
o For all other scenarios P(Mel’s fingerprints are on the window = TRUE) is 0%
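The three scenarios above define a deterministic conditional probability table with three parents. A minimal sketch of that rule, with illustrative parameter names, is:

```python
from itertools import product


def p_fingerprints_on_window(mel_broke, testimony_climbed, testimony_cleaned):
    """Deterministic CPT entry: P(Mel's fingerprints are on the window = TRUE)."""
    if mel_broke:
        return 1.0  # regardless of Noel's or the cleaner's testimony
    if testimony_climbed and not testimony_cleaned:
        return 1.0  # earlier prints that were never cleaned off
    return 0.0


# Full table, one row per (mel_broke, testimony_climbed, testimony_cleaned).
CPT = {
    parents: p_fingerprints_on_window(*parents)
    for parents in product((True, False), repeat=3)
}
```

Five of the eight parent combinations give 100% and the remaining three give 0%, which is a quick check when the table is typed into a Bayesian network tool by hand.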
Construct a Bayesian network that you can use as the Judge to determine which of the three guilty hypotheses is more likely. Note that the basic structure for this BN is shown in Figure 3. However, this structure is incomplete – there are several nodes that need to be added to make sure that the BN functions correctly.
From your BN and what you currently know, state the probabilities that each of the three hypotheses is True. Explain your Bayesian Network, including the additions that you made to the basic structure, and how you obtained your answer.
Then update your BN based on the assumption that Liam’s testimony that Kim went camping is 100% False, Noel’s testimony that Mel climbed on the window is also 100% False and that the accuracy of the fingerprint process is 100% Low. State the updated probabilities that each of the three guilty hypotheses is True. Explain where in your BN you entered your evidence and how you obtained your answer.
[Figure 3 nodes: Kim went camping; Testimony that Kim went camping; Mel is the burglar; Mel and Kim are the burglars; Mel_broke_the_window; Kim is the Burglar; Kim broke the window; Mel climbed on the window; Testimony that Mel climbed on the window; Mels fingerprints on window; Fingerprints match Mel; Window was cleaned; Testimony that the window was cleaned.]
Figure 3: Diagram showing the basic structure for the ‘Who is guilty?’ Bayesian Network. Note that all nodes should have two states. Also note that the network is incomplete, requiring several nodes to be added so that it functions correctly.
6. Sydney Harbour Fish Kill (30 Marks)
It is a lovely summer’s day and you are enjoying a walk along one of the beaches of Sydney Harbour. Sydney Harbour is one of the most iconic harbours of the world. However, there has long been a problem with water quality in Sydney Harbour, especially heavy metals, nutrients and pathogens. You notice that there are dead fish at the water’s edge – this is known as a ‘fish kill’.
Use the following information provided to you to build a Risk Assessment Bayesian network model to answer the following questions:
• Given what you already know (i.e. enter any hard evidence that you know into your model) and the model you build, what are the probabilities that (1) the fish is contaminated; (2) there is an algal bloom; (3) there is a risk to human health; and (4) there has been rainfall in the Middle Catchment?
• If you ring the authorities and they notify you that food poisoning has been reported and that low contamination has been recorded, what are the probabilities that (1) the fish is contaminated; (2) there is an algal bloom; (3) there is a risk to human health; and (4) there has been rainfall in the Middle Catchment?
Explain your Bayesian network and how you obtained your answer.
Information needed to build the model:
A major source of contamination into the harbour is from stormwater runoff during Rainfall (i.e. Stormwater_Contaminant_Loading). That is, when it rains, the rainwater will enter the stormwater system, carrying contaminants with it, discharging these contaminants into the harbour.
Rainfall itself has three states: 0 to 5 (mm of rainfall), 5 to 15 (mm of rainfall) and >= 15 (mm of rainfall). The amount of Rainfall (i.e. the probability of observing these three states) is dependent on the Season (Summer, Autumn, Winter, Spring – each of equal probability) as summarised in the following table.
A second factor that combines with the amount of Rainfall to directly affect the amount of Stormwater Contaminant Loading is where the rainfall occurs (i.e. the Catchment_Area in which the rainfall occurs). The land around the harbour can be separated into three Catchment_Areas: the upper_catchment, the middle_catchment and the lower_catchment. The upper catchment comprises 10%, the middle catchment 70% and the lower catchment 20% of the total catchment area (thus these can be used as the priors).
Another major source of contamination into the Sydney Harbour is from the municipal sewer system (i.e. Sewer_Contaminant_Loading). Like the stormwater system, the important factors directly influencing this variable are rainfall and the catchment area where the rainfall occurs.
The stormwater contaminant loading has the following relationships with rainfall and catchment area:
• If the rainfall is 5mm or greater and occurs in the Upper catchment area, stormwater loading will be ‘High’.
• If the rainfall is less than 15mm and occurs in the Lower catchment, stormwater loading will be ‘Low’.
• If the rainfall is less than 5mm and occurs in the Middle catchment, stormwater loading will be ‘Low’.
• If the rainfall is 15mm or more and occurs in the Middle catchment, stormwater loading will be ‘High’.
• For all other scenarios, the stormwater loading will be ‘Medium’.
Season      Rainfall (%)
            0 to 5    5 to 15    >=15
Summer      40        50         10
Autumn      60        35         5
Winter      85        14         1
Spring      60        35         5
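The stormwater loading rules above form a deterministic table over the nine combinations of rainfall and catchment area. The sketch below encodes the rules as a function and then combines them with the season and rainfall percentages and the catchment priors to give the distribution of stormwater loading for a known season. It assumes that Catchment_Area is a root node that is independent of Rainfall; the names are illustrative.

```python
# Rainfall percentages by season and catchment priors, from the text.
RAINFALL_GIVEN_SEASON = {
    "Summer": {"0 to 5": 0.40, "5 to 15": 0.50, ">=15": 0.10},
    "Autumn": {"0 to 5": 0.60, "5 to 15": 0.35, ">=15": 0.05},
    "Winter": {"0 to 5": 0.85, "5 to 15": 0.14, ">=15": 0.01},
    "Spring": {"0 to 5": 0.60, "5 to 15": 0.35, ">=15": 0.05},
}
CATCHMENT_PRIOR = {"Upper": 0.10, "Middle": 0.70, "Lower": 0.20}


def stormwater_loading(rainfall, catchment):
    """Deterministic stormwater loading rule from the bullet list."""
    if catchment == "Upper" and rainfall in ("5 to 15", ">=15"):
        return "High"
    if catchment == "Lower" and rainfall in ("0 to 5", "5 to 15"):
        return "Low"
    if catchment == "Middle" and rainfall == "0 to 5":
        return "Low"
    if catchment == "Middle" and rainfall == ">=15":
        return "High"
    return "Medium"  # all other scenarios


def stormwater_given_season(season):
    """P(stormwater loading | season), summing over rainfall and catchment."""
    dist = {"Low": 0.0, "Medium": 0.0, "High": 0.0}
    for rainfall, p_rain in RAINFALL_GIVEN_SEASON[season].items():
        for catchment, p_catch in CATCHMENT_PRIOR.items():
            dist[stormwater_loading(rainfall, catchment)] += p_rain * p_catch
    return dist
```

This gives the belief in stormwater loading from the season alone; entering further evidence downstream in the network (for example a reported fish kill) would change these values.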
The sewer contaminant loading has the following relationships with rainfall and catchment area:
• If the rainfall is 5mm or greater and occurs in the Lower catchment area, sewer loading will be ‘High’.
• If the rainfall is less than 15mm and occurs in the Upper catchment, sewer loading will
be ‘Low’.
• If the rainfall is less than 5mm and occurs in the Middle catchment, sewer loading will be
‘Low’.
• If the rainfall is 15mm or more and occurs in the Middle catchment, sewer loading will
be ‘High’
• For all other scenarios, the sewer loading will be ‘Medium’
Water quality contaminants in the Sydney Harbour
Information has been provided that details the mechanisms of contaminants entering the Sydney Harbour (i.e. stormwater and sewer). These mechanisms impact on the concentration of three types of contamination in the harbour: Heavy_Metals, Nutrients, Pathogens.
Heavy metals and nutrients are characterised by three states: Low (i.e. low concentration), Medium (i.e. medium concentration) and High (i.e. high concentration). For pathogens, it is of more interest that this contaminant is either present (True) or absent (False). All three are dependent on the Stormwater_Contaminant_Loading and the Sewer_Contaminant_Loading.
The relationship between Heavy Metals and its determining factors, the stormwater and sewer contaminant loadings, is:
• If both input loadings are Low, or if one input is Low and one is Medium, then the concentration of heavy metals is Low
• If one input loading is Low and the other input loading is High, or both input loadings are Medium, then the concentration of heavy metals is Medium.
• For all other combinations of the input loadings, the concentration of heavy metals is High.
The relationship between Nutrients and its determining factors, the stormwater and sewer contaminant loadings, is:
• If both input loadings are Low, then nutrients is Low.
• If one of the input loadings is Low and the other input loading is High, or if both input loadings are High, or if one of the input loadings is Medium and the other input loading is High, or if both input loadings are Medium, then nutrients is High.
• For all other combinations of the input loadings, the nutrients is Medium.
For the pathogens, the risk that these will be present (True) or not (False) is also dependent on the stormwater contamination load and the sewer contamination load. If the sewer contamination load is High, or if the sewer contamination load is Medium and the stormwater contamination load is not Low then pathogens will be present. For all other combinations of the input loadings, pathogens will be absent (False).
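The rules for Heavy_Metals, Nutrients and Pathogens are all deterministic functions of the two loading nodes, so their conditional probability tables can be generated rather than typed in by hand. The sketch below is one way to encode them; it treats the heavy metals and nutrients rules as symmetric in the two inputs, which is how the text words them, and the names are illustrative.

```python
from itertools import product

LEVELS = ("Low", "Medium", "High")


def heavy_metals(stormwater, sewer):
    pair = frozenset((stormwater, sewer))  # the rules ignore which input is which
    if pair in (frozenset({"Low"}), frozenset({"Low", "Medium"})):
        return "Low"
    if pair in (frozenset({"Low", "High"}), frozenset({"Medium"})):
        return "Medium"
    return "High"  # all other combinations


def nutrients(stormwater, sewer):
    pair = frozenset((stormwater, sewer))
    if pair == frozenset({"Low"}):
        return "Low"
    if pair == frozenset({"Low", "Medium"}):
        return "Medium"  # the only combination not covered by the other rules
    return "High"


def pathogens_present(stormwater, sewer):
    # This rule is not symmetric: the sewer loading is the main driver.
    return sewer == "High" or (sewer == "Medium" and stormwater != "Low")


# One row per (stormwater, sewer) combination.
TABLE = {
    (s, w): (heavy_metals(s, w), nutrients(s, w), pathogens_present(s, w))
    for s, w in product(LEVELS, repeat=2)
}
```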
Contamination consequences
Contamination of fish by heavy metals
Heavy metals bioaccumulate in Fish and therefore it is of concern whether the fish in the harbour are contaminated (True) or not (False). If the heavy metals in the harbour are High, then there is a 65% chance that the fish will be contaminated. If the heavy metals in the harbour are Medium, then there is a 35% chance that the fish will be contaminated. Conversely, if the heavy metals in the harbour are Low, then there is a 90% chance that the fish will not be contaminated.
It is of interest to the authorities to measure and record the amount of heavy metals in the fish. It is known that the measurement technique can introduce error into the measurement (assumed priors: Low_Accuracy = 10%, High_Accuracy = 90%). Use the following table to set up the measurement component of the network.
Algal Bloom
Nutrients can lead directly to algal blooms. The chance of an algal bloom is also dependent on the season, as warmer conditions (e.g. Summer) increase the chance of an algal bloom occurring. It is known that if nutrients are Low, then you will not get algal blooms (algal bloom = False) regardless of the season. If nutrients are Medium, then you will only get algal blooms (True) in Spring and Summer (and not in Autumn or Winter). If nutrients are High, then you will get algal blooms in Summer, Autumn and Spring but not Winter.
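The algal bloom rule is a lookup on the nutrient level and the season. A minimal sketch, with illustrative names, is:

```python
# Seasons in which a bloom occurs, for each nutrient level.
BLOOM_SEASONS = {
    "Low": set(),
    "Medium": {"Spring", "Summer"},
    "High": {"Summer", "Autumn", "Spring"},
}


def algal_bloom(nutrients, season):
    """True if the stated rule gives an algal bloom for this combination."""
    return season in BLOOM_SEASONS[nutrients]
```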
Fish Kills and Human Health
Your Bayesian Network needs to be able to evaluate the probability of environmental and human health concerns.
Environmentally, whether there is a fish kill (True) or not (False) is influenced by three factors: whether the fish is contaminated (True) or not (False); whether there is an algal bloom (True) or not (False); and whether pathogens are present (True) or not (False). These factors are not equally important: the relative importance, or weighted average, of these factors is 30% for contaminated fish, 50% for the algal bloom and 20% for pathogens.
The risk to human health (True = there is a risk, False = there is not a risk) is also dependent on whether the fish is contaminated or not, whether there is an algal bloom or not and whether pathogens are present or not. As with Fish_Kills, Human_Health_Risk is dependent on a weighted average of these three factors. However, this time, the relative weighting of these factors is 50% for contaminated fish, 20% for the algal bloom and 30% for pathogens.
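The text does not say how a weighted average becomes a conditional probability table. One common reading, used here purely as an assumption, is that the probability of the child node being True is the sum of the weights of the parents that are True. The sketch below applies that reading to both nodes; the dictionary keys and the function name are illustrative.

```python
# Weights from the text; the keys are illustrative parent names.
FISH_KILL_WEIGHTS = {
    "contaminated_fish": 0.30,
    "algal_bloom": 0.50,
    "pathogens": 0.20,
}
HEALTH_RISK_WEIGHTS = {
    "contaminated_fish": 0.50,
    "algal_bloom": 0.20,
    "pathogens": 0.30,
}


def p_true(weights, states):
    """Assumed rule: P(child = True) is the total weight of the True parents."""
    return sum(w for name, w in weights.items() if states[name])
```

Under this reading, `p_true(FISH_KILL_WEIGHTS, {"contaminated_fish": True, "algal_bloom": False, "pathogens": True})` is 0.5, and the same parent states give 0.8 for Human_Health_Risk. A tool with a built-in weighted mean function may define the table differently.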
Contaminated Fish    Measurement Accuracy    Contamination Recorded (%)
                                             Low     High
True                 Low                     40      60
True                 High                    5       95
False                Low                     60      40
False                High                    95      5
The final part of your risk model is associating the reporting of fish kills (Reported_Fish_Kills) and the reporting of human health impacts (Reported_Food_Poisoning). These last two nodes have the following probability tables:
Fish_Kill    Reported_Fish_Kills (%)
             No     Yes
True         5      95
False        99     1

Human_Health_Risk    Reported_Food_Poisoning (%)
                     Yes    No
True                 90     10
False                10     90