This lab is built around the process of identifying the fault with a coffee machine.
Your task is to:
- Given the structure of a graphical model for the state of a coffee machine, learn the distributions from data.
- Implement belief propagation, so you can evaluate the probability of each failure given the available evidence.
- Identify the most probable failure for a set of broken coffee machines.
Marking and Submission
These lab exercises are marked, and contribute to your final grade. For this lab exercise there are 3 places where you are expected to enter your own code, for 15 marks overall. Every place you have to add code is indicated by
# **************************************************************** n marks
with instructions above the code block.
Please submit your completed workbook using Moodle before 5pm on the 28th November 2018. The workbook you submit must be an .ipynb file, saved into the directory in which you're running Jupyter; alternatively, you can download it from the menu above using File -> Download As -> Notebook (.ipynb). Remember to save your work regularly (Save and checkpoint in the File menu, the icon of a floppy disk, or Ctrl-S); the version you submit should have all code blocks showing the results (if any) of execution below them. You will normally receive feedback within a week.
%matplotlib inline
import zipfile
import io
import csv
import numpy
import matplotlib.pyplot as plt
import pandas as pd  # inserted myself
Coffee Machine Data Set
You can download from Moodle a zip file that contains the data as a CSV file. The code below will load the data directly from the zip file, so you don't have to unzip it.
Each row of the file contains a fully observed coffee machine, with the state of every random variable. The random variables are all binary, with False represented by 0 and True represented by 1. The variables are:
Failures (you're trying to detect these):

0. he – No electricity
1. fp – Fried power supply unit
2. fc – Fried circuit board
3. wr – Water reservoir empty
4. gs – Group head gasket forms seal
5. dp – Dead pump
6. fh – Fried heating element

Mechanism (these are unobservable):

7. pw – Power supply unit works
8. cb – Circuit board works
9. gw – Get water out of group head

Diagnostic (these are the tests the mechanic can run – observable):

10. ls – Room lights switch on
11. vp – A voltage is measured across power supply unit
12. lo – Power light switches on
13. wv – Water visible in reservoir
14. hp – Can hear pump
15. me – Makes espresso
16. ta – Makes a hot, tasty espresso

For the above, the number is the column number in the provided data (dm) and the two-letter code is a suggested variable name.
If you are unfamiliar with an espresso coffee machine, here is a brief description of how one works (you can ignore this if you want):

The user puts ground coffee into a portafilter (a round container with a handle and two spouts at the bottom), tamps it (compacts the coffee down), and clamps the portafilter into the group head at the front of the machine. A gasket (rubber ring) forms a seal between the portafilter and the group head. A button is pressed. Water is drawn from a reservoir by a pump into a boiler. In the boiler a heating element raises the water's temperature, before the pump pushes it through the group head and into the portafilter at high pressure. The water passes through the coffee grounds and makes a tasty espresso.
The graphical model showing how the variables are related is included on Moodle as coffee machine.pdf; here it is given as conditional probabilities:
Failures:
- P_he: P(no electricity)
- P_fp: P(fried psu)
- P_fc: P(fried circuit board)
- P_wr: P(water reservoir empty)
- P_gs: P(group head gasket seal broken)
- P_dp: P(dead pump)
- P_fh: P(fried heating element)

Mechanism:
- P_pw_he_fp: P(psu works | no electricity, fried psu)
- P_cb_pw_fc: P(circuit board works | psu works, fried circuit board)
- P_gw_cb_wr_dp: P(get water | circuit board works, water reservoir empty, dead pump)

Diagnostic:
- P_ls_he: P(lights switch on | no electricity)
- P_vp_pw: P(voltage across psu | psu works)
- P_lo_cb: P(power light on | circuit board works)
- P_wv_wr: P(water visible | water reservoir empty)
- P_hp_dp: P(can hear pump | dead pump)
- P_me_gw_gs: P(makes espresso | get water, group head gasket seal broken)
- P_ta_me_fh: P(tasty | makes espresso, fried heating element)
Note that while the model is close to what you might guess, the probabilities are not absolute (never exactly 0 or 1), to account for mistakes and unknown failures. For instance, the mechanic may make a mistake while brewing an espresso and erroneously conclude that the machine is broken when it is in fact awesome. The probabilities associated with each failure are not uniform.
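Putting this together, the model asserts that the joint distribution over all 17 variables factorises into the product of the terms above (this follows directly from the list; each variable is written by its two-letter code):

P(all) = P(he) P(fp) P(fc) P(wr) P(gs) P(dp) P(fh)
         × P(pw|he,fp) P(cb|pw,fc) P(gw|cb,wr,dp)
         × P(ls|he) P(vp|pw) P(lo|cb) P(wv|wr) P(hp|dp) P(me|gw,gs) P(ta|me,fh)

This is exactly the product that the brute force code further down builds explicitly with einsum.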
# It may prove helpful to have a mapping between the suggested variable names and
# column indices in the provided file...
nti = dict() # 'name to index'
nti['he'] = 0
nti['fp'] = 1
nti['fc'] = 2
nti['wr'] = 3
nti['gs'] = 4
nti['dp'] = 5
nti['fh'] = 6
nti['pw'] = 7
nti['cb'] = 8
nti['gw'] = 9
nti['ls'] = 10
nti['vp'] = 11
nti['lo'] = 12
nti['wv'] = 13
nti['hp'] = 14
nti['me'] = 15
nti['ta'] = 16
# Opposite to the above - index to name...
itn = ['he', 'fp', 'fc', 'wr', 'gs', 'dp', 'fh',
'pw', 'cb', 'gw',
'ls', 'vp', 'lo', 'wv', 'hp', 'me', 'ta']
# For convenience this code loads the data from the zip file,
# so you don't have to decompress it (takes a few seconds to run)...
with zipfile.ZipFile('coffee_machines.zip') as zf:
    with zf.open('coffee_machines.csv') as f:
        sf = io.TextIOWrapper(f)
        reader = csv.reader(sf)
        next(reader)  # skip the header row

        dm = []
        for row in reader:
            dm.append([int(v) for v in row])
        dm = numpy.array(dm, dtype=numpy.int8)

print('Data: {} exemplars, {} features'.format(dm.shape[0], dm.shape[1]))
print('  Broken machines  =', dm.shape[0] - dm[:,nti['ta']].sum())
print('  Working machines =', dm[:,nti['ta']].sum())
1. Learn Model
Below, a set of variables to represent conditional probability distributions has been defined. Each is a Bernoulli trial for each combination of conditioning variables, stored as P(False|...) in [0,...] and P(True|...) in [1,...] (it may be easier to think of them as boring categorical distributions).
To fit a maximum likelihood model you should first use them as counters – loop over the data set and count how many of each combination exist. You must then normalise them so that the sum over axis 0 is always 1. There is an extra mark for doing the right thing and including a prior. You may want to know that the conjugate prior to the Bernoulli trial represented as [P(False), P(True)] is a Beta distribution; a uniform prior would be a reasonable choice (it can be argued that the expected failure probability is low, and you should adjust the first 7 variables accordingly, but given the quantity of data available it's not going to matter).
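As a small illustration of the prior-as-pseudo-counts idea (my own example, not part of the provided materials): with a uniform Beta(1,1) prior, the posterior mean of a Bernoulli parameter is obtained by adding one to each count before normalising, e.g. for P(he):

# Hypothetical illustration: a uniform Beta(1,1) prior is one pseudo-count per cell.
counts = numpy.ones(2)  # [pseudo-count for False, pseudo-count for True]
counts[1] += dm[:, nti['he']].sum()                # observed True
counts[0] += dm.shape[0] - dm[:, nti['he']].sum()  # observed False
print(counts / counts.sum())  # estimate of [P(he=0), P(he=1)]; sums to 1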
Hints:
- The use of 0=False and 1=True both in the dm table and in the conditional probability distributions is very deliberate.
- Consider putting all of the variables into a list with extra information about them (such as indices from nti) to make your code neater.
- If you make a mistake you could easily end up with NaN or infinity – which would break the next step. Print the distributions out so you can check they are valid!
- Do not write unique code for each variable – that will be hundreds of lines of code and very tedious. It's possible to get all of the marks in 7 lines of code, if you're particularly sneaky.
(4 marks)
- 2 marks for counting all conditions
- 1 mark for normalising distributions
- 1 mark for including a sensible prior
# A set of variables that will ultimately represent conditional probability distributions.
# The naming convention is that P_he means P(he), or that P_ta_me_fh means P(ta|me,fh)...
P_he = numpy.zeros(2)
P_fp = numpy.zeros(2)
P_fc = numpy.zeros(2)
P_wr = numpy.zeros(2)
P_gs = numpy.zeros(2)
P_dp = numpy.zeros(2)
P_fh = numpy.zeros(2)
P_pw_he_fp = numpy.zeros((2,2,2))
P_cb_pw_fc = numpy.zeros((2,2,2))
P_gw_cb_wr_dp = numpy.zeros((2,2,2,2))
P_ls_he = numpy.zeros((2,2))
P_vp_pw = numpy.zeros((2,2))
P_lo_cb = numpy.zeros((2,2))
P_wv_wr = numpy.zeros((2,2))
P_hp_dp = numpy.zeros((2,2))
P_me_gw_gs = numpy.zeros((2,2,2))
P_ta_me_fh = numpy.zeros((2,2,2))
# It may prove convenient to have a structure that describes the above in a computer
# readable form, including labeling what each dimension is...
ops = [(P_he, nti['he']),
(P_fp, nti['fp']),
(P_fc, nti['fc']),
(P_wr, nti['wr']),
(P_gs, nti['gs']),
(P_dp, nti['dp']),
(P_fh, nti['fh']),
(P_pw_he_fp, nti['pw'], nti['he'], nti['fp']),
(P_cb_pw_fc, nti['cb'], nti['pw'], nti['fc']),
(P_gw_cb_wr_dp, nti['gw'], nti['cb'], nti['wr'], nti['dp']),
(P_ls_he, nti['ls'], nti['he']),
(P_vp_pw, nti['vp'], nti['pw']),
(P_lo_cb, nti['lo'], nti['cb']),
(P_wv_wr, nti['wv'], nti['wr']),
(P_hp_dp, nti['hp'], nti['dp']),
(P_me_gw_gs, nti['me'], nti['gw'], nti['gs']),
(P_ta_me_fh, nti['ta'], nti['me'], nti['fh'])]
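A quick, purely illustrative example (my own, not part of the lab) of why this structure is handy: op[1:] lists the data columns that index the corresponding array, target variable first, so a single row of dm can address one cell of each table directly.

op = ops[10]                # (P_ls_he, nti['ls'], nti['he'])
vals = dm[0, list(op[1:])]  # the (ls, he) values of the first machine
print(op[1:], vals)         # op[0][tuple(vals)] is the matching cell of P_ls_he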
# **************************************************************** 4 marks
# Use each array as a counter, then normalise. Every cell starts at 1 - a uniform
# Beta/Dirichlet prior as pseudo-counts, which also guarantees no probability is
# ever exactly 0 (so no NaNs or infinities later)...
for op in ops:
    op[0][...] = 1.0

# Count: op[1:] gives the data columns that index op[0], target variable first,
# so each row of dm increments exactly one cell of every table...
for row in dm:
    for op in ops:
        op[0][tuple(row[list(op[1:])])] += 1.0

# Normalise so the sum over axis 0 (the target variable) is always 1,
# converting the counts into conditional distributions...
for op in ops:
    op[0] /= op[0].sum(axis=0, keepdims=True)

# Check everything is a valid probability distribution...
for op in ops:
    assert numpy.isfinite(op[0]).all() and numpy.allclose(op[0].sum(axis=0), 1.0)
print(P_he)
print(P_ta_me_fh)
Brute Force
The code below does exactly what you want to implement for step 2, but it brute forces it. Slow and memory inefficient of course, but it lets you test your implementation against it.
def brute_marginals(known):
    """known is a dictionary, where a random variable index existing as a key in the dictionary
    indicates it has been observed. The value obtained using the key is the value the
    random variable has been observed as. Returns a 17x2 matrix, such that [rv,0] is the
    probability of random variable rv being False, [rv,1] the probability of being True."""
    # Calculate the full joint (please don't ask)...
    params = []
    for op in ops:
        params.append(op[0])
        params.append(op[1:])
    params.append(range(17))
    joint = numpy.einsum(*params)

    # Multiply in the known states (zero out the dimensions we know it's not)...
    for key, value in known.items():
        other = abs(value-1)
        index = [slice(None)] * len(joint.shape)
        index[key] = other
        joint[tuple(index)] = 0.0

    # Calculate and return all marginals...
    ret = numpy.empty((len(joint.shape), 2))
    for row in range(ret.shape[0]):
        ret[row,:] = numpy.einsum(joint, range(len(joint.shape)), [row])
    ret /= ret.sum(axis=1)[:,None]
    return ret
print(brute_marginals({}))
print(brute_marginals({nti['ta'] : 0}))
2. Belief Propagation
Your task is to write a function that takes known states and calculates the marginal probability for every state of the graphical model. This will require using the sum-product algorithm and passing all messages on the graph.
Here is the wikipedia page for reference: https://en.wikipedia.org/wiki/Belief_propagation (not the greatest, but not horrible)
Hints:
- The order of the variables above is such that each variable is dependent only on variables that occur before it. This should save you some hassle in terms of calculating a message passing order.
- You may want to add code before the function to prepare. This might involve creating a list of edges in the graphical model, in the order they need to be processed, for instance. It can also be done with dictionaries or classes – choose whatever you are comfortable with, but spend time thinking it through first. If you get it wrong your code will get messy!
- A good approach to recording messages is a dictionary indexed [from, to].
- The easiest way to include a known value in the model is often to add/replace the unary term for that value.
- This problem is small enough that you can brute force it – code to do so has been provided above. Do test that your implementation is correct by comparing them.
(10 marks)
- 3 marks for sensible data structures
- 5 marks for sending the messages correctly
- 2 marks for correctly calculating the marginals at the end
def marginals(known):
    """known is a dictionary, where a random variable index existing as a key in the dictionary
    indicates it has been observed. The value obtained using the key is the value the
    random variable has been observed as. Returns a 17x2 matrix, such that [rv,0] is the
    probability of random variable rv being False, [rv,1] the probability of being True."""
    # **************************************************************** 10 marks
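For comparison while developing an answer, here is a minimal sketch of sum-product belief propagation on this model. It is my own illustrative code, not a model answer: the name marginals_sketch, the simple flooding message schedule, and the dictionary layout are just one possible set of choices. It treats every entry of ops as a factor, injects evidence through the unary terms as the hint suggests, and repeatedly updates all messages; because the graph is a tree this converges to the exact marginals.

def marginals_sketch(known):
    """Illustrative only - same contract as brute_marginals."""
    # Each entry of ops is a factor; op[1:] are the variables it touches.
    factors = [(op[0], list(op[1:])) for op in ops]

    # Evidence enters as unary terms - an indicator vector for observed variables...
    unary = numpy.ones((len(itn), 2))
    for key, value in known.items():
        unary[key,:] = 0.0
        unary[key, int(value)] = 1.0

    # Messages, kept in dictionaries indexed [from, to] as the hints suggest...
    msg_vf = {(v, fi): numpy.ones(2) for fi, (tab, vs) in enumerate(factors) for v in vs}
    msg_fv = {(fi, v): numpy.ones(2) for fi, (tab, vs) in enumerate(factors) for v in vs}

    # Flooding schedule: update every message each sweep; on a tree this needs
    # at most diameter-many sweeps, so len(itn) sweeps is more than enough...
    for _ in range(len(itn)):
        for fi, (tab, vs) in enumerate(factors):
            # Variable -> factor: evidence times all other incoming factor messages...
            for v in vs:
                m = unary[v].copy()
                for fj, (_, vs2) in enumerate(factors):
                    if fj != fi and v in vs2:
                        m = m * msg_fv[(fj, v)]
                msg_vf[(v, fi)] = m / m.sum()
            # Factor -> variable: multiply in the other messages, sum out the rest...
            for v in vs:
                args = [tab, vs]
                for u in vs:
                    if u != v:
                        args += [msg_vf[(u, fi)], [u]]
                m = numpy.einsum(*(args + [[v]]))
                msg_fv[(fi, v)] = m / m.sum()

    # Marginal of each variable: evidence times all incoming factor messages...
    ret = unary.copy()
    for fi, (tab, vs) in enumerate(factors):
        for v in vs:
            ret[v] *= msg_fv[(fi, v)]
    return ret / ret.sum(axis=1)[:,None]

# It should agree with the brute force version to numerical precision...
print(numpy.abs(marginals_sketch({nti['ta']: 0}) - brute_marginals({nti['ta']: 0})).max())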
3. What’s broken?
Simply print out the most probable broken part for each of the machines below. Please print it as dictionary key: name of broken part, e.g. A: wr. This will allow automarking to be used.
If you can't get your belief propagation implementation to work, feel free to use brute_marginals instead.
(1 mark)
repair = {}
repair['A'] = {nti['me'] : True}
repair['B'] = {nti['wv'] : True}
repair['C'] = {nti['wv'] : False, nti['lo'] : True}
repair['D'] = {nti['hp'] : False, nti['ta'] : False}
repair['E'] = {nti['hp'] : True, nti['wv'] : True, nti['vp']: True}
# **************************************************************** 1 mark
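For reference, a minimal sketch of one possible answer (my own code; failure_idx and the loop structure are illustrative choices, and it uses brute_marginals, which the text permits). The failure variables occupy columns 0-6, so we report the failure with the highest marginal probability of being True:

# For each machine, find the failure variable most likely to be True...
failure_idx = range(7)  # he, fp, fc, wr, gs, dp, fh
for key in sorted(repair):
    marg = brute_marginals(repair[key])  # swap in marginals(...) once it works
    broken = max(failure_idx, key=lambda rv: marg[rv, 1])
    print('{}: {}'.format(key, itn[broken]))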