Foundations of Clinical and Public Health Informatics

Fall, 2018

 

Homework 3

Distributed: October 9th, 2018

Due: 11:59 PM, October 23rd, 2018

 

Relevant Lectures and Chapters:

Bayesian Networks and GeNIe

Natural Language Processing

Inpatient Clinical Decision Support

 

Midterm Note: The GeNIe and NLP assignments will not be explicitly covered on the midterm. For the midterm, you are responsible for knowing the terms and concepts covered in the Bayesian networks, NLP, and inpatient clinical decision support lectures and reading material.

 

Question 1: (45 points) [Bayesian Networks and GeNIe]

 

For the following questions, you will have to download GeNIe from https://download.bayesfusion.com/files.html?category=Academia.

 

GeNIe is available only for Windows. Mac users may need to install Windows emulation software, such as Parallels or Wine, to get a Windows environment. You can install Parallels from the following links:

http://technology.pitt.edu/software/parallels-desktop and http://software.pitt.edu/index.aspx

 

Pneumonia is an infection of the lungs that can cause mild to severe illness in people of all ages. Features of pneumonia can include dyspnea (shortness of breath), cough, fever, chest pain, fatigue, nausea, vomiting, or chills. Pneumonia may also cause abnormalities on the chest x-ray. For the purpose of this assignment, we will develop a probabilistic diagnostic system for the diagnosis of pneumonia from the following four features (clinical findings): dyspnea, cough, fever, and xray.

Each variable (pneumonia and its four features) has only two possible values:

pneumonia (absent, present)
dyspnea (no, yes)
cough (no, yes)
fever (no, yes)
xray (normal, abnormal)

The following probabilities are given:

dyspnea: sensitivity = 0.20, specificity = 0.95
cough: sensitivity = 0.10, specificity = 0.97
fever: sensitivity = 0.15, specificity = 0.90
xray: sensitivity = 0.35, specificity = 0.95

Assume that there are 1,200 pneumonia patients per 100,000 population.
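
As a rough illustration of how the figures above map onto conditional probability tables, the sketch below converts each sensitivity/specificity pair into the four entries of a two-state table and derives the prior from the stated prevalence. It assumes the usual definitions, sensitivity = P(finding positive | pneumonia present) and specificity = P(finding negative | pneumonia absent). The names PRIOR_PRESENT, FEATURES, and cpt_from_sens_spec are hypothetical helpers chosen for this example; they are not part of GeNIe.

```python
# Prior from the stated prevalence: 1,200 cases per 100,000 population.
PRIOR_PRESENT = 1200 / 100000

# (sensitivity, specificity) for each finding, as given in the assignment.
FEATURES = {
    "dyspnea": (0.20, 0.95),
    "cough": (0.10, 0.97),
    "fever": (0.15, 0.90),
    "xray": (0.35, 0.95),
}


def cpt_from_sens_spec(sensitivity, specificity):
    """Return P(finding | pneumonia) for a two-state finding.

    Keys are (pneumonia_state, finding_is_positive), where "positive"
    means yes (dyspnea, cough, fever) or abnormal (xray).
    """
    return {
        ("present", True): sensitivity,
        ("present", False): 1.0 - sensitivity,
        ("absent", True): 1.0 - specificity,
        ("absent", False): specificity,
    }


# One table per finding (hypothetical container for illustration).
CPTS = {name: cpt_from_sens_spec(*pair) for name, pair in FEATURES.items()}

print("P(pneumonia = present) =", PRIOR_PRESENT)
for name, table in CPTS.items():
    print(name, table)
```

Each column of a table (one pneumonia state) should sum to 1, which is a quick sanity check before entering the numbers into a node definition.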

Now answer the following questions:

  1. (3 points) Use GeNIe to encode the above probabilities about pneumonia and its features. First, you need to create and define chance nodes, which are encoded as yellow circles in GeNIe. You should have one node that represents pneumonia, and a node each for dyspnea, cough, fever, and xray. You should have an arc from pneumonia to dyspnea, and three additional similar arcs for the three remaining features. Specify the states and the probabilities for each of the nodes based on the information given in the preamble above. Obtain screenshots of your network (both the diagram and the probability tables) and paste them into your answer document.
  2. (3 points) Use GeNIe to compute P(pneumonia = present | dyspnea = no, cough = yes, fever = no, xray = abnormal). Show your answer, obtain a screenshot of it from GeNIe, and copy it to your answer document.
  3. (3 points) Use GeNIe to compute P(pneumonia = present | dyspnea = no, cough = no, fever = no, xray = normal). Show your answer, obtain a screenshot of it from GeNIe, and copy it to your answer document. Why is this probability different from the prevalence of pneumonia (1.2%, i.e., 1,200 per 100,000)?
  4. (3 points) A patient who comes to the outpatient clinic complains that he has dyspnea and fever. Based on this information, use GeNIe to compute his probability of having pneumonia. Show your answer, obtain a screenshot of it from GeNIe, and copy it to your answer document. The physician orders a chest X-ray for the patient, which the radiologist reports as abnormal. Based on this additional information, use GeNIe to compute the revised probability of having pneumonia. Show your answer, obtain a screenshot of it from GeNIe, and copy it to your answer document.
  5. (13 points) Assume that there is some cost associated with eliciting the value of each of the four features in a patient. You want to know which single feature, if present or abnormal, most increases the probability of pneumonia being present.
  6. (5 points) Use GeNIe to compute the probability of pneumonia being present given that each feature alone is present or abnormal, in the absence of information about the remaining features. Give the probabilities that you obtain.
  7. (5 points) Compute likelihood ratios for each of the four findings, and identify the most discriminative feature, explaining your choice.
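
The posterior and likelihood-ratio calculations asked for above can be cross-checked by hand. The following is a minimal sketch, assuming the network structure described in item 1 (one arc from pneumonia to each finding, so the findings are conditionally independent given pneumonia) and the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The function names posterior_present and likelihood_ratios are hypothetical and are not part of GeNIe; the sketch is meant only as a sanity check on the numbers GeNIe reports, not as a replacement for the screenshots.

```python
# Prior from the stated prevalence: 1,200 cases per 100,000 population.
PRIOR_PRESENT = 1200 / 100000

# (sensitivity, specificity) for each finding, as given in the assignment.
FEATURES = {
    "dyspnea": (0.20, 0.95),
    "cough": (0.10, 0.97),
    "fever": (0.15, 0.90),
    "xray": (0.35, 0.95),
}


def posterior_present(evidence, prior=PRIOR_PRESENT, features=FEATURES):
    """Return P(pneumonia = present | evidence) by Bayes' rule.

    `evidence` maps a feature name to True (yes / abnormal) or
    False (no / normal). Unobserved features are simply left out.
    Assumes findings are conditionally independent given pneumonia.
    """
    joint_present = prior
    joint_absent = 1.0 - prior
    for name, positive in evidence.items():
        sens, spec = features[name]
        joint_present *= sens if positive else 1.0 - sens
        joint_absent *= (1.0 - spec) if positive else spec
    return joint_present / (joint_present + joint_absent)


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a two-state finding."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Example query in the style of item 2.
query = {"dyspnea": False, "cough": True, "fever": False, "xray": True}
print("posterior:", posterior_present(query))

# Single-feature posteriors and likelihood ratios (items 6 and 7).
for name, (sens, spec) in FEATURES.items():
    single = posterior_present({name: True})
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(name, "posterior:", single, "LR+:", lr_pos, "LR-:", lr_neg)
```

With no evidence at all, posterior_present({}) returns the prior, which is a useful first check that the prevalence was entered correctly.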

 

 

 

Question 2: (45 points) [Natural Language Processing]

 

BEFORE YOU START:

Option 1: Downloading Python 3.6 and Jupyter:

For this question, we will be programming in Python 3.6 using the open-source Jupyter notebook. Please go to the following website: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html

Read the documentation and learn what Jupyter notebook is. Then, go ahead and install the Jupyter notebook app here: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/install.html

If you have problems installing Python and/or Jupyter, please come see one of the TAs as early as possible.
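
After installing, a quick check like the one below can confirm that the interpreter is recent enough and that Jupyter is importable. This is only a sketch: check_environment is a hypothetical helper written for this example, and it assumes the Jupyter notebook app is installed under the package name notebook.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 6), packages=("notebook",)):
    """Report whether the interpreter and the named packages look usable.

    `packages` defaults to ("notebook",), assuming that is the package
    name under which the Jupyter notebook app is installed.
    """
    report = {"python_ok": sys.version_info[:2] >= min_version}
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(sys.version)
print(check_environment())
```

If python_ok or notebook comes back False, that is a good point at which to contact a TA.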

 

Option 2 (recommended): If you don’t want to install Python or Jupyter locally, we have set up a JupyterLab installation (hosted by the University of Pittsburgh’s Center for Research Computing). This is the best way to make sure you have the right environment and setup for this assignment, and it requires minimal effort on your part. Please see the following link:

https://docs.google.com/document/d/1K3gAJwcAoj4QqU5nU2Hua0aRl-JIn0JrQoxrIlO4oMg/edit?usp=sharing

 

ASSIGNMENT: https://github.com/dbmi-pitt/SocialMediaDataScience

Please go to the site to get started, and read the README file before you begin.

 

Item 1 in the README file explains how to set up your environment. If you have completed the “Before you start” section above, you don’t need to read it.

 

Because we’ve added your Twitter ID to our class account, you don’t need to read Item 2 (we’ve done item 2.2 for you). Please email Pritika (prd17@pitt.edu) if you have not received an email about a Twitter developer request from @harryhpitt or @pri96860177.

 

Please read Items 3 and beyond in the README file.

 

As a final note, once you start the modules, please keep a rough log of how much time you spend on this portion of the assignment. This will help us in future iterations of the assignment. Also, please report any issues you run into in the Issues tab on GitHub:

 

https://github.com/dbmi-pitt/SocialMediaDataScience/issues

 

Have fun!

 

 

 

Question 3: (10 points) [Inpatient Clinical Decision Support]

In the field of computerized clinical decision support, many clinical decision support systems (CDSSs) have been developed and described in the literature. Two classic CDSSs are the Leeds Abdominal Pain System, developed by de Dombal and colleagues, and MYCIN, developed by Shortliffe and colleagues. Briefly compare and contrast these two systems in terms of:

 

  1. (1 point) the purpose of the two systems
  2. (3 points) the goals of the two systems
  3. (3 points) representation of medical knowledge in the two systems
  4. (3 points) type of inference used in the two systems