
PART I: Age regression from gray matter masks

This part of the coursework is about age regression from gray matter masks which have been extracted from brain MRI scans.

Each voxel in a gray matter mask is one feature. Because the number of voxels is huge, a dimensionality reduction using PCA needs to be implemented first, before the reduced data can be used to train a model for age regression.

Read the descriptions and code carefully and look out for the cells marked with 'TASK'.

The following cell contains helper code to obtain filenames and for reading age information for each subject from a spreadsheet.

In [ ]:

import os
import re
import numpy
import xlrd
import SimpleITK as sitk

# Retrieve the list of gray matter mask filenames
data_dir = './data/graymatter'
imageNames = sorted(next(os.walk(data_dir))[2])  # Retrieve all the image names

# Read the spreadsheet to retrieve the age information for each subject
ages = []
csvfilename = './data/meta/IXI.xls'
workbook = xlrd.open_workbook(csvfilename)
sheet = workbook.sheet_by_index(0)
idCells = sheet.col_slice(colx=0, start_rowx=1, end_rowx=None)
ageCells = sheet.col_slice(colx=11, start_rowx=1, end_rowx=None)
idAgeDic = dict((ii.value, ageCells[loopId].value) for loopId, ii in enumerate(idCells))

This cell defines a function for reading gray matter masks and corresponding age labels.

In [ ]:

def readImagesAndLabels(imagenames):

    ImgArray = []
    LblArray = []
    for ImageName in imagenames:

        # Extract the subject ID from the filename and look up the age
        regexp_result = re.search(r'wc1IXI\d+', ImageName)
        subjectId = int(regexp_result.group().split('wc1IXI')[1])
        LblArray.append(idAgeDic[subjectId])

        # Load the image and flatten it into a feature vector
        fullImageName = data_dir + '/' + ImageName
        inImage = sitk.ReadImage(fullImageName)
        inArray = sitk.GetArrayFromImage(inImage)
        ImgArray.append(inArray.flatten())

        # Debug information
        if 0:
            print('subjectName: {0}'.format(ImageName))
            print('subjectId: {0}'.format(subjectId))
            print('subjectAge: {0}\n'.format(idAgeDic[subjectId]))

    # Create numpy arrays - training data and labels
    ImgArray = numpy.array(ImgArray, dtype=numpy.uint8)      # 2D array - [nSubjects, Zdim x Ydim x Xdim]
    LblArray = numpy.array(LblArray, dtype=numpy.float32)    # 1D array - [nSubjects]

    return ImgArray, LblArray

TASK 1.1: Dimensionality reduction
In the next cell you are asked to implement dimensionality reduction using PCA from sklearn's decomposition module. The principal components should be learned from the training data and then used to reduce the dimensionality of both the training and testing data.

Check out http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

In [ ]:

from sklearn import decomposition

def pcaReduction(trainingData, testingData):

    # Perform dimensionality reduction on the images using Principal Component Analysis

    # ADD CODE HERE

    return trainingData_reduced, testingData_reduced
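For illustration only, one possible sketch is shown below; it is not the required solution. The choice of n_components=100 is an assumption for illustration and should be tuned, for example by inspecting the explained variance ratio.

In [ ]:

from sklearn import decomposition

def pcaReduction(trainingData, testingData):
    # Learn the principal components from the training data only,
    # then project both training and testing data onto them.
    # n_components=100 is an arbitrary illustrative choice.
    pca = decomposition.PCA(n_components=100)
    trainingData_reduced = pca.fit_transform(trainingData)
    testingData_reduced = pca.transform(testingData)
    return trainingData_reduced, testingData_reduced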

TASK 1.2: Training a model for age regression
In the next cell you are asked to implement a function that takes input data and corresponding labels and trains a regression model. It is up to you to choose a suitable method from the many that are provided in sklearn.

Check out http://scikit-learn.org/stable/supervised_learning.html

In [ ]:

def trainRegressor(data, labels):

    # ADD CODE HERE

    return model
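For illustration, a minimal sketch using ridge regression is shown below. Ridge is only one of many suitable sklearn regressors, and the regularisation strength alpha=1.0 is an assumption, not the required choice.

In [ ]:

from sklearn import linear_model

def trainRegressor(data, labels):
    # Ridge regression is one possible choice; any sklearn regressor
    # with a fit/predict interface would work here.
    model = linear_model.Ridge(alpha=1.0)
    model.fit(data, labels)
    return model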

TASK 1.3: Apply the learned model on new data
In the next cell you are asked to implement a function that takes data and a learned regression model as input, applies the model to the data, and returns the predicted labels.

In [ ]:

def applyRegressor(data, model):

    # ADD CODE HERE

    return labels
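A minimal sketch of one possible implementation, assuming the model follows the standard sklearn predict interface:

In [ ]:

def applyRegressor(data, model):
    # Predict labels for the given data with the trained model
    labels = model.predict(data)
    return labels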

The following cell implements an evaluation function that takes an array of true age labels and an array of predicted age labels and assesses prediction quality by computing the mean absolute error and the root mean squared error. It can also optionally plot the true vs. predicted labels.

In [ ]:

import matplotlib.pyplot as plt

def evaluate(labels_true, labels_predicted, plot=False):

    if plot:
        %pylab inline
        plt.figure(figsize=(6,6))
        plt.scatter(labels_true, labels_predicted)
        plt.plot([0, 100], [0, 100], '--k', linewidth=3)
        plt.axis('tight'); plt.xlabel('True age', fontsize=15); plt.ylabel('Predicted age', fontsize=15)
        plt.tick_params(axis='both', which='major', labelsize=15); plt.grid(True); plt.show()

    # Age prediction errors
    prediction_errors = labels_true - labels_predicted

    # Mean absolute error
    mean_error = numpy.mean(numpy.abs(prediction_errors))
    print('Mean error is {0}'.format(mean_error))

    # Root mean squared error
    root_mean_squared_error = numpy.sqrt(numpy.mean(numpy.power(prediction_errors, 2)))
    print('Root mean squared error is {0}'.format(root_mean_squared_error))

    return prediction_errors

The next cell prepares the data for a very simple experiment where the images are split half/half into two sets, one for training and one for testing.

In [ ]:

# Preload data and split half/half into training and testing

images, labels = readImagesAndLabels(imageNames)

trainingImages = images[0::2]
trainingLabels = labels[0::2]

testingImages = images[1::2]
testingLabels = labels[1::2]

print('Number of training images is {0}'.format(len(trainingImages)))
print('Number of testing images is {0}'.format(len(testingImages)))

TASK 1.4: Simple experiment
In the next four cells you are asked to set up and execute a simple experiment using the above training and testing images. Four steps are needed: 1) dimensionality reduction, 2) training a regressor, 3) applying the regressor to the test data, and 4) evaluating the prediction quality. (A hedged sketch combining all four steps is given after the four cells below.)

In [ ]:

# 1) Dimensionality reduction
# ADD CODE HERE

In [ ]:

# 2) Train a model
# ADD CODE HERE

In [ ]:

# 3) Test the model
# ADD CODE HERE

In [ ]:

# 4) Evaluate predictions
# ADD CODE HERE
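For illustration only, the four steps could be combined as in the following sketch, which assumes the hedged implementations of pcaReduction, trainRegressor and applyRegressor sketched above:

In [ ]:

# Illustrative sketch of the four steps on the half/half split

# 1) Dimensionality reduction
trainingImages_reduced, testingImages_reduced = pcaReduction(trainingImages, testingImages)

# 2) Train a model
model = trainRegressor(trainingImages_reduced, trainingLabels)

# 3) Test the model
testingLabels_predicted = applyRegressor(testingImages_reduced, model)

# 4) Evaluate predictions
errors = evaluate(testingLabels, testingLabels_predicted, plot=True)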

TASK 1.5: Cross validation using k-folds
In the next cell you are asked to implement a k-fold cross validation such that every subject is used once for testing and prediction errors can be computed for all subjects.

In [ ]:

from sklearn.model_selection import KFold

def kfold_cross_validation(n_folds, imgs, lbls):
    kf = KFold(n_splits=n_folds)
    predictions = numpy.array([])

    for foldId, (trainIds, testIds) in enumerate(kf.split(range(0, len(imgs)))):
        print('Fold: {0}/{1}'.format(foldId + 1, n_folds))

        # ADD CODE HERE

        predictions = numpy.concatenate((predictions, testingLabels_predicted))

    return predictions
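For illustration only, one possible way to complete the loop body is sketched below, reusing the functions sketched earlier; the variable names inside the loop are assumptions.

In [ ]:

from sklearn.model_selection import KFold

def kfold_cross_validation(n_folds, imgs, lbls):
    # Illustrative sketch: in each fold, reduce dimensionality, train a
    # regressor on the training split and predict ages for the test split.
    kf = KFold(n_splits=n_folds)
    predictions = numpy.array([])

    for foldId, (trainIds, testIds) in enumerate(kf.split(imgs)):
        print('Fold: {0}/{1}'.format(foldId + 1, n_folds))

        trainImgs_reduced, testImgs_reduced = pcaReduction(imgs[trainIds], imgs[testIds])
        model = trainRegressor(trainImgs_reduced, lbls[trainIds])
        testingLabels_predicted = applyRegressor(testImgs_reduced, model)

        predictions = numpy.concatenate((predictions, testingLabels_predicted))

    return predictions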

The following cell runs a 2-fold cross validation and computes errors for all subjects.

In [ ]:

predictions = kfold_cross_validation(2, images, labels)

errors = evaluate(labels, predictions, True)

TASK 1.6 (optional): Training size vs prediction error
In the next cell you are asked to explore how the prediction error varies with the number of training subjects. One possibility is to consecutively increase the size of the image subset and run a k-fold cross validation on each subset. (One possible way to complete the cell is sketched after it.)

In [ ]:

# Preload training and testing data
nImages = len(imageNames)
imageSetSize = numpy.linspace(0.1, 1, 5)
plotList_nTrainImages = []
plotList_errors = []
for perc in imageSetSize:

    folds = 2
    nImg = int(round(nImages * perc))
    nTrainImg = int(round(nImg - nImg / folds))
    print('Number of training images is {0}'.format(nTrainImg))

    # ADD CODE HERE

    plotList_nTrainImages.append(nTrainImg)
    plotList_errors.append(errors)
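For illustration only, the cell could be completed as sketched below, reusing the kfold_cross_validation routine from above; taking the first nImg images as the subset and summarising each run by the mean absolute error are assumptions.

In [ ]:

# Illustrative sketch: run a 2-fold cross validation on a growing subset
# of the images and record the mean absolute error for each subset size.
nImages = len(imageNames)
imageSetSize = numpy.linspace(0.1, 1, 5)
plotList_nTrainImages = []
plotList_errors = []
for perc in imageSetSize:

    folds = 2
    nImg = int(round(nImages * perc))
    nTrainImg = int(round(nImg - nImg / folds))
    print('Number of training images is {0}'.format(nTrainImg))

    # Cross-validate on the first nImg images only
    subsetPredictions = kfold_cross_validation(folds, images[:nImg], labels[:nImg])
    errors = numpy.mean(numpy.abs(labels[:nImg] - subsetPredictions))

    plotList_nTrainImages.append(nTrainImg)
    plotList_errors.append(errors)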

In [ ]:

%pylab inline
plt.figure(figsize=(6,4))
plt.plot(plotList_nTrainImages, plotList_errors, 'b-', marker='o', markersize=10)
plt.xlabel('Number of training images', fontsize=15); plt.ylabel('Error (age)', fontsize=15)
plt.tick_params(axis='both', which='major', labelsize=15); plt.grid(True); plt.show()