Assignment: Deep Learning
This assignment is worth 20% of your final grade. The deadline is 29/07/2020.
Why?
The purpose of this assignment is to explore some techniques in deep learning. In this assignment, you will discover how to develop and evaluate neural network models using Keras for a regression problem.
Read everything below carefully!
In this assignment you need to do the following tasks:
● Load a CSV dataset and make it available to Keras.
● Create a neural network model with Keras for a regression problem.
● Use scikit-learn with Keras to evaluate models using cross-validation.
● Perform data preparation in order to improve the skill (predictive performance) of Keras models.
● Tune the network topology of models with Keras.
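The modelling tasks above (build a regression network, evaluate it with cross-validation, standardize the inputs, and vary the topology) could be sketched roughly as follows. This is a minimal illustration and not a reference solution: the data is synthetic stand-in data with 13 input columns, and the layer sizes, fold count, epochs, batch size, and the helper names build_model and cross_val_mse are all assumptions chosen for the example. The cross-validation loop uses scikit-learn's KFold directly rather than a Keras wrapper class.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Seed Python, NumPy and the backend so repeated runs are comparable.
keras.utils.set_random_seed(7)

# Synthetic stand-in data (NOT the real dataset): 13 inputs, one numeric target.
rng = np.random.default_rng(7)
X = rng.normal(loc=50.0, scale=20.0, size=(90, 13)).astype("float32")
y = (0.1 * X[:, :3].sum(axis=1) + rng.normal(scale=1.0, size=90)).astype("float32")


def build_model(hidden_units=(13,)):
    """Fully connected regressor with one Dense layer per entry in hidden_units."""
    model = keras.Sequential([keras.Input(shape=(13,))])
    for units in hidden_units:
        model.add(keras.layers.Dense(units, activation="relu"))
    model.add(keras.layers.Dense(1))  # single linear output for regression
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model


def cross_val_mse(hidden_units, standardize, n_splits=3, epochs=10):
    """Return mean and standard deviation of held-out MSE over K folds.

    epochs is kept small so the sketch runs quickly; a real run would
    likely train for longer.
    """
    scores = []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=7)
    for train_idx, test_idx in folds.split(X):
        X_train, X_test = X[train_idx], X[test_idx]
        if standardize:
            # Fit the scaler on the training fold only to avoid leakage.
            scaler = StandardScaler().fit(X_train)
            X_train = scaler.transform(X_train)
            X_test = scaler.transform(X_test)
        model = build_model(hidden_units)
        model.fit(X_train, y[train_idx], epochs=epochs, batch_size=5, verbose=0)
        scores.append(model.evaluate(X_test, y[test_idx], verbose=0))
    return float(np.mean(scores)), float(np.std(scores))


results = {
    "baseline": cross_val_mse((13,), standardize=False),
    "standardized": cross_val_mse((13,), standardize=True),
    "deeper": cross_val_mse((13, 6), standardize=True),
    "wider": cross_val_mse((20,), standardize=True),
}
for name, (mean_mse, std_mse) in results.items():
    print(f"{name}: MSE {mean_mse:.2f} ({std_mse:.2f})")
```

The numbers printed for the synthetic data carry no meaning on their own; the point is the structure, where every variant is scored by the same folds so that the comparison is fair.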
The Dataset
We will use the Boston House Price Dataset. The dataset describes properties of houses in Boston suburbs and is concerned with modeling the price of houses in those suburbs in thousands of dollars. As such, this is a regression predictive modeling problem. There are 13 input variables that describe the properties of a given Boston suburb, plus one output variable (MEDV), the value to predict. The full list of attributes in this dataset is as follows:
1. CRIM: per capita crime rate by town.
2. ZN: the proportion of residential land zoned for lots over 25,000 sq. ft.
3. INDUS: the proportion of non-retail business acres per town.
4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
5. NOX: nitric oxides concentration (parts per 10 million).
6. RM: average number of rooms per dwelling.
7. AGE: the proportion of owner-occupied units built prior to 1940.
8. DIS: weighted distances to five Boston employment centers.
9. RAD: index of accessibility to radial highways.
10. TAX: full-value property-tax rate per $10,000.
11. PTRATIO: pupil-teacher ratio by town.
12. B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town.
13. LSTAT: % lower status of the population.
14. MEDV: median value of owner-occupied homes in $1000s (the output variable).
You can learn more about the Boston house price dataset on the UCI Machine Learning Repository.
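As a starting point, loading the 14 columns listed above and separating the inputs from the target might look like the sketch below. It assumes a header-less, comma-separated file; the function name load_housing is hypothetical, and the rows used here are random placeholder numbers generated in memory, not values from the real dataset. If your copy of the data is whitespace-delimited, passing sep=r"\s+" to pd.read_csv is one option.

```python
import io

import numpy as np
import pandas as pd

# Column names taken from the attribute list; MEDV is the target.
COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]


def load_housing(source):
    """Read a header-less CSV and split it into inputs X and target y."""
    frame = pd.read_csv(source, header=None, names=COLUMNS)
    X = frame[COLUMNS[:-1]].to_numpy(dtype="float32")  # 13 input columns
    y = frame["MEDV"].to_numpy(dtype="float32")        # 1 output column
    return frame, X, y


# Stand-in for the real file: five rows of random placeholder numbers.
rng = np.random.default_rng(0)
placeholder = pd.DataFrame(rng.uniform(0.0, 100.0, size=(5, 14)))
csv_text = placeholder.to_csv(index=False, header=False)

frame, X, y = load_housing(io.StringIO(csv_text))
print(X.shape, y.shape)
```

With the real data you would pass the path to your own file instead of the in-memory buffer. The float32 arrays X and y can be handed directly to a Keras model.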
Suggestion for your solution:
– Examine the dataset in your Data Exploration phase. Describe your observations. Finding the correlation between attributes is a highly useful way to check for patterns in the dataset. Pandas supports three correlation methods (Pearson, Kendall, and Spearman) between attributes (columns), selected with dataset.corr(method=...).
– Are there any missing or null data points in your dataset? You can use the pandas calls dataset.isnull().sum() or dataset.isna().sum() to find them. Sometimes a dataset has missing values, such as NaN or an empty string in a cell. We need to take care of these missing values so that our machine learning model doesn’t break.
– It is good practice to generate a brief statistical summary of the dataset (for example, with dataset.describe()) so that you get to know its structure and nature.
– Visualize the dataset using univariate and bivariate plots. As the names suggest, a univariate plot visualizes a single column (attribute), whereas a bivariate plot visualizes two columns (attributes). You can use box plots, density plots, scatter plots, and pair plots. Describe your observations in your report file (i.e., analysis.pdf).
– Develop a baseline neural network model with Keras. Don’t forget to seed the random number generator so that your results are reproducible.
– Show how you can lift performance by standardizing the dataset. Describe your approach and observations in your report file (i.e., analysis.pdf).
– Tune the neural network topology. One way to improve the performance of a neural network is to add more layers. This might allow the model to extract and recombine higher-order features embedded in the data. Another approach to increasing the representational capacity of the model is to create a wider network. Describe your approach and observations in your report file (i.e., analysis.pdf).
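The exploration steps suggested above (missing-value check, summary statistics, and correlations) can be tried on a small frame first. The sketch below uses made-up data in three columns named after dataset attributes, with one missing value injected on purpose. Filling gaps with the column median is just one possible choice, shown as an assumption rather than a recommendation. Note that the Kendall method relies on SciPy being installed.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only (not values from the real dataset).
rng = np.random.default_rng(1)
dataset = pd.DataFrame({
    "RM": rng.normal(6.0, 0.7, size=50),
    "LSTAT": rng.uniform(2.0, 30.0, size=50),
})
dataset["MEDV"] = (5.0 * dataset["RM"] - 0.5 * dataset["LSTAT"]
                   + rng.normal(0.0, 1.0, size=50))
dataset.loc[3, "RM"] = np.nan  # inject one missing value on purpose

# Count missing values per column.
missing_per_column = dataset.isnull().sum()
print(missing_per_column)

# Brief statistical report: count, mean, std, min, quartiles, max.
summary = dataset.describe()
print(summary)

# The three correlation methods pandas supports.
correlations = {
    method: dataset.corr(method=method)
    for method in ("pearson", "kendall", "spearman")
}
print(correlations["pearson"].round(2))

# One simple way to handle gaps: fill each column with its median.
cleaned = dataset.fillna(dataset.median())
```

For the plots, dataset.plot(kind="box", subplots=True) and pandas.plotting.scatter_matrix(dataset) are two pandas entry points you could start from; both require matplotlib.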
What to Turn In
Submit a tar or zip file named FirstName_LastName.{zip,tar,tar.gz} that contains a single folder or directory named FirstName_LastName. Please submit the following files on Blackboard:
1. A file named README.txt containing instructions for running your code
2. Your Python code
3. A file named analysis.pdf containing your explanations
4. Any supporting files you need.
Note: In your analysis.pdf you should address all the points mentioned in “Suggestion for your solution”.
https://teams.microsoft.com/l/channel/19%3a3127a7ae1b694ee8a9ff90ac60d67b5e%40thread.tacv2/General?groupId=f26c21bf-970f-4c0e-8c17-1a4ac75ca8a9&tenantId=974d3927-9bb4-48c9-b65c-6094393d030b
I hope everyone is doing well and staying safe and of course having fun at home!
First, let me give you a brief introduction to IR, and then we can go into more detail.
The meaning of the term information retrieval can be very broad.
Information retrieval is a field of computer science that looks at how data can be obtained from a collection of information resources.
In simple terms, we can say that an information retrieval system is a network of algorithms that facilitates the search for relevant data or documents. Fairly simple, right?
But, as an academic field of study, information retrieval is:
Based on this definition, you can apply IR to huge amounts of unstructured data
when a clerk says to you: “I’m sorry, I can only look up your order if you can give me your Order ID”