Big Data – Project
Instructions
The last few weeks of this class will be dedicated to projects. You may work independently or in pairs. Either way, we expect a substantial project to which you devote significant effort, and we will hold pairs to a higher standard than individuals. It is difficult to quantify “significant effort”, and there is no detailed grading rubric. Part of the purpose of the proposal is for the lecturer to give feedback on whether the scope of the project is appropriate.
Report
The report should briefly cover the following topics:
• Problem Definition: What is the problem that you are trying to solve? What are the challenges of this problem?
• Methodology: What is your methodology to attack the problem and the associated challenges? What are the time and space complexities of your solution in terms of input size?
• Results and Discussion: What are the outcomes of the project?
• Guideline: Briefly explain which code was used for which task.
• The code and final report of your project should be submitted before April 7, 23:59 GMT.
Note that your report should not exceed 8 pages.
The rest of this document presents example project subjects to help you get started. You can pick a project from the list or take inspiration from one of them, but you are more than welcome to come up with your own ideas. Be creative!
Project – Predicting Airline Delays with Hadoop
One of the main goals is to use machine learning algorithms to build predictive models with Python packages and data analysis tools. Training models on the original datasets and evaluating their performance is an important part of the work. Finding a good combination of technologies and programming languages is crucial to a successful project.
Dataset
The data can be downloaded from the Bureau of Transportation Statistics, where it is described in detail. Another link to more detailed data can be found here.
Possible tools
• Hadoop
• Python
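To illustrate how these two tools can fit together, here is a minimal sketch of a MapReduce-style job written in Python that computes the average arrival delay per carrier. It is only a starting point, not a required design. The column positions (CARRIER_COL, ARR_DELAY_COL) and the sample rows are hypothetical: check the header of the file you actually download and adjust them. The function run_local is also an assumption of this sketch; it simulates the map, sort and reduce phases in a single process so that you can test your logic before running it on a cluster.

```python
import csv
from itertools import groupby

# Hypothetical column positions: adjust them to the header of your own file.
CARRIER_COL = 0
ARR_DELAY_COL = 1


def mapper(lines):
    """Emit (carrier, arrival_delay_in_minutes) pairs from CSV lines."""
    for row in csv.reader(lines):
        try:
            yield row[CARRIER_COL], float(row[ARR_DELAY_COL])
        except (IndexError, ValueError):
            # Skip the header row, short rows, and rows with a missing delay.
            continue


def reducer(pairs):
    """Average the delays of each carrier; pairs must be sorted by carrier."""
    for carrier, group in groupby(pairs, key=lambda pair: pair[0]):
        delays = [delay for _, delay in group]
        yield carrier, sum(delays) / len(delays)


def run_local(lines):
    """Simulate map, sort and reduce in one process (for testing only)."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Made-up sample rows, used only to exercise the code.
    sample = ["carrier,arr_delay", "AA,10", "DL,-5", "AA,30", "DL,", "UA,0"]
    for carrier, mean_delay in run_local(sample).items():
        print(carrier, mean_delay)
```

With Hadoop Streaming, the mapper and the reducer would normally be two separate scripts that read lines from standard input and write tab-separated key/value lines to standard output, and Hadoop would perform the sort between them. Keeping the logic in plain generator functions, as above, makes it easy to wrap them in such scripts later.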