ECE657A: Data and Knowledge Modeling and Analysis
Project Description W2017

There are two broad types of projects: (a) Application-oriented projects: you have a problem, perhaps
in your field of research, that you would like to analyze using the concepts and algorithms of this course.
(b) Algorithm-oriented projects: you select an interesting data analysis/data mining technique that you
want to learn more about, find multiple datasets to test the algorithm on, and compare its performance
against other algorithms. Academic papers proposing a new or seminal algorithm are a good source of
ideas for this type: you could recreate the experiments in a paper and try to validate its results.

You may of course choose to develop or extend your own data analysis techniques, or simply apply
existing techniques to data (they need not have been covered in class). Similarly, the dataset may be
a pre-existing one, or part of your work could be to collect and pre-process the raw data needed for a
new dataset.

Note that pure literature surveys are not acceptable. There must be a hands-on, experimental and
comparative element in your work.

Something new: One general requirement of the project is that somewhere in it you should present
something new that goes beyond what we’ve learned in class. This could mean using an algorithm we did
not discuss in detail, or at all. It could also mean going in depth into the original paper for a famous
algorithm and re-implementing its experiments to show the class. You may not be sure what the new
part is until later in the project, but you should highlight it in your presentations and reports.

Projects should be done in groups of 2-3 people. Working on your own is discouraged unless you have a
good reason. The project consists of a pitch session, a proposal, a presentation, and a written report.
This document explains what is expected in each of these milestones. Marking schemes for the proposal,
presentations and final report will be available on the course page in the LEARN system.

1 – Recommended Topics

• Possible Application-oriented projects:
o Sentiment analysis from social media
o Topic discovery and analysis from text
o Mining of scientific publications
o Political event data analysis
o Financial data mining
o Purchasing behaviour analysis and recommendation systems
o Crime statistics analysis and mapping
o Flu/disease/trend prediction from social media data

• Possible Algorithm-oriented projects:
o Comparison of the performance of different algorithms in a sub-area:
  - Cost-sensitive classification
  - Kernel-based clustering
  - Semi-supervised classification
  - Clustering ensembles
  - Frequent pattern mining
  - Supervised clustering
  - Neural network based classification
  - Decision tree based classification
o Or find a recent paper introducing a relevant data analysis method and implement its
experiments on new data.

2 – Proposal Pitch Session

In class, each group will describe its project idea to the other classmates, the professor and the TAs.
You need to prepare a pitch explaining what your project idea is, why it will expand your knowledge
and skills, and why it will be educational for the class.

• Each group will be matched with two other groups, and the groups will take turns pitching their ideas.
• Your group will receive verbal questions, ideas and feedback in class from everyone you pitch to.
• You will also receive written anonymous feedback from everyone you pitch to.
• The pitch itself is not graded, but…
• Your feedback on other groups will be graded. For every group whose pitch you hear, you need to
fill out a feedback form and submit it by the end of class. These forms will be graded on the quality
of the feedback: you will need to provide constructive, detailed ideas based on what you heard. It
is therefore important to ask the group questions about their ideas so that you can summarize them
sufficiently and provide input. Whether you agree with the proposal, or with anyone else’s opinion
of the pitch idea, is irrelevant to your grade.

• The grades for the feedback forms will come out of the 4% peer review grade for the course.
There will also be peer reviews of the final presentations, which you will need to fill out for three
groups on the day your group is not presenting.

3 – Proposal

Your proposal should contain:

• Description of the project. You should clearly mention your main goal: is it classification,
clustering, or mining association rules or something else?

• A comprehensive review of 3 to 4 well-recognized research papers, with at least a few sentences
on each paper.

• A paragraph to discuss the expected challenges / difficulties.
• A sketch of your planned approach, algorithms, preprocessing methods, evaluation metrics, etc.
• Description of the datasets you plan to use. It should include a link and a brief description
of the properties of the data, such as its features, instances, preprocessing techniques, etc. If
you are going to use your own dataset, then a description of its source and preprocessing steps
is needed.

• List of key references.

Your proposal should be no longer than 2 pages and must be submitted as a PDF file via the LEARN
dropbox. It will be graded. If it is not approved, you will need to revise it based on our feedback.

4 – Presentation

Your group presentation will be given using projected slides. Please allocate 3-5 minutes for
questions. (Note: the presentation length will depend on the number of groups that can be fit into
the two presentation days; it should be 15-20 minutes.) Each presentation should include the following:

• Introduction: Basic definitions, background and terminology used.
• Literature review: Based on papers from your literature search, summarize common variants of
the method, the data mining applications in which it is used, and the results claimed in the
literature.

• Description of the goal and the use of the method in your project, such as the type of data
mining, representation of the input, training requirements, and output representation.

• Report and analyze your comparative experimental results.
• Something new: highlight one thing you discovered or explored that goes beyond what we talked
about in class, or that explores in much greater depth something mentioned only briefly in class.

• A summary of your work: new findings and potential future directions.
• List of key references. 


A copy of the presentation slides must be uploaded to the LEARN dropbox before your presentation.

5 – Report

Your report should be in Springer LNCS format, as if you were planning to submit it as a conference
paper. The suggested length of the report is ten (10) pages, but up to a maximum of thirteen (13) pages
is allowed if you are having trouble fitting figures. The bibliography does not count toward this limit.
See Springer’s "Information for LNCS Authors" page for LaTeX templates and a sample PDF (a Word
template is also available, but LaTeX is recommended).

The report should include the following:

• Introduction to the selected methods and the task they are applied to.

• Brief review of literature on the selected methods and their application to similar problems.
• Description of the method selected with details on the options and parameters.
• Implementation: Software used, data structures, program structures, data representation and
any special setup needed. Please don’t put code in the report, only abstract descriptions and
diagrams. Your code can be submitted to the dropbox to be looked at separately.

• Testing: Test cases on the selected datasets and evaluation of the performance in comparison
with baseline methods (for example, K-means for clustering, KNN for classification, PCA for data
reduction).

• Discussion of results and conclusions: Provide a discussion on the use of the method and its
suitability and/or limitations, and discuss the effect of each parameter on the trade-off in
performance.

• References to relevant literature you consulted about the data domain or the methods used.
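As a rough illustration of the comparison against a baseline method described in the Testing item, the Python sketch below (standard library only) scores a k-nearest-neighbour classifier on a held-out split. The helper names (`train_test_split`, `knn_predict`, `accuracy`, `evaluate`), the 70/30 split, k=3, Euclidean distance and the toy two-blob data are all hypothetical choices made for illustration, not course requirements; in a real project a library such as scikit-learn would usually be used instead.

```python
import math
import random
from collections import Counter


def train_test_split(X, y, test_frac=0.3, seed=0):
    """Hypothetical helper: shuffle indices with a fixed seed, then split into train/test."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_frac)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [X[i] for i in train_idx], [y[i] for i in train_idx],
        [X[i] for i in test_idx], [y[i] for i in test_idx],
    )


def knn_predict(train_X, train_y, x, k=3):
    """KNN baseline: label x by majority vote among its k nearest training points (Euclidean)."""
    neighbours = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(x, pair[0])
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(predict, train_X, train_y, test_X, test_y):
    """Score any predict(train_X, train_y, x) function on the same held-out split."""
    preds = [predict(train_X, train_y, x) for x in test_X]
    return accuracy(test_y, preds)


if __name__ == "__main__":
    # Toy data (assumption): two Gaussian blobs standing in for a real project dataset.
    rng = random.Random(42)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    X += [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(50)]
    y = ["a"] * 50 + ["b"] * 50

    split = train_test_split(X, y, test_frac=0.3, seed=1)
    print("KNN baseline (k=3) accuracy:", evaluate(knn_predict, *split))
    # A proposed method with the same predict(...) signature would be scored
    # with evaluate(...) on the same split, so the comparison is like-for-like.
```

The design choice in this sketch is that every method exposes the same `predict(train_X, train_y, x)` signature, so the proposed method and the baseline are evaluated on exactly the same data split and metric.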

The report should be submitted in PDF format via the LEARN dropbox, together with the related code,
datasets, and other supporting materials.

6 – Timelines

Deliverable                              Due                          Grade
Proposal Pitch (in class)                Feb 1                        -
Proposal                                 Feb 15                       5%
Presentations                            March 22, March 29           7%
Peer Review Participation (in class)     Feb 1, March 22, March 29    4%
Report                                   March 29?                    14%

Late submissions are accepted up to 3 days after the deadline, with a penalty of 10% per day. You should
not submit work that you have done for other courses or that has already been developed for your thesis.
If you have two related course projects this term, overlapping work may be possible, but talk to the
professor first.
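The late policy is simple arithmetic, but its exact reading is not spelled out. The Python sketch below shows one hypothetical interpretation, with every assumption stated in the docstring: the 10% is deducted from the mark the work would otherwise earn, partial days round up to full days, and work more than 3 days late receives zero because it is not accepted. The function name `late_adjusted_mark` is illustrative only.

```python
import math


def late_adjusted_mark(raw_mark, days_late, penalty_per_day=0.10, max_days_late=3):
    """Apply one possible reading of the late policy to a raw mark.

    Assumptions (not stated in the policy): the 10% is taken off the mark the
    work would otherwise earn, partial days count as full days, and work more
    than max_days_late days late receives zero because it is not accepted.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    days = math.ceil(days_late)  # assumption: partial days round up
    if days > max_days_late:
        return 0.0  # assumption: beyond the 3-day window, work is not accepted
    return raw_mark * (1 - penalty_per_day * days)
```

Under these assumptions, a report that would have earned 80 and is submitted 2 days late would receive 80 × (1 − 0.2) = 64.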
