Homework Guidelines
You may turn in your homework in the following formats:
(1) A formal write-up (Word or PDF) of the homework, with the code in a separate file.
(2) An R Markdown document or a Jupyter Notebook.
(3) Well-documented code, with comments that describe your solutions as you would in a write-up. This option is only available if you are using R.
These exercises may require some theory, but they will be centered largely on applying techniques to data. Please keep the following principles in mind when completing your homework:
Knowledge Discovery in Databases
1. Selection of the target set (which data and/or variables are to be used for data mining).
2. Data cleaning and preprocessing (noise removal, outlier identification, imputation for missing data).
3. Preprocessing the data (transformations, tracking time-dependent information and relevant covariates).
4. Deciding which data mining techniques are appropriate (regression, classification, clustering, graphical modeling).
5. Analyzing the clean data using data mining software (algorithms for data reduction, dimensionality reduction, model fitting, prediction extraction).
6. Interpreting and assessing the knowledge derived from the data mining results.
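As a hedged illustration of step 2, the sketch below fills missing values (None) with the median of the observed values and flags outliers with Tukey's fences (values more than 1.5 × IQR below the first quartile or above the third). These particular techniques, the function names and the toy data are illustrative assumptions, not methods the guidelines require. Whatever you choose, your write-up should say what was done and why.

```python
from statistics import median, quantiles

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def flag_outliers_iqr(values, k=1.5):
    """Return indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical toy measurements: two missing values and one suspicious reading.
raw = [4.1, 3.9, None, 4.4, 4.0, 25.0, 3.8, None]
clean = impute_median(raw)
outliers = flag_outliers_iqr(clean)
print("imputed:", clean)
print("outlier indices:", outliers)
```

Flagging, rather than silently deleting, leaves the decision about each outlier to you. That decision belongs in the discussion of pre-processing.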
It is imperative that you realize there is more to data mining than “turning the crank.” In fact, turning the crank corresponds only to step 5.
What to do (the rhythm of a scientific paper):
1) Tell me about the general problem you are trying to solve.
2) Tell me about any additional pre-processing, transformations, and reductions applied to the data. What was done, and why?
3) Tell me about the method you are using, and any parameter settings or implementation details that I need to know.
4) Show me the results. Use tables and figures.
5) Reflect: tell me what the results mean in the context of the original problem and dataset. In other words, interpret the results of your analysis.
Examples of final interpretations:
Poor interpretation: I applied these methods to the data and selected the models with the best AIC.
Fair interpretation: both methods selected the same number of variables.

Better interpretation: Both methods selected the same number of variables. However, the set of variables differs between methods. In particular, variable A was included in the final variable set for method 1.
Best interpretation: Both methods selected the same number of variables. However, the set of variables differs between methods. In particular, variable A was included in the final variable set for method 1, but variable B was not. This is puzzling because variable B seems very important in the context of the problem and is highly correlated with A. After further examination, it appears that B is on the border of inclusion, and lowering the threshold may be appropriate.
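The interpretations above mention selecting models by AIC. Here is a hedged sketch of what that comparison involves, assuming least-squares models with Gaussian errors. In that case AIC reduces, up to an additive constant, to n ln(RSS/n) + 2k, where k is the number of estimated coefficients, and the model with the lower value is preferred. The function names and toy data are hypothetical. Reporting the number alone is the "poor" interpretation; the point of the example is only to make clear what is being compared.

```python
import math

def aic_gaussian(rss, n, k):
    """AIC of a least-squares fit with Gaussian errors, additive constants dropped:
    n * ln(RSS / n) + 2k. Lower is better."""
    return n * math.log(rss / n) + 2 * k

def rss_intercept_only(y):
    """Residual sum of squares when every y is predicted by the sample mean (k = 1)."""
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)

def rss_simple_regression(x, y):
    """Residual sum of squares of the ordinary least-squares line y = a + b*x (k = 2)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data with a roughly linear trend.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
n = len(y)

aic_null = aic_gaussian(rss_intercept_only(y), n, k=1)
aic_line = aic_gaussian(rss_simple_regression(x, y), n, k=2)
best = "regression" if aic_line < aic_null else "intercept-only"
print(f"AIC intercept-only: {aic_null:.2f}, AIC regression: {aic_line:.2f} -> prefer {best}")
```

Software packages differ on whether the error variance is counted in k. Counting it adds the same constant to every model's AIC, so the ranking of models does not change.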
Miscellaneous
• Beware of plagiarism:
o Do not copy in any form, whether from other students or from the Internet.
o Use citations where appropriate (e.g., if you use a chunk of code from the Internet).
• Do not post exact solutions on the forum.