In this third homework, you will demonstrate proficiency in working with the Python library NumPy for data processing and basic analysis. You may import only the following libraries for this homework: numpy, matplotlib, seaborn, and random. Not all of the library functions needed for this assignment were explicitly taught in lecture. The goal is for you to practice browsing library documentation to find and understand the functions that will help you perform each task.
Kickstarter Analysis
Kickstarter (https://www.kickstarter.com) has established itself as the leading platform for funding creative ventures. Aspiring entrepreneurs in the arts can initiate fundraising campaigns on Kickstarter to support their projects. Some projects have been hugely successful, whereas many others have fallen well short of their fundraising objectives. The attached data file contains sample data on over 4,000 Kickstarter fundraising campaigns. Each row summarizes one campaign, including the goal and amount pledged, the state of the project in securing funding (e.g., successful, failed), the category of the project (i.e., type of art), and whether the project was featured via a staff pick or spotlight (i.e., on the Kickstarter home page).
You have been hired as a data analyst to work on the data provided. To do this, you will create a Jupyter Notebook that performs the following:
• (5 points) Load the goal amount and amount pledged (in U.S. dollars) data across all projects. Hint: You will need to rely on the np.loadtxt function. Look up the different arguments that can be passed into this function.
• (5 points) Using only the data you have just loaded, display descriptive statistics that include:
a. Total number of projects in the data set
b. Goal amounts (in U.S. dollars): min, max, mean, median, and standard deviation
c. Pledge amounts (in U.S. dollars): min, max, mean, median, and standard deviation
d. Percentage of projects where the amount pledged met or exceeded the goal amount.
• (5 points) Load the country data.
• (5 points) Determine the frequency of each country and the proportion (percentage) of observations for each country. Hint: The proportions across all countries should sum to approximately 1 (i.e., 100%). Display a bar chart of the countries and their percentages, with a title and properly labeled and formatted axes.
• (5 points) From the resulting visualization, what inferences can you make about the various countries?
• (5 points) Load the project states data.
• (5 points) For each country, calculate the proportion (percentage) of projects that were successful. Display a bar chart of the countries and their success percentages, with a title and properly labeled and formatted axes. Which country was the least successful, and which was the most successful (on average)?
• (5 points) Load the staff pick and spotlight data.
• (5 points) Determine the total number of projects identified as staff picks and the total number of projects identified as spotlight projects. What was the success rate (percentage) for each of these two project classifications?
• (5 points) Which feature (staff pick or spotlight) is associated with a higher percentage of successful projects? What inferences can you make from the data to explain why one of the features has a higher percentage of successful projects?
• (10 points) You have done such a good job with the previous analyses that your boss has said to you: “Thank you for your analysis. The client is very happy, but wants to know what other insights we can gather from the data.” You only have a short time between now and the client meeting. Using data already loaded and/or loading more data of your choice, create a visualization of your choice and make at least one inference from your analysis. [Instructor Note: This item is intentionally vague and leaves room for your insights as a future analyst. A good data analyst knows more than how to answer the questions they are asked. They are able to look at data and perform additional analyses unprompted to gain insights. Take this time to practice this skill.]
• Format the Jupyter Notebook with enough Markdown cells that each step of your analysis is fully explained. Some examples of formatted notebooks are as follows:
• https://anaconda.org/jbednar/plotting_pitfalls/notebook
• https://nbviewer.jupyter.org/github/jrjohansson/scientificpythonlectures/blob/master/Lecture-4-Matplotlib.ipynb
Note: These are just examples and not prescriptive for what your Notebook should look like.
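As a starting point for exploring np.loadtxt and np.unique, the following is a minimal sketch on a tiny made-up data set, not the assignment data. The column names, column order, delimiter, and values here are all assumptions for illustration; the attached file will differ, so inspect it first and adjust arguments such as delimiter, skiprows, usecols, and dtype accordingly.

```python
import io

import numpy as np

# Hypothetical four-row CSV held in memory. The real file's name, column
# positions, and delimiter are NOT specified here and must be checked.
sample_csv = io.StringIO(
    "goal,pledged,state,country\n"
    "1000,1500,successful,US\n"
    "5000,200,failed,GB\n"
    "300,300,successful,US\n"
    "2000,0,failed,CA\n"
)

# Numeric columns: skip the header row and read only columns 0 and 1.
amounts = np.loadtxt(sample_csv, delimiter=",", skiprows=1, usecols=(0, 1))
goal, pledged = amounts[:, 0], amounts[:, 1]

# Text columns: rewind the in-memory file and read columns 2 and 3 as strings.
sample_csv.seek(0)
labels = np.loadtxt(
    sample_csv, delimiter=",", skiprows=1, usecols=(2, 3), dtype=str
)
state, country = labels[:, 0], labels[:, 1]

# Descriptive statistics on one numeric column.
goal_mean = goal.mean()
goal_median = np.median(goal)

# Percentage of projects whose pledges met or exceeded the goal:
# the mean of a boolean array is the fraction of True values.
pct_met_goal = 100 * np.mean(pledged >= goal)

# Frequency and proportion of each country (unique values come back sorted).
countries, counts = np.unique(country, return_counts=True)
proportions = counts / counts.sum()

# Success rate per country, using a boolean mask to select each country's rows.
success_rate = np.array(
    [np.mean(state[country == c] == "successful") for c in countries]
)

print(len(goal), goal_mean, goal_median, pct_met_goal)
print(dict(zip(countries.tolist(), proportions.tolist())))
print(dict(zip(countries.tolist(), success_rate.tolist())))
```

The same boolean-mask pattern extends to other categorical columns, and the arrays produced by np.unique can be passed directly to a matplotlib or seaborn bar chart.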