Important Note on Homework Submissions
Always hit the Save button before submitting to make sure your last changes are saved.
In the future, be sure to always label all axes, and to explicitly use print and plt.show() commands to report your results. When printing variables, be sure to include a prefix, such as a variable name, that tells us what this printed variable is. For example:
print("x: ", x)
Failure to do so may result in a loss of points, even if not explicitly asked.
Also, be sure to actually write down answers to the discussion question, either for regular points or extra credit (TBD).
In Section
Part 1) A small survey of probability
A probabilistic model typically contains some systematic or deterministic component, alongside a random effect which accounts for factors outside the deterministic component. Each probabilistic model consists of the following components, which you should commit to memory :p
Experiment ($\omega$): A particular incarnation, realization, or sample of some process which is modeled randomly.
Sample Space ($S$ or $\Omega$): The set of all possible outcomes of some experiment.
Event: A set of possible experimental outcomes (a set of elements of the sample space).
Random Variable: A function which maps experimental outcomes to real numbers.
Probability function: A function which assigns probability to events (sets of experimental outcomes), or which tells us the probability that a random variable will take on some range of values. If $X$ is a random variable, we usually write $f_X(x) = P(X = x)$, where $f_X(x)$ is the probability mass (or density) function associated with $X$.
Example: Flipping two coins, and counting the number of heads:
Sample space: $\{HH, HT, TH, TT\}$
Random variable: $X(HH) = 2$, $X(HT) = X(TH) = 1$, $X(TT) = 0$
Randomness in the outcome models uncertainty in the initial position and velocity of the coin, alongside aerodynamic effects not explicitly modeled.
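The two-coin example can also be written out in code. The following is a minimal sketch, assuming a fair coin so that all four outcomes are equally likely; the names `sample_space`, `num_heads`, and `pmf` are illustrative choices, not part of the assignment.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: every ordered outcome of two coin flips.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]

def num_heads(outcome):
    """Random variable X: maps an outcome such as 'HT' to its number of heads."""
    return outcome.count("H")

# Assumption: a fair coin, so all four outcomes are equally likely.
counts = Counter(num_heads(outcome) for outcome in sample_space)
pmf = {x: Fraction(count, len(sample_space)) for x, count in sorted(counts.items())}

print("sample space: ", sample_space)
for x, prob in pmf.items():
    print("P(X = {}): ".format(x), prob)
```

Here the event "exactly one head" is the set {HT, TH}, and its probability is the value the sketch reports for X = 1.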

Discussion Question
a) Give an example of a probabilistic model of some real world process. Note any deterministic model components, and justify which exogenous factors are modeled as noise and how (what type of random variable). Why did you choose this type of randomness?
b) For the model in part (a), identify the sample space, a possible event, and a random variable which might be interesting to analyze in this model.
Part 2: Combinatorics and Simulation
The probability of some event $A$ can, for some problems, be computed “manually” by considering the total number of experimental outcomes which result in $A$, relative to the size of the sample space (assuming every outcome is equally likely):
$$P(A) = \frac{\text{Number of ways in which } A \text{ could occur as an experimental outcome}}{\text{Total number of experimental outcomes}}$$
As an example, when flipping two coins, there is only one way to get two heads, while there are four possible outcomes for this experiment. Thus, if $A$ is the event “we flip two coins and get two heads”, then $P(A) = \frac{1}{4}$.
When attempting to compute these probabilities by hand, it can be helpful to use some tools from combinatorics:
Permutation: The number of ways we can order a set of $k$ objects: $P_k = k! = k * (k-1) * (k-2) * \ldots * 1$ (here $!$ is called a factorial).
Combination: The number of ways we can choose $k$ objects from a set of size $n$ (without replacing or ordering them): $C^n_k = \frac{n!}{k!(n-k)!}$ (also called $n$ choose $k$).
a) When rolling two 6-sided dice, what's the probability of getting two numbers which sum to an even number?
b) What's the probability of drawing 3 of a kind (and not 4 of a kind) when drawing a hand of size $5$ from a standard $52$ card deck?
Although we were able to manually compute the probability of the events described above, in practice it is usually impractical to enumerate all of the possible ways in which a particular event can happen in a given probabilistic model. Monte Carlo Simulation is an alternative to manual computation, wherein we first create a way to simulate our system of interest, run this simulation many (thousands of) times, and estimate the probability of any event by dividing the number of simulations which produced the event by the total number of simulations.
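The Monte Carlo idea described above can be sketched on the two-coin example, where the exact answer is already known. This is only an illustration: the function name `estimate_two_heads`, the seed, and the trial count are assumptions chosen for the sketch, and the dice and card questions are left as exercises.

```python
import random

def estimate_two_heads(num_trials, seed=0):
    """Estimate P(two heads in two fair coin flips) by simulation."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    hits = 0
    for _ in range(num_trials):
        flips = [rng.choice("HT") for _ in range(2)]
        if flips == ["H", "H"]:
            hits += 1
    return hits / num_trials

exact = 1 / 4  # one favorable outcome out of four equally likely ones
estimate = estimate_two_heads(num_trials=10_000)
print("exact P(A): ", exact)
print("estimated P(A): ", estimate)
```

The estimate will differ slightly from the exact value on any given run; increasing the number of trials tends to shrink that gap.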

c) Write a simulation to verify your theoretical result in part (2a) (rolling dice) using $n = 100$ trials and print your simulation's estimate for the probability of this event.
d) Using your results from part (c), plot a histogram detailing the distribution of the sum of the two die rolls.
Part 3: Likelihoods
Given a random variable $X$, and $n$ realizations, $x_1, x_2, \ldots, x_n$, of $X$, we can assess the goodness of fit of any given model $M$ to this observed data using a likelihood function $L(M)$. In particular, a likelihood function returns the probability of the observed data assuming the data was generated by a given model $M$, and thus encodes the strength of $M$ in modeling $X$. By comparing the likelihoods of different models on the same dataset, we thus have a way to compare the relative strength of models in fitting data, with models showing a high likelihood on a given dataset being preferred over those that perform poorly.
As an example, given data $x_1, x_2, \ldots, x_n$, and assuming each datum was generated as a binomial random variable with success rate $p$ and number of trials $n$, the likelihood of this data under this model will be given by:
$$L(M) = \prod_{i=1}^{n} \binom{n}{x_i} p^{x_i} (1 - p)^{n - x_i}$$
where $\prod_{i=1}^{n}$ means product from $i = 1:n$. (The $n$ inside the binomial term is the number of trials per datum, which in general need not equal the number of data points.)
For reasons of computational tractability, we will rarely work directly with raw likelihoods, instead choosing to use the log of these likelihoods as our measure of model fitness:
$$LL(M) = \log(L(M)).$$
a) Why might log likelihoods be preferable to regular likelihoods in practice (hint: Think of a trial of size $n = 1000$, where each datum is independent).
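The binomial log likelihood above can be evaluated term by term using only the standard library. The following is a minimal sketch on a small hypothetical dataset (four repeats of 10 trials each); the function name `binomial_log_likelihood` and the data are assumptions for illustration, not the data used later in this section.

```python
import math

def binomial_log_likelihood(data, num_trials, p):
    """Sum of log binomial pmf values: log C(n, x) + x log p + (n - x) log(1 - p)."""
    total = 0.0
    for x in data:
        total += (math.log(math.comb(num_trials, x))
                  + x * math.log(p)
                  + (num_trials - x) * math.log(1 - p))
    return total

# Hypothetical data: successes out of 10 trials in each of four repeats.
toy_data = [4, 6, 5, 7]
for p in (0.2, 0.5, 0.8):
    print("p = {}, LL: ".format(p), binomial_log_likelihood(toy_data, 10, p))
```

Among the candidate values of p, the one with the largest (least negative) log likelihood is the one the data favors.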

Suppose you have been given the following data, recording the outcomes of some repeated experiment:
[57, 44, 52, 57, 47, 55, 49, 48, 51, 42, 51, 54, 54, 55, 52, 52, 50, 47, 56, 43, 49, 54, 53, 58, 53, 49, 43, 58, 53, 45,
48, 53, 50, 50, 54, 53, 54, 56, 53, 45, 47, 52, 42, 49, 45, 42, 51, 47, 43, 47, 50, 50, 55, 57, 49, 46, 40, 46, 45, 52, 48, 49, 43, 55, 46, 53, 50, 57, 53, 49, 44, 54, 47, 56, 54, 49, 58, 57, 47, 46, 58, 45, 49, 56, 50, 43, 42, 49, 49, 53, 48, 52, 54, 40, 48, 52, 40, 46, 49, 45, 41, 48, 52, 53, 41, 48, 49, 47, 53, 52, 53, 50, 47, 49, 55, 54, 51, 46, 48, 53, 48, 49, 48, 44, 43, 59, 49, 44, 40, 57, 52, 49, 48, 57, 56, 52, 46, 50, 53, 50, 61, 62, 51, 59, 52, 51, 52, 55, 52, 55, 51, 54, 49, 56, 42, 49, 59, 56, 47, 41, 47, 50, 52, 52, 43, 54, 53, 51, 51, 42, 46, 48, 55, 48, 55, 53, 44, 49, 47, 53, 57, 57, 53, 50, 47, 53, 48, 44, 59, 50, 50, 45, 50, 56, 51, 54, 46, 52, 54, 53, 53, 44, 56, 47, 42, 50, 50, 52, 52, 51, 54, 50, 50, 44, 51, 47, 49, 51, 48, 46, 41, 52, 52, 52, 46, 54, 50, 45, 53, 55, 44, 51, 62, 47, 49, 52, 53, 49, 51, 55, 51, 55, 54, 45, 57, 49, 56, 46, 48, 60, 51, 47, 47, 52, 46, 53, 45, 42, 47, 54, 51, 52, 51, 41, 52, 49, 49, 49, 47, 53, 66, 55, 49, 57, 45, 50, 47, 51, 44, 45, 52, 40, 49, 54, 47, 56, 47, 49, 47, 56, 48, 51, 49, 47, 47, 46, 45, 63, 51, 45, 56, 54, 51, 46, 49, 45, 51, 56, 52, 46, 58, 45, 46, 47, 47, 40, 52, 48, 54, 50, 44, 41, 45, 55, 41, 51, 57, 50, 48, 54, 45, 48, 53, 43, 55, 46, 46, 42, 52, 59, 46, 46, 46, 58, 53, 54, 50, 53, 50, 40, 50, 46, 49, 51, 52, 43, 45, 48, 50, 46]
48, 50, 50, 51, 51, 44, 44, 47, 57, 54, 59, 48, 40, 40, 48, 48, 47, 48, 47, 51, 56, 48, 52, 57, 44, 53, 46, 50, 56, 42, 64, 46, 47, 51, 39, 44, 54, 48, 53, 50, 47, 54, 61, 45, 49, 55, 53, 56, 51, 59, 57, 49, 57, 49, 54, 46, 50, 53, 44, 45, 49, 56, 47, 44, 50, 46, 56, 57, 50, 54, 53, 54, 42, 48, 51, 42, 51, 52, 46, 50, 57, 49, 43, 48, 58, 47, 53, 50, 55, 58, 47, 51, 46, 42, 56, 51, 49, 48, 57, 45, 52, 49, 58, 54, 52, 46, 46, 55, 49, 55, 49, 48, 53, 42, 52, 47, 56, 49, 59, 55, 64, 53, 46, 52, 54, 45, 44, 48, 49, 45, 54, 58, 46, 51, 47, 48, 46, 41, 47, 55, 51, 46, 52, 55, 54, 43, 52, 50, 43, 57, 50, 47, 52, 43, 53, 49, 50, 45, 51, 57, 56, 46, 50, 50, 41, 41, 62, 51, 51, 51, 53, 59, 54, 47, 43, 44, 44, 43, 51, 53, 47, 50, 54, 51, 53, 48, 49, 44, 46, 59, 43, 47, 48, 53, 54, 55, 49, 44, 43, 52, 57, 52, 49, 53, 44, 51, 46, 50, 56, 46, 60, 50, 46, 50, 51, 53, 48, 49, 55, 57, 52, 54, 51, 40, 57, 60, 45, 55, 48, 59, 40, 45, 47, 50, 52, 56, 60, 55, 59, 47, 48, 49, 50, 49, 40, 58, 48, 46, 49, 60, 56, 50, 45, 51, 53, 40, 53, 50, 49, 58, 52, 52, 60, 53, 53, 53, 48, 53, 55, 54, 55, 50, 55, 51, 49, 51, 52, 51, 47, 51, 48, 52, 51, 56, 48, 52, 52, 45, 50, 51, 49, 42, 47, 57, 51, 50, 44, 45, 53, 52, 49, 58, 52, 46, 64, 49, 48, 45, 59, 50, 51, 56, 55, 47, 46, 47, 50, 47, 56, 42, 57, 54, 50, 47, 55, 47, 57, 51, 46, 50, 50, 50, 49, 44, 40, 57, 54, 61, 51, 60, 55, 62, 54, 44, 46, 49, 57, 56, 49, 58, 49, 53, 49, 50, 62, 55, 54, 46, 50, 56, 55, 54, 52, 50, 55, 44, 58, 46, 44, 52, 51, 56, 59, 51, 52, 49, 44, 56, 37, 45, 60, 57, 49, 48, 41, 49, 54, 42, 47, 36, 61, 55, 51, 42, 47, 56, 41, 49, 50, 38, 37, 45, 52, 41, 45, 49, 59, 49, 49, 45, 50, 47, 50, 42, 54, 54, 53, 41, 44, 58, 52, 44, 50, 57, 50, 50, 46, 52, 56, 49, 49, 53, 40, 47, 55, 50, 47, 40, 51, 54, 42, 44, 56, 45, 57, 50, 49, 46, 41, 43, 47, 53, 56, 45, 51, 50, 39, 45, 45, 55, 53, 44, 57, 48, 55, 52, 51, 54, 47, 46, 55, 46, 49, 46, 59, 51, 54, 44, 48, 49, 50, 45, 49, 50, 54, 50, 58, 47, 53, 52, 43, 43, 50, 46, 48, 49, 50, 43, 48, 53, 
47, 47, 45, 49, 51, 51, 41, 40, 45, 45, 51, 53, 43, 42, 53, 59, 56, 59, 44, 44, 56, 54, 51, 51, 60, 50, 53, 43, 52, 44, 53, 51, 48, 47, 53, 51, 44, 59, 54, 58, 59, 48, 47, 46, 60, 50, 51, 46, 40, 49, 58, 57, 48, 45, 47, 50, 49, 52, 49, 50, 42, 40, 40, 42, 41, 56, 55, 41, 45, 53, 46, 44, 54, 44, 53, 46, 54, 53, 47, 46, 51, 51, 44, 49, 49, 52, 48, 52, 53, 65, 51, 51, 47, 49, 61, 53, 58, 41, 46, 50, 54, 57, 40, 51, 57, 55, 43, 44, 50, 47, 59, 46, 47, 52, 49, 51, 58, 50, 45, 46, 48, 50, 40, 46, 47, 47, 56, 49, 43, 42, 45, 50, 48, 45, 56, 55, 56, 49, 49, 53,
b) Print the (log) likelihood of this data for the following 9 binomial models, each with $n = 100$ trials, and success parameters:
$p$ = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
c) Generate a bar chart of the log likelihoods, showing each model's goodness of fit to this data.
d) Based on this plot, what is your estimate of the success parameter $p$ for this process? (Which model best fits this data?)

Homework Part 1:
Compute the probability of the following four events “by hand”. If you prefer not to show work in the notebook, attach a photo of clearly written paper notes.
a) Probability of rolling two sixes and an even number when rolling three 6-sided dice.
b) Probability of rolling three 6-sided dice which sum to 15.
c) Probability of getting four of a kind when drawing 7 cards from a 52 card deck.
d) Probability of a full house (3 of 1 kind, 2 of another) when drawing 6 cards from a 52 card deck, where the sixth card is different from the other two kinds of card.
e) For (a,b,c,d), write a simulation to verify your results using $n = 10000$ trials. For each simulation, print the estimate of the probability of these events, $\hat{p}$, generated from your simulation, and plot a histogram showing the raw outcomes of your trials. Be sure to choose an appropriate range, and number of bins, for each plot to ensure a good presentation of your data.
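One way to set up a card simulation is sketched below on a simpler event than those in the homework: drawing 2 cards and checking whether their ranks match, for which the exact probability is 3/51. The deck encoding, the helper `draw_is_pair`, and the seed are assumptions made for this sketch.

```python
import random
from collections import Counter

RANKS = "A23456789TJQK"
SUITS = "CDHS"
DECK = [rank + suit for rank in RANKS for suit in SUITS]  # 52 cards

def draw_is_pair(rng):
    """Draw 2 cards without replacement and report whether their ranks match."""
    hand = rng.sample(DECK, 2)
    rank_counts = Counter(card[0] for card in hand)
    return max(rank_counts.values()) == 2

rng = random.Random(0)  # seeded so the run is repeatable
num_trials = 10_000
p_hat = sum(draw_is_pair(rng) for _ in range(num_trials)) / num_trials
exact = 3 / 51  # after the first card, 3 of the remaining 51 share its rank

print("exact probability: ", exact)
print("p_hat: ", p_hat)
```

Counting ranks with a `Counter` extends naturally to larger hands, where the event of interest is a condition on the rank counts.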

Part 2:
Each morning, just before dawn, Professor Mimno awakes to the sounds of birds chirping in his backyard. Arising from his slumber, he then proceeds to his yard, and records the number of birds visible on the treeline horizon. For homework, you have been tasked with deducing a possible model which might describe the variable $X =$ the total number of birds seen by Professor Mimno each week during the Winter months, using the observed weekly bird count list below:
[28,14,40,30,22,24,16,11,15,19,20,39,21].
a) Why might a Poisson model be reasonable for this task?
b) Supposing these totals were monthly through the year, instead of weekly during the Winter, why might a Poisson then be a less appropriate model?
c) The log likelihood of a Poisson distribution, with rate parameter $\lambda$, is defined:
$$LL(\lambda | x_{1:n}) = -n\lambda + \sum_{i=1}^{n} \left( x_i \log(\lambda) - \log(x_i!) \right)$$
For $\lambda$ in each integer from 15 to 30 inclusive, compute the log likelihood of the data above, and use a line graph to plot your results with $\lambda$ on the $x$-axis and log likelihood on the $y$-axis. Note you may have some issues computing the factorial in this expression, and may want to adjust the y axis to get a good plot; see Google for help.
d) Why are we choosing to work with log likelihoods rather than likelihoods? Can you compute the (regular) likelihood of this data?
e) Using your plot, what would your estimate be for $\lambda$? I.e. what value of $\lambda$ do you believe best models the data? What is the interpretation of this parameter?
f) Compute the mean of the data. Is this number close to your value of $\lambda$ chosen for part (e)? Why might this make sense?
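The factorial term in the Poisson log likelihood can be handled without ever forming $x_i!$, since $\log(x!)$ equals the log-gamma function evaluated at $x + 1$. The following is a minimal sketch using `math.lgamma` from the standard library; the helper names and the toy counts are assumptions for illustration and are not the bird data.

```python
import math

def log_factorial(x):
    """log(x!) computed as lgamma(x + 1), which avoids forming the huge integer x!."""
    return math.lgamma(x + 1)

def poisson_log_likelihood(data, lam):
    """LL(lambda | x_1..x_n) = -n*lambda + sum(x_i * log(lambda) - log(x_i!))."""
    n = len(data)
    return -n * lam + sum(x * math.log(lam) - log_factorial(x) for x in data)

# Hypothetical counts, not the bird data from the assignment.
toy_counts = [3, 5, 4]
for lam in (2, 4, 6):
    print("lambda = {}, LL: ".format(lam), poisson_log_likelihood(toy_counts, lam))
```

Other routes to the same quantity exist (for example, summing logs of the integers from 1 to x); the log-gamma form is used here only because it is a single standard-library call.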

Part 3:
Suppose you conduct an experiment, resulting in the following data:
[37, 33, 37, 40, 35, 51, 51, 39, 34, 34, 44, 40, 39, 30, 42, 43, 29,45, 35, 40, 44, 35, 41, 40, 42, 40, 34, 38, 42, 36, 38, 41, 40, 40,43, 41, 43, 36, 43, 47, 38, 36, 41, 39, 33, 36, 37, 44, 39, 43, 42,44, 40, 41, 41, 36, 37, 46, 44, 34, 40, 38, 46, 40, 41, 38, 44, 43,38, 35, 38, 41, 35, 41, 45, 32, 37, 46, 46, 36, 43, 34, 36, 49, 39,41, 36, 42, 40, 44, 46, 33, 39, 38, 43, 39, 34, 36, 35, 42]
a) Create a histogram of the data, using an appropriate range and number of bins.
b) Compute the mean and variance of this data. What type of random variable, or data generative process, might you assume has produced this data?
c) To verify your results, compute the log likelihood of the data under the following four models:
(i) Binomial, $n = 100$, $p = .4$, (ii) Binomial, $n = 100$, $p = .6$, (iii) Poisson, $\lambda = 60$, (iv) Poisson, $\lambda = 40$,
and produce a (single) bar chart which displays these scores for each model.
d) Based on these likelihoods, which model presented in part (c) seems most appropriate as a model for this data? Why?
Part 4: Language Identification
In class we calculated the log likelihood of a sequence of die rolls for two different dice, and used those numbers to guess which die was more likely to have produced that sequence of rolls. We then did the same thing with the first names of students in the class. Instead of dice, we used name distributions for each year. In this problem you will do the same thing but with a sequence of letters. Instead of dice, you will compare the probability of each sequence of letters under the letter distributions of several European languages.
Start by loading letter frequency data from the file letter_frequency.csv. This data is from Wikipedia (https://en.wikipedia.org/wiki/Letter_frequency#Relative_frequencies_of_letters_in_other_languages), collected by Adrianus Kleemans in a data file at this Github repo (https://github.com/akleemans/letter-frequency).
a) Look at the contents of the file letter_frequency.csv in a text editor, or through Jupyter. What is unusual about this CSV file?
b) Use the function pandas.read_csv to load the letter frequencies file. Consult the documentation for this function to specify the correct delimiter and to use the field Letter as the index column. Save the output in a variable letter_data.
The numeric values in the file are percentages, but we want probabilities. Multiply the letter_data data frame by 0.01. Use the function head to display the first five rows of the data frame, to confirm that you separated the fields correctly. The value for a in French should be 0.07636.

import pandas
letter_data = 0.01 * pandas.read_csv("letter_frequency.csv", delimiter=";", index_col="Letter")
letter_data.head()
c) Create a variable polish and set it equal to the column for “Polish” from the data frame. Print the first 26 values of this series. Which English letters are not used in Polish?
d) Use the .loc accessor to get the row in the original data set for “c”. Print this row. Which language does not use the letter c?
e) Create a function get_scores that takes one argument, a string s.
In this function:
Set s equal to s but lower case.
Use the function numpy.zeros to create a variable language_scores that is an array of 14 zeros (one for each language).
Write a for loop that iterates over each letter in s. Remember that a Python string is a sequence of characters! If the letter is in the index for letter_data, add the log of the row for the letter to language_scores (use numpy.log).
Return language_scores.
f) Use the get_scores function to evaluate the log probability of the string “abc”. The log function may produce errors; you can ignore these. The value for French should be -10.705159 and -inf for Icelandic.
Which language is most likely to have produced this string? Why is Icelandic negative infinity?
g) In the next section I have selected several short passages from Wikipedias in different languages. These languages are (in alphabetical order) Dutch, Finnish, German, Icelandic, Italian, Polish, and Portuguese. Attempt to identify each language by computing the log-likelihood of each observation under each language model. Compare your guess with the result you get by using Google Translate to auto-detect the language of each passage. If your guess doesn’t agree with the auto-detected language, comment on why that might be.
get_scores("Shorttrack is een schaatsrace op een ijshockeybaan. In tegenstelling tot het langebaanschaatsen is de tijd van een rijder niet van belang: vier tot zes rijders starten tegelijk en wie als eerste over de finish komt, heeft gewonnen.")
get_scores("Per poetica e pensiero di Alessandro Manzoni si intendono le convinzioni poetiche, stilistiche, linguistiche ed ideologiche che hanno delineato la parabola esistenziale e letteraria di Manzoni dagli esordi giacobini e neoclassici fino alla morte.")

get_scores("Kiinalaisen Wuhanin kaupungin hallinto määräsi koronaviruksen oireita osoittavat henkilöt erityiselle karanteenivyöhykkeelle hallintopakon uhalla.")
get_scores("Útganga Breta úr Evrópusambandinu eða í daglegu tali Brexit (sambland af ensku orðunum British „breskur“ og exit „útganga“) var úrsögn Bretlands úr Evrópusambandinu (ESB).")
get_scores("Ihre Bausteine sind vier verschiedene Nukleotide, die jeweils aus einem Phosphatrest, dem Zucker Desoxyribose und einer von vier organischen Basen bestehen.")
get_scores("Facebook, a maior mídia social e rede social virtual do mundo, é fundada por Mark Zuckerberg e seus colegas Eduardo Saverin, Andrew McCollum, Dustin Moskovitz e Chris Hughes, alunos da Universidade de Harvard.")
get_scores("Od początku zawodowej kariery zawodnik Los Angeles Lakers. Razem z Shaquille’em O’Nealem poprowadził zespół do trzech mistrzostw z rzędu w latach 2000–2002. Po odejściu z Lakers O’Neala, Bryant został główną postacią klubu.")
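The scoring idea behind Part 4 (sum the log probability of each letter under each language, then compare totals) can be sketched without the real data. The two "languages" and their letter probabilities below are hypothetical, and `toy_scores` is an illustrative stand-in that uses plain dictionaries rather than the letter_data data frame; it is not the get_scores function the assignment asks for.

```python
import math

# Hypothetical letter probabilities for two made-up "languages";
# these are illustrative numbers, not values from letter_frequency.csv.
TOY_MODELS = {
    "Lang A": {"a": 0.5, "b": 0.3, "c": 0.2},
    "Lang B": {"a": 0.2, "b": 0.2, "c": 0.6},
}

def toy_scores(text):
    """Return the log probability of text under each toy model, skipping unknown characters."""
    scores = {name: 0.0 for name in TOY_MODELS}
    for letter in text.lower():
        for name, probs in TOY_MODELS.items():
            if letter in probs:
                scores[name] += math.log(probs[letter])
    return scores

scores = toy_scores("Cab, cc!")
best = max(scores, key=scores.get)
print("scores: ", scores)
print("best guess: ", best)
```

Because the string is heavy on the letter c, the model that assigns c a higher probability ends up with the larger total.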