HW5
Q: You may often see people use an extra step called pre-training prior to the actual training of a model. Submit a report on what pre-training in a CNN is and why we use it.
Before we have seen any data, we can already say which models are reasonable by using a prior probability distribution: a probability distribution over the parameters of a model that encodes our beliefs about what those parameters should be. Pre-training can be understood in these terms: the pre-trained weights act as a prior, encoding beliefs learned from earlier data before the model ever sees the new task's data.
How concentrated the probability density of the prior is tells us whether the prior should be considered weak or strong. A weak prior is a prior distribution with high entropy, such as a Gaussian distribution with high variance; it allows the data to move the parameters more or less freely. A strong prior is a prior distribution with low entropy, such as a Gaussian distribution with low variance; it plays a much larger role in determining where the parameters end up. An infinitely strong prior places zero probability on some parameter values, meaning those values are completely forbidden, regardless of how much support the data give them.
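The effect of prior strength can be sketched concretely. A zero-mean Gaussian prior on linear-regression weights yields MAP estimation, which is ridge regression with penalty lam = noise_var / prior_var: a high-variance (weak) prior gives a tiny penalty and lets the data set the weights, while a low-variance (strong) prior shrinks the weights toward the prior mean of zero. This is a minimal illustration assuming a linear model; the function name `map_weights` and the synthetic data are our own, not from any library.

```python
import numpy as np

def map_weights(X, y, prior_var, noise_var=1.0):
    """MAP estimate of linear-regression weights under a zero-mean
    Gaussian prior with variance prior_var on each weight.
    Equivalent to ridge regression with lam = noise_var / prior_var."""
    lam = noise_var / prior_var
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

# Weak prior (high variance): the data move the parameters freely,
# so the estimate lands near the true weights.
w_weak = map_weights(X, y, prior_var=1e4)

# Strong prior (low variance): the prior dominates and the
# parameters are pulled toward the prior mean (zero).
w_strong = map_weights(X, y, prior_var=1e-4)

print(np.round(w_weak, 2))    # close to the true weights
print(np.round(w_strong, 4))  # shrunk toward zero
```

Pre-trained CNN weights play the same role as the prior mean here: instead of pulling parameters toward zero, fine-tuning starts from (and stays near) values already informed by earlier data.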