Bayesian estimation is the Bayesian school's main method for estimating unknown parameters. Compared with the frequentist school, the central Bayesian viewpoint is that an unknown quantity is a random variable: before any sampling takes place, it already has a distribution of its own, the so-called prior distribution.
Bayesian estimation introduces this prior distribution so that the prior information can be combined with the population and sample information used by traditional frequentist statistics, yielding a posterior distribution of the unknown quantity, on which statistical inference is then based.
Whether an unknown quantity may be treated as a random variable was debated between the classical and Bayesian schools for a long time, but this view was gradually accepted by the classical school. Today the debate has shifted to how the various kinds of prior information should be used to determine a reasonable prior distribution.
The basic idea of Bayesian estimation
For an unknown parameter θ, assume its distribution (the prior distribution) is h(θ).
Both the population distribution and the sample distribution now depend on this prior, so after the prior information is incorporated, the joint distribution of the sample X and θ (the joint pdf of X and Θ) becomes:
$$g(X,\theta)=L(X\mid\theta)\,h(\theta)=\prod_i f(x_i\mid\theta)\,h(\theta)$$
Then the marginal pdf of X is:
$$g_1(X)=\int_\theta g(X,\theta)\,d\theta=\int_\theta L(X\mid\theta)\,h(\theta)\,d\theta$$
Based on the population and sample information, we infer the distribution of the unknown parameter (the posterior pdf, i.e. the conditional pdf of Θ given X):
$$k(\theta\mid X)=\frac{g(X,\theta)}{g_1(X)}=\frac{L(X\mid\theta)\,h(\theta)}{\int_\theta L(X\mid\theta)\,h(\theta)\,d\theta}$$
Since a conditional pdf integrates to 1 and the denominator does not depend on θ, the form of the posterior distribution is determined by the numerator alone, so we write:
$$k(\theta\mid X)\propto L(X\mid\theta)\,h(\theta)$$
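As a numerical sanity check of this proportionality (the sample size, data, and Beta prior below are illustrative assumptions, not from the text), the posterior can be built on a grid from the unnormalized product L(X|θ)h(θ) and compared against the known conjugate answer for a binomial sample with a Beta prior:

```python
import numpy as np

# Hypothetical example: k successes in n Bernoulli trials, Beta(a, b) prior.
# The posterior is computed numerically from L(x|theta) * h(theta) and
# checked against the conjugate result Beta(a + k, b + n - k).
n, k = 10, 7        # observed data (assumed for illustration)
a, b = 2.0, 2.0     # prior hyperparameters (assumed)

theta = np.linspace(1e-6, 1 - 1e-6, 10_001)     # grid over the parameter space
dt = theta[1] - theta[0]

likelihood = theta**k * (1 - theta)**(n - k)    # L(x | theta), up to a constant
prior = theta**(a - 1) * (1 - theta)**(b - 1)   # h(theta), up to a constant

unnorm = likelihood * prior                     # numerator L(x|theta) h(theta)
posterior = unnorm / (unnorm.sum() * dt)        # normalize: divide by g1(x)

# The Beta(a + k, b + n - k) posterior has mean (a + k) / (a + b + n) = 9/14.
post_mean = (theta * posterior).sum() * dt
print(round(post_mean, 4))
```

The normalizing constant g1(x) never has to be computed analytically; dividing by the grid sum is enough, which is exactly why the proportional form is so convenient in practice.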
Point estimation
This part is mainly based on Introduction to Mathematical Statistics by Robert V. Hogg (late Professor of Statistics, University of Iowa), Joseph W. McKean (Western Michigan University), and Allen T. Craig (late Professor of Statistics, University of Iowa).
From the Bayesian viewpoint, performing point estimation amounts to selecting a decision function δ, where δ(X) is a predicted value of θ (an experimental value of the random variable Θ). All we need is a loss function L(θ, δ(X)). Suppose we adopt the strategy of minimizing the expected posterior loss:
$$\delta(X)=\operatorname*{arg\,min}_{\delta}\,E\big[L(\Theta,\delta(X))\mid X\big]=\operatorname*{arg\,min}_{\delta}\int_\theta L(\theta,\delta(X))\,k(\theta\mid X)\,d\theta$$
Then we call this decision function δ a Bayesian estimator of θ.
If L = (θ − δ)² (squared-error loss), then δ(X) = E(Θ | X), the posterior mean.
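This fact can be checked numerically (the discrete posterior below is an arbitrary assumption chosen only for illustration): scanning candidate values of δ for the one minimizing the expected posterior squared-error loss recovers the posterior mean up to grid resolution.

```python
import numpy as np

# Assumed discrete posterior k(theta | x) on a grid (its shape is arbitrary).
theta = np.linspace(0.01, 0.99, 99)
k_post = theta**7 * (1 - theta)**3
k_post = k_post / k_post.sum()                 # normalize to a pmf on the grid

# Expected posterior loss of each candidate estimate delta:
#   sum over theta of (theta - delta)^2 * k(theta | x)
deltas = np.linspace(0.0, 1.0, 1001)
exp_loss = ((theta[None, :] - deltas[:, None])**2 * k_post[None, :]).sum(axis=1)

best = deltas[np.argmin(exp_loss)]             # grid minimizer of expected loss
post_mean = (theta * k_post).sum()             # E(Theta | x)
print(best, round(post_mean, 3))               # agree up to the delta-grid step
```

Because the expected loss is quadratic in δ, its exact minimizer is the posterior mean; the grid search can only miss it by half a grid step.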
Generalizing this to the estimation of a specified function of θ, say l(θ), the Bayesian estimator of l(θ) is:
$$\delta(X)=\operatorname*{arg\,min}_{\delta}\,E\big[L(l(\Theta),\delta(X))\mid X\big]=\operatorname*{arg\,min}_{\delta}\int_\theta L(l(\theta),\delta(X))\,k(\theta\mid X)\,d\theta$$
Next, since X is a random vector, the expected posterior loss above is itself a function of X (a conditional expectation of the loss). Taking its expectation over X gives the expected risk; using g(x, θ) = k(θ | x) g1(x) = L(x | θ) h(θ) and swapping the order of integration (Fubini's theorem):
$$\int_x\left(\int_\theta L(\theta,\delta(x))\,k(\theta\mid x)\,d\theta\right)g_1(x)\,dx=\int_\theta\left\{\int_x L(\theta,\delta(x))\,L(x\mid\theta)\,dx\right\}h(\theta)\,d\theta$$
The integral within the braces in the latter expression is, for every given θ ∈ Θ, the risk function R(θ, δ). So for a Bayesian estimator, minimizing the expected risk is equivalent to minimizing the expected posterior loss for each x.
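The equality of the two orders of integration can be verified numerically on a toy discrete model (every probability and the decision rule below are invented for illustration): computing the expected risk both ways gives the same number.

```python
import numpy as np

# Assumed toy model: Theta takes 3 values with prior h, X takes 4 values
# with likelihood L(x | theta); delta is an arbitrary fixed decision rule.
h = np.array([0.2, 0.5, 0.3])                     # prior h(theta)
L = np.array([[0.10, 0.20, 0.30, 0.40],           # L(x | theta_i); rows sum to 1
              [0.25, 0.25, 0.25, 0.25],
              [0.40, 0.30, 0.20, 0.10]])
theta_vals = np.array([0.0, 1.0, 2.0])
delta = np.array([0.2, 0.8, 1.3, 1.9])            # delta(x) for each value of x

loss = (theta_vals[:, None] - delta[None, :])**2  # squared-error loss table

# Order 1: average the expected posterior loss over the marginal g1(x).
g1 = h @ L                                        # marginal pmf of X
k_post = (h[:, None] * L) / g1[None, :]           # posterior k(theta | x)
lhs = (g1 * (k_post * loss).sum(axis=0)).sum()

# Order 2: average the risk function R(theta, delta) over the prior h(theta).
risk = (L * loss).sum(axis=1)                     # R(theta, delta) per theta
rhs = (h * risk).sum()

print(lhs, rhs)                                   # equal up to floating point
```

Both sums expand to the same double sum of h(θ) L(x|θ) L(θ, δ(x)) over all (θ, x) pairs, which is exactly what the Fubini argument says in the continuous case.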