4. Kullback–Leibler divergence (Information gain): The Kullback–Leibler divergence (also called information divergence, information gain, or relative entropy) is a way of comparing two distributions: a "true" probability distribution p(X) and an arbitrary probability distribution q(X). If we compress data in a manner that assumes q(X) is the distribution underlying the data when, in reality, p(X) is the correct distribution, the Kullback–Leibler divergence is the average number of additional bits per datum needed for compression. It is thus defined as

D_KL(p ‖ q) = Σ_x p(x) log( p(x) / q(x) ).
Although it is sometimes used as a "distance metric", the KL divergence is not a true metric, since it is not symmetric and does not satisfy the triangle inequality (making it a semi-quasimetric).
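As a concrete illustration, here is a minimal Python sketch of this definition; the function name kl_divergence and the example distributions are illustrative choices, not taken from the post. It also shows the asymmetry mentioned above:

```python
import math

def kl_divergence(p, q):
    """D(p || q) in bits for discrete distributions given as dicts
    mapping outcome -> probability. Assumes q[x] > 0 wherever p[x] > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

# Illustrative distributions: p is the "true" one, q is the assumed one.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 0.7, "b": 0.2, "c": 0.1}

print(kl_divergence(p, q))  # extra bits per datum when coding for q while data follow p
print(kl_divergence(q, p))  # a different value, showing D(p||q) != D(q||p)
```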
5. Kullback–Leibler divergence of a prior from the truth: Another interpretation of the KL divergence is this: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x). If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised than Alice, on average, upon seeing the value of X. The KL divergence is the (objective) expected value of Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is taken in base 2. In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it is expected to make him.
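To make this concrete, here is a short Python sketch under the same assumptions (the helper expected_surprisal and the example distributions are illustrative, not from the original post). It computes Bob's and Alice's expected surprisals and checks that their difference equals D(p ‖ q):

```python
import math

def expected_surprisal(belief, truth):
    """Average surprisal -log2(belief[x]) in bits, with the average
    taken over the true distribution truth[x]."""
    return sum(px * -math.log2(belief[x]) for x, px in truth.items() if px > 0)

p = {"a": 0.5, "b": 0.25, "c": 0.25}  # true distribution, known to Alice
q = {"a": 0.7, "b": 0.2, "c": 0.1}    # Bob's prior

bob = expected_surprisal(q, p)    # Bob's expected surprisal under the truth
alice = expected_surprisal(p, p)  # Alice's expected surprisal (the entropy of p)
print(bob - alice)                # equals D(p || q) from the definition above
```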