Short notes with a little bit of clarification.
The self-information of an event is a decreasing function of the probability of its occurrence: the more likely an event is, the less information it carries.
It can be expressed with the following formula.
$$I(x) = -\log P(x)$$
Given that $ P(x) \in [0,1] $
From the graph it's clear that events with lower probability are more informative. Ideally, a guaranteed event, $ P(x)=1 $, shouldn't provide any information.
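
To make this concrete, here is a minimal sketch that evaluates the formula for a few probabilities. The notes don't state the base of the logarithm, so the natural logarithm (information measured in nats) is assumed here, and the function name `self_information` is only an illustrative choice.

```python
import math


def self_information(p: float) -> float:
    """Self-information I(x) = -log P(x), in nats (natural log is an assumption)."""
    if not 0.0 < p <= 1.0:
        # I(x) grows without bound as P(x) -> 0, so p = 0 is rejected here.
        raise ValueError("p must be in (0, 1]")
    # Return a clean 0.0 for a guaranteed event instead of -0.0.
    return 0.0 if p == 1.0 else -math.log(p)


# The less likely the event, the larger its self-information.
for p in (1.0, 0.5, 0.1, 0.01):
    print(f"P(x) = {p:<5} I(x) = {self_information(p):.3f} nats")
```

The guaranteed event gives exactly zero, and the value grows as the probability shrinks, which matches the shape of the graph. Using `math.log2` instead would give the same picture with the information measured in bits.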
More details on this topic can be found in the chapter "Probability and Information Theory" of the Deep Learning book co-authored by Yoshua Bengio:
http://www-labs.iro.umontreal.ca/~bengioy/dlbook/