A conditional maximum-entropy (exponential-form) model p(x|w) on a discrete sample space.
This is useful for classification problems: given the context w, what is the probability of each class x?
The form of such a model is:
p(x | w) = exp(theta . f(w, x)) / Z(w; theta)
where Z(w; theta) is a normalization term equal to:
Z(w; theta) = sum_x exp(theta . f(w, x)).
The sum is over all classes x in the set Y, which must be supplied to the constructor as the parameter ‘samplespace’.
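As a rough illustration of the model form above, the following NumPy sketch evaluates p(x | w) for every context and class. The dense array layout F[w, x, i] = f_i(w, x), the function name conditional_probs, and the toy numbers are all assumptions made for this example; they are not part of the class's interface.

```python
import numpy as np

def conditional_probs(theta, F):
    """Return p(x | w) for every context w (rows) and class x (columns).

    F[w, x, i] holds the feature value f_i(w, x); this dense layout is a
    hypothetical choice for illustration only.
    """
    scores = F @ theta                             # theta . f(w, x), shape (W, X)
    scores -= scores.max(axis=1, keepdims=True)    # shift for stability; cancels in the ratio
    unnorm = np.exp(scores)
    Z = unnorm.sum(axis=1, keepdims=True)          # Z(w; theta), up to the same shift
    return unnorm / Z

# Toy data (made up): 2 contexts, 3 classes, 2 features.
F = np.array([[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
              [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]])
theta = np.array([0.5, -0.25])
p = conditional_probs(theta, F)
```

Each row of p is a probability distribution over the classes for one context, so the rows sum to 1.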
Such a model form arises from maximizing the entropy of a conditional model p(x | w) subject to the constraints:
K_i = E f_i(W, X)
where the expectation is with respect to the distribution:
q(w) p(x | w)
where q(w) is the empirical probability mass function derived from observations of the context w in a training set. Normally each component K_i of the vector K = {K_i} is set equal to the expectation of f_i(w, x) with respect to the empirical joint distribution q(w, x).
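The empirical quantities can be sketched as follows, assuming the training set is a list of (w, x) index pairs and the features are stored in a dense array F[w, x, i] = f_i(w, x). The helper name empirical_quantities and the data are hypothetical.

```python
import numpy as np

def empirical_quantities(pairs, F):
    """Return q(w), q(w, x) and K for a list of observed (w, x) index pairs."""
    n_contexts, n_classes, _ = F.shape
    counts = np.zeros((n_contexts, n_classes))
    for w, x in pairs:
        counts[w, x] += 1.0
    q_wx = counts / counts.sum()             # empirical joint pmf q(w, x)
    q_w = q_wx.sum(axis=1)                   # empirical marginal q(w)
    K = np.einsum('wx,wxi->i', q_wx, F)      # K_i = sum_{w,x} q(w, x) f_i(w, x)
    return q_w, q_wx, K

# Toy data (made up): 2 contexts, 3 classes, 2 features, 4 observations.
F = np.array([[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
              [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]])
pairs = [(0, 0), (0, 2), (1, 1), (1, 1)]
q_w, q_wx, K = empirical_quantities(pairs, F)
```

Pairs that never occur in the training set get q(w, x) = 0 and so contribute nothing to K.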
This method minimizes the Lagrangian dual L of the entropy, which is defined for conditional models as:
L(theta) = sum_w q(w) log Z(w; theta) - sum_{w,x} q(w,x) [theta . f(w,x)]
Note that both sums are only over the training set {w,x}, not the entire sample space, since q(w,x) = 0 for all w,x not in the training set.
The partial derivatives of L are:
dL / dtheta_i = E f_i(W, X) - K_i
where the expectation is as defined above.
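The dual and its gradient can be checked numerically with a small sketch. It assumes a dense feature array F[w, x, i] = f_i(w, x), an empirical joint pmf q_wx, and K set to the empirical expectation; the name dual_and_grad and the data are hypothetical. The gradient's sign is the one obtained by differentiating L(theta) as written above (model expectation minus K), and the central finite difference at the end serves as an independent check.

```python
import numpy as np

def dual_and_grad(theta, F, q_wx):
    """Return L(theta) and its gradient for a conditional maxent model."""
    q_w = q_wx.sum(axis=1)                              # empirical marginal q(w)
    scores = F @ theta                                  # theta . f(w, x), shape (W, X)
    shift = scores.max(axis=1, keepdims=True)
    log_Z = shift[:, 0] + np.log(np.exp(scores - shift).sum(axis=1))
    p = np.exp(scores - log_Z[:, None])                 # p(x | w)
    L = q_w @ log_Z - np.sum(q_wx * scores)
    model_exp = np.einsum('w,wx,wxi->i', q_w, p, F)     # E f_i(W, X) under q(w) p(x | w)
    K = np.einsum('wx,wxi->i', q_wx, F)                 # empirical expectation
    return L, model_exp - K

# Toy data (made up): 2 contexts, 3 classes, 2 features.
F = np.array([[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
              [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]])
q_wx = np.array([[0.25, 0.0, 0.25],
                 [0.0, 0.5, 0.0]])
theta = np.array([0.5, -0.25])

L, grad = dual_and_grad(theta, F, q_wx)

# Central finite differences of L, one coordinate of theta at a time.
eps = 1e-6
numeric = np.array([
    (dual_and_grad(theta + eps * e, F, q_wx)[0]
     - dual_and_grad(theta - eps * e, F, q_wx)[0]) / (2 * eps)
    for e in np.eye(theta.size)
])
```

Comparing grad with numeric (for instance with np.allclose) is a quick way to confirm the sign convention before handing L and its gradient to a minimizer.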
Methods