A conditional maximum-entropy (exponential-form) model p(x|w) on a discrete sample space.
This is useful for classification problems: given the context w, what is the probability of each class x?
The form of such a model is:
p(x | w) = exp(theta . f(w, x)) / Z(w; theta)
where Z(w; theta) is a normalization term equal to:
Z(w; theta) = sum_x exp(theta . f(w, x)).
The sum is over all classes x in the set Y, which must be supplied to the constructor as the parameter ‘samplespace’.
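The model form above can be sketched in a few lines of plain Python. This is a minimal illustration, not the class's own implementation: the function name `conditional_pmf`, the toy feature function `f`, and the class labels are all hypothetical, and only the standard library is used.

```python
import math

def conditional_pmf(theta, f, w, samplespace):
    """Return {x: p(x | w)} for the exponential-form model."""
    # Scores theta . f(w, x) for every class x in the sample space.
    scores = {x: sum(t * fi for t, fi in zip(theta, f(w, x)))
              for x in samplespace}
    # log Z(w; theta), with the maximum score subtracted for stability.
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {x: math.exp(s - log_z) for x, s in scores.items()}

# Hypothetical toy problem: two classes and two indicator features.
def f(w, x):
    return [1.0 if (w == "rainy" and x == "umbrella") else 0.0,
            1.0 if x == "no_umbrella" else 0.0]

p = conditional_pmf([1.5, 0.5], f, "rainy", ["umbrella", "no_umbrella"])
```

Because Z(w; theta) is computed separately for each context w, the returned values sum to one for every w.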
Such a model form arises from maximizing the entropy of a conditional model p(x | w) subject to the constraints:
K_i = E f_i(W, X)
where the expectation is with respect to the distribution:
q(w) p(x | w)
where q(w) is the empirical probability mass function derived from observations of the context w in a training set. Normally the vector K = {K_i} is set equal to the expectation of f_i(w, x) with respect to the empirical joint distribution q(w, x).
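As an illustration of the usual choice of K, the sketch below averages the features over the observed (w, x) pairs. The helper name `empirical_expectations`, the feature function `f`, and the four training pairs are assumptions made up for this example.

```python
from collections import Counter

def empirical_expectations(data, f):
    """K_i = sum_{w,x} q(w, x) f_i(w, x), with q the empirical pmf."""
    n = len(data)
    counts = Counter(data)  # number of occurrences of each (w, x) pair
    K = [0.0] * len(f(*data[0]))
    for (w, x), c in counts.items():
        for i, fi in enumerate(f(w, x)):
            K[i] += (c / n) * fi
    return K

# Hypothetical feature function and training set.
def f(w, x):
    return [1.0 if (w == "rainy" and x == "umbrella") else 0.0,
            1.0 if x == "no_umbrella" else 0.0]

data = [("rainy", "umbrella"), ("rainy", "umbrella"),
        ("rainy", "no_umbrella"), ("sunny", "no_umbrella")]
K = empirical_expectations(data, f)
```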
Fitting the model minimizes the Lagrangian dual L of the entropy, which is defined for conditional models as:
L(theta) = sum_w q(w) log Z(w; theta) - sum_{w,x} q(w,x) [theta . f(w,x)]
Note that both sums are only over the training set {w,x}, not the entire sample space, since q(w,x) = 0 for all w,x not in the training set.
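A rough sketch of evaluating L(theta) directly from a training set follows. The names `entropy_dual` and `log_norm_const`, the feature function, and the data are hypothetical; the code only mirrors the two sums in the definition and makes no claim about how the class computes them.

```python
import math
from collections import Counter

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def log_norm_const(theta, f, w, samplespace):
    """log Z(w; theta), with the maximum score subtracted for stability."""
    scores = [dot(theta, f(w, x)) for x in samplespace]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def entropy_dual(theta, data, f, samplespace):
    n = len(data)
    # First sum: only contexts w seen in training, weighted by q(w).
    context_counts = Counter(w for w, _ in data)
    first = sum((c / n) * log_norm_const(theta, f, w, samplespace)
                for w, c in context_counts.items())
    # Second sum: only observed pairs (w, x), each with weight 1/n.
    second = sum(dot(theta, f(w, x)) for w, x in data) / n
    return first - second

# Hypothetical feature function and training set.
def f(w, x):
    return [1.0 if (w == "rainy" and x == "umbrella") else 0.0,
            1.0 if x == "no_umbrella" else 0.0]

samplespace = ["umbrella", "no_umbrella"]
data = [("rainy", "umbrella"), ("rainy", "umbrella"),
        ("rainy", "no_umbrella"), ("sunny", "no_umbrella")]
value_at_zero = entropy_dual([0.0, 0.0], data, f, samplespace)
```

Written this way, L(theta) equals the negative average conditional log-likelihood of the training pairs, so at theta = 0 it reduces to the log of the number of classes.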
The partial derivatives of L are:
dL / dtheta_i = E f_i(W, X) - K_i
where the expectation is as defined above.
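The gradient can be sketched by differentiating the two sums of L(theta) term by term: the log Z term contributes the model expectation of f_i under q(w) p(x | w), and the linear term contributes the empirical expectation K_i with a minus sign. The function name `dual_gradient`, the feature function, and the data below are assumptions for illustration only.

```python
import math
from collections import Counter

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def dual_gradient(theta, data, f, samplespace):
    n = len(data)
    grad = [0.0] * len(theta)
    # Model term: sum_w q(w) sum_x p(x | w) f_i(w, x).
    for w, c in Counter(w for w, _ in data).items():
        scores = [dot(theta, f(w, x)) for x in samplespace]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        for x, wt in zip(samplespace, weights):
            for i, fi in enumerate(f(w, x)):
                grad[i] += (c / n) * (wt / z) * fi
    # Empirical term: subtract K_i = sum_{w,x} q(w, x) f_i(w, x).
    for w, x in data:
        for i, fi in enumerate(f(w, x)):
            grad[i] -= fi / n
    return grad

# Hypothetical feature function and training set.
def f(w, x):
    return [1.0 if (w == "rainy" and x == "umbrella") else 0.0,
            1.0 if x == "no_umbrella" else 0.0]

samplespace = ["umbrella", "no_umbrella"]
data = [("rainy", "umbrella"), ("rainy", "umbrella"),
        ("rainy", "no_umbrella"), ("sunny", "no_umbrella")]
g = dual_gradient([0.0, 0.0], data, f, samplespace)
```

The gradient vanishes exactly when the model expectations match K, which is the constraint the maximum-entropy fit is meant to satisfy.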
Methods
beginlogging(filename[, freq]) | Enable logging params for each fn evaluation to files named ‘filename.freq.pickle’, ‘filename.(2*freq).pickle’, ... |
clearcache() | Clears the interim results of computations depending on the |
crossentropy(fx[, log_prior_x, base]) | Returns the cross entropy H(q, p) of the empirical |
dual([params, ignorepenalty]) | The entropy dual function is defined for conditional models as |
endlogging() | Stop logging param values whenever setparams() is called. |
entropydual([params, ignorepenalty, ignoretest]) | Computes the Lagrangian dual L(theta) of the entropy of the |
expectations() | The vector of expectations of the features with respect to the |
fit([algorithm]) | Fits the conditional maximum entropy model subject to the |
grad([params, ignorepenalty]) | Computes or estimates the gradient of the entropy dual. |
log(params) | This method is called every iteration during the optimization process. |
lognormconst() | Compute the elementwise log of the normalization constant |
logparams() | Saves the model parameters if logging has been |
logpmf() | Returns a (sparse) row vector of logarithms of the conditional probability mass function (pmf) values p(x | c) for all pairs (c, x), where c are contexts and x are points in the sample space. |
normconst() | Returns the normalization constant, or partition function, for the current model. |
pmf() | Returns an array indexed by integers representing the values of the probability mass function (pmf) at each point in the sample space under the current model (with the current parameter vector self.params). |
pmf_function([f]) | Returns the pmf p_theta(x) as a function taking values on the model’s sample space. |
probdist() | Returns an array indexed by integers representing the values of the probability mass function (pmf) at each point in the sample space under the current model (with the current parameter vector self.params). |
reset([numfeatures]) | Resets the parameters self.params to zero, clearing the cache variables dependent on them. |
setcallback([callback, callback_dual, ...]) | Sets callback functions to be called every iteration, every function evaluation, or every gradient evaluation. |
setfeaturesandsamplespace(f, samplespace) | Creates a new matrix self.F of features f of all points in the |
setparams(params) | Set the parameter vector to params, replacing the existing parameters. |
setsmooth(sigma) | Specifies that the entropy dual and gradient should be computed with a quadratic penalty term on magnitude of the parameters. |