The entropy dual function is defined for conditional models as
    L(theta) = sum_w q(w) log Z(w; theta) - sum_{w,x} q(w,x) [theta . f(w,x)]
or equivalently as
    L(theta) = sum_w q(w) log Z(w; theta) - (theta . K)
where K_i = sum_{w,x} q(w,x) f_i(w,x), and q(w) is the empirical probability mass function derived from observations of the context w in the training set. Normally q(w, x) will be 1 for each pair in the training set, unless the same class label is assigned to the same context more than once.
Note that both sums run only over the (w, x) pairs in the training set, not the entire sample space, since q(w, x) = 0 for any pair not in the training set.
The entropy dual function is proportional to the negative log likelihood of the model on the training data, so minimizing it is equivalent to maximum-likelihood fitting.
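As a minimal numerical sketch (the contexts, feature values, and counts below are invented purely for illustration), the dual and the equivalence of its two forms above can be checked with NumPy:

```python
import numpy as np

# Toy conditional model: contexts w in {0, 1}, labels x in {0, 1},
# and two feature functions. f[i, w, x] is the value of f_i on (w, x).
f = np.array([
    [[1.0, 0.0],
     [0.0, 1.0]],      # feature f_0
    [[0.0, 1.0],
     [1.0, 0.0]],      # feature f_1
])

# Counts of each (w, x) pair in a hypothetical training set.
counts = np.array([[3.0, 1.0],
                   [2.0, 2.0]])
q_wx = counts / counts.sum()   # empirical joint pmf q(w, x)
q_w = q_wx.sum(axis=1)         # empirical context pmf q(w)

def entropy_dual(theta):
    """L(theta) = sum_w q(w) log Z(w; theta) - theta . K."""
    scores = np.einsum('i,iwx->wx', theta, f)   # theta . f(w, x)
    logZ = np.log(np.exp(scores).sum(axis=1))   # log Z(w; theta)
    K = np.einsum('wx,iwx->i', q_wx, f)         # K_i = sum_{w,x} q(w,x) f_i(w,x)
    return q_w @ logZ - theta @ K

theta = np.array([0.5, -0.25])

# First form of the dual: subtract sum_{w,x} q(w,x) [theta . f(w,x)] directly.
scores = np.einsum('i,iwx->wx', theta, f)
first_form = q_w @ np.log(np.exp(scores).sum(axis=1)) - (q_wx * scores).sum()

# The two forms agree, since sum_{w,x} q(w,x) [theta . f(w,x)] = theta . K.
print(np.isclose(entropy_dual(theta), first_form))
```

At theta = 0 every score is zero, so each Z(w) = 2 (two labels) and the dual reduces to log 2, which is a quick sanity check on the implementation.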