This function approximates both the feature expectation vector E_p f(X) and the log of the normalization term Z with importance sampling.
It also computes the sample variance of the component estimates of the feature expectations as varE = var(E_1, ..., E_T), where T is self.matrixtrials and E_t is the estimate of E_p f(X) obtained from the t-th auxiliary feature matrix.
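The variance across trials can be sketched as follows. This is only an illustration: the helper name variance_across_trials is hypothetical, and the use of the unbiased (n - 1) sample variance is an assumption, since the text does not say which normalization var() uses.

```python
import statistics


def variance_across_trials(trial_estimates):
    """Per-feature sample variance of the T trial estimates.

    trial_estimates is a list of T lists; trial_estimates[t][i] is the
    estimate of E_p f_i(X) from the t-th auxiliary feature matrix.
    Hypothetical helper, not part of the original code.
    """
    num_features = len(trial_estimates[0])
    # statistics.variance divides by (T - 1); this choice is an assumption.
    return [
        statistics.variance([E_t[i] for E_t in trial_estimates])
        for i in range(num_features)
    ]


# Two trials, two features.
varE = variance_across_trials([[1.0, 2.0], [3.0, 2.0]])
```

At least two trials are needed, since the sample variance of a single estimate is undefined.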
It returns nothing; instead it stores the results in the member variables logZapprox, mu and varE. (Some optimization algorithms retrieve the dual function and its gradient in separate function calls, but the two can be computed more efficiently together.)
It uses a supplied generator sampleFgen whose .next() method returns features of random observations s_j drawn from an auxiliary distribution aux_dist. The samples are used in one of two ways: in matrix form, with multiple runs, or in a sequential procedure, which has more updating overhead but may stop earlier and so need fewer samples. In the matrix case, the features F = {f_i(s_j)} and the vector [log_aux_dist(s_j)] of log probabilities are generated by calling resample().
Note that the importance sampling estimate of E_p f(X) is consistent but biased.
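Here Zapprox = exp(self.lognormconst()).

With n samples, the quantity n * Zapprox can be computed in log space as:
    exp(logsumexp(log p_dot(s_j) - log aux_dist(s_j)))
  = exp(logsumexp(theta.f(s_j) - log aux_dist(s_j)))
where p_dot(s) = exp(theta.f(s)) is the unnormalized model density.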
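A minimal sketch of the log-space computation for a single sample matrix, using only the standard library. The function estimate_from_samples and its arguments are hypothetical names chosen for illustration, not the method's actual interface. The sketch assumes that the estimate of Z is the average of the importance weights p_dot(s_j) / aux_dist(s_j), and it forms the feature expectations from the normalized weights rather than from log f_i(s_j), so that zero or negative feature values cause no trouble.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def estimate_from_samples(theta, F, log_aux):
    """Self-normalized importance sampling estimate (illustrative only).

    theta   : list of m parameters
    F       : m x n nested list with F[i][j] = f_i(s_j)
    log_aux : list of n values log aux_dist(s_j)
    Returns (logZapprox, mu), where mu[i] approximates E_p f_i(X).
    """
    m, n = len(theta), len(log_aux)
    # Log importance weights: theta.f(s_j) - log aux_dist(s_j).
    logw = [
        sum(theta[i] * F[i][j] for i in range(m)) - log_aux[j]
        for j in range(n)
    ]
    log_denom = logsumexp(logw)           # log(n * Zapprox)
    logZapprox = log_denom - math.log(n)  # assumes Zapprox is the mean weight
    # Normalized weights sum to 1; subtracting in log space avoids overflow.
    w = [math.exp(lw - log_denom) for lw in logw]
    mu = [sum(w[j] * F[i][j] for j in range(n)) for i in range(m)]
    return logZapprox, mu


# One feature f(s) = s, with samples s = 0, 1 from a uniform aux_dist on {0, 1}.
logZ, mu = estimate_from_samples([1.0], [[0.0, 1.0]], [math.log(0.5)] * 2)
```

In this toy case each state is sampled exactly once, so the estimates coincide with the exact values log(1 + e) and e / (1 + e); with random samples they would only approximate them.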