The F parameter should be a (sparse) m x size matrix, where m is the number of features and size = |W| * |X|, with |W| the number of contexts and |X| the number of elements x in the sample space.
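The 'counts' parameter should be a row vector stored as a 1 x (|W|*|X|) sparse matrix whose element i*|X| + j (0-based indices) is the number of occurrences of x_j in context w_i in the training set. Each context therefore occupies a contiguous block of |X| entries; an offset of i*|W| + j would make blocks overlap or leave gaps whenever |W| != |X|.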
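A minimal sketch of how such a counts vector could be assembled with scipy.sparse. It assumes the training data is available as a list of (context index, sample-point index) pairs; `build_counts` and its arguments are hypothetical names used only for illustration.

```python
import numpy as np
from scipy import sparse


def build_counts(pairs, num_contexts, num_points):
    """Return a 1 x (num_contexts*num_points) CSR matrix of (context, x) counts.

    Hypothetical helper. pairs: iterable of (i, j), with i a context index and
    j a sample-point index. Entry i*num_points + j counts how often x_j
    occurred in context w_i.
    """
    size = num_contexts * num_points
    cols = [i * num_points + j for i, j in pairs]
    data = np.ones(len(cols))
    rows = np.zeros(len(cols), dtype=int)
    # Converting COO to CSR sums duplicate (row, col) entries, which turns
    # repeated observations into counts.
    return sparse.coo_matrix((data, (rows, cols)), shape=(1, size)).tocsr()


# Two contexts, three sample points; (1, 0) is observed twice.
counts = build_counts([(0, 2), (1, 0), (1, 0), (0, 1)], num_contexts=2, num_points=3)
```

Here `counts.toarray()` is `[[0., 1., 1., 2., 0., 0.]]`: the first three entries belong to context w_0 and the last three to context w_1.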
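This storage format puts every (context, x) pair in one shared column space, so products such as F times the transpose of counts can be computed for all contexts in a single sparse matrix multiplication instead of a loop over contexts.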
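The single-operation claim can be illustrated with a sketch. It assumes the usual conditional maximum-entropy form p(x | w) ∝ exp(theta · f(w, x)), a hypothetical parameter vector `theta`, and that column i*|X| + j of F holds the feature values f(w_i, x_j), matching the counts layout. The function names are hypothetical and are not this module's API.

```python
import numpy as np
from scipy import sparse
from scipy.special import logsumexp


def conditional_probs(F, theta, num_contexts, num_points):
    """Sketch: p(x_j | w_i) for an assumed log-linear conditional model.

    F: sparse (m x num_contexts*num_points) feature matrix.
    theta: length-m parameter vector (assumption, not defined in the text).
    """
    # One sparse product scores every (context, x) column at once.
    scores = np.asarray(F.T @ theta).ravel()
    # With C-order reshaping, column i*num_points + j lands at row i, column j.
    scores = scores.reshape(num_contexts, num_points)
    # Normalize separately within each context (row), in log space for stability.
    log_z = logsumexp(scores, axis=1, keepdims=True)
    return np.exp(scores - log_z)


def empirical_expectations(F, counts):
    """Empirical feature expectations from a 1 x size sparse counts vector."""
    # (m x size) @ (size x 1): sums over all contexts in one product.
    return np.asarray((F @ counts.T).toarray()).ravel() / counts.sum()


F = sparse.csr_matrix(np.array([[1, 0, 0, 1, 0, 0],
                                [0, 1, 1, 0, 0, 1]], dtype=float))
p = conditional_probs(F, np.array([1.0, 0.0]), num_contexts=2, num_points=3)
```

Each row of `p` sums to 1. Only the per-context normalization needs the |W| x |X| view; the scoring itself never loops over contexts.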