Counts are sampled from the Poisson distribution, and so the best
way to assess the quality of model fits is to use the product of
individual Poisson probabilities computed in each bin i, or the likelihood L:
L = (product)_i [ M(i)^D(i) / D(i)! * exp(-M(i)) ]
where M(i) = S(i) + B(i) is the sum
of source and background model amplitudes, and D(i) is the number of observed counts, in bin i.
The CASH statistic (Cash 1979,
ApJ 228, 939) is derived by (1) taking the logarithm of the likelihood
function, (2) changing its sign, (3) dropping the factorial term (which
remains constant during fits to the same dataset), and (4) multiplying
by two:
C = 2 * (sum)_i [ M(i) - D(i) log M(i) ]
The factor of two exists so that the change in CASH
statistic from one model fit to the next, (Delta)C, is distributed approximately as
(Delta)chi-square when the number
of counts in each bin is high (> 5). One can then in principle use
(Delta)C instead of (Delta)chi-square in certain model
comparison tests. However, unlike chi-square, the CASH statistic
may be used regardless of the number of counts in each bin.
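The relation between C and the likelihood can be checked numerically. The following is a minimal sketch, not Sherpa code: the function names and the toy counts and model amplitudes are made up for illustration. It evaluates C as defined above and confirms that the dropped factorial term cancels in (Delta)C, so that (Delta)C equals the change in -2 log L between two models fit to the same data.

```python
import math

def cash_statistic(model, data):
    """C = 2 * sum_i [ M(i) - D(i) log M(i) ]; M(i) must be > 0 where D(i) > 0."""
    return 2.0 * sum(
        m - (d * math.log(m) if d > 0 else 0.0)
        for m, d in zip(model, data)
    )

def neg2_log_likelihood(model, data):
    """-2 log L for the full Poisson likelihood, factorial term included."""
    return -2.0 * sum(
        d * math.log(m) - m - math.lgamma(d + 1)  # lgamma(d + 1) = log(d!)
        for m, d in zip(model, data)
    )

# Hypothetical observed counts D(i) and two candidate models M(i).
data = [0, 2, 1, 4, 3]
model_a = [1.0, 1.5, 2.0, 2.5, 3.0]
model_b = [2.0, 2.0, 2.0, 2.0, 2.0]

delta_c = cash_statistic(model_b, data) - cash_statistic(model_a, data)
delta_nll = neg2_log_likelihood(model_b, data) - neg2_log_likelihood(model_a, data)

# The factorial term depends only on the data, so it cancels in the difference.
factorial_term = 2.0 * sum(math.lgamma(d + 1) for d in data)
print(delta_c, delta_nll)
```

Because the factorial term depends only on D(i), C and -2 log L differ by a constant for a given dataset, which is why dropping it does not affect the fit.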
The magnitude of the CASH statistic depends upon
the number of bins included in the fit and the values of the data
themselves. Hence one cannot analytically assign a `goodness-of-fit'
measure to a given value of the CASH statistic.
Such a measure can, in principle, be computed by performing Monte
Carlo simulations. One would repeatedly sample new datasets from the
best-fit model, fit each of them, and note where the observed
CASH statistic lies within the resulting distribution
of CASH statistics. (The ability to perform Monte
Carlo simulations is a feature that will be included in a future
version of Sherpa.)
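The Monte Carlo procedure described above can be sketched outside Sherpa as follows. This is an illustration under simplifying assumptions, not a Sherpa feature: the model is a single constant amplitude per bin (whose best-fit value is simply the mean of the counts), the observed counts are invented, and all function names are hypothetical.

```python
import math
import random

def cash_statistic(model, data):
    """C = 2 * sum_i [ M(i) - D(i) log M(i) ]; M(i) must be > 0 where D(i) > 0."""
    return 2.0 * sum(
        m - (d * math.log(m) if d > 0 else 0.0)
        for m, d in zip(model, data)
    )

def sample_poisson(mean, rng):
    """Draw one Poisson deviate (Knuth's method; adequate for small means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def fit_constant(data):
    """Best-fit constant amplitude: minimizing C gives the mean of the counts."""
    return sum(data) / len(data)

def simulated_cash_distribution(amplitude, n_bins, n_sims, rng):
    """Sample fake datasets from the best-fit model, refit each, collect C."""
    stats = []
    for _ in range(n_sims):
        fake = [sample_poisson(amplitude, rng) for _ in range(n_bins)]
        refit = fit_constant(fake)
        stats.append(cash_statistic([refit] * n_bins, fake))
    return stats

rng = random.Random(12345)
observed = [3, 1, 4, 2, 0, 5, 2, 3, 1, 2]  # hypothetical observed counts
amplitude = fit_constant(observed)
c_obs = cash_statistic([amplitude] * len(observed), observed)

sims = simulated_cash_distribution(amplitude, len(observed), 2000, rng)
# Fraction of simulated fits with a CASH statistic at least as large as observed.
fraction_above = sum(c >= c_obs for c in sims) / len(sims)
print(c_obs, fraction_above)
```

A very small fraction would suggest that the observed data are unlikely to have been drawn from the best-fit model. For a realistic model the analytic refit would be replaced by a numerical minimization of C.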
The background should not be subtracted from the data when
this statistic is used. It should be modeled simultaneously
with the source, as in this example:
sherpa> DATA source.data
sherpa> BACK background.data
sherpa> SOURCE = [source model]
sherpa> BG = [background model]
sherpa> STATISTIC CASH
sherpa> FIT
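To illustrate what modeling the background simultaneously means for the statistic, here is a minimal sketch outside Sherpa. It assumes constant source and background amplitudes, equal exposure and extraction area for the source and background datasets, and a brute-force grid search in place of a real optimizer. The counts and all names are hypothetical, and this is not how Sherpa implements the fit.

```python
import math

def cash_statistic(model, data):
    """C = 2 * sum_i [ M(i) - D(i) log M(i) ]; M(i) must be > 0 where D(i) > 0."""
    return 2.0 * sum(
        m - (d * math.log(m) if d > 0 else 0.0)
        for m, d in zip(model, data)
    )

def joint_cash(s, b, source_counts, background_counts):
    """Sum of C over both datasets: source data see S + B, background data see B."""
    c_src = cash_statistic([s + b] * len(source_counts), source_counts)
    c_bkg = cash_statistic([b] * len(background_counts), background_counts)
    return c_src + c_bkg

# Hypothetical unsubtracted counts in the source and background datasets.
source_counts = [4, 6, 5, 7, 3]
background_counts = [1, 2, 0, 3, 2, 1]

# Brute-force search over strictly positive trial amplitudes (step 0.05).
grid = [0.05 * k for k in range(1, 201)]
best_c, best_s, best_b = min(
    (joint_cash(s, b, source_counts, background_counts), s, b)
    for s in grid
    for b in grid
)
print(best_s, best_b, best_c)
```

Both datasets keep their raw integer counts, so each remains Poisson-distributed, which is the assumption underlying the CASH statistic. Background-subtracted data would no longer satisfy it.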