Last modified: 14 December 2012

URL: http://cxc.harvard.edu/csc/why/gregory_loredo.html

Gregory-Loredo Variability Probability


The information in this why topic is taken from the "Effectiveness of the Gregory-Loredo Algorithm for Detecting Temporal Variability in Chandra Data" memo.


Summary

The Gregory-Loredo variability algorithm is one of the tests used to detect time variability in sources. Specifically, it provides the probability that the flux calculated from the source region is not constant throughout the observation. The Gregory-Loredo algorithm determines variability based on an odds ratio that the arrival times of the events within the source region for each science energy band are not uniformly distributed in time. The results of the Gregory-Loredo variability test are recorded in the var_prob column of the Source Observations Table and in the var_intra_prob and var_inter_prob columns of the Master Sources Table.

The algorithm is insensitive to the shape of the light curve; such shape dependence is a known problem with the current implementation of the Kolmogorov-Smirnov (K-S) test. It also does not overinterpret the data in low-count-rate sources, requiring a statistically significant deviation from a flat distribution before yielding an odds ratio greater than one.

The addition of the secondary criterion - light curve fractions - results in a reliable variability test, though careful users may want to inspect the light curves of all sources with a nonzero variability index.

Background

For a detailed description of the Gregory-Loredo algorithm, refer to A New Method for the Detection of a Periodic Signal of Unknown Shape and Period (1992, ApJ 398, 146). Although the algorithm was developed for detecting periodic signals, it is equally suitable for detecting aperiodic variability when the period is forced to the length of the observation.


The implementation of the Gregory-Loredo algorithm for the CSC consists of three steps, described in turn below: calculating the odds ratio, analyzing light curve fractions, and assigning the variability index.

Calculating the Odds Ratio

N events are binned in histograms of m bins, where m runs from 2 to mmax. The algorithm is based on the likelihood of the observed distribution

n_1, n_2, ..., n_m

occurring. Out of a total of m^N possible distributions, the multiplicity of this particular one is

N!/(n_1! n_2! ... n_m!)

The ratio of this multiplicity to m^N is the probability that the observed distribution came about by chance; hence its inverse is a measure of the significance of the distribution. In this way we calculate an odds ratio for m bins versus a flat light curve. The odds are summed over all values of m to determine the odds that the source is time-variable.

The information provided by the tool includes the log10 of the total odds ratio (O), the corresponding probability (P) of a variable signal, the m value with the maximum odds ratio and the odds-weighted first moment of m, as well as the characteristic time scales represented by these two values.
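As a concrete illustration, the following Python sketch implements the calculation described above. It is not the catalog code: the default m_max, the equal prior weight per m, and the Occam factor N!(m-1)!/(N+m-1)! (taken from the Gregory & Loredo 1992 paper rather than spelled out here) are assumptions, and the fractional-area corrections applied by the pipeline are omitted.

import numpy as np
from math import lgamma

def gl_odds(times, t_start, t_stop, m_max=12):
    """Sketch of the Gregory-Loredo odds calculation.

    times : event arrival times within [t_start, t_stop]
    m_max : largest number of bins tried (hypothetical default; the
            CSC tool derives it from the data)

    Returns log10 of the total odds ratio (O), the probability (P) of
    a variable signal, the m with the maximum odds ratio, and the
    odds-weighted first moment of m.
    """
    N = len(times)
    ms = np.arange(2, m_max + 1)
    log_odds = np.empty(len(ms))
    for i, m in enumerate(ms):
        n, _ = np.histogram(times, bins=m, range=(t_start, t_stop))
        # Odds for m bins vs. a flat light curve: the inverse of the
        # chance probability W/m^N, where W = N!/(n_1!...n_m!), times
        # the Occam factor N!(m-1)!/(N+m-1)! of Gregory & Loredo.
        log_odds[i] = (N * np.log(m)
                       + lgamma(m)                      # (m-1)!
                       + sum(lgamma(k + 1) for k in n)  # n_1!...n_m!
                       - lgamma(N + m))                 # (N+m-1)!
    # Sum the per-m odds with an equal prior weight of 1/(m_max - 1).
    log_O = np.logaddexp.reduce(log_odds) - np.log(m_max - 1)
    P = 1.0 / (1.0 + np.exp(-log_O))        # P = O / (1 + O)
    w = np.exp(log_odds - log_odds.max())   # overflow-safe weights
    m_peak = ms[np.argmax(log_odds)]
    m_mean = (w * ms).sum() / w.sum()
    # The characteristic time scales represented by m_peak and m_mean
    # are (t_stop - t_start) / m_peak and (t_stop - t_start) / m_mean.
    return log_O / np.log(10.0), P, m_peak, m_mean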

A sample output file with odds ratios as a function of m: gl_out.txt.

All sources with P above 0.9 are considered variable; all sources with P below 0.5 are considered nonvariable. The ambiguous range - 0.5 < P < 0.9 - is handled by analyzing light curve fractions.

Light Curves and Analyzing Light Curve Fractions

In addition to the odds ratio, the program produces a file with the light curve evaluated at the optimal binning. This light curve is not simply the binned counts; rather, it is a weighted sum (with the weights being the odds ratios) of all light curve binnings, ranging from a single bin (i.e., no variability, a constant light curve) to mmax bins, with corrections for the fractional area (i.e., from dithering over chip edges and/or bad pixels and columns) in each bin. Thus, each light curve point contains a count rate that takes into account weighted contributions from all events in the entire light curve. Additionally, ±3σ limits are provided for each point of the light curve.
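A minimal sketch of how such an odds-weighted light curve could be assembled is shown below; the function name, the evaluation grid, and the convention that the constant (m = 1) model enters with odds of 1 are assumptions, and the fractional-area corrections are again omitted.

import numpy as np

def gl_light_curve(times, t_start, t_stop, odds_by_m, n_grid=256):
    """Odds-weighted sum of all binned light curves from m = 1
    (constant) to m_max, evaluated on a common time grid.

    odds_by_m : dict mapping m -> odds ratio O_m for that binning;
                O_1 = 1 for the constant model is an assumption.
    """
    T = t_stop - t_start
    grid = t_start + (np.arange(n_grid) + 0.5) * (T / n_grid)
    rate_sum = np.zeros(n_grid)
    weight_sum = 0.0
    for m, w in odds_by_m.items():
        counts, _ = np.histogram(times, bins=m, range=(t_start, t_stop))
        rate = counts / (T / m)              # count rate in each of m bins
        which = np.minimum((m * (grid - t_start) / T).astype(int), m - 1)
        rate_sum += w * rate[which]          # this binning's rate model
        weight_sum += w
    return grid, rate_sum / weight_sum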

As mentioned, there is an ambiguous range of probabilities: 0.5 < P < 0.9. For this range, a secondary criterion was developed; it is based on the light curve, its average σ, and the average count rate.

The program calculates the fractions f3 and f5 of the light curve that are within 3σ and 5σ, respectively, of the average count rate. If f3 > 0.997 and f5 = 1.0 for cases in the ambiguous range, the source is deemed to be nonvariable.
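A sketch of this secondary criterion, assuming rate holds the points of the weighted light curve and sigma its average standard deviation:

import numpy as np

def light_curve_fractions(rate, sigma):
    """Fractions of light-curve points within 3 and 5 sigma of the
    average count rate."""
    dev = np.abs(rate - rate.mean())
    f3 = np.mean(dev <= 3.0 * sigma)
    f5 = np.mean(dev <= 5.0 * sigma)
    return f3, f5

# For 0.5 < P < 0.9, the source is deemed nonvariable when
# f3 > 0.997 and f5 == 1.0.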

Dither and the Light Curve

When searching for variability, the algorithm flags the special case of sources with characteristic times that are harmonics of one of the two dither periods (707 s and 1000 s for most ACIS observations). The program normalizes the dither out of the light curve, so that the dither does not affect the search for variability. As with the Kolmogorov-Smirnov and Kuiper's tests, the dither correction is a geometrical area correction, and does not take into account any spatial dependence of the chip responses. For example, if a soft X-ray source dithers from a frontside-illuminated chip to a backside-illuminated chip, the different soft X-ray responses of the two chips could introduce a dither period-dependent modulation of the detected counts. The current catalog procedures do not correct for such a possibility; however, warning flags are set if sources dither across chip edges, and a dither warning flag is set if the variability occurs at a harmonic of a dither frequency.

Additionally, in contrast to the Kolmogorov-Smirnov and Kuiper's tests, the Gregory-Loredo test implicitly assumes that any fractional area corrections are completely uncorrelated with any intrinsic time scales of variability in the light curve. In cases where this assumption is violated, the Gregory-Loredo test might lose sensitivity.

For detailed information on how and why the dither correction is done, refer to the following memos:

Assigning the Variability Index

The program assigns a variability index based on the values of O, P, f3, and f5 (for definitions, see above):

Variability Index   Condition                                    Comment
0                   P ≤ 1/2                                      Definitely not variable
1                   1/2 < P < 2/3 and f3 > 0.997 and f5 = 1.0    Not considered variable
2                   2/3 ≤ P < 0.9 and f3 > 0.997 and f5 = 1.0    Probably not variable
3                   0.5 ≤ P < 0.6                                May be variable
4                   0.6 ≤ P < 2/3                                Likely to be variable
5                   2/3 ≤ P < 0.9                                Considered variable
6                   0.9 ≤ P and O < 2.0                          Definitely variable
7                   2.0 ≤ O < 4.0                                Definitely variable
8                   4.0 ≤ O < 10.0                               Definitely variable
9                   10.0 ≤ O < 30.0                              Definitely variable
10                  30.0 ≤ O                                     Definitely variable
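Read as code, the table translates to the following sketch; the ordering of the checks, with the f3/f5 rows taking precedence over indices 3-5 in the ambiguous range, is an interpretation of the table rather than the pipeline's actual logic.

def variability_index(P, O, f3, f5):
    """Assign the variability index of the table above.

    P : probability of a variable signal
    O : log10 of the total odds ratio (as defined earlier)
    """
    if O >= 30.0:
        return 10
    if O >= 10.0:
        return 9
    if O >= 4.0:
        return 8
    if O >= 2.0:
        return 7
    if P >= 0.9:
        return 6
    if P <= 0.5:
        return 0
    if f3 > 0.997 and f5 == 1.0:          # secondary criterion passes
        return 1 if P < 2.0 / 3.0 else 2
    if P < 0.6:
        return 3
    if P < 2.0 / 3.0:
        return 4
    return 5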