/ ciao4.16 / ahelp / wavdetect.html

Jump to: Description · Examples · Parameters · wavdetect step-by-step · ABOUT THE POSITION ERRORS RA_ERR, DEC_ERR and X_ERR, Y_ERR · Changes in CIAO 4.15 · Changes in CIAO 4.12 : Duplicate source diagnostics · Bugs · See Also

AHELP for CIAO 4.16

wavdetect

Context: Tools::Detect

Synopsis

Mexican-Hat Wavelet source detection (wtransform+wrecon)

Syntax

wavdetect  infile outfile scellfile imagefile defnbkgfile [scales]
[regfile] [clobber] [ellsigma] [interdir] [bkginput] [bkgerrinput]
[outputinfix] [sigthresh] [bkgsigthresh] [falsesrc] [sigcalfile]
[exptime] [expfile] [expthresh] [bkgtime] [maxiter] [iterstop]
[psffile] [log] [verbose]

Description

wavdetect correlates the image with wavelets of different scales (selected by the user) and then searches the results for significant correlations. It is a wrapper for the tools wtransform and wrecon, and works in two steps:

Step 1: wtransform detects probable source pixels within a dataset by repeatedly correlating it with "Mexican Hat" wavelet functions with different scale sizes.
Step 2: wrecon generates a source list with information from each wavelet scale. For each source, a cell is computed that contains the majority of the source flux, and source properties are computed within that cell.

Strengths

Separates closely-spaced point sources.
Finds extended sources so long as wavelet scales are chosen appropriately.

Weaknesses

Slower than celldetect
The tool requires a lot of memory. Data structures for a 512x512 image use up 36 Mb. A 2048x2048 image requires over 300 Mb. For images larger than 2048x2048, at least 1 Gb of memory is recommended. Datasets that do not fit in physical memory will page heavily to disk and processing will run very slowly.

The "wavdetect step-by-step" section at the end of this document contains a detailed description of how the tool identifies source detections. For a description of the theory and operation of wavdetect, see the wavdetect section of the CIAO Detect Manual and the paper "A Wavelet-Based Algorithm for the Spatial Analysis of Poisson Data", by P. E. Freeman et al. (ApJS 2002, v138, p.185; astro-ph/0108429).

Caveat: PSF Support

The characterization phase of wavdetect (i.e., wrecon) can incorporate PSF information. This is accomplished through a psfmap image, congruent with the input image, whose pixel values represent the size of the PSF at that image location (in particular, the radius of a circle which encloses a specified fraction, ecf, of the PSF at a given energy; see Example 1 and mkpsfmap). Chandra data are fully supported through mkpsfmap, which accesses Chandra PSF information through CALDB; data from other missions can be supported if the user can construct a psfmap file. For example, a psfmap file for an arbitrary instrument with a PSF that does not vary significantly over the field-of-view, can be generated by

dmimgcalc xmm_img.fits none xmm_psf.fits op="imgout=((img1*0)+9.0)"
dmhedit xmm_img.fits none add BUNIT "arcsec"

which generates an XMM psfmap with a constant size of 9".

If no psfmap file is provided, wavdetect cannot compute point-spread-function sizes. The tool will still run but the source characteristics will not be reliable. If the size is wrong, then the wrong scales may be used for the determination of source parameters, and thus the source properties may be wrongly estimated. However, the *detection* process is unaffected.

Examples

Example 1

wavdetect my_input.fits source_list.fits source_cell.fits image.fits
background.fits expfile=none psffile=mypsfmap.fits

where my_input.fits is a Chandra image and mypsfmap.fits is generated from

mkpsfmap my_input.fits mypsfmap.fits energy=1.49 ecf=0.393

The output data products are a source list "source_list.fits", a source cell image "source_cell.fits" a source image "image.fits" and a normalized background image "background.fits".

Example 2

wavdetect my_input.fits source_list.fits source_cell.fits image.fits
background.fits psffile=mypsfmap.fits scales="1 2 4 8 16"

Run wavdetect with 5 scales. Note that quotes around the scale list are required.

Example 3

wavdetect "my_input.fits[events]" source_list.fits source_cell.fits
image.fits background.fits scales="2 4 8" maxiter=3 psffile=psfmap.fits

Run wavdetect on an event list, using 3 scales and 3 cleansing iterations per scale.

Example 4

wavdetect my_input.fits source_list.fits source_cell.fits image.fits
background.fits expfile=none exptime=0 psffile=psfmap.fits

Read a primary image from "my_input.fits" and a PSF map from "psfmap.fits", generating a source list "source_list.fits", a source cell image "source_cell.fits" a source image "image.fits" and a normalized background image "background.fits". No exposure map is input, and exposure time is not set. A value of 1.0 will be used to estimate rate parameters in the output source list.

Example 5

wavdetect my_input.fits source_list.fits source_cell.fits image.fits
background.fits expfile=none exptime=10000.0 psffile=psfmap.fits

Read a primary image from "my_input.fits"and a PSF map from "psfmap.fits", generating a source list "source_list.fits", a source cell image "source_cell.fits" a source image "image.fits" and a normalized background image "background.fits". No exposure map is input, and the exposure time is set to 10000.0 This value will be used to estimate rates in the output source list.

Example 6

wavdetect my_input.fits source_list.fits source_cell.fits image.fits
background.fits expfile=expmap.fits exptime=0 psffile=psfmap.fits

Read a primary image from "my_input.fits"and a PSF map from "psfmap.fits", generating a source list "source_list.fits", a source cell image "source_cell.fits" a source image "image.fits" and a normalized background image "background.fits". An exposure map, expmap.fits is input. Values from the exposure map at source positions will be used to estimate rates in the output source list.

No PSF map file is provided, so wavdetect wil not compute point-spread-function sizes. The tool will still run, but the source characteristics will not be reliable.

Example 7

wavdetect my_input.fits source_list.fits source_cell.fits image.fits
background.fits expfile=expmap.fits exptime=10000.0

Read a primary image from "my_input.fits", generating a source list "source_list.fits", a source cell image "source_cell.fits" a source image "image.fits" and a normalized background image "background.fits". Both exposure map and exposure time are set. Rates in the output source list will be estimated from exptime, scaled by the quantity (expmap value at source location)/(max expmap value).

No PSF map file is provided, so wavdetect wil not compute point-spread-function sizes. The tool will still run, but the source characteristics will not be reliable.

Example 8

wavdetect my_input.fits source_list.fits source_cell.fits image.fits
background.fits expfile=none scales="1 2 4" falsesrc=1.0

Run wavdetect with 3 scales, specifying a false source rate of 1 per scale. The user should thus expect (on average) 3 false sources in the output sourcelist.

No PSF map file is provided, so wavdetect wil not compute point-spread-function sizes. The tool will still run, but the source characteristics will not be reliable.

For other wavdetect examples, please see the CIAO Science Thread Running wavdetect.

Parameters

name	type	ftype	def	min	max	units	reqd	autoname
infile	file	input					yes
outfile	file	output					yes	yes
scellfile	file	output					yes	yes
imagefile	file	output					yes	yes
defnbkgfile	file	output					yes	yes
scales	string		2.0 4.0
regfile	file							yes
clobber	boolean		no				no
ellsigma	real		3.0
interdir	file	output	.				no
bkginput	file	input					no
bkgerrinput	boolean		no				no
outputinfix	file						no
sigthresh	real		1e-06				no
bkgsigthresh	real		0.001				no
falsesrc	real		-1.0				no
sigcalfile	file	ARD	$ASCDS_CALIB/wtsimresult.fits				no
exptime	real		0			seconds
expfile	file	input
expthresh	real		0.1				no
bkgtime	real		0				no
maxiter	integer		2				no
iterstop	real		0.0001				no
psffile	file
log	boolean		no				no
verbose	integer		0	0	5		no

Detailed Parameter Descriptions

Parameter=infile (file required filetype=input)

Input image file. For images larger than 2048x2048, at least 1 Gb of memory is recommended.

Pixel values should be reasonable numbers. WAVDETECT probability distributions have been derived assuming Poisson statistics. WAVDETECT simply will not work with fluxed images. The calculated correlation values are so small that no detections occur.

Parameter=outfile (file required filetype=output autoname=yes)

File name of output source list.

If auto-naming (outfile=.) is used, the output file will have the suffix "_src"

Parameter=scellfile (file required filetype=output autoname=yes)

File name of the image showing the source cells, which delimit the image pixels which are used to estimate source properties (e.g. count rate).

If auto-naming is used (scellfile=.), the output file will have the suffix "_scell"

Parameter=imagefile (file required filetype=output autoname=yes)

Output file containing the reconstructed source image.

If auto-naming is used (imagefile=.), the output file will have the suffix "_image"

Parameter=defnbkgfile (file required filetype=output autoname=yes)

Default normalized background.

During its second stage, wavdetect makes one normalized background estimate from the stack of per-scale normalized backgrounds produced during the first stage. It writes this computed background to this file. If auto-naming is used (defnbkgfile=.), the output file will have the suffix "_nbkg"

Parameter=scales (string default=2.0 4.0)

The wavelet radii, in pixels, are given by this list of numbers. The list must be in quotes and can be separated by spaces, semicolons (;), or commas (,).

Wtransform produces a complete set of outputs for each scale. Small scales tend to detect small features, and larger scales larger features. The default parameter file only specifies 2 wavelet scales (2 and 4). This is fine for experimentation, but for completeness a scale list of "1 2 4 8 16" would be a reasonable default for Chandra data.

Values are typically 2**n, where n is an integer, and n_lo and n_hi are chosen with respect to instrumental PSF sizes. For the ROSAT PSPC, for instance, this could mean n_lo = 1 pixel, n_hi = 16 pixels. Note that larger scales might need to be used to characterize (i.e. derive good source property estimates for) extended sources.

2**n should not be larger than the size of the image (in pixels) divided by 5 (in which case the extent of the wavelet function is larger than the image itself, which could lead to strange results).

Note that wavdetect has only one parameter to specify scale size. If users want control over the xscale and yscale separately, or if they want to specify which scale sizes will be used for flux estimates, or if they want to experiment with many different runs of the second part of the process without repeating the initial phase, then they should run wtransform followed by (multiple) runs of wrecon.

Parameter=regfile (file default= autoname=yes)

File for ASCII region output.

If auto-naming is used (regfile=.), the output file will have the suffix "_reg". NB: autonaming for regfile is not currently operational.

Parameter=clobber (boolean not required default=no)

If set to "yes", existing output files will be overwritten.

Parameter=ellsigma (real default=3.0)

Size, in sigmas, to make the elliptical source regions.

ellsigma is a multiplicative factor applied to sigma, the standard deviation of the distribution, to scale the major and minor axes of the ellipses for each source. ellsigma affects both the outfile and the ASCII region file (regfile). This feature is included so that the graphics overlay will be more visible and under the user's control. Often a value greater than 3 is helpful.

Parameter=interdir (file not required filetype=output default=.)

Directory for intermediate results.

Parameter=bkginput (file not required filetype=input default=)

Pre-determined background input image.

Use a previously computed background in the specified file in place of constructing a new default normalized background. When using bkginput, enter "none" for the default normalized background file.

Parameter=bkgerrinput (boolean not required default=no)

If yes, use background error in file bkginput[2].

Parameter=outputinfix (file not required default=)

If needed to avoid file naming collisions with other wavdetect users, set to a unique [string] to give the set of outputs differing names. The string will be embedded in the file names.

Parameter=sigthresh (real not required default=1e-06)

Threshold for identifying a pixel as belonging to a source.

*After* the final background estimate B is made, the threshold correlation is recomputed by solving for C_o:

sigthresh = integral(C_o dC) PSD(C(B))

Here, C = avg[W*D], where D are the *raw* data. (The iterations are simply to compute background. Once we have that, we go back to the beginning.)

If C is less than or equal to C_o in a pixel, then that pixel is considered to be associated with a source. If the pixel is also the location of a local maximum in C-space, the location of that maximum is output to a FITS table.

sigthresh should not be significantly larger than one over the number of pixels in the image; for a 1024x1024 image, that means sigthresh ~ 1.e-6. If, in this case, sigthresh = 1.e-5, there will be approximately 10 strong background fluctuations detected as sources.

sigthresh should not be smaller than *roughly* 1.e-9 - 1.e-10; even at this point, the accuracy of the computed detection thresholds is unknown (though probably fine for most applications).

Parameter=bkgsigthresh (real not required default=0.001)

Significance threshold for cleansing data during iterations.

Significance threshold for cleansing data during iterations. In each pixel, during each iteration, the background, and correlation of the wavelet function and the current data (C' = avg[W*D']), are estimated. The background estimate B' implies a probability sampling distribution (PSD) for C', i.e. the distribution that C' would have if there was locally no source in the image and D' were sampled from background. A threshold C_o' is calculated from this PSD, using the supplied value of bkgsigthresh (e.g. 10**-2) by solving the following for C_o':

bkgsigthresh = integral(C_o' dC') PSD(C'(B'))

If C' is greater than or equal to C_o', the data in the pixel is replaced with the background estimate. In this way, putative sources are eliminated from the data image, along with weak (but otherwise undetectable) sources and background fluctuations, so that a better background estimate can be made.

This value should be no larger than 0.05 (the usual 5 per cent, or 95 per cent, statistical criterion for rejecting the null hypothesis, which is that the pixel in question has data sampled solely from the background); it should not be smaller than sigthresh.

Parameter=falsesrc (real not required default=-1.0)

Number of false sources allowed per scale in the image.

That number is combined with image size and scale information, using additional simulation results contained in the file defined by the sigcalfile parameter, to determine the threshold significance for each pixel; consequently the value given for the parameter sigthresh is ignored. If falsesrc is set to a negative number, the code uses sigthresh instead to determine thresholds.

The calibrated range of wavdetect is 0.1 to 1e-8 false sources per pixel. Depending on the significance settings and the input image size, users can go outside of this range in which case the results become unpredictable

Parameter=sigcalfile (file not required filetype=ARD default=$ASCDS_CALIB/wtsimresult.fits)

Data file for use by the false-source algorithm.

The file contains simulation results that are used with the false-source algorithm. This file is included with with the release and may be used for both Chandra and non-Chandra data. The user may not specify "CALDB" for this parameter.

Parameter=exptime (real default=0 units=seconds)

Time of the exposure, in seconds.

If set to zero and no exposure file is used, the program sets the exposure time to 1.0 when estimating sorce rates.

If set to a non-zero value and no exposure file is used, the program uses the input value when estimating source rates.

If set to zero and an exposure file is used, the program uses the value from the exposure file when estimating source rates.

If set to a non-zero value and an exposure file is used, the program uses the quantity exptime*(expfile value at source/max expfile value) when estimating source rates.

Parameter=expfile (file filetype=input default=)

Exposure file. Image of the data set exposure time.

Parameter=expthresh (real not required default=0.1)

If relative exposure is less than the parameter expthresh, then the pixel in question is not analyzed

For each pixel, a relative exposure is calculated (pixel exposure value over maximum value of exposure in map). If this relative exposure is less than the parameter expthresh, then the pixel in question is not analyzed: the background, correlation, et al., are set to zero.

expthresh should not be less than 0.1 (or else the accuracy of normalization will be too low); if set close to 1, only very limited regions of the FOV will be considered when source correlation maxima are listed. Typical values would be 0.1-0.2.

Parameter=bkgtime (real not required default=0)

Livetime of predetermined background.

Livetime of predetermined background. This parameter should only be used *if* the user has provided a normalized background image (rather than having wavdetect calculate it). A value of 0 will make the exposure time used for background time. If bkgtime is non-zero and no background map is supplied, the source parameters will be adversely affected (i.e. wrong).

Parameter=maxiter (integer not required default=2)

Maximum number of cleansing iterations per scale pair. If maxiter is greater than necessary, the program will usually stop on its own (see iterstop).

Parameter=iterstop (real not required default=0.0001)

If the ratio of *newly* cleansed pixels to the overall number of pixels in the dataset is less than the parameter iterstop, the program quits iterating and uses the current background estimate as the final background estimate.

The user specifies how many iterations the program should go through to calculate the background. However, the background is for all intents and purposes calculated if there are very few new pixels being cleansed (see bkgsigthresh). If the ratio of *newly* cleansed pixels to the overall number of pixels in the dataset is less than the parameter iterstop, the program quits iterating and uses the current background estimate as the final background estimate.

A typical value is 0.001: stop iterating when only one of every thousand pixels is being cleansed. This parameter should not be larger than 1, nor smaller than one over the number of pixels in the dataset.

Parameter=psffile (file default=)

Table of PSF size data.

The observation-specific PSF file should be created by running the mkpsfmap tool with the image that will be used as the wavdetect infile.

Parameter=log (boolean not required default=no)

If set to "no", log information will go to stdout. If set to "yes", file "wavdetect.log" will be created.

Parameter=verbose (integer not required default=0 min=0 max=5)

Level of log output. 0=none, 5=highest.

wavdetect step-by-step

`wavdetect' operates on its input in two stages: first (in `wtransform'), it detects putative source pixels within a dataset by repeatedly correlating it with "Mexican Hat" wavelet functions with different scale sizes. At each scale, the original image is correlated with the wavelet. Pixels with sufficiently large positive correlation values are removed from the image (as assumed sources), and subsequent correlations are performed at the same scale. This procedure of extracting source pixels from an image is called "cleansing". Typically, when very few source candidates are being found, or when an iteration-count limit is reached, the cleansing process stops. At this point, the estimated background (estimated from the cleansed image) is used to set detection thresholds, which are applied to the initial correlation image array values to identify putative sources. A set of outputs for the given wavelet scale is generated: a table of candidate sources (identified by correlation maxima), an image of the correlation of the data with the wavelet function, an image of the normalized (or flat-field) background (the image minus the "source" pixels), and the normalized background error image. Then the tool moves on to the next scale and repeats the process with a fresh copy of the image.

The second stage (`wrecon') generates a source list from information from the first stage at each wavelet scale [correlation maxima tables, normalized (flat-field) background images (with errors), and correlation images]. Each correlation maximum from stage 1 is tested to see if it represents an independent, new, source, or a source seen at other scales. For each source, a cell is computed that contains the majority of the source flux, and within that cell, source properties (location, count flux, etc.) are computed.

The net counts calculated by wavdetect are not corrected for any instrument effects such as vignetting, detector QE/U, dead area, pileup, or PSF fractions.

ABOUT THE POSITION ERRORS RA_ERR, DEC_ERR and X_ERR, Y_ERR

The X_ERR and Y_ERR are simple (net) counts-weighted variances around the centroid; RA_ERR & DEC_ERR are simply scaled to get to their correct units. So they represent a statistical 1-sigma error.

However, there are many places where systematic errors can be introduced.

The quality of the background, which can be affected by choice of wavelet scales. (This assumes that the tool is run such that wavdetect generates & uses its own background.)
There can be some shifts between where the simple centroid is located and the true source location, especially for sources far off-axis.
The data processing: pixel randomization, pixel quantization, aspect reconstruction, etc.

None of these are accounted for in wavdetect's error estimates.

Changes in CIAO 4.15

There have been two changes that may change the reported sources: higher-precision is used during convolution, which provides a larger dynamic range and less volatility of the results close to the threshold; and duplicated sources are now excluded.

The spatial columns in the output table will now contain WCS transformation. This may mean that these virtual column names may be different from those created with previous versions of CIAO: rather than starting with EQSRC they will use the names of the WCS transformation in the input file.

Changes in CIAO 4.12 : Duplicate source diagnostics

wavdetect will sometimes output nearly identical sources. While the root cause of this is now understood, the expected fix will likely cause undesirable changes to the output source lists. As an intermediate solution wavdetect now writes 6 additional diagnostic columns that can help users identify and assess this condition. The new columns are

CORRELATION_MAX(X_CORRELATION_MAX,Y_CORRELATION_MAX) : The location of the correlation maximum with the wavelet. For well behaved sources, the correlation maximum will be close to the source centroid (X,Y), and will be close to the location of the flux maximum. Some duplicate sources will have large distances between the correlation_max and the flux_max.
FLUX_MAX(X_FLUX_MAX,Y_FLUX_MAX) : The location of the local maximum closest to the correlation maximum. (See correlation_max discussion above.)
SCALES(X_WAVELET_SCALE,Y_WAVELET_SCALE): the actual wavelet scale the source was reconstructed using. Users should find that the scales are different for duplicate sources.

Bugs

Incorrect results using PSF map, psffile, in phyical pixel units.

wavdetect only knows how to convert a PSF map in arcsec units to logical pixel size to match the input image. All other units are ignored and the PSF map is treated as having logical pixel size values.

If the PSF map was created with physical pixel size values, and the image was binned by any value other than 1, then the PSF map values will not be correctly converted to match the input image. This may result in a different set of source detections and|or may result in the properties of those sources, including the radii, being wrong.

Users can check the units of the PSF map pixel values using dmlist, such as

unix% dmlist mypsfmap.fits cols
--------------------------------------------------------------------------------
Columns for Image Block broad.psfmap
--------------------------------------------------------------------------------
 
ColNo  Name                 Unit        Type             Range
   1   mypsfmap.fits[2613,3222] logical      Real4(2613x3222) -Inf:+Inf            
...

This example shows a PSF map with units of logical pixel size.

Autonaming of ASCII region files uses the .fits extension instead of .reg as would be expected.

In general, autonaming only works for simple cases.

PSF_SIZE equals NaN

The output PSF_SIZE and PSFRATIO columns may contain a value of NaN even for sources with otherwise valid properties (eg position).

Caveats

RA_ERR and DEC_ERR values near poles

The RA_ERR and DEC_ERR values are approximations and become increasing inaccurate the closer to the poles, DEC=+/-90.

Users may also see incorrect values for these errors for RA values very near 0|360 such that the error bar crosses the boundary and wraps around.

Workaround:

The X_ERR and Y_ERR values in physical pixels is correct. Users can use these together with the known plate scale to compute their own error estimates.

Problems running multiple wavdetect runs in parallel.

Multiple instances of wavdetect cannot reliably be run at the same time on the same computer with the default parameter settings. Beyond needing to change the PFILES environment variable, wavdetect also creates many temporary files (and lists of files) that need to be kept separate. These files are not locked and may get overwritten or clobbered by jobs running in parallel on the same machine.

Workaround:

Users should set the ASCDS_WORK_PATH environment variable before running wavdetect to a separate, unique location for each instance of wavdetect that is being run.

(ba)sh: export ASCDS_WORK_PATH=$PWD/tmpdir_A
(t)csh: setenv ASCDS_WORK_PATH $PWD/tmpdir_A

Source ellipses with radii equal to 0 (or a really small number)

The radius, "R" column, in the output source list is computed as the standard deviation of the source pixels along the semi-major and semi-minor axes. If the pixels that make up a source are aligned along a single X column or Y row or it is simply a delta function (single pixel) then the standard deviation is 0. A small value, based on the image blocking, is used in this case to provide some visual indication of the sources existence.

The source ellipse smaller than the PSF

For a source to be considered real, it must be found to be significant when it is smoothed at a wavelet scale larger than the PSF. Consider two sources: one with 5 counts and one with 100 counts, each in a single pixel. It will take only a small wavelet scale to smooth out the 5 counts before it would no longer be considered significant whereas it would take a much larger wavelet scale to smooth out the 100 counts before it would no longer be considered significant. In either case, the source ellipse is based on the smallest wavelet scale it was found to be significant (whether that scale is smaller than the PSF or not). This ensure that the best source position is returned and allows close pairs to be identified.

Overlapping source regions in cell image

The input wavelet scales are independent, orthogonal basis functions that describe the local structure on the scale of each wavelet. Within a single wavelet scale, sources must not overlap but since each wavelet scale represents an independent spatial scale, the same pixels can contribute to multiple sources. This is the very nature of a wavelet decomposition. (The Fourier analog is that all pixels contribute at each frequency.) The output cell image, whose pixel values represent which source a pixel "belongs to", cannot truly capture this and only represents the last (usually largest) wavelet scale.

Net counts wrong

Wavdetect is not a photometric tool; it is a source detection tool. The derived properties including location, counts, and sizes are provisional estimates. Users must perform full analysis on sources to determine reliable results.