AHELP for CIAO 4.18

dmfiltering

Context: dm

Synopsis

Filtering tables and images with the Data Model

Description

The CIAO Data Model provides a flexible filtering syntax which may be applied to table (event) and image files. This document describes many varieties of filtering, organized by the input file and data type or coordinates.

A few examples of region filtering are included; refer to the dmregions help file for detailed information and further examples. Users may also be interested in filtering using 2D pixel masks as described in the dmmasks help file.

For a general introduction to Data Model syntax, see "ahelp dm".

1. Table filtering on columns with real data types
2. Table filtering on columns with integer data types
3. Table filtering on columns with character string data types
4. Table filtering on columns with bit data type
5. Table filtering on columns with logical data type
6. Table filtering on vector columns
7. Compound filters
8. Exclude filters
9. Filters defined in files: using the @ syntax
10. Image filtering on logical coordinates
11. Image filtering on physical coordinates
12. Image filtering on world coordinates
13. Difference between image filtering and table filtering

1. Table filtering on columns with real data types

The simplest kind of filter is a range filter:

[filter energy=1000:2000]

which specifies that the DM should only see rows in the input file which satisfy 1000.0 <= energy < 2000.0. You can use this filter by appending it to a filename or block name in any of the CIAO tools:

unix% dmcopy "a.fits[filter energy=1000:2000]" b.fits
unix% dmcopy "a.fits[events][filter energy=1000:2000]" b.fits

Note that the prefix "filter" is optional:

unix% dmcopy "a.fits[energy=1000:2000]" b.fits

You can filter on multiple quantities:

unix% dmcopy "a.fits[energy=1000:2000,time=5410300:5410320]" b.fits

You can also filter on multiple ranges for each quantity:

unix% dmcopy "a.fits[energy=1000:2000,4000:8000,grade=0,2:4,6]" b.fits

This filter accepts rows which have energies between either 1000 and 2000 or 4000 and 8000, and grades equal to 0, 2 to 4, or 6. Note that you can leave out the colon if the min and max of a range are the same, so 0:0 becomes just 0. If you want to express "less than" and "greater than", you can just retain the colon but omit the min or max, so

[energy=:4000]

means accept energies up to 4000; whilst

[detx=:511,513:]

means accept all values of detx except 511.0 <= detx < 513.0.
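The range rules above can be sketched in a few lines of Python. This is only an illustration of the semantics described in this section, not the Data Model implementation; the function name in_real_ranges and the use of None for an omitted bound are assumptions made for the example.

```python
# Illustrative sketch of the range semantics for real-valued columns.
# in_real_ranges is a hypothetical helper, not part of CIAO.

def in_real_ranges(value, ranges):
    """Return True if value falls inside any (lo, hi) range.

    Each range is half-open (lo <= value < hi), as described for real
    columns. None stands for an omitted bound, as in ":4000" or "513:".
    """
    for lo, hi in ranges:
        above_min = lo is None or value >= lo
        below_max = hi is None or value < hi
        if above_min and below_max:
            return True
    return False


# Equivalent of [energy=1000:2000,4000:8000]
energy_ranges = [(1000.0, 2000.0), (4000.0, 8000.0)]
# Equivalent of [detx=:511,513:]
detx_ranges = [(None, 511.0), (513.0, None)]

print(in_real_ranges(1000.0, energy_ranges))  # lower bound is included
print(in_real_ranges(2000.0, energy_ranges))  # upper bound is excluded
print(in_real_ranges(512.0, detx_ranges))     # falls in the excluded gap
```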

NaN/NULL values

Columns may contain the IEEE special values NaN, Inf, and -Inf. They may also contain integer, string, or logical NULL values (special values used to identify missing or otherwise invalid data). These values will not be included when any filter is used, so

[energy=4000:]

would omit any rows where the energy value is Inf. You can filter out all IEEE special values by using

[energy=:]

an open range which includes all finite values. The IEEE special values can be included using

[energy=Null,energy=4000:]

where the 'Null' token is used to identify all NaN or NULL values. The Null token can be used in combination with other range or list filters as shown.
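As a rough sketch of this behavior, the Python function below drops special values from a range test unless they are explicitly requested. The function name passes_filter and the allow_null flag are hypothetical. The text above does not spell out whether the Null token also matches Inf and -Inf; the sketch assumes it does not.

```python
import math

# Illustrative only: passes_filter and allow_null are hypothetical names.

def passes_filter(value, lo=None, hi=None, allow_null=False):
    """Half-open range test that rejects special values by default."""
    # None stands in for a NULL entry; NaN is the IEEE not-a-number value.
    if value is None or math.isnan(value):
        return allow_null
    # Assumption: +/-Inf never passes, even for an open range such as "4000:".
    if math.isinf(value):
        return False
    return (lo is None or value >= lo) and (hi is None or value < hi)


print(passes_filter(float("inf"), lo=4000.0))                   # [energy=4000:]
print(passes_filter(float("nan")))                              # [energy=:]
print(passes_filter(float("nan"), lo=4000.0, allow_null=True))  # with Null
```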

2. Table filtering on columns with integer data types

The interpretation is a little different depending on whether the table column has integer or real data type. For an integer data type column,

[energy=4000:5000]

means (4000 <= energy <= 5000), in other words both ends of the range are included.
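The difference between the two interpretations can be shown side by side. The helper names below are hypothetical and the code only restates the inequalities given in this document.

```python
# Illustrative comparison of the two range conventions described above.

def in_integer_range(value, lo, hi):
    """Integer columns: both ends of the range are included."""
    return lo <= value <= hi


def in_real_range(value, lo, hi):
    """Real columns: the upper end of the range is excluded."""
    return lo <= value < hi


# The same filter text, [energy=4000:5000], applied to the value 5000:
print(in_integer_range(5000, 4000, 5000))    # True
print(in_real_range(5000.0, 4000.0, 5000.0)) # False
```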

The syntax also supports the special pseudo-column #row, the row number of the unfiltered file:

[#row=100:200]

3. Table filtering on columns with character string data types

Filtering is a little more restricted with character string columns. You can only use the colon, not > or <:

[filter shape=m:n,s:z]

The range m:n includes everything beginning with m, and it includes the letter n, but not other strings beginning with n: for example, "ngc" is not within m:n since in an ASCII ordering ngc > n. The comparison is case-sensitive.
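The ordering described here matches ordinary case-sensitive string comparison, so it can be mimicked with Python's built-in string operators. This is a sketch of the rule as stated above, with a hypothetical helper name, not a description of how the DM compares strings internally.

```python
# Illustrative sketch: in_string_ranges is a hypothetical helper.

def in_string_ranges(value, ranges):
    """Return True if value sorts within any closed (lo, hi) string range."""
    return any(lo <= value <= hi for lo, hi in ranges)


# Equivalent of [filter shape=m:n,s:z]
shape_ranges = [("m", "n"), ("s", "z")]

for name in ["moon", "n", "ngc", "star", "Moon"]:
    print(name, in_string_ranges(name, shape_ranges))
```

Here "moon", "n", and "star" are accepted, while "ngc" (which sorts after "n") and "Moon" (uppercase sorts before lowercase in ASCII) are rejected.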

4. Table filtering on columns with bit data type

Columns with bit data type are a special case. The most common example is the STATUS column in the event files, which is 32 bits wide for Chandra ACIS event files generated between launch and at least 2003 (it may be increased at a later date). Suppose for simplicity you instead have a status column with only 6 bits, and wish to accept rows with the 3rd bit from the end set and the end bit equal to zero. A simple numeric filter of the type described above would have to be very complicated to describe all the numeric values for which these bits have the desired values; instead, the user supplies a `bitmask string':

[filter status=xxx1x0]

The string may contain only the characters 1 (corresponding bit must be set), 0 (bit must not be set), and x (wild card: bit may have any value).

The bit pattern display convention here is that the rightmost bit displayed (with the value 0 in the example above) is the least significant bit, which is bit 32 in the usual FITS convention and bit 0 in the usual C programmer's convention.

To select rows where ALL bits are cleared (i.e., 0, not set), users can use the shorthand expression

[filter status=0]

This works regardless of the number of bits in the column. Note that there is no equivalent for checking if all bits are set (i.e., 1).
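The bitmask matching rule can be sketched as follows, treating the status value as an integer whose least significant bit corresponds to the rightmost character of the mask. The function names are hypothetical and the code illustrates the rule described above rather than the DM's own implementation.

```python
# Illustrative sketch: matches_bitmask and all_bits_clear are hypothetical.

def matches_bitmask(value, mask):
    """Test an integer status value against a mask such as "xxx1x0".

    The rightmost mask character describes the least significant bit.
    """
    for position, char in enumerate(reversed(mask)):
        bit = (value >> position) & 1
        if char == "1" and bit != 1:
            return False
        if char == "0" and bit != 0:
            return False
        if char not in "01x":
            raise ValueError("mask may only contain 1, 0, and x")
    return True


def all_bits_clear(value):
    """Equivalent of the shorthand [filter status=0]."""
    return value == 0


print(matches_bitmask(0b000100, "xxx1x0"))  # True: 3rd bit set, end bit clear
print(matches_bitmask(0b000101, "xxx1x0"))  # False: end bit is set
print(matches_bitmask(0b111010, "xxx1x0"))  # False: 3rd bit is not set
```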

5. Table filtering on columns with logical data type

Logical data columns contain a value of either true (1) or false (0). When filtering on a "true" value, any of the following will work:

unix% dmlist "srclist.fits[double=1]" data
unix% dmlist "srclist.fits[double=T]" data
unix% dmlist "srclist.fits[double=TRUE]" data
unix% dmlist "srclist.fits[double=true]" data

Likewise, for "false" values:

unix% dmlist "srclist.fits[double=0]" data
unix% dmlist "srclist.fits[double=F]" data
unix% dmlist "srclist.fits[double=FALSE]" data
unix% dmlist "srclist.fits[double=false]" data

6. Table filtering on vector columns

A vector column is a column which consists of two components, such as the DET(DETX,DETY) vector column. There are two ways to filter on a vector column:

a. define a rectangular region by filtering on each of the components separately:

unix% dmcopy "evt.fits[detx=4000:5000,dety=3000:4000]" rectangle.fits

b. use a region filter, either with the vector name ("DET") or the two components in parentheses ("(DETX,DETY)"):

unix% dmcopy "evt.fits[det=circle(4500,3500,120)]" circle.fits
unix% dmcopy "evt.fits[(detx,dety)=circle(4500,3500,120)]" circle.fits

"ahelp dmregions" has detailed information on region filtering.

7. Compound filters

The DM syntax also supports compound (logical OR) filters:

unix% dmcopy \
 "evt.fits[(ccd_id=2,chipx=512:513)||(ccd_id=7,chipx=500:520)]" \
 two_bits.fits

Note that only lists of filters separated by "||" are supported; arbitrarily-complex C-style logical expressions are not allowed.
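One way to picture a compound filter is as a list of groups: a row is accepted if it satisfies every condition within at least one group. The sketch below models the example above in Python. It assumes ccd_id and chipx are integer columns (so both range ends are included); the names in_ranges and row_passes are hypothetical.

```python
# Illustrative sketch of "OR of AND-groups"; helper names are hypothetical.

def in_ranges(value, ranges):
    """Closed-interval test, assuming an integer column."""
    return any(lo <= value <= hi for lo, hi in ranges)


def row_passes(row, groups):
    """Accept the row if all conditions of any one group are met."""
    return any(
        all(in_ranges(row[column], ranges) for column, ranges in group.items())
        for group in groups
    )


# Equivalent of [(ccd_id=2,chipx=512:513)||(ccd_id=7,chipx=500:520)]
groups = [
    {"ccd_id": [(2, 2)], "chipx": [(512, 513)]},
    {"ccd_id": [(7, 7)], "chipx": [(500, 520)]},
]

print(row_passes({"ccd_id": 2, "chipx": 512}, groups))  # True
print(row_passes({"ccd_id": 7, "chipx": 512}, groups))  # True
print(row_passes({"ccd_id": 2, "chipx": 505}, groups))  # False
```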

8. Exclude filters

The DM has the ability to invert a filter in order to exclude rows or pixels instead of including them:

unix% dmcopy "evt.fits[exclude sky=region(reg.ds9)]" holes.fits
unix% dmcopy "evt.fits[exclude pha=2:100,grade=7]" clean.fits

This is particularly useful in conjunction with compound filters (see the previous section):

unix% dmcopy \
 "evt.fits[exclude (ccd_id=0:6,8:9,chipx=513)||(ccd_id=7,chipx=512:513)]" \
 better.fits

Note that the Data Model cannot combine an exclude filter with any kind of include filter. For example, "[exclude sky=region(mask.fits)][energy=300:5000]" will produce an error.

9. Filters defined in files: using the @ syntax

Filter specifications may be applied from a file rather than including them on the command line. This easily allows the same filters to be applied to multiple input files. The filter filename is preceded by an "at" symbol (@).

unix% dmlist "acis_evt2.fits[@filters.lis]" counts

where filters.lis contains:

sky=rotbox(4148.125,4043.625,7.58978,22.338761,44.516094),
pi > 100

Complete filters must be separated by commas, just as they would be on the command line.

The "@" syntax can also be used to apply the GTIs (Good Time Intervals) from one file to another. To filter evt2.fits using the GTIs stored in gti.fits:

unix% dmcopy "evt2.fits[@gti.fits]" evt2_filtered.fits

10. Image filtering on logical coordinates

This filter on logical coordinates selects a 256 x 100 subset of the image:

unix% dmcopy "im.fits[#1=257:512,#2=1:100]" subset.fits

A region filter such as

unix% dmcopy "img[(#1,#2)=circle(145,356,25)]" circle.fits

sets everything outside the specified circle to zero (unless the BLANK keyword is defined, in which case that value is used). An image pixel is considered inside the region if the center of the pixel is inside the region; the edge of the region is considered as being inside the region.

The default behavior of a filter like this is to also shrink the resulting image to be as small as possible surrounding the circle. To suppress this behavior, and create a big, almost empty image the same size as the input file - with a small circle of data in the middle - use the "opt full" modifier:

unix% dmcopy "im.fits[(#1,#2)=circle(145,356,25)][opt full]" circle.fits

To explicitly set the value to be used for pixels lying outside the filter, use the 'opt null' directive:

unix% dmcopy "im.fits[(#1,#2)=circle(145,356,25)][opt null=-100]" circle.fits
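The behavior described in this section (pixel-center test, filling outside pixels, shrinking versus "opt full") can be sketched in plain Python. Several details are assumptions made for illustration: the image is a list of rows, logical pixel centers lie at integer coordinates starting at 1, and the shrunken image is taken to be the bounding box of the accepted pixels. The function name is hypothetical.

```python
# Illustrative sketch only; filter_image_circle is a hypothetical helper.

def filter_image_circle(image, xc, yc, radius, null=0, full=False):
    """Apply a circle(xc, yc, radius) filter in logical coordinates.

    image[row][col] holds logical pixel (#1 = col + 1, #2 = row + 1).
    A pixel is kept if its center is inside or on the edge of the circle.
    """
    ny, nx = len(image), len(image[0])
    inside = [[(col + 1 - xc) ** 2 + (row + 1 - yc) ** 2 <= radius ** 2
               for col in range(nx)]
              for row in range(ny)]
    masked = [[image[row][col] if inside[row][col] else null
               for col in range(nx)]
              for row in range(ny)]
    if full:
        # Like [opt full]: keep the original image size.
        return masked
    rows = [row for row in range(ny) if any(inside[row])]
    cols = [col for col in range(nx)
            if any(inside[row][col] for row in range(ny))]
    if not rows:
        # Assumption: nothing selected yields a single filled pixel.
        return [[null]]
    return [masked[row][cols[0]:cols[-1] + 1]
            for row in range(rows[0], rows[-1] + 1)]


ones = [[1] * 5 for _ in range(5)]
print(filter_image_circle(ones, 3, 3, 1))
print(filter_image_circle(ones, 3, 3, 1, null=-100))
print(filter_image_circle(ones, 3, 3, 1, full=True))
```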

"ahelp dmregions" has detailed information on region filtering. For more information on DM options, see "ahelp dmopt".

CIAO also supports the PROS image subsection syntax for filtering images:

unix% dmcopy "im.fits[1:100,1:200]" pros.fits

11. Image filtering on physical coordinates

To filter on the original physical coordinates, use the physical axis names:

unix% dmcopy "im.fits[x=4096.5:4213.5,y=4142.3:6120.1]" subset.fits
unix% dmcopy "im.fits[(x,y)=circle(4096,4096,12)]" circle.img

This will give you a square image bounding the given circle, and set to zero all pixels outside the circle. If you wish to mark these pixels as "invalid", you can use the "opt null" syntax described in "ahelp dmopt".

An image pixel is considered inside the region if the center of the pixel is inside the region; the edge of the region is considered as being inside the region.

"ahelp dmregions" has detailed information on region filtering.

12. Image filtering on world coordinates

World coordinates may also be used in a region filter:

unix% dmcopy 'im.fits[(x,y)=circle(11:03:28.4,-20:11:23.2,20.1\")]' circ.img

Note: The arcsecond symbol will work on the command line, but it may cause problems with the parameter tools, such as pset.

It is not possible to use world coordinates when filtering on a one-dimensional range, e.g. [x=11:03:28.2:11:03:32.1]. In this case, it is necessary to use "(x,y)=rectangle(...)" instead.

13. Difference between image filtering and table filtering

There is a subtle difference when using a region to filter a vector column in a table (see section 6) versus an image (see section 11). When filtering a column in a table, the real-valued, floating-point column values are tested to see if they are inside the region. When filtering an image, the center of each binned pixel is checked. This can lead to large differences, especially when the regions are very small compared to the bin size.

unix% dmcopy 'event.fits[bin x=90:110:1,y=90:110:1]' image.fits
unix% dmlist 'event.fits[(x,y)=circle(100.6,100.6,0.1)]' data
unix% dmlist 'image.fits[(x,y)=circle(100.6,100.6,0.1)]' data

The two dmlist outputs will show different numbers of counts. The first one, filtering the table, will show one row for each event that is located within 0.1 of (100.6,100.6). The second dmlist will always show a [1,1] image with 0 counts, because the region used is much smaller than a pixel and doesn't include the pixel center.

Both values are correct. They are just asking slightly different questions.
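The two questions can be compared directly with a small Python sketch. It assumes that the binning specification 90:110:1 produces unit pixels whose lower edges fall on integer values, so the pixel containing (100.6,100.6) has its center at (100.5,100.5). The event positions and function names are made up for illustration.

```python
import math

# Illustrative sketch; the helpers and event list are hypothetical.

def count_table(events, xc, yc, radius):
    """Table filter: test each event's own real-valued position."""
    return sum(1 for x, y in events
               if math.hypot(x - xc, y - yc) <= radius)


def count_image(events, lo, hi, xc, yc, radius):
    """Image filter: bin into unit pixels, then test each pixel center."""
    n = int(hi - lo)
    image = [[0] * n for _ in range(n)]
    for x, y in events:
        if lo <= x < hi and lo <= y < hi:
            image[int(y - lo)][int(x - lo)] += 1
    total = 0
    for row in range(n):
        for col in range(n):
            center_x, center_y = lo + col + 0.5, lo + row + 0.5
            if math.hypot(center_x - xc, center_y - yc) <= radius:
                total += image[row][col]
    return total


events = [(100.6, 100.6), (100.65, 100.55), (100.2, 100.2)]
print(count_table(events, 100.6, 100.6, 0.1))           # 2
print(count_image(events, 90, 110, 100.6, 100.6, 0.1))  # 0
```

The pixel center (100.5,100.5) lies about 0.14 from the circle center, outside the 0.1 radius, so the image filter selects no pixels even though two events fall inside the circle.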

One may be inclined to always use the value output from the event list since it does not suffer from image quantization effects. However, one needs to be careful to use consistent values in an analysis. For example, suppose you extract counts from an event file, but the fraction of the PSF from an image:

unix% dmlist "evt.fits[(x,y)=circle(100,100,1)]" counts
unix% dmstat "psf.fits[(x,y)=circle(100,100,1)]" sig- cen-
unix% pget dmstat out_sum

In this example, the fraction of the PSF is taken over the whole pixel whereas the events are counted only over a fraction of the pixel; thus the PSF fraction has been overestimated.

When the regions are large compared to the pixel size, this quantization has less effect. The difference is often minimal with a large number of counts and large (compared to the bin size) regions, and is often just presumed to be factored into the statistical uncertainty.

Bugs

See the bugs page for the Data Model library on the CIAO website for an up-to-date listing of known bugs.