AHELP for CIAO 4.4 | dm
Context: dm
Synopsis
CIAO Data Model: syntax for filtering and binning files
Description
The CIAO Data Model (DM) is a versatile interface used by CIAO to examine and manipulate standard format datafiles (e.g. FITS, ASCII). The DM enables powerful filtering and binning of datafiles. This document is an introduction to the DM syntax used by the CIAO tools.
Table of Contents
- 1. DM Syntax and Virtual Files
- 2. Virtual Columns
- 3. Renaming and Reordering Columns
Related help files contain information and examples illustrating the capabilities of the DM. A list of these files can also be obtained from the CIAO command line with "about dm" or "ahelp -k dm".
- ahelp dmascii: using text files in CIAO
- ahelp dmbinning: creating images from event files/tables
- ahelp dmfiltering: table and image filtering
- ahelp dmregions: the CIAO region-filtering syntax
- ahelp dmopt: controlling internal DM options (setting NULL characters, providing more memory to a tool)
- ahelp coords: a discussion of the Chandra coordinate systems
- ahelp subspace: how files keep a history of the filters applied to them
Detailed technical information is available from the Introduction to the Data Model memo.
1. DM Syntax and Virtual Files
The Data Model offers an easy and powerful means of filtering data. The filtered file can be directly input to a tool without writing it to disk first; this is known as a "virtual file." The virtual file, which can also be referred to as a subspace, is simply a means of defining a subset of interest in the dataset.
The basic syntax of a virtual file is:
filename[block][filter][binning][option][rename]
filename[block][filter][columns][option][rename]
filename: the input filename. All CIAO tools accept FITS file input, and many also accept ASCII files. Some tools only work on event files, while others require an input image. Refer to the individual tool help files for any restrictions.
[block]: the extension of the file to use, e.g. the name of the image or table. For FITS files, the block corresponds to an HDU and may be identified by name ("[EVENTS]") or number ("[2]"). If the block is not specified, the first "interesting" block is used (e.g. [EVENTS] for an event file). To view the blocks in a file, use "dmlist file.fits blocks".
[filter]: the filter to apply to the data. It indicates, for instance, which time period, energy range, or spatial region to use (e.g. "[time=1522012:1522320,1522400:1522600]"). Refer to "ahelp dmfiltering" for a full discussion of filtering.
[binning]: the binning specification for creating an image from an event file (e.g. "[bin x=10:100:1,y=1:100:1]"). Refer to "ahelp dmbinning" for a full discussion of binning.
[columns]: the names of the columns to include ("[cols time,energy]") or exclude ("[cols -phas]"). The syntax "[cols !phas]" may also be used, but the "!" symbol needs to be written as "\!" in the Unix shell, making the "-" syntax more convenient.
[option]: advanced options for the DM, such as specifying what the NULL character should be or how much memory to allow a tool to use. Refer to "ahelp dmopt" for a list of the available options.
[rename]: the name for the block in the output file. The default behavior is for the output to have the same block name, unless a file is binned to create an image; in that case, "_IMAGE" is added to the block name. (For information on renaming columns, refer to a later section in this file.)
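As a rough illustration of how these pieces combine, the Python sketch below assembles a virtual file specification from its parts. The helper name `build_vfile` and its arguments are hypothetical and not part of CIAO; the function only concatenates bracketed qualifiers in the order listed above and leaves out the [rename] qualifier.

```python
def build_vfile(filename, block=None, filt=None, binning=None,
                cols=None, option=None):
    """Assemble a DM virtual file string (hypothetical helper, not a CIAO API)."""
    if binning and cols:
        # The syntax above lists [binning] and [columns] as alternatives.
        raise ValueError("give either a binning or a column spec, not both")
    parts = [filename]
    # Each qualifier is wrapped in brackets; some carry a keyword prefix.
    for value, prefix in ((block, ""), (filt, ""), (binning, "bin "),
                          (cols, "cols "), (option, "opt ")):
        if value:
            parts.append("[" + prefix + value + "]")
    return "".join(parts)


spec = build_vfile("evt.fits", block="EVENTS",
                   filt="time=1522012:1522320",
                   binning="x=10:100:1,y=1:100:1")
print(spec)
```

When such a string is passed to a tool on the command line, it should be quoted so the shell does not interpret the brackets, as in the examples in this file.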
2. Virtual Columns
A file may contain virtual columns whose values are calculated by applying a mathematical transform to an existing column. Virtual columns - such as EQPOS(RA,DEC) - do not physically exist in the event file; they are defined by the WCS information attached to another column, e.g. SKY.
The transformation is listed in the output of "dmlist evt2.fits cols":
1: EQPOS(RA ) = (+278.3860) +TAN[(-0.000136667)* (sky(x)-(+4096.50))]
        (DEC)   (-10.5899 )      (+0.000136667)  (   (y) (+4096.50))
For most applications, these columns may be used the same as non-virtual columns in the file. It is possible to list, filter, and bin on virtual columns.
However, filtering and binning do not work reliably on virtual columns derived from non-monotonic coordinate transforms (e.g. MSC(THETA,PHI), or EQPOS near the poles; see "ahelp coords" for more information on these coordinate systems).
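To make the idea of a virtual column concrete, the following Python sketch evaluates a tangent-plane (TAN) transform using the reference values from the dmlist output above. It uses the textbook inverse gnomonic projection and is only an illustration under that assumption; the function name `sky_to_eqpos` is hypothetical, and the DM's own WCS code may differ in detail.

```python
import math

# Values copied from the dmlist output above.
CRVAL = (278.3860, -10.5899)          # RA, Dec at the reference pixel (deg)
CRPIX = (4096.50, 4096.50)            # reference pixel in sky (x, y)
CDELT = (-0.000136667, 0.000136667)   # degrees per pixel


def sky_to_eqpos(x, y):
    """Standard inverse gnomonic (TAN) projection; an illustrative sketch,
    not the DM's own implementation."""
    # Offsets from the reference pixel, as tangent-plane coordinates (radians).
    xi = math.radians(CDELT[0] * (x - CRPIX[0]))
    eta = math.radians(CDELT[1] * (y - CRPIX[1]))
    ra0, dec0 = math.radians(CRVAL[0]), math.radians(CRVAL[1])
    denom = math.cos(dec0) - eta * math.sin(dec0)
    ra = ra0 + math.atan2(xi, denom)
    dec = math.atan2(math.sin(dec0) + eta * math.cos(dec0),
                     math.hypot(xi, denom))
    return math.degrees(ra) % 360.0, math.degrees(dec)


ra_ref, dec_ref = sky_to_eqpos(4096.5, 4096.5)
print(ra_ref, dec_ref)
```

At the reference pixel the function returns the reference RA and Dec; moving to larger x decreases RA because the x scale is negative.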
3. Renaming and Reordering Columns
It is possible to rename a column or change the order of the columns within a file. Note that certain CIAO tools require particular column names (e.g. time, energy), but none of the tools make assumptions about the order of the columns within a file.
To rename a column, run dmcopy with the column syntax "newname=oldname". Multiple columns may be renamed in the same command.
dmcopy "pi.fits[cols rate=count_rate]" pi_rate.fits
dmcopy "pi.fits[cols rate=count_rate, rate_err=count_rate_err,*]" \
pi_rate_all.fits
The "count_rate" column in pi.fits is renamed to "rate" in pi_rate.fits. With the first command, "rate" will be the only column in the output file. In the second command, "count_rate_err" is also renamed to "rate_err", and the "*" operator indicates that all other columns should be copied unchanged to the output file.
The columns will appear in the output file in the order in which they are specified. So in the renaming case, "rate" will be the first column in the output. The "cols" syntax can be used to reorder columns without modifying them as well:
dmcopy "pi.fits[cols energy,time,pi,count_rate, *]" reorder.fits
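The renaming and ordering rules can be modelled with a few lines of Python. This is a simplified sketch, not DM code: the function `select_columns` is hypothetical, it ignores exclusions ("-name"), column numbers ("#1") and vector columns, and it assumes that "*" expands to the remaining columns in their original file order.

```python
def select_columns(spec, available):
    """Return (output_name, source_name) pairs for a '[cols ...]' list.

    Simplified model of the behaviour described above (assumptions are
    noted in the surrounding text).
    """
    items = [s.strip() for s in spec.split(",")]
    named = []
    for item in items:
        if item != "*":
            # 'new=old' renames a column; a bare name keeps it unchanged.
            new, _, old = item.rpartition("=")
            named.append((new or old, old))
    used = {old for _, old in named}
    # Columns not mentioned explicitly, in their original order.
    rest = [(c, c) for c in available if c not in used]
    out, it = [], iter(named)
    for item in items:
        out.extend(rest if item == "*" else [next(it)])
    return out


columns = ["time", "energy", "pi", "count_rate", "count_rate_err"]
print(select_columns("rate=count_rate", columns))
print(select_columns("rate=count_rate, rate_err=count_rate_err,*", columns))
print(select_columns("energy,time,pi,count_rate, *", columns))
```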
Note that for a vector column sky(x,y),
"evt.fits[cols x,y]"
will retain the information that (x,y) is a vector column called "sky". Any of the following, however, will separate the vector components and lose the vector-dependent coordinate systems like RA and Dec:
[cols x]
[cols y,x]
[cols x,pha,y]
Example 1
acisf01843N002_evt2.fits[EVENTS][cols #1,#2,#3]
acisf01843N002_evt2.fits[cols time,ccd_id,node_id]
Select three columns of the EVENTS block by number or by name.
Example 2
acisf01843N002_evt2.fits[#row=1:4]
Select rows 1-4 from a FITS file.
Example 3
dmlist "evt.fits[events][pha=30:200,time=10:20,50:60]" data
Use the tool dmlist to print specific data values from the file to the screen. A filter is applied to the "events" block in the file evt.fits. The filter selects rows in the table for which the value of the pha column is >= 30 and < 200, and for which the time is either >= 10 and < 20 or >= 50 and < 60. Both the pha and time filters must be satisfied for a row to pass the filter.
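The logic of this filter can be sketched in Python. The code follows the wording of the description (each range taken as min <= value < max, with a pha range of 30:200) and is not the DM implementation; "ahelp dmfiltering" remains the reference for how range boundaries are treated. The function names are hypothetical.

```python
def in_ranges(value, ranges):
    """True if value falls in any (lo, hi) range, using lo <= value < hi."""
    return any(lo <= value < hi for lo, hi in ranges)


def passes(row):
    # Mirrors "[pha=30:200,time=10:20,50:60]": ranges listed for one column
    # are OR-ed together, and the per-column conditions are AND-ed.
    return (in_ranges(row["pha"], [(30, 200)])
            and in_ranges(row["time"], [(10, 20), (50, 60)]))


rows = [
    {"pha": 30,  "time": 10.0},   # passes: both lower bounds are included
    {"pha": 150, "time": 55.5},   # passes: falls in the second time range
    {"pha": 150, "time": 30.0},   # fails: time lies between the two ranges
    {"pha": 200, "time": 15.0},   # fails: upper bound excluded in this model
    {"pha": 10,  "time": 15.0},   # fails: pha below the range
]
selected = [r for r in rows if passes(r)]
print(selected)
```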
Example 4
acisf01843N002_evt2.fits[EVENTS][bin x=3200:4800:4,y=3200:4800:4]
Bin an event file into an image with this input to the tool dmcopy.
Example 5
acisf01843N002_evt2.fits[EVENTS][bin pi=1:1024:1]
Use this specification as input to the tool dmextract to bin an event file into a PI spectrum.
Example 6
dmcopy "evt.fits[cols -status]" evt_new.fits
Remove the status column from evt.fits and write the result to evt_new.fits.
Bugs
General
(08 Oct 2012)
The WCS library that the DM uses has a problem computing coordinate transforms that involve the CAR transform.
(08 Oct 2012)
If a user forgets to specify the cols directive, it may lead to a segmentation violation instead of generating a useful error message.
% dmlist emap.hist"[bin_low, bin_high, counts]" cols
# 62735: Received error signal SIGSEGV-segmentation violation.
# 62735: An invalid memory reference was made.
# 62735: segmentation fault: DMLIST (1) is: exit_upon_error->NULL
(The image doesn't have to be square; it just needs to have 8192^2 pixels.)
This condition may be met when the "update=no" option is used. Normally, when you filter a dataset, the data subspace (which describes the boundaries of each column's data and therefore is the intersection of the initial minima and maxima with any subsequent filters) gets updated to reflect the filtering. However, when you give the "update=no" option, you instruct the DM not to update the subspace to reflect the current filter. Therefore, the full ranges for x and y are used in the binning, and you get an 8192x8192 image (and a seg fault, for the reason described above).
If a logical column exists in the file, the DM will generate this warning. The output is unaffected by this bug.
The DM doesn't treat BSCALE and BZERO as structural keywords so they get copied to the output. If you have a file which is a floating point image, the addition of these keywords will create incorrect output results.
Filtering Data
(08 Oct 2012)
When filtering on WCS columns, the range is computed by converting the range of the parent columns and using the results as the limits of the WCS columns. When the transform is highly non-linear, e.g. the TAN-P transform used to go from DETX,DETY to THETA,PHI, this can lead to incorrect limits and incorrect filters. Users who want to filter on WCS columns should give explicit ranges and not rely on the computed minima and maxima.
bad% dmcopy "evt.fits[theta=:1]"
good% dmcopy "evt.fits[theta=0:1]"
(11 Sep 2012)
The datamodel replaces '0' in filters with a small value, 1e-16. If the other values being filtered are generally smaller than this, as one might have with flux values, then the wrong set of rows or pixels will be returned.
unix% cat foo.dat
#data
0.0
1.0e-12
1.0e-14
1.0e-16
1.0e-18
1.0e-20
unix% dmlist foo.dat"[data>0]" data,clean
# data
1E-12
1E-14
unix% dmlist foo.dat"[data=0]" data,clean
# data
0
1E-16
1E-18
1E-20
For less-than and greater-than filters, users can work around this by using a small number other than 0:
% dmlist foo.dat"[data>1e-90]" data,clean
# data
1E-12
1E-14
1E-16
1E-18
1E-20
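The listings above can be reproduced with a toy model in Python. The comparison rules below are an assumption chosen to match the output shown (a literal 0 acts like 1e-16, and "=0" matches anything within that distance of zero); the DM's actual internals may differ, and the function names are hypothetical.

```python
EPS = 1e-16  # the small value that replaces a literal 0, per the text above


def dm_greater_than_zero(values):
    # "[data>0]" behaves like "[data>1e-16]"
    return [v for v in values if v > EPS]


def dm_equals_zero(values):
    # "[data=0]" behaves as if everything within EPS of zero matched
    return [v for v in values if abs(v) <= EPS]


data = [0.0, 1.0e-12, 1.0e-14, 1.0e-16, 1.0e-18, 1.0e-20]

print(dm_greater_than_zero(data))
print(dm_equals_zero(data))
# The workaround: compare against a tiny non-zero number instead of 0.
print([v for v in data if v > 1e-90])
```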
"col=foo" is okay, but "col=foo,bar" isn't.
Workaround:
Use "col=foo,col=bar" instead.
When region-filtering images, you can create a vector on the fly from any two axes by using a filter like "(#1,#3)=circle(...)". Although the image is filtered correctly with a temporary vector, the region filter isn't recorded in the subspace. Hence, tools that use the filtered file don't know that pixels outside the filter region are invalid. As a result, dmstat reports no nulls in the filtered image (unless you explicitly tell the DM to set pixels outside the filter to null by using "opt null=...").
For example, setting xmin > xmax and/or ymin > ymax. Instead, it appears that the Data Model simply swaps the min and max values.
The exit status of dmcopy is also incorrectly set to 0 (success):
unix% dmcopy "image.fits[#1=1:20,#2=:]" delme.fits
# DMCOPY (CIAO): [ftColRead]: FITS error 308 bad first element number in dataset image.fits Block 1 PRIMARY
unix% echo $status
0
Workarounds:
Omit the "#2=:" from the filter
Specify a range for both elements: [#1=1:20,#2=1:20]
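Because the exit status cannot be trusted in this case, a script that wraps the tool may want to inspect the printed messages as well. The Python sketch below shows one possible approach; the helper `run_checked` is hypothetical, and the "FITS error" marker is an assumption taken from the message shown above, so other failures may need different checks. A small stand-in command is used so the example does not require CIAO.

```python
import subprocess
import sys


def run_checked(cmd):
    """Run a command and fail if it exits non-zero OR prints an error line.

    The 'FITS error' marker is an assumption based on the message shown
    above; other failures may be worded differently.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    if proc.returncode != 0 or "FITS error" in output:
        raise RuntimeError("command failed: " + output.strip())
    return output


# Stand-in for a tool that reports an error but still exits with status 0.
fake_tool = [sys.executable, "-c",
             "import sys; sys.stderr.write('# DMCOPY (CIAO): FITS error 308\\n')"]
try:
    run_checked(fake_tool)
    failed = False
except RuntimeError:
    failed = True
print("failure detected:", failed)
```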
For example:
unix% dmcopy "acis_img.fits[exclude sky=region(src.fits)][opt full,update=no]" filtered.fits
The regions are correctly excluded; however, the image is also clipped at the bounding box around all the excluded shapes, so the corners of a few chips are removed.
Workarounds:
- Remove update=no. In this case, the Data Model internally inverts all exclude filters to be an inclusive filter, and correctly filters the image. Be aware that this process is much slower if the region is large. In that case, it will also add a large region keyword to the file's header, noticeably slowing down any operation on that file.
- For ASCII region files, it is also possible to manually invert the filter in the file. The "field()" region syntax is used to include the entire field, then remove the undesired sources. For instance,
# Region file format: CIAO version 1.0
circle(1635.5,4113.5,135.11408)
circle(3975,4233,20)
circle(2565.5,4129.5,40)
circle(2129.5,4007.5,40)
would become
# Region file format: CIAO version 1.0
field()
-circle(1635.5,4113.5,135.11408)
-circle(3975,4233,20)
-circle(2565.5,4129.5,40)
-circle(2129.5,4007.5,40)
and the dmcopy filtering command would be
unix% dmcopy "acis_img.fits[sky=region(src.ascii)][opt full,update=no]" filtered.fits
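The manual inversion of an ASCII region file can be automated with a short script. The Python sketch below is a minimal example that assumes one included shape per line and no shapes that are already excluded; the function name `invert_region` is hypothetical.

```python
def invert_region(text):
    """Turn a list of included shapes into 'field()' minus those shapes."""
    out = []
    added_field = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            out.append(line)          # keep header and comment lines as they are
            continue
        if not added_field:
            out.append("field()")     # include the whole field once, up front
            added_field = True
        out.append("-" + stripped)    # then subtract each original shape
    return "\n".join(out)


original = """# Region file format: CIAO version 1.0
circle(1635.5,4113.5,135.11408)
circle(3975,4233,20)"""

print(invert_region(original))
```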
This command:
unix% dmcopy "input.fits[sky=circle(4096,4096,100),y=4020:4100,4250,4350]" \
output.fits
fails to do the y filter altogether. This bug also applies to exclude filters.
For example, the following commands both fail:
unix% dmlist "region.fits[shape!=Annulus]" data
unix% dmlist catalog.fits"[COMMENT!='weak'][cols COMMENT]" data
For example, this command does not find all instances of "PMterm" in the selected columns:
unix% dmlist stat.fits"[src=PMterm||det=PMterm||mst=PMterm]" counts
13
Compare to
unix% dmlist stat.fits"[cols det,src,mst]" data,clean | grep PMterm | wc -l
27
This example command does not work:
unix% dmlist "evt2.fits[(ccd_id=5||ccd_id=7),pha=2500:3500]" blocks
Workaround:
Rewrite to include the filter conditions in each part of the conditional.
unix% dmlist "evt2.fits[(ccd_id=7,pha=2500:3500)||(ccd_id=5,pha=2500:3500)]" blocks
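When several alternatives share a common condition, the expanded filter can be generated rather than typed by hand. The Python sketch below builds the string used in the workaround; the helper name `distribute` is hypothetical and it performs plain string formatting only.

```python
def distribute(alternatives, common):
    """Repeat the common condition inside every '||' alternative."""
    return "||".join("({},{})".format(alt, common) for alt in alternatives)


filt = distribute(["ccd_id=7", "ccd_id=5"], "pha=2500:3500")
spec = "evt2.fits[{}]".format(filt)
print(spec)
```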
ASCII Kernel
In the DM, you can normally do
unix% dmlist evt.fits"[cols ra,dec]" data
even though RA and Dec are just coordinate systems defined on the X and Y columns in the file; the DM applies the transform on the fly. This doesn't work yet for ASCII files.
DTF-FIXED header lines may be up to 1024 characters long. However, if the keyword is longer than the FITS standard, the comment is truncated.
unix% dmcopy input.txt output.dtf'[opt kernel=text/dtf-fixed]'
In input.txt:
TTYPE14 = 'Class' / LV Class Exo: M = missile [B = tactical ballistic missile (except Redstone) apo=80:200] R = research rocket O = orbital LV V = RTV Y = Exo weather rocket X = Big test rocket D= Deep space launch
In output.dtf:
TTYPE14 = "Class " / LV Class Exo: M = missile [B = tactical ballistic missile (except Redstone) apo
Binning & Rebinning Images
For example:
unix% dmcopy acis.img"[bin x=::5,y=::6]" acis5x6.img
Using the same value for both axes works correctly:
unix% dmcopy acis.img"[bin (x,y)=::5]" acis5.img
unix% dmextract
Input event file (ccd3.sky4.fits[y=3767:][bin sky=annulus(3786,3767,0:380:4)]):
Enter output file name (rprof.fits):
# dmextract (CIAO 4.0 Beta 2): WARNING: Input file, "ccd3.sky4.fits[y=3767:]", has no rows in it.
Bus error
Python module
(14 Sep 2012)
The CXC datamodel allows keywords and columns to be used interchangeably, which allows users to extract a single column value using the dmKeyRead routine.
For array columns this should return the 1st array element; however, currently it returns an array the length of the original column with only the first element correctly populated.
>>> import cxcdm as cdm
>>> tab = cdm.dmTableOpen("FOV.fits")
>>> cdm.dmKeyRead( tab, "x")
(<dmDescriptor object at 0x2b7fb5b49288>, array([ 3.84334592e+003, 1.12619810e-312, 0.00000000e+000,
0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
0.00000000e+000, 4.79243676e-322, 1.12619810e-312,
1.12619810e-312, 0.00000000e+000, 0.00000000e+000]))
Only the first array element above is valid; the other values are uninitialized memory.
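Until this is fixed, a caller can simply discard everything after the first element. The Python sketch below shows the idea with a stand-in for the returned tuple, so it runs without CIAO; the helper name `first_element` is hypothetical.

```python
def first_element(key_read_result):
    """Keep only the first element of the value returned for an array column."""
    descriptor, values = key_read_result
    try:
        return descriptor, values[0]
    except (TypeError, IndexError):
        # Already a scalar: nothing to trim.
        return descriptor, values


# Stand-in for the (descriptor, array) tuple shown above; with CIAO available
# this would instead be the value returned by cdm.dmKeyRead(tab, "x").
fake_result = (object(), [3.84334592e+003, 1.12619810e-312, 0.0])
_, x0 = first_element(fake_result)
print(x0)
```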
