Synopsis
Modify and create columns in a table
Syntax
dmtcalc infile outfile expression [clobber] [verbose]
Description
`dmtcalc' is a flexible general purpose table calculation utility. It can change existing columns add new ones using arbitrary expressions which can involve other columns - possibly smoothed - mathematical functions such as sqrt(), random numbers, values taken from the file header, and much more. It uses the same syntax as the dmgti and dmimgcalc tools: this syntax is described in detail in "ahelp dmmath".
The input expression can either be given directly on the command-line or stored in an external file and accessed as a stack - i.e. "@filename". The use of a stack allows more than one column to be created or modified at one time.
When dmtcalc starts, it will make an effort to assure that any data dependencies in the expressions (referenced columns or keywords, or new columns defined in previous expressions) are satisfied, that there are not any data type conflicts within the expressions, and that there are no conflicts of array size or dimensionality. If any of these tests fail, an error message will print, and the tool will exit.
Changing the value of an existing column
dmtcalc cannot be used to change the value of an existing column. Rather, users can create a new column with the same name as an existing column though they need to explicitly copy the values (if desired). For example
expression="DY=DY"
Will create a new column "DY" in the output file and the values of the new column will be set equal to the values of the input column. Similarly
expression="DZ=(int)DZ"
Will create a new DZ column that matches the datatype of the expression, in this case an integer datatype. The original values will be truncated and saved in the new output column.
Any/all other column properties are not copied including: column description, column units, WCS, NULL values, etc. As with all new columns, they are added after all the existing columns in the file so the order of the columns is not preserved.
Overwriting times in the GTI blocks
dmtcalc correctly changes the start and stop columns in the file (and thus the order can change). However, it then proceeds to copy the data subspace from the input to the output file and so the values are overwritten with the original values. To override tstart/tstop, they need to be put into a new extension:
infile="file.fits[gti][new_gti]"
Then use the reverse notation when using as a filter:
infile="file_2.fits[@file.fits[new_gti][gti]]"
Examples
Example 1
dmtcalc evt2.fits evt2_edit.fits expression="dtime=(time-TSTART)*1.0e-3"
This time we have used dmtcalc to create a new column called "DTIME" in the output file evt2_edit.fits (new or changed columns always have their names written in upper case in the output file, however they are written in the expression).
The contents of the DTIME column are set equal to
(time-TSTART)*1.0e-3
where TSTART is replaced with the value of the TSTART keyword from the header. The case of header keywords is unimportant, so
(time-tstart)*1.0e-3
would give the same result.
For this particular example DTIME can be considered to be the time since the start of the observation in kilo-seconds. In general the expressions used in these examples are just used to illustrate dmtcalc's capabilities and so should not be taken to have any real meaning for Chandra (or any other) data. More details of the allowed expressions are given in "ahelp dmmath".
Example 2
dmtcalc LV.newer"[cols length,diameter,launch_mass]" calc.out.txt"[opt kernel=text/simple]" expr="result=diameter*length*(diameter/launch_mass)"
Run dmtcalc on a fixed-format text file, creating a new column named "result"; see "ahelp dmascii" for more information on working with ASCII files.
Example 3
dmtcalc evt2.fits evt2_edit.fits expression="energy=((float)pha)*0.1"
Here we use dmtcalc to over-write the values in the energy column to be ten percent of the PHA values. Since the PHA values are stored as integers we convert them to floating-point numbers - by use of the term "(float)" - before the division.
If we had omitted the "(float)" term, the output would still have been automatically converted to a real since we are multiplying by a real number. It would, however, have changed the ENERGY column to Real8 (ie double) format instead of keeping the Real4 (float) format.
The name of the energy column in the output file (here "evt2_edit.fits") will be in upper case - whatever its case in the input file - since its values have been changed by dmtcalc.
Since "expression" is a required (automatic) parameter, you may be tempted to enter something like:
unix% dmtcalc evt2.fits evt2_edit.fits "energy=((float)pha)*0.1"
This will fail because the parameter parser interprets this as setting a (nonexistent) dmtcalc parameter named "energy" to "((float)pha)*0.1".
Example 4
dmtcalc evt2.fits evt2_edit.fits expression="newpha=(pha+(long)(1.25e-3*(time-TSTART)))"
The contents of the NEWPHA column are set equal to the PHA column added to the integer part of
1.25e-3*(time-TSTART)
The data type of the output column is integer (Int4), since the PHA column is an integer and the time expression has been converted (also called cast) to an integer value (here of type "long").
Example 5
dmtcalc evt2.obs1,evt2.obs2 evt2.combined expression="newpha=(pha+(long)(1.25e-3*(time-TSTART)))"
This does the same as the previous example except that the input is now two files - evt2.obs1 and evt2.obs2. There is only one output file - evt2.combined - which contains the NEWPHA column. The row order of the output image is determined by the order of the files in the input stack: the first files contents, then the second file, etc until all the input files have been processed. The dmsort tool can be used to change the row order of the output file if required.
Example 6
dmtcalc lc.orig lc.smoothed expression="rate=(rate:9-0.123)"
Here the file lc.orig (which is assume to be a lightcurve so contains a rate column) is converted to lc.smoothed in which the RATE column is equal to the smoothed - by 9 rows - value of the rate and then a dc offset (0.123) is subtracted from it. The value for the dc offset would have to be calculated prior to running the program - e.g with dmstat as shown in the following csh/tcsh example
unix% dmstat "lc.orig[cols rate]" sigma- unix% set dc=`pget dmstat out_mean` unix% dmtcalc lc.orig lc.smoothed expression="rate=(rate:9-$dc)"
Example 7
dmtcalc evt2_bary.fits evt2_phase.fits @calcphase.lis
The expression can also be stored in an ASCII file and accessed as a stack. Here we use the contents of calcphase.lis to create an output file in which two new columns have created - PHASE and GPHASE - which contain phase information for the events. The contents of the stack file are:
unix% cat calcphase.lis .dtime=(time-TSTART) GPHASE=.dtime*19.794885 PHASE=GPHASE-(long)GPHASE
We have also included the use of a temporary variable (here .dtime) even though it is not really necessary for this particular example.
The results are only valid if the frequency of the source is 19.794885 Hz (and it's period is not changing, and you have applied the barycenter correction to the events file). For more details on using dmtcalc to add phase information to an event file see the Phase Resolved Spectroscopy thread on the CIAO website.
Example 8
dmtcalc evt2_bary.fits evt2_phase.fits expression=".dtime=(time-TSTART),GPHASE=.dtime*19.794885,PHASE=GPHASE-(l ong)GPHASE"
Here we repeat the previous example but give the expression directly - i.e. without the use of a stack file.
Example 9
dmtcalc evt2.fits evt2_edit.fits @randpha.lis
unix% cat randpha.lis newpha= pha + (long) \ ( 1.25e-3*(time-TSTART) + 10*(#rand(1034)-0.5) )
Here we define the expression in an external file and use the "\" character to indicate a continuation line. The contents of the NEWPHA column are set equal to the PHA column added to the integer part of
1.25e-3*(time-TSTART) + 10*(#rand(1034)-0.5)
where #rand(1034) means a random number between 0 and 1 where 1034 is used to seed the random number generator. This seeding only happens once - ie it is not re-set for each row of the table.
The result is similar to the earlier example, except that a random element of +/-5 has been added onto the calculation.
Example 10
dmtcalc srclist.wav srclist_filt.wav expression="if(psfratio==#nan)then(psfratio=0.0)"
Here we use the support for conditional expressions to filter out the NaN values from the PSRATIO column of srclist.wav, replacing any such values by 0.0.
In order for the expression "if(a)then(b)" to work correctly, the same column must be used in both sections of the comparison.
Example 11
dmtcalc srclist.wav srclist_filt.wav expression="scol=shape~='cir'"
This expression creates the column SCOL which contains boolean values: TRUE (1) or FALSE (0). The row entry is true if the entry shape column contains the string "cir" and false otherwise. The "~=" operator returns true if the string on the left-hand side contains the string given on the right-hand side.
Parameters
name | type | ftype | def | min | max | reqd | stacks |
---|---|---|---|---|---|---|---|
infile | file | input | yes | yes | |||
outfile | file | output | yes | ||||
expression | string | input | yes | yes | |||
clobber | boolean | no | |||||
verbose | integer | 0 | 0 | 5 | no |
Detailed Parameter Descriptions
Parameter=infile (file required filetype=input stacks=yes)
Input file to be operated upon. A copy will be made of this file, with any additions/modifications indicated by the expression parameter.
Unless an extension is specified when the file is input, the first table extension will be used. A stack of files may be input, but they must all have the same structure - ie have the same extension name and contain the same columns - since the columns in the first file are assumed to exist - in the same order - in all subsequent files. The behaviour of the tool is not guaranteed if this condition does not hold.
When a stack is supplied, the input files are processed in the order they are supplied to create one output file. The dmsort tool can be used to change the order of rows in the output file.
Parameter=outfile (file required filetype=output)
Name of output file to create.
The output file will have the same extension name as that of the first input file. All the keywords and subspaces in the first file will be copied to the output, in addition to any new columns defined in the expression parameter.
Parameter=expression (string required filetype=input stacks=yes)
A stack of expressions to be evaluated for each row in the input file.
If many expressions are desired, then they should be put into an ascii file, and input as @filename. There is no limit on the number of expressions that can be evaluated. When providing expressions on the command line, note that spaces may be interpreted as stack separators, which can produce unexpected error messages. See "ahelp dmmath" for a description of the syntax allowed here.
Parameter=clobber (boolean default=no)
Clobber the output file?
Parameter=verbose (integer not required default=0 min=0 max=5)
Level of verbose information printed during processing.
Creating Vector Columns
In order to make a new vector column with dmtcalc, it must be used in conjunction with other CIAO tools. Here is a method for creating a vector column "skymod(xmod,ymod)":
First, create the skymod column and generate a new output file (temp1.fits), then filter the original sky column out of that file (temp2.fits):
dmtcalc old.fits"[cols x,y]" temp1.fits expr="xmod=...,ymod=..." dmcopy temp1.fits"[cols -sky]" temp2.fits
The Data Model filter "[cols x,y]" indicates that only those two columns should be included in the new output file. This means that there will be only one vector and one MTYPE/MFORM pair. Update these keywords with the new vector and component names:
dmhedit temp2.fits filel="" op=add key=MTYPE1 value='skymod' dmhedit temp2.fits filel="" op=add key=MFORM1 value='xmod,ymod'
Now paste the rest of the columns from the old file to the new one:
dmpaste old.fits temp2.fits new.fits
Whitespace in Expressions
When an expression is given on the command line - so not as a stack file - then spaces are important. This is because spaces are treated as stack separators - as discussed in "ahelp stack" - which is unlikely to be the desired behaviour here.
So, setting
expression="energy= (float) pha / 10.0"
will not work. You have to say either
expression="energy=(float)pha/10.0"
- i.e. remove all the spaces - or surround the expression in ()'s, since this stops the checks for stack separators. This means that
expression="energy=( (float) pha / 10.0 )"
is a valid expression.
An alternative solution is to write the expression in a text file ("expr.lis" say) and then set
expression="@expr.lis"
since the white space in stack files is not important.
Changes in CIAO 4.17
-
Fix for "~=" (substring) operator bug causing memory corruption with newer compilers.
Bugs
- log() of integer values
Using the log or log10 operator on integer columns returns an integer value. The value is truncated.
-
Workaround:
If unsure of the datatype, the value can be explicitly cast to a real-valued number
newcol=log((float)oldcol) newcol=log(oldcol*1.0)
- When using status bits in an expression, only the last condition is checked.
When using status bits in an expression such "status==X1F,X3T,X18F", only the last condition (here, "X18F") is checked.
-
Workaround:
Set the logic explicitly:
- logical "AND": status=((bits==X1T)&&(bits==X2T))
- logical "OR": status=((bits==X1T)||(bits==X2T))
- The tool does not follow the usual order of operations of mathematics when evaluating an expression.
-
For example, the expression
x+814*24*3600-260086780.04
is treated as
x+(814*24*(3600-260086780.04))
instead of the expected
x+(814*24*3600)-260086780.04
-
Workaround:
Include the parentheses, as shown in the last code snippet, to force the correct order of operations.
- When creating a new double vector column, the results are incorrect if one (but not both) of the values in the expression are integers, even if it is cast as a double.
-
For example, this file will work:
unix% cat ok.expr (r=(double){3.5,4.}) unix% dmlist output_ok.fits'[cols r]' data --------------------------------------------- Data for Table Block HISTOGRAM --------------------------------------------- ROW R[2] 1 [ 3.50 4.0] 2 [ 3.50 4.0] 3 [ 3.50 4.0] (etc.)
while this does not:
unix% cat not_ok.expr (r=(double){3.5,4}) unix% dmlist output_not_ok.fits'[cols r]' data ------------------------------------------------- Data for Table Block HISTOGRAM ------------------------------------------------- ROW R[2] 1 [ 3.50 0] 2 [ 3.50 0] 3 [ 3.50 0] (etc.)
- Virtual columns cannot be used in an expression
-
For example, when using an event list, one cannot say
unix% dmtcalc evt1.fits foo.fits exp='dra=(ra-ra_nom)'
because the 'ra' column is not a real column in the file; see the virtual columns dictionary entry for an explanation.
-
Workaround:
Use Data Model column renaming:
unix% dmtcalc evt1.fits"[cols foo=ra,*]" foo.fits expr='dra=(foo-ra_nom)'
- Units are not preserved, even for simple calculations.
If you calculate npha as npha=pha; which should have units of PHA [adu]; but it will be unitless. Even more confusing is pha=pha will be unitless since pha in the output file is treated a new column.
- The speed basically scales as O(N^2): small files are not so bad, but big files slow down exponentially.
- Problems accessing individual elements in a vector and/or array column
-
There are various problems doing things like:
pha[9]=1; foo=pha[9];
where you try to access the individual elements in a vector and/or array column.
- Changing the value of an existing column
There are issues when changing values of existing columns: what info is kept, what is thrown away (e.g. data types, null, units, descriptions, order in file). In particular there are various problems with arrays vs. vector vs. vector-array columns.
- Booleans do not work on array columns
For example, you cannot do above=(phas>20) since the PHAS array contains 9 (or 25) values. Currently only the first value in the array is checked.
- Using vector components in a complicated expression may cause dmtcalc to hang. (01 Jun 2006)
-
For example:
expression="dist=((4125-x)^2)-((4025-y)^2)"
-
Workaround:
Refer to the component by vector notation instead:
expression="dist=((4125-sky[0])^2)-((4025-sky[1])^2)"
dmtcalc tries to do this subsitution internally in the first example, but fails.
- dmtcalc cannot be used to modify values of vector columns.
-
Vector columns such as SKY(X,Y), POS(X,Y), CHIP(CHIPX,CHIPY), RD(TG_R,TG_D), cannot be modified using dmtcalc.
Internally, dmtcalc treats vector columns as arrays. So SKY(X,Y) becomes SKY[0] and SKY[1] which is what is written on output if the values are set. For example
unix% dmtcalc wav_srclist.fits modified_srclist expression="pos=pos" unix% dmlist wav_srclist.fits | grep POS 5 POS(X,Y) pixel Real8 2288.50: 4900.50 Physical coordinates 5: EQSRC(RA_SRC ) = (+230.1725)[deg] +TAN[(-0.000136667)* (POS(X)-(+4096.50))] ciao:/data/da/Docs/ciaoweb/ciao47/bugs% unix% dmlist modified_srclist | grep -i POS 26 POS[2] Real8(2) -Inf:+Inf User defined column
shows that just copying the values in the POS vector column creates a POS array.
-
Workaround:
Users can modify vector columns using the Python Crates package
- The valid range of vector columns is not preserved in the output file, even when not used or modified.
-
When the input file contains a vector column, the legal range of values stored in the raw FITS TLMIN/TLMAX keywords associated with each raw column are not preserved. The MIN/MAX for each raw column in the vector are set equal to the 1st column.
So in this example, a wavdetect source list with the vector column POS(X,Y) (which are columns number 5 and 6), have different ranges:
unix% dmlist wav_src.fits header,clean,raw | grep 'TLM..[56]' TLMIN5 = 2028.50000000 / TLMAX5 = 5970.50000000 / TLMIN6 = 1800.50000000 / TLMAX6 = 5630.50000000 /
After running dmtcalc, the values are different. The values for the Y column have been changed to be equal to the range of the X column.
unix% dmtcalc wav_src.fits new_col expression="newcol=#trand" unix% dmlist new_col header,clean,raw | grep 'TLM..[56]' TLMIN5 = 2028.50000000 / TLMAX5 = 5970.50000000 / TLMIN6 = 2028.50000000 / TLMAX6 = 5970.50000000 /
These column ranges are used when the file is filtered, which can produce incorrect results.
Caveats
- Output columns with same name as input require special consideration.
-
For example, The expression "if(a)then(b)" does not work as expected in dmtcalc if two different columns are used in the comparison.
unix% dmtcalc in.fits out.fits "expr=if(pi>500)then(pi=-1)"
works as expected ("pi" used for both pieces of the conditional), but
unix% dmtcalc in.fits out.fits "expr=if(energy>5000)then(pi=-1)"
does not work. This is because the pi value in the expression is not always initialized to the input. Instead you must explicitly initialize it
unix% dmtcalc in.fits out.fits "expr=pi=pi;if(energy>5000)then(pi=-1)"
- The #nan can be use for float-point NaN checks, but integer NULL values cannot be checked the same way.
See Also
- concept
- dmmath
- dm
- dmfiltering, dmopt
- tools::core
- dmappend
- tools::image
- dmimgcalc
- tools::table
- dmgroup, dmjoin, dmmerge, dmpaste, dmsort, dmtabfilt, dmtype2split
- tools::timing
- dmgti, mtl_build_gti