Synopsis
Using the Data Model with text files
Description
CIAO users are familiar with the flexible filtering and binning capability that the Data Model tools provide with FITS files. Since CIAO 4.0, the same tools also work on ASCII (text) files, using the 'ASCII kernel'.
This kernel enables easy text file manipulation by DM-specific tools such as dmlist, dmcopy, dmstat, and dmtcalc. However, other CIAO tools (e.g. aconvolve) are not guaranteed to work smoothly with ASCII files.
For example, a raw text file containing a table with three columns and four rows:
```
unix% cat sample.dat
21.0 41.3 21.8
22.0 41.1 20.2
23.0 43.8 17.3
24.0 12.3 11.1
```
This file can then be used with many CIAO tools; for instance:
```
unix% dmlist sample.dat cols
--------------------------------------------------------------------------------
Columns for Table Block sample.dat
--------------------------------------------------------------------------------
ColNo  Name     Unit    Type        Range
   1   col1             Real8       -Inf:+Inf
   2   col2             Real8       -Inf:+Inf
   3   col3             Real8       -Inf:+Inf
```
You can use the full CIAO virtual-file syntax to filter these files. For example, to display only the last two columns, for the rows where the third column lies between 11 and 20:
```
unix% dmlist "sample.dat[col3=11:20][cols col2,col3]" data,clean
#  col2                 col3
   43.80                17.30
   12.30                11.10
```
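To make the semantics of the filter explicit, here is a minimal Python sketch (not the CIAO implementation) of what `[col3=11:20][cols col2,col3]` selects from `sample.dat`. It assumes a raw, whitespace-separated numeric table with the default names col1, col2, ...; the helper names `read_raw`, `range_filter` and `select_cols` are hypothetical.

```python
# Sketch only: emulates "sample.dat[col3=11:20][cols col2,col3]" in plain Python.
# Assumes a raw, whitespace-separated numeric table with default names col1..colN.

def read_raw(text):
    """Parse whitespace-separated numeric rows into dicts keyed col1, col2, ..."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        values = [float(v) for v in line.split()]
        rows.append({f"col{i}": v for i, v in enumerate(values, start=1)})
    return rows

def range_filter(rows, col, lo, hi):
    """Keep rows with lo <= value <= hi (both ends of a DM range are included)."""
    return [r for r in rows if lo <= r[col] <= hi]

def select_cols(rows, cols):
    """Keep only the named columns, in the requested order."""
    return [tuple(r[c] for c in cols) for r in rows]

SAMPLE = """21.0 41.3 21.8
22.0 41.1 20.2
23.0 43.8 17.3
24.0 12.3 11.1"""

result = select_cols(range_filter(read_raw(SAMPLE), "col3", 11, 20), ["col2", "col3"])
print(result)  # [(43.8, 17.3), (12.3, 11.1)]
```

The row filter is applied before the column selection, mirroring the left-to-right order of the two bracketed filters.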
The same filter can be used with dmcopy to write the filtered data to a FITS or text file:
```
unix% pset dmcopy infile="sample.dat[col3=11:20][cols col2,col3]"
unix% dmcopy outfile=filtered.fits
unix% dmcopy outfile="filtered.txt[opt kernel=text/simple]"
```
Additional examples are included later in this document. Note that the DM creates FITS output by default; the kernel option must be specified every time a text output file is required.
Supported Text Formats
There are currently four formats recognized by the ASCII kernel:
text/raw
Raw text table format consists of free-format columns with no header information. There are only two supported datatypes for values in this format: dmDOUBLE and dmTEXT. All columns are scalar and are given the default names "col1", "col2", etc.
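The sketch below illustrates how a free-format raw table maps onto scalar columns named col1, col2, ... with one of the two raw datatypes. The type rule it uses (a column is dmDOUBLE only if every value parses as a number, otherwise dmTEXT) is an assumption made for illustration, not documented CIAO behavior; NULL tokens are not handled, and the function name `infer_raw_columns` is hypothetical.

```python
def infer_raw_columns(text):
    """Split free-format lines into columns named col1..colN and guess each type.

    Assumption (not the CIAO implementation): a column is treated as dmDOUBLE
    when every value parses as a float, otherwise as dmTEXT.
    NULL tokens are not handled in this sketch.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    columns = {}
    for i in range(len(rows[0])):
        values = [row[i] for row in rows]
        name = f"col{i + 1}"
        try:
            columns[name] = ("dmDOUBLE", [float(v) for v in values])
        except ValueError:
            columns[name] = ("dmTEXT", values)
    return columns

RAW = """alpha 167413425.5456684232 319 434 2119.869 4193.105
beta 167413425.5456684232 219 532 2020.054 4096.020"""

for name, (kind, values) in infer_raw_columns(RAW).items():
    print(name, kind)
```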
text/simple
The simple format is compatible with the SM plotting program. It is similar to raw, but allows header information to be included as a series of comment lines. Each header line must begin with the comment character (see the "comment" option). Either the first or the last line of this header block may be used to specify the table column names (see the "colnames" option). In its briefest form, the header consists of a single line of column names.
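As an illustration of the header rules for text/simple (and of the "colnames" and "comment" options described below), this minimal Python sketch separates the leading comment block from the data, picks the first or last header line as column names, and skips comment lines that appear inside the data. It splits names and values on whitespace only and ignores vector or array name syntax; the function `read_simple` is hypothetical, not part of CIAO.

```python
def read_simple(text, comment="#", colnames="first"):
    """Sketch of a text/simple reader: header comment lines, then data lines.

    colnames selects which header line holds the column names:
    "first", "last", or "none" (fall back to col1, col2, ...).
    Comment lines after the first data line are skipped.
    """
    header, data = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(comment):
            if not data:                 # still inside the header block
                header.append(line[len(comment):].strip())
            continue                     # comments within the data are ignored
        data.append(line.split())
    if colnames == "none" or not header:
        names = [f"col{i}" for i in range(1, len(data[0]) + 1)]
    else:
        names = (header[0] if colnames == "first" else header[-1]).split()
    return names, data

SIMPLE = """# observation log
#x y flux
1.0 2.0 30
# stray note
3.0 4.0 50"""

names, rows = read_simple(SIMPLE, colnames="last")
print(names)  # ['x', 'y', 'flux']
```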
text/dtf or text/dtf-fixed
Data Text Format (DTF) is a pseudo-FITS format with support for the full list of datatypes, header keywords and data subspaces. Free format is the default, but fixed-format fields are also supported. This format MUST be used in order to define an image.
text/tsv
Generic TSV files are recognized, as is the extended header detail provided by the Chandra Source Catalog (CSC) output format. The TSV flavor cannot be auto-determined; it must be specified either with the kernel option (text/tsv) or by placing the TSV flavor specification line at the top of the file:
#TEXT/TSV
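The decision rule above can be written out as a small Python sketch: TSV is used only when the kernel option names it, or when no kernel option is given and the file starts with the `#TEXT/TSV` line. The exact, case-sensitive comparison of that first line is an assumption of this sketch, and `is_tsv` is a hypothetical name.

```python
def is_tsv(first_line, kernel_opt=None):
    """Return True when a file should be read as text/tsv.

    TSV cannot be auto-detected from the data, so it is chosen only if the
    kernel option names it, or the file begins with the "#TEXT/TSV" line.
    Exact (case-sensitive) matching of that line is an assumption here.
    """
    if kernel_opt is not None:
        return kernel_opt == "text/tsv"   # an explicit kernel option wins
    return first_line.strip() == "#TEXT/TSV"

print(is_tsv("#TEXT/TSV"), is_tsv("ra\tdec"), is_tsv("ra\tdec", "text/tsv"))
```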
On input, the particular ASCII format is auto-determined (except for TSV, as noted above); the user may override this by including the "kernel" option in the input file name. Since the default output kernel is FITS, the user MUST specify the output ASCII flavor by including the "kernel" option in the output file name.
Ascii Kernel Options
There are several options that may be used to tailor the interface for a particular text file. You can use these options to read text files with a slightly different structure from the default, for example by skipping header lines or changing the field separator. Multiple options should be provided as a comma-separated list.
[opt colnames={value}]
Specifies which header line defines the column names. This line begins with a comment character, followed by a list of names delimited by the same character as the data. Supported values are "first", "last", and "none". If "none" is given, columns are defined from the data and auto-named "col1", "col2", etc.
Column names may contain alphanumeric characters as well as the hash (#), underscore (_) or dash (-) character, although the latter should be avoided if possible. One may define array and vector columns by using the following name syntax:
Syntax | Description | Example |
---|---|---|
name(cpt1,cpt2) | Vector of 2 components | POS(X,Y) |
name[n] | Array of length n | PHAS[3] |
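The name syntax in the table above can be parsed with two regular expressions. This Python sketch (an illustration, not CIAO code) classifies each header token as scalar, vector or array and counts how many data values it consumes per row; that a name[n] array consumes n values is an assumption, while the vector case matches the text/simple sample below, where five names (including sky(x,y)) cover six values. The names `parse_colname` and `field_count` are hypothetical.

```python
import re

# Allowed name characters: alphanumerics, hash, underscore and dash.
VECTOR = re.compile(r"^([A-Za-z0-9_#-]+)\(([^()]+)\)$")   # name(cpt1,cpt2)
ARRAY = re.compile(r"^([A-Za-z0-9_#-]+)\[(\d+)\]$")        # name[n]

def parse_colname(token):
    """Classify one header token as a vector, array or scalar column."""
    m = VECTOR.match(token)
    if m:
        return {"name": m.group(1), "kind": "vector",
                "components": m.group(2).split(",")}
    m = ARRAY.match(token)
    if m:
        return {"name": m.group(1), "kind": "array", "size": int(m.group(2))}
    return {"name": token, "kind": "scalar"}

def field_count(col):
    """Number of data values a parsed column consumes on each row (assumed)."""
    if col["kind"] == "vector":
        return len(col["components"])
    if col["kind"] == "array":
        return col["size"]
    return 1

header = "name time tx ty sky(x,y)".split()
print(sum(field_count(parse_colname(t)) for t in header))  # 6
```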
[opt comment={value}]
Lines that begin with the given character (e.g. "#") will be treated as comments. For text/raw format files, all comment lines are ignored. For text/simple format, comment lines which occur before the first data line are retained, while any occurring within the data segment are ignored. There is one special comment line which the "colnames" option controls, as described above.
[opt nullstr={value}]
On input, this value specifies an arbitrary string which should be interpreted as representing a NULL value. This is in addition to the 'default' NULL values for each datatype:
Type | Values |
---|---|
INTEGER | {empty}, {tnull}, -, INDEF, INF |
REAL | {empty}, -, NaN, INDEF |
STRING | {empty} |
On output, this string will be used to represent all NULL values.
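The NULL rules can be sketched in Python as a lookup of the per-type default NULL tokens from the table above, plus the user's nullstr. Treating {tnull} as the column's TNULL value rendered as a string, and matching tokens exactly (case-sensitively), are assumptions of this sketch; `read_value` and `write_value` are hypothetical helpers.

```python
DEFAULT_NULLS = {
    "INTEGER": {"", "-", "INDEF", "INF"},
    "REAL": {"", "-", "NaN", "INDEF"},
    "STRING": {""},
}

def read_value(token, dtype, nullstr=None, tnull=None):
    """Return None for NULL tokens, otherwise the converted value."""
    nulls = set(DEFAULT_NULLS[dtype])
    if dtype == "INTEGER" and tnull is not None:
        nulls.add(str(tnull))   # assumption: {tnull} is the column's TNULL value
    if nullstr is not None:
        nulls.add(nullstr)      # user-supplied [opt nullstr=...] on input
    if token in nulls:
        return None
    if dtype == "INTEGER":
        return int(token)
    if dtype == "REAL":
        return float(token)
    return token

def write_value(value, nullstr):
    """On output, every NULL is written as the nullstr string."""
    return nullstr if value is None else str(value)

print(read_value("INDEF", "REAL"), read_value("-999", "REAL", nullstr="-999"))
```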
[opt skip={value}]
Skip the given number of lines at the beginning of the file. This helps handle some formats with fixed headers. For example, '[opt skip=3]' will skip the first three (3) lines of the file.
[opt sep={value}]
Define the given character(s) to be the separator for data fields. Any printing ASCII character from space (' ') to tilde ('~') may be a separator character, except the single quote ('), double quote ("), and backslash (\) characters. In addition, the non-printing tab character (HT) may be used (specified as '\t'). If more than one character is to be used, or if the space or comma character is used, the list of separators must be enclosed by quotes. Each instance of the character is interpreted as a new field.
Examples
Syntax | Description |
---|---|
[opt sep=:] | colon delimited values |
[opt sep=" "] | space delimited values |
[opt sep=":;"] | values delimited by EITHER colon or semi-colon |
[white] qualifier
If the "white" qualifier is included, the separator is treated as whitespace. This means that if you have multiple separator characters next to each other, they only count as one separator.
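The difference between plain separators and the [white] qualifier is easiest to see on a line with adjacent separators. In this minimal Python sketch (not CIAO code), every separator starts a new field, so "a::b" yields an empty middle field; with white=True, empty fields are dropped so that runs of separators count as one. Dropping leading and trailing empty fields in white mode is an assumption, and `split_fields` is a hypothetical name.

```python
def split_fields(line, seps=" \t\r", white=True):
    """Split one line on any character in seps.

    white=False: every separator starts a new field, so empty fields can appear.
    white=True: runs of adjacent separators count as a single separator
    (leading/trailing separators are also dropped; an assumption here).
    """
    fields, current = [], ""
    for ch in line:
        if ch in seps:
            fields.append(current)
            current = ""
        else:
            current += ch
    fields.append(current)
    if white:
        fields = [f for f in fields if f != ""]
    return fields

print(split_fields("a::b", ":", white=False))  # ['a', '', 'b']
print(split_fields("a::b", ":", white=True))   # ['a', 'b']
```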
The defaults for each output format are:
Option | Raw | Simple | DTF |
---|---|---|---|
colnames | none | first | n/a |
comment | '#' | '#' | n/a |
nullstr | "",NaN | "",NaN | "",NaN |
skip | 0 | 0 | 0 |
sep | ' \t\r' | ' \t\r' | ' \t\r' |
white | on | on | on |
Limitations
The ASCII kernel was developed to allow CIAO users to use the familiar DM syntax in manipulating and filtering text files; it is not intended as a replacement for the FITS kernel in pipelines. The following are some limitations of the kernel:
- While all the basic Data Model tools should work with text file input, the ASCII kernel has not been fully tested with other CIAO tools (e.g. csmooth), and some of them may not be compatible with it.
- Region filtering on ASCII images is not yet supported.
- To create a dataset with more than one block, or to write header keywords, you MUST use the DTF flavor. Therefore, the simple and raw formats cannot be used with any CIAO tools which require multiple blocks or header keywords.
Examples
Example 1
unix% dmlist "input.txt[time=100:1000,energy=10:20]" data
Filter a text file on time and energy, printing the filtered data to the screen.
Example 2
unix% dmcopy input.fits "output.txt[opt kernel=text/simple]"
Copying a FITS file to a simple text output file that may be used by other code.
Example 3
unix% dmcopy input.txt output.fits
Create a FITS file from a text file.
Example 4
unix% dmcopy input.txt "output.txt[opt kernel=text/dtf-fixed]"
Convert simple text table to fixed format DTF table.
Example 5
unix% dmcopy "data.txt[time=100:200][opt sep=@]" "filtered.out[opt kernel=text,sep=&]"
Copy the filtered input data to the output file, changing the separator character from "@" to "&". The output could then be used to create a table in LaTeX.
Example 6
```
unix% dmextract "event.fits[bin pi]" "pha.txt[opt kernel=text/dtf]" type=pha1
unix% text_spectrum_program pha.txt > new.pha.txt
unix% dmsort new.pha.txt sort.pha.fits
```
Create a Type 1 PHA file in DTF text format, propagating the full header. Process it with an external text-based program (text_spectrum_program stands for any such tool), then run dmsort on the result. The final file is again in FITS format.
Example 7
unix% dmtcalc LaunchData.txt"[cols length,diameter,launch_mass]" calc_results.txt"[opt kernel=text/simple]" expr="result=diameter*length*(diameter/launch_mass)"
Run dmtcalc on a fixed-format text file, creating a new column named "result".
Example 8
unix% dmcopy "events.txt[bin energy=500:700:10]" "myout.img[opt kernel=text/dtf]"
Bin an ASCII table file into an ASCII image file.
Sample files
Below are basic samples of each of the ASCII formats.
Table: text/raw
```
alpha 167413425.5456684232 319 434 2119.869 4193.105
beta  167413425.5456684232 219 532 2020.054 4096.020
gamma 167413425.5867084265 607 483 3448.656 4140.115
delta 167413427.1456484795 420 331 4306.355 4289.419
```
Table: text/simple
```
#name time tx ty sky(x,y)
alpha 167413425.5456684232 319 434 2119.869 4193.105
beta  167413425.5456684232 219 532 2020.054 4096.020
gamma 167413425.5867084265 607 483 3448.656 4140.115
delta 167413427.1456484795 420 331 4306.355 4289.419
```
Table: text/dtf
```
XTENSION='TABLE'
HDUNAME = "MyINFO"
TFIELDS = 8
TTYPE1  = "name    "
TFORM1  = "10A     " / data format of field.
TTYPE2  = "time    " / Time of event
TFORM2  = "1D      " / data format of field.
TTYPE3  = "tx      " / Tile position - X
TFORM3  = "1I      " / data format of field.
TTYPE4  = "ty      " / Tile position - Y
TFORM4  = "1I      " / data format of field.
TTYPE5  = "x       " / Sky position - X
TFORM5  = "1E      " / data format of field.
TTYPE6  = "y       " / Sky position - Y
TFORM6  = "1E      " / data format of field.
TTYPE7  = "useFlag " / Record usable?
TFORM7  = "1L      " / data format of field.
TTYPE8  = "status  " / Event status bits
TFORM8  = "3X      " / data format of field.
MTYPE1  = sky
MFORM1  = x,y
END
alpha 167413425.5456684232 319 434 2119.869 4193.105 T 001
beta  167413425.5456684232 219 532 2020.054 4096.020 T 010
gamma 167413425.5867084265 607 483 3448.656 4140.115 F 011
delta 167413427.1456484795 420 331 4306.355 4289.419 T 100
```
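A DTF table header is a list of FITS-style "KEY = value / comment" cards terminated by END. The Python sketch below (an illustration, not the CIAO parser) reads such cards, builds the column list from the TTYPEn/TFORMn pairs, and checks that the count agrees with TFIELDS, which in FITS gives the number of columns. It assumes values never contain "/" and uses its own shortened sample; `parse_cards` and `table_columns` are hypothetical names.

```python
def parse_cards(lines):
    """Read 'KEY = value / comment' cards up to END; return (header, data_lines)."""
    header = {}
    for i, line in enumerate(lines):
        if line.strip() == "END":
            return header, lines[i + 1:]
        key, _, rest = line.partition("=")
        # Drop any trailing comment, then surrounding quotes and padding.
        # Assumes values never contain "/" themselves.
        value = rest.split("/", 1)[0].strip().strip("'\"").strip()
        header[key.strip()] = value
    raise ValueError("no END card found")

def table_columns(header):
    """Return (name, format) pairs, checking the count against TFIELDS."""
    n = int(header["TFIELDS"])
    ttypes = [k for k in header if k.startswith("TTYPE")]
    if len(ttypes) != n:
        raise ValueError(f"TFIELDS={n} but {len(ttypes)} TTYPEn cards")
    return [(header[f"TTYPE{i}"], header[f"TFORM{i}"]) for i in range(1, n + 1)]

DTF = """XTENSION='TABLE'
HDUNAME = "MyINFO"
TFIELDS = 3
TTYPE1 = "name " / object name
TFORM1 = "10A "
TTYPE2 = "time " / Time of event
TFORM2 = "1D "
TTYPE3 = "useFlag " / Record usable?
TFORM3 = "1L "
END
alpha 167413425.5456684232 T
beta 167413425.5456684232 T"""

header, data = parse_cards(DTF.splitlines())
print(table_columns(header))  # [('name', '10A'), ('time', '1D'), ('useFlag', '1L')]
```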
Image: text/dtf
```
XTENSION='IMAGE'
HDUNAME = "MyIMG"
BITPIX  = 16
NAXIS   = 2
NAXIS1  = 4
NAXIS2  = 3
MTYPE1  = "sky     "
MFORM1  = "x,y     "
END
1  2  3  4
5  6  7  8
9 10 11 12
```
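The image sample is free-format, so the pixel values could be spread over any number of lines; what fixes the layout is NAXIS1 and NAXIS2. Following the FITS convention that NAXIS1 is the fastest-varying axis, every NAXIS1 consecutive values form one row. This Python sketch (not the CIAO reader) reads the header cards, gathers all values after END and reshapes them; converting pixels with int() assumes an integer BITPIX such as 16, and `read_dtf_image` is a hypothetical name.

```python
def read_dtf_image(text):
    """Return a DTF image as a list of rows (NAXIS2 rows of NAXIS1 pixels)."""
    lines = text.splitlines()
    header = {}
    for i, line in enumerate(lines):
        if line.strip() == "END":
            values = " ".join(lines[i + 1:]).split()  # free format: any line breaks
            break
        key, _, rest = line.partition("=")
        header[key.strip()] = rest.split("/", 1)[0].strip().strip("'\"").strip()
    else:
        raise ValueError("no END card found")
    nx, ny = int(header["NAXIS1"]), int(header["NAXIS2"])
    pixels = [int(v) for v in values]  # assumes integer BITPIX (16 here)
    if len(pixels) != nx * ny:
        raise ValueError(f"expected {nx * ny} pixels, found {len(pixels)}")
    # NAXIS1 varies fastest, so each run of nx values is one row.
    return [pixels[r * nx:(r + 1) * nx] for r in range(ny)]

IMG = """XTENSION='IMAGE'
HDUNAME = "MyIMG"
BITPIX = 16
NAXIS = 2
NAXIS1 = 4
NAXIS2 = 3
MTYPE1 = "sky "
MFORM1 = "x,y "
END
1 2 3 4 5 6 7 8 9 10 11 12"""

print(read_dtf_image(IMG))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```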
Bugs
See the bugs page for the Data Model library on the CIAO website for an up-to-date listing of known bugs.