Last modified: 13 Jan 2022

URL: https://cxc.cfa.harvard.edu/ciao/threads/dmascii_basic/

Using ASCII Files in CIAO

CIAO 4.17 Science Threads


Overview

Synopsis:

CIAO users are familiar with the flexible filtering and binning capabilities that the Data Model tools provide for FITS files. The same tools also work on ASCII (text) files containing tables via the "ASCII kernel."

The ASCII kernel allows easy text file manipulation by the tools dmlist, dmcopy, dmstat, and dmtcalc. The majority of the other DM-specific tools also support ASCII input; refer to the Limitations section of this thread for exceptions.

Last Update: 13 Jan 2022 - Review for CIAO 4.14. No changes.



Filtering Data

If a basic text file - sample.dat - contains a table with three columns and four rows:

 21.0  41.3  21.8
 22.0  41.1  20.2
 23.0  43.8  17.3
 24.0  12.3  11.1

then dmlist may be used to select a range of data from two of the columns:

unix% dmlist "sample.dat[col3=11:20][cols col2,col3]" data,clean

#  col2                 col3
                43.80                17.30
                12.30                11.10
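
To check the selection by hand, the following plain-Python sketch mimics what the filter above does conceptually: keep the rows whose third column lies between 11 and 20, then keep only columns 2 and 3. It is an illustration, not the DM implementation; the helper name filter_rows is hypothetical, and treating both endpoints as included is an assumption of this sketch (none of the sample values fall on an endpoint).

```python
# Contents of sample.dat from the text, held in memory for the example.
SAMPLE = """\
 21.0  41.3  21.8
 22.0  41.1  20.2
 23.0  43.8  17.3
 24.0  12.3  11.1
"""


def filter_rows(text, lo, hi):
    """Keep rows with lo <= col3 <= hi and return (col2, col3) pairs."""
    selected = []
    for line in text.splitlines():
        fields = [float(token) for token in line.split()]
        if not fields:
            continue  # ignore blank lines
        col1, col2, col3 = fields
        if lo <= col3 <= hi:
            selected.append((col2, col3))
    return selected


if __name__ == "__main__":
    for col2, col3 in filter_rows(SAMPLE, 11, 20):
        print(f"{col2:8.2f} {col3:8.2f}")
```

The two rows returned are the same two rows that dmlist prints above.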

By default, unnamed columns are referred to as "col1", "col2", etc. If column names are provided in the file, they may be used in the filter:

unix% cat input.txt
# ROW    time                 ccd_id energy               pi

     1  87.5272969157    0       14707.57031250   1008
     2  87.5272969157    0     13968.8378906250    957
     3  87.5272969157    0       15152.52343750   1024
     4  87.5683369190    7       268.5079650879     19
     5  87.5683369190    7      1101.3159179688     76
     6  87.5683369190    7      2045.5782470703    141
...

unix% dmlist "input.txt[time=10:100][cols energy]" data

--------------------------------------------------------------------------------
Data for Table Block input.txt
--------------------------------------------------------------------------------

ROW    energy

     1       14707.57031250
     2     13968.8378906250
     3       15152.52343750
     4       268.5079650879
     5      1101.3159179688
     6      2045.5782470703
     7     13929.0205078125
     8      3547.0227050781
     9      1672.4017333984
...

ASCII to FITS; FITS to ASCII

The DM creates FITS format output by default; to write a text file instead, the kernel option must be specified on the output file name every time.

To copy a FITS file to a simple text output format, e.g. to be used by another piece of code:

unix% dmcopy input.fits "output.txt[opt kernel=text/simple]"

To create a basic FITS file from ASCII input:

unix% dmcopy input.txt output.fits

DM filter syntax may be included when creating a new file. Here dmcopy is used to create an output file of filtered data in FITS or text format:

unix% dmcopy "sample.dat[col3=11:20][cols col2,col3]" filtered.fits

unix% dmcopy "sample.dat[col3=11:20][cols col2,col3]" "filtered.txt[opt kernel=text/simple]"

About the Kernel

Output Text Formats: kernel option

There are four output formats allowed with the ASCII kernel: text/raw, text/simple, text/dtf, and text/tsv. The output is specified by including the "kernel" option in the output file name.

  • [opt kernel=text/raw]

    The raw format is a simple text table of free-format columns with no header keywords. It understands only two datatypes: numbers (treated as double precision) and text strings. All columns are scalar and are given the default names "col1", "col2", etc.

  • [opt kernel=text/simple]

    This format is similar to text/raw, but has an optional header defining the column names. In its simplest form, the header consists of a single line of whitespace-separated column names preceded by the comment character; see the "comment" and "colnames" options below. The text/simple option is compatible with the SM plotting program.

  • [opt kernel=text/dtf]

    Data Text Format (DTF) is a pseudo-FITS format with support for headers and data subspaces. Free format tables are the default, but fixed-format fields are also supported, as described in the Using the ASCII Kernel Manual (PS).

  • [opt kernel=text/tsv]

    Generic TSV format files are recognized, as well as the extended header detail provided by the Chandra Source Catalog (CSC) output format. The TSV flavor cannot be determined automatically; it must be specified either by this kernel syntax or by putting the TSV flavor specification line at the top of the file (#TEXT/TAB-SEPARATED-VALUES).

    Note that some of the additional header info (e.g. UCD) from the CSC format will be lost, since the DM does not yet support these concepts.
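
As an illustration of the simplest text/simple layout described above (one commented line of whitespace-separated column names followed by the data rows), here is a small Python sketch that builds such a table from in-memory values. The function name write_simple_table is hypothetical, and the number formatting is arbitrary rather than what the DM itself writes.

```python
def write_simple_table(names, rows, comment="#"):
    """Return table text: a commented column-name line, then the data rows."""
    lines = [comment + " " + " ".join(names)]
    for row in rows:
        lines.append(" ".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    table = write_simple_table(["time", "energy"], [(1.0, 2.5), (2.0, 3.5)])
    print(table, end="")
```

Dropping the first line of the result leaves a bare table of the kind described for text/raw, where the columns would take the default names.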


Additional Options

There are several other options that may also be used to qualify a text file. Multiple options are specified as a comma-separated list. You can use these options to allow CIAO to read tables in text files with slightly different formats from the default, for example by skipping header lines or changing the field separator.

[opt sep=:], [opt sep=:,white], [opt sep=":;"]

Define the given character (e.g. ":", used here, or "/") to be the separator for data fields. With the "sep" option, each instance of the character starts a new field. This example represents four fields, with the second one being empty:

14.1::23.2:15.1

If the "white" qualifier is included, the separator is treated as whitespace. This means that if you have multiple separator characters next to each other, they only count as one separator. The same example - "14.1::23.2:15.1" - then represents only three fields.

More than one character may be defined as the separator. For instance, [opt sep=":;"] defines both ":" and ";" as separators. The only printable characters which may not be used are single quote ('), double quote ("), and backslash (\).
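
The difference between the plain and "white" behavior can be sketched in a few lines of Python. This only mirrors the rules stated above and is not the kernel's own parser; split_fields is a hypothetical helper, and stripping leading and trailing separators in the "white" case is an assumption made by analogy with ordinary whitespace.

```python
import re


def split_fields(line, sep=":", white=False):
    """Split one data line on any character listed in `sep`.

    white=False: every separator starts a new field, so empty fields are kept.
    white=True:  a run of adjacent separators counts as a single separator.
    """
    char_class = "[" + re.escape(sep) + "]"
    if white:
        # Assumption: behave like whitespace, so ignore separators at the ends.
        line = line.strip(sep)
        return re.split(char_class + "+", line)
    return re.split(char_class, line)


if __name__ == "__main__":
    print(split_fields("14.1::23.2:15.1"))              # four fields
    print(split_fields("14.1::23.2:15.1", white=True))  # three fields
    print(split_fields("1:2;3", sep=":;"))              # two separator characters
```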

[opt skip=3]

Skip the given number of lines (e.g. 3) at the beginning of the file. This helps handle some formats with fixed headers.

[opt comment=#]

Lines that begin with the given character (e.g. "#", the default) prior to the first data line will be treated as comments. There is one special comment line which the "colnames" option controls, as described in the next item.

[opt colnames=first]

The first comment-character line is treated as a space-separated list of column names. The value "first" is the default; other possible values are "last" (the last comment-character line prior to the first data line) and "none" (none of the lines are treated as a colnames definition).
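
The way "skip", "comment", and "colnames" interact can be pictured with the Python sketch below. It follows the descriptions above under stated assumptions and is not the kernel's code: read_header is a hypothetical name, and ignoring blank lines before the first data line is a guess.

```python
def read_header(lines, skip=0, comment="#", colnames="first"):
    """Return (column_names, data_lines) for a list of text lines.

    column_names is None when no comment line defines them.
    """
    body = lines[skip:]  # drop the fixed number of leading lines
    comments = []
    first_data = len(body)
    for index, line in enumerate(body):
        if line.startswith(comment):
            comments.append(line[len(comment):].split())
        elif line.strip():
            first_data = index  # first non-blank, non-comment line
            break
    if colnames == "none" or not comments:
        names = None
    elif colnames == "first":
        names = comments[0]
    elif colnames == "last":
        names = comments[-1]
    else:
        raise ValueError("colnames must be 'first', 'last', or 'none'")
    return names, body[first_data:]


if __name__ == "__main__":
    example = [
        "written by some tool",
        "# units: s keV",
        "# time energy",
        "1.0 2.5",
        "2.0 3.5",
    ]
    print(read_header(example, skip=1, colnames="last"))
```

In this example the column names come from the second comment line, so "last" is the setting that picks them up; with the default "first", the units line would be read as the names instead.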

[opt nullstr="",NaN]

When specified on an input file, the nullstr value specifies an arbitrary string which represents a NULL value. This is in addition to the 'default' NULL values for each datatype:

  • INTEGER: {empty}, {tnull}, -, INDEF, INF
  • REAL: {empty}, -, NaN, INDEF
  • STRING: {empty}

When specified on an output file, the string will be used to represent all NULL values.
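
On input, the effect of nullstr for a real-valued column can be sketched as follows. This is a plain-Python illustration of the rule above, not the kernel's implementation: parse_real is a hypothetical helper, {empty} is taken to mean an empty string, and representing NULL as NaN is an assumption of the sketch.

```python
import math

# Default NULL spellings for REAL columns, as listed in the text.
DEFAULT_REAL_NULLS = {"", "-", "NaN", "INDEF"}


def parse_real(token, nullstr=None):
    """Convert one text field to a float, mapping NULL spellings to NaN."""
    nulls = set(DEFAULT_REAL_NULLS)
    if nullstr is not None:
        nulls.add(nullstr)  # user-supplied string is an extra NULL marker
    if token.strip() in nulls:
        return math.nan
    return float(token)


if __name__ == "__main__":
    print(parse_real("12.3"))
    print(parse_real("INDEF"))
    print(parse_real("-999", nullstr="-999"))
```

Without nullstr, the token "-999" is an ordinary number; with it, the same token is read as NULL.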

The defaults for each output format are:

  • text/raw - comment='#',sep=" \t\r",white,colnames=none,skip=0,nullstr='"",NaN'
  • text/simple - comment='#',sep=" \t\r",white,colnames=first,skip=0,nullstr='"",NaN'
  • text/dtf - sep=" \t\r",white,colnames=none,skip=0,nullstr='"",NaN'

Limitations

The ASCII kernel was developed to allow CIAO users to use the familiar DM syntax in manipulating and filtering text files; it is not intended as a replacement for the FITS kernel in pipelines.

The following are some limitations of the kernel. Refer also to the ASCII Kernel section of the Data Model bugs pages for known bugs.

  • The kernel does not always work well with other tools, e.g. dmextract and acis_process_events.

  • To create a dataset with more than one block or to write header keywords, you must use the DTF flavor. Therefore, the simple and raw formats cannot be used with any CIAO tools which require multiple blocks or header keywords.

  • Files larger than 2 GByte are not supported in the ASCII kernel.

  • Header lines longer than 1024 bytes are not supported. Data lines may be arbitrarily long.
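
The last two limits can be checked before handing a file to the DM. The Python sketch below is one way to do so under stated assumptions: check_ascii_limits is a hypothetical helper, "2 GByte" is read as 2 x 1024^3 bytes, and "header lines" is taken to mean the comment-character lines before the first data line.

```python
import os

MAX_FILE_BYTES = 2 * 1024**3  # assumption: "2 GByte" in binary units
MAX_HEADER_BYTES = 1024


def check_ascii_limits(path, comment="#"):
    """Return a list of problems found; an empty list means none detected."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file is larger than 2 GByte")
    marker = comment.encode()
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            line = raw.rstrip(b"\r\n")
            if line.startswith(marker):
                if len(line) > MAX_HEADER_BYTES:
                    problems.append(
                        f"header line {number} is longer than 1024 bytes"
                    )
            elif line.strip():
                break  # first data line reached; the header is over
    return problems


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, check_ascii_limits(name) or "no problems found")
```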


History

14 Dec 2007 new for CIAO 4.0
02 Jan 2008 updated for CIAO 4.1: added nullstr to the Additional Options section; images are now supported (removed item from Limitations section)
25 Jan 2010 reviewed for CIAO 4.2: no changes
11 Jan 2011 updated for CIAO 4.3: TSV format is now supported
03 Jan 2012 reviewed for CIAO 4.4: no changes
03 Dec 2012 reviewed for CIAO 4.5: no changes
25 Nov 2013 reviewed for CIAO 4.6: no changes
17 Dec 2014 reviewed for CIAO 4.7: no changes
13 Jan 2022 reviewed for CIAO 4.14: no changes