Datastack

Datastack is a CIAO Sherpa extension package for manipulating and fitting a stack of related datasets. It provides stack-enabled (i.e. vectorized) versions of the key Sherpa commands used to load data, set source models, get and set parameters, fit, and plot.

For X-ray spectral analysis this means that a group of related datasets, for instance 10 observations of the same source taken at different times, can be analyzed as if they were a single merged dataset. This has the important advantage of using the appropriate individual response functions while hiding the complexity (or tedium) of defining models and parameters for all the datasets.

Quick look

The example below shows what is required to simultaneously fit three spectra of a source taken from different epochs. Bear in mind that these commands would also work in the case of 20 or even 100 spectra, at which point the utility of datastack becomes more obvious.

from datastack import *
load_pha([], 'data/acis_1_pha3.fits')
load_pha([], 'data/acis_2_pha3.fits')
load_pha([], 'data/acis_3_pha3.fits')
subtract([])
group_counts([], 10)
notice([], 0.5, 7)
set_source([], 'xsphabs.gal * powlaw1d.powID')
link([], 'pow.gamma')
set_par([], 'gal.nh', 0.04)
fit([])
plot_fit_resid([])

That’s all it takes. To try this out, change into the datastack source distribution directory (see Installation):

cd examples
sherpa

Then cut and paste the example commands and watch it go.

Download

The datastack package is available for download at http://cxc.harvard.edu/contrib/datastack/downloads. That directory also contains the example data files examples_data.tar.gz.

Installation

The datastack package includes a single module that must be made available to the CIAO python so that Sherpa can import it. The first step is to untar the package tarball, change into the source directory, and initialize the CIAO environment:

tar zxvf datastack-<version>.tar.gz
tar zxvf examples_data.tar.gz -C datastack-<version>/
cd datastack-<version>
source /PATH/TO/ciao/bin/ciao.csh

There are three methods for installing. Choose ONE of the three.

Best:

The following command installs the datastack module to $HOME/.local/lib/python2.7:

python setup.py install --user

With the --user switch you build up a local repository of packages that is visible both to Sherpa and to the system Python 2.7.

Also good:

If you have write access to the CIAO installation you can use the CIAO python to install the module directly into the CIAO python library. Assuming you are in the CIAO environment, do:

python setup.py install

This puts the new module straight into the CIAO python library, so that any time you (or others, in the case of a system-wide installation) enter the CIAO environment, datastack will be available. You do NOT need to set PYTHONPATH. The only downside is that the module will likely need to be re-installed after a new CIAO release.

Quick and dirty:

An alternate and simple installation strategy is to just leave the module file in the source directory and set the PYTHONPATH environment variable to point to the source directory:

setenv PYTHONPATH $PWD

This method is fine in the short term but you always have to make sure PYTHONPATH is set appropriately (perhaps in your ~/.cshrc file). And if you start doing much with Python you will have PYTHONPATH conflicts and things will get messy.

Test

To test the installation change to the source distribution directory and do the following:

cd examples
sherpa
execfile("fit_spectra.py")

This should run through in a reasonable time and produce output indicating the simultaneous fit of four spectra.

Example: fit spectra

The script examples/fit_spectra.py shows an example of simultaneously fitting four X-ray spectra with a power law and Galactic absorption:

from datastack import *

clear_stack([])

load_pha([], 'data/acis_1_pha3.fits')
load_pha([], 'data/acis_2_pha3.fits')
load_pha([], 'data/acis_3_pha3.fits')
load_pha([], 'data/acis_4_pha3.fits')

group_counts([], 15)
ignore([], None, 0.5)
ignore([], 7, None)
subtract([])

set_source([], 'xsphabs.gal * powlaw1d.powID')
link([], 'pow.gamma')
set_par([], 'gal.nh', 0.04)
freeze([], "gal.nh")

fit([])
print 'pow.ampl:', get_par([], 'pow.ampl.val')
plot_fit_resid([])

Let’s break this down.

from datastack import *

This line overrides a subset of the native Sherpa commands with the stack-enabled versions. The section Overloading the Sherpa commands describes this a little more carefully and shows why this isn’t so scary. The section Object-oriented datastacks shows how to avoid the import * which will bother Pythonistas.

Next the script clears any existing datasets in the data stack. This is a good idea if you are running a script over and over to debug. Without clearing the stack the subsequent load data commands will put another copy of all the datasets onto the stack.

clear_stack([])

Next we load a few datasets into the stack. This introduces the [] (a Python empty list) that appears as the first argument of stack-enabled commands. It tells the command to operate on the whole data stack instead of on a single dataset as in native Sherpa. These commands load the datasets and automatically assign them id values of 1, 2, 3, and 4 respectively:

load_pha([], "data/acis_1_pha3.fits")
load_pha([], "data/acis_2_pha3.fits")
load_pha([], "data/acis_3_pha3.fits")
load_pha([], "data/acis_4_pha3.fits")

Now we tell Sherpa to group by counts, ignore data outside 0.5 to 7 keV, and subtract the background for each dataset in the stack. Under the hood this just runs the corresponding native Sherpa command on each dataset:

group_counts([], 15)
ignore([], None, 0.5)
ignore([], 7, None)
subtract([])

In this case we are using [] to apply the command to all datasets. But it is possible to specify subsets of the stack by providing a list of the dataset IDs, for instance subtract([1,2]) or ignore([3], None, 0.5).

Next is setting the source model for fitting. This is where the datastack module is doing more than simply iterating over the native Sherpa commands:

set_source([], 'xsphabs.gal * powlaw1d.powID')

In this example there is a common Galactic absorption component that applies to all datasets, but we want an independent power law component for each. We use the usual Sherpa model syntax but insert the ID string in the model component name to tell set_source() to create an independent power law component for each dataset. The above command is essentially equivalent to the following:

set_source(1, xsphabs.gal * powlaw1d.pow1)
set_source(2, xsphabs.gal * powlaw1d.pow2)
set_source(3, xsphabs.gal * powlaw1d.pow3)
set_source(4, xsphabs.gal * powlaw1d.pow4)

Now let’s say that we want to fit just a single powerlaw photon index gamma for the data stack but leave the normalizations independent. This is done by linking the gamma parameters:

link([], "pow.gamma")

Notice that the ID is only needed when setting the source model. In all the other commands that refer to model components there is no need for the ID (and in fact you cannot supply it).
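For instance, the per-dataset amplitudes are referenced with the bare component name. A short aside (not part of the example script) showing this together with subset control:

freeze([], "pow.ampl")   # freeze the amplitude for every dataset
thaw([2], "pow.ampl")    # thaw the amplitude for dataset 2 only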

Now we set the Galactic absorption column density and freeze it:

set_par([], "gal.nh", 0.04)
freeze([], "gal.nh")

Finally we fit, print some fit values, and plot the fit with residuals:

fit([])
print "pow.ampl:", get_par([], "pow.ampl.val")
plot_fit_resid([])

This brings up four plot windows with the fit residuals for each dataset in a separate window.

By default datastack is chatty about what it’s doing but you can disable this with:

set_stack_verbose(False)

Datastack details

The data stack

When the datastack module is imported it automatically creates a default (or “session”) data stack that is used whenever a stack-enabled command is called with a stack specifier as the first argument. The data stack itself is just a list of dataset ids plus a little extra information mostly pertaining to the source model expression.

Some non-Sherpa commands are available for data stack manipulation:

Command Description
clear_stack Clear the stack
show_stack Show synopsis for stack elements
get_stack_ids Get a list of the ids in the stack
set_stack_verbose Set verbose mode (False => quiet)
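For instance, a minimal sketch of these helpers (the no-argument calling convention for show_stack is assumed; get_stack_ids is called this way in the parameter example below):

clear_stack([])                        # empty the default stack
load_pha([], 'data/acis_1_pha3.fits')
load_pha([], 'data/acis_2_pha3.fits')
show_stack()                           # synopsis of the loaded datasets
print get_stack_ids()                  # -> [1, 2]
set_stack_verbose(False)               # suppress per-dataset messages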

Non-default or multiple data stacks are supported using object-oriented datastacks.

Stack specifier

The datastack module overrides certain Sherpa commands so that they can accept a stack specifier as the first argument. A stack specifier is a Python list that gives the dataset ids within the stack on which to operate. An empty list [] implies operating on all elements of the stack. Examples of valid stack specifiers are:

[]        # All elements
[1,3,4]   # Dataset ids 1, 3, and 4 within stack
["pha1", "pha2", 3]  # Dataset ids can be strings

If an id is given that is not a member of the stack then an error is reported.

The stack specifier is what distinguishes between a stack command and a native Sherpa command. If the first argument to a function is a list then the stack-enabled version is called, otherwise the native Sherpa command is called. Within a Sherpa session these can be mixed freely:

subtract([2,3,4])  # Subtract bkg for ids=2,3,4 in stack
subtract(5)        # Subtract bkg for id=5 (native Sherpa)

This distinction is generally sufficient. Nevertheless some users may prefer to use object-oriented datastacks to keep everything tidy from the Python programmer perspective.

Data loading

A subset of Sherpa data loading commands are supported. These commands will execute the native Sherpa command and add the dataset to the stack. The stack specifier defines the id of the new dataset. If an empty list [] is given then the first available integer id will be used (starting from 1 and incrementing until an unused dataset id is found). Available commands are:

Command Description
load_arrays Load NumPy arrays into a dataset
load_ascii Load ASCII data by id
load_data Load spectrum, table, or ASCII data by id
load_image Load image data by id
load_pha Load PHA data by id
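For example, assuming (per the description above) that a single-element specifier assigns that id to the new dataset:

load_pha([], 'data/acis_1_pha3.fits')        # auto-assigned id 1
load_pha(['pha2'], 'data/acis_2_pha3.fits')  # explicit string id 'pha2'
load_pha([10], 'data/acis_3_pha3.fits')      # explicit integer id 10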

Model definition

The datastack module supports five model definition commands:

Command Description
set_model Set a Sherpa model by model id
set_source Set a Sherpa model by model id (alias for set_model)
set_bkg_model Set a bkg model by data id and bkg id
set_full_model Same as Sherpa set_full_model
set_bkg_full_model Same as Sherpa set_bkg_full_model

For each of these commands datastack extends the Sherpa model definition language to substitute the dataset id for each instance of the ID string within a model component name. To support this extended syntax the model definition must be a quoted Python string, unlike the native Sherpa command, which also allows a Python expression, e.g. set_source(1, xsphabs.gal * powlaw1d.pow1).

Consider this example:

set_source([], "xsphabs.gal * powlaw1d.pow_ID")

Here there is a common model (gal) that applies to all datasets and an independent model pow_ID for each dataset. The above command is essentially equivalent to the following:

set_source(1, "xsphabs.gal * powlaw1d.pow_1")
set_source(2, "xsphabs.gal * powlaw1d.pow_2")
set_source(3, "xsphabs.gal * powlaw1d.pow_3")

Parameter manipulation

Datastack supports the following commands for manipulating model parameters:

Command Description
get_par Return a list of model parameters
set_par Set initial values for a model parameter
thaw Thaw a list of parameters
freeze Freeze a list of parameters
link Link a parameter with an associated value
unlink Unlink a parameter

The get_par and set_par commands are extended in datastack to accept any attribute of a model component and not just a parameter name. For instance:

set_source([], "gauss1d.gaussID")      # create model components "gaussID"
set_par([], "gauss.integrate", False)  # disable model integration for stack
set_par([], "gauss.fwhm.min", 2.0)     # set min for gauss.fwhm parameter
set_par([], "gauss.pos", 5.0)          # set gauss.pos parameter value
pars = get_par([], "gauss.pos")        # get a list of parameter objects
parvals = get_par([], "gauss.pos.val") # get the position values

The get_par command returns a list of the corresponding attribute values. The set_par command only accepts a single value and applies it to all elements in the stack. To give each dataset's parameter a different value, use code like the following:

parname = "gauss.pos"
parvals = [1.2, 3.0, 5.6]
for i, parval in zip(get_stack_ids(), parvals):
    set_par([i], parname, parval)

The link command has a different behavior in the datastack context. It only accepts a single parameter name and links the first stack dataset with all subsequent ones. Assume that the current data stack has 3 datasets with ids 1, 2, and 3. Then both of the commands:

link([1,2,3], "gauss.fwhm")
link([], "gauss.fwhm")

are equivalent to:

link("gauss2.fwhm", "gauss1.fwhm")
link("gauss3.fwhm", "gauss1.fwhm")

The ability within the native Sherpa link command to link a parameter to an arbitrary model expression is not currently supported within the datastack module.
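Since the wrapper falls back to the native Sherpa command whenever the first argument is not a stack specifier (see Overloading the Sherpa commands), one possible workaround, sketched here and untested, is to issue native link calls dataset by dataset:

from sherpa.astro.ui import get_model_component

# Link each remaining dataset's fwhm to twice the first dataset's fwhm
# using the native Sherpa link command (the first argument is a string,
# not a list, so the stack-enabled wrapper is bypassed):
fwhm1 = get_model_component("gauss1").fwhm
for i in get_stack_ids()[1:]:
    link("gauss%d.fwhm" % i, 2.0 * fwhm1)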

Plotting commands

Plotting of data stacks is supported by opening a new plot window for each dataset in the stack and then running the native Sherpa command accordingly. The plot window will be identified by the dataset id.

The following plotting commands are supported. There is no datastack support for most ChIPS commands.

Command Description
plot_arf Plot ancillary response data
plot_bkg Plot background counts
plot_bkg_chisqr Plot background chi squared contributions
plot_bkg_delchi Plot background delta chi
plot_bkg_fit Plot background counts with fitted background model
plot_bkg_fit_delchi Send background fit and background delta chi plots to the visualizer
plot_bkg_fit_resid Send background fit and background residuals plots to the visualizer
plot_bkg_model Plot background convolved model
plot_bkg_ratio Plot background ratio
plot_bkg_resid Plot background residuals
plot_bkg_source Plot the unconvolved background model
plot_chisqr Send a chi^2 plot to the visualizer
plot_data Send a data plot to the visualizer
plot_delchi Send a delta chi plot to the visualizer
plot_energy_flux Send an energy flux distribution to the visualizer
plot_fit Send a fit plot to the visualizer
plot_fit_delchi Send fit and delta chi plots to the visualizer
plot_fit_resid Send fit and residuals plots to the visualizer
plot_model Send a model plot to the visualizer
plot_order Plot convolved source model by multiple response order
plot_psf Send a PSF plot to the visualizer
plot_ratio Send a ratio plot to the visualizer
plot_resid Send a residuals plot to the visualizer
plot_source Plot unconvolved source model
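As with the other stack-enabled commands, a stack specifier can restrict plotting to a subset of the stack:

plot_fit([])        # one plot window per dataset in the stack
plot_resid([1, 3])  # residual plots for datasets 1 and 3 only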

Other commands

The following commands are also available within datastack. These simply apply the native command to each of the stack datasets, passing along any arguments and returning a list containing the return value for each dataset.

Command Description
get_bkg Return a background PHA dataset by data id and bkg_id
get_bkg_model Return the background convolved model by data id and bkg id
get_source Return a Sherpa model by model id
group_adapt Create and set grouping flags adaptively so that each group contains at least a minimum number of counts
group_adapt_snr Create and set grouping flags adaptively so that each group exceeds a minimum signal-to-noise ratio
group_bins Create and set grouping flags for a given number of equal-width bins
group_counts Create and set grouping flags using minimum number of counts per bin
group_snr Create and set grouping flags so each group has a minimum signal-to-noise ratio
group_width Create and set grouping flags by a bin width
ignore Exclusive 1D ignore on interval(s) for all available data sets
notice Inclusive 1D notice on interval(s) for all available data sets
subtract Subtract background counts
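For example, the get_* commands return one value per stack element; a short sketch using the spectra loaded earlier:

bkgs = get_bkg([])     # list of background PHA datasets for ids 1-4
srcs = get_source([])  # list of the source model expressions
for src in srcs:
    print src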

This list is a small subset of the Sherpa commands where stacking could be useful. Most commands that take a dataset id as the first argument are likely to work properly in the datastack context. A create_stack_func utility, expected in a near-term release, will be able to create a stack-enabled version of any appropriate Sherpa command.

Object-oriented datastacks

The Datastack functionality is largely implemented with a single DataStack class. Users may work with a DataStack object instead of the session-based syntax shown in most of the examples.

Partial object-oriented

The first level up from the pure session-based syntax (where a stack specifier is a Python list) is to explicitly create the datastack as an object and consistently use that object as the stack specifier:

from datastack import *
ds = DataStack()
load_pha(ds, 'data/acis_1_pha3.fits')
load_pha(ds, 'data/acis_2_pha3.fits')
load_pha(ds, 'data/acis_3_pha3.fits')
load_pha(ds, 'data/acis_4_pha3.fits')
group_counts(ds, 20)
ignore(ds[3,4], 6.0, None) # Ignore energies above 6 keV for datasets 3, 4

This is illustrated in more detail in examples/partial_object.py.

Fully object-oriented

The next level is to use a fully object-oriented approach:

import datastack
ds = datastack.DataStack()
ds.load_pha('data/acis_1_pha3.fits')
ds.load_pha('data/acis_2_pha3.fits')
ds.load_pha('data/acis_3_pha3.fits')
ds.load_pha('data/acis_4_pha3.fits')
ds.group_counts(20)
ds[3,4].ignore(6.0, None) # Ignore energies above 6 keV for datasets 3, 4

This gets rid of the undesirable from datastack import * line that is needed in the session-based view. This object-oriented approach is what is really happening under the hood. The default session-based version uses these methods with a special internal DATASTACK object that gets created when the module is imported. See examples/object_oriented.py for a working example.

Overloading the Sherpa commands

Most people won’t care, but here is the code that is used to wrap every Sherpa user function (i.e. every function in sherpa.astro.ui) for the session-based interface.

In a nutshell, the wrapper first checks whether there are any arguments. If not, it runs the native Sherpa function. Otherwise it checks whether the first argument is a stack specifier (a list or a DataStack); if not, it again just runs the native Sherpa function. Only at this point is the user trying to call a stack function, so the wrapper calls the like-named method on the appropriate DataStack object, raising an exception if no such method exists.

def _sherpa_ui_wrap(func):
    def wrap(*args, **kwargs):
        wrapfunc = func
        if args:
            if isinstance(args[0], DataStack):
                datastack, args = args[0], args[1:]
            elif isinstance(args[0], list):
                datastack, args = (DATASTACK[args[0]] if args[0] else DATASTACK), args[1:]
            else:
                return func(*args, **kwargs) # No stack specifier so use native sherpa func

            try:
                wrapfunc = getattr(datastack, func.__name__)
            except AttributeError:
                raise AttributeError(
                    '{0} is not a stack-enabled function.'.format(func.__name__))

        return wrapfunc(*args, **kwargs)

    wrap.__name__ = func.__name__
    wrap.__doc__ = func.__doc__
    return wrap
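For completeness, here is a plausible sketch (not the verbatim module code) of how such a wrapper might be applied to every function in sherpa.astro.ui when datastack is imported:

import sherpa.astro.ui

# Re-export a wrapped version of each public Sherpa UI function so that
# "from datastack import *" picks up the stack-enabled versions.
for _name in dir(sherpa.astro.ui):
    _func = getattr(sherpa.astro.ui, _name)
    if callable(_func) and not _name.startswith('_'):
        globals()[_name] = _sherpa_ui_wrap(_func)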