Chandra X-Ray Observatory
	(CXC)
Skip to the navigation links
Last modified: 1 Dec 2015

URL: http://cxc.harvard.edu/sherpa/threads/pha_intro/

Introduction to Fitting PHA Spectra

Sherpa Threads (CIAO 4.9 Sherpa v1)


Overview

Synopsis:

The basic steps used in fitting spectral data are illustrated in this thread. The data used herein were created by running the Creating ACIS Spectra for Pointlike Sources thread.

There are many options and variables that may affect how this process is applied to your data; for a more detailed explanation of the steps, see the following threads:

For a detailed explanation of the fitting concepts behind X-ray spectral analysis in Sherpa, see the document Spectral Fitting on the Sherpa References page.

Before fitting ACIS data sets with restricted pulse-height ranges, please read the CIAO caveat page "Spectral analyses of ACIS data with a limited pulse-height range."

Last Update: 1 Dec 2015 - updated for CIAO 4.9, updated for Python 3 compatibility.


Contents


Load the Spectrum & Instrument Responses

First, load the spectrum file:

sherpa> load_pha("3c273.pi")
WARNING: systematic errors were not found in file '3c273.pi'
statistical errors were found in file '3c273.pi' 
but not used; to use them, re-read with use_errors=True
read ARF file 3c273.arf
read RMF file 3c273.rmf
WARNING: systematic errors were not found in file '3c273_bg.pi'
statistical errors were found in file '3c273_bg.pi' 
but not used; to use them, re-read with use_errors=True
read background file 3c273_bg.pi

Since the RESPFILE, ANCRFILE, and BACKFILE header keywords were updated in the spectrum file, the response files (RMF and ARF) and background file are automatically read in as well. If the default dataset ID of "1" is to be used, it does not need to be explicitly included in the load function; only the data filenames are required in this case.

Sherpa issued a warning about systematic and statistical errors, which were not loaded. The statistical errors are calculated using the appropriate fit statistics set with set_stat in the Sherpa session. The standard treatment of systematic errors supplied with load_syserror is to add the array of systematic errors in quadrature to the statistical errors. Advanced methods to account for non-linear calibration uncertainties described in Lee et al. (2011) are available within pyblocxs Bayesian functions. However, they require the calibration products that are not available at this moment.

If Sherpa does not automatically read in the background and response files, read them manually:

Use show_all, show_data, and show_bkg to get the status of the Sherpa session. Some additional commands are used to get the total number of counts and counts rates calculated from the data.

sherpa> show_all()
Data Set: 1
Filter: 0.1248-12.4100 Energy (keV)
Bkg Scale: 0.134921
Noticed Channels: 1-1024
name           = 3c273.pi
channel        = Float64[1024]
counts         = Float64[1024]
staterror      = None
syserror       = None
bin_lo         = None
bin_hi         = None
grouping       = Int16[1024]
quality        = Int16[1024]
exposure       = 38564.6089269
backscal       = 2.52643646989e-06
areascal       = 1.0
grouped        = True
subtracted     = False
units          = energy
rate           = True
plot_fac       = 0
response_ids   = [1]
background_ids = [1]

RMF Data Set: 1:1
name     = 3c273.rmf
detchans = 1024
energ_lo = Float64[1090]
energ_hi = Float64[1090]
n_grp    = UInt64[1090]
f_chan   = UInt32[2002]
n_chan   = UInt32[2002]
matrix   = Float64[61834]
offset   = 1
e_min    = Float64[1024]
e_max    = Float64[1024]

ARF Data Set: 1:1
name     = 3c273.arf
energ_lo = Float64[1090]
energ_hi = Float64[1090]
specresp = Float64[1090]
bin_lo   = None
bin_hi   = None
exposure = 38564.1414549

Background Data Set: 1:1
Filter: 0.1248-12.4100 Energy (keV)
Noticed Channels: 1-1024
name           = 3c273_bg.pi
channel        = Float64[1024]
counts         = Float64[1024]
staterror      = None
syserror       = None
bin_lo         = None
bin_hi         = None
grouping       = Int16[1024]
quality        = Int16[1024]
exposure       = 38564.6089269
backscal       = 1.87253514146e-05
areascal       = 1.0
grouped        = True
subtracted     = False
units          = energy
rate           = True
plot_fac       = 0
response_ids   = [1]
background_ids = []

Background RMF Data Set: 1:1
name     = 3c273.rmf
detchans = 1024
energ_lo = Float64[1090]
energ_hi = Float64[1090]
n_grp    = UInt64[1090]
f_chan   = UInt32[2002]
n_chan   = UInt32[2002]
matrix   = Float64[61834]
offset   = 1
e_min    = Float64[1024]
e_max    = Float64[1024]

Background ARF Data Set: 1:1
name     = 3c273.arf
energ_lo = Float64[1090]
energ_hi = Float64[1090]
specresp = Float64[1090]
bin_lo   = None
bin_hi   = None
exposure = 38564.1414549


sherpa> data_sum = calc_data_sum(id=1)  # total counts (or values) in the data
sherpa> print(data_sum)
        736.0

 
sherpa> data_cnt_rate = calc_data_sum()/get_exposure(id=1)  # calculating count rate in cts/sec 
sherpa> print(data_cnt_rate)
        0.0190848557908


sherpa> bkg_sum = calc_data_sum(bkg_id=1)   # total counts (or values) in the background data
sherpa> print(bkg_sum)
       216.0


sherpa> bkg_cnt_rate = calc_data_sum(bkg_id=1)/get_exposure(bkg_id=1)  # calculating background count rate in cts/sec
 
sherpa> print(bkg_cnt_rate)
        0.00560099028644

Plot the data:

sherpa> plot_data()

The data are plotted in energy space—as seen in Figure 1—since the instrument model provides the information necessary to compute the predicted counts for each bin. In general, the units of the x-axis are determined by the value in the units field of the data, which may be accessed with 'print(get_data().units)' or show_filter, and modified with set_analysis.


Filter the Data & Subtract the Background

The CIAO 'Why' topic on Choosing an Energy Filter contains information on selecting energy range for spectral modeling. We can use the Sherpa ignore or notice functions to select the energy range between 0.1 and 6.0 keV. These functions are applied to all data sets. The other two functions, notice_id()/ignore_id(), require explicit input of the source data set ID, as the first argument; the second argument defines the lower energy of the range, and the third the higher energy of the range. These are useful for multiple data sets requiring different filters. (The notice_id filter will automatically be applied to the associated background data when the background data set ID (bkg_id) parameter is not used, as in the example in this thread. A different filter for the background may be set by issuing the notice_id or ignore_id command with the bkg_id entered as the fourth argument to the function.) The data between 0.1 and 6.0 keV will be noticed with use of either of the following commands:

At this point, we also opt to subtract the background data:

sherpa> subtract()

sherpa> plot_data()

Figure 2 shows the resulting plot.

The axis scaling for all plots created in the current Sherpa session may be changed to log by calling set_xlog and set_ylog with no arguments (and changed back to linear with set_xlinear/set_ylinear):

sherpa> set_xlog()
sherpa> set_ylog()

(To set the plot axis scaling for a specific type of plot, e.g., model, data, or fit plots, the set_xlog/set_ylog or set_xlinear/set_ylinear commands should be called with the appropriate argument, either "data", "model", "source", "fit", or "delchi"—similar to those accepted by the generic Sherpa plot function.)

To change the default settings for plot_data so that both the x- and y-axes will be drawn using a log-scale each time the function is called in the session, use the get_data_plot_prefs function:

sherpa> p = get_data_plot_prefs() 
sherpa> p["xlog"] = True
sherpa> p["ylog"] = True

To learn how to change the default axis scale from linear to logarithmic so that these commands do not have to be run in each Sherpa session, see this Sherpa FAQ.


Defining the Source Model

Before fitting the data, it is necessary to define a model that characterizes the source. All models available in Sherpa, or only models belonging to a specific category, may be returned at the Sherpa prompt by calling the list_models function accordingly:

sherpa> list_models()         # all models, same as 'list_models("all")'

sherpa> list_models("xspec")  # all xspec models

sherpa> list_models("2d")     # Sherpa 2D analytic models

Here, we use a source model composed of two model components:

  • powlaw1d - a one-dimensional power-law.
  • xsphabs - an XSpec photoelectric absorption model.

We define an expression that is the product of these two components. The hydrogen column density (nH) is set to the known Galactic value for the source and the parameter is frozen so that it will not be allowed to vary in the fit.

sherpa> set_source(xsphabs.abs1 * powlaw1d.p1)
sherpa> abs1.nH = 0.07
sherpa> freeze(abs1.nH)

The current source model definition may be displayed:

sherpa> show_model()
Model: 1
apply_rmf(apply_arf((38564.6089269 * (xsphabs.abs1 * powlaw1d.p1))))
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   abs1.nh      frozen         0.07            0       100000 10^22 atoms / cm^2
   p1.gamma     thawed            1          -10           10
   p1.ref       frozen            1 -3.40282e+38  3.40282e+38
   p1.ampl      thawed            1            0  3.40282e+38

Note that Sherpa and XSPEC absorption models have to be multiplied by a model which has normalization and amplitude parameters, such as powlaw1d; they cannot be used as single models in the source expression. It may be necessary to modify the parameter values, since the Sherpa guess functionality does not apply to absorption models. However, we can use this command to guess the initial parameter values and ranges for the power-law model component (parameter values are not automatically guessed in Sherpa 4.7). To have Sherpa automatically query for the initial parameter values when a model is established, set 'paramprompt(True)' (it is 'False' by default).

sherpa> guess(p1)

sherpa> show_model()
Model: 1
apply_rmf(apply_arf((38564.6089269 * (xsphabs.abs1 * powlaw1d.p1))))
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   abs1.nh      frozen         0.07            0       100000 10^22 atoms / cm^2
   p1.gamma     thawed            1          -10           10           
   p1.ref       frozen            1 -3.40282e+38  3.40282e+38           
   p1.ampl      thawed  0.000148802  1.48802e-06    0.0148802           

The guess command makes an initial guess at parameter values to ensure convergence, but it is always a good idea to check that the initial range of values (soft limits) is sensible for the data being fit. Note that the initial parameter values can also be entered with set_par which is more appropriate for complex models, as guess is just a simple function and can make the parameter space too narrow for the search. set_par should be used in scripts.


Fitting

Now we are ready to run the fit, using the Sherpa default fit statistic (chi2gehrels) and optimization method (levmar). (The available fit statistics and optimization methods may be returned with the list_stats and list_methods commands, and they may be changed with set_method and set_stat.)

sherpa> fit()
Dataset               = 1
Method                = levmar
Statistic             = chi2gehrels
Initial fit statistic = 720.951
Final fit statistic   = 37.9079 at function evaluation 19
Data points           = 44
Degrees of freedom    = 42
Probability [Q-value] = 0.651155
Reduced statistic     = 0.902569
Change in statistic   = 683.043
   p1.gamma       2.15852     
   p1.ampl        0.000224841 

The fit information returned by the fit command includes the statistic value for chi2gehrels, goodness-of-fit and reduced χ2, along with the best-fit parameter values of the photon index and amplitude. The function calc_stat_info and its associated get_stat_info command, may be used to return the goodness-of-fit statistics without having to re-run the fit:

sherpa> calc_stat_info()
Dataset               = 1
Statistic             = chi2gehrels
Fit statistic value   = 37.9079
Data points           = 44
Degrees of freedom    = 42
Probability [Q-value] = 0.651155
Reduced statistic     = 0.902569

sherpa> goodness = get_stat_info()

sherpa> print(goodness[0])
name      = Dataset [1]
ids       = [1]
bkg_ids   = None
statname  = chi2gehrels
statval   = 37.90791394722276
numpoints = 44
dof       = 42
qval      = 0.65115526728
rstat     = 0.90256937969578

The calc_stat_info command is appropriate for accessing the fit statistics at the Sherpa prompt, where the information is printed to the screen, whereas get_stat_info is more useful for parsing this information within a script. Note that get_fit_result is available to access the full information returned by the fit, which is also useful when working on a script.

The best fit model with the data and residuals may be plotted in the same window:

which creates Figure 3. The errors are plotted as "sigma", the sigma residuals of the fit [(data - model)/error].

The plot can be modified using chips commands directly or with the chipsgui


Examining Fit Results

Goodness of fit

The show_fit and get_fit_results commands allow access to the best-fit values and detailed information after the fit has been performed:

sherpa> show_fit()
Optimization Method: LevMar
name    = levmar
ftol    = 1.19209289551e-07
xtol    = 1.19209289551e-07
gtol    = 1.19209289551e-07
maxfev  = None
epsfcn  = 1.19209289551e-07
factor  = 100.0
verbose = 0

Statistic: Chi2Gehrels
Chi Squared with Gehrels variance.

    The variance is estimated from the number of counts in each bin,
    but unlike `Chi2DataVar`, the Gaussian approximation is not
    used. This makes it more-suitable for use with low-count data.

    The standard deviation for each bin is calculated using the
    approximation from [1]_:

    sigma(i,S) = 1 + sqrt(N(i,s) + 0.75)

    where the higher-order terms have been dropped. This is accurate
    to approximately one percent. For data where the background has
    not been subtracted then the error term is:

    sigma(i) = sigma(i,S)

    whereas with background subtraction,

    sigma(i)^2 = sigma(i,S)^2 + [A(S)/A(B)]^2 sigma(i,B)^2

    Notes
    -----
    The accuracy of the error term when the background has been
    subtracted has not been determined. A preferable approach to
    background subtraction is to model the background as well as the
    source signal.

    References
    ----------

    .. [1] "Confidence limits for small numbers of events in
           astrophysical data", Gehrels, N. 1986, ApJ, vol 303,
           p. 336-346.
           http://adsabs.harvard.edu/abs/1986ApJ...303..336G

    

Fit:Dataset               = 1
Method                = levmar
Statistic             = chi2gehrels
Initial fit statistic = 720.951
Final fit statistic   = 37.9079 at function evaluation 19
Data points           = 44
Degrees of freedom    = 42
Probability [Q-value] = 0.651155
Reduced statistic     = 0.902569
Change in statistic   = 683.043
   p1.gamma       2.15852     
   p1.ampl        0.000224841 


# retrieve a single value with get_fit_results():
sherpa> print(get_fit_results().qval)
0.65115526728
sherpa> print(get_fit_results().rstat)
0.90256937969578

The number of bins in the fit (Data points), the number of degrees of freedom, i.e. the number of bins minus the number of free parameters, and the final fit statistic value are reported. If the chosen statistic is one of the χ2 statistics, as in this example, the reduced statistic (i.e. the statistic value divided by the number of degrees of freedom) and the probability (Q-value) are included as well.

The calc_chisqr command calculates the statistic contribution per bin:

sherpa> calc_chisqr()

array([  8.18670622e+00,   5.97517739e+00,   1.12652083e+00,
         2.71025790e-01,   1.52207810e-01,   6.42161939e-03,
         7.45727738e-01,   3.65860398e-01,   7.64873353e-01,
         7.54502439e-02,   6.15842067e-01,   2.33671949e+00,
         2.01502435e-01,   3.58009309e-02,   9.88244156e-01,
         3.98373405e-01,   1.35754790e-02,   1.26939213e+00,
         1.71510119e+00,   2.12309165e-01,   8.44131456e-02,
         1.43547769e-01,   5.85914897e-01,   4.80663926e-02,
         1.97970268e+00,   1.00169043e-01,   1.05894757e+00,
         4.39179080e-01,   3.28736224e-01,   6.84274787e-02,
         1.50497166e-01,   1.24954784e+00,   6.00580428e-01,
         5.36106815e-04,   8.38895755e-01,   1.14989314e+00,
         1.77147214e-01,   6.99719723e-02,   2.85830872e-01,
         2.05559864e+00,   2.35970901e-01,   2.13692364e-01,
         1.51027449e-01,   4.34788005e-01])

Confidence intervals

The covariance() command—which may be shortened to covar()—computes covariance matrices and provides an estimate of confidence intervals for the thawed parameters; also see the related command conf():

sherpa> covar()
Dataset               = 1
Confidence Method     = covariance
Iterative Fit Method  = None
Fitting Method        = levmar
Statistic             = chi2gehrels
covariance 1-sigma (68.2689%) bounds:
   Param            Best-Fit  Lower Bound  Upper Bound
   -----            --------  -----------  -----------
   p1.gamma          2.15852   -0.0827857    0.0827857
   p1.ampl       0.000224841 -1.48255e-05  1.48255e-05


sherpa> conf()
p1.gamma lower bound:	-0.0827857
p1.ampl lower bound:	-1.48255e-05
p1.ampl upper bound:	1.48255e-05
p1.gamma upper bound:	0.0834108
Dataset               = 1
Confidence Method     = confidence
Iterative Fit Method  = None
Fitting Method        = levmar
Statistic             = chi2gehrels
confidence 1-sigma (68.2689%) bounds:
   Param            Best-Fit  Lower Bound  Upper Bound
   -----            --------  -----------  -----------
   p1.gamma          2.15852   -0.0827857    0.0834108
   p1.ampl       0.000224841 -1.48255e-05  1.48255e-05

The output is the best-fit parameter value with positive and negative error estimates.


Flux and Counts

To calculate the flux of a PHA data set, use the calc_photon_flux and calc_energy_flux commands. The flux may be calculated over the entire data set or over a specific range:

sherpa> calc_photon_flux()
           0.00046996626360194062

sherpa> calc_photon_flux(2., 10.) 
           7.2654476285442739e-05

sherpa> calc_energy_flux()
           9.614834696462046e-13

sherpa> calc_energy_flux(2., 10.)
           4.5513003755033505e-13

To calculate the counts of a PHA data set, use the calc_data_sum, calc_model_sum, or calc_source_sum commands. As with the flux calculations, these commands may be given a range:

sherpa> calc_data_sum()
           706.85714092017133

sherpa> calc_data_sum(2., 10.)
           306.2301570173737

sherpa> calc_model_sum()
           638.45677172410706

sherpa> calc_model_sum(2., 10.)
           270.73047856183859

sherpa> calc_source_sum()
           0.046996626710353269

sherpa> calc_source_sum(2., 10.)
           0.0072654476008457147

Scripting It

The file fit.py is a Python script which performs the primary commands used above; it can be executed by typing exec(open("fit.py").read()) on the Sherpa command line.

The Sherpa script command may be used to save everything typed on the command line in a Sherpa session:

sherpa> script(filename="sherpa.log", clobber=False)

(Note that restoring a Sherpa session from such a file could be problematic since it may include syntax errors, unwanted fitting trials, et cetera.)

The CXC is committed to helping Sherpa users transition to new syntax as smoothly as possible. If you have existing Sherpa 3.x scripts or save files, submit them to us via the CXC Helpdesk and we will provide the CIAO/Sherpa 4.9 syntax to you.


History

14 Nov 2007 rewritten for CIAO 4.0 Beta 3
29 Apr 2008 show_all command is available in CIAO 4.0
09 Dec 2008 figures moved inline with text
09 Dec 2008 updated for Sherpa 4.1
16 Feb 2009 example of guess functionality added
29 Apr 2009 new script command is available with CIAO 4.1.2
15 Dec 2009 updated for CIAO 4.2
09 Jul 2010 updated for CIAO 4.2 Sherpa v2: S-Lang version of thread removed
15 Dec 2010 updated for Sherpa in CIAO 4.3: use of log_scale replaced with set_xlog/set_ylog; list_models is available with new argument options; new functions calc_stat_info and get_stat_info return goodness-of-fit information
15 Dec 2011 reviewed for CIAO 4.4 (no changes)
13 Dec 2012 updated for CIAO 4.5: background data may now be filtered separately from associated source data using the new bkg_id argument of the notice_id/ignore_id commands
04 Jun 2013 added a paragraph on statistical and systematic errors to the section "Load the Spectrum and Instrument Responses". Made small edits to the text.
03 Dec 2013 reviewed for CIAO 4.6
06 Apr 2015 updated for CIAO 4.7, no content change
01 Dec 2015 updated for CIAO 4.8, outputs updated
01 Dec 2015 updated for CIAO 4.9, updated for Python 3 compatibility.


Last modified: 1 Dec 2015
Smithsonian Institute Smithsonian Institute

The Chandra X-Ray Center (CXC) is operated for NASA by the Smithsonian Astrophysical Observatory. 60 Garden Street, Cambridge, MA 02138 USA.   Email:   cxchelp@head.cfa.harvard.edu Smithsonian Institution, Copyright © 1998-2017. All rights reserved.