About Chandra Archive Proposer Instruments & Calibration Newsletters Data Analysis HelpDesk Calibration Database NASA Archives & Centers Chandra Science Links

Skip the navigation links
Last modified: 04 Aug 2009
Hardcopy (PDF): A4 | Letter

Calculating Uncertainties by Simulating Flux Distributions

Sherpa Threads (CIAO 4.1)

[S-Lang Syntax]



Overview

Last Update: 04 Aug 2009 - Updated for S-Lang users to take advantage of the new "ciao_utils" scripts module.

Synopsis:

This thread demonstrates a method of determining flux uncertainties by sampling the parameter values using an uncorrelated, normal distribution. The necessary module is provided as part of the CIAO contributed scripts.

Read this thread if:

Use this thread if you are interested in estimating the uncertainties in your modelled fluxes based on unthawed parameters.

Proceed to the HTML or hardcopy (PDF: A4 | letter) version of the thread.




Contents



Introduction to the Flux Error Script

The sherpa_contrib.flux_dist module provides routines for creating and plotting flux probability distributions of Sherpa models—that is, the probability distribution of flux values that account for uncertainties in the model parameter values.

All thawed model parameters are sampled from the Gaussian distribution where the mean is set as the best-fit parameter value and the variance determined by the diagonal elements of the covariance matrix. The univariate Gaussian is assumed by default; treating each parameter independently. If there are correlations between parameters, then the multivariate Gaussian is available, using the off-diagonal elements of the covariance matrix. The flux is calculated for each sampled set of the parameters and recorded. The histogram of the simulated flux values give the flux probability distribution for the assumed model.

The flux_dist module includes the functions:

  • sample_energy_flux — sample from the energy flux distribution
  • sample_photon_flux — sample from the photon flux distribution
  • get_energy_flux_hist — creates histograms of the simulated energy flux values
  • get_photon_flux_hist — creates histograms of the simulated photon flux values
  • plot_energy_flux — plot the energy flux probability distribution histogram
  • plot_photon_flux — plot the photon flux probability distribution histogram
  • write_arrays — write out arrays to an ASCII file

A VERSION.CIAO_scripts file is included in the scripts pacakge. This allows you to check if you are working with the newest set of scripts:

unix% cat $ASCDS_CONTRIB/VERSION.CIAO_scripts
04 Aug 2009

The VERSION.CIAO_scripts file is updated when a newer scripts package is installed. Please check that you have at least this version of the scripts package installed before continuing. If you do not have the scripts installed or need to update to a newer version, refer to the Scripts page.



Getting Started

Please follow the Obtaining Data Used in Sherpa thread. In this example, we use the spectral data products for 3C273, obtained within the pha_intro sub-directory.

The flux_dist module is part of the Sherpa contributed scripts module, distributed with the CIAO Contributed Scripts and Modules.

In order to simulate and plot the flux distribution, we must first load the module into Sherpa,which is only necessary to do once per session.

sherpa> require("sherpa_flux_dist");

Above, we have imported just the flux_dist script; however, if the user would like to do less book keeping, it is possible to load the entire module instead with:

sherpa> require("sherpa_contrib");

Which makes available every Sherpa contributed script to the user.



Fitting a Model to the Spectrum

We start with loading the spectrum, background, ARF and RMF with load_pha; filtering the data to model the 0.5-7.0 keV energy range; and set Sherpa to account for background subtraction in the standard manner.

sherpa> load_pha("3c273.pi");
sherpa> notice(0.5,7.0);
sherpa> subtract();

Choosing χ2 statistics, with variance calculated from the data, we fit an absorbed power-law model to the data using a constant neutral hydrogen column density of 0.07×1022 cm-2.

sherpa> set_stat("chi2datavar");
sherpa> set_model(xsphabs.abs1*xspowerlaw.p1);
sherpa> abs1.nh = 0.07;
sherpa> freeze(abs1.nh);
sherpa> fit();
 Solar Abundance Vector set to angr:  Anders E. & Grevesse N. Geochimica et Cosmochimica Acta 53, 197 (1989)
 Cross Section Table set to bcmc:  Balucinska-Church and McCammon, 1998

Dataset               = 1
Method                = levmar
Statistic             = chi2datavar
Initial fit statistic = 1.1167e+11
Final fit statistic   = 61.6619 at function evaluation 22
Data points           = 44
Degrees of freedom    = 42
Probability [Q-value] = 0.0255736
Reduced statistic     = 1.46814
Change in statistic   = 1.1167e+11
   p1.phoindex    2.15694     
   p1.norm        0.000224563 

Note that it is possible to do freeze(abs1) instead of freeze(abs1.nh) since the chosen absorption model only contains a single parameter. Generally, freezing and thawing a model component will affect all the parameters associated with the component.



Estimating Uncertainties by Simulating the Flux Distribution

We now simulate the energy flux, between 0.5-7.0 keV, a hundred times using sample_energy_flux, which returns one or more samples of the model flux, accounting for errors in the model parameters. The simulations choose parameters assuming their Gaussian distributions, with the mean set to the best-fit values and the variance given by the covariance matrix for all thawed parameters. The flux is then calculated for these parameters. The results will contain one-hundred realizations of the model and one-hundred values of the energy flux.

sherpa> flux100 = sample_energy_flux(0.5,7.;num=100);

Similarly, we may do this with the photon flux distribution using sample_photon_flux. Examining the structure of the array produced by sample_energy_flux:

sherpa> print(flux100);
7.97427e-13 2.0935 0.000225854 
7.88473e-13 1.97115 0.000204666 
7.56808e-13 2.09878 0.000215118 
7.31099e-13 2.13278 0.000212585 
6.89458e-13 2.14225 0.000201727 
8.70441e-13 2.10078 0.000247753 
      ⋮        ⋮           ⋮ 

Determining Statistical Values from the Sample of Flux Values

The attributes of the array are the values of flux, photon spectral index, and normalization factor, respectively. We can operate on these arrays to determine further statistical values, such as the mean, median, and standard deviation of the distribution determined by the flux simulation using the simple_stats function available with the "ciao_utils" contributed scripts module or the external "stats" S-Lang module.

S-Lang arrays are called on as [row,column], so we can examine only the flux elements of the flux100 array by doing:

sherpa> print(flux100[*,0]);
7.76804e-13
7.26314e-13
7.36409e-13
6.82095e-13
7.52217e-13
7.65804e-13
     ⋮

and using the "ciao_utils" module's simple_stats function to determine the mean, median, and standard deviation of the sample of fluxes, in units of ergs/cm2/sec, we find:

sherpa> require("ciao_utils");

sherpa> simple_stats(flux100[*,0]);

sherpa> s.mean;
7.62658e-13

sherpa> s.median;
7.57762e-13

sherpa> s.stddev;
5.67517e-14

Alternatively, if the user has successfully built the S-Lang "stats" module, which is independent of external libraries, and placed the module library, stats-module.so, in $ASCDS_LIB/slang/v2/modules/ with permission to execute and stats.slsh in $ASCDS_INSTALL/share/slsh/local-packages, we may access the module within Sherpa. Using the module functions mean, median, and stddev on the sample of fluxes, in units of ergs/cm2/sec, we find:

sherpa> require("stats");

sherpa> fluxes = flux100[*,0];

sherpa> mean(fluxes); 
7.62658e-13

sherpa> median(fluxes);
7.57762e-13

sherpa> stddev(fluxes);
5.67517e-14

Determining Quantile Values of the Flux Sample

By estimating the number of points in the array, which is a known from the number of simulations performed, using the information from the length and array_sort functions, the quantiles for this sample may be obtained.

We index the 1D fluxes array first in order to sort the fluxes in ascending order:

sherpa> idx = array_sort(fluxes);

sherpa> sf = fluxes[idx];

sherpa> print(sf);
6.26294e-13
6.34977e-13
6.43346e-13
     ⋮
8.65064e-13
8.88447e-13
8.92073e-13

and then get the value of the flux for the 95% quantile and 50% quantile, which corresponds to the median. Since we ran a hundred simulations, we know that we're looking for the 95th and 50th elements of the array. But in more generality, the quantiles can be determined using the length function, to determine the number of elements in the column.

sherpa> q95 = int(0.95*length(fluxes)-1);
sherpa> print(sf[q95]);
8.60619e-13

sherpa> q50 = int(0.5*length(fluxes)-1);
sherpa> print(sf[q50]);
7.57762e-13

What we have done, is define a variable, which determines the array index for the 1D array element representing the quantile of interest. The x*length(flux100[*,0]) term determines the array element of the quantile we are interested in, but remembering that the array indicies start at zero, we subtract one from the quantile value. The int function converts numerical values into integer-type values, which is necessary since S-Lang requires the array index being referenced to be integer-type. Although the length function produces an integer-type value, by multiplying it by some quantile factor, the value becomes double-type.



Creating a Histogram of a Flux Probability Distribution

We can visualize the flux distribution with a histogram created by the get_energy_flux_hist function and check whether the distribution is Gaussian or another shape. We simulate the flux distribution between 0.5-7.0 keV ten-thousand times, to produce good statistics. We use the function, get_energy_flux_hist which produces a data object that contains all the information about the simulated sample of parameters and a histogram, normalized to unity, representing the flux probability distribution. By default, get_energy_flux_hist creates a histogram with 75 bins; however, the optional parameter, bins, may be included to change the binning.

sherpa> hist10000 = get_energy_flux_hist(0.5,7.;num=10000);

Similarly, this may be done in photon units with get_photon_flux_hist.

We may display the flux probability distribution histogram determined by and get_energy_flux_hist using plot_energy_flux, which plots the calculated energy flux probability distribution, which is the flux distribution for the model component accounting for the errors on the model parameters (Figure 1).

sherpa> plot_energy_flux(0.5,7.;recalc=NULL);
[Thumbnail image: histogram of flux probability distribution produced by plot_energy_flux]

[Version: full-size]

[Print media version: histogram of flux probability distribution produced by plot_energy_flux]

Figure 1: Histogram of the Flux Distribution

This is a histogram of the energy flux probability distribution, plotted using plot_energy_flux, which is determined by get_energy_flux_hist.

Determining the Uncertainties from the Flux Distributions

Taking the histogram data array, we replot the histogram as a scatter plot to check the probability distribution with a Gaussian function, fitting the Gaussian to the histogram with least-squares statistics.

sherpa> load_arrays(2,hist10000.xlo,hist10000.xhi,hist10000.y,Data1DInt);
sherpa> plot_data(2);

sherpa> set_model(2,gauss1d.g0);
sherpa> g0.integrate = NULL;
sherpa> guess(2,g0);
sherpa> g0.ampl.min = 0;
sherpa> g0.ampl.max = 10;
sherpa> g0.ampl = 1;

sherpa> set_stat("leastsq");
sherpa> fit(2);
Dataset               = 2
Method                = levmar
Statistic             = leastsq
Initial fit statistic = 1.67883
Final fit statistic   = 0.0698581 at function evaluation 21
Data points           = 76
Degrees of freedom    = 73
Change in statistic   = 1.60897
   g0.fwhm        1.23978e-13 
   g0.pos         7.54426e-13 
   g0.ampl        0.985099    

We use load_arrays to load the histogram flux probability distribution into dataset ID=2 where hist10000.xlo and hist10000.xhi are the low and high bins in the histogram and hist10000.y is the probability of getting the flux within the bin.

Plotting this fit, we can see the good agreement between the Gaussian distribution and flux probability distribution (Figure 2).

sherpa> plot_fit(2);
sherpa> set_plot_ylabel("Frequency");
sherpa> set_plot_xlabel("Energy Flux [ergs cm^{-2} s^{-1}]");
[Thumbnail image: flux probability distribution fitted with a gaussian]

[Version: full-size]

[Print media version: flux probability distribution fitted with a gaussian]

Figure 2: Gaussian Fitted Flux Distribution

The energy flux probability distribution replotted as a scatter-plot with a Gaussian function fitted to the data (red).

We can now determine the uncertainty in the modelled flux, by calculating the standard deviation instead of the FWHM of the Gaussian using the relationship: FWHM = σ(8 ln 2)1/2 ≅ 2.35482σ.

sherpa> fac0 = sqrt(8*log(2));
sherpa> sig = g0.fwhm.val/fac0;

sherpa> print(sig);
5.26487e-14

Since we have generated two distributions of fluxes, we can examine the results of the mean of the flux100 sample and compare it with the mean position of the Gaussian fit to the histogram, g0.pos. We see that the two mean values, <flux100>≅7.627×10-13 ergs cm-2 s-1 and <hist10000>≅7.544×10-13 ergs cm-2 s-1, are approximately the same. The standard deviations similarly, are equivalent to each other, σflux100≅5.675×10-14 ergs cm-2 s-1 and σhist10000≅5.265×10-14 ergs cm-2 s-1. This is expected for normal distributions, as was simulated; however, if the flux distribution were different (e.g. skewed-, bi-modal-, or κ-distributions), then the calculated mean and σ using the "stats" module may be very different. Visualization of the distributions using the histogram is a useful sanity check.

We can write the array to an ASCII file using the write_arrays function, including column headers:

sherpa> write_arrays("hist10000.dat",{hist10000.xlo,hist10000.xhi,hist10000.y};fields={"xlo","xhi","y"});

and reload the simulation results into another session with the load_ascii function:

sherpa> load_ascii("hist10000.dat";ncols=3,dstype=Data1DInt);

for future analysis.



Scripting It

The file, flux_script.sl , performs the primary commands used above. It can be execupted by typing execfile("flux_script.sl"); on the Sherpa command line. The Sherpa script command may be used to save everything typed on the command line in a Sherpa session:

sherpa> script(filename="sherpa.log", clobber=0);

The CXC is committed to helping Sherpa users transition to new syntax as smoothly as possible. If you have existing Sherpa scripts or save files, submit them to us via the CXC Helpdesk and we will provide the CIAO/Sherpa 4.1 syntax to you.



History

30 Apr 2009 New for CIAO 4.1
04 Aug 2009 Updated for S-Lang users to take advantage of the new "ciao_utils" scripts module.

Return to Threads Page: Top | All | Fitting | Statistics
Hardcopy (PDF): A4 | Letter
Last modified: 04 Aug 2009


The Chandra X-Ray Center (CXC) is operated for NASA by the Smithsonian Astrophysical Observatory.
60 Garden Street, Cambridge, MA 02138 USA.    Email: cxcweb@head.cfa.harvard.edu
Smithsonian Institution, Copyright © 1998-2004. All rights reserved.