SciPy 2011 | July 13
Fit parameterized models
Compute confidence regions
- Introduction to modeling and fitting with Sherpa
- Root Finding: Sherpa’s confidence method
- A Bayesian Approach: MCMC methods in pyBLoCXS
- Non-linear Regression Example
- Using conf
- Naïve sampling of parameter space
- Exploring the posterior distribution with pyBLoCXS
What can Sherpa do?
- Supports 1-D and 2-D model fitting out of the box; can be extended to N-D.
- Reads data from a variety of sources: FITS files, text columns, even NumPy arrays.
- Includes support for user-defined model functions. Sherpa’s model syntax uses the Python parser coupled with NumPy ufuncs to build up arbitrarily complex model expressions (see the sketch after this list).
- Robust optimization methods: Levenberg-Marquardt, Nelder-Mead simplex, and Monte Carlo/differential evolution.
- Confidence methods: conf and covariance compute confidence limits, while interval_projection and region_projection compute 1-D and 2-D confidence regions.
- Hooks for user-defined fit statistics and optimization methods.
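A minimal sketch of that model expression syntax (the data and the instance names line and cont here are illustrative, not from the original example):

import numpy
import sherpa.ui as ui
x = numpy.linspace(-5., 5., 101)
y = numpy.exp(-0.5*x**2) + 0.1
ui.load_arrays(1, x, y)
# the string is parsed by Python into a composite model:
# a Gaussian line plus a constant continuum
ui.set_model('gauss1d.line + const1d.cont')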
In the case where the fit statistic is normalized correctly, the user-specified uncertainty $\sigma$ corresponds to one standard-deviation confidence limits for each parameter, one by one. At the minimum, the change in the fit statistic value is defined as

$$\Delta \chi^2 = \sigma^2 \qquad (1)$$

For example, 1.6 standard-deviation confidence intervals each map to a change in fit statistic of $1.6^2 = 2.56$.
Common Sherpa fit statistics include the chi-square statistic

$$\chi^2 = \sum_i \frac{\left(D_i - M_i(\theta)\right)^2}{\sigma_i^2} \qquad (2)$$

and the Cash statistic

$$C = 2 \sum_i \left[ M_i(\theta) - D_i \ln M_i(\theta) \right] \qquad (3)$$

where $D_i$ represents the observed data points; $M_i(\theta)$, the calculated model values; $\theta$, the model parameters; and $\sigma_i$, the uncertainties on the observed data points.
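As a quick NumPy sketch of equations (2) and (3) (the D, M, and sigma arrays are placeholders standing in for observed counts, model values, and errors):

import numpy
D = numpy.array([12., 15., 9., 11.])       # observed counts (placeholder)
M = numpy.array([11.5, 14.2, 10.1, 10.8])  # model values (placeholder)
sigma = numpy.sqrt(D)                      # errors on the observed data
chi2 = numpy.sum(((D - M)/sigma)**2)       # equation (2)
cash = 2.*numpy.sum(M - D*numpy.log(M))    # equation (3)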
- Find the best-fit parameter values.
- Calculate confidence intervals centered on the best-fit values.
- For each parameter dimension, the multi-dimensional parameter space can take the shape of an asymmetric paraboloid near the minimum.
The confidence intervals can be reduced to a root-solving problem by translating the y-axis by an amount equal to $\Delta \chi^2 = \sigma^2$ and selecting points along the fit statistic curve; a toy example follows.
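A toy illustration of that translation, assuming a parabolic statistic; this is not Sherpa code, and scipy.optimize.brentq stands in for the root finder:

import numpy
from scipy.optimize import brentq
s = lambda p: (p - 3.)**2 + 10.   # toy fit statistic with its minimum at p = 3
target = s(3.) + 1.**2            # translate the y-axis by sigma**2 for sigma = 1
lo = brentq(lambda p: s(p) - target, -10., 3.)
hi = brentq(lambda p: s(p) - target, 3., 10.)
# lo, hi -> 2.0, 4.0: the one-sigma interval about the minimum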
Parameter space in non-linear problems is not always well behaved...
The current set of parameters maps to a local minimum.
- $\chi^2$ has limitations: calculating the error using the data variance can overestimate the distribution, while the model variance can underestimate it.
- Using the Poisson likelihood (e.g. Cash) eliminates the bias typically found with a modified or weighted $\chi^2$.
- Statistical model for Poisson data (low counts)

$$p(\theta \mid D, I) = \frac{p(D \mid \theta, I)\, p(\theta \mid I)}{p(D \mid I)} \qquad (4)$$

where $p(\theta \mid D, I)$ represents the posterior distribution; $p(D \mid \theta, I)$, the likelihood; $p(\theta \mid I)$, the prior; and $p(D \mid I)$ is considered constant.

$$p(D \mid \theta, I) = \prod_i \frac{M_i(\theta)^{D_i}\, e^{-M_i(\theta)}}{D_i!} \qquad (5)$$

where $\theta$ represents the model parameters; $D$, the observed data; and $I$, the initial information.
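Taking the log of equation (5) and dropping the data-only term $\sum_i \ln D_i!$ recovers the Cash statistic of equation (3) up to a factor of -2; a one-line NumPy sketch, reusing the placeholder D and M arrays from above:

loglike = numpy.sum(D*numpy.log(M) - M)  # Poisson log-likelihood, constant dropped
# cash == -2.*loglike, up to the dropped constant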
- A Python implementation of a Markov chain Monte Carlo (MCMC) algorithm designed to handle Bayesian Low-Count X-ray Spectral (BLoCXS) analysis with Sherpa
- Explores the entire parameter space around the suspected minimum
- Does not get hung up in a local minimum
- Includes a flexible definition of priors
- Draws parameter proposals using a multivariate t-distribution
- Currently limited to simple models
“Analysis of Energy Spectra with Low Photon Counts via Bayesian Posterior Simulations” - Van Dyk, Connors, Kashyap & Siemiginowska 2001, ApJ, 548, 224
Draw parameters, calculate the likelihood and posterior probability of the “proposed” parameter values given the observed data, and use a Metropolis-Hastings criterion to accept or reject the proposal.
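A toy sketch of that acceptance rule for a symmetric proposal (not pyBLoCXS internals; log_post stands in for the user's log posterior):

import numpy
def mh_step(log_post, current, proposal):
    # accept with probability min(1, p(proposal)/p(current))
    log_alpha = log_post(proposal) - log_post(current)
    if numpy.log(numpy.random.uniform()) < log_alpha:
        return proposal   # accept the jump
    return current        # reject; repeat the current draw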
Main MCMC loop
Sampling the posterior probability distribution
- Metropolis-Hastings (MH)
  - Centered on the best-fit parameters
- Metropolis-Hastings mixed with a Metropolis jumping rule (MetropolisMH)
  - Centered on the current draw of parameters
- Specify a scalar multiple of the covariance matrix
Priors
- Supports a priori knowledge such as the range of parameters
- Priors and associated hyper-parameters are defined by the user (see the sketch below)
- A non-informative (flat) prior is the default for each parameter
- Parameters can optionally be scaled according to an inverse or log scale
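A hedged sketch of a user-defined prior; the Gaussian form, its hyper-parameters, and the parameter lookup via get_model_component are illustrative, and the exact set_prior signature should be checked against the pyBLoCXS documentation:

import numpy
import pyblocxs
def gauss_prior(value, mu=1300., sigma=100.):
    # illustrative Gaussian prior density on a single parameter
    return numpy.exp(-0.5*((value - mu)/sigma)**2)
# attach the prior to parameter b1 of the user model defined below
pyblocxs.set_prior(ui.get_model_component('mdl').b1, gauss_prior)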
The Thurber problem is an example of non-linear least-squares regression from the Statistical Reference Datasets (StRD) at the National Institute of Standards and Technology (NIST). The observed data come from a NIST study of semiconductor electron mobility. The data include 37 observations, with the dependent variable (y) electron mobility and the independent variable (x) the log of the density.
$$y = f(x; \beta) + \varepsilon \qquad (6)$$

$$f(x; \beta) = \frac{\beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 x^3}{1 + \beta_5 x + \beta_6 x^2 + \beta_7 x^3} \qquad (7)$$
| Parameter | Certified Value | Sherpa Value | Agreement (%) |
|-----------|------------------|----------------|---------------|
| $b_1$ | 1.2881396800E+03 | 1.28813971e+03 | 99.999 |
| $b_2$ | 1.4910792535E+03 | 1.49106665e+03 | 99.999 |
| $b_3$ | 5.8323836877E+02 | 5.83229092e+02 | 99.998 |
| $b_4$ | 7.5416644291E+01 | 7.54148565e+01 | 99.998 |
| $b_5$ | 9.6629502864E-01 | 9.66284739e-01 | 99.999 |
| $b_6$ | 3.9797285797E-01 | 3.97967752e-01 | 99.999 |
| $b_7$ | 4.9727297349E-02 | 4.97257372e-02 | 99.997 |
- Python 2.6 or 2.7
- asciitable 0.7 or later
- NumPy 1.5.1 or later
- matplotlib 1.0.0 or later
- Sherpa
- For the Sherpa examples below, the latest released version 4.3.0 is fine.
- pyBLoCXS
- For the pyBLoCXS examples below, I recommend the latest version in the github repository.
Read the observed data from the text file using asciitable. Define the model according to the specification and the initial parameter values.
import asciitable
tbl = asciitable.read('Thurber.dat',
                      Reader=asciitable.NoHeader,
                      data_start=36,
                      delimiter="\s")
# Columns as NumPy arrays
x = tbl['col2']
y = tbl['col1']
p0 = [1000, 1000, 400, 40, 0.7, 0.3, 0.03]

def calc(p, x):
    # rational model of equation (7)
    xx = x**2
    xxx = x**3
    return ((p[0] + p[1]*x + p[2]*xx + p[3]*xxx) /
            (1. + p[4]*x + p[5]*xx + p[6]*xxx))

# define a tolerance
tol = 1.e-9
Load the Thurber arrays into a Sherpa data set, indicate least-squares with Levenberg-Marquardt, set the user-defined function as the Sherpa model, and fit.
import sherpa.ui as ui
names = ['b%i' % (ii+1) for ii in range(len(p0))]
ui.load_arrays(1, x, y, ui.Data1D)
ui.set_stat('leastsq')
ui.set_method('levmar')
ui.set_method_opt('gtol', tol)
ui.set_method_opt('xtol', tol)
ui.set_method_opt('ftol', tol)
ui.set_method_opt('epsfcn', tol)
ui.load_user_model(calc, 'mdl')
ui.add_user_pars('mdl', names, p0)
ui.set_model('mdl')
ui.fit()
popt = ui.get_fit_results().parvals
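To reproduce the agreement column of the table above, the fitted values can be compared against the NIST certified values (this comparison snippet is an addition, not part of the original example):

import numpy
certified = numpy.array([1.2881396800e+03, 1.4910792535e+03,
                         5.8323836877e+02, 7.5416644291e+01,
                         9.6629502864e-01, 3.9797285797e-01,
                         4.9727297349e-02])
agreement = 100.*(1. - numpy.abs((numpy.array(popt) - certified)/certified))
print(agreement)  # ~99.997 to 99.999 for each parameter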
Sherpa example: API level UI - fitting with user-model
Find the best-fit parameter values using least-squares with a Python function and NumPy array-based API.
import sherpa.optmethods
import numpy

# Fit statistic function returns a tuple of statistic value and residuals
def calc_resid(p):
    resid = (y - calc(p, x))
    return (sum(resid**2), resid)

pmax = numpy.finfo(numpy.float32).max
pmin = -pmax
lm = sherpa.optmethods.LevMar()
tol = 1.e-9
lm.config['gtol'] = tol
lm.config['xtol'] = tol
lm.config['ftol'] = tol
lm.config['epsfcn'] = tol

# Sherpa's extension of LMDIF includes strict parameter boundaries
def fit(fcn, pars, parmins=None, parmaxes=None):
    pars = numpy.array(pars)
    result = lm.fit(fcn, pars, parmins, parmaxes)
    return result[1]

popt = fit(calc_resid, p0, [pmin]*len(p0), [pmax]*len(p0))
Compare to SciPy leastsq
import scipy.optimize

def calc_resid(p):
    return (y - calc(p, x))

popt = scipy.optimize.leastsq(calc_resid, p0,
                              ftol=tol, xtol=tol,
                              gtol=tol, epsfcn=tol)[0]
Sherpa’s confidence method employs Müller’s root finding algorithm using a translated 1-D parameter space near the minimum.
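An illustrative sketch of a Müller iteration (quadratic interpolation through three points); this is not Sherpa's internal implementation, and real roots are assumed:

def muller(f, x0, x1, x2, tol=1.e-9, maxiter=50):
    # fit a parabola through the three points and step to its nearer root
    for _ in range(maxiter):
        f0, f1, f2 = f(x0), f(x1), f(x2)
        h1, h2 = x1 - x0, x2 - x1
        d1, d2 = (f1 - f0)/h1, (f2 - f1)/h2
        a = (d2 - d1)/(h2 + h1)
        b = a*h2 + d2
        c = f2
        disc = (b*b - 4.*a*c)**0.5  # assumes a real discriminant
        denom = b + disc if abs(b + disc) > abs(b - disc) else b - disc
        dx = -2.*c/denom
        x0, x1, x2 = x1, x2, x2 + dx
        if abs(dx) < tol:
            break
    return x2

# applied to the earlier toy statistic:
# muller(lambda p: s(p) - target, 3., 3.5, 4.5) -> 4.0, the upper limit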
ui.set_stat('cstat')
ui.set_method('neldermead')
ui.fit()
ui.conf()
# lower error bars
pmins = ui.get_conf_results().parmins
# upper error bars
pmaxes = ui.get_conf_results().parmaxes
| Parameter | Best Fit | Lower Bound | Upper Bound |
|-----------|----------|-------------|-------------|
| $b_1$ | 1288.12 | -12.1594 | 12.1594 |
| $b_2$ | 1452.67 | -73.3571 | 17.8398 |
| $b_3$ | 557.281 | -7.09913 | 34.3927 |
| $b_4$ | 70.2984 | -10.1567 | 2.42915 |
| $b_5$ | 0.943534 | -0.0575953 | 0.0433009 |
| $b_6$ | 0.387899 | -0.02639 | 0.0199346 |
| $b_7$ | 0.0403176 | -0.0134162 | 0.00914532 |
To compute the covariance matrix, Sherpa first estimates the information matrix with finite differences, reducing the multi-dimensional problem to a series of 1-D problems; it then iteratively applies second central differencing with extrapolation (Kass 1987). The covariance matrix follows by inverting the information matrix.
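A rough NumPy sketch of the second central difference step (without the iterative extrapolation, and assuming a chi-square-like statistic, for which the information matrix is half the Hessian; stat_func is a placeholder mapping a parameter vector to a scalar statistic):

import numpy
def info_matrix(stat_func, p, h=1.e-4):
    n = len(p)
    p = numpy.asarray(p, dtype=float)
    H = numpy.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = numpy.zeros(n); ei[i] = h
            ej = numpy.zeros(n); ej[j] = h
            # second central difference in directions i and j
            H[i, j] = (stat_func(p + ei + ej) - stat_func(p + ei - ej)
                       - stat_func(p - ei + ej) + stat_func(p - ei - ej))/(4.*h*h)
    return 0.5*H

# covariance as the inverse of the information matrix:
# cov = numpy.linalg.inv(info_matrix(stat_func, popt))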
ui.covar()
# lower error bars
pmins = ui.get_covar_results().parmins
# upper error bars
pmaxes = ui.get_covar_results().parmaxes
# where pmins == -pmaxes
# Access the covariance matrix
cov = ui.get_covar_results().extra_output
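As a small follow-on (an addition, not part of the original example), the covariance matrix can be normalized into a correlation matrix to inspect parameter correlations:

sigmas = numpy.sqrt(cov.diagonal())        # one-sigma covariance errors
corr = cov / numpy.outer(sigmas, sigmas)   # unit diagonal; off-diagonal in [-1, 1]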
| Parameter | Best Fit | Lower Bound | Upper Bound |
|-----------|----------|-------------|-------------|
| $b_1$ | 1288.12 | -12.1594 | 12.1594 |
| $b_2$ | 1452.67 | -55.506 | 55.506 |
| $b_3$ | 557.281 | -39.7166 | 39.7166 |
| $b_4$ | 70.2984 | -7.58595 | 7.58595 |
| $b_5$ | 0.943534 | -0.0471354 | 0.0471354 |
| $b_6$ | 0.387899 | -0.0217024 | 0.0217024 |
| $b_7$ | 0.0403176 | -0.0107599 | 0.0107599 |
Sherpa example: API level UI - confidence and covariance methods
Utility wrappers to Sherpa API functions provide a convenient Python function and NumPy array-based API.
API interface to confidence
import conf

def calc_chi2(p):
    err = numpy.sqrt(y)
    resid = (y - calc(p, x))/err
    return (sum(resid**2), resid)

pmins, pmaxes = conf.conf(calc_chi2, fit, range(len(p0)), popt)
API interface to covariance
import conf
pmins, pmaxes, cov = conf.covar(calc_chi2, range(len(p0)), popt)
# file conf.py
import sherpa.estmethods
import numpy
import logging

logger = logging.getLogger('sherpa')
logger.setLevel(logging.ERROR)

_min = -numpy.finfo(numpy.float32).max
_max = numpy.finfo(numpy.float32).max
_tol = numpy.finfo(numpy.float32).eps

def conf(stat_cb, fit_cb, parnums, pars,
         parmins=None, parmaxes=None, parhardmins=None, parhardmaxes=None,
         sigma=1, eps=0.01, tol=0.2, maxiters=200, remin=0.01,
         verbose=False, do_parallel=False, numcores=1, openinterval=False):
    xorig = numpy.array(pars)
    mins = numpy.array([_min]*len(pars))
    maxes = numpy.array([_max]*len(pars))
    if parmins is None: parmins = mins
    if parmaxes is None: parmaxes = maxes
    if parhardmins is None: parhardmins = mins
    if parhardmaxes is None: parhardmaxes = maxes

    def stat(pvals):
        # unpack the statistic value, discarding the residuals
        stat, fvec = stat_cb(pvals)
        return stat

    def fit(pvals, pmins, pmaxes, i):
        # freeze parameter i at its current value and refit the others
        mu = numpy.array(xorig)
        mu[i] = pvals[i]
        keepers = range(len(pvals))
        keepers.remove(i)
        current_pars = pvals.take(keepers)
        current_mins = pmins.take(keepers)
        current_maxes = pmaxes.take(keepers)
        def cb(modified_x):
            mu.put(keepers, modified_x)
            return stat_cb(mu)
        xopt = fit_cb(cb, current_pars, current_mins, current_maxes)
        mu.put(keepers, xopt)
        return stat(mu)

    get_par_name = lambda i: 'par' + str(i)
    report_progress = lambda i, lower, upper: None
    (mins, maxes, flags,
     nfits, junk) = sherpa.estmethods.confidence(
        pars, parmins, parmaxes, parhardmins, parhardmaxes,
        sigma, eps, tol, maxiters, remin, verbose, parnums,
        stat, fit, report_progress, get_par_name,
        do_parallel, numcores, openinterval)
    return mins, maxes

def covar(stat_cb, parnums, pars,
          parmins=None, parmaxes=None, parhardmins=None, parhardmaxes=None,
          sigma=1, eps=0.01, tol=0.01, maxiters=200, remin=0.01):
    def stat(pvals):
        stat, fvec = stat_cb(pvals)
        return stat

    fit_cb = lambda: None
    report_progress = lambda i, lower, upper: None
    xorig = numpy.array(pars)
    mins = numpy.array([_min]*len(pars))
    maxes = numpy.array([_max]*len(pars))
    if parmins is None: parmins = mins
    if parmaxes is None: parmaxes = maxes
    if parhardmins is None: parhardmins = mins
    if parhardmaxes is None: parhardmaxes = maxes
    (mins, maxes, flags,
     junk, cov) = sherpa.estmethods.covariance(
        pars, parmins, parmaxes, parhardmins, parhardmaxes,
        sigma, eps, tol, maxiters, remin, parnums,
        stat, fit_cb, report_progress)
    return mins, maxes, cov
Draw selected parameters from a uniform distribution while the others remain at their best-fit values. The magnitudes of the colorbar are on a log scale (log-log-likelihood).
import numpy
popt = numpy.array(ui.get_fit_results().parvals)
scales = numpy.sqrt(cov.diagonal())
num = 10000
# Run parameters 1 and 2.
params = [0, 1]
samples = [numpy.random.uniform(val-abs(scale), val+abs(scale), num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T
Log-likelihood density (log scale) by sampling from a uniform distribution
Plotting uniform example
Calculate the log-likelihood on the uniform sample of selected parameters. Remaining parameters are at their best-fit values.
import sherpa.stats
import sherpa.optmethods

_min = -numpy.finfo(numpy.float32).max
_max = numpy.finfo(numpy.float32).max
cstat = sherpa.stats.CStat()

def calc_stat(p):
    stat = cstat.calc_stat(y, calc(p, x), numpy.ones_like(y))[0]
    return (stat, None)

nm = sherpa.optmethods.NelderMead()
popt = nm.fit(calc_stat, popt, [_min]*len(p0), [_max]*len(p0))[1]
scales = numpy.sqrt(cov.diagonal())
num = 10000
# Run parameters 1 and 2.
params = [0, 1]
samples = [numpy.random.uniform(val-abs(scale), val+abs(scale), num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T

def calc_likelihood(p):
    return -0.5 * calc_stat(p)[0]

others = list(set(range(len(popt))) - set(params))
bestfits = numpy.repeat(popt.take(others)[:,numpy.newaxis], num, axis=1).T
samples = numpy.concatenate([samples, bestfits], axis=1)
likelihood = numpy.asarray(map(calc_likelihood, samples))

import pylab
pylab.scatter(samples[:,params[0]], samples[:,params[1]],
              c=-1.*numpy.log(-1.*likelihood), cmap=pylab.cm.jet)
Draw selected parameters from a normal distribution assuming no correlations. The other parameter values are kept at their best-fit values. The magnitudes of the colorbar are on a log scale (log-log-likelihood).
samples = [numpy.random.normal(val, scale, num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T
Log-likelihood density (log scale) using a normal distribution (uncorrelated)
Plotting normal example
Calculate the log-likelihood on the normal sample of selected parameters. Remaining parameters are at their best-fit values.
samples = [numpy.random.normal(val, scale, num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T
samples = numpy.concatenate([samples, bestfits], axis=1)
likelihood = numpy.asarray(map(calc_likelihood, samples))

import pylab
pylab.scatter(samples[:,params[0]], samples[:,params[1]],
              c=-1.*numpy.log(-1.*likelihood), cmap=pylab.cm.jet)
Draw selected parameters from a multivariate normal distribution using the covariance matrix. The other parameter values are kept at their best-fit values. The magnitudes of the colorbar are on a log scale (log-log-likelihood).
samples = (popt.take(params) +
           numpy.random.multivariate_normal(numpy.zeros_like(popt),
                                            cov, num).take(params, axis=1))
Log-likelihood density (log scale) using multivariate normal distribution (correlated)
Plotting a multivariate normal example
Calculate the log-likelihood on the multivariate normal sample of selected parameters using the covariance matrix. Remaining parameters are at their best-fit values.
samples = (popt.take(params) +
           numpy.random.multivariate_normal(numpy.zeros_like(popt),
                                            cov, num).take(params, axis=1))
samples = numpy.concatenate([samples, bestfits], axis=1)
likelihood = numpy.asarray(map(calc_likelihood, samples))

import pylab
pylab.scatter(samples[:,params[0]], samples[:,params[1]],
              c=-1.*numpy.log(-1.*likelihood), cmap=pylab.cm.jet)
Calculate draws by probing the posterior probability using Metropolis-Hastings. Assumes a flat prior for each parameter.
import pyblocxs
pyblocxs.ui = ui
ui.set_stat('cash')
ui.set_method('simplex')
ui.fit()
ui.covar()
pyblocxs.set_sampler('MH')
stats, accept, params = pyblocxs.get_draws(niter=1e4)
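A quick diagnostic on the chain (an addition, not part of the original example; it assumes accept is a 0/1 array, so its mean is the acceptance fraction):

print("Acceptance rate: {0}".format(numpy.mean(accept)))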
Plot a trace of $b_1$ draws versus iteration. Rejected draws are shown as gaps in the curve.
pyblocxs.plot_trace(params[0], 'b1')
A trace plot shows the draws for $b_1$ per iteration
Plot the log-likelihood density using the Sherpa Cash statistic and pyBLoCXS with Metropolis-Hastings.
import pylab
pylab.figure()
pylab.scatter(params[0], params[1], c=stats, cmap=pylab.cm.jet)
Log-likelihood density using Metropolis-Hastings in pyBLoCXS
Plot the log-likelihood density using the Sherpa Cash statistic and pyBLoCXS with Metropolis-Hastings mixed with Metropolis. Jumps occur from the current parameter proposal. pyBLoCXS assumes a flat prior for each parameter. User-defined priors can be set up using pyblocxs.set_prior().
pyblocxs.set_sampler('MetropolisMH')
stats, accept, params = pyblocxs.get_draws(niter=1e4)
pylab.figure()
pylab.scatter(params[0], params[1], c=stats, cmap=pylab.cm.jet)
Log-likelihood density using Metropolis and Metropolis-Hastings in pyBLoCXS
Use the parameter draws to calculate summary statistics, quantiles, and plot the CDF.
print("Mean: {0}".format(numpy.mean(params[0])))
print("Median: {0}".format(numpy.median(params[0])))
print("Std Dev: {0}".format(numpy.std(params[0])))
sp = numpy.sort(params[0])
print("95% quantile: {0}".format(sp[int(0.95*(len(sp)-1))]))
print("50% quantile: {0}".format(sp[int(0.50*(len(sp)-1))]))
pylab.figure()
pyblocxs.plot_cdf(params[0], 'b1')
Cumulative density, quantiles, and median for $b_1$
Bin up the parameter draws and plot the PDF.
pylab.figure()
pyblocxs.plot_pdf(params[0], 'b1')
Probability density for $b_1$
- Multi-dimensional parameter space is not always uniform, so Sherpa provides options to explore its topology.
- Sherpa’s root-finding method conf can estimate fit parameter confidence limits.
- pyBLoCXS includes methods to efficiently probe the posterior probability distribution. Confidence limits and summary statistics follow once the parameter distribution is known.
- Sherpa
- Improve distribution methods: building from source and package management (Ubuntu debs, Red Hat RPMs, Mac disk images).
- Provide additional access for Python community users: repository on GitHub.
- pyBLoCXS
- Include support for instrumental uncertainties (non-linear) within the MCMC loop.
- Continue development and testing; include additional distributions beyond Student's t.
Copyright Smithsonian Astrophysical Observatory (SAO)
GPL v2 License
Supported Platforms
- Linux - CentOS, Fedora, Ubuntu
- Mac - OS X 10.5, 10.6