bayesphot: Bayesian Inference for Stochastic Stellar Populations

What Does bayesphot Do?

Bayesphot is a package for performing Bayesian inference for the physical properties of a stellar system using its measured photometric properties, in a case where the photometric properties vary non-deterministically with the physical properties. Formally, bayesphot answers the following question: consider a stellar system characterized by a vector of \(\mathbf{x} = (x_1, x_2, \ldots x_N)\) physical properties. We have a physical model that lets us sample the expected photometric properties as a function of physical properties, i.e., that for some sample of \(K\) systems with physical properties \(\mathbf{x}_k\) we are able to compute the corresponding photometric properties \(\mathbf{y}_k = \mathbf{y} = (y_1, y_2, \ldots y_M)_k\). Now suppose that we observe such a system, and we observe it to have photometric properties \(\mathbf{y}_{\mathrm{obs}}\), with some set of photometric errors \(\mathbf{\sigma}_{\mathbf{y}} = (\sigma_{y_1}, \sigma_{y_2}, \ldots \sigma_{y_M})\), which are assumed to be Gaussian-distributed. What should we infer about the posterior probability distribution of the physical properties, i.e., given a set of prior probabilities \(p(\mathbf{x})\), plus our measurements, what is \(p(\mathbf{x} \mid \mathbf{y}_{\mathrm{obs}}, \mathbf{\sigma}_{\mathrm{y}})\)?

The kernel density estimation algorithm that bayesphot uses to answer this question is described and derived in the slug methods paper. Bayesphot is implemented in two parts: a shared object library that is implemented in c, and that is compiled at the same time that slug is built, and a python wrapper class called bp that is included in the slugpy.bayesphot module. The following sections describe how to use bp objects to generate posterior PDFs.

Creating bp Objects

The bp class can be imported via:

from slugpy.bayesphot import *

or:

from slugpy.bayesphot import bp

Once imported, a bp object can be instantiated. The call signature for the bp constructor class is:

def __init__(self, dataset, nphys, filters=None, bandwidth='auto',
             ktype='gaussian', priors=None, sample_density=None,
             reltol=1.0e-3, abstol=1.0e-10, leafsize=16):

A full description of all options is included in the Full Documentation of slugpy.bayesphot, but the essential features are summarized here.

The argument dataset is an array of shape (N, M) that contains the library of N models that represents the training set for the Bayesian analysis. Each model consists of M properties; the first nphys of these are physical properties that are the quantities to be inferred from the observations, while the remaining ones are photometric properties. An important point is that the dataset object is NOT copied, so altering it after the bp object is created will result in erroneous results.

The priors and sample_density arguments are used to compute the weighting to apply to the input models. The priors argument specifies the prior probability to assign to each model; it can be either an array giving a prior probability directly, or a callable that can take the physical properties of models as an input and return the prior probability as an output. Similarly, the sample_density argument specifies the probablity distribution from which the physical models were selected; as with priors, it can be an array or a callable.

The bandwidth argument specifies the bandwidth to use in the kernel density estimation; this need not be the same in each dimension. The bandwidth can be specified as a float, in which case it is the same for every dimension, or as an array of M elements giving the bandwidth for every dimension. Finally, it can be set to the string auto, in which case the bp will attempt to make a reasonable choice of bandwidth autonomously. However, this autonomous choice will probably perform less well than something that is hand-chosen by the user based on their knowledge of the library. As a rule of thumb, bandwidths should be chosen so that, for typical input photometric values, there are ~10 simulations within the 1 kernel size.

Note that both priors and bandwidth are properties of the bp class, and can be altered after the bp object is created. This makes it possible to alter the priors and bandwidth without incurring the computational or memory cost of generating an entirely new bp object.

Using bp Objects

Once a bp object is instantiated, it can be used to compute likelihood functions, marginal probabilities, and MCMC sample ensembles, and to search the library for the best matches to an input set of photometry.

The likelihood function is implemented via the bp.logL method, which has the call signature:

def logL(self, physprop, photprop, photerr=None):

The argument physprop is a set of physical properties, the argument photprop is a set of photometric properties, and the argument photerr is an (optional) set of photometric errors. All of these must be arrays, the size of whose trailing dimension matches the number of physical properties (for physprop) or the number of photometric properties (for photprop and photerr); the leading dimensions of these arrays are broadcast together using normal broadcasting rules. The quantity returned is the log of the joint probability distribution of physical and photometric properties. Specifically, the quantity returned for each input set of physical and photometric properties is

\[\log p(\mathbf{x}, \mathbf{y}, \sigma_{\mathbf{y}}) = \log A \sum_{i=1}^N w_i G(\mathbf{x}, \mathbf{y}; \mathbf{h}')\]

where \(A\) is a normalization constant chosen to ensure that the PDF integrated over all space is unity, \(\mathbf{x}\) is the vector of physical properties, \(\mathbf{y}\) is the vector of photometric properties, \(\sigma_\mathbf{y}\) is the vector of photomtric errors, \(w_i\) is the weight of the ith model as determined by the priors and sample density,

\[G\left(\mathbf{x}, \mathbf{y}; \mathbf{h}'\right) \propto \exp\left[-\left(\frac{x_1^2}{2h_{x_1}'^2} + \cdots + \frac{x_N^2}{2h_{x_N}'^2} + \frac{y_1^2}{2h_{y_1}'^2} + \cdots + \frac{y_M^2}{2h_{y_M}'^2} \right)\right]\]

is the N-dimensional Gaussian function, and

\[\mathbf{h'} = \sqrt{\mathbf{h}^2 + \sigma_{\mathbf{y}}^2}\]

is the modified bandwidth, which is equal to the bandwidth used for kernel density estimation added in quadrature sum with the errors in the photometric quantities (see the slug method paper for details).

Estimation of marginal PDFs is done via the bp.mpdf method, which has the call signature:

def mpdf(self, idx, photprop, photerr=None, ngrid=128,
         qmin=None, qmax=None, grid=None, norm=True):

The argument idx is an int or a list of ints between 0 and nphys-1, which specifies for which physical quantity or physical quantities the marginal PDF is to be computed. These indices refer to the indices in the dataset array that was input when the bp object was instantiated. The arguments photprop and photerr give the photometric measurements and their errors for which the marginal PDFs are to be computed; they must be arrays whose trailing dimension is equal to the number of photometric quantities. The leading dimensions of these arrays are broadcast together following the normal broadcasting rules. By default each physical quantity will be estimated on a grid of 128 points, evenly spaced from the lowest value of that physical property in the model library to the highest value. The parameters qmin, qmax, ngrid, and grid can be used to override this behavior and set the grid of evaluation points manually. The function returns a tuple grid_out, pdf; here grid_out is the grid of points on which the marginal PDF has been computed, and pdf is the value of the marginal PDF evaluated at those gridpoints.

MCMC calculations are implemented via the method bp.mcmc; this method relies on the emcee python module, and will only function if it is installed. The call signature is:

def mcmc(self, photprop, photerr=None, mc_walkers=100,
         mc_steps=500, mc_burn_in=50):

The quantities photprop and photerr have the same meaning as for bp.mpdf, and the quantities mc_walkers, mc_steps, and mc_burn_in are passed directly to emcee, and are described in emcee’s documentation. The quantity returned is an array of sample points computed by the MCMC; its format is also described in emcee’s documentation. Note that, although bp.mcmc can be used to compute marginal PDFs of the physical quantities, for marginal PDFs of 1 quantity or joint PDFs of 2 quantities it is almost always faster to use bp.mpdf than bp.mcmc. This is because bp.mpdf takes advantage of the fact that integrals of cuts through N-dimensional Gaussians can be integrated analytically to compute the marginal PDFs directly, though needing to evaluate the likelihood function point by point. In contrast, the general MCMC algorithm used by emcee effectively does the integral numerically.

The bp.bestmatch method searches through the model library and finds the N library entries that are closest to an input set of photometry. The call signature is:

def bestmatch(self, phot, nmatch=1, bandwidth_units=False):

Here phot is the set of photometric properties, which is identical to the photprop parameter used by logL, mpdf, and mcmc. The argument nmatch specifies how many matches to return, and the argument bandwidth_units specifies whether distances are to be measured using an ordinary Euclidean metric, or in units of the kernel bandwidth in a given direction. The function returns, for each input set of photometry, the physical and photometric properties of the nmatch models in the library that are closest to the input photometric values. This can be used to judge if a good match to the input photometry is present in the library.

Caching

The bp class is built to compute posterior PDFs of some quantities, marginalising over others. To speed this process, it uses an internal KD tree representation of the data. By default the KD tree spans all physical and photometric dimensions of the underlying data set. When marginalising over some dimensions, however, it can be much more efficient to have a KD tree that only spans the dimensions that are not being marginalised, particularly if there are many of them. For example, suppose that we have a library of sample star clusters with a range of masses, ages, extinctions, metallicities, and photometric measurements. We are interested in the posterior PDF of mass given a set of photometry, marginalising over age, extinction, and metallicity. In this case it is more efficient to have a KD tree that does not split in the age, extinction, or metallicity dimensions. Since evaluating the marginal posterior PDF of mass using such a tree is much faster, if we are going to compute marginal posterior PDFs of mass many times (for example for many photometric measurements) it is advantageous to pay the one-time cost of constructing the better-optimised KD tree and then use it for all subsequent calculations.

To handle this, bp has an optional caching capability that will cache KD tree representations of the data that are optimal for computing posterior PDFs marginalised over certain dimensions. One can construct such a cached tree by invoking the bp.make_cache method. The syntax is simple:

bp.make_cache(margindims)

where margindims is a listlike containing the dimensions that will be marginalized out. Once created, the cache will be used for all subsequent evaluations using bp.mpdf and related methods (mpdf_phot and mpdf_gen) that marginalise over those dimensions.

Caching can also be done automatically. The bp constructor accepts the keyword caching which specifies the type of caching to use. The default value, none, uses no caching. The value lazy causes a cached KD tree to be build whenever one of the marginal PDF methods is invoked, and is stored for future use. (Warning: lazy mode is not thread-safe – see Parallelism in bayesphot.) Finally, aggressive mode constructs caches for all possible one-dimensional marginalisations of physical variables given observed photometry, and all possible one-dimensional marginalisations of variables by themselves, when the bp object is first constructed.

Parallelism in bayesphot

The bp class supports parallel calculations of posterior PDFs and related quantities, through the python multiprocessing module. This allows efficient use of multiple cores on a shared memory machine, circumventing the python global interpreter lock, without the need for every process to read a large simulation library or store it in memory. The recommended method for writing threaded code using bp objects is to use have a master process create the bp object, and then use a Process or Pool object to create child processes the perform computations using bp methods such as bp.logL or bp.mpdf. It is often most efficient to combine this with shared memory objects such as RawArray to hold the outputs.

An example use case for computing 1D marginal PDFs on a large set of photometric values is:

# Import what we need
from slugpy.bayesphot import bp
from multiprocessing import Pool, RawArray
from ctypes import c_double
import numpy as np

# Some code here to create / read the data set to be used by
# bayesphot and store it in a variable called dataset

# Create the bayesphot object
my_bp = bp(dataset, nphys)

# Some code here to create / read the photometric data we want to
# process using bayesphot and store it in an array called phot,
# which is of shape (nphot, nfilter). There is also an array of
# photometric errors, called photerr, of the same shape.

# Create a holder for the output
pdf_base = RawArray(c_double, 128*nphot)
pdf = np.frombuffer(pdf_base, dtype=c_double). \
      reshape((nphot,128))
grd_base = RawArray(c_double, 128)
grd = np.frombuffer(grd_base, dtype=c_double)

# Define the function that will be responsible for computing the
# marginal PDF
def mpdf_apply(i):
    grd[:], pdf[i] = my_bp.mpdf(0, phot[i], photerr[i])

# Main thread starts up a process pool and starts the computation
if __name__ == '__main__':
    pool = Pool()
    pool.map(mpdf_apply, range(nphot))

    # At this point the grd and PDF contain the same as they would
    # if we had done
    #     grd, pdf = my_bp.mpdf(0, phot, photerr)
    # but the results will be computed much faster this way

For an example of a more complex use case, see the LEGUS cluster pipeline.

The full list of bp methods that are thread-safe is:

  • bp.logL
  • bp.mpdf
  • bp.mcmc
  • bp.bestmatch
  • bp.make_approx_phot
  • bp.make_approx_phys
  • bp.squeeze_rep
  • bp.mpdf_approx

Thread safety involves a very modest overhead in terms of memory and speed, but for non-threaded computations this can be avoided by specifying:

bp.thread_safe = False

or by setting the thread_safe keyword to False when the bp object is constructed.

Finally, note that this parallel paradigm avoids duplicating the large library only on unix-like operating systems that support copy-on-write semantics for fork. Code such as the above example should still work on windows (though this has not been tested), but each worker process will duplicate the library in physical memory, thereby removing one of the main advantages of working in parallel.

Full Documentation of slugpy.bayesphot

This defines a class that can be used to estimate the PDF of physical quantities from a set of input photometry in various bands, together with a training data set.

class slugpy.bayesphot.bp.bp(dataset, nphys, filters=None, bandwidth='auto', ktype='gaussian', priors=None, pobs=None, sample_density=None, reltol=0.01, abstol=1e-10, leafsize=16, nosort=None, thread_safe=True, caching='none')[source]

A class that can be used to estimate the PDF of the physical properties of stellar population from a training set plus a set of measured photometric values.

Properties
priors : array, shape (N) | callable | None
prior probability on each data point; interpretation depends on the type passed; array, shape (N): values are interpreted as the prior probability of each data point; callable: the callable must take as an argument an array of shape (N, nphys), and return an array of shape (N) giving the prior probability at each data point; None: no reweighting is performed, so all data points in the library have equal prior probability, i.e. the prior is the same as the sampling of points in the library
pobs : array, shape (N) | callable | None
the probability that a particular object would be observed, which is used, like prior, to weight the library; interpretation depends on type. None means all objects are equally likely to be observed, array is an array giving the observation probability of each object in the library, and callable means must be a function that takes an array containing the photometry, of shape (N, nhpot), as an argument, and returns an array of shape (N) giving the probability of observation for that object
bandwidth : ‘auto’ | float | array, shape (M)
bandwidth for kernel density estimation; if set to ‘auto’, the bandwidth will be estimated automatically; if set to a scalar quantity, the same bandwidth is used for all dimensions
nphys : int
number of physical properties in the library
nphot : int
number of photometric properties in the library
ndim : int
nphys + nphot
__init__(dataset, nphys, filters=None, bandwidth='auto', ktype='gaussian', priors=None, pobs=None, sample_density=None, reltol=0.01, abstol=1e-10, leafsize=16, nosort=None, thread_safe=True, caching='none')[source]

Initialize a bp object.

Parameters
dataset : array, shape (N, M)
training data set; this is a set of N sample stellar populations, having M properties each; the first nphys represent physical properties (e.g., log mass, log age), while the next M - nphys represent photometric properties
nphys : int
number of physical properties in dataset
filters : listlike of strings
names of photometric filters; not used, but can be stored for convenience
bandwidth : ‘auto’ | float | array, shape (M)
bandwidth for kernel density estimation; if set to ‘auto’, the bandwidth will be estimated automatically; if set to a scalar quantity, the same bandwidth is used for all dimensions
ktype : string
type of kernel to be used in densty estimation; allowed values are ‘gaussian’ (default), ‘epanechnikov’, and ‘tophat’; only Gaussian can be used with error bars
priors : array, shape (N) | callable | None
prior probability on each data point; interpretation depends on the type passed; array, shape (N): values are interpreted as the prior probability of each data point; callable: the callable must take as an argument an array of shape (N, nphys), and return an array of shape (N) giving the prior probability at each data point; None: all data points have equal prior probability
pobs : array, shape (N) | callable | None
the probability that a particular object would be observed, which is used, like prior, to weight the library; interpretation depends on type. None means all objects are equally likely to be observed, array is an array giving the observation probability of each object in the library, and callable means must be a function that takes an array containing the photometry, of shape (N, nhpot), as an argument, and returns an array of shape (N) giving the probability of observation for that object
sample_density : array, shape (N) | callable | ‘auto’ | None
the density of the data samples at each data point; this need not match the prior density; interpretation depends on the type passed; array, shape (N): values are interpreted as the density of data sampling at each sample point; callable: the callable must take as an argument an array of shape (N, nphys), and return an array of shape (N) giving the sampling density at each point; ‘auto’: the sample density will be computed directly from the data set; note that this can be quite slow for large data sets, so it is preferable to specify this analytically if it is known; None: data are assumed to be uniformly sampled
reltol : float
relative error tolerance; errors on all returned probabilities p will satisfy either abs(p_est - p_true) <= reltol * p_est OR abs(p_est - p_true) <= abstol, where p_est is the returned estimate and p_true is the true value
abstol : float
absolute error tolerance; see above
leafsize : int
number of data points in each leaf of the KD tree
nosort : arraylike of bool, shape (N) | None
if specified, this keyword causes the KD tree not to be sorted along the dimensions for which nosort is True
thread_safe : bool
if True, bayesphot will make extra copies of internals as needed to ensure thread safety when the computation routines (logL, mpdf, mcmc, bestmatch, make_approx_phot, make_approx_phys, mpdf_approx) are used with multiprocessing; this incurs a minor performance penalty, and can be disabled by setting to False if the code will not be run with the multiprocessing module
caching : ‘aggressive’ | ‘lazy’ | ‘none’

strategy for caching subsets of the data with some dimensions marginalised out; behavior is as follows:

‘agressive’
on construction, store sorted data for fast calculation of 1D PDFs of variables by themselves, and 1D PDFs of all physical variables marginalised over all other physical variables; this significantly increases the memory footprint and construction time, but greatly speeds up subsequent evaluation of any of these quantities, and is generally the best choice for prudction work in parallel
‘lazy’
sorted data sets for fast computation are created and cached as needed; this will make the first computation of any marginal PDF slower, but speed up all subsequent ones, without imposing any extra time at initial construction; this mode is generally best for interactive work, but is not thread_safe = True; memory cost depends on how many different marginal PDF combinations are calculated, but is always less than aggressive
‘none’
no caching is performed automatically; the user may still manually cache data by calling the make_cache method
Returns
Nothing
Raises
IOError, if the bayesphot c library cannot be found
Notes

Because the data sets passed in may be large, this class does not make copies of any of its arguments, and instead modifies them in place. However, this means it is the responsibility of the user not to alter the any of the arguments once they are passed to this class; for example, dataset must not be modified after it is passed. Altering the arguments, except through this class’s methods, may cause incorrect results to be generated.

Caching with ‘aggressive’ or ‘lazy’ does create significant memory overhead.

__weakref__

list of weak references to the object (if defined)

bandwidth

The current bandwidth

bestmatch(phot, photerr=None, nmatch=1, bandwidth_units=False)[source]

Searches through the simulation library and returns the closest matches to an input set of photometry.

Parameters:
phot : arraylike, shape (nfilter) or (…, nfilter)
array giving the photometric values; for a multidimensional array, the operation is vectorized over the leading dimensions
photerr : arraylike, shape (nfilter) or (…, nfilter)
array giving photometric errors, which must have the same shape as phot; if this is not None, then distances will be measured in units of the photometric error if bandwidth_units is False, or in units of the bandwidth added in quadrature with the errors if it is True
nmatch : int
number of matches to return; returned matches will be ordered by distance from the input
bandwidth_units : bool
if False, distances are computed based on the logarithmic difference in luminosity; if True, they are measured in units of the bandwidth
Returns:
matches : array, shape (…, nmatch, nphys + nfilter)
best matches to the input photometry; shape in the leading dimensions will be the same as for phot, and if nmatch == 1 then that dimension will be omitted
dist : array, shape (…, nmatch)
distances between the matches and the input photometry
bestmatch_phys(phys, nmatch=1, bandwidth_units=False)[source]

Searches through the simulation library and returns the closest matches to an input set of photometry.

Parameters:
phot : arraylike, shape (nphys) or (…, nphys)
array giving the physical values; for a multidimensional array, the operation is vectorized over the leading dimensions
nmatch : int
number of matches to return; returned matches will be ordered by distance from the input
bandwidth_units : bool
if False, distances are computed based on the logarithmic difference in physical properties; if True, they are measured in units of the bandwidth
Returns:
matches : array, shape (…, nmatch, nphys + nfilter)
best matches to the input properties; shape in the leading dimensions will be the same as for phot, and if nmatch == 1 then that dimension will be omitted
dist : array, shape (…, nmatch)
distances between the matches and the input physical properties
clear_cache(margindims=None)[source]

This method deletes from the cache

Parameters
margindims : listlike of integers
list of marginalised dimensions that should be removed from the cache, in the same format as make_cache; if left as None, the cache is completely emptied
Returns
Nothing
draw_phot(physprop, physidx=None, photerr=None, nsample=1)[source]

Returns a randomly-drawn sample of photometric properties for one or more input sets of physical properties.

Parameters:
physprop : arraylike
physical properties to be used; the final dimension of the input must have len(physidx) indices, or nphys indicates if physidx is None; if the input is a multidimensional array, the operation is vectorized over the leading dimensions physical properties
physidx : arraylike
indices of the physical quantities being constrained; if left as None, all physical properties are set, and physprop must have a trailing dimension of size equal to nphys; otherwise this must be an arraylike of <= nphys positive integers, each unique and in the range [0, nphys), specying which physical dimensions are constrained
photerr : arraylike, shape (nphot)
photometric errors to apply to the output photometry; these are added in quadrature with the kernel density estimation bandwidth
nsample : int
number of random samples to draw for each set of physical properties; must be positive
Returns:
samples : arraylike
sample of photometric properties; the shape of the output is (…, nsample, nphot), where nphot is the number of photometric properties, and the leading dimension(s) match the leading dimension(s) of physprop
draw_sample(photerr=None, nsample=1)[source]

Returns a randomly-drawn sample of physical and photometric properties from the kernel density function

Parameters:
nsample : int
number of random samples to draw for each set of physical properties; must be positive
photerr : arraylike, shape (nphot)
photometric errors to apply to the output photometry; these are added in quadrature with the kernel density estimation bandwidth
Returns:
samples : array, shape (nsample, nphys+nphot)
random sample drawn from the kernel density object; for the final dimension in the output, the first nphys elements are the physical quantities, the next nphot are the photometric quantities
logL(physprop, photprop, photerr=None, margindim=None)[source]

This function returns the natural log of the likelihood function evaluated at a particular log mass, log age, extinction, and set of log luminosities

Parameters
physprop : arraylike, shape (nphys) or (…, nphys)
array giving values of the physical properties; for a multidimensional array, the operation is vectorized over the leading dimensions
photprop : arraylike, shape (nfilter) or (…, nfilter)
array giving the photometric values; for a multidimensional array, the operation is vectorized over the leading dimensions
photerr : arraylike, shape (nfilter) or (…, nfilter)
array giving photometric errors; for a multidimensional array, the operation is vectorized over the leading dimensions
margindim : int | arraylike of ints | None
The index or indices of the physical or photometric properties to be maginalized over, numbered from 0 - nphys-1 for physical properties and from nphys - nfilter + nfilter - 1 for photometric properties. If this keyword is set, then physprop and/or photprop should have fewer than nphys or nphot elements due to the omission of marginalised dimensions. If all physical or photometric dimensions are marginalised out, that corresponding argument for physprop or photprop should be set to None
Returns
logL : float or arraylike
natural log of the likelihood function
make_approx_phot(phys, squeeze=True, filter_ignore=None)[source]

Returns an object that can be used for a fast approximation of the PDF of photometric properties that corresponds to a set of physical properties. The PDF produced by summing over the points returned is guaranteed to account for at least 1-reltol of the marginal photometric probability, and to represent the shape of the PDF in photometric space within a local accuracy of reltol as well.

Parameters:
phys : arraylike, shape (nphys) or (N, nphys)
the set or sets of physical properties for which the approximation is to be generated
squeeze : bool
if True, the representation returned will be squeezed to minimize the number of points included, using reltol as the error tolerance
filter_ignore : None or listlike of bool
if None, the kernel density representation returned covers all filters; otherwise this must be a listlike of bool, one entry per filter, with a value of False indicating that filter should be excluded from the values returned; suppressing filters can allow for more efficient representations
Returns:
x : array, shape (M, nphot), or a list of such arrays
an array containing the list of points to be used for the approximation, where nphot is the number of photometric filters being returned
wgts : array, shape (M), or a list of such arrays
an array containing the weights of the points
Notes:
if the requested relative tolerance cannot be reached for numerical reasons (usually because the input point is too far from the library to allow accurate computation), x and wgts will be return as None, and a warning will be issued
make_approx_phys(phot, photerr=None, squeeze=True, phys_ignore=None, tol=None)[source]

Returns an object that can be used for a fast approximation of the PDF of physical properties that corresponds to a set of photometric properties. The PDF produced by summing over the points returned is guaranteed to account for at least 1-reltol of the marginal photometric probability, and to represent the shape of the PDF in photometric space within a local accuracy of reltol as well.

Parameters:
phot : arraylike, shape (nfilter) or (N, nfilter)
the set or sets of photometric properties for which the approximation is to be generated
photerr : arraylike, shape (nfilter) or (N, nfilter)
array giving photometric errors; the number of elements in the output lists will be the size that results from broadcasting together the leading dimensions of phot and photerr
squeeze : bool
if True, the representation returned will be squeezed to minimize the number of points included, using reltol as the error tolerance
phys_ignore : None or listlike of bool
if None, the kernel density representation returned covers all physical properties; otherwise this must be a listlike of bool, one entry per physical dimension, with a value of False indicating that dimension should be excluded from the values returned; suppressing dimensions can allow for more efficient representations
tol : float
if set, this tolerance overrides the value of reltol
Returns:
x : array, shape (M, nphys), or a list of such arrays
an array containing the list of points to be used for the approximation, where nphys is the number of physical dimensions being returned
wgts : array, shape (M), or a list of such arrays
an array containing the weights of the points
Notes:
if the requested relative tolerance cannot be reached for numerical reasons (usually because the input point is too far from the library to allow accurate computation), x and wgts will be return as None, and a warning will be issued
make_cache(margindims)[source]

This method builds a cache to do faster calculation of PDFs where certain dimensions are marginalised out. If such caches exist, they are used automatically by all the computation methods.

Parameters
margindims : listlike of integers
list of dimensions to be marginalised out; physical dimensions go from 0 - nphys-1, photometric dimensions from nphys to nphys + nphot - 1
Returns
Nothing
mcmc(photprop, photerr=None, mc_walkers=100, mc_steps=500, mc_burn_in=50)[source]

This function returns a sample of MCMC walkers sampling the physical parameters at a specified set of photometric values.

Parameters:
photprop : arraylike, shape (nfilter) or (…, nfilter)
array giving the photometric values; for a multidimensional array, the operation is vectorized over the leading dimensions
photerr : arraylike, shape (nfilter) or (…, nfilter)
array giving photometric errors; for a multidimensional array, the operation is vectorized over the leading dimensions
mc_walkers : int
number of walkers to use in the MCMC
mc_steps : int
number of steps in the MCMC
mc_burn_in : int
number of steps to consider “burn-in” and discard
Returns
samples : array
array of sample points returned by the MCMC
mpdf(idx, photprop, photerr=None, ngrid=128, qmin=None, qmax=None, grid=None, norm=True)[source]

Returns the marginal probability for one or mode physical quantities for one or more input sets of photometric properties. Output quantities are computed on a grid of values, in the same style as meshgrid.

Parameters:
idx : int or listlike containing ints
index of the physical quantity whose PDF is to be computed; if this is an iterable, the joint distribution of the indicated quantities is returned
photprop : arraylike, shape (nfilter) or (…, nfilter)
array giving the photometric values; for a multidimensional array, the operation is vectorized over the leading dimensions
photerr : arraylike, shape (nfilter) or (…, nfilter)
array giving photometric errors; for a multidimensional array, the operation is vectorized over the leading dimensions
ngrid : int or listlike containing ints
number of points in each dimension of the output grid; if this is an iterable, it must have the same number of elements as idx
qmin : float or listlike
minimum value in the output grid in each quantity; if left as None, defaults to the minimum value in the library; if this is an iterable, it must contain the same number of elements as idx
qmax : float or listlike
maximum value in the output grid in each quantity; if left as None, defaults to the maximum value in the library; if this is an iterable, it must contain the same number of elements as idx
grid : listlike of arrays
set of values defining the grid on which the PDF is to be evaluated, in the same format used by meshgrid
norm : bool
if True, returned pdf’s will be normalized to integrate to 1
Returns:
grid_out : array
array of values at which the PDF is evaluated; contents are the same as returned by meshgrid
pdf : array
array of marginal posterior probabilities at each point of the output grid, for each input set of photometry; the leading dimensions match the leading dimensions produced by broadcasting the leading dimensions of photprop and photerr together, while the trailing dimensions match the dimensions of the output grid
mpdf_approx(x, wgts, dims='phys', dims_return=None, ngrid=64, qmin='all', qmax='all', grid=None, norm=True)[source]

Returns the marginal posterior PDF computed from a kernel density approximation returned by make_approx_phys or make_approx_phot. Outputs are computed on a grid of values, in the same style as meshgrid.

Parameters:
x : array, shape (M, ndim), or a list of such arrays
array of points retured by make_approx_phot or make_approx_phys
wgts : array, shape (M) or a list of such arrays
array of weights returned by make_approx_phot or make_approx_phys
dims : ‘phys’ | ‘phot’ | arraylike of ints
dimensions covered by x and wgts; the strings ‘phys’ or ‘phot’ indicate that they cover all physical or photometric dimensions, and correspond to the defaults returned by make_approx_phys and make_approx_phot, respectively; if dims is an array of ints, these specify the dimensions covered by x and wgts, where the physical dimensions are numbered 0, 1, … nphys-1, and the photometric ones are nphys, nphys+1, … nphys+nphot-1
dims_return : None or arraylike of ints
if None, the output PDF has the same dimensions as specified in dms; if not, then dimreturn must be a subset of dim, and a marginal PDF in certain dimensions will be generated
ngrid : int or listlike containing ints
number of points in each dimension of the output grid; if this is an iterable, it must have the same number of elements as idx
qmin : float | listlike | ‘zoom’ | ‘all’
minimum value in the output grid in each quantity; if this a float, it is applied to each dimension; if it is an iterable, it must contain the same number of elements as the number of dimensions being returned, as gives the minimum in each dimension; if it is ‘zoom’ or ‘all’, the minimum is chosen automatically, with ‘zoom’ focusing on a region encompassing the probability maximum, and ‘all’ encompassing all the points in the representation
qmax : float | listlike | ‘zoom’ | ‘all’
same as qmin, but for the maximum of the output grid
grid : listlike of arrays
set of values defining the grid on which the PDF is to be evaluated, in the same format used by meshgrid
norm : bool
if True, returned pdf’s will be normalized to integrate to 1
Returns:
grid_out : array
array of values at which the PDF is evaluated; contents are the same as returned by meshgrid
pdf : array
array of marginal posterior probabilities at each point of the output grid, for each input cluster; the leading dimensions match the leading dimensions produced by broadcasting the leading dimensions of photprop and photerr together, while the trailing dimensions match the dimensions of the output grid
mpdf_gen(fixeddim, fixedprop, margindim, ngrid=128, qmin=None, qmax=None, grid=None, norm=True)[source]

Returns the marginal probability for one or more physical or photometric properties, keeping other properties fixed and marginalizing over other quantities. This is the most general marginal PDF routine provided.

Parameters:
fixeddim : int | arraylike of ints | None
The index or indices of the physical or photometric properties to be held fixed; physical properties are numbered 0 … nphys-1, and phtometric ones are numbered nphys … nphys + nphot - 1. This can also be set to None, in which case no properties are held fixed.
fixedprop : array | None
The values of the properties being held fixed; the size of the final dimension must be equal to the number of elements in fixeddim, and if fixeddim is None, this must be too
margindim : int | arraylike of ints | None
The index or indices of the physical or photometric properties to be maginalized over, numbered in the same way as with fixeddim; if set to None, no marginalization is performed
ngrid : int or listlike containing ints
number of points in each dimension of the output grid; if this is an iterable, it must have nphys + nphot - len(fixeddim) - len(margindim) elements
qmin : float | arraylike
minimum value in the output grid in each quantity; if left as None, defaults to the minimum value in the library; if this is an iterable, it must contain a number of elements equal to nphys + nphot - len(fixeddim) - len(margindim)
qmax : float | arraylike
maximum value in the output grid in each quantity; if left as None, defaults to the maximum value in the library; if this is an iterable, it must have the same number of elements as qmin
grid : listlike of arrays
set of values defining the grid on which the PDF is to be evaluated, in the same format used by meshgrid
norm : bool
if True, returned pdf’s will be normalized to integrate to 1
Returns:
grid_out : array
array of values at which the PDF is evaluated; contents are the same as returned by meshgrid
pdf : array
array of marginal posterior probabilities at each point of the output grid, for each input set of properties; the leading dimensions match the leading dimensions produced by broadcasting the leading dimensions of photprop and photerr together, while the trailing dimensions match the dimensions of the output grid
mpdf_phot(idx, physprop, ngrid=128, qmin=None, qmax=None, grid=None, norm=True)[source]

Returns the marginal probability for one or mode photometric quantities corresponding to an input set of physical properties. Output quantities are computed on a grid of values, in the same style as meshgrid.

Parameters:
idx : int or listlike containing ints
index of the photometric quantity whose PDF is to be computed, starting at 0; if this is an iterable, the joint distribution of the indicated quantities is returned
physprop : arraylike, shape (nphys) or (…, nphys)
physical properties to be used; if this is an array of nphys elements, these give the physical properties; if it is a multidimensional array, the operation is vectorized over the leading dimensions physical properties – the function must take an array of (nphys) elements as an input, and return a floating point value representing the PDF evaluated at that set of physical properties as an output
ngrid : int or listlike containing ints
number of points in each dimension of the output grid; if this is an iterable, it must have the same number of elements as idx
qmin : float or listlike
minimum value in the output grid in each quantity; if left as None, defaults to the minimum value in the library; if this is an iterable, it must contain the same number of elements as idx
qmax : float or listlike
maximum value in the output grid in each quantity; if left as None, defaults to the maximum value in the library; if this is an iterable, it must contain the same number of elements as idx
grid : listlike of arrays
set of values defining the grid on which the PDF is to be evaluated, in the same format used by meshgrid
norm : bool
if True, returned pdf’s will be normalized to integrate to 1
Returns:
grid_out : array
array of values at which the PDF is evaluated; contents are the same as returned by meshgrid
pdf : array
array of marginal posterior probabilities at each point of the output grid, for each input set of properties; the leading dimensions match the leading dimensions produced by broadcasting the leading dimensions of photprop and photerr together, while the trailing dimensions match the dimensions of the output grid
ndim

This is the number of physical plus photometric properties for the bayesphot object.

nphot

This is the number of photometric properties for the bayesphot object.

nphys

This is the number of physical properties for the bayesphot object.

pobs

The current set of observation probabilities for every simulation in the library; data returned are in the same order as the data originally used to construct the library

priors

The current set of prior probabilities for every simulation in the library; data returned are in the same order as the data originally used to construct the library

sample_density

The density with which the library was sampled, evaluated for each simulation in the library

squeeze_rep(x, wgts, dims=None)[source]

Takes an input array of positions and weights that form a kernel density representation and approximates them using fewer points, using an error tolerance of reltol

Parameters:
x : array, shape (N, ndim)
an array of points forming a kernel density representation; on exit, x will be resized to (M, ndim) with M <= N
wgts : array, shape (N)
an array of weights for the kernel density representation; on exit, wgts will be resized to (M), with M <= N
dims : array, shape (ndim)
array specifying which dimensions in the kernel density representation the coordinates in x correspond to; if left as None, they are assumed to correspond to the first ndim dimensions in the data set
Returns:
Nothing