Using the binning module

Some systematics tests benefit from being performed on subsamples of the data, binned in some way. For example, PSF models may be more accurate in regions of high stellar density, so producing PSF characterization tests as a function of galactic latitude might be useful. Shape measurement may be more likely to fail for low signal-to-noise objects or for objects that are small relative to the PSF, so checking shape measurement accuracy in signal-to-noise or size bins may reveal problems that cannot be seen when most of the signal comes from larger or better-resolved objects. Some tests may also benefit from selection cuts, which you can think of as a single broad bin with a defined lower edge and an infinite upper edge.

To make these tests easier, Stile contains some simple classes to bin your data. Two of them, stile.BinStep and stile.BinList, have simple, predefined ways of acting on your data set. The third, stile.BinFunction, uses user-defined functions to split your data.

Basic interface

To start binning your data, create a Bin* object that contains the binning definitions. For instance, to split the ra column into 10 bins:

>>> bin_object = stile.BinStep(field='ra', n_bins=10, low=0, high=360)

To use it, call the object; it returns a list of objects which you can apply to your data to produce properly binned subsets.

>>> for single_bin in bin_object():
...     binned_data = single_bin(data)

binned_data is a subset of data with the same format.

The single_bin above is an instance of another class, stile.binning.SingleBin. It knows its boundaries, and it also contains a string you can use in program outputs.

>>> single_bin = bin_object()[0]
>>> print(single_bin.low)
0.0
>>> print(single_bin.high)
36.0
>>> print(single_bin.short_name)
0
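As a rough illustration of what one of these bins does, the sketch below rebuilds the ra example in plain Python without importing Stile. The helper names (linear_bin_edges, select_rows) and the list-of-dicts stand-in for the data are hypothetical, and treating the low edge as inclusive and the high edge as exclusive is an assumption, not something this page states.

```python
def linear_bin_edges(low, high, n_bins):
    """Return n_bins + 1 equally spaced edges between low and high."""
    width = (high - low) / float(n_bins)
    return [low + i * width for i in range(n_bins + 1)]


def select_rows(rows, field, low, high):
    """Keep the rows whose value of `field` falls inside [low, high).

    Assumption: the low edge is inclusive and the high edge is exclusive.
    """
    return [row for row in rows if low <= row[field] < high]


# Hypothetical data: one dict per row instead of a real data array.
rows = [{"ra": 5.0}, {"ra": 35.9}, {"ra": 36.0}, {"ra": 200.0}]

edges = linear_bin_edges(0, 360, 10)
first_bin = select_rows(rows, "ra", edges[0], edges[1])
```

Here the first bin spans 0 to 36 degrees, matching the low and high values shown above, and the subset keeps the same row format as the input.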

Types of binning schemes

stile.BinList is the simplest class. To create it, pass a field name and a list of bin edges (see the Data structure documentation for more information on field names).

>>> bin_list_object = stile.BinList(field='g1',
...     bin_list=[-10, -1, -0.5, -0.3, -0.1, -0.05, 0, 0.05, 0.1, 0.3, 0.5, 1, 10])

stile.BinStep is also fairly simple. It generates bins that are equally spaced in linear or log space, based on the provided arguments. It is created using at least three of the arguments low (the low edge of the lowest bin), high (the high edge of the highest bin), n_bins (the number of bins to create), and step (the step size for the bin). If all four arguments are passed, they are checked for consistency.

>>> bin_step_object = stile.BinStep(field='g2', low=-2, high=2, step=0.1)
>>> bin_step_object = stile.BinStep(field='g2', low=-2, high=2, n_bins=40)

will create identical binning schemes.
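To see why any three of the four arguments are enough, here is a minimal sketch of the arithmetic for linearly spaced bins. The function resolve_linear_bins is hypothetical and is not Stile's implementation; it only shows how the missing value follows from the other three and what a consistency check on all four could look like.

```python
def resolve_linear_bins(low=None, high=None, n_bins=None, step=None):
    """Fill in the one missing value, or check all four for consistency."""
    n_given = sum(value is not None for value in (low, high, n_bins, step))
    if n_given < 3:
        raise ValueError("need at least three of low, high, n_bins, step")
    if n_bins is None:
        # Round to absorb floating-point error in (high - low) / step.
        n_bins = int(round((high - low) / step))
    elif step is None:
        step = (high - low) / float(n_bins)
    elif low is None:
        low = high - n_bins * step
    elif high is None:
        high = low + n_bins * step
    elif abs((high - low) - n_bins * step) > 1e-9 * max(1.0, abs(high - low)):
        # All four were given, so they must agree with each other.
        raise ValueError("low, high, n_bins and step are inconsistent")
    return low, high, n_bins, step


# The two calls from the example above resolve to the same scheme.
from_step = resolve_linear_bins(low=-2, high=2, step=0.1)
from_count = resolve_linear_bins(low=-2, high=2, n_bins=40)
```

Both calls resolve to 40 bins of width 0.1 between -2 and 2. Passing all four values with, say, step=0.2 would raise an error in this sketch.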

Finally, stile.BinFunction is available for more complex binning schemes, especially those that rely on more than one field of data. To use it, you will need a function that either (1) accepts an entire data array (with fields defined as described in Data structure) and returns a vector of integers giving the bin number for each row in the data array, or (2) accepts an entire data array plus an integer bin number and returns a Boolean mask selecting the rows in that bin. You will also need to specify the maximum expected number of bins, either as an argument passed to the constructor or as an attribute of the function. Then, you define the bin object as

>>> bin_function_object = stile.BinFunction(func, n_bins=n_bins)

if the function returns a vector of bin indices, or

>>> bin_function_object = stile.BinFunction(func, n_bins=n_bins, returns_bools=True)

if it returns Boolean masks. This object can be called like any other Bin* object to create a list of callable objects, and it will work with stile.ExpandBinList as well. However, the child objects it creates when you call it don’t have .low or .high attributes, so any automatic processing or looping that assumes these attributes exist (such as for naming files) will fail.
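The two function styles that stile.BinFunction accepts can be sketched as follows. This is an illustration only: the field names mag and size, the cut values, and the list-of-dicts stand-in for a data array are all hypothetical, and the functions return plain Python lists rather than array vectors so that the example runs without Stile.

```python
# Hypothetical stand-in for a data array: one dict per row.
rows = [
    {"mag": 24.5, "size": 0.4},
    {"mag": 24.1, "size": 1.3},
    {"mag": 21.0, "size": 0.9},
    {"mag": 25.0, "size": 2.0},
]

N_BINS = 3  # maximum number of bins this scheme can produce


def bin_index(data):
    """Style 1: return one integer bin number per row."""
    indices = []
    for row in data:
        if row["mag"] < 23.0:
            indices.append(0)  # bright objects of any size
        elif row["size"] < 1.0:
            indices.append(1)  # faint and small
        else:
            indices.append(2)  # faint and large
    return indices


def bin_mask(data, i):
    """Style 2: return a Boolean mask for bin number i."""
    return [index == i for index in bin_index(data)]


indices = bin_index(rows)
masks = [bin_mask(rows, i) for i in range(N_BINS)]
```

The two styles carry the same information: every row is selected by exactly one of the masks, namely the one for its bin number. A scheme like this depends on two fields at once, which is the case the simpler classes cannot express.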

Combining binning schemes

Maybe you have two binning schemes you’d like to use at once: a binning in magnitude and a binning in galaxy weight 'w'. There is a function, stile.ExpandBinList, to automatically loop through all the possible pairs of those binning schemes.

Note

The interface for ExpandBinList() may change in the near future; see Stile issue 82.

stile.ExpandBinList returns a list of lists. Each inner list is one combination of bins, with one bin taken from each of the binning schemes passed to the function; with two schemes, the inner lists are all the possible pairs. So, for example, given the magnitude binning object magnitude_bin_object and the galaxy weight binning object weight_bin_object, the data would be binned like this:

>>> for bin_set in stile.ExpandBinList(magnitude_bin_object, weight_bin_object):
...     binned_data = data
...     for single_bin in bin_set:
...         binned_data = single_bin(binned_data)

stile.ExpandBinList can accept any number of bin objects as arguments (including none). In the lists it returns, the bins from the first argument change most slowly, followed by those from the second, and so on. The first item in the returned list will be [magnitude_bin_object_0, weight_bin_object_0], the second will be [magnitude_bin_object_0, weight_bin_object_1], etc.
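The ordering described above is the same one produced by a Cartesian product in which the last argument varies fastest. The sketch below reproduces it with itertools.product from the standard library. The strings stand in for the individual bin objects, and this is not a claim about how stile.ExpandBinList is implemented.

```python
import itertools

# Hypothetical placeholders for the bins each scheme would produce.
magnitude_bins = ["magnitude_bin_object_0", "magnitude_bin_object_1"]
weight_bins = ["weight_bin_object_0", "weight_bin_object_1", "weight_bin_object_2"]

# itertools.product advances its last argument fastest, so the first
# argument changes most slowly, as in the ordering described above.
combinations = [
    list(combo) for combo in itertools.product(magnitude_bins, weight_bins)
]

for combo in combinations:
    print(combo)
```

With two magnitude bins and three weight bins this gives six combinations, and the magnitude bin only advances after every weight bin has been paired with it.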