Data structure

Stile requires that data be in a format that can be indexed by a column name. That means you should be able to do

>>> ra = data['ra']

to get the right ascension of your data. There is a standard set of Stile column names:

  • dec, the declination of the object
  • ra, the RA of the object
  • x, the x coordinate of the object
  • y, the y coordinate of the object
  • g1, a shear component in the ra or x direction
  • g2, a shear component 45 degrees from the ra or x direction
  • sigma, a size parameter for the object, with dimension [length], in arbitrary units
  • psf_g1, the g1 of the PSF at the location of this object
  • psf_g2, the g2 of the PSF at the location of this object
  • psf_sigma, the sigma of the PSF at the location of this object
  • w, the weight to apply per object
  • z, the redshift of the object

g1 and g2 may be defined in either sky coordinates or chip coordinates. Stile leaves it to the user to make sure that the coordinate system is appropriate for the test the user wishes to do. In general, though, if x and y coordinates appear in the data array, g1 and g2 should be measured relative to the x direction, and if ra and dec appear in the data array, g1 and g2 should be measured relative to the ra direction.

Of course, not every data array needs to include all of these columns!

For some tests, a dict would be okay. However, we usually assume that the data is in a contiguous array so we can mask it:

>>> masked_data = data[mask]

which does not work on a dict. The data types that can handle both masking and column calls via names are: the FITS catalogs of pyfits or astropy; NumPy formatted arrays (called structured arrays in the NumPy documentation); and NumPy record arrays (recarrays). The difference between a formatted array and a recarray is that a recarray has an extra layer of Python code when calling columns by strings. That could cause some slowdown if you’re doing a large number of indexing calls. On the other hand, recarrays are a lot easier to make.
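As a minimal sketch of the difference, the snippet below builds a small NumPy formatted (structured) array and a dict from the same vectors, then applies a boolean mask to each. The three-object catalog and the variable names are made up for illustration; only NumPy is assumed.

```python
import numpy as np

# Hypothetical toy catalog with three objects.
ra = np.array([10.0, 10.5, 11.0])
dec = np.array([-1.0, 0.0, 1.0])
w = np.array([1.0, 0.0, 2.0])

# A formatted (structured) array supports both column names and masks.
data = np.zeros(3, dtype=[('ra', float), ('dec', float), ('w', float)])
data['ra'] = ra
data['dec'] = dec
data['w'] = w

mask = data['w'] > 0           # boolean mask, one entry per object
masked_data = data[mask]       # keeps the objects with nonzero weight
masked_ra = masked_data['ra']  # column access still works after masking

# A dict supports column names, but indexing it with a mask fails
# because a NumPy array cannot be used as a dict key.
data_dict = {'ra': ra, 'dec': dec, 'w': w}
try:
    data_dict[mask]
    dict_masking_works = True
except (TypeError, KeyError):
    dict_masking_works = False
```

With a dict, each column would have to be masked separately, which is why the array types are preferred.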

To make a recarray from vectors ra, dec, g1, g2, and w:

>>> numpy.rec.fromarrays([ra, dec, g1, g2, w], names=['ra', 'dec', 'g1', 'g2', 'w'])

Note that each column takes the format of the array it was built from. If the columns instead come from slicing a single 2D array, all columns will be in the same format, so if any of those are a string, all columns will appear as strings. You can get around this with more complicated data types, as explained in the documentation for numpy.rec.fromarrays().
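The following sketch illustrates how column formats come out in a few cases. The `band` column and the sample values are hypothetical and are not part of the standard Stile column set; the `dtype` argument is the standard NumPy way to request explicit per-column types.

```python
import numpy as np

ra = np.array([10.0, 10.5, 11.0])
dec = np.array([-1.0, 0.0, 1.0])
band = np.array(['r', 'i', 'r'])  # hypothetical string column

# Separate input arrays: each column takes the dtype of its own input.
data = np.rec.fromarrays([ra, dec, band], names=['ra', 'dec', 'band'])

# Columns sliced out of one 2D string array: every column is a string.
table = np.array([['10.0', '-1.0', 'r'],
                  ['10.5', '0.0', 'i'],
                  ['11.0', '1.0', 'r']])
all_str = np.rec.fromarrays(list(table.T), names=['ra', 'dec', 'band'])

# An explicit dtype fixes the type of every column up front.
typed = np.rec.fromarrays(
    [ra, dec, band],
    dtype=[('ra', 'f4'), ('dec', 'f4'), ('band', 'U1')])
```

Checking `data.dtype` (or the equivalent for your own catalog) before running tests is a cheap way to confirm that numeric columns did not end up as strings.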

To make a formatted array from an existing array, we have a Stile helper function called stile.FormatArray. From a data array arr with ra in the 0th column, dec in the 1st, g1 in the 2nd, g2 in the 3rd, and w in the 4th, you can do either:

>>> stile.FormatArray(arr, fields=['ra', 'dec', 'g1', 'g2', 'w'])

or

>>> stile.FormatArray(arr, fields={'ra': 0, 'dec': 1, 'g1': 2, 'g2': 3, 'w': 4})

The dict form doesn’t need to give names to all the columns, but the list form does. The dict form can also have strings as the values if you’re rewriting column names from an existing formatted array/recarray/FITS catalog. If the array was not previously a formatted or structured array of some kind, all columns will be cast to the most complex column type; if the array was already a formatted or structured array, only the names of the columns will change.
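For readers who want to see what the two cases look like at the NumPy level, here is a rough equivalent written with `numpy.lib.recfunctions`. This is not the implementation of stile.FormatArray, only a sketch of the same idea under the assumption that NumPy 1.16 or later is available; the sample values and the `RA_deg`/`DEC_deg` names are made up.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Case 1: a plain 2D array, one row per object, with columns
# ra, dec, g1, g2, w in that order.
arr = np.array([[10.0, -1.0, 0.01, -0.02, 1.0],
                [10.5,  0.0, 0.03,  0.00, 0.5]])
names = ['ra', 'dec', 'g1', 'g2', 'w']

# Every field gets the dtype of arr, since a plain array has only one.
formatted = rfn.unstructured_to_structured(arr, names=names)

# Case 2: an array that is already structured; only the names change.
existing = np.zeros(2, dtype=[('RA_deg', float), ('DEC_deg', float)])
renamed = rfn.rename_fields(existing, {'RA_deg': 'ra', 'DEC_deg': 'dec'})
```

In both cases the result can be indexed by name, for example `formatted['g1']` or `renamed['ra']`, and masked with a boolean array.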

Once you have this array, it can be reused for all tests on that data.