.. _outlier-detection-imaging:

Outlier Detection Algorithm
===========================

This module serves as the interface for applying ``outlier_detection`` to direct
image observations. The code implements the basic outlier detection algorithm used
with JWST data, but adapted to Roman.

Specifically, this routine performs the following operations:

#. Extract parameter settings from input model and merge them with any user-provided values.

   * See :ref:`outlier detection arguments ` for the full list of parameters.

#. By default, resample all input images.

   * The resampling step starts by computing an output WCS that is large enough
     to encompass all the input images.
   * All 18 detectors from the *same exposure* will get resampled onto this output
     WCS to create a mosaic of all the chips for that exposure. This product is
     referred to as a "grouped mosaic" since it groups all the chips from the same
     exposure into a single image.
   * Each dither position will result in a separate grouped mosaic, so only a
     single exposure ever contributes to each pixel in these mosaics.
   * The ``fillval`` parameter specifies what value to use in the output resampled
     image for any pixel which has no valid contribution from any input exposure.
   * The resampling can be controlled with the ``pixfrac``, ``kernel`` and
     ``weight_type`` parameters.
   * The ``pixfrac`` indicates the fraction by which input pixels are "shrunk"
     before being drizzled onto the output image grid, given as a real number
     between 0 and 1. This specifies the size of the footprint, or "dropsize", of a
     pixel in units of the input pixel size.
   * The ``kernel`` specifies the form of the kernel function used to distribute
     flux onto the separate output images.
   * The ``weight_type`` indicates the type of weighting image to apply with the
     bad pixel mask. Available options are ``ivm`` (default) for computing and
     using an inverse-variance map and ``exptime`` for weighting by the exposure time.
   * The ``good_bits`` parameter specifies what DQ values from the input exposure
     should be used when resampling to create the output mosaic. Any pixel with a
     DQ value not included in this value (or list of values) will be ignored when
     resampling.
   * Resampled images will be written out to disk with suffix ``_outlier_coadd``
     by default.
   * **If resampling is turned off** through the use of the ``resample_data``
     parameter, a copy of the unrectified input images (as a ModelLibrary) will be
     used for subsequent processing.

#. Create a median image from all grouped observation mosaics.

   * The median image is created by combining all grouped mosaic images or
     non-resampled input data pixel-by-pixel.
   * The ``maskpt`` parameter sets the percentage of the weight image values to
     use, and any pixel with a weight below this value gets flagged as "bad" and
     ignored when resampled.

#. By default, the median image is blotted back (inverse of resampling) to
   match each original input image.

   * **If resampling is turned off**, the median image is compared directly to
     each input image.

#. Perform statistical comparison between blotted image and original image to identify outliers.

   * This comparison uses the original input images, the blotted median image, and
     the derivative of the blotted image to create a cosmic ray mask for each
     input image.
   * The derivative of the blotted image is computed from the blotted median
     image: for each pixel, take the absolute value of the difference between that
     pixel and each of its four surrounding neighbors, and record the largest of
     these values as the derivative.
   * These derivative images are used to flag cosmic rays and other blemishes,
     such as moving object trails. Where the difference is larger than can be
     explained by noise statistics, the flattening effect of taking the median, or
     an error in the shift (the latter two effects are estimated using the image
     derivative), the suspect pixel is masked.
   * The ``backg`` parameter specifies a user-provided value to be used as the
     background estimate. This gets added to the background-subtracted blotted
     image to attempt to match the background levels of the original input mosaic,
     so that cosmic rays (bad pixels) in the input mosaic can be identified more
     easily as outliers compared to the blotted mosaic.
   * Cosmic rays are flagged using the following rule:

     .. math:: | image\_input - image\_blotted | > scale*image\_deriv + SNR*noise

   * The ``scale`` is defined as the multiplicative factor applied to the
     derivative which is used to determine if the difference between the data
     image and the blotted image is large enough to require masking.
   * The ``noise`` is calculated using a combination of the detector read noise
     and the Poisson noise of the blotted median image plus the sky background.
   * The user must specify two cut-off signal-to-noise values using the ``snr``
     parameter for determining whether a pixel should be masked: the first for
     detecting the primary cosmic ray, and the second for masking lower-level bad
     pixels adjacent to those found in the first pass. Since cosmic rays often
     extend across several pixels, the adjacent pixels make use of a slightly
     lower SNR threshold.

#. Update input data model DQ arrays with mask of detected outliers.

Memory Model for Outlier Detection Algorithm
---------------------------------------------

The outlier detection algorithm can end up using massive amounts of memory
depending on the number of inputs, the size of each input, and the size of the
final output product. Specifically,

#. By default, all exposures in the input
   :py:class:`~romancal.datamodels.ModelLibrary` are kept open in memory to make
   processing more efficient.

#. The initial resample step creates an output product for EACH input that is the
   same size as the final output product, which for imaging modes can span all
   chips in the detector while also accounting for all dithers.
   For some Level 3 products, each resampled image can be on the order of 2 GB or more.

#. The median combination step then needs to have all pixels at the same position
   on the sky in memory in order to perform the median computation. The simplest
   implementation for this step requires keeping all resampled outputs fully in
   memory at the same time.

Many Level 3 products only include a modest number of input exposures that can be
processed using less than 32 GB of memory at a time. However, there are a number
of ways this memory limit can be exceeded. This has been addressed by implementing
an overall memory model for the outlier detection that includes options to minimize
the memory usage at the expense of file I/O. This memory model is controlled
with the ``in_memory`` parameter. Setting ``in_memory`` to ``False`` has the
following effects during processing:

#. The ``on_disk`` parameter gets set to ``True`` when opening the
   input :py:class:`~romancal.datamodels.library.ModelLibrary` object.
   This causes modified models to be written to temporary files.

#. Computing the median image uses temporary files. Each resampled group
   is split into sections (1 per "row") and each section is appended to a different
   temporary file. After resampling all groups, each temporary file is read and a
   median is computed for all sections in that file (yielding a median for that
   section across all resampled groups). Finally, these median sections are
   combined into a final median image.

These changes result in a minimum amount of memory usage during processing at the
expense of reading and writing the products from disk.
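
The temporary-file median described above can be illustrated with a minimal
sketch. This is not the romancal implementation: the function name
``median_on_disk``, its ``n_sections`` argument, and the raw ``float32`` file
layout are all assumptions made for the example. It only shows the idea of
appending one section per group to a per-section file and then taking the
median one file at a time.

```python
import itertools
import tempfile
from pathlib import Path

import numpy as np


def median_on_disk(groups, n_sections):
    """Median across equally shaped 2-D arrays, one row section at a time.

    Hypothetical helper for illustration only. ``groups`` may be a lazy
    iterable, so only one resampled group is held in memory while writing.
    """
    groups = iter(groups)
    first = np.asarray(next(groups))
    n_rows, n_cols = first.shape
    # Row boundaries of each section (a section is a block of rows).
    bounds = np.linspace(0, n_rows, n_sections + 1).astype(int)
    sections = [(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:]) if hi > lo]

    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [Path(tmpdir) / f"section_{i}.bin" for i in range(len(sections))]
        n_groups = 0
        # Pass 1: append each section of each group to that section's file.
        for group in itertools.chain([first], groups):
            for path, (lo, hi) in zip(paths, sections):
                chunk = np.asarray(group[lo:hi], dtype=np.float32)
                with open(path, "ab") as fh:
                    fh.write(chunk.tobytes())
            n_groups += 1

        # Pass 2: read one file at a time and take the median across groups.
        median = np.empty((n_rows, n_cols), dtype=np.float32)
        for path, (lo, hi) in zip(paths, sections):
            stack = np.fromfile(path, dtype=np.float32)
            stack = stack.reshape(n_groups, hi - lo, n_cols)
            # NaN marks pixels to ignore (an assumption of this sketch).
            median[lo:hi] = np.nanmedian(stack, axis=0)
    return median
```

With this layout, peak memory during the second pass scales with the number of
groups times the size of one section, rather than with the full stack of
resampled images.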
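
The cosmic ray flagging rule from the algorithm description can be sketched in
a similar way. The helper names ``abs_deriv`` and ``flag_outliers`` are
hypothetical, ``noise`` is passed in rather than derived from read noise and
Poisson noise, edge pixels are handled by replicating the border (an
assumption), and the second, lower-threshold pass over adjacent pixels is
omitted.

```python
import numpy as np


def abs_deriv(image):
    """Largest absolute difference between each pixel and its 4 neighbors."""
    # Replicate the border so that missing neighbors contribute zero.
    padded = np.pad(image, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    neighbors = (
        padded[:-2, 1:-1],  # above
        padded[2:, 1:-1],   # below
        padded[1:-1, :-2],  # left
        padded[1:-1, 2:],   # right
    )
    return np.max([np.abs(center - n) for n in neighbors], axis=0)


def flag_outliers(image_input, image_blotted, noise, snr, scale):
    """Apply |input - blotted| > scale * deriv + snr * noise (single pass)."""
    deriv = abs_deriv(image_blotted)
    return np.abs(image_input - image_blotted) > scale * deriv + snr * noise
```

A pixel on a steep gradient in the blotted median has a large derivative and
therefore a higher threshold, which is how the rule tolerates small shift
errors and the flattening effect of the median.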