Automatic Source Extraction

It is impossible to rigorously define what is definitely a real source and what definitely isn't. The only way to truly verify a candidate is to observe it again and check whether the signal is recovered. Even so, we can quantify what a source looks like well enough to make algorithmic approaches useful.

For AGES, our automatic techniques rely mainly on GLADoS. Although we mostly use it as a supplementary technique to increase completeness, it can also be used on its own. As with other source-finding algorithms, its reliability never approaches 100% except for the brightest sources, so it by no means eliminates the need for human inspection. Despite a lot of effort to maximise its reliability under different conditions, it still has to be set up quite carefully to work well.

GLADoS works by taking advantage of the two polarisations in which we observe the data. The HI signal itself isn't polarised, so a real signal should appear in both polarisations. Noise, however, is uncorrelated between them, so if a spike is present in one polarisation but not the other, it's probably spurious. This doesn't help eliminate bright artificial signals (like airport radar and overhead satellites), but it does weed out random noise fluctuations that happen to look like galaxies.
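As a toy illustration of this reasoning (not GLADoS code), the Python sketch below flags only channels that exceed a threshold in both polarisations. The function names, the threshold of 4 and the two spectra are invented for the example, and the spectra are assumed to be already in S/N units.

```python
import numpy as np


def channels_above(spectrum, threshold):
    """Indices of channels whose S/N meets or exceeds the threshold."""
    return set(np.flatnonzero(spectrum >= threshold).tolist())


def polarisation_consistent(pol_a, pol_b, threshold):
    """Channels bright in BOTH polarisations, as expected for unpolarised HI."""
    return channels_above(pol_a, threshold) & channels_above(pol_b, threshold)


# Toy spectra in S/N units (unit noise rms in each polarisation), 10 channels.
# Channels 3-4 hold a line present in both polarisations;
# channel 7 holds a spike in polarisation A only (uncorrelated noise).
pol_a = np.array([0.3, -0.8, 0.5, 6.1, 5.7, 0.2, -0.4, 7.5, 0.1, -0.6])
pol_b = np.array([-0.2, 0.6, -0.9, 5.8, 6.3, -0.1, 0.7, 0.4, -0.5, 0.3])

print(sorted(channels_above(pol_a, 4.0)))                   # [3, 4, 7]
print(sorted(polarisation_consistent(pol_a, pol_b, 4.0)))   # [3, 4]
```

A search on polarisation A alone would report the spike at channel 7; requiring agreement with polarisation B keeps only the line.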

GLADoS operates as follows:

1. The user provides the main input cube, which is the average of both polarisations, plus the individual polarisation cubes. They also set the parameters that define what constitutes a source: its S/N and velocity width, plus some other less important variables.
2. GLADoS searches each spectrum in the main cube. If it finds a sufficiently bright pixel, it makes a crude estimate of the velocity width. If both the brightness and the width are sufficient, it accepts this as a provisional source.
3. GLADoS then extracts the spectrum at the provisional source's position from each individual polarisation cube; these cubes have typically been smoothed to increase their sensitivity slightly. It runs the same checks on these two spectra (though the numerical values of the acceptance criteria may differ). If a source meets the criteria in all three cubes, it is accepted for later inspection.
4. When steps 1-3 are complete, GLADoS has a list of all pixels in the cube classified as potential sources. Using the Starlink STILTS tools (nothing to do with Elon Musk), it combines the detections and finds the brightest point within each contiguous group of pixels. It then rejects everything else, so the user only has to check the brightest part of each potential detection (real galaxies typically span a few dozen pixels each).
5. Now the user-review phase begins. For each detection, GLADoS presents the user with the spectra from all three cubes (average and both polarisations), an optical image from the SDSS, and views of the data cube around that position in different projections. The user can accept the source, reject it, or mark it as "maybe" for later, more detailed investigation. GLADoS also produces a detection map so the user can check whether any sources were incorrectly combined in step 4 (over-merging).
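Steps 2 to 4 can be sketched in Python as follows. This is a hypothetical reconstruction, not GLADoS's actual code: the parameter names and values, the "contiguous channels above 1.5 S/N" width estimate and the boxcar smoothing are all assumptions, and `scipy.ndimage.label` plus `scipy.ndimage.maximum_position` stand in for the STILTS grouping step. Cubes are assumed to be S/N arrays ordered (channel, y, x), and widths are counted in channels rather than km/s.

```python
import numpy as np
from scipy import ndimage

# Hypothetical parameters; GLADoS's real names and default values may differ.
PEAK_SNR = 4.0      # minimum peak S/N in the averaged cube
POL_PEAK_SNR = 3.5  # minimum peak S/N in each smoothed polarisation
MIN_WIDTH = 3       # minimum crude width, in channels
EDGE_SNR = 1.5      # level defining the edges of the line when measuring width
SMOOTH = 3          # boxcar width (channels) applied to each polarisation


def crude_width(spectrum, peak):
    """Count the contiguous channels around `peak` that stay above EDGE_SNR."""
    lo = hi = int(peak)
    while lo > 0 and spectrum[lo - 1] > EDGE_SNR:
        lo -= 1
    while hi < len(spectrum) - 1 and spectrum[hi + 1] > EDGE_SNR:
        hi += 1
    return hi - lo + 1


def passes(spectrum, channel, min_snr):
    """Brightness test at `channel`, then the crude width test."""
    return spectrum[channel] >= min_snr and crude_width(spectrum, channel) >= MIN_WIDTH


def smooth(spectrum, n=SMOOTH):
    """Boxcar smoothing, rescaled by sqrt(n) so uncorrelated noise keeps unit rms."""
    return np.convolve(spectrum, np.ones(n) / n, mode="same") * np.sqrt(n)


def find_candidates(avg, pol_a, pol_b):
    """Steps 2-3: flag voxels passing in the averaged cube AND both polarisations."""
    mask = np.zeros(avg.shape, dtype=bool)
    _, ny, nx = avg.shape
    for y in range(ny):
        for x in range(nx):
            spec = avg[:, y, x]
            bright = np.flatnonzero(spec >= PEAK_SNR)
            if bright.size == 0:
                continue  # nothing bright enough in this spectrum
            sa, sb = smooth(pol_a[:, y, x]), smooth(pol_b[:, y, x])
            for ch in bright:
                if (passes(spec, ch, PEAK_SNR)
                        and passes(sa, ch, POL_PEAK_SNR)
                        and passes(sb, ch, POL_PEAK_SNR)):
                    mask[ch, y, x] = True
    return mask


def merge_candidates(avg, mask):
    """Step 4: group contiguous flagged voxels and keep each group's brightest one."""
    labels, n = ndimage.label(mask)
    if n == 0:
        return []
    peaks = ndimage.maximum_position(avg, labels, np.arange(1, n + 1))
    return [tuple(int(i) for i in p) for p in peaks]


# Toy S/N cubes, axes (channel, y, x), noise-free for clarity.
pol_a = np.zeros((20, 4, 4))
pol_b = np.zeros((20, 4, 4))
line = np.array([5.0, 7.0, 9.0, 7.0, 5.0])
for cube in (pol_a, pol_b):       # a real, unpolarised line in both
    cube[8:13, 1, 1] = 6.0
    cube[8:13, 1, 2] = line
pol_a[4:8, 3, 3] = 7.0            # broad feature in polarisation A only

# The mean of two unit-rms polarisations has rms 1/sqrt(2), so rescale to S/N.
avg = (pol_a + pol_b) / np.sqrt(2)

mask = find_candidates(avg, pol_a, pol_b)
print(merge_candidates(avg, mask))  # [(10, 1, 2)]
```

In the toy cube, the broad feature present only in polarisation A passes the averaged-cube test but fails in polarisation B, while the two adjacent pixels of the real line merge into a single detection at its brightest voxel.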

Running GLADoS takes anything from a few minutes to a few hours, depending on the data set. Although the procedure sounds complicated, it has been shown to be highly effective. One of its key advantages is that it operates on cubes in which the brightness has been converted from absolute flux to signal-to-noise (S/N), so only minor adjustments to the search parameters are needed from cube to cube. Since it uses only source brightness and extent, it makes only minimal assumptions about what constitutes a source.
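The flux-to-S/N conversion amounts to dividing each spectrum by an estimate of its own noise. The procedure AGES actually uses isn't described here, so this Python sketch assumes a robust per-spectrum estimate: a median baseline and an rms taken from the median absolute deviation (MAD) scaled by 1.4826. The function name `to_snr` and the simulated noise levels are hypothetical.

```python
import numpy as np


def to_snr(cube):
    """Convert a flux cube with axes (channel, y, x) into S/N units (a sketch).

    Each spectrum has its median subtracted (a crude baseline) and is divided
    by a robust rms: 1.4826 x the median absolute deviation, which matches the
    standard deviation for Gaussian noise but is barely affected by bright lines.
    """
    median = np.median(cube, axis=0, keepdims=True)
    mad = np.median(np.abs(cube - median), axis=0, keepdims=True)
    return (cube - median) / (1.4826 * mad)


rng = np.random.default_rng(0)
quiet = rng.normal(0.0, 2.0, size=(4000, 2, 2))  # e.g. 2 mJy rms
noisy = rng.normal(0.0, 5.0, size=(4000, 2, 2))  # e.g. 5 mJy rms
for name, cube in (("quiet", quiet), ("noisy", noisy)):
    # Both print a mean per-spectrum rms close to 1, so one threshold fits both cubes.
    print(name, round(float(to_snr(cube).std(axis=0).mean()), 2))
```

After conversion, an S/N cut such as 4 means the same thing in both cubes even though their flux noise differs by a factor of 2.5.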