Rhys Taylor's home page
Science, Art and Data Visualisation


I suspect it's a rite of passage for HI astronomers to write their own source-finding algorithm. Here's mine, which you can download here. It's just a couple of Python scripts inside a zip file. Installation instructions for the required STARJAVA/STILTS packages are included. An older IDL version, which doesn't have a GUI or most of the other nicer features, is also available here.

GLADoS is currently unsupported. Feel free to download it and try it out, but at present I myself don't have it working on my system, so I won't be able to help if anything goes wrong !

The idea behind the Galaxy Line Analysis for Detection Of Sources is actually quite simple. The 21 cm emission line of neutral atomic hydrogen is unpolarised, meaning the electric field of the waves oscillates in random directions perpendicular to their direction of travel (this page has a nice diagram). Radio receivers generally measure two orthogonal polarisations. Being unpolarised, any genuine signal should have equal strength in each polarisation.

Now the clever bit : the noise should be uncorrelated between the receivers. 

That means that while real signals should have about equal strength in both polarisations, the noise will often differ substantially between them. This isn't guaranteed, especially for weak signals. But it's a good starting point, and an excellent way to eliminate single-channel spikes in the noise that can otherwise plague source extractors. Strong spikes are especially problematic: if a spike is strong in one polarisation, then even if it's completely absent in the other, the averaged cube (which is what we normally look at because it's more sensitive) will still show a signal.

Idealised example. If a source is sufficiently strong in one polarisation (red) but completely absent in the other (blue), then it will still show up as an easily-detectable signal in the averaged data (black).
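To put rough numbers on the idealised example: averaging two polarisations halves the amplitude of a feature that is present in only one of them, but independent noise also averages down by a factor of √2. The minimal sketch below assumes equal, uncorrelated Gaussian noise in the two polarisations, and the helper name is just for illustration. It shows that a 10σ spike in one polarisation looks exactly like a genuine source of 5σ per polarisation once you only look at the averaged data:

```python
import math

def averaged_snr(amp_a, amp_b, sigma):
    """S/N in the averaged cube of a feature with amplitude amp_a in one
    polarisation and amp_b in the other, given independent per-polarisation
    noise of rms sigma (hypothetical helper, for illustration only)."""
    mean_amp = 0.5 * (amp_a + amp_b)      # the feature is averaged...
    mean_sigma = sigma / math.sqrt(2.0)   # ...and so is the uncorrelated noise
    return mean_amp / mean_sigma

print(round(averaged_snr(5.0, 5.0, 1.0), 2))   # genuine source: 7.07
print(round(averaged_snr(10.0, 0.0, 1.0), 2))  # one-polarisation spike: 7.07
```

The averaged cube alone can't tell these two apart; only looking at the polarisations separately can.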

The process that GLADoS goes through to turn this from a principle into a working source-finding routine is described in a bit more detail here (and in this paper, section 2.2). It doesn't do anything fancy, and for good reason : I found that Polyfind's approach of checking whether the candidate signals looked like a set of template spectra was horribly complicated and just didn't work.

In brief...

GLADoS begins by examining the more sensitive averaged cube. In each pixel it extracts a spectrum and looks for peaks above an S/N threshold; if any are found, it then checks the individual polarisations for corresponding signals. It searches the polarisations over a user-specified range of velocities around each peak rather than only checking the exact same channels. It also applies a bit of smoothing to increase the sensitivity in the individual polarisations.
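The per-pixel logic might look something like the sketch below. This is not GLADoS's actual code: the function names, the boxcar kernel, the thresholds and the window size are all placeholder assumptions, and for simplicity every channel above the threshold in the averaged spectrum is treated as a peak.

```python
import numpy as np

def boxcar(spectrum, width):
    """Boxcar smoothing (the choice of kernel here is an assumption)."""
    kernel = np.ones(width) / width
    return np.convolve(spectrum, kernel, mode="same")

def find_candidates(avg, pol_a, pol_b, sigma_avg, sigma_pol,
                    snr_avg=4.0, snr_pol=2.0, window=3, smooth=3):
    """Return channels above snr_avg in the averaged spectrum that have
    matching emission in BOTH individual polarisations (hypothetical)."""
    # Smooth each polarisation; boxcar-smoothing white noise over `smooth`
    # channels reduces its rms by sqrt(smooth).
    a = boxcar(pol_a, smooth)
    b = boxcar(pol_b, smooth)
    sig_s = sigma_pol / np.sqrt(smooth)
    candidates = []
    for i in np.flatnonzero(avg / sigma_avg > snr_avg):
        lo, hi = max(0, i - window), min(len(avg), i + window + 1)
        # Be generous: each polarisation only has to exceed its (lower)
        # threshold somewhere within +/- window channels of the peak.
        if (a[lo:hi].max() / sig_s > snr_pol and
                b[lo:hi].max() / sig_s > snr_pol):
            candidates.append(int(i))
    return candidates
```

In a real run this would be called once per spatial pixel of the cube, with the noise levels measured from the data. A spike present in only one polarisation passes the averaged-cube threshold but fails the check in the other polarisation.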

The reason for this is that the most interesting sources tend to be the faintest ones, and I prioritise completeness over reliability. Since real sources have finite width, when they're close to the noise level it pays to be a little bit generous with the matching criteria, otherwise faint sources could easily be missed in the less sensitive individual polarisations. GLADoS also measures the approximate velocity width of the source, which is another criterion it takes into account when designating something a candidate, but this really is a very crude estimate indeed.
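I haven't described exactly how GLADoS estimates the width beyond saying it's crude. One equally crude possibility, given here purely as an assumption, is to walk outwards from the peak until the flux drops below half the peak value (a W50-style width), and then require a minimum number of channels so that single-channel spikes are rejected. The function names and the 3-channel minimum are hypothetical.

```python
import numpy as np

def crude_width(spectrum, peak, frac=0.5):
    """Number of contiguous channels around `peak` whose flux is at least
    frac * spectrum[peak] (hypothetical W50-style estimate)."""
    level = frac * spectrum[peak]
    lo = hi = peak
    while lo > 0 and spectrum[lo - 1] >= level:
        lo -= 1
    while hi < len(spectrum) - 1 and spectrum[hi + 1] >= level:
        hi += 1
    return hi - lo + 1

def wide_enough(spectrum, peak, min_chans=3):
    """Hypothetical width criterion: reject features narrower than min_chans."""
    return crude_width(spectrum, peak) >= min_chans

line = np.array([0, 1, 3, 6, 10, 6, 3, 1, 0], dtype=float)
spike = np.array([0, 0, 10, 0, 0], dtype=float)
print(crude_width(line, 4), wide_enough(line, 4))    # 3 True
print(crude_width(spike, 2), wide_enough(spike, 2))  # 1 False
```

Multiplying the channel count by the channel width gives a width in km/s.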

After this core part of the process GLADoS does a few more things. It uses STILTS to combine candidate detections in close proximity to each other, reducing each clump down to the single position with the peak S/N. It also extracts sky and PV (position-velocity) maps from the HI data cube, as well as a full map of the pixels initially classed as detections before this merging process. All of this is returned to the user in the graphical interface shown below. So the user only has to go through the spectra of the merged candidates (the brightest bits of each clump), but they still get to see the initial detection candidates as well. This helps show when the merging might have been over-zealous and combined clumps which are likely separate objects.
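GLADoS does the merging with STILTS. Purely to illustrate the idea, here's a pure-Python stand-in (not the STILTS method): candidates within hypothetical linking lengths in both sky position and channel are chained together friends-of-friends style, and only the highest-S/N member of each clump is kept.

```python
import numpy as np

def merge_candidates(xyz, snr, link_xy=3.0, link_v=5.0):
    """Group candidates (x, y, channel) that lie within the linking lengths
    of each other, chaining friends-of-friends style, and return the index
    of the highest-S/N member of each group (hypothetical parameters)."""
    xyz = np.asarray(xyz, dtype=float)
    n = len(xyz)
    parent = list(range(n))  # union-find forest over candidate indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            dxy = np.hypot(*(xyz[i, :2] - xyz[j, :2]))  # sky separation
            dv = abs(xyz[i, 2] - xyz[j, 2])             # channel separation
            if dxy <= link_xy and dv <= link_v:
                parent[find(i)] = find(j)               # join the two clumps

    best = {}
    for i in range(n):
        root = find(i)
        if root not in best or snr[i] > snr[best[root]]:
            best[root] = i
    return sorted(best.values())
```

The pairwise loop is O(n²), which is fine for a few thousand candidates. Chaining is also exactly how over-zealous merging happens: two separate objects joined by a bridge of marginal candidates end up as one clump, which is why it helps to see the unmerged detections too.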

The user interface looks like this. First you set up everything in the main screen (sorry things are cut off by the tooltip !) :

And then a few minutes later you review things in another screen :

Early screenshot : the spectrum panel on the left now shows the individual polarisations as well as the averaged data.

The key part is the panel on the right. From the start I've accepted that this sort of thing is never going to be better than semi-automatic, having experienced first-hand the joys of a source extraction algorithm with 3% reliability. For this reason the review panel lets you sort the sources, moving their catalogue data into accepted, rejected, or uncertain folders, and it can also query NED and SDSS at the corresponding position for supplementary data.

How well does this work ? Well, quite ! It's a simple, reasonably fast technique (even without parallelisation) that gets respectable levels of completeness and reliability even at low S/N levels. I personally wouldn't use this or any other algorithm as a substitute for the billions of years of evolution behind the human eye, but as a supplement it's worth doing. It nearly always picks up at least a few credible candidates I'm surprised I missed.
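For anyone unfamiliar with the terms: completeness is the fraction of real sources that get found, and reliability is the fraction of candidates that turn out to be real (so 3% reliability means roughly 97 of every 100 candidates are junk). One common way to measure both is to inject fake sources into a cube and match the candidates against them. The sketch below assumes that setup, with a hypothetical function name and a simple box-shaped matching tolerance; it isn't how I evaluated GLADoS.

```python
def completeness_and_reliability(candidates, true_sources, tol=2.0):
    """Match candidate positions to known (e.g. injected) sources.

    completeness = fraction of true sources with at least one candidate nearby
    reliability  = fraction of candidates lying near at least one true source
    """
    def near(p, q):
        # Box-shaped tolerance in every coordinate (an assumption).
        return all(abs(a - b) <= tol for a, b in zip(p, q))

    found = sum(any(near(t, c) for c in candidates) for t in true_sources)
    genuine = sum(any(near(c, t) for t in true_sources) for c in candidates)
    return found / len(true_sources), genuine / len(candidates)

true_sources = [(10, 10, 100), (50, 50, 200), (80, 20, 300)]
candidates = [(11, 10, 101), (50, 51, 199), (30, 30, 30), (70, 70, 70)]
print(completeness_and_reliability(candidates, true_sources))
```

Here two of the three injected sources are recovered (completeness 2/3), and two of the four candidates are real (reliability 0.5). Loosening the matching criteria generally trades reliability for completeness.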

To be honest I haven't touched GLADoS since 2020. I was working on it properly until late 2019 but then I got "briefly" distracted and the pandemic struck. I don't think it would be a difficult matter to upgrade it to Python 3, but having spent most of 2023 finishing off FRELLED I'm in no mood to embark on another coding project of any kind. Still, expect more exact numbers if and when I get around to polishing it off.

Possible upgrades to GLADoS would be a curve-of-growth analysis for a more accurate velocity width estimate, and a way to search for signals with a high integrated S/N rather than just simple peaks. But source-finding algorithms always sound simple in theory while in practice they tend to spiral out of control, which is why I'm reluctant to start this one up again except to move to Python 3. Writing a source extractor is best described as one of those experiences which are character building, good for the soul... great to do once in a while, but you wouldn't want to do it all the time.
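Neither upgrade exists, but both are easy to sketch. A curve of growth sums the flux within an ever-wider window around the source centre; once the window contains the whole line the sum levels off, and the width can be taken as the narrowest window enclosing some fraction (here an arbitrary 90%) of that plateau. The integrated S/N sums the flux over the line and divides by the noise on that sum, which for independent Gaussian noise grows as √N. All the function names are hypothetical, and both calculations assume uncorrelated channels (smoothed data would need a correction).

```python
import numpy as np

def curve_of_growth(spectrum, centre, max_hw):
    """Integrated flux within +/- w channels of centre, for w = 0..max_hw."""
    return np.array([spectrum[max(0, centre - w):centre + w + 1].sum()
                     for w in range(max_hw + 1)])

def cog_width(spectrum, centre, max_hw, frac=0.9):
    """Narrowest full width (channels) enclosing frac of the flux at max_hw."""
    cog = curve_of_growth(spectrum, centre, max_hw)
    w = int(np.argmax(cog >= frac * cog[-1]))  # first half-width reaching frac
    return 2 * w + 1

def integrated_snr(spectrum, lo, hi, sigma):
    """S/N of the flux summed over channels lo..hi-1, assuming independent
    noise of rms sigma per channel (noise on the sum = sigma * sqrt(N))."""
    n = hi - lo
    return spectrum[lo:hi].sum() / (sigma * np.sqrt(n))

# A broad, faint line: 7 channels at 1 sigma each (noise-free for clarity).
spec = np.zeros(41)
spec[17:24] = 1.0
print(cog_width(spec, 20, 10))                       # 7
print(round(integrated_snr(spec, 17, 24, 1.0), 2))   # 2.65
```

The peak S/N of that line is only 1, so no sensible peak threshold would flag it, whereas its integrated S/N is about 2.6. That's the motivation for searching on integrated S/N for broad, faint sources.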