Rhys Taylor's home page
Science, Art and Data Visualisation

Visual Source Extraction

Why it's okay to look at data

The human eye is capable of recognising a source in a fraction of a second. In the blink of an eye we can tell if a dark forest is empty or home to a hungry tiger – if we couldn't do this, our ancestors wouldn't have survived. True, many of them didn't. But enough did. Our visual capabilities are both extremely accurate and extremely fast, even by the standards of modern computers.


Pretty much everyone accepts the need to look at data at some level. It simply isn't possible to objectively quantify every conceivable sort of source geometry; if you want to spot something weird, by far the simplest option is to just look at the damn data. This can influence everything from the structure of individual galaxies to the counter-intuitive structures visible (or not visible) in the noise.


I don't want to give the impression that there's some pervading anti-visual bias in the astronomical community. There isn't, and there are times when the data set is so large that visual techniques just aren't possible. What I do want to stress, though, is that great care needs to be taken when we have to proceed purely algorithmically, and that neither technique is inherently better or worse than the other. See the data structures page to understand the basics of the data I describe here.



Visually cataloguing data

Despite human visual prowess, merely glancing at the data isn't enough. For AGES our strategy is to inspect the data using a series of different approaches, progressively becoming more and more sensitive :


  1. Look at the data as a 3D volume, masking sources wherever they're clearly visible. This 3D view can only really show the brightest sources, but it's the easiest way to see the whole source at once. It rapidly gives an intuitive sense of the structure of the data and it's easy to create masks around each source. (Masking simply means creating a virtual object that blocks the view of part of the data.)
  2. Using this masked cube, go through the cube slice-by-slice in the projection which shows Right Ascension on the x-axis and velocity on the y-axis. In this view most of our galaxies look like short lines, which are easy to distinguish from the pseudo-random noise in the background. Showing individual slices like this gives significantly higher sensitivity than the 3D view, but defining 3D masks in a 2D view is necessarily slower - we have to keep panning back and forth through the slices to check each mask is the right size. Ah, but since we've already masked the brightest galaxies, this step is minimised.
  3. Just for luck we also go through each channel (that is, with the x-axis as Right Ascension and the y-axis as Declination, with each image corresponding to a slightly different velocity). This is even more tedious than step two, since in this projection the galaxies only show up as brighter patches of noise. Their only distinguishing feature is that they persist across multiple images. But since we've already done steps one and two, and use the masked cube here, this isn't nearly so bad as it might have been.
  4. We now give the data to someone else (an unfortunate victim) and get them to do the whole thing again. However, we may or may not give them the fully masked cube to make their life easier, depending on whether we want greater completeness or reliability (something multiple people find independently is much more likely to be real, and less subject to the transient whims of one bored astronomer).
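
For anyone who thinks better in code, here is a minimal sketch of what the masking and the two slice-by-slice projections amount to. It is not how FRELLED does it: the cube is a toy array of random noise with one fake galaxy injected, the axis order (velocity, Dec, RA) is an assumption, and the function names `mask_box`, `ra_velocity_slice` and `channel_map` are made up purely for illustration.

```python
import numpy as np

# Hypothetical toy cube with axes ordered (velocity, Dec, RA). Real cubes
# may use a different axis order, so check the header before slicing.
rng = np.random.default_rng(0)
n_vel, n_dec, n_ra = 64, 32, 32
cube = rng.normal(0.0, 1.0, size=(n_vel, n_dec, n_ra))

# Inject a fake "galaxy": small on the sky but spread over many channels.
cube[20:30, 10:12, 15:17] += 10.0


def mask_box(cube, vel, dec, ra):
    """Return a copy of the cube with a box-shaped region blanked (NaN).

    vel, dec and ra are (start, stop) pixel ranges, stop exclusive.
    """
    masked = cube.copy()
    masked[vel[0]:vel[1], dec[0]:dec[1], ra[0]:ra[1]] = np.nan
    return masked


def ra_velocity_slice(cube, dec_index):
    """Step 2 view: RA on the x-axis, velocity on the y-axis, at fixed Dec."""
    return cube[:, dec_index, :]


def channel_map(cube, vel_index):
    """Step 3 view: RA on the x-axis, Dec on the y-axis, at fixed velocity."""
    return cube[vel_index, :, :]


# Step 1: mask the obvious source found in the 3D view.
masked = mask_box(cube, vel=(20, 30), dec=(10, 12), ra=(15, 17))

# Steps 2 and 3: the masked source is blanked (NaN) in both projections,
# so nobody records it a second time.
pv = ra_velocity_slice(masked, dec_index=10)
chan = channel_map(masked, vel_index=25)
```

In the unmasked RA-velocity slice the fake galaxy is a short vertical line (narrow in RA, long in velocity), whereas in a single channel map it is just a small bright patch, which is why step three is the more tedious one.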


This isn't always as linear as it sounds. In regions contaminated by interference there are often many structures that look for all the world like galaxies, and even elsewhere it can be damn hard to distinguish between a faint ripple in the noise and a genuine signal. So we often flick back and forth between different viewing techniques, and then we might jointly decide as a team if something is worth keeping or not. We also inspect the spectra and use other, independent visualisation techniques.


A related point is that we can also compare the optical images of the regions where we think we've detected something. I emphasise "can" because whether this is a good idea or not depends on what you want to do ! If you're interested in dark galaxies this is often a bad idea, since you risk biasing your sample in favour of detections which have a corresponding galaxy. But if you find a faint signal in a region you can see is badly affected by interference, this can be a valuable way to save yourself a lot of hassle : as a rule, objects without optical counterparts tend to be far more likely to be spurious than real. This extra check needs to be used carefully, and there isn't a foolproof guide as to how to do this. The only real way to learn source extraction is by experience – though it isn't difficult, and can be easily grasped in a few days. In fact I'm pretty sure a monkey could do it, if we had access to a monkey... stupid, narrow-minded funding agencies... grrr !


Likewise, we can also supplement the data with known catalogues. For example, if we know a galaxy is present in our survey volume, we can just extract the spectrum at that point and see what's there. Knowing the coordinates and redshift of a galaxy makes even a faint signal detected at that location a lot more convincing. Again, this method is perfectly legitimate in the right circumstances, but it needs to be used judiciously.
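
As a rough sketch of what "extract the spectrum at that point" means in practice, the snippet below averages a small box of pixels around a known position and compares the channels where the catalogue says the galaxy should be against the rest of the spectrum. Everything here is an assumption for illustration: the toy cube, the (velocity, Dec, RA) axis order, the function name `extract_spectrum`, the 3x3 box and the crude signal-to-noise estimate. Converting real sky coordinates and redshifts into pixel and channel indices is a separate step that isn't shown.

```python
import numpy as np


def extract_spectrum(cube, dec_pix, ra_pix, half_width=1):
    """Average the spectrum over a small box around (dec_pix, ra_pix).

    Assumes axes ordered (velocity, Dec, RA). half_width=1 gives a 3x3 box.
    """
    d0, d1 = max(dec_pix - half_width, 0), dec_pix + half_width + 1
    r0, r1 = max(ra_pix - half_width, 0), ra_pix + half_width + 1
    return np.nanmean(cube[:, d0:d1, r0:r1], axis=(1, 2))


# Toy data: pure noise plus a faint fake source at a "catalogued" position.
rng = np.random.default_rng(1)
cube = rng.normal(0.0, 1.0, size=(64, 32, 32))
cube[20:30, 9:12, 14:17] += 3.0

spectrum = extract_spectrum(cube, dec_pix=10, ra_pix=15)

# Channels where the (hypothetical) catalogue redshift puts the galaxy.
on_channels = np.arange(20, 30)
signal = spectrum[on_channels].mean()

# Crude noise estimate from all the other channels.
off = np.delete(spectrum, on_channels)
snr = signal / off.std()
```

A source this faint would be easy to miss in a blind search, but knowing in advance exactly where and at which velocity to look makes the bump in `spectrum` far more convincing, with the caveat already given about using this judiciously.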


Finally, the above procedures were developed in conjunction with the dedicated visualisation and cataloguing package FRELLED. Prior to that we used kvis, which has no cataloguing features (coordinates need to be recorded by the moral obscenity of manually typing them !) and no 3D display. In those days we used to go through each projection of the cube one by one. Masking sources was possible but incredibly tedious, leaving us with a frankly stupid choice : don't mask and hope we didn't record the same sources twice, or do the masking and risk dying of boredom. And we'd have all of this done by at least two or three different people. Fortunately with FRELLED all that has passed into oblivion. Even so, we normally use an automatic extractor as a fifth step in the process.