LOTSS DR2 - Observations & Processing

As shown in Fig. \ref{fig:DR2-region}, LoTSS-DR2 consists of 841 pointings and covers a total of 5634 square degrees, which corresponds approximately to our contiguous coverage at the start of the LoTSS-DR2 processing run. The data release is formed of two contiguous regions that are centred at approximately 12h45m00s +44$^\circ$30$\arcmin$00$\arcsec$ (RA-13 region) and 1h00m00s +28$^\circ$00$\arcmin$00$\arcsec$ (RA-1 region) and span 4178 and 1457 square degrees, or 626 and 215 pointings, respectively. The data were taken between 2014-05-23 and 2020-02-05 as part of the LoTSS projects LC2\_038, LC3\_008, LC4\_034, LT5\_007, LC6\_015, LC7\_024, LC8\_022, LC9\_030, LT10\_010 and the co-observing projects LC8\_014, LC8\_030, DDT9\_001, LC9\_011, LC9\_012, LC9\_019, LC9\_020, COM10\_001, LC10\_001, LC10\_010, LC10\_014, LT10\_012, LC11\_013, LC11\_016, LC11\_019, LC11\_020, LC12\_014. All the data that were processed as part of this data release are stored in the LOFAR Long Term Archive (LTA{\footnote{\url{https://lta.lofar.eu/}}}) with approximately 62\% in Forschungszentrum J\"{u}lich\footnote{\url{http://www.fz-juelich.de}}, 32\% in SURF\footnote{\url{https://www.surf.nl/}} and the remaining 6\% in Pozna\'{n}\footnote{\url{http://www.man.poznan.pl/online/pl/}}. The vast majority of pointings were observed for a total of 8\,hrs with 48\,MHz (120-168\,MHz) of bandwidth, which allows two pointings to be observed simultaneously with current LOFAR capabilities.
However, primarily due to the co-observing program\footnote{See \url{https://www.lofar-surveys.org/co-observing.html}}, through which we exploit the multi-beam capability of LOFAR and accumulate LoTSS data simultaneously with observations conducted for other projects, for 18 of the pointings in LoTSS-DR2 we have used data that have the same frequency coverage but a total integration time of $\sim$16\,hrs.
The overall observing time utilised for this data release is 3451\,hrs and the volume of archived data that was processed is 7.6\,PB. Thus the average data size for an 8\,hr pointing (two observed simultaneously) is 8.8\,TB, but there is significant variation: data recorded since 2018-09-11 are typically five times smaller than earlier data because the radio observatory now applies Dysco compression (\citealt{Offringa_2016}) prior to ingesting data into the LTA.
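
As a rough consistency check on these figures (a sketch that assumes every observation recorded two pointings simultaneously, that 1\,PB = 1000\,TB, and that the quoted average is simply the total volume divided by the number of 8\,hr pointing equivalents), the observing time corresponds to
\[
N_{\rm 8hr} \approx \frac{2 \times 3451\,{\rm hr}}{8\,{\rm hr}} \approx 863
\]
pointing equivalents, and hence
\[
\frac{7.6\,{\rm PB}}{N_{\rm 8hr}} \approx \frac{7600\,{\rm TB}}{863} \approx 8.8\,{\rm TB}
\]
per 8\,hr pointing, in agreement with the average quoted above.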

To process the data they are first `staged' in the LTA; staging is the
procedure of copying data from tape to disk and is necessary to make
the large archived datasets available for transfer to a compute
cluster. The data are then processed with a direction independent (DI)
calibration pipeline that is executed on compute facilities at
Forschungszentrum J\"{u}lich and SURF (see \citealt{Mechev_2017}
and \citealt{Drabent_2019}). These compute clusters are connected to
the local LTA sites with sufficiently fast connections to mitigate the
difficulties that would be experienced if we were to download these
large datasets to external facilities. Unfortunately, data transfer issues are not yet fully mitigated: we do not currently process data on a compute cluster local to the Pozna\'{n} archive and instead copy these data (6\% of LoTSS-DR2) to Forschungszentrum J\"{u}lich or SURF for processing.

The DI calibration
pipeline\footnote{\url{https://github.com/lofar-astron/prefactor}}
used for this data processing follows the same procedure as that used
in LoTSS-PDR and LoTSS-DR1. This method is described in \cite{vanWeeren_2016}
and \cite{Williams_2016} and makes use of several software packages
including the Default Pre-Processing Pipeline (DP3;
\citealt{van_Diepen_2018}), LOFAR SolutionTool (LoSoTo;
\citealt{deGasperin_2019}) and AOFlagger (\citealt{Offringa_2012}).
The pipeline corrects for direction independent errors such as the
clock offsets between different stations, ionospheric Faraday
rotation, the offset between XX and YY phases and amplitude
calibration solutions (see \citealt{deGasperin_2019} for a detailed
description of these effects). The \cite{Scaife_2012} flux density scale is
used for the amplitude calibration and we use TGSS-ADR1 sky models\footnote{The TGSS-ADR1 catalogues have gaps in the region around 8h45m +31$^\circ$30$\arcmin$ and here we use the \cite{Scheers_2011} LOFAR Global Sky Model instead} of our
target fields for an initial phase calibration, although both the
amplitude and phase calibration are refined during subsequent
processing. For regular LoTSS processing we have set up the pipeline to reduce the data volume, typically by a factor of 64, by averaging in both time and frequency. This is because the archived LoTSS data typically have a frequency resolution of 16 channels per 0.195\,MHz subband and a time resolution of 1\,s to facilitate future studies with the international LOFAR stations as well as spectral and time dependent studies, but such high time and frequency resolution data are not required for 6$\arcsec$ imaging. During the DI calibration the data are therefore averaged to a frequency resolution of 2 channels per 0.195\,MHz subband and a time resolution of 8\,s.
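
The factor of 64 follows directly from the resolutions given above, as the product of the frequency and time averaging factors:
\[
\frac{16\ {\rm channels}}{2\ {\rm channels}} \times \frac{8\,{\rm s}}{1\,{\rm s}} = 8 \times 8 = 64 .
\]
The corresponding averaged channel width is $0.195\,{\rm MHz}/2 \approx 0.098\,{\rm MHz}$. Assuming the nominal LOFAR subband width of 195.3125\,kHz (a value not stated in this section), this is 97.66\,kHz.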

Once the DI calibration pipeline is complete, the smaller, more averaged, output datasets can be downloaded to other compute clusters for further processing with a more computationally expensive direction dependent (DD) calibration and imaging pipeline\footnote{\url{https://github.com/mhardcastle/ddf-pipeline}}. The DD routine is an improvement upon that used in LoTSS-DR1 and again makes use of kMS (\citealt{Tasse_2014} and \citealt{Smirnov_2015}) for direction dependent calibration, and of DDFacet (\citealt{Tasse_2018}) to apply the direction dependent solutions during imaging. Compared to LoTSS-DR1, the most significant improvements are in the fidelity of faint diffuse emission and in the dynamic range (see Sects. \ref{sec:emission_recovery} and \ref{sec:dynamic_range}, respectively). The LoTSS-DR2 DD pipeline and its performance are described in detail in \cite{Tasse_2021}; however, for completeness we briefly summarise the procedure below.

We begin the processing with just a quarter of the DI calibrated channels (spaced across the frequency coverage) by creating a wide-field
($8.3^\circ \times 8.3^\circ$) image. Using the resulting sky model we
revise the direction independent calibration and tessellate the field
into 45 different directions. The recalibrated data are imaged to
update the sky model, and with the new model, calibration solutions are derived towards each of the 45 directions simultaneously. We then image the wide field again, this time applying the phase corrections from the direction dependent calibration solutions, which allows us to produce a further improved sky model.
Here we perform an initial refinement of the
flux density scale through the bootstrap procedure described by
\cite{Hardcastle_2016}, which was also used in the LoTSS-DR1
processing. The flux density scale is further refined during mosaicing but this initial refinement helps ensure emission is described by a power-law which aids the deconvolution.
Direction dependent calibration solutions are again
derived from the up-to-date sky model and this time both the amplitude
and phase are applied in the subsequent imaging step. Using these
solutions, together with the updated sky model, we predict the
apparent direction-independent view of the sky and perform a further
direction-independent calibration step using that model and a further
imaging step. All the data are then included for the first time and
direction-independent followed by direction-dependent calibration solutions
are derived using the latest sky model. The data are then imaged
again, and further direction-dependent calibration solutions are
derived from the resulting sky model before the final imaging steps
are conducted with the latest calibration solutions.

The final imaging
steps result in: (i) full-bandwidth high (6$\arcsec$) and low (20$\arcsec$)
resolution Stokes I images; (ii) three 16\,MHz bandwidth high (6$\arcsec$) resolution Stokes I images with central frequencies of 128, 144 and 160\,MHz; (iii) Stokes Q and U low (20$\arcsec$) and very
low (4$\arcmin$) resolution undeconvolved image cubes with a frequency
resolution of 97.6\,kHz; and (iv) a Stokes V full-bandwidth low
(20$\arcsec$) resolution undeconvolved image. Only the Stokes I products are deconvolved, owing to the deconvolution capabilities of DDFacet at the time of processing.
Once the data are processed, the final products are archived and an
automated quality assessment of the image is conducted to assess the
astrometry, flux density scale accuracy and noise level.
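
If, as the quoted central frequencies suggest, the three 16\,MHz images tile the 120-168\,MHz band contiguously and without overlap (an assumption that is not stated explicitly here), the band edges and centres are
\[
\nu_k^{\rm lo} = (120 + 16k)\,{\rm MHz}, \qquad \nu_k^{\rm c} = \nu_k^{\rm lo} + 8\,{\rm MHz}, \qquad k = 0, 1, 2,
\]
which gives the bands 120-136, 136-152 and 152-168\,MHz with central frequencies of 128, 144 and 160\,MHz.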

Some notable aspects of the DD pipeline processing include the improvement of
the astrometric accuracy of the final high resolution Stokes I images
by performing a facet-based astrometric alignment (as in LoTSS-DR1) with sources in
the Pan-STARRS optical catalogue (\citealt{Flewelling_2020}) and applying
appropriate shifts when imaging (see \citealt{Shimwell_2019}).
To deconvolve thoroughly, we refine the masks used for deconvolution throughout the processing.
We also continuously propagate previously derived deconvolution components to subsequent
imaging steps to avoid having to fully deconvolve at each imaging
iteration, and we regularise the calibration solutions to effectively
reduce the number of free parameters that are applied when imaging.
Moreover, as characterised in Sect. 3.3 of \cite{Shimwell_2019} and detailed in \cite{Tasse_2018}, by using a facet-dependent point spread function we account for time-averaging and bandwidth-smearing effects (e.g. \citealt{Bridle_1999}) for deconvolved sources. These effects would otherwise be significant (a $\sim30\%$ reduction in peak brightness at a distance of 2.5$^\circ$ from the pointing centre) when imaging at 6$\arcsec$ with 2 channels per 0.195\,MHz subband and a time resolution of 8\,s. Finally, we note that the restoring beam used in DDFacet for each image product type is kept constant over the data release region and that all image products are made with a $uv$-minimum of 100\,m, with the $uv$-maximum varied to provide images at different resolutions; the highest resolution 6$\arcsec$ images use baselines up to 120\,km (i.e. all LOFAR stations within the Netherlands).
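
The size of the smearing effect can be estimated with the approximate expressions commonly quoted from \cite{Bridle_1999}. This is an illustrative estimate, not the calculation performed in the pipeline. It assumes a square bandpass with Gaussian tapering, the first-order time-averaging approximation for a source near the celestial pole, and a reference frequency of $\nu_0 = 144$\,MHz. For a source at distance $\theta$ from the pointing centre, a synthesised beam of width $\theta_{\rm b}$, a channel width $\Delta\nu$ and an integration time $\tau$ in seconds, the peak brightness is reduced by the factors
\[
R_{\Delta\nu} \approx \frac{\sqrt{\pi}}{2\sqrt{\ln 2}}\,\frac{1}{\beta}\,\mathrm{erf}\left(\sqrt{\ln 2}\,\beta\right), \qquad \beta = \frac{\Delta\nu}{\nu_0}\,\frac{\theta}{\theta_{\rm b}},
\]
\[
R_{\tau} \approx 1 - 1.22\times10^{-9}\left(\frac{\theta}{\theta_{\rm b}}\right)^{2}\tau^{2} .
\]
With $\theta = 2.5^\circ = 9000\arcsec$, $\theta_{\rm b} = 6\arcsec$, $\Delta\nu \approx 97.6$\,kHz and $\tau = 8$\,s, we have $\theta/\theta_{\rm b} = 1500$ and $\beta \approx 1.02$, so that $R_{\Delta\nu} \approx 0.80$ and $R_{\tau} \approx 0.82$. The combined factor is $R_{\Delta\nu} R_{\tau} \approx 0.66$, a reduction in peak brightness of roughly one third, which is broadly consistent with the $\sim30\%$ quoted above.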

The DD calibration has been primarily conducted on the LOFAR-UK compute
facilities\footnote{\url{https://lofar-uk.org/lucf.html}} hosted at
the University of Hertfordshire, but a small fraction of processing was also carried out
on the Italian LOFAR computing facilities\footnote{\url{http://www.lofar.inaf.it/index.php/en/analisi-dati-en/computationa-data-analysis}} and compute clusters at Leiden University and the University of Hamburg. The DI and DD processing, as well as
the observational status and quality indicators, are all tracked
in central MySQL databases which are updated during the data processing. This allows us to easily coordinate automated processing across many different compute clusters with minimal user interaction.

The mosaicing and cataloguing follow the same procedure as used
for LoTSS-DR1, which is described in \cite{Shimwell_2019}. That is, a mosaic is produced for each pointing by reprojecting all neighbouring pointing images onto the same frame as the central pointing and averaging the images together using weights equal to the station beam attenuation combined with the image noises. Poorly calibrated facets, which are generally caused by severe ionospheric or dynamic range effects, are identified in each image as those with astrometric errors larger than 0.5$\arcsec$ (derived from cross matching with Pan-STARRS) and these regions are blanked in the individual pointing images prior to mosaicing. On average this results in 15$\pm$22\% of the pixels within 30\% of the primary beam power level being excluded for a given pointing. Unlike in LoTSS-DR1, we further refine the flux density scale of the images during the mosaicing procedure by applying the method that is described in Sect. \ref{sec:flux_scale}. Sources are detected on the mosaiced images using \textsc{PyBDSF} (\citealt{Mohan_2015}) with wavelet decomposition, a 5$\sigma_{LN}$ peak detection threshold and a 4$\sigma_{LN}$ threshold to define the boundaries of source islands, where $\sigma_{LN}$ is the local background noise. During source detection, \textsc{PyBDSF} characterises emission with Gaussian components which are automatically combined into distinct sources to create the source catalogue. This automated association of Gaussian components into final sources is limited by factors such as the complexity and extent of the source structures, the angular separation between components of the emission related to the same source, and the entanglement of emission from distinct objects. As described in Sect. \ref{sec:value_added_cats}, our attempts to refine the \textsc{PyBDSF} catalogues through source association/deblending and cross-identification with optical/infrared data (e.g. \citealt{Williams_2019} and \citealt{Kondapally_2021}) are ongoing.
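
As an illustration of the mosaic weighting, one common choice is inverse-variance weighting. This is a sketch under that assumption: the symbols below are introduced here for illustration only, and the exact form of the weights used in the pipeline is not restated in this section. Let $I_i(\mathbf{x})$ be the beam-corrected image of pointing $i$ reprojected onto the mosaic frame, $B_i(\mathbf{x})$ the station beam attenuation and $\sigma_i$ the noise of the image before beam correction. The noise in the beam-corrected image is then $\sigma_i / B_i(\mathbf{x})$, and the mosaic and its expected noise are
\[
M(\mathbf{x}) = \frac{\sum_i w_i(\mathbf{x})\, I_i(\mathbf{x})}{\sum_i w_i(\mathbf{x})}, \qquad w_i(\mathbf{x}) = \frac{B_i^2(\mathbf{x})}{\sigma_i^2}, \qquad \sigma_M(\mathbf{x}) = \left[\sum_i w_i(\mathbf{x})\right]^{-1/2},
\]
where the sums run over the central pointing and its neighbours, and blanked facets contribute $w_i = 0$.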

The mosaic images, and the catalogues derived from them, overlap significantly, so when producing the final full-area catalogue we remove duplicate sources by keeping each source only in the mosaic whose centre it lies closest to. Our final full-area catalogue consists of 4,396,228 radio sources made up of 5,121,366 Gaussian components. The overall sensitivity distribution is shown in Fig. \ref{fig:mosaic-noisemap} and some example maps from the data release are shown in Fig. \ref{fig:example-maps}.
