Chris Broekema (ASTRON)
Context and introduction
Data-intensive natural sciences collect vast amounts of data and process them with dedicated software on general-purpose computing hardware to produce scientifically useful data products. Fields such as radio astronomy therefore require large amounts of computational resources to produce science results. While their energy cost is not at the scale of a generic cloud-provider data center, in the current climate crisis we cannot continue to consume resources without limit.
Goals and tools in this project
In this project we aim to estimate and visualize both the energy efficiency and the computational efficiency of some of the flagship codes used with the LOFAR telescope. We take the open source codes, available from the GitLab instance hosted by ASTRON and from gitlab.com (https://git.astron.nl/RD/DP3, https://gitlab.com/aroffringa/wsclean), and analyze them using both classic and AI-enhanced analysis tools. Combined with a representative data set and a combination of input parameters, this analysis will yield an estimated efficiency and runtime (on a virtual piece of hardware).
Using tools developed by ASTRON in collaboration with the Netherlands eScience Center (https://github.com/NLeSC/PowerSensor, https://github.com/nlesc-recruit/PowerSensor3, https://git.astron.nl/RD/pmt) as well as open source tools developed elsewhere (https://github.com/eas4dc/EAR), we compare the estimated impact of the code with a measured ground truth.
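As an illustration of what a measured ground truth could look like, the sketch below turns a series of timestamped power readings into consumed energy using trapezoidal integration. It does not use the API of any of the tools listed above; the function name energy_joules and the sample values are hypothetical, and the real tools may report energy directly.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule.

    Hypothetical helper: assumes the samples are sorted by time.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


# Made-up readings for illustration only: one sample per second.
example_t = [0.0, 1.0, 2.0, 3.0]
example_p = [100.0, 120.0, 120.0, 100.0]
example_energy = energy_joules(example_t, example_p)
print(f"energy consumed: {example_energy:.1f} J")
```

A higher sampling rate reduces the integration error for codes whose power draw changes quickly, which is one reason to prefer dedicated measurement hardware over coarse-grained counters.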
Deliverables
We expect this to be an MSc thesis level project. A successful project will result in:
- A technical analysis of the selected code repository, with detailed bottleneck discussions and suggestions for solution directions
- A test run of the selected code, with a representative data set and combination of configuration parameters, while measuring the energy consumed via the tools mentioned above
- An analysis of the difference between measured and estimated energy consumed by the code
- An MSc thesis
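One simple way to express the difference between estimated and measured energy is a signed relative error. The sketch below is an assumption about how that comparison might be quantified, not a prescribed method; the function name relative_error and the numbers are hypothetical.

```python
def relative_error(estimated_j: float, measured_j: float) -> float:
    """Signed relative error of an energy estimate against a measurement.

    Positive values mean the estimate is too high, negative too low.
    Hypothetical helper for illustration.
    """
    if measured_j <= 0:
        raise ValueError("measured energy must be positive")
    return (estimated_j - measured_j) / measured_j


# Made-up values for illustration only.
error = relative_error(estimated_j=300.0, measured_j=340.0)
print(f"estimate deviates by {error:+.1%} from the measurement")
```

Reporting the error per run, across several data sets and parameter combinations, would show whether the estimate is systematically biased or merely noisy.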