Big Data for Big Questions

Astronomers deal with big questions like ‘How did we get here?’, ‘What was there at the beginning?’ and ‘What is our fate?’. So it feels inevitable that answering these should be hard, and that we should approach them with caution. After all, we cannot simply believe in our answers; instead we need to agree on evidence-based conclusions drawn from data. As our questions take us deeper and deeper into these mysteries, we need more and more data. And therein lies the biggest problem facing us today: how do we deal with such Big Data?

The Guardian newspaper recently published an article on this.

Astronomical data is and has always been big data. Once that was only true metaphorically; now it is true in all senses. We acquire it far more rapidly than we can process, analyse and exploit it. This means we are creating a vast global repository that may already hold the answers we are seeking to some of the fundamental questions of the Universe.

Does this mean we should cancel our upcoming missions and telescopes? After all, why continue to order food when the table is replete? Of course not. What it means is that, while we continue our inevitable yet budget-limited advancement into the future, we must also do justice to the data we have already acquired.

Citizen science is one solution. Sites like Galaxy Zoo and other projects simultaneously engage the public and perform a vital scientific role.

But the near future presents a new set of problems.

Thus far, human ingenuity and current technology have ensured that data storage capabilities have kept pace with the massive output of the electronic stargazers. The real struggle now is figuring out how to search and synthesize that output.

The DKI solar telescope in Hawaii will produce 15-20 Tbyte of data per day, starting in 2017. We need to be able to visualize that, make it science-ready, and then transport it across the internet. As such, we are looking at new ways of data mining, machine learning and database systems to help us understand our nearest star.
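To get a feel for what transporting that volume across the internet implies, here is a rough back-of-the-envelope sketch. It assumes decimal terabytes (1 Tbyte = 10^12 bytes) and a transfer spread evenly over 24 hours; both are assumptions made for illustration, not figures from the telescope project, and the function name is made up for this example.

```python
# Sustained network rate needed to move one day's data within one day.
# Assumptions (illustrative, not from the telescope project):
#   - decimal terabytes: 1 Tbyte = 10**12 bytes
#   - the transfer is spread evenly over 24 hours, with no protocol overhead

SECONDS_PER_DAY = 24 * 60 * 60
BITS_PER_BYTE = 8


def sustained_gbit_per_s(tbytes_per_day: float) -> float:
    """Average rate in gigabits per second for a given daily data volume."""
    bits_per_day = tbytes_per_day * 10**12 * BITS_PER_BYTE
    return bits_per_day / SECONDS_PER_DAY / 10**9


# The two ends of the 15-20 Tbyte/day range quoted above.
for volume in (15, 20):
    print(f"{volume} Tbyte/day -> {sustained_gbit_per_s(volume):.2f} Gbit/s")
```

Under these assumptions the range works out to roughly 1.4 to 1.9 Gbit/s, sustained around the clock, before any overhead or downtime is taken into account.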

It seems that the original science of data, astronomy, has a lot to learn from the new kid on the block, data science. Think about it. What if, as we strive to acquire and process more photons from across the farther reaches of the universe, from ever more exotic sources with even more complex instrumentation, the answers are already here, somewhere in a dusty server on Earth, if we would only pick up that dataset and look at it … possibly for the first time?