The experimental chatbot ChatGPT is having a moment.
Developed with artificial-intelligence techniques by the Silicon Valley research institute OpenAI and trained on a massive database of written text, the chatbot was released to the public as a free research preview last month — and quickly took the internet by storm. Users can ask it to answer questions, generate lesson plans, even write poetry and comedy sketches. No matter what prompt you throw at it, ChatGPT has an uncanny ability to generate fluid answers in simple, sturdy sentences that appear informed and knowledgeable.
Naturally, these capabilities have led some to speculate about how the technology could upend science education.
“Um… I just had like a 20 minute conversation with ChatGPT about the history of modern physics,” tweeted Peter Wang, a tech founder and self-described former physicist. “If I had this […] as a tutor during high school and college.... OMG. I think we can basically re-invent the concept of education at scale. College as we know it will cease to exist.”
So, we wondered: How well does ChatGPT answer astronomy questions? We figured the best way to find out was to ask the bot some of the questions our readers have submitted to the Ask Astro section of Astronomy.
To be blunt, it didn’t go so well. Based on about a dozen questions from recent issues that we put to ChatGPT (admittedly, a small sample), we can firmly recommend against relying on it as an astronomy tutor.
To be fair to ChatGPT, it handles many basic questions impressively. Ask it something like “What is an astronomical unit?” or “What is the life cycle of a star?” — something you might easily find in a textbook or a Wikipedia article — and it can generate a perfectly reasonable response.
But once you start asking questions whose answers are not so obvious and dig into the subtleties of astrophysics — the types of informed, discerning questions that our readers regularly send us — ChatGPT becomes prone to giving answers that seem plausible, but are very wrong. To make things worse, ChatGPT is also very bad at knowing when it is making things up, and it can’t cite any sources.
That’s not surprising. ChatGPT doesn’t perform any calculations or query any database of facts or statistics. It’s a language model, generating its responses word by word based on the patterns it learned from its training dataset. And although the reinforcement learning (RL) process used to train it has given it the ability to write natural-sounding prose, as OpenAI writes, “during RL training, there’s currently no source of truth.” So it makes sense that ChatGPT does well with questions whose answers (or close variants of them) appear readily in the reams of text it has processed, but struggles to explain complicated astrophysics.
Trial and error
Here’s one example of a question that led ChatGPT astray.
Q: On average, how many stars are formed each year in the Milky Way?
ChatGPT: It is difficult to say exactly how many stars are formed in the Milky Way each year, as the rate of star formation can vary over time. However, it is generally thought that the Milky Way produces about one new star per year on average. This may seem like a small number, but keep in mind that the Milky Way is a vast galaxy with billions of stars, so even a small number of new stars added each year can add up quickly.
This answer is built around a kernel of fact, but makes a crucial error — confusing the number of stars with the amount of material that makes them up. As astrophysicist Annette Ferguson of the University of Edinburgh in Scotland wrote in our November 2022 issue, astronomers estimate that roughly 1 to 2 solar masses of stars form each year in our galaxy. But, she explains: “Of course, not all of this mass goes into a single star. Most stars that form are low-mass stars with masses smaller than that of our Sun — the most common stars in our galaxy are red dwarfs. So, on average, we expect that roughly six to seven new stars form in the Milky Way every year.”
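The gap between ChatGPT’s figure and Ferguson’s comes down to simple division. The short Python sketch below uses only the numbers quoted above (the variable names are our own) to work out the average mass per new star implied by her estimates, and the average mass that ChatGPT’s “one new star per year” would quietly require.

```python
# Ferguson's estimate of the Milky Way's star-formation rate, in solar masses per year
mass_rate_low, mass_rate_high = 1.0, 2.0
# Her estimate of the number of new stars formed per year
stars_low, stars_high = 6.0, 7.0

# Average mass per new star = mass formed per year / stars formed per year.
# Pairing the extremes of each range gives the widest possible spread.
min_avg_mass = mass_rate_low / stars_high   # 1 / 7
max_avg_mass = mass_rate_high / stars_low   # 2 / 6
print(f"Implied average stellar mass: {min_avg_mass:.2f} to {max_avg_mass:.2f} solar masses")

# ChatGPT's "about one new star per year" would force each new star to carry
# the whole 1 to 2 solar masses, i.e., to be at least as massive as the Sun.
chatgpt_stars = 1.0
chatgpt_low = mass_rate_low / chatgpt_stars
chatgpt_high = mass_rate_high / chatgpt_stars
print(f"ChatGPT's figure implies: {chatgpt_low:.0f} to {chatgpt_high:.0f} solar masses per star")
```

Either way you slice the ranges, the typical newborn star comes out at roughly a seventh to a third of the Sun’s mass, in line with Ferguson’s point that low-mass red dwarfs dominate. ChatGPT’s answer only works if every new star is at least as massive as the Sun.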
ChatGPT also struggled with the intricacies of black hole research. This answer starts out okay (albeit tautologically), but goes downhill when it tries to sum up the current state of the art.
Q: What would happen if two supermassive black holes merged?
ChatGPT: When two supermassive black holes merge, the result is a single, larger black hole. The process of merging black holes is called a “black hole merger.”
During the merger, the black holes will release a tremendous amount of energy in the form of gravitational waves. These waves are ripples in the fabric of space-time that are created by the acceleration of massive objects. The waves produced by the merging of two supermassive black holes can be detected using highly sensitive instruments called gravitational wave detectors.
The merging of two supermassive black holes can also produce a bright burst of light called a quasar. Quasars are some of the brightest objects in the universe and are thought to be powered by the accretion of matter onto a supermassive black hole.
It’s worth noting that the merger of two supermassive black holes is a rare event, and it’s likely to take place over a timescale of millions or billions of years. However, the detection of gravitational waves from such a merger would be a major scientific discovery and could provide insight into the nature of black holes and the evolution of galaxies.
Contrary to ChatGPT’s confident assertion, current gravitational wave detectors cannot detect supermassive black hole (SMBH) mergers. All the black hole mergers detected to date by facilities like the Laser Interferometer Gravitational-Wave Observatory (LIGO) have involved stellar-mass black holes. Supermassive black holes, which sit at the centers of galaxies and hold millions of Suns’ worth of mass, produce gravitational waves as they spiral toward each other, but with wavelengths so long that it can take years for a single wave to pass Earth. There are ongoing attempts to detect the combined background sea of such waves from SMBH pairs throughout the universe, but not with the instruments we call gravitational wave detectors. One project, the NANOGrav collaboration, has gathered tantalizing evidence with radio telescopes by precisely timing the pulses from pulsars, whose signals may be perturbed by passing gravitational waves on their way across the galaxy to Earth.
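Why can’t LIGO hear these waves? A wave’s frequency is the inverse of its period, and because gravitational waves travel at the speed of light, its wavelength is the distance light covers in one period. The Python sketch below compares two illustrative cases we chose ourselves: a wave with an assumed period of one year, matching “years for a single wave,” and an assumed 100-hertz signal, a frequency inside the band that ground-based detectors like LIGO are built to record. The numbers are for illustration only, not measurements of any particular source.

```python
C = 299_792_458.0                      # speed of light in m/s; gravitational waves travel at this speed
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # one Julian year in seconds
LIGHT_YEAR_M = C * SECONDS_PER_YEAR    # one light-year in meters


def wave_properties(period_s: float) -> tuple:
    """Return (frequency in Hz, wavelength in m) for a wave with the given period in seconds."""
    frequency = 1.0 / period_s
    wavelength = C * period_s
    return frequency, wavelength


# Illustrative supermassive black hole pair: one wave passes Earth every year (assumed period)
smbh_f, smbh_lambda = wave_properties(SECONDS_PER_YEAR)
# Illustrative stellar-mass merger signal at 100 Hz (assumed frequency in LIGO's band)
ligo_f, ligo_lambda = wave_properties(1.0 / 100.0)

print(f"SMBH pair:  {smbh_f * 1e9:.1f} nHz, wavelength {smbh_lambda / LIGHT_YEAR_M:.1f} light-years")
print(f"LIGO band:  {ligo_f:.0f} Hz, wavelength {ligo_lambda / 1000:.0f} km")
print(f"Wavelength ratio: {smbh_lambda / ligo_lambda:.1e}")
```

A period of a year works out to a frequency of about 32 billionths of a hertz and a wavelength of a full light-year, some billions of times longer than a typical LIGO signal. Catching such slow ripples takes pulsar timing data gathered over many years, not a ground-based interferometer a few kilometers across.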
ChatGPT made another error in its answer, momentarily conflating a black hole merger with the process of accreting matter. A quasar is the brilliant core of an active galaxy, powered by a central black hole that is steadily feasting on matter; it is not a sudden flash of light that could accompany a supermassive black hole merger. (For more on why an SMBH merger could generate detectable light when stellar-mass black hole mergers do not, see the answer from Xin Liu, an astrophysicist at the University of Illinois, in our June 2022 issue.)