Needle in a Haystack: How Our AI Tracked the Chinese Balloon in Millions of Square Miles of Unlabeled Satellite Data
The path of the Chinese balloon, as mapped by our RAIC product using Planet Labs PBC data.
In February 2023, reports of a massive Chinese balloon flying over the U.S. captured national attention. The balloon delayed a diplomatic visit, drew unanimous condemnation in Congress, and even had some (misguided) people planning to shoot it down themselves.
Having launched high-altitude balloons of my own in the past, I was intrigued by the incident. Though much smaller (starting at around 10 feet in diameter), my balloons were like the Chinese balloon in that they were helium-filled and intended to carry payloads to very high altitudes. In my case, the balloons were designed to ascend to around 100,000 feet, high enough for their cameras to capture both the blackness of space and the curvature of the Earth.
Image from the apogee of my second balloon launch (90,700 feet).
In the days after the Chinese balloon was shot down off the coast of South Carolina by a fighter jet, as speculation about its origin and path raged on, I began to wonder: could Synthetaic’s RAIC (Rapid Automated Image Categorization) AI product do what no one else had done and find the balloon in satellite data? Equipped with commercial PlanetScope satellite imagery from Planet Labs PBC, I was about to find out that not only could we find the balloon in minutes, but we could also track it back to its original launch location.
A Textbook Case for RAIC
Artificial Intelligence (AI) has historically relied on extensive human labeling and the building of bespoke models. When you read about how much AI models have improved over the past five years, you might assume this problem has been solved, but it hasn’t. In fact, today’s AI algorithms are so data-hungry that the companies best known for their AI advancements are now training on image sets with billions of labeled images. In almost no real-world application is it tractable to get billions of labeled images. And in time-sensitive scenarios, the thousands of person-years it takes to label data at that scale make the entire process infeasible.
The Chinese balloon represented a perfect catch-22 for these methods: how could AI find something in satellite data that had never been found in satellite data before? There wasn’t a single reference image, let alone the tens of thousands needed to even start to train a traditional supervised model.
That’s where RAIC comes in. RAIC, or Rapid Automated Image Categorization, eliminates the need for labeled data or pre-existing trained models by facilitating collaboration between AI and a human user. The tool is designed to immediately detect and classify anything in visual data, whether that’s satellite imagery, photographs, or full-motion video. RAIC’s unique “human nudge” feature allows the user to guide the AI. In doing so, it brings together the amazing pattern-matching and contextual association capabilities of our human brains with the computer’s ability to sift through terabytes of data in minutes. We believe (and demonstrated with this project!) that this human-machine collaboration is fundamental to extracting AI-based insights at speed and scale.
Finding the Chinese balloon was a textbook use case for RAIC, and a perfect test of what the technology could help accomplish. Even I was surprised by how quickly it aced that test.
The Great Balloon Search
I started from a simple hand drawing of how I thought the balloon might look in satellite imagery. Knowing that many satellite-borne cameras take multiple photos through separate colored filters, separated by fractions of a second, I drew something resembling a red, green, and blue snowman. This would represent the balloon getting captured through these various filters as it moved with the high-altitude winds. I fed that image to RAIC and used it to search imagery taken above South Carolina before the balloon was shot down. Within two minutes, RAIC had returned a positive match.
This rough drawing of what the balloon might look like in satellite data was enough for RAIC to find a match over South Carolina in less than two minutes.
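The “snowman” intuition can be sanity-checked with simple arithmetic. The sketch below is a minimal illustration, not part of RAIC: it assumes the color bands are exposed one after another with a fixed delay, and it counts only the balloon’s own drift with the wind, ignoring parallax from the satellite’s motion. The wind speed, band delay, 3 m pixel size, band order, and the function name `band_offsets_px` are all hypothetical values chosen for illustration.

```python
def band_offsets_px(wind_speed_mps, band_delay_s, gsd_m, bands=("red", "green", "blue")):
    """Pixel offset of a drifting target in each band, relative to the first band.

    Counts only the target's own motion between exposures; parallax is ignored.
    All inputs are illustrative assumptions, not PlanetScope specifications.
    """
    step_px = wind_speed_mps * band_delay_s / gsd_m  # drift per exposure, in pixels
    return {band: i * step_px for i, band in enumerate(bands)}


# Hypothetical values: 30 m/s high-altitude wind, 0.5 s between bands, 3 m pixels.
offsets = band_offsets_px(30.0, 0.5, 3.0)
print(offsets)  # {'red': 0.0, 'green': 5.0, 'blue': 10.0}
```

With these assumed numbers, each successive band lands 5 pixels farther downwind, so a moving object separates into offset colored blobs while the co-registered ground stays aligned.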
Like any researcher, I was skeptical that we had really found it so quickly. We were able to confirm it was the balloon through size, shape, nearby social media reports, and wind maps showing where it was expected to be. We were also able to use parallax to calculate the balloon’s altitude. Once we knew what we’d found was the balloon, we could use that image as our new starting point in RAIC to find more locations.
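Parallax works because the satellite moves between exposures, so an object high above the ground appears to shift relative to the terrain. A minimal flat-Earth sketch, assuming the balloon’s own drift has already been removed, relates that shift to height: with satellite altitude H and baseline B (satellite speed times the delay between exposures), an object at height h shifts by Δx = B·h/(H − h), which inverts to h = H·Δx/(B + Δx). The function names and the orbital altitude, speed, and delay values below are hypothetical, not Synthetaic’s or Planet’s actual parameters.

```python
def parallax_shift_m(height_m, sat_alt_m, sat_speed_mps, band_delay_s):
    """Apparent ground shift of an object at height_m between two exposures (flat Earth)."""
    baseline_m = sat_speed_mps * band_delay_s  # distance the satellite travels between exposures
    return baseline_m * height_m / (sat_alt_m - height_m)


def height_from_parallax_m(shift_m, sat_alt_m, sat_speed_mps, band_delay_s):
    """Invert the relation above: estimate height from a measured ground shift."""
    baseline_m = sat_speed_mps * band_delay_s
    return sat_alt_m * shift_m / (baseline_m + shift_m)


# Hypothetical values: ~475 km orbit, ~7.6 km/s orbital speed, 0.1 s between exposures.
SAT_ALT_M = 475_000.0
SAT_SPEED_MPS = 7_600.0
BAND_DELAY_S = 0.1

# Round trip: predict the shift for an 18 km object, then recover the height from it.
shift = parallax_shift_m(18_000.0, SAT_ALT_M, SAT_SPEED_MPS, BAND_DELAY_S)
estimate = height_from_parallax_m(shift, SAT_ALT_M, SAT_SPEED_MPS, BAND_DELAY_S)
print(f"shift = {shift:.1f} m, recovered height = {estimate / 1000:.1f} km")
```

In practice the measured offset mixes this parallax with the balloon’s drift, so the two have to be separated (for example, with wind data or more than two exposures) before a height estimate like this is meaningful.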
The RAIC Context Map feature powers the human-machine collaboration. Each RAIC search returns an array of possible matches, clustered by similarity to the original source image and to each other. The user can then “nudge” the AI by indicating which results are positive matches.
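The nudge loop resembles classic relevance feedback from information retrieval. The sketch below uses the textbook Rocchio update over toy embedding vectors as a stand-in; RAIC’s actual embeddings and update rule are not described here, so the vectors, weights (`alpha`, `beta`, `gamma`), and candidate names are hypothetical. Marking results as positive or negative moves the query vector, and candidates are then re-ranked by cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors, dim):
    """Element-wise mean of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: pull the query toward positives and away from negatives."""
    dim = len(query)
    pos = centroid(positives, dim)
    neg = centroid(negatives, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


def rank(query, candidates):
    """Candidate names sorted from most to least similar to the query."""
    return sorted(candidates, key=lambda name: cosine(query, candidates[name]), reverse=True)


# Toy 3-D "embeddings"; real image embeddings typically have hundreds of dimensions.
candidates = {
    "balloon_a": [0.9, 0.1, 0.0],
    "cloud": [0.6, 0.6, 0.0],
    "balloon_b": [0.8, 0.0, 0.2],
    "ship": [0.0, 0.2, 0.9],
}
sketch_query = [1.0, 0.5, 0.0]  # stand-in for the embedding of the hand drawing

print(rank(sketch_query, candidates))
# The user nudges: balloon_a is a match, cloud is not.
nudged = rocchio(sketch_query, [candidates["balloon_a"]], [candidates["cloud"]])
print(rank(nudged, candidates))
```

With these toy vectors, the first ranking puts the cloud-like look-alike on top; a single nudge (one positive, one negative) lifts both balloon-like candidates above it.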
The team at Planet, the first satellite imagery company to image the Earth’s landmass every day, provided us with the commercially available data needed to map the balloon’s entire journey from China to the Atlantic Ocean. Without this daily scan and archive of the globe, we could never have found the balloon. If the balloon was a needle in Planet’s beautiful haystack of archive data, RAIC was like bringing a magnet to the search, making nearly trivial what was otherwise nearly impossible.
“Plotting this balloon path is truly a story of the unknown unknowns,” Kevin Weil, Planet’s president of product and business, remarked to me. “Given the size of our archive, it’s a veritable playground for companies like Synthetaic to train AI and ML models and to build algorithms that can extract objects and patterns – like tracking surveillance balloons over oceans – it’s all possible today.”
In this GIF assembled from 8-band imagery, the balloon can be seen over Missouri.
To target our search after the first detection, we continued to draw on open-source intelligence, as well as wind forecasts and NOAA’s HYSPLIT (Hybrid Single-Particle Lagrangian Integrated Trajectory) models. The process had the Synthetaic team in sleuth mode, combing through social media chatter about where the balloon had been spotted from the ground so we could load in additional PlanetScope satellite imagery.
Before long, the team had traced the balloon’s entire journey across the continental U.S., working backward from South Carolina to its entry point north of Spokane, WA. SpaceNews and Wired both shared the project with their readers, but at the time of their publications, we still had not followed the path all the way back to its origin. After a string of successful searches, the balloon grew more elusive the farther back we traced it, and we lost track of it completely over Alaska.
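Wind data narrows the search because it bounds where the balloon could have been earlier. The sketch below is a deliberately crude stand-in for a trajectory model such as HYSPLIT: it steps a position backward in time with a simple Euler scheme and a local flat-Earth approximation on each step, using a hypothetical `wind_at(lat, lon)` callback that returns eastward and northward wind in m/s. Real trajectory models interpolate gridded, time-varying winds in three dimensions; the constant wind and starting point here are assumptions for illustration only.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def back_trajectory(lat, lon, wind_at, hours, step_s=3600.0):
    """Step a position backward in time through a wind field (simple Euler scheme).

    wind_at(lat, lon) is a hypothetical callback returning (u, v): eastward and
    northward wind in m/s at that point. Returns the list of (lat, lon) positions.
    """
    path = [(lat, lon)]
    for _ in range(int(hours * 3600 / step_s)):
        u, v = wind_at(lat, lon)
        # Going back in time means moving against the wind.
        dlat_rad = -v * step_s / EARTH_RADIUS_M
        dlon_rad = -u * step_s / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
        lat += math.degrees(dlat_rad)
        lon += math.degrees(dlon_rad)
        path.append((lat, lon))
    return path


def constant_westerly(lat, lon):
    """Hypothetical wind field: a steady 30 m/s wind blowing toward the east."""
    return 30.0, 0.0


# Hypothetical starting point off the South Carolina coast, traced back 24 hours.
track = back_trajectory(33.0, -79.0, constant_westerly, hours=24)
print(f"{len(track) - 1} hourly steps, estimated position 24 h earlier: "
      f"{track[-1][0]:.2f}, {track[-1][1]:.2f}")
```

Under this assumed constant wind, the estimate moves roughly 28 degrees of longitude west in a day at 33° N; any uncertainty in the winds compounds with each step, which is why the search area has to widen the farther back the track goes.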
If this had happened as recently as a few years ago, that would have been the end of the story: a few days combing imagery of the Aleutian Islands for another sighting of the balloon, only to have to give up. Without knowing both what to look for and generally where to find it, most AI can only do so much.
But simply by finding the balloon in the data in the first place, RAIC had already done what was, until very recently, impossible. With Planet’s massive historical archive of Earth imagery and RAIC’s ability to search that data near-instantaneously, we would be able to do it again.
Working backward to model the balloon’s potential path prior to entering the U.S., we ingested into RAIC seven days of Planet data for a vast area of land and ocean containing eastern China, Taiwan, North Korea, South Korea, much of Japan, and the East China and South China Seas — totaling about 60 terabytes.
“This example highlights an under-appreciated aspect of Planet’s dataset — the ability to answer questions no one knew to ask,” said Robert Simmon, Planet’s senior data visualization engineer. “Our archive of high resolution, near-daily data preserves a unique record of events no matter where they occur. A record that can not only help locate an unusual flying object, but also determine when a road was built in a remote area or how the course of a river changed in the heart of the Amazon.”
Due to variables including wind speed and the possibility that the balloon was being actively controlled, we ingested a large area of Earth imagery, taken across seven days, to find the balloon again.
RAIC is incredibly efficient at searching terabytes of data, so ingesting that data was much more time-consuming than the RAIC search itself. During a late-night session, RAIC returned a positive match off the coast of Taiwan. We paused to celebrate on a Teams video call, and knew we were back on track.
Armed with a reference image of the balloon over the ocean, the Synthetaic team traced the balloon back to its origin near Hainan Island in the South China Sea. In total, RAIC achieved 12 detections of the balloon over its two-week journey from China to South Carolina, in addition to detecting a second, similar balloon over Colombia during the same period. To do so, RAIC searched a swath of satellite data equivalent to double the area of the Earth’s land surface.
In this GIF assembled from 8-band imagery, the balloon can be seen flying over the ocean near Taiwan.
RAIC had done exactly what it promises: from a haystack millions of square miles in size, it pulled the needle out like a magnet. It performed analysis that no other technology could have matched in this time frame. In fact, it performed analysis that most would have thought was impossible.
RAIC: The only limit is your imagination
We brought our findings to The New York Times, whose Visual Investigations Team thoroughly reported on them. Of particular interest to the Times were new insights and conclusions that could be drawn from the balloon’s exact locations (as opposed to social media conjecture) as revealed by RAIC.
For example, when the balloon’s path as mapped by RAIC is overlaid on a map of Department of Defense sites in ArcGIS, it becomes apparent that the balloon had not flown over any military bases. This contradicted initial reports: despite allegations that the balloon had flown over Malmstrom Air Force Base in Great Falls, Montana, its path took it south of that facility, including over the ghost town of Lennep, MT. That said, given the balloon’s altitude, its line of sight most certainly extended its RF signal and optical range across many sensitive sites.
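The line-of-sight point can be quantified with the standard geometric horizon formula, d = √(2Rh + h²), the straight-line distance from an observer at height h above a sphere of radius R to the horizon. This minimal sketch ignores atmospheric refraction (which typically extends radio horizons somewhat) and terrain, and the 18 km altitude is an assumed round number for illustration, not a measured value from this project.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0  # mean Earth radius


def horizon_distance_km(altitude_km, radius_km=EARTH_MEAN_RADIUS_KM):
    """Straight-line distance from an observer at altitude_km to the geometric horizon."""
    return math.sqrt(2 * radius_km * altitude_km + altitude_km ** 2)


ASSUMED_ALTITUDE_KM = 18.0  # hypothetical, roughly 60,000 ft
distance = horizon_distance_km(ASSUMED_ALTITUDE_KM)
print(f"Horizon at {ASSUMED_ALTITUDE_KM} km altitude: about {distance:.0f} km")
```

At that assumed altitude, the geometric horizon lies roughly 480 km (about 300 miles) away, so a single position can overlook a very large area of ground.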
Just as RAIC had found 12 instances of the balloon starting from a hand drawing, it was also able to find additional potential balloon launch sites in mainland China and Mongolia from a single example found elsewhere in the map data.
The potential applications of RAIC go far beyond this high-profile news story. There’s an almost unfathomable quantity of geospatial data available today, with more being captured every second, and RAIC makes it possible to understand all of it. To date, RAIC has been used in a variety of contexts and industries, including defense, search and rescue, science and technology, and conservation. For example, the World Food Programme is using RAIC to support time-sensitive disaster relief, building models to find flood victims.
RAIC fundamentally changes what’s possible with AI. Instead of thousands of labelers drawing boxes on data to train a model, or an army of analysts staring at satellite pixels, a single person can now nudge an unsupervised AI into what they need it to be. That allows them to build an AI and run it in seconds across millions of images. It takes us from a place of needing millions of labeled images to only needing (in this case) a hand drawing based on an educated guess. RAIC is the way AI is supposed to work.
The New York Times:
March 21, 2023
February 25, 2023
February 17, 2023