Background
A traditional Artificial Intelligence (AI) method of feature detection relies on a human developing an algorithm they think will allow a computer to identify a feature. In contrast, a Machine Learning (ML) AI approach is capable of learning and improving as it is further trained. While just five years ago ML required significant programming acumen, today free packages exist. Ultralytics’ You Only Look Once (YOLO) is a popular free ML package that has been used in some recent papers to identify impact craters.
There are three primary types of feature recognition (all implemented in YOLO): classification, detection, and segmentation. Classification would say, “there is a computer monitor in this image,” and if I have trained it to distinguish different types of monitors (such as flat-screen TVs versus cathode ray tube, CRT, TVs), it could tell you what type of monitor. Detection is, “there is a computer monitor within this [tight] bounding box.” Segmentation is, “here is a trace of the computer monitor’s outline.” In this specific research, I was primarily interested in impact crater detection; ML-based segmentation was a reach goal I was unable to achieve.
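As a concrete illustration, the Ultralytics package exposes these three tasks through different pretrained model checkpoints. A minimal sketch is below; the checkpoint filenames are the standard YOLOv8 ones, the image path is a placeholder, and the generic pretrained models would of course need retraining on crater data before they recognize craters:

```python
from ultralytics import YOLO  # free Ultralytics package

# Each task uses a different model variant (standard YOLOv8 checkpoints);
# out of the box these are generic models, not crater-trained ones.
classifier = YOLO("yolov8n-cls.pt")  # classification: "this image contains X"
detector   = YOLO("yolov8n.pt")      # detection: "X is within this bounding box"
segmenter  = YOLO("yolov8n-seg.pt")  # segmentation: "here is X's outline"

# "tile.png" is a placeholder for any image tile.
results = detector("tile.png")
for box in results[0].boxes:
    print(box.xyxy, box.conf)  # bounding-box corners and confidence
```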
For impact craters, we are interested in three metrics: true positives, false positives, and false negatives (TP, FP, and FN). These are, respectively: real craters correctly identified, features that are not craters but were identified as craters, and real craters that were missed. The goal is to maximize the first and minimize the other two. In the AI field, these are usually expressed as precision (P) and recall (R): P = TP ÷ (TP + FP), while R = TP ÷ (TP + FN). If TP is large while FP and FN are very small, P and R approach 1.0. One can think of precision as, “the fraction of found features that are real,” and recall as, “the fraction of existing, real features that were found.”
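As a minimal numerical illustration of these definitions (the counts here are made up for the example):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = fraction of found features that are real;
    recall = fraction of real features that were found."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical grading tally: 900 real craters found, 50 false detections,
# 100 real craters missed.
p, r = precision_recall(tp=900, fp=50, fn=100)
print(f"P = {p:.3f}, R = {r:.3f}")  # P = 0.947, R = 0.900
```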
Crater analysts are usually most interested in knowing where the crater is (latitude and longitude) and its size (diameter). Therefore, the primary focus in this research was crater detection: Having the ML-based AI say, “there is a crater within this tight bounding box.”
Over roughly the last decade, manual impact crater databases have been generated for many of the larger bodies in the solar system with solid surfaces. Several of these were done by me, including a ≈640,000-crater database of Mars (Robbins & Hynek, 2012), a ≈2,100,000-crater database of the Moon (Robbins, 2019), and crater databases with several hundred craters on Pluto and Charon (Robbins et al., 2017; Robbins & Dones, 2023). Databases like these are important references for the planetary science community, but the lunar and Martian crater databases are limited to craters larger than 1 km across (D ≥ 1 km). Databases for other bodies tend to be similarly limited to kilometer-scale impacts, such as 5 km on Mercury (Herrick et al., 2018) or 4 km on Dione (Kirchoff & Schenk, 2015). Where imagery is not a limiting factor for going to smaller diameters, the limitation is that the vast numbers of craters at smaller sizes become impossible for humans to map in a reasonable time: craters follow a power-law size distribution with an exponent of –3 to –4, meaning that the number of ≥100 m craters is ~1,000–10,000× the number of ≥1 km craters.
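To make that scaling explicit, for a cumulative power law with slope b between 3 and 4:

```latex
N(\geq D) \propto D^{-b}
\quad\Longrightarrow\quad
\frac{N(\geq 100\ \mathrm{m})}{N(\geq 1\ \mathrm{km})}
  = \left(\frac{0.1}{1}\right)^{-b}
  = 10^{b} \approx 10^{3}\text{--}10^{4}.
```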
However, AI-based crater databases – whether ML or not – have a poor track record in the crater community. At planetary science conferences for well over three decades, researchers have presented AI-based crater identification tools. Those researchers are often computer scientists who think that it is easy to identify a circle. They create their code, consider it good enough for most applications, and there is no further development. In contrast, crater researchers tend to see the codes as not good enough, consider the AI useless, and continue manual identification and measurement. This has recently become a bigger potential issue with the advent of a planetary science group from Curtin University that used YOLOv3 to create a 94-million-crater (D ≥ 50 m) database of Mars without any manual validation (Lagain et al., 2021), and a mixture of YOLOv3 and v5 to create a 190-million-crater database of the Moon (Benedix et al., 2023). It is incredibly tempting to use these products even though the researchers have self-reported the accuracy as ~80–90%, a figure that has recently been independently demonstrated to be severely inflated (Lee, 2023). The Curtin University group’s mentality has been closer to the computer scientists’, considering these products “good enough” to do some science, though many other crater researchers disagree.
In contrast with that approach, in this IR&D project I piloted – and then expanded on – a hybrid approach that leverages the speed of a computer together with the contextual knowledge of a human analyst. This approach can increase crater detection speed by >10× while maintaining the quality of a human analyst, potentially opening new avenues of external funding.
The primary objective of this work was to develop a workflow that leverages some of the latest work in ML feature recognition along with the human analyst’s ability to weed through that work and effectively “grade” it. The secondary objective was to have the computer not just recognize a crater but also trace out its rim, to better understand the crater’s shape (also with corrective human input). Both were driven by the goal of significantly speeding up crater data-gathering without sacrificing quality.
Approach
The approach to the primary goal of object detection was to work with the free, publicly available YOLO version 8 (YOLOv8) software: supply it with different training datasets to train detection models, run the models on data from different planetary bodies, manually grade the results, and iterate. This process is straightforward but time-consuming, to the point that almost all allotted time was subsumed by this task. Therefore, the second goal of improving segmentation (crater rim tracing) was only briefly addressed, by making a few improvements to current, deterministic code.
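A minimal sketch of that train/run loop with the Ultralytics YOLOv8 API is below. The dataset configuration file `craters.yaml` is a hypothetical placeholder, and the epoch count, image size, and confidence threshold are illustrative defaults, not the values tuned in this work:

```python
from ultralytics import YOLO

# Train a detection model on a crater training dataset. "craters.yaml" is a
# hypothetical dataset config listing image/label paths and a single
# "crater" class in the standard Ultralytics format.
model = YOLO("yolov8n.pt")  # start from a pretrained checkpoint
model.train(data="craters.yaml", epochs=100, imgsz=640)

# Run the trained model on tiles cut from a planetary image or mosaic; the
# candidate detections are then graded by hand before iterating on training.
results = model.predict(source="mosaic_tiles/", conf=0.25, save_txt=True)
for r in results:
    for box in r.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"candidate crater: ({x1:.0f},{y1:.0f})-({x2:.0f},{y2:.0f}), "
              f"confidence {float(box.conf):.2f}")
```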
Accomplishments
There were significant accomplishments towards the first goal. Because this was an exploratory IR&D project, developing a better understanding of how to work with machine learning packages for impact crater detection was a significant part of the effort:
- Understanding hardware limitations and requirements of YOLO and many other machine learning codes.
- Understanding how training datasets work, and what's required of them in terms of quality and quantity.
- Creating training data from scratch and from existing impact crater databases (see the label-conversion sketch after this list).
- Experimenting with training the AI on individual images or on planetary mosaics.
- Experimenting with planetary images or mosaics with a single lighting and camera geometry, or with highly variable lighting and camera geometry.
- Developing a fast interface for grading AI-based candidate crater detections (a minimal grading-loop sketch also follows this list).
- Testing automated crater detection on Mercury, Moon, Mars, Vesta, Ceres, and Rhea.
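On the training-data item above, converting an existing crater database into YOLO training labels amounts to turning each crater's pixel position and diameter into a normalized bounding box. A minimal sketch, in which the crater tuple format and tile size are assumptions for illustration:

```python
def crater_to_yolo_label(x_px: float, y_px: float, d_px: float,
                         tile_w: int, tile_h: int, class_id: int = 0) -> str:
    """Convert a crater (center and diameter in pixels) into one line of a
    YOLO label file: 'class x_center y_center width height', all normalized
    to [0, 1] by the tile dimensions."""
    return (f"{class_id} {x_px / tile_w:.6f} {y_px / tile_h:.6f} "
            f"{d_px / tile_w:.6f} {d_px / tile_h:.6f}")

# Hypothetical example: a 150-pixel-diameter crater centered at (400, 300)
# in a 1024 x 1024 tile cut from a mosaic.
print(crater_to_yolo_label(400, 300, 150, 1024, 1024))
# -> "0 0.390625 0.292969 0.146484 0.146484"
```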
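And on the grading interface: the core idea can be sketched as a loop that displays each candidate detection and records a keypress verdict. Everything here (function names, keys, the one-window-per-candidate flow) is illustrative, not the actual interface developed in this project:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def grade_candidates(image, boxes):
    """Show each candidate bounding box over the image and record a 'y'
    (real crater) or 'n' (false positive) keypress for each. `boxes` is a
    list of (x1, y1, x2, y2) pixel corners."""
    grades = []
    for (x1, y1, x2, y2) in boxes:
        fig, ax = plt.subplots()
        ax.imshow(image, cmap="gray")
        ax.add_patch(patches.Rectangle((x1, y1), x2 - x1, y2 - y1,
                                       fill=False, edgecolor="red"))
        ax.set_title("y = real crater, n = false positive")

        def on_key(event, g=grades):
            if event.key in ("y", "n"):
                g.append(event.key == "y")
                plt.close(event.canvas.figure)

        fig.canvas.mpl_connect("key_press_event", on_key)
        plt.show()  # blocks until the user grades this candidate
    return grades  # True = TP, False = FP; misses (FN) need a separate pass
```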
Progress was also made on the second goal of improving segmentation tools, though only for a deterministic, non-ML version:
- A Monte Carlo walk around features in edge-detected images, unwrapped into polar coordinates, was implemented (a sketch of the idea follows this list).
- The median of the random walks forms the new feature trace.
- User-interactivity was implemented to allow the user to require the trace to go through specific location(s) and/or exclude certain region(s). This is useful if, for example, a younger feature overprints the feature of interest and the code would otherwise follow the younger feature.
- ML-based segmentation was not reached in this project; it may eventually be implemented, potentially through a follow-on internal research and development proposal.
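A minimal sketch of the Monte Carlo rim-walk idea described above; the array layout, weighting scheme, and step limit are illustrative assumptions, not the project code. Given an edge-strength image already unwrapped into polar coordinates around a candidate crater center, each walk steps once per angle bin, choosing a nearby radius weighted by edge strength, and the per-angle median over all walks becomes the trace:

```python
import numpy as np

def monte_carlo_rim_trace(edge_polar: np.ndarray, n_walks: int = 200,
                          max_step: int = 2, rng=None) -> np.ndarray:
    """Trace a feature rim in an edge-detected image unwrapped into polar
    coordinates: edge_polar[angle_bin, radius_bin] = edge strength. Each
    random walk moves one angle bin at a time, picking a radius within
    +/- max_step of its current radius, weighted by edge strength; the
    median radius across all walks at each angle forms the trace."""
    rng = np.random.default_rng() if rng is None else rng
    n_theta, n_r = edge_polar.shape
    walks = np.empty((n_walks, n_theta), dtype=int)

    for w in range(n_walks):
        r = int(np.argmax(edge_polar[0]))  # start on the strongest edge
        for t in range(n_theta):
            lo, hi = max(0, r - max_step), min(n_r, r + max_step + 1)
            weights = edge_polar[t, lo:hi].astype(float) + 1e-9
            r = lo + rng.choice(hi - lo, p=weights / weights.sum())
            walks[w, t] = r
    return np.median(walks, axis=0)  # radius (bins) as a function of angle

# The user-interactivity described above maps naturally onto this scheme:
# required points pin r at specific angle bins, and excluded regions have
# their weights zeroed so no walk can enter them.
```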
Overall, this was a successful IR&D project.