Background
Large data sets can be filtered in several ways. This project compared data mining methods with machine learning techniques for classifying spatiotemporal regions of space, and quantified their effectiveness on a test bed of Magnetospheric Multiscale (MMS) mission data. Machine learning and AI are currently the most popular technologies for this kind of task, but are they truly the best way to sift through data sets such as telemetry data?
Approach
Our objective was to compare data mining methods and machine learning techniques for classifying spatiotemporal regions of space and to quantify their effectiveness. To do this, we focused on identifying the pristine solar wind region in MMS observations, applying both data mining and machine learning to the identification and comparing the results. Our overall approach was as follows:
- Improve the data mining codes for future work.
- Use machine learning for image classification.
- Take a heuristics-based approach.
- Take a neural-network approach.
- Use machine learning for identifying regions of space.
- Reproduce the results of Olshevsky et al. (2021).
- Convert the Keras-based approach to PyTorch.
- Apply data mining to identify the pristine solar wind.
- Apply machine learning to identify pristine solar wind.
- Compare the data-mining approach to machine learning.
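One way to make the final comparison step concrete is to score each method's per-sample labels against a common reference labeling. The sketch below is only an illustration: the function name `score_labels`, the choice of precision and recall as metrics, and the toy labels are all assumptions, not the metrics or data used in this project.

```python
from typing import Dict, Sequence


def score_labels(predicted: Sequence[bool], reference: Sequence[bool]) -> Dict[str, float]:
    """Score per-sample 'in pristine solar wind' labels against a reference.

    Hypothetical helper for illustration; the project's actual comparison
    metric is not specified in this summary.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    pairs = [(bool(p), bool(r)) for p, r in zip(predicted, reference)]
    true_pos = sum(1 for p, r in pairs if p and r)
    false_pos = sum(1 for p, r in pairs if p and not r)
    false_neg = sum(1 for p, r in pairs if not p and r)

    # Guard against division by zero when a method flags nothing,
    # or when the reference contains no solar wind samples.
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return {"precision": precision, "recall": recall}


# Toy labels only, invented for this example.
reference = [False, True, True, True, False, False]
method_a = [False, True, True, True, False, False]
method_b = [False, True, True, False, True, False]

print("method_a:", score_labels(method_a, reference))
print("method_b:", score_labels(method_b, reference))
```

Scoring both methods against the same reference keeps the comparison independent of how each method produces its labels.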
Accomplishments
The existing data mining codes were substantially improved for reuse, and they found specific regions that would otherwise have been impossible to find (Figure 1). Because removing false positives is a largely iterative process, some time was spent isolating the appropriate conditions (Figure 2). On the machine learning side, we improved existing image classification codes for identifying invalid plots. Our ML codes reproduced the results of Olshevsky et al. (2021), and we converted the Keras-based approach to the more modern PyTorch; however, the ML approach was much more complex and did not improve upon our classical data mining approach. For this type of data (3D time series), there seemed to be no advantage to using ML techniques.
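The condition-based data mining idea can be sketched as follows: a sample counts as pristine solar wind only when every condition holds, and consecutive passing samples are merged into intervals. This is a minimal sketch under stated assumptions. The quantities (`density`, `speed`, `ion_temp`), the threshold values, and the function names are hypothetical placeholders, not the project's actual criteria or code.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, float]

# Hypothetical conditions and thresholds, for illustration only.
CONDITIONS: Dict[str, Callable[[Sample], bool]] = {
    "low_density": lambda s: s["density"] < 15.0,   # cm^-3 (assumed unit)
    "fast_flow": lambda s: s["speed"] > 300.0,      # km/s (assumed unit)
    "cool_ions": lambda s: s["ion_temp"] < 50.0,    # eV (assumed unit)
}


def all_conditions_met(sample: Sample) -> bool:
    """True only if the sample passes every condition."""
    return all(check(sample) for check in CONDITIONS.values())


def find_intervals(
    times: Sequence[float],
    samples: Sequence[Sample],
    min_duration: float = 0.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) times of contiguous runs where all conditions hold."""
    intervals: List[Tuple[float, float]] = []
    start = None
    last = None
    for t, sample in zip(times, samples):
        if all_conditions_met(sample):
            if start is None:
                start = t  # a new run begins
            last = t
        elif start is not None:
            # The run just ended; keep it if it is long enough.
            if last - start >= min_duration:
                intervals.append((start, last))
            start = None
    # Close a run that extends to the end of the data.
    if start is not None and last - start >= min_duration:
        intervals.append((start, last))
    return intervals


# Toy data: five passing samples between two failing ones, 60 s apart.
passing = {"density": 5.0, "speed": 400.0, "ion_temp": 10.0}
failing = {"density": 30.0, "speed": 200.0, "ion_temp": 200.0}
demo_times = [0, 60, 120, 180, 240, 300, 360]
demo_samples = [failing] + [passing] * 5 + [failing]

print(find_intervals(demo_times, demo_samples))  # [(60, 300)]
```

In a sketch like this, tightening or adding an entry in `CONDITIONS` and rerunning corresponds to one pass of the iterative false-positive removal described above, and `min_duration` controls how short an interval may be and still be reported.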

Figure 1: A four-minute interval of solar wind identified between two magnetosheath intervals. This very narrow region would have been nearly impossible to find with machine learning image classification routines: each image would have had to cover a very short time span, and the number of images needed for a data set as large as MMS would have been overwhelming.

Figure 2: MMS is in the pristine solar wind only if all conditions are met, and such intervals can be very short. This figure shows how often MMS was in the pristine solar wind during December 2023.