A search for dark matter among Fermi-LAT unidentified sources with systematic features in Machine Learning

Written by Viviana Gammaldi.

Summary of the paper with the same title published in MNRAS.

The recent 4FGL Fermi-LAT catalogue, the result of 8 years of telescope operation, is a collection of sources with associated gamma-ray spectra that contain important information about their nature. As shown in Fig. 1, and somewhat surprisingly, a substantial fraction of the objects in the Fermi-LAT catalogues, about one third of the total, remain unidentified (unIDs), i.e., they lack a clear single association with a known object identified at other wavelengths, or with a well-known spectral type emitting only in gamma rays, e.g. certain pulsars. Indeed, there is the exciting possibility that some of these sources could be a dark matter (DM) signal. Among other prospective sources of gamma rays from DM annihilation events, dark satellites or subhalos in the Milky Way, with no optical counterparts, are the preferred candidates, as they are expected to exist in large numbers according to standard cosmology and would not be massive enough to retain gas or stars. Further, galaxies in the local Universe, e.g. dwarf irregular galaxies, may also be good candidates to explain some unIDs.

We propose a new approach to the standard Machine Learning (ML) binary classification problem of disentangling prospective DM sources (simulated data) from astrophysical sources (observed data) among the unIDs of the 4FGL Fermi-LAT catalogue.

In particular, we are interested in one of the parametrizations of the gamma-ray spectrum used in the 4FGL, known as the Log-Parabola (LP), which allows us to distinguish different astrophysical sources of gamma rays by means of at least two parameters: the emission peak, Epeak, and the spectral curvature, beta. We introduce the DM sample into this parameter space by fitting the simulated DM gamma-ray spectrum with the same LP functional form (Fig. 2, left panel). Furthermore, we artificially build two systematic features for the DM data that are inherent only to observed data: the detection significance, sigma, and the relative uncertainty on the spectral curvature, beta_rel. We do this by sampling from the observed population of unIDs, assuming that the distributions of these features for DM sources, if any exist, would follow those of the unIDs. In Fig. 2 we show the parameter space without the uncertainty on beta (left panel) and with the uncertainty on beta (right panel), which for the DM sample is created as a systematic feature.
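The two steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the paper. It assumes the usual log-parabola form dN/dE = K (E/E0)^-(alpha + beta ln(E/E0)), for which the peak of E^2 dN/dE lies at Epeak = E0 exp((2 - alpha) / (2 beta)). The function names, the dictionary keys and the choice to draw sigma and beta_rel together from a single randomly chosen unID are all assumptions made for illustration.

```python
import math
import random


def log_parabola(energy, norm, alpha, beta, e0):
    """Differential flux dN/dE of a log-parabola spectrum.

    Assumed form: norm * (E/E0)^-(alpha + beta * ln(E/E0)).
    `energy` and `e0` must be in the same units.
    """
    x = math.log(energy / e0)
    return norm * math.exp(-(alpha + beta * x) * x)


def peak_energy(alpha, beta, e0):
    """Energy at which E^2 dN/dE is maximal (defined only for beta > 0)."""
    if beta <= 0:
        raise ValueError("a peak exists only for positive curvature beta")
    return e0 * math.exp((2.0 - alpha) / (2.0 * beta))


def attach_systematic_features(dm_sources, unid_sources, rng):
    """Give each simulated DM source a (sigma, beta_rel) pair.

    Hypothetical helper: each pair is copied from one randomly drawn
    observed unID, so the DM sample follows the unID distributions.
    The input lists are not modified.
    """
    augmented = []
    for source in dm_sources:
        donor = rng.choice(unid_sources)
        augmented.append(
            {**source, "sigma": donor["sigma"], "beta_rel": donor["beta_rel"]}
        )
    return augmented
```

Drawing both features from the same unID preserves any correlation between them; sampling each feature independently would be an equally simple alternative if that correlation is not wanted.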

Fig. 2: beta-Epeak parameter space. Left panel: Astrophysical (yellow), DM (magenta) and unID (red) sources are shown. Right panel: Same as left panel, but including the uncertainty on beta for the training/test set (grey data) and the unID sources to be classified (red data points).

Finally, we consider different ML models for the classification task: Logistic Regression, Neural Network (NN), Naive Bayes and Gaussian Process. The best of these in terms of classification accuracy is the NN, which reaches around 93%. Applying the NN to the unIDs sample, we find that the degeneracy between some astrophysical and DM sources (visible as the overlapping region in Fig. 2) can be partially resolved by including systematic features in the classification task (Fig. 3). Nonetheless, due to strong statistical fluctuations, we conclude that there are no DM source candidates among the pool of 4FGL Fermi-LAT unIDs.
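To make the classification step concrete, the sketch below trains a binary classifier and returns, for each source, a probability of belonging to the DM class. It uses plain logistic regression (one of the four models listed above) rather than the NN, purely to keep the example short and dependency-free. The two Gaussian clusters are synthetic toy data with arbitrary centres and do not represent the 4FGL or simulated DM samples; all function names are illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Probability of class 1 (here: the toy 'DM' class) for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def make_toy_sample(rng, n_per_class=200):
    """Two synthetic 2-D clusters standing in for standardized features."""
    X, y = [], []
    for label, centre in ((0, (-1.0, -1.0)), (1, (1.0, 1.0))):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre[0], 0.5), rng.gauss(centre[1], 0.5)])
            y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_toy_sample(rng)

# Shuffle, then hold out 25% of the sample as a test set.
order = list(range(len(X)))
rng.shuffle(order)
split = int(0.75 * len(order))
train, test = order[:split], order[split:]

w, b = fit_logistic([X[i] for i in train], [y[i] for i in train])
probs = predict_proba([X[i] for i in test], w, b)
accuracy = sum((p > 0.5) == bool(y[i]) for p, i in zip(probs, test)) / len(test)
print(f"toy test accuracy: {accuracy:.2f}")
```

In the same spirit, adding further columns to each row of X (for instance sigma and beta_rel) is all that is needed to move from a two-feature to a four-feature classification.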

Fig. 3: Probability for each unID to be a DM source. Left panel: results adopting only two features (beta, Epeak) for classification. Right panel: results for the four-feature (beta, Epeak, sigma, beta_rel) classification.

Further details can be found at https://doi.org/10.1093/mnras/stad066