Mapping the occurrence probability of geothermal manifestations at a national scale using MaxEnt
Abstract
The geotectonic environment of Central America gives the region a high potential for harnessing geothermal energy. However, the use of this energy source remains scarce due to a variety of factors, including limited access to exploration-oriented resources. It is therefore a priority to generate information that facilitates the preliminary stages of geothermal projects in order to achieve a greater contribution of available renewable sources to the energy matrix. In this study, the MaxEnt model was employed to predict locations with geothermal potential in Honduras. The model incorporated the coordinates of 177 surface manifestations and utilized various predictors, including the distribution of volcanoes, geological formations, faults, aquifers, and surface temperature. These predictors were disaggregated into 23 environmental variables. A map with adequate predictive accuracy (Area Under the Receiver Operating Characteristic Curve = 0.868) was generated, showing that, except for the eastern central region of Honduras, a large share of the country's surface has a high or very high probability of occurrence of geothermal manifestations. Among the variables analyzed, the occurrence of geothermal manifestations can be primarily attributed to the proximity to volcanoes in the region.
Geothermal energy is heat stored in the subsoil, transported by conduction through rocks and by convection of water, that can be exploited through direct use or by transforming it into other forms of energy. The use of this renewable resource for commercial power generation began in Italy in the early 20th century [1], [2]. However, only approximately 15% of the estimated worldwide potential for geothermal power generation, which ranges from 70 to 80 GW, is currently being utilized [3].
Geothermal energy represented only 0.51% of renewable energy globally in 2021. Although the contribution is low, the worldwide generation of geothermal energy increased 13.9% from 2019 to 2021 [4].
In Latin America, Mexico has the largest installed geothermal capacity, 951 MW, which contributes 1.14% of its energy matrix. Chile has incorporated this resource into its electricity generation network with 45 MW, which represents 0.17% of its installed capacity [5]. Central America is located in the Ring of Fire, where the Cocos plate subducts beneath the Caribbean plate and the Caribbean and North American plates move against each other in a strike-slip fault system. This geotectonic setting gives the region significant potential for the utilization of geothermal resources, with temperatures of 200 °C to 300 °C reachable at depths ranging from 500 m to 3000 m [6]. El Salvador, Costa Rica, Nicaragua, and Guatemala are among the 14 countries with the highest contribution of geothermal energy to their energy matrix; however, they currently exploit only around 12% of their estimated potential [7]. Honduras had not utilized this resource for electricity generation until 2017, when a plant was established contributing 39 MW to the national grid. This capacity represents 1.3% of its total installed electricity generation capacity [5].
Considering the low environmental impact of the exploration, exploitation, generation, and distribution of geothermal resources [8], and in order to reduce pollution associated with the use of fossil fuels, geothermal energy is being promoted both to maximize its contribution to the energy matrix and in low-enthalpy projects aimed at direct uses. However, its adoption has not been expeditious. The high exploration costs and the risk of unsuccessful searches for geothermal sites, the lack of regulatory clarity for exploration and exploitation, and the difficulty of accessing funding for geothermal projects are the main limitations to the development of this field [9], [10].
In order to carry out a more effective exploration of existing geothermal resources and reduce their associated costs, methods have been proposed to improve pre-feasibility studies using open-access satellite information. In addition, techniques such as distribution models, which were originally developed for other disciplines, are being adapted for this purpose. MaxEnt is a machine learning model, commonly used to forecast species distribution [11], [12], which is also applicable to general problems in which a probability density function must be approximated based on parameters correlated with the presence of a species or the occurrence of a phenomenon [13], [14].
In recent decades, MaxEnt has become an important tool for assessing renewable energy resources, including biomass [15], [16], wind power [17], as well as solar, hydro, ocean, and geothermal sources [16], [18]. The use of distribution models for the identification of geothermal sites has yielded important benefits, such as reduced cost and time in exploratory processes, reduced risk associated with mining activity, and the generation of inputs for decision making. Likewise, global-scale maps have been developed and serve as valuable initial resources. However, their precision needs to be evaluated through regional models conducted by experts, and their estimations should be tested using various parameters and local data [19].
In this study, MaxEnt was used to identify potential geothermal sites at a national scale in Honduras, Central America. The model was trained on the locations of surface manifestations of geothermal activity. The national distribution of aquifers, volcanic activity, faults, geological formations, and surface temperature were selected as predictor variables. A probability map of the occurrence of geothermal manifestations was generated and subjected to independent validation. This validation involved evaluating whether points indicating the presence of geothermal manifestations, which were not used for model training, are geographically located in the areas with the highest calculated probability of occurrence. The greatest contribution to the prediction of new presence points in the model is attributed to the proximity to volcanoes in the region.
The materials and methods for the modelling, selection of variables, evaluation of model performance, and preparation of the probability map are described below.
Due to the nature of the variables selected for the prediction of geothermal manifestations, MaxEnt modeling software (version 3.4;
Five geographic information layers, (1) hydrogeological map, (2) geological map, (3) surface temperature derived from Landsat-8 images, (4) volcanic activity, and (5) regional faults, were disaggregated into 23 environmental variables. These layers were processed using techniques such as reclassification, radiometric calibration, and Euclidean distance in the QGIS geographic information system [24] (Figure 1). The resulting environmental variables, along with the locations of geothermal manifestations, were supplied to MaxEnt to generate the probability map.
MaxEnt model workflow
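The Euclidean-distance step mentioned above can be illustrated with a small sketch. It assumes a toy binary grid in which 1 marks a cell containing a feature (for instance a volcano or a fault); the grid, the 30 m cell size, and the function name are hypothetical, and the brute-force search only stands in for the proximity tool of a GIS rather than reproducing the processing used in the study.

```python
import math


def euclidean_distance_grid(feature_mask, cell_size=1.0):
    """Return, for every cell, the distance to the nearest feature cell.

    feature_mask: list of rows, 1 where a feature is present and 0 elsewhere.
    cell_size: side length of a (square) cell in map units; an assumption here.
    """
    # Collect the row/column index of every cell that holds a feature.
    features = [
        (r, c)
        for r, row in enumerate(feature_mask)
        for c, value in enumerate(row)
        if value == 1
    ]
    if not features:
        raise ValueError("feature_mask contains no feature cells")

    distances = []
    for r, row in enumerate(feature_mask):
        out_row = []
        for c in range(len(row)):
            # Brute-force search: fine for a toy grid, slow for real rasters.
            nearest = min(math.hypot(r - fr, c - fc) for fr, fc in features)
            out_row.append(nearest * cell_size)
        distances.append(out_row)
    return distances


# Hypothetical 3 x 3 grid with a single feature in the centre cell.
toy_mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
distance_m = euclidean_distance_grid(toy_mask, cell_size=30.0)
```

Each resulting cell value is a continuous "proximity" predictor of the kind that could be passed to a distribution model.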
The proportion of data held out for validation that is considered acceptable when parameterizing MaxEnt has been found to be between 10 and 30% [25]. For this study, out of the 222 available points identified in the inventory of geothermal manifestations in the Republic of Honduras [22], [26], 177 points were randomly selected for model training, leaving 45 points for model validation, equivalent to 25% of the training data, to ensure good spatial representativeness at the national level. The distribution of the training and validation points is shown in the map of sampling locations of geothermal manifestations (Figure 2). The inventory of geothermal manifestations in the Republic of Honduras was generated by the national energy authority and external collaborators through a comprehensive survey covering the entire territory.
Sampling location of geothermal manifestations
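The random partition described above can be sketched as follows. The point identifiers, the seed, and the function name are hypothetical placeholders; only the counts (222 points, 177 for training, 45 for validation) come from the text.

```python
import random


def split_presence_points(points, n_train, seed=0):
    """Randomly split presence points into training and validation subsets."""
    shuffled = list(points)
    # A seeded generator keeps the split reproducible; the seed is arbitrary.
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in identifiers for the 222 inventoried manifestations.
point_ids = list(range(222))
train_ids, validation_ids = split_presence_points(point_ids, n_train=177)

# Size of the validation set relative to the training set and to the inventory.
validation_share_of_training = len(validation_ids) / len(train_ids)
validation_share_of_total = len(validation_ids) / len(point_ids)
```

With these counts the held-out points amount to about 25% of the training set and about 20% of the full inventory.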
A total of six different models were run, varying the combination of environmental variables and adjusting the feature, iteration, and regularization parameters. The aim was to identify the configuration that yielded the best performance, using the AUC statistic as the reference value.
The Cloglog option was selected as output of the model, which provides an estimate of probability of occurrence between 0 and 1.
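As a rough illustration of how a cloglog value relates to MaxEnt's raw output, the sketch below applies the transform 1 - exp(-exp(H) * raw), where H is the entropy of the raw distribution, as we understand it from the MaxEnt literature. The input values and the function name are hypothetical, and the sketch is not taken from the MaxEnt source code.

```python
import math


def cloglog_from_raw(raw_values):
    """Convert raw MaxEnt-style scores to cloglog values in [0, 1).

    raw_values: non-negative scores over the background cells; they are
    normalised here so that they sum to 1, as a raw distribution would.
    """
    total = sum(raw_values)
    raw = [value / total for value in raw_values]
    # Entropy of the raw distribution (natural logarithm).
    entropy = -sum(p * math.log(p) for p in raw if p > 0)
    return [1.0 - math.exp(-math.exp(entropy) * p) for p in raw]


# Hypothetical raw scores for four background cells.
uniform_cloglog = cloglog_from_raw([1.0, 1.0, 1.0, 1.0])
skewed_cloglog = cloglog_from_raw([0.1, 0.2, 0.3, 0.4])
```

For a uniform raw distribution every cell receives 1 - exp(-1), roughly 0.632, and cells with larger raw values always receive larger cloglog values.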
A total of 23 environmental variables (Table 1) were derived from 5 layers of geographic information. This disaggregation was conducted to include all available attributes within the data for analysis, aiming to maximize entropy, that is, to provide MaxEnt with sufficient input to determine the contribution of different data dimensions to the occurrence of geothermal manifestations. For instance, the geology layer contains not only lithological attributes but also information regarding the chronology of formations. Therefore, two distinct variables, lithology and chronology, were introduced to the algorithm from a single base GIS layer. Below is a description of all 5 layers of geographic information.
Global Volcanic Activity: Volcanoes were chosen as predictors, considering their association with magma chambers, which serve as significant sources of geothermal energy. A map of volcanic activity [27] was used, which includes (1) volcanic fields, (2) stratovolcanoes, (3) shields, (4) pyroclastic shields, (5) pyroclastic cones, (6) Maar, (7) lava domes, (8) volcanic fissures, (9) complex volcanoes, (10) scoria cones, and (11) calderas.
Variable code | Geographic data included in variable | Variable code (continued) | Geographic data included in variable (continued) |
---|---|---|---|
prstrvlc | Volcanic field, Stratovolcano, Shield, Pyroclastic Shield, Pyroclastic cone(s), Maar, Lava dome, Fissure vents, Complex volcano, Cinder cone, Caldera | lit_gral | Metamorphic; Plutonic; Sedimentary mixed; Sedimentary siliciclastic; Volcanic flow; Volcanic tuff-pyroclastic |
prstrvlcvf | Volcanic field | lit_esp | Andesitic and rhyolitic pyroclastic rocks, volcaniclastics; basalt and andesite flows and sills; basalt and andesite flows, pyroclastic rocks; boulders, cobbles, gravel, sand, mud; calcareous shales, limestones, marls, dolomites; granite, granodiorite, diorite, tonalite; heterogeneous redbeds, Jaitique limestone; muds with limestones and volcanic ash; redbeds with limestone and conglomerate; red-brown shales, thin limestones and sandstones; schist, phyllite, gneiss, quartzite, marble, quartz veins; shales, sandstones, coals; tan shales, sandstones, conglomerates; tuffs, andesites, pyroclastic rocks |
prstrvlcsv | Stratovolcano | crono | Cretaceous; Cretaceous-Tertiary; Jurassic-Cretaceous; Quaternary; Quaternary-Tertiary; Tertiary; unknown (Paleozoic) |
prstrvlcs | Shield | pclas_fault | Normal fault |
prstrvlcps | Pyroclastic Shield | pclas_fault | Inverse fault |
prstrvlcpc | Pyroclastic cone(s) | pclas_fault | Strike slip |
prstrvlcm | Maar | pcat_fault | Existing fault |
prstrvlcld | Lava dome | pcat_faulti | Interpreted fault |
prstrvlcfv | Fissure vents | pcross_fau | Fault crossing |
prstrvlccv | Complex volcano | fte_tip | Extensive and highly productive aquifers (1); Local and extensive aquifers, moderately productive (2); Local aquifers, moderate to highly productive (3); Local and extensive aquifers, poor to moderately productive (4) and Rocks with local and limited groundwater resources (5). |
prstrvlccc | Cinder cone | lst | NA: Continuous values |
prstrvlcc | Caldera | | |
National Geological Map: The geological map of the Republic of Honduras [28], with a scale of 1:500,000, provided data on chronology and lithological composition, which were considered relevant given their heat-trapping role in geothermal systems. Regarding chronology, the map provided the distribution of rock bodies according to their geological period of origin, including formations of the (1) Cretaceous, (2) Cretaceous-Tertiary, (3) Jurassic-Cretaceous, (4) Quaternary, (5) Quaternary-Tertiary, (6) Tertiary, and (7) Paleozoic. Regarding the general composition of the present lithology, the distribution of (1) metamorphic, (2) plutonic, (3) mixed sedimentary, (4) siliciclastic sedimentary, (5) volcanic flows, and (6) pyroclastic volcanic tuffs was obtained. Likewise, the map provided the distribution of rocks according to their specific composition, including (1) andesitic and rhyolitic pyroclastic rocks, volcaniclastics; (2) basalt and andesite flows, and sills, pyroclastics; (3) basalt and andesite flows, pyroclastic rocks; (4) boulders, cobbles, gravel, sand, mud; (5) calcareous shales, limestones, marls, dolomites; (6) granite, granodiorite, diorite, tonalite; (7) heterogeneous redbeds, Jaitique limestone; (8) muds with limestones and volcanic ash; (9) redbeds with limestone and conglomerate; (10) red-brown shales, thin limestones, and sandstones; (11) schist, phyllite, gneiss, quartzite, marble, quartz veins; (12) shales, sandstones, coals; (13) tan shales, sandstones, conglomerates; and (14) tuffs, andesites, pyroclastic rocks.
Regional Fault Map: Faults are significant geological structures that play a crucial role in the utilization of geothermal potential [29]. They can affect the permeability of a site by serving as conduits for geothermal water or by acting as barriers. Data regarding the location and typology of faults were obtained from the global instrumental catalog of earthquakes [30], [31] and the Geotectonic Map of the Republic of Honduras [32], which include the distribution of (1) normal, (2) inverse, and (3) strike-slip faults; faults classified as (1) existing or (2) interpreted; and fault crossings.
National Hydrogeological Map: Aquifers were selected as a predictor variable, taking into account their function as heat reservoirs and as communicators of geothermal flows potentially associated with surface manifestations. A hydrogeology layer [33] was used, which provided the distribution of (1) extensive and highly productive aquifers; (2) local and extensive aquifers, moderately productive; (3) local aquifers, moderate to highly productive; (4) local and extensive aquifers, poor to moderately productive, and (5) rocks with local and limited groundwater resources.
Land Surface Temperature derived from Landsat-8: There is a correlation between anomalies in the surface soil temperature and the presence of sites with geothermal potential [34], [35], and remote sensing data have been especially useful for detecting them. Open-access satellite data were used to generate a national temperature map. The Landsat-8 observation satellite is equipped with a thermal infrared sensor that enables the generation of images of the Earth's surface temperature. Using Google Earth Engine [36], a surface temperature image was created by calculating the median from a time series of 2021 Landsat-8 images. The image had a maximum value of 45.51 °C and a minimum value of -11.62 °C, the latter resulting from the detection of temperatures within clouds.
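The per-pixel median composite can be sketched in plain Python. The example assumes the scenes are Landsat Collection 2 Level-2 surface temperature rasters, whose digital numbers convert to kelvin with a scale of 0.00341802 and an offset of 149.0; the study does not state which product or cloud treatment was used, so the scaling, the `None` cloud mask, and the function names are all assumptions, and the code does not reproduce the Google Earth Engine workflow.

```python
from statistics import median

# Assumed Collection 2 Level-2 surface temperature scaling (DN -> kelvin).
ST_SCALE = 0.00341802
ST_OFFSET = 149.0


def dn_to_celsius(dn):
    """Convert a surface temperature digital number to degrees Celsius."""
    return dn * ST_SCALE + ST_OFFSET - 273.15


def median_composite(scenes):
    """Per-pixel median of several scenes, ignoring masked (None) pixels.

    scenes: list of equally sized 2D grids of digital numbers.
    Returns a grid in degrees Celsius, with None where no scene is valid.
    """
    n_rows, n_cols = len(scenes[0]), len(scenes[0][0])
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            valid = [
                dn_to_celsius(scene[r][c])
                for scene in scenes
                if scene[r][c] is not None
            ]
            row.append(median(valid) if valid else None)
        composite.append(row)
    return composite


# Three hypothetical 1 x 2 scenes; None marks a cloud-masked pixel.
scenes = [
    [[44000, 43000]],
    [[45000, None]],
    [[46000, None]],
]
lst_celsius = median_composite(scenes)
```

Masking cloud pixels before taking the median, as the `None` entries do here, is one way to keep cloud-top temperatures out of a composite.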
To evaluate the preliminary models, threshold-dependent tests were performed as an indicator of the prediction accuracy of each geothermal potential map. The model was evaluated using the jackknife analysis [37] to verify whether its precision improved when less important variables were removed.
To compare prediction accuracy among models, the area under the curve (AUC) of the receiver operating characteristic (ROC) plot [38] was calculated for each run, and the configuration was adjusted until no further improvement in performance was obtained.
For the AUC calculation, an independent validation was performed using 45 locations of geothermal manifestations that were not used for model training. The validation was carried out using the probability map generated by MaxEnt and the points selected for validation (Figure 3). From this combination, ROC curves and AUC values were generated to measure the performance of the model.
Model validation workflow
The contribution of the variables given by the jackknife analysis was assessed, removing those with less contribution and re-running the model to observe differences in its performance. Based on this criterion, 6 different models were created, whose configurations are detailed in Table 2.
Model | Features | Iterations | Regularization | Variables |
---|---|---|---|---|
R01 | Linear, Quadratic, Product | 500 | 3 | All |
R02 | Linear, Quadratic, Product, Hinge | 500 | 2 | All |
R03 | All | 900 | 1 | All |
R04 | All | 900 | 1 | All except pclas_faultss, prstrvlcld, prstrvlcpc, prstrvlcps, prstrvlcfv |
R05 | All | 900 | 1 | All except pclas_faultss, prstrvlcld, prstrvlcpc, prstrvlcps, prstrvlcfv, prstrvlcs |
R06 | All | 900 | 1 | Just crono, fte_tip, lit_esp, lit_gral, lst, pfault, prstvlc |
Once the model with the best performance was selected, a map depicting the probability of occurrence of geothermal manifestations was created. To enhance readability, the values generated by the Cloglog output of the model (0 to 1) were reclassified into five qualitative categories of equal ranges (Table 3).
Value | 0 - 0.2 | 0.21 - 0.4 | 0.41 - 0.6 | 0.61 - 0.8 | 0.81 - 1 |
---|---|---|---|---|---|
Category assigned | Very low | Low | Medium | High | Very high |
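The reclassification in Table 3 can be sketched as a simple lookup. The function name is hypothetical, and treating each upper limit as inclusive (so that a value between 0.2 and 0.21 falls in "Low") is an assumption, since the table does not say how such values were handled.

```python
# Upper limit of each class from Table 3, paired with its label.
CATEGORY_UPPER_BOUNDS = [
    (0.2, "Very low"),
    (0.4, "Low"),
    (0.6, "Medium"),
    (0.8, "High"),
    (1.0, "Very high"),
]


def classify_probability(value):
    """Map a Cloglog value in [0, 1] to its qualitative category."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    for upper, label in CATEGORY_UPPER_BOUNDS:
        if value <= upper:
            return label


# Hypothetical pixel values taken from a probability map.
labels = [classify_probability(v) for v in (0.05, 0.21, 0.5, 0.75, 0.93)]
```

Applied pixel by pixel, this turns the continuous map into the five classes shown in the final figure.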
The results of the jackknife analysis (Figure 4) demonstrate that the environmental variable with the highest individual contribution to the model is prstrvlc (proximity to all types of volcanoes). The larger size of the blue bar indicates the contribution to the model if that particular variable were considered alone.
Jackknife of regularized training gain for model R03
On the other hand, lst (Land Surface Temperature) is the variable whose removal affects the model the most, as excluding it produces the largest decrease in gain. In the jackknife plot, the cyan bar shows the gain obtained when a variable is excluded, so the shortest cyan bar identifies the variable whose omission costs the model the most.
Calderas, pyroclastic cones, pyroclastic shields, and volcanic fields are variables that individually generate a low contribution (Figure 4), but when suppressed in R04, they resulted in a decrease in the overall performance of the model. In addition to these variables, when other variables with low individual contribution were suppressed, such as the shields in R05, and all individual types of volcanism and faulting in R06, the observed performance was progressively lower (Table 4), highlighting the importance of all variables used in this study.
AUC | R01 | R02 | R03 | R04 | R05 | R06 |
---|---|---|---|---|---|---|
MaxEnt model | 0.803 | 0.831 | 0.868 | 0.864 | 0.861 | 0.819 |
Independent validation | 0.789 | 0.823 | 0.863 | 0.857 | 0.856 | 0.804 |
The omission plot (Figure 5a) compares the omission rate observed on the training samples with the omission rate predicted by the model. The closer the training omission line lies to the predicted omission diagonal, the more accurate the model.
The ROC curve (Figure 5b) relates to the probability that a randomly selected presence point of geothermal manifestations is located in a pixel with a higher predicted probability than a randomly selected point where no manifestations have been registered. The greater the separation of the ROC curve from the diagonal, the higher the predictive power of the model.
Omission rate (a); and ROC curve (b)
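The probabilistic reading of the ROC curve given above corresponds to a rank-based way of computing the AUC, sketched below. The scores are hypothetical, the function name is illustrative, and the pairwise comparison is a generic formulation rather than the routine MaxEnt itself uses.

```python
def auc_from_scores(presence_scores, background_scores):
    """AUC as the share of presence/background pairs ranked correctly.

    A pair counts as 1 when the presence point has the higher predicted
    value, and as 0.5 when the two values are tied.
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical predicted probabilities at validation and background points.
presence = [0.9, 0.8, 0.4]
background = [0.7, 0.3, 0.2, 0.1]
auc = auc_from_scores(presence, background)
```

In this toy case 11 of the 12 pairs are ranked correctly, so the AUC is 11/12.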
Of the 6 runs performed, R03 achieved the maximum AUC values: 0.868 for the MaxEnt model and 0.863 for the independent validation (Table 4). The AUC is a probabilistic metric that measures the performance of a binary classification model based on the true positive rate against the false positive rate. The AUC value ranges from 0 to 1. A value of 0 indicates that the model's predictions are entirely incorrect, while a value of 0.5 suggests that the model's predictions are no better than random chance. An AUC of 1 indicates that the model has perfect performance, being able to accurately distinguish between positive and negative instances [39], [40]. Specifically, AUC values between 0.5 and 0.7 indicate poor performance, while values between 0.7 and 0.9 suggest moderate performance. AUC values greater than 0.9 denote excellent model performance [39]. According to Swets [41], an AUC value greater than 0.8 indicates a model with a good fit.
A probability map of geothermal manifestations was generated (Figure 6).
Geothermal manifestations probability map for the Republic of Honduras
Considering the limited availability of information regarding the abundance and distribution of geothermal resources in the region, it is anticipated that the probability map of geothermal manifestations for the Republic of Honduras will contribute to enhancing the understanding of Honduras' energy potential. This knowledge can be utilized as a valuable input in geothermal exploration projects, particularly during the initial stages of pre-feasibility and feasibility assessments. Furthermore, the methodology presented in this study, which relies on open-access data and free software, can be readily adopted to create probability maps for the Central American region and other areas where there is an interest in exploring this renewable energy source.
The results of this study show that the variable that contributes most to the probability of occurrence of geothermal manifestations in Honduras is the proximity to volcanoes. At first, this could seem surprising, given that there are currently no active volcanoes in Honduras; however, active volcanoes near the study area in El Salvador and Nicaragua were considered. This puts into perspective the influence of the resources of neighboring countries on the use of the renewable energy potential in the region. The geothermal manifestations probability map for the Republic of Honduras (Figure 6) shows that the highest probability of occurrence of geothermal manifestations is associated with the subduction zone between the Cocos Plate and the Caribbean Plate, as well as the interaction of the Caribbean Plate and the North American Plate. On the other hand, inactive volcanoes do not contribute significantly.
Future work in geothermal potential prediction should include new layers of information through a multidisciplinary approach that incorporates additional variables from different fields of specialization, with the aim of enhancing the performance of the model. Future predictions that integrate data such as carbon dioxide concentration, seismic activity values, and distance to rivers, similar to those selected in previous research [19], [42], could be particularly valuable for regions that lack proximity to volcanic activity. While volcanic activity has proved to be a strong predictor of geothermal manifestations in this study, it is not the only factor to consider. Furthermore, the inclusion of new data currently being generated at the national level pertaining to hydrological, tectonic, and climatic aspects should be considered for a subsequent study conducted on a more local scale.
The map generated in this study reveals that, with the exception of the eastern central region of Honduras, the country exhibits a significant prevalence of surfaces with a high and very high probability of geothermal manifestations occurring. The potential for utilizing energy resources through geothermal manifestations presents a particularly valuable opportunity at both the private and public levels. This is especially significant considering the historically low exploitation of this renewable source, which has notable economic, social, and environmental implications.