Sentinel2GlobalLULC: A Sentinel-2 RGB image tile dataset for global land use/cover mapping with deep learning | Scientific Data – Nature.com

Scientific Data volume 9, Article number: 681 (2022)
Land-Use and Land-Cover (LULC) mapping is relevant for many applications, from Earth system and climate modelling to territorial and urban planning. Global LULC products are continuously developing as remote sensing data and methods grow. However, there still exists low consistency among LULC products due to low accuracy in some regions and LULC types. Here, we introduce Sentinel2GlobalLULC, a Sentinel-2 RGB image dataset, built from the spatial-temporal consensus of up to 15 global LULC maps available in Google Earth Engine. Sentinel2GlobalLULC v2.1 contains 194877 single-class RGB image tiles organized into 29 LULC classes. Each image is a 224 × 224 pixels tile at 10 × 10 m resolution built as a cloud-free composite from Sentinel-2 images acquired between June 2015 and October 2020. Metadata includes a unique LULC annotation per image, together with level of consensus, reverse geo-referencing, global human modification index, and number of dates used in the composite. Sentinel2GlobalLULC is designed for training deep learning models aiming to build precise and robust global or regional LULC maps.
Land-Use and Land-Cover (LULC) mapping aims to characterize the continuous biophysical properties of the Earth surface as categorical classes of natural or human origin, such as forests, shrublands, grasslands, marshlands, croplands, urban areas or water bodies, etc.1. High resolution LULC mapping plays a key role in many fields, from natural resources monitoring, to biodiversity conservation, urban planning, agricultural management or climate and earth system modelling2,3,4. Multiple LULC products have been derived from satellite information at the global scale (Table 2), contributing to a better monitoring, understanding, and territorial planning of our planet5,6. However, despite the acceptable global accuracy of each individual product, a considerable disagreement among products has been reported4,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22. These reports explain that this disagreement is due to several methodological reasons, including: (1)Given that different satellite sensors with different spatial resolutions were used in each product, the difference in precision from coarse to fine resolution imagery partially determines the final quality of each product. (2)Different pre-processing techniques, like atmospheric corrections, cloud removal and image composition were used in each product. (3)Each product has a different updating frequency (from regularly to never updated products). (4)Different classification systems (i.e., LULC legends) were adopted in each product, usually each one focusing on a distinct application. (5)Different classification techniques, field-data collection approaches, and subjective interpretations were used to create each product. (6)Different validation techniques and different ground truth reference data were used in each product, which impedes a reliable accuracy comparison.
Over the last few years, several attempts have been made to overcome these inconsistencies with a harmonised approach capable of providing better control in the validation and comparison of the growing number of existing LULC products23,24. Even so, users still face issues when selecting an appropriate product, due to the following factors: (1) In most cases, users are unable to find a product that fits their desired LULC class or geographic region of interest25,26. (2) These products are usually collected at a coarse resolution, which makes analysis at a finer scale difficult12. (3) These products offer a limited number of LULC classes that usually change from one product to another27.
In parallel, deep artificial neural networks, also known as Deep Learning (DL), are increasingly used in LULC mapping with promising potential28. This interest is motivated by the good performance of DL models in computer vision and, particularly of Convolutional Neural Networks (CNNs) in remote sensing image classification and many applications29,30,31,32,33. However, to reach high performance, DL models need to be trained on large smart datasets34. The concept of smart data involves all pre-processing methods that improve its data value and veracity, in addition to the quality of its associated expert annotations35.
Currently, there exist several remote sensing datasets derived from satellite and aerial imagery ready for training DL models for LULC mapping (Table 1). However, they still suffer from some limitations, particularly the following factors that complicate their application with DL models: (1)First, none of them represent the global heterogeneity of the broad categories of LULC classes throughout the Earth. Usually, they are biased towards specific regions of the world, limited to national or continental scales, which can propagate such bias to the DL models36,37,38. As illustration, the reader can see how visual features of urban areas may change from one country to another (Fig. 1). (2)Second, they are relatively small and have only hundreds to few thousands of annotated data records39. (3)Third, they suffer from high variability in atmospheric conditions, and they have high inter-class similarity and intra-class variability, which makes their class differentiation difficult39.
Illustration from different countries of the Sentinel-2 satellite images corresponding to one of the 29 Land-Use and Land-Cover (LULC) classes (e.g. Urban and built-up area) extracted from Sentinel2GlobalLULC dataset. Each image has 224 × 224 pixels of 10 × 10 m resolution. Pixel values were calculated as the 25th-percentile of all images captured between June 2015 and October 2020 that were not tagged as cloudy. Fifteen LULC products available in Google Earth Engine agreed in annotating each image to represent one LULC class.
To overcome these limitations, we introduce in this paper Sentinel2GlobalLULC40, a smart dataset with 29 annotated LULC classes at global scale built with Sentinel-2 RGB imagery. Every image in this dataset is geo-referenced and has a unique LULC annotation. Each image label was carefully built from a consensus approach by combining up to 15 global LULC maps available in Google Earth Engine (GEE)41. We released a tif and a jpeg version of each image and a CSV file for each LULC class containing the coordinates of each image center and additional metadata. Sentinel2GlobalLULC aims to foster the creation of accurate global LULC products by exploiting the advantages currently offered by DL. Sentinel2GlobalLULC could be used to train and/or evaluate DL-based models for global LULC mapping. We expect this dataset to improve our understanding and modelling of natural and human systems around the world.
To build Sentinel2GlobalLULC, we followed two main steps. First, we established a spatio-temporal consensus between 15 global LULC products for 29 LULC classes. Then, we extracted the maximum number of Sentinel-2 RGB images representing each class. Each image is a tile that has 224 × 224 pixels at 10 × 10 m spatial resolution and was built as a cloud-free composite from all the Sentinel-2 images acquired between June 2015 and October 2020. Both tasks were implemented using GEE, an efficient programming, processing and visualisation platform that allowed us to have free manipulation and access to all used LULC products and Sentinel-2 imagery, simultaneously.
To establish the spatio-temporal consensus between different LULC products for each one of the 29 LULC classes, we followed four steps: (1) Identification of the LULC products to be used in the consensus, (2) Standardization and harmonization of the LULC legend that was subsequently used to annotate the image tiles, (3) Spatio-temporal aggregation across LULC products, and (4) Spatial reprojection and tile selection based on optimized spatial purity thresholds.
The adopted purity measure for spatio-temporal agreement across the 15 global LULC products we selected from GEE (Table 2) aims to find areas of high consensus to maximize the annotation quality. Spatial and temporal consensus across such rich diversity of LULC products, in terms of spatial resolution, time coverage, satellite source, LULC classes and accuracy, was used as a source of robustness for our subsequent LULC annotation. Products outside GEE were not used due to computing limitations.
Land cover (LC) data describes the main type of natural ecosystem that occupies an area; either by vegetation types such as shrublands, grasslands and forests, or by other biophysical classes such as permanent snow, bare land and water bodies. Land use (LU) includes the way in which humans modify or exploit an area, such as urban areas or agricultural fields.
To build our 29 LULC classes nomenclature, we established a standardization and harmonization approach based on expert knowledge. During this process, we took into account both the needs of different practitioners in the global and regional LULC mapping field and the thematic resolution of the global LULC legends available in GEE. Our nomenclature consists of 23 LC and 6 LU distinct classes identified through specific consensus rules across 15 LULC products (see Table 4). A six-level (L0 to L5) hierarchical structure was adopted in the creation of these 29 LULC classes (Fig. 2). To facilitate the inter-operability of our 29 legends at the finest level L5 across all LULC products and with the widely used FAO’s hierarchical Land Cover Classification System (LCCS)1, we have established an LULC classification system where the 29 classes can be mapped directly to FAO’s LCCS as explained in the table of Supplementary File 1. The LC part in our dataset contains 20 terrestrial ecosystems and 3 aquatic ecosystems. The terrestrial systems are: Barren lands, Grasslands, Permanent snow, Moss and Lichen lands, Close shrublands, Open shrublands, in addition to 12 Forests classes that differed in their tree cover, phenology, and leaf type. The aquatic classes are: Marine water bodies, Continental water bodies, and Wetlands; furthermore, wetlands were divided into 3 classes: Marshlands, Mangroves and Swamps. The LU part is composed of urban areas and 5 coarse cropland types that differed in their irrigation regime and leaf type. In Table 3, you can find the semantic definition of each one of the 29 classes in Sentinel2GlobalLULC. We provided a table in Supplementary File 2, for a more detailed definition of each LULC class.
Tree representation of the six-level (L0 to L5) hierarchical structure of the Land-Use and Land-Cover (LULC) classes contained in the Sentinel2GlobalLULC dataset. Outer circular leaves represent the final or most detailed 29 LULC classes (C1 to C29) of level L5. The path followed to define each class is represented through inner ellipses that contain the names of intermediate classes at different levels between the division of the Earth’s surface (square) into LU and LC (level L0) and the final class circle (level L5). All LULC classes belong to three levels at least, except the 12 forest classes that belong to L5 only.
For each one of the 29 LULC classes, we combined in space and time the global LULC information among the 15 GEE LULC products. This way, each image was annotated with a LULC class only if all combined products agreed in its corresponding tile (i.e., 100% of agreement in space and time). For each product and LULC type, we first set one or more criteria to create a global mask at the native resolution of the product in which each pixel was classified as 1 or 0 depending on whether it met the criteria for belonging to that LULC type or not, respectively (see first stage in Table 4). For certain LULC classes, some products did not provide any relevant information, so they were not used. For example (Table 4), in Grasslands (C3), Open Shrublands (C4) and Close Shrublands (C5), we combined 14 products, while in UrbanBlUpArea (C29) and Permanent Snow (C23) we only combined 10 and 7 products, respectively.
Then (see second stage in Table 5), for each LULC type, we calculated the average of all the masks obtained from each product to create a final global probability map from all products with values ranging between 0 and 1. Value 1 meant that all products agreed to assign that pixel to a particular LULC class, while 0 meant that none of the products assigned it to that particular class (Fig. 3). These 0-to-1 values are interpreted as the spatio-temporal purity level of each pixel to belong to a particular LULC class and are provided as metadata with each image.
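The averaging step can be illustrated with a minimal Python sketch. It assumes, for simplicity, that every product mask has already been brought onto the same pixel grid (the real products differ in native resolution); the function name and the three small masks are hypothetical and are not taken from the released scripts.

```python
def consensus_probability(masks):
    """Average per-product binary masks (lists of rows of 0/1) pixel by pixel."""
    if not masks:
        raise ValueError("at least one product mask is required")
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(mask[r][c] for mask in masks) / len(masks) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Three hypothetical product masks covering the same 2 x 2 pixel area.
product_masks = [
    [[1, 1], [0, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 0]],
]
probability_map = consensus_probability(product_masks)
# The top-left pixel is 1.0 (all products agree), the bottom-left pixel is 0.0.
```

A pixel value of 1.0 corresponds to full agreement across products, which is the condition used to annotate the purest tiles.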
Example of the process of building the final global probability map for one of the 29 Land-Use and Land-Cover (LULC) classes (e.g. C1: “Barren”) by means of spatio-temporal agreement of the 15 LULC products available in Google Earth Engine (GEE). The final map is normalized to values between 0 (white, i.e., areas with no presence of C1 in any product) and 1 (black spots, i.e., areas containing or compatible with the presence of C1 in all 15 products), whereas the shades of grey correspond to the values in between (i.e., areas that did not contain or were not compatible with the presence of C1 in some of the products). This process is divided into two stages: the first stage (the blue part, see details in Table 4) and the second stage (the yellow part, see details in Table 5). LULC products available for several years are represented with superposed rectangles, while single year products are represented with single rectangles. GMP: global probability map, NA: Not Available.
As an example of the first stage (see details in Table 4), to specify whether a given pixel belongs to Dense Evergreen Needleleaf Forest, we evaluated its tree cover level using “≤” and “≥”, while for bands containing the leaf type information, we used the equality operator “=”. For the spatio-temporal combination of multiple criteria, we used the following operators: “AND”, “OR” and “ADD”. For example, we combined the tree cover percentage criteria with the leaf type criteria using “AND” to select forest pixels that met both conditions. To combine instances of the same product from several years, we used “ADD”, except for product P13, where we used “AND” to identify permanent water areas only. Whenever we used the “ADD” operator, we normalized the pixel values afterwards to bring them back to a probability interval between 0 and 1 by dividing by the total number of combined years or criteria.
In the second stage (see details in Table 5), we combined for each LULC class the 15 global probability maps previously derived from each product to create a final global probability map (Fig. 3). This combination was carried out using various operators such as “ADD”, “MULTIPLY” and “OR”, depending on the LULC type. When “ADD” was used, the final pixel values were normalized by dividing the final addition value of each pixel by the total number of added products. The “MULTIPLY” operator was mostly used at the end, to remove urban areas from non-urban LULC classes, or to remove water from non-water LULCs. The multiplication operator was also adopted to make sure that a given criterion was respected in the final probability map. For instance, for the swamp class, we multiplied all pixels in the final stage by a water mask where saline water areas have a value of 0 in order to eliminate mangrove from swamp pixels and vice versa. Finally, we used the “OR” operator between different water-related products to take advantage of the fact that they complement each other in terms of spatio-temporal coverage and accuracy.
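As a rough illustration of how these three operators behave on per-pixel values, the following Python sketch applies a normalized “ADD”, a “MULTIPLY” by a mask, and an “OR” to small hypothetical grids. The function names and input values are assumptions made for this example; the actual combinations were scripted in GEE and are specified in Table 5.

```python
def add_normalized(maps):
    """'ADD' followed by division by the number of added maps."""
    n_rows, n_cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[r][c] for m in maps) / len(maps) for c in range(n_cols)]
        for r in range(n_rows)
    ]


def multiply(left, right):
    """'MULTIPLY': pixel-wise product, e.g. to zero out excluded areas."""
    return [[a * b for a, b in zip(row_l, row_r)] for row_l, row_r in zip(left, right)]


def logical_or(left, right):
    """'OR' between two binary masks."""
    return [[1 if (a or b) else 0 for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(left, right)]


# Hypothetical swamp masks from two products, then removal of saline areas.
swamp_a = [[1, 1], [0, 1]]
swamp_b = [[1, 0], [0, 1]]
not_saline = [[1, 1], [1, 0]]  # 0 where the water is saline
swamp_probability = multiply(add_normalized([swamp_a, swamp_b]), not_saline)

# Two hypothetical water masks that complement each other.
water = logical_or([[0, 1], [0, 0]], [[0, 0], [1, 0]])
```

In this toy case the bottom-right pixel, although flagged as swamp by both products, ends at 0 because the saline mask excludes it.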
In GEE, when two products are aggregated using “ADD”, “MULTIPLY” or any other operator, the output is aggregated at the spatial resolution of the product at the left of the operator. To maintain the finest spatial resolution in the final probability map, we therefore multiplied everything by product P15 and placed it at the left of the final “MULTIPLY” operation (see Table 5). As a result, all 29 final probability maps were generated at the P15 spatial resolution of 30 m/pixel (except the urban class C29, which maintained the 30 m/pixel resolution of product P14).
Since our objective was finding pure Sentinel-2 image tiles of 224 × 224 10-m pixels representing each LULC class, we reprojected the 30 m/pixel probability maps to 2240 m/pixel using the spatial mean reducer in GEE. That is, each pixel value at 2240 m resolution was computed using the mean over all the 30m-pixel values contained within it. Hence, the resulting pixel values at 2240 m resolution represent the purity level that each Sentinel-2 image tile of 224 × 224 10-m pixels has. We illustrated the reprojection and selection processes in Fig. 4.
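The idea behind the mean reducer can be sketched in Python as a block average. This is a simplification: it assumes the coarse cell size is an integer multiple of the fine pixel size, which is not the case for 30 m and 2240 m, where GEE handles partially overlapping pixels itself. The function name and the 4 × 4 grid are hypothetical.

```python
def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            block = [
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Hypothetical fine-resolution probability map (values of 0 or 1 only).
fine = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
coarse = block_mean(fine, 2)
# Each coarse value is the purity of the corresponding tile.
```

Only the top-left coarse cell reaches a purity of 1.0 here, so it is the only one that would pass a 100% threshold.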
Example of the workflow to obtain a Sentinel-2 image tile of 2240 × 2240 m for one of the 29 Land-Use and Land-Cover (LULC) classes (e.g. C1: “Barren”). The process starts with the reprojected final global probability map obtained from stage two (Table 5) and ends with its exportation to the repository of a Sentinel-2 image tile of 224 × 224 pixels. The white rectangle is the only one having a probability value of 1 (recall that the purity threshold used for Barren was 1, i.e., 100%). The black pixels have a null probability value, while the probability values between 0 and 1 are represented in gray scale levels.
For each one of the reprojected maps, we defined a pixel value threshold to decide whether a given 2240 × 2240 m tile was representative of each LULC class or not. Training DL image classification models requires a large number of high-quality image tiles (both in terms of image quality and annotation quality) to reach good accuracy. Therefore, when the spatial purity of 100% (full agreement across products in all the pixels of the 224 × 224 tile) resulted in a small number of agreement tiles for a particular class, the purity threshold was decreased for that class until the number of tiles was larger than 1000, or further decreased in less abundant classes to a minimum of 75% purity. The purity value found is always provided as metadata for each image in the dataset, so users can always restrict their analysis to those image tiles and classes at any desired purity level. Decreasing the purity threshold down to 75% for the less abundant classes (e.g., swamp, mangrove, etc.) was a trade-off between maintaining a good data annotation quality and providing a sufficient number of tiles in each class. In Table 6, we present the number of agreement tiles found at different purity thresholds ranging from 75% to 100% for each LULC class. This spatial purity was not further decreased since machine learning image classification models are known to be robust when the target class is spatially dominant in each training image (it occupies more than 60% of the pixels in the scene)42. On the other hand, when the number of pure tiles for a LULC class was too large to be downloaded (i.e., greater than 14000), we applied a selection algorithm, described in Supplementary File 3, to download a maximum of 14000 spatially representative images. For this, the world was divided into a grid of one-degree square cells. If a cell contained fewer than 50 image tiles, we selected them all. If it contained more than 50, we applied the automatic maximum geographic distance algorithm, which selected images as far from each other as possible in a number proportional to the number of existing images in that cell. The map in Fig. 6 shows the global distribution of the selected 194877 image tiles contained in Sentinel2GlobalLULC and distributed in 29 LULC classes.
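The threshold-lowering rule can be sketched as a small Python function. The 1000-tile target and the 75% floor come from the text above; the 5-point step, the function name, and the example purity values are assumptions made for illustration and may not match how the thresholds were actually explored.

```python
def choose_purity_threshold(purities, min_tiles=1000, floor=75, step=5):
    """Lower the threshold from 100% until more than min_tiles tiles qualify.

    purities: tile purity values in percent. The search never goes below floor.
    """
    threshold = 100
    while threshold > floor:
        if sum(1 for p in purities if p >= threshold) > min_tiles:
            break
        threshold = max(floor, threshold - step)
    return threshold


# Hypothetical class with few fully pure tiles (small numbers for readability).
purities = [100] * 3 + [90] * 4 + [80] * 5
abundant_enough = choose_purity_threshold(purities, min_tiles=5)
rare_class = choose_purity_threshold(purities, min_tiles=50)
```

With a target of more than 5 tiles the search stops at 90%, while a target that can never be met falls back to the 75% floor.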
Sentinel2GlobalLULC provides the user with two types of data: Sentinel-2 RGB images (jpeg and geotif versions) and CSV files with associated metadata. In the following subsections, we describe the process for associating metadata, including the Global Human Modification (GHM) index.
As additional metadata related to the level of human influence in each image, we calculated in GEE, for each tile, the spatial mean of the global human modification index for terrestrial lands43, where 0 means no human modification and 1 means complete transformation. Since the original GHM product was mapped at 1 × 1 km resolution, we reprojected it to 2240 × 2240 m using the same reprojection procedure explained in (Re-projection and Selection of purity threshold).
Once the tiles were selected, for each LULC class we listed the image tiles in descending order of purity. Metadata included: geographical coordinates of each tile centroid, tile purity value, name and ID of the LULC class, and average GHM index for that tile. Then, we used the geographical coordinates of each tile to identify its exact administrative address geolocation. To implement this reverse geo-referencing operation, we used a free, request-unlimited Python module called reverse_geocoder. This way, we assigned a country code, two levels of administrative departments, and the locality to each tile.
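A minimal sketch of this reverse geo-referencing step is shown below. It relies on the `search` function of the third-party reverse_geocoder module, which returns one record per coordinate pair with the keys `cc`, `admin1`, `admin2` and `name`; the helper name, the row layout, and the output column names are hypothetical and not taken from the released scripts.

```python
def add_admin_columns(rows, search=None):
    """Attach country code, two administrative levels and locality to each row.

    rows: list of dicts with 'lat' and 'lon' keys (hypothetical layout).
    search: callable with the interface of reverse_geocoder.search; it is
            imported lazily so the helper can also be used with a stand-in.
    """
    if search is None:
        import reverse_geocoder  # third-party: pip install reverse_geocoder
        search = reverse_geocoder.search
    coordinates = [(row["lat"], row["lon"]) for row in rows]
    annotated = []
    for row, hit in zip(rows, search(coordinates)):
        annotated.append({
            **row,
            "country_code": hit["cc"],
            "admin1": hit["admin1"],
            "admin2": hit["admin2"],
            "locality": hit["name"],
        })
    return annotated
```

Calling `add_admin_columns([{"lat": 37.18, "lon": -3.6}])` would query the module's offline database; passing a custom `search` callable makes the helper easy to exercise without it.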
For LULC classes that had more than 14000 pure tiles, we have released the coordinates before and after the distance-based selection in case the user wants to download more tiles or use our consensus coordinates for other purposes.
After extracting all these pieces of information and grouping them into CSV files, we went back to the geographic center coordinates of each tile and used them to extract the corresponding 224 × 224 Sentinel-2 RGB tiles using GEE. Each exported image covers exactly the 2240 × 2240 m area of its corresponding tile.
We chose “Sentinel-2 MSI (Multi-Spectral Instrument) product” since it is free and publicly available in GEE at the fine resolution of 10 × 10 m. We chose “Level-1C” (i.e., top-of-atmosphere reflectance) since it provides the longest data availability of Sentinel-2 images without any modification of the data. To build RGB images, we extracted the three bands B4, B3 and B2 that correspond to Red, Green and Blue channels, respectively. More bands available in Sentinel-2 or even in Sentinel-1 images can be incorporated into our dataset in the future. However, computational limitations (i.e., the size of the dataset would be impractical) did not allow us to handle them as a first goal. In addition, the spatial resolution of the images would be heterogeneous across bands.
To minimize the inherent noise due to atmospheric conditions (e.g. clouds, aerosols, smoke, etc.) that could affect the satellite RGB images, every image was built as a temporal aggregation of all images gathered by Sentinel-2 satellites between June 2015 and October 2020. During this aggregation, only the highest quality images in the corresponding image collection were considered, as we first discarded all image instances where the cloud probability exceeded 20% according to the metadata provided in their corresponding Sentinel-2 collection. Then, we calculated the 25th-percentile value across all remaining images for each reflectance band (R, G, and B), and built the final image with the obtained 25th-percentile values in each pixel for its RGB bands. The 25th percentile was adopted given its suitability for atmospheric noise reduction44,45,46,47,48.
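For a single pixel and band, the compositing rule amounts to filtering observations by cloud probability and taking a low percentile of what remains. The Python sketch below illustrates this with hypothetical observations; the linear-interpolation percentile is an assumption, and the percentile reducer used in GEE may interpolate differently.

```python
def percentile(values, q):
    """Percentile (q in [0, 100]) of a non-empty list, with linear interpolation."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def composite_pixel(observations, max_cloud=20, q=25):
    """Composite one pixel of one band from (cloud_percent, reflectance) pairs.

    Observations whose cloud probability exceeds max_cloud are discarded.
    Returns None when no usable observation remains.
    """
    clear = [refl for cloud, refl in observations if cloud <= max_cloud]
    if not clear:
        return None
    return percentile(clear, q)


# Hypothetical time series for one pixel: (cloud probability in %, reflectance).
series = [(5, 900), (10, 1000), (60, 4000), (15, 1100), (0, 950), (18, 1200)]
value = composite_pixel(series)
```

The bright observation at 60% cloud probability is dropped before the percentile is taken, and the low percentile further favours darker, less hazy acquisitions.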
Usually, Sentinel-2 MSI product includes true colour images in JPEG2000 format, except for the “Level-1C” collection used here. The three original bands (B4, B3, and B2) required a saturation mapping of their reflectance values into 0–255 RGB digital values. Thus, we mapped the saturation reflectance of 3558 into 255 to obtain true RGB channels with digital values between 0 and 255. The choice of these mapping numbers was taken from the Sentinel-2 true colour image recommendations section of Sentinel user guidelines. Finally, after exporting the selected tiles for each LULC class as “.tif” images, we converted them into “.jpeg” format using a lossless conversion algorithm.
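The saturation mapping can be sketched as a clipped scaling of reflectance to 8-bit values. The saturation value of 3558 comes from the text above; the linear form of the mapping, the lower bound of 0, and the rounding are assumptions made for this illustration.

```python
def to_8bit(reflectance, saturation=3558):
    """Map a reflectance value to 0-255, saturating at the given reflectance."""
    clipped = min(max(reflectance, 0), saturation)
    return round(clipped * 255 / saturation)


# Hypothetical B4, B3, B2 reflectances for one pixel, in that order (R, G, B).
rgb = tuple(to_8bit(band) for band in (5000, 3558, 0))
```

Any reflectance at or above the saturation value maps to 255, so very bright surfaces lose detail in exchange for better contrast elsewhere.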
To implement all our methodology steps, we first created a JavaScript script in GEE for each LULC class. Each script is a multi-task script in which we implemented a switch command to control which task to execute (the spatio-temporal aggregation task, the spatial reprojection and tile selection task, or the data exportation task). In each one of these scripts, we selected from the GEE LULC datasets repository the 15 LULC products used to build the consensus of that LULC class. Each script was responsible for computing the spatio-temporal combination of the selected products and generating the final consensus map for that LULC class as described in the subsection “Combining products across time and space”. Then, it exported the final global probability map as an asset into GEE server storage to make its reprojection faster. In the same script, once the consensus map exportation was done, we imported it from the GEE assets storage and reprojected it to 2240 × 2240 m resolution; then, we exported the new reprojected map into GEE assets storage again to make its analysis and processing faster. Afterwards, we imported the reprojected map into the same script and applied different processing tasks. During this processing phase, many purity threshold values were evaluated. Then, in this same script, we identified the pure tiles and exported their center coordinates into a CSV file. A distinct GEE script was developed to import, reproject and export the global GHM map. The resulting GHM map was saved as an asset too, then imported and used in each one of the 29 LULC multi-task scripts.
A Python script was developed separately to read the exported CSV files for each LULC class, apply the reverse geo-referencing to their pure tile coordinates, and add the resulting geolocation data (country code, locality, etc.) to the original CSV files as new columns. Then, another Python script was implemented to read the resulting CSV files with all their added columns (reverse geo-referencing data, GHM data) and use the center coordinates of each pure tile in that class to export its corresponding Sentinel-2 GeoTIFF image within GEE through the Python API. Finally, after downloading all the selected GeoTIFF images from our Google Drive, we created another Python script to convert these GeoTIFF images into JPEG format.
Sentinel2GlobalLULC v2.140 dataset is stored in the following Zenodo repository (https://doi.org/10.5281/zenodo.6941662). This dataset consists of three zip compressed folders:
Sentinel-2 GeoTiff images folder: This folder contains the exported Sentinel-2 RGB images for each LULC class grouped into sub-folders named according to each LULC class. Each image has a filename with the following structure: “LULC class ID_LULC class short name_Pixel probability value_Image ID_GHM value_(Latitude,Longitude)_Country code_Administrative department level1_Administrative department level2_Locality”. Pixel probability value can be interpreted as the spatial purity of the image to represent that LULC class and was calculated as the spatial mean of all the pixels of the final probability maps contained in each image tile, reprojected and expressed as a percentage. Short names for all classes were derived from the original ones in a way to have exactly 13 characters each, and IDs for different classes were assigned randomly. This information for each class is explained in Table 7.
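Because the metadata is encoded in the filename, it can be recovered with a small parser. The Python sketch below follows the field order and the 13-character short name described above; the numeric class ID, the exact formatting of each value, and the example filename are assumptions, so the pattern may need adjusting against the real files.

```python
import re

# Field order follows the filename structure described in the text.
FILENAME_PATTERN = re.compile(
    r"^(?P<class_id>\d+)_(?P<short_name>.{13})_(?P<purity>[^_]+)"
    r"_(?P<image_id>[^_]+)_(?P<ghm>[^_]+)"
    r"_\((?P<lat>[^,]+),(?P<lon>[^)]+)\)_(?P<rest>.*)$"
)


def parse_tile_filename(filename):
    """Split a tile filename into its metadata fields (hypothetical helper)."""
    stem = filename
    for extension in (".tif", ".jpeg"):
        if stem.endswith(extension):
            stem = stem[: -len(extension)]
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    fields = match.groupdict()
    # The last four fields are text; the locality may itself contain underscores.
    country, admin1, admin2, locality = fields.pop("rest").split("_", 3)
    fields.update(country=country, admin1=admin1, admin2=admin2, locality=locality)
    for key in ("purity", "ghm", "lat", "lon"):
        fields[key] = float(fields[key])
    return fields


# Hypothetical filename, built only to exercise the parser.
example = "29_UrbanBlUpArea_100_12_0.85_(40.4,-3.7)_ES_Madrid_Madrid_Madrid.tif"
tile = parse_tile_filename(example)
```

Matching the short name by its fixed length rather than by a separator keeps the parser working even if a short name contains an underscore.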
Sentinel-2 JPEG images folder: This folder contains the same images as in the GeoTiff folder, but converted into “.jpeg” format while preserving the same nomenclature and organization. In Fig. 5, we illustrate an image tile for each one of the 29 classes in JPEG format.
Image tiles examples for each one of the 29 Land-Use and Land-Cover (LULC) classes contained in the Sentinel2GlobalLULC dataset.
Global map of the distribution of the 2240 × 2240 m tiles representing 29 Land-Use and Land-Cover (LULC) classes that were generated from the spatio-temporal agreement across the 15 global LULC products available in Google Earth Engine. The purity threshold used for each LULC class is specified in Table 6.
CSV files folder: For user convenience, the metadata of every image tile (i.e., the same information already contained in the image filenames) is also provided in CSV format. Image tiles in the CSV files are organized from the highest to the lowest consensus probability value. These CSV files have 12 columns: ID of LULC Class, Short name of LULC Class, ID Image, Pixel Probability Value as percentage, GHM Value, Center Latitude, Center Longitude, Country Code, Administrative Department Level 1, Administrative Department Level 2, Locality, and Number of S2 images, i.e., the number of instances found in the corresponding Sentinel-2 image collection between June 2015 and October 2020 that were aggregated and exported as the final image.
For too large LULC classes (i.e., with more than 14000 potential image tiles) that had to undergo the distance-based selection, we provide the user with 2 CSV files: one containing all pure tiles coordinates without geo-referencing columns, and another file containing just the 14000 exported tiles coordinates with their geo-referencing information and metadata.
To provide an independent assessment of the quality of the obtained automatic annotation, two of our co-authors who are experts in vegetation mapping visually inspected a geographically representative sample of 2900 images from the dataset (100 images per class) selected by an algorithm that maximizes the geographical distance between the selected image tiles. This visual inspection was carried out using very high resolution imagery from both Google Earth and Bing Maps as ground truth. The validation process was established in three stages: First, for each LULC class, we selected 100 image tiles to visually verify their LULC annotation. To maximize the global representativeness of the validated image tiles, their selection was carried out by maximizing the geographical distance among them using an ad hoc script in R. In Fig. 7, we present the distribution map of the 100 image tiles selected for each LULC class. Second, each one of the selected image tiles was visually inspected in Google Earth and Bing Maps by two of the co-authors (E.G. and D.A-S.) to independently assign it to one of the 29 LULC classes. These two experts assigned each image tile to a LULC class when it occupied more than 70% of the image tile. Third, a confusion matrix for this validation was calculated at six different levels of our LULC classification hierarchy (from L0 to L5 as presented in Fig. 2). In Table 8, we summarize the obtained F1 scores at each level.
Global distribution of the selected 100 images for each Land-Use and Land-Cover (LULC) class to perform the validation of the 29 LULC classes contained in the Sentinel2GlobalLULC dataset. An ad hoc script in R was used to maximize the geographical distance among the 100 points of each class.
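One generic way to pick points that are far from each other is greedy farthest-point sampling on great-circle distances. The Python sketch below shows that technique; it is not the R script used for the validation sample, and the starting point, the Earth radius, and the example coordinates are assumptions made for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def farthest_point_sample(points, k):
    """Greedily select up to k points, each as far as possible from those chosen."""
    if k <= 0 or not points:
        return []
    selected = [0]  # start from the first point (arbitrary choice)
    nearest = [haversine_km(points[0], p) for p in points]
    while len(selected) < min(k, len(points)):
        candidate = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[candidate] == 0:  # only duplicates of selected points remain
            break
        selected.append(candidate)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], haversine_km(points[candidate], p))
    return [points[i] for i in selected]


# Hypothetical tile centers given as (latitude, longitude).
centers = [(0, 0), (0, 1), (0, 90), (0, 2), (45, 45)]
sample = farthest_point_sample(centers, 3)
```

The two centers close to the starting point are skipped in favour of distant ones, which is the behaviour wanted for a geographically representative sample.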
The mean F1 scores obtained ranged from 0.99 at level L0 to 0.91 at level L5 (Table 8). This decrease in accuracy as the number of classes increased from level L0 to level L5 was mainly due to the difficulty for the human eye of distinguishing between forest types at L5 and to the complexity of the visual features in the Grasslands and Shrublands classes from level L2.
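To make the level-wise evaluation concrete, the sketch below computes per-class and mean F1 scores from two parallel label lists and repeats the computation after collapsing fine classes into coarser parents. The label names, the parent mapping, and the unweighted averaging are illustrative assumptions; they are not the actual class hierarchy or the exact averaging behind Table 8.

```python
from collections import Counter


def f1_scores(reference, assigned):
    """Return (unweighted mean F1, per-class F1) for two parallel label lists."""
    if len(reference) != len(assigned):
        raise ValueError("label lists must have the same length")
    pairs = Counter(zip(reference, assigned))  # sparse confusion matrix
    classes = sorted(set(reference) | set(assigned))
    per_class = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (ref, got), n in pairs.items() if got == c and ref != c)
        fn = sum(n for (ref, got), n in pairs.items() if ref == c and got != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    mean = sum(per_class.values()) / len(per_class) if per_class else 0.0
    return mean, per_class


def collapse(labels, parent_of):
    """Map fine-level labels to a coarser level of the hierarchy."""
    return [parent_of[label] for label in labels]


# Hypothetical labels: dataset annotation versus expert assignment.
annotated = ["evergreen forest", "deciduous forest", "grassland", "grassland"]
expert = ["deciduous forest", "deciduous forest", "grassland", "grassland"]
parent_of = {  # hypothetical two-level hierarchy
    "evergreen forest": "forest",
    "deciduous forest": "forest",
    "grassland": "grassland",
}

fine_mean, fine_per_class = f1_scores(annotated, expert)
coarse_mean, _ = f1_scores(collapse(annotated, parent_of), collapse(expert, parent_of))
```

In this toy case the only disagreement is between two forest types, so the mean F1 is lower at the fine level and perfect once both forest types are merged, mirroring the trend reported across levels.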
To make the Sentinel2GlobalLULC40 dataset easier to use, reproduce, and exploit, and to promote its usage for training DL models, we provide users with Python code to load all RGB images and train several Convolutional Neural Network (CNN) models on them using different learning hyper-parameters. These CNNs can be trained on Sentinel2GlobalLULC to classify scene images into one of the 29 LULC types. Since most CNN frameworks accept only jpeg or png image formats, we provide a Python script to convert “.tif” into “.jpeg” format with full control over the conversion quality. Moreover, since for some LULC classes we limited the number of exported images to 14000, we provide a Python script that helps the user export more Sentinel-2 images and bands of each class if needed, using the coordinates stored in the CSV files.
In addition, to provide global insight into the consistency and accuracy of the global distribution of these 29 LULC classes, we also publicly share their final reprojected global consensus maps as GEE assets. To assist the user in visualizing the global distribution of each LULC class, we provide a GEE script with links to the LULC assets to import, manipulate, and visualize. Further image exportation is also possible through the GEE Python API, and we give the user complete control over the number of tiles to export, the time interval to select for image collections, the cloud removal parameters, the true RGB color calibration values, and the Google Drive account where the exported images are stored. The user should be aware that GEE currently limits the number of requests, with a maximum of 3000 exportation tasks running simultaneously on the same Google account.
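One simple way to respect the task limit mentioned above is to split the tiles to export into batches of at most 3000 and submit one batch at a time. The sketch below shows only the batching step in plain Python; it makes no GEE calls, and the helper name and the tile coordinates are hypothetical.

```python
from itertools import islice

MAX_CONCURRENT_TASKS = 3000  # limit quoted in the text above


def batches(items, size=MAX_CONCURRENT_TASKS):
    """Yield successive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical tile centres (lat, lon); real ones would come from the CSV files.
tile_centres = [(0.0, i / 100) for i in range(7001)]
batch_sizes = [len(batch) for batch in batches(tile_centres)]
```

Each batch would then be submitted and allowed to finish before the next one is started.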
In this section, we highlight the limitations of the Sentinel2GlobalLULC40 dataset, the DL settings for which it is suited, and new perspectives for its usage.
Sentinel2GlobalLULC is specifically designed for scene image classification, so each image was annotated with one LULC class at scene level, not at pixel level. That is, it does not contain mixed classes, such as mixed forests (e.g. where both evergreen and deciduous trees coexist) or mosaics of croplands and natural vegetation, and it does not allow polygons of different classes to be identified within an image scene.
Another point that the user should take into consideration is that some LULC classes have an inherently restricted geographical distribution, since they only occur under particular environmental conditions (e.g. Mangroves, Swamps, Seasonally flooded croplands, etc.). For these naturally restricted classes, one cannot expect to find a broad geographical distribution of the training image tiles in our dataset. Other LULC classes (e.g. different types of forests, shrublands or grasslands, barren lands, etc.) are more widely distributed around the world. However, there are conceptual and methodological differences across current LULC products in the definition of each class and in the methods used to map them. As a result of these inconsistencies, one cannot expect to find a continuous geographical distribution of the training image tiles for widely distributed classes either. On the one hand, the annotation quality of the training dataset is critical for obtaining accurate models, and it constitutes one of the main challenges for users49. Our approach to maximizing annotation quality relied on consensus across multiple LULC products over the world. On the other hand, a training dataset that widely represents the different environmental conditions of each class around the globe is preferable, since it makes the model transferable to the widest set of geographical locations where each class occurs. Hence, to find a trade-off in our dataset between wide representativeness across the world for each class and high annotation quality, we lowered the spatial purity threshold down to 75% in some classes. As a result, we provide a larger number of image tiles per class, geographically distributed around the world as well as possible.
Deep learning models are known to be robust and generalizable in scene classification problems when the training images contain a dominant part of the target class (i.e. the annotated class occupies more than 60% of all pixels in the scene)42. Geographical transferability of DL classification models is known to be high, i.e., models trained with images from one geographical location maintain high classification accuracy when applied to very distant geographical locations50. In addition, it is known that models trained on only a limited part of a data distribution reach a test error similar to that of models trained on the complete data distribution51. However, the inherent under-representation of some LULC classes and regions remains a limitation of our dataset, especially in disagreement areas. In addition, the inter-regional variation in spatial patterns within the same LULC class (e.g. croplands in central Europe versus croplands in sub-Saharan Africa) could constitute a serious limitation to geographical transferability. Thus, additional data and further analysis of DL performance could be required to help these models reach and maintain the same classification performance in every LULC class and region of the world. To give Sentinel2GlobalLULC users clear information about the geographic representativeness of the 29 LULC classes, we included, in the same repository as the dataset, a compressed file called “Geographic_Representativeness” that contains a csv file for each LULC class with the complete list of countries represented in that class. Each csv file has two columns: the first gives the country code and the second gives the number of images provided in that country for that LULC class. In addition to these 29 csv files, we provide another csv file that maps each ISO Alpha-2 country code to its full country name.
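A per-class representativeness file of this kind can be summarized in a few lines of Python. The sketch below assumes a two-column layout (country code, image count) with an optional header row; the header names and the sample values are hypothetical and only mimic the structure described above.

```python
import csv
import io


def load_country_counts(csv_file):
    """Read (country code, image count) rows, skipping any non-numeric header."""
    counts = {}
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue
        code, number = row[0].strip(), row[1].strip()
        if not number.isdigit():
            continue  # header or malformed row
        counts[code] = counts.get(code, 0) + int(number)
    return counts


def share_by_country(counts):
    """Fraction of a class's images contributed by each country."""
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()} if total else {}


# Hypothetical content standing in for one of the 29 per-class files.
sample = io.StringIO("country_code,images\nES,120\nMA,30\nBR,50\n")
counts = load_country_counts(sample)
shares = share_by_country(counts)
```

Shares computed this way give a quick indication of whether a class is dominated by a handful of countries before a model is trained on it.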
The spatial resolution of the images in our dataset is that of the Sentinel-2 RGB bands, i.e. 10 m/pixel, and the annotation is organized in image tiles of 2240 × 2240 m. Hence, this dataset is conceived to build models that use image tiles of around 2240 × 2240 m at a spatial resolution of 10 m/pixel. As a result, the output LULC map produced by these models will have a native spatial resolution of 2240 × 2240 m. To overcome this limitation, image super-resolution (SR) techniques could be of great utility. SR techniques improve various remote sensing applications by allowing users to create finer spatial details than those captured by the original acquisition sensors, and they have been shown to be very effective in this application52. Thus, a very promising solution would be to artificially enhance the resolution of Sentinel2GlobalLULC images using SR as a preprocessing strategy before the training step, which would also offer more flexibility regarding the spatial resolution at the global mapping step.
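The tile footprint follows directly from the tile size and the pixel size: 224 pixels × 10 m/pixel = 2240 m per side. The short sketch below spells out this arithmetic and counts how many non-overlapping tiles fit across a scene; the 10980-pixel scene width is only an illustrative assumption, and the function names are hypothetical.

```python
TILE_SIZE_PX = 224  # tile width and height in pixels
PIXEL_SIZE_M = 10   # Sentinel-2 RGB pixel size in metres


def tile_footprint_m(tile_px=TILE_SIZE_PX, pixel_m=PIXEL_SIZE_M):
    """Side length of one tile on the ground, in metres."""
    return tile_px * pixel_m


def tiles_per_side(scene_px, tile_px=TILE_SIZE_PX):
    """Number of whole, non-overlapping tiles that fit along one scene side."""
    return scene_px // tile_px


side_m = tile_footprint_m()          # 2240 m
area_km2 = side_m * side_m / 1e6     # ground area covered by one tile
n_side = tiles_per_side(10980)       # illustrative scene width in pixels
n_tiles = n_side * n_side            # one class prediction per tile
```

Each tile yields a single class prediction, which is why the native resolution of the resulting map is the tile footprint rather than the pixel size.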
Deep learning CNNs are usually trained only with the RGB channels available in each image. Thus, our dataset contains only RGB images. Nevertheless, multi-input CNNs now effectively combine information provided by different remote sensing sources at different scales and with various data types53. To give Sentinel2GlobalLULC users the possibility of taking advantage of these multi-input models, we provide in the shared Github code54 (https://doi.org/10.5281/zenodo.5638409) of our dataset a data exportation script with full control over the satellite source (e.g. Sentinel-1, etc.) and the spectral bands (e.g. NIR, NDVI, etc.) to export from these satellites.
Another important point that the user should take into consideration is that, to build each image in our dataset, we combined the Sentinel-2 images acquired at all available dates in the corresponding image collection between June 2015 and October 2020. Thus, each image is built from a different number of images, since image collections in most locations of the northern hemisphere contain more images than those in the southern hemisphere. To highlight this difference between both parts of the planet, Fig. 8 presents the number of Sentinel-2 images (dates) used to build each image tile in the world. In addition, Supplementary File 4 gives 29 figures similar to Fig. 8, each representing the number of collected Sentinel-2 dates for a different LULC class (from C1 to C29). Furthermore, we added to the 29 CSV files of the Sentinel2GlobalLULC dataset a new column giving the number of Sentinel-2 images aggregated to composite each exported image (this column is called “Number of S2 images”).
The number of Sentinel-2 dates used to build each image composite in the Sentinel2GlobalLULC dataset. This number is shown in intervals (an individual map for each of the 29 LULC classes is presented in Supplementary File 4).
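The “Number of S2 images” column can be used, for instance, to keep only tiles whose composite was built from a minimum number of dates. The sketch below assumes a CSV with a header row containing that column; the `tile_id` column, the sample values, and the threshold are hypothetical.

```python
import csv
import io

DATES_COLUMN = "Number of S2 images"  # column name given in the text


def tiles_with_enough_dates(csv_file, min_dates):
    """Return the rows whose composite used at least `min_dates` Sentinel-2 images."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if int(row[DATES_COLUMN]) >= min_dates]


# Hypothetical excerpt standing in for one of the per-class CSV files.
sample = io.StringIO(
    "tile_id,Number of S2 images\n"
    "a,12\n"
    "b,85\n"
    "c,40\n"
)
kept = tiles_with_enough_dates(sample, min_dates=40)
kept_ids = [row["tile_id"] for row in kept]
```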
The user should be aware that our 25th-percentile compositing method was applied to each of the three reflectance bands (R, G and B) independently, which means that the 25th-percentile values of the three bands could have been selected from different dates between June 2015 and October 2020. Although applying the median independently to each band is a frequent method for compositing time series of Landsat and Sentinel-2 imagery (e.g.46,48), we used the 25th percentile independently on each band since it is more conservative in removing clouds and other atmospheric noise in very cloudy regions44,45,47. In addition, compositing each band independently was motivated by computational resource limitations in GEE, since extracting the overall 25th percentile of all these 10 m resolution bands combined was more prone to out-of-memory or time-out errors.
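The effect of compositing each band independently can be seen on a single pixel. The sketch below is plain Python, not the GEE implementation used for the dataset: the linear-interpolation percentile and the reflectance values are assumptions chosen for illustration.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def composite_pixel(observations, q=25):
    """Apply the percentile to the R, G and B series of one pixel independently."""
    return tuple(percentile([obs[band] for obs in observations], q)
                 for band in range(3))


# Hypothetical (R, G, B) reflectances of one pixel on five dates;
# the fourth date is bright in all bands, as a cloud would be.
observations = [
    (0.10, 0.30, 0.20),
    (0.20, 0.10, 0.30),
    (0.30, 0.20, 0.10),
    (0.90, 0.90, 0.90),
    (0.40, 0.40, 0.40),
]
composite = composite_pixel(observations)
```

Here the composite value of 0.20 comes from the second date for R, the third date for G, and the first date for B, while the bright fourth date is excluded from every band.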
Despite these limitations, Sentinel2GlobalLULC remains, to our knowledge, the first global LULC mapping dataset that includes up to 29 LULC classes, a number much higher than that of the valuable Dynamic World dataset55, which so far provides only 9 LULC classes.
All the scripts used to build or use our dataset, together with links to the assets stored in GEE, are available in the following Github repository54 (https://doi.org/10.5281/zenodo.5638409), with guidelines in a README file explaining how to execute them.
Di Gregorio, A. Land cover classification system: classification concepts and user manual: LCCS, vol. 2 (Food & Agriculture Org., 2005).
Pielke, R. A. et al. Interactions between the atmosphere and terrestrial ecosystems: influence on weather and climate. Global change biology 4, 461–475 (1998).
Menke, S., Holway, D., Fisher, R. & Jetz, W. Characterizing and predicting species distributions across environments and scales: Argentine ant occurrences in the eye of the beholder. Global Ecology and Biogeography 18, 50–63 (2009).
Verburg, P. H., Neumann, K. & Nol, L. Challenges in using land use and land cover data for global change studies. Global change biology 17, 974–989 (2011).
DeFries, R. Terrestrial vegetation in the coupled human-earth system: contributions of remote sensing. Annual Review of Environment and Resources 33, 369–390 (2008).
Pfeifer, M., Disney, M., Quaife, T. & Marchant, R. Terrestrial ecosystems from space: a review of earth observation products for macroecology applications. Global Ecology and Biogeography 21, 603–624 (2012).
Quaife, T. et al. Impact of land cover uncertainties on estimates of biospheric carbon fluxes. Global Biogeochemical Cycles 22 (2008).
Herold, M. et al. A joint initiative for harmonization and validation of land cover datasets. IEEE Transactions on Geoscience and Remote Sensing 44, 1719–1727 (2006).
Townshend, J., Justice, C., Li, W., Gurney, C. & McManus, J. Global land cover classification by remote sensing: present capabilities and future possibilities. Remote Sensing of Environment 35, 243–255 (1991).
Loveland, T. R. et al. Development of a global land cover characteristics database and igbp discover from 1 km avhrr data. International Journal of Remote Sensing 21, 1303–1330 (2000).
Bartholome, E. & Belward, A. S. Glc2000: a new approach to global land cover mapping from earth observation data. International Journal of Remote Sensing 26, 1959–1977 (2005).
Tuanmu, M.-N. & Jetz, W. A global 1-km consensus land-cover product for biodiversity and ecosystem modelling. Global Ecology and Biogeography 23, 1031–1045 (2014).
Sheng, G., Yang, W., Xu, T. & Sun, H. High-resolution satellite scene classification using a sparse coding based multiple feature combination. International journal of remote sensing 33, 2395–2412 (2012).
Xia, G. et al. Aid: A benchmark dataset for performance evaluation of aerial scene classification. arXiv preprint arXiv:1608.05167 (2016).
Xia, G.-S. et al. Structural high-resolution satellite image indexing. In ISPRS TC VII Symposium-100 Years ISPRS 38, 298–303 (2010).
Zhao, L., Tang, P. & Huo, L. Feature significance-based multibag-of-visual-words model for remote sensing image scene classification. Journal of Applied Remote Sensing 10, 035004 (2016).
Zhou, W., Newsam, S., Li, C. & Shao, Z. Patternnet: A benchmark dataset for performance evaluation of remote sensing image retrieval. ISPRS journal of photogrammetry and remote sensing 145, 197–209 (2018).
Cheng, G., Han, J. & Lu, X. Remote sensing image scene classification: Benchmark and state of the art. Proceedings of the IEEE 105, 1865–1883 (2017).
Sumbul, G., Charfuelan, M., Demir, B. & Markl, V. Bigearthnet: A large-scale benchmark archive for remote sensing image understanding. In IGARSS 2019-2019 IEEE International Geoscience and Remote Sensing Symposium, 5901–5904 (IEEE, 2019).
Townshend, J. R. & Justice, C. O. Towards operational monitoring of terrestrial systems by moderate-resolution remote sensing. Remote Sensing of Environment 83, 351–359 (2002).
Morisette, J., Privette, J., Strahler, A., Mayaux, P. & Justice, C. An approach for the validation of global land cover products through the committee on earth observing satellites (2003).
McCallum, I., Obersteiner, M., Nilsson, S. & Shvidenko, A. A spatial comparison of four satellite derived 1 km global land cover datasets. International Journal of Applied Earth Observation and Geoinformation 8, 246–255 (2006).
Gao, Y. et al. Consistency analysis and accuracy assessment of three global 30-m land-cover products over the european union using the lucas dataset. Remote Sensing 12, 3479 (2020).
Liu, L. et al. Finer-resolution mapping of global land cover: Recent developments, consistency analysis, and prospects. Journal of Remote Sensing 2021 (2021).
Gengler, S. & Bogaert, P. Combining land cover products using a minimum divergence and a bayesian data fusion approach. International Journal of Geographical Information Science 32, 806–826 (2018).
Xu, P., Herold, M., Tsendbazar, N.-E. & Clevers, J. G. Towards a comprehensive and consistent global aquatic land cover characterization framework addressing multiple user needs. Remote Sensing of Environment 250, 112034 (2020).
Fritz, S. et al. Cropland for sub-saharan africa: A synergistic approach using five land cover data sets. Geophysical Research Letters 38 (2011).
Zhu, X. X. et al. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geoscience and Remote Sensing Magazine 5, 8–36 (2017).
Shrestha, A. & Mahmood, A. Review of deep learning algorithms and architectures. IEEE Access 7, 53040–53065 (2019).
Ma, L. et al. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS journal of photogrammetry and remote sensing 152, 166–177 (2019).
Benhammou, Y., Achchab, B., Herrera, F. & Tabik, S. Breakhis based breast cancer automatic diagnosis using deep learning: Taxonomy, survey and insights. Neurocomputing 375, 9–24 (2020).
Rawat, W. & Wang, Z. Deep convolutional neural networks for image classification: A comprehensive review. Neural computation 29, 2352–2449 (2017).
Nogueira, K., Penatti, O. A. & Dos Santos, J. A. Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognition 61, 539–556 (2017).
Zhang, L., Xia, G.-S., Wu, T., Lin, L. & Tai, X. C. Deep learning for remote sensing image understanding (2016).
Luengo, J., García-Gil, D., Ramírez-Gallego, S., García, S. & Herrera, F. Big data preprocessing – enabling smart data. Cham: Springer (2020).
Ghorbanian, A. et al. Improved land cover map of iran using sentinel imagery within google earth engine and a novel automatic workflow for land cover classification using migrated training samples. ISPRS Journal of Photogrammetry and Remote Sensing 167, 276–288 (2020).
NASS, U. Usda-national agricultural statistics service, cropland data layer. United States Department of Agriculture, National Agricultural Statistics Service, Marketing and Information Services Office, Washington, DC [Available at http//nassgeodata.gmu.edu/Crop-Scape, Last accessed September 2012.] (2003).
Yang, L. et al. A new generation of the united states national land cover database: Requirements, research priorities, design, and implementation strategies. ISPRS Journal of Photogrammetry and Remote Sensing 146, 108–123 (2018).
Helber, P., Bischke, B., Dengel, A. & Borth, D. Eurosat: A novel dataset and deep learning benchmark for land use and land cover classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 12, 2217–2226 (2019).
Benhammou, Y. et al. Sentinel2GlobalLULC: A dataset of Sentinel-2 georeferenced RGB imagery acquired between June 2015 and October 2020 annotated for global land use/land cover mapping with deep learning (License CC BY 4.0). Zenodo https://doi.org/10.5281/zenodo.6941662 (2022).
Gorelick, N. et al. Google earth engine: Planetary-scale geospatial analysis for everyone. Remote sensing of Environment 202, 18–27 (2017).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012).
Kennedy, C. M., Oakleaf, J. R., Theobald, D. M., Baruch-Mordo, S. & Kiesecker, J. Managing the middle: A shift in conservation priorities based on the global human modification gradient. Global Change Biology 25, 811–826 (2019).
Corbane, C. et al. A global cloud free pixel-based image composite from sentinel-2 data. Data in brief 31, 105737 (2020).
Simonetti, D., Pimple, U., Langner, A. & Marelli, A. Pan-tropical sentinel-2 cloud-free annual composite datasets. Data in Brief 39, 107488 (2021).
Verhegghen, A., Kuzelova, K., Syrris, V., Eva, H. & Achard, F. Mapping canopy cover in african dry forests from the combined use of sentinel-1 and sentinel-2 data: Application to tanzania for the year 2018. Remote Sensing 14, 1522 (2022).
Corbane, C. et al. Convolutional neural networks for global human settlements mapping from sentinel-2 satellite imagery. Neural Computing and Applications 33, 6697–6720 (2021).
Griffiths, P., van der Linden, S., Kuemmerle, T. & Hostert, P. A pixel-based landsat compositing algorithm for large area land cover mapping. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6, 2088–2101 (2013).
Zhang, Q., Yang, L. T., Chen, Z. & Li, P. A survey on deep learning for big data. Information Fusion 42, 146–157 (2018).
Guirado, E., Tabik, S., Alcaraz-Segura, D., Cabello, J. & Herrera, F. Deep-learning versus obia for scattered shrub detection with google earth imagery: Ziziphus lotus as case study. Remote Sensing 9, 1220 (2017).
Nakkiran, P., Neyshabur, B. & Sedghi, H. The deep bootstrap framework: Good online learners are good offline generalizers. arXiv preprint arXiv:2010.08127 (2020).
Wang, Z., Jiang, K., Yi, P., Han, Z. & He, Z. Ultra-dense gan for satellite imagery super-resolution. Neurocomputing 398, 328–337 (2020).
Tziolas, N., Tsakiridis, N., Ben-Dor, E., Theocharis, J. & Zalidis, G. Employing a multi-input deep convolutional neural network to derive soil clay content from a synergy of multi-temporal optical and radar imagery data. Remote Sensing 12, 1389 (2020).
Benhammou, Y. Sentinel2GlobalLULC Github code (License CC-BY 4.0), Zenodo, https://doi.org/10.5281/zenodo.5638409 (2021).
Brown, C. F. et al. Dynamic world, near real-time global 10 m land use land cover mapping. Scientific Data 9, 1–17 (2022).
Rottensteiner, F. et al. The isprs benchmark on urban object classification and 3d building reconstruction. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences I-3, 293–298 (2012).
Penatti, O. A., Nogueira, K. & Dos Santos, J. A. Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 44–51 (2015).
Basu, S. et al. Deepsat: a learning framework for satellite imagery. In Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems, 1–10 (2015).
Yang, Y. & Newsam, S. Bag-of-visual-words and spatial extensions for land-use classification. In Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems, 270–279 (2010).
Dai, D. & Yang, W. Satellite image classification via two-layer sparse coding with biased image representation. IEEE Geoscience and Remote Sensing Letters 8, 173–176 (2010).
Zhao, B., Zhong, Y., Xia, G.-S. & Zhang, L. Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery. IEEE Transactions on Geoscience and Remote Sensing 54, 2108–2123 (2015).
Zou, Q., Ni, L., Zhang, T. & Wang, Q. Deep learning based feature selection for remote sensing scene classification. IEEE Geoscience and Remote Sensing Letters 12, 2321–2325 (2015).
Xia, G.-S. et al. Aid: A benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing 55, 3965–3981 (2017).
Van Etten, A. et al. The multi-temporal urban development spacenet dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6398–6407 (2021).
Sulla-Menashe, D. & Friedl, M. A. User guide to collection 6 modis land cover (mcd12q1 and mcd12c1) product. USGS: Reston, VA, USA 1–18 (2018).
Buchhorn, M. et al. Copernicus Global Land Cover Layers—Collection 2. Remote Sensing 12, 1044 (2020).
Sexton, J. O. et al. Global, 30-m resolution continuous fields of tree cover: Landsat-based rescaling of modis vegetation continuous fields with lidar-based estimates of error. International Journal of Digital Earth 6, 427–448 (2013).
Teluguntla, P. et al. Global Cropland Area Database (GCAD) derived from Remote Sensing in Support of Food Security in the Twenty-first Century: Current Achievements and Future Possibilities, vol. 2, chap. 7, 131–159 (Taylor & Francis, 2015).
Shimada, M. et al. New global forest/non-forest maps from alos palsar data (2007–2010). Remote Sensing of environment 155, 13–31 (2014).
Hansen, M. C. et al. High-resolution global maps of 21st-century forest cover change. science 342, 850–853 (2013).
Simard, M., Pinto, N., Fisher, J. B. & Baccini, A. Mapping forest canopy height globally with spaceborne lidar. Journal of Geophysical Research: Biogeosciences 116 (2011).
Pekel, J.-F., Cottam, A., Gorelick, N. & Belward, A. S. High-resolution mapping of global surface water and its long-term changes. Nature 540, 418–422 (2016).
Gong, P. et al. Annual maps of global artificial impervious area (gaia) between 1985 and 2018. Remote Sensing of Environment 236, 111510 (2020).
This work is part of the project “Thematic Center on Mountain Ecosystem & Remote sensing, Deep learning-AI e-Services University of Granada-Sierra Nevada” (LifeWatch-2019-10-UGR-01), which has been co-funded by the Ministry of Science and Innovation through the FEDER funds from the Spanish Pluriregional Operational Program 2014-2020 (POPE), LifeWatch-ERIC action line, within the Workpackages LifeWatch-2019-10-UGR-01 WP-8, LifeWatch-2019-10-UGR-01 WP-7 and LifeWatch-2019-10-UGR-01 WP-4. This work was also supported by projects A-RNM-256-UGR18, A-TIC-458-UGR18, PID2020-119478GB-I00 and P18-FR-4961. E.G. was supported by the European Research Council grant agreement n° 647038 (BIODESERT) and the Generalitat Valenciana, and the European Social Fund (APOSTD/2021/188). We thank the “Programa de Unidades de Excelencia del Plan Propio” of the University of Granada for partially covering the article processing charge.
Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence, DaSCI, University of Granada, 18071, Granada, Spain
Yassir Benhammou, Rohaifa Khaldi, Francisco Herrera & Siham Tabik
Systems Analysis and Modeling for Decision Support Laboratory, Higher National School of Applied Sciences of Berrechid, Hassan 1st University, Berrechid, 218, Morocco
Yassir Benhammou & Boujemâa Achchab
LifeWatch-ERIC ICT Core, 41071, Seville, Spain
Yassir Benhammou & Rohaifa Khaldi
Department of Botany, Faculty of Science, University of Granada, 18071, Granada, Spain
Domingo Alcaraz-Segura
iEcolab, Inter-University Institute for Earth System Research, University of Granada, 18006, Granada, Spain
Domingo Alcaraz-Segura
Andalusian Center for Assessment and Monitoring of Global Change (CAESCG), University of Almería, 04120, Almería, Spain
Domingo Alcaraz-Segura & Emilio Guirado
Multidisciplinary Institute for Environment Studies “Ramon Margalef”, University of Alicante, San Vicente del Raspeig, 03690, Alicante, Spain
Emilio Guirado
Y.B. contributed to the conception of the dataset, implemented the code, performed all the data extraction and wrote the paper. D.A.-S. contributed to the conception and validation of the dataset, provided guidance, and wrote the paper. E.G. validated the dataset. R.K. contributed to the conception of the dataset. F.H. and B.A. provided edits and suggestions. S.T. contributed to the conception of the dataset and wrote the manuscript.
Correspondence to Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado or Siham Tabik.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Benhammou, Y., Alcaraz-Segura, D., Guirado, E. et al. Sentinel2GlobalLULC: A Sentinel-2 RGB image tile dataset for global land use/cover mapping with deep learning. Sci Data 9, 681 (2022). https://doi.org/10.1038/s41597-022-01775-8
DOI: https://doi.org/10.1038/s41597-022-01775-8
Scientific Data (Sci Data) ISSN 2052-4463 (online)
© 2022 Springer Nature Limited