Background & Summary

Planktonic Foraminifera are marine unicellular eukaryotes with chambered calcareous shells (tests). Following the classical pioneering works of Bradshaw1, Bé and Tolderlund2, and Bé3, about 50 extant morphospecies of planktonic Foraminifera (a phylum of the Rhizaria supergroup) are recognized in the global ocean4,5,6,7. Planktonic Foraminifera are sensitive to environmental conditions, many of which are registered by the chemical composition of their calcareous shells. As a result, their fossil record is widely used to reconstruct paleo-environments8,9,10.

Understanding the impacts of climate change on planet Earth and its ecosystems, especially the vast expanses of the surface ocean, is a global challenge and thus central to many ecological and biogeochemical studies11,12,13,14. Until now, the impacts of anthropogenic stressors on the distribution and biodiversity of planktonic organisms have remained poorly understood at the global scale14. Hence, better knowledge of the role of multiple stressors in the dynamics of modern planktonic communities, together with observational data on their distribution and biodiversity in the global ocean, is required to assess past, present, and future developments of the marine ecosystem in response to expected changes of the global marine environment15,16,17. Most planktonic Foraminifera species live between the surface and the seasonal thermocline of the open ocean5,18,19, which exposes them to a multitude of stressors including anthropogenic effects such as ocean acidification. Ongoing global warming combined with chemical changes in ambient seawater is affecting their calcification, biodiversity, and distribution at the community level20,21,22,23,24.

As a ubiquitous but minor part of the total marine biomass25, planktonic Foraminifera serve as a model for pelagic biodiversity studies26,27, though their potential has been mostly explored in paleoenvironmental studies28. The FORCIS project evaluates changes in the diversity, distribution and abundance of planktonic Foraminifera (vertical and horizontal) in response to multiple climatic stressors by compiling data on samples from the water column at the global scale29.

The FORCIS database contains data on planktonic Foraminifera abundance in the global ocean from plankton tow, Continuous Plankton Recorder (CPR), plankton pump, and sediment trap samples, and is meant to provide a synoptic view from the earliest observations in 1910 until 2018 (Figs. 1, 2). These data are based on physically extracted organisms rather than in situ imaging techniques. Data obtained from plankton nets, plankton pumps and CPRs take “snapshots” of the distributions of both living and dead Foraminifera species in the water column, while data obtained from sediment traps are “time integrators” of mostly fluxes of dead individuals (tests) settling from the surface ocean. The CPR and plankton pump collecting techniques mainly sample surface waters to about 10 m depth30,31, while oblique or vertical towed plankton nets, such as the commonly used Multinet (e.g., Schiebel12), sample the productive water column to the export zone (hundreds of meters) over single or multiple depth levels (Fig. 1).

Fig. 1

Schematic representation of the sampling devices deployed to collect modern planktonic Foraminifera from the global ocean at different depth levels, from a “snapshot” to an averaged time record, and integrated into the FORCIS database. CPR and plankton pump mainly sample living planktonic Foraminifera (yellow dots) in the upper ocean. The sediment trap mainly collects fluxes of dead Foraminifera (white dots). The plankton net and multinet sample larger depth ranges. Arrows indicate resolution of the depth level(s).

Fig. 2

(A) Temporal and spatial coverage of the FORCIS data at 4 × 4 degree (latitude and longitude) grid resolution colored for the time series range (years) of each cell. (B) Geographical locations of all records included in the FORCIS database.

The data presented in the FORCIS database aim to improve our understanding of (1) potential spatial and vertical migrations, (2) phenology, and (3) the various effects of global climate change on planktonic Foraminifera biogeography, as well as their vertical and seasonal distribution over the past decades. Because of the temporal range of the observations, the database can also be used to investigate the impact of anthropogenic ocean change on planktonic Foraminifera distribution and ecology.

Methods

Data collection

The database currently includes 188,000 planktonic Foraminifera subsamples. Each subsample represents one aliquot of planktonic Foraminifera specimens collected within a specific depth range, time interval, size fraction range or identified as living or dead, at a single location sampled via plankton pumps and nets, CPR and sediment traps (more details in the Data Records section). The compiled data were gathered from published scientific literature, PhD or master’s theses, books, databases, unpublished datasets, and reports. Some data were directly provided by contributors from the FORCIS working group and their personal networks. The dataset includes contributions from around 140 published and unpublished references1,23,32–168 reporting on the diversity and distribution of planktonic Foraminifera, spanning a time interval of more than a century (1910–2018). Most of the datasets published before 1960 were digitized from tables or plots from dissertations (some available as a hardcopy only) and scientific papers, or sourced from the contributor’s digital data files. Data on planktonic Foraminifera were extracted manually or automatically using WebPlotDigitizer-4.3 software169, including number concentrations of tests (ind/m3), relative numbers (%), and fluxes (ind/m2/day) collected from the global ocean using plankton nets, CPR, plankton pumps, and sediment traps (Table 1 and Fig. 2A,B). Moreover, data indicating only the presence or the absence of species were also retrieved. Binned data of species (i.e., estimated number concentrations or percentage concentrations reported for a minimum and maximum depth range, as for the CPR data in the North Atlantic) were also collected and included in the FORCIS database. Foraminifera abundance data are divided into four categories, i.e., raw values (numbers of individuals), number concentrations (ind/m3), percentage concentrations (%), or fluxes (ind/m2/day).

Table 1 Temporal and spatial coverages of the planktonic Foraminifera samples in the FORCIS database and number of primary keys in the tables sites, profiles, samples, and subsamples.

Database design and architecture

The FORCIS database is composed of ten tables with the counts of modern planktonic Foraminifera and metadata (Table 2), built and designed using PostgreSQL, which allows filtering and quality checking of the data during the importing and extraction steps. In the data tables, sites, profiles, casts, samples, subsamples, and counts are interconnected by five unique identifiers (primary keys) (Fig. 3) labeled as ‘site_id’, ‘profile_id’, ‘cast_id’, ‘sample_id’, and ‘subsample_id’. These identifiers represent hierarchical levels. If they are not provided in the original dataset, they are generated automatically during data import according to naming rules.

Table 2 FORCIS database main table and primary key description.

Fig. 3

Methodology and structure of the FORCIS database compilation: from data collection and different access levels to the final published database.

Each site (‘site_id’) is characterized by its geographic information (coordinates in longitude and latitude form). Associated information includes water depth and the name of the ocean basin. Data from the same site are separated into different profiles (‘profile_id’) according to their collection time (‘profile_date_time’). Profiles are made of different casts, which are defined by their sampling device and depth. Each cast (having a unique ‘cast_id’) contains at least one sample, which is given a unique sample ID (‘sample_id’) based on its sampling time and depth. Subsamples with unique IDs (‘subsample_id’) are used to distinguish samples from different size fractions (size_fraction, min and max), and living or dead specimens when available. Finally, each individual count value is identified by its species name, and belongs to a unique set of ‘subsample_id’, ‘sample_id’, ‘cast_id’, ‘profile_id’ and ‘site_id’. Each line in the database is assigned to a reference (‘ref_id’), for example, a published paper, manuscript, book, or unpublished study with information on the data contributor. Moreover, the FORCIS database comprises information regarding the sampling methodology (i.e., sampling device, mesh size), and the nature of the subsamples where applicable, for example, the distinction between individuals with filled tests and those with empty tests (the latter often referred to as “dead specimens”).
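The key hierarchy described above can be illustrated with a minimal Python sketch. The five identifier names are taken from the text; the class name `CountRecord`, the helper `group_by_subsample`, and all example values are hypothetical and do not reflect the actual FORCIS naming rules or table layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CountRecord:
    """One species count carrying the five hierarchical keys (illustrative only)."""
    site_id: str
    profile_id: str
    cast_id: str
    sample_id: str
    subsample_id: str
    species: str
    count: Optional[float]  # None stands for NA (no information)


def group_by_subsample(records):
    """Group species counts by their full key path, site down to subsample."""
    grouped = {}
    for r in records:
        key = (r.site_id, r.profile_id, r.cast_id, r.sample_id, r.subsample_id)
        grouped.setdefault(key, {})[r.species] = r.count
    return grouped


# Hypothetical identifiers: two size-fraction subsamples of the same sample.
records = [
    CountRecord("s1", "p1", "c1", "sa1", "ss1", "G. bulloides", 12),
    CountRecord("s1", "p1", "c1", "sa1", "ss1", "N. pachyderma", 0),
    CountRecord("s1", "p1", "c1", "sa1", "ss2", "G. bulloides", 3),
]
grouped = group_by_subsample(records)
```

In this sketch, every count is reachable through exactly one combination of the five keys, which mirrors the statement that each count belongs to a unique set of identifiers.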

Eight major ocean basins are distinguished: the Arctic, Antarctic, South Pacific, North Pacific, South Atlantic, North Atlantic, Indian Ocean, and the Mediterranean Sea. Their boundaries were defined in QGIS software (3.16 Hannover; available online https://www.qgis.org/es/site/) by adapting the shapefile published by the International Hydrographic Organization (IHO) database map170 (Fig. 4). Each sampling site (‘site_id’) in the FORCIS database is associated with its respective oceanic basin.

Fig. 4

Oceanic basin boundaries defined by the International Hydrographic Organization (IHO) database map. Note that white areas were excluded.

Importing the datasets

The metadata and count data were collected using a spreadsheet template (available at Zenodo171). Several variables in this template are mandatory fields to make sure that related tables of the database can be filled and linked together, i.e., for each hierarchical level: site coordinates (‘site_lon_start_decimal’, ‘site_lat_start_decimal’); profile date (‘profile_date_time’); cast information (‘cast_min_depth’, ‘cast_max_depth’, ‘cast_sampling_device_name’); sample information (‘sampling_device_type’, ‘sample_min_depth’, ‘sample_max_depth’, ‘sample_date_time_start’); subsample information (‘subsample_count_type’, ‘subsample_size_fraction_min’), and bibliographic information (either full reference or ‘doi’ for recent datasets).

For data safety, updates of the FORCIS database were routinely saved under different versions and stored in a SQL server. In parallel, data quality control and curation were done during the database development to ensure maximum quality consistency.

Database harmonization, curation, and quality control

All dataset entries underwent a series of quality control, curation, harmonization, and standardization steps, during and after inclusion in the database, in close collaboration between database managers and data contributors. For example, ‘site_lon_start_decimal’ and ‘site_lat_start_decimal’ were checked for redundancies and inconsistencies to avoid replicating datasets. Imported data were first screened by the database managers to check for inconsistencies such as a negative depth range (i.e., minimum depth larger than maximum depth), and then by the members of the FORCIS working group, to apply quality control and minimize errors. Different maps were generated to validate the geographical data distribution and to check and correct our entries for outliers. For example, maps of species distributions were produced to check for regional and ecological plausibility, which helped to detect typing errors made while assembling the data and supported the taxonomy harmonization. However, none of the data retrieved from the original publication was corrected or excluded. Species counting information distinguishes the absence of information (NA, Not Available) from the absence of the specimens (value of zero).
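Two of the checks mentioned above, the inverted depth range and the distinction between NA and zero, can be sketched as follows. The function names are hypothetical and the code is an illustration under simple assumptions (depths in metres, counts supplied as text), not the screening procedure actually used for FORCIS.

```python
def has_inverted_depth_range(min_depth, max_depth):
    """Flag a 'negative depth range', i.e. a minimum depth larger than the maximum depth."""
    return min_depth > max_depth


def parse_count(raw):
    """Keep NA (no information) distinct from 0 (species recorded as absent)."""
    text = str(raw).strip()
    if text == "" or text.upper() == "NA":
        return None  # absence of information
    return float(text)  # includes 0.0: absence of specimens


flagged = has_inverted_depth_range(200, 100)
counts = [parse_count(v) for v in ["NA", "0", "17"]]
```

Returning `None` for NA rather than zero keeps the two cases separable in later analyses, for example when computing percentages or species richness.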

Harmonization of the taxonomy

Species names were initially kept in the database as given by the data contributor or by the original publication, with minor corrections being made for spelling errors. Genus attributions had to be harmonized to a common standard5,7,172, to ensure that each taxon is labeled in the database with a unique binomen or trinomen. Finally, abbreviations and names referring to further attributes that can be taxonomically significant (shell pigmentation, coiling direction) were also harmonized to a common standard, to facilitate automated analysis. The resulting list of harmonized binomina or trinomina with harmonized names of additional attributes (original taxonomy) was used to resolve two further taxonomic issues: synonymy (different names given to the same underlying taxon) and shifting taxonomic concepts (splitting or lumping, including new taxa). Both issues result from the fact that formal taxonomy always reflects the opinion of the author and is subject to change as new knowledge emerges. The resulting “validated taxonomy” contains 55 species and categories, preserving in a consistent manner information that may be taxonomically relevant, but is presently not reflected in formal nomenclature (coiling direction, presence of specifically shaped terminal chambers).

Since 1960, new species have been described that were previously not recognised or were lumped with others (e.g., Neogloboquadrina incompta173, Globorotalia eastropacia174, Berggrenia pumilio175, Globigerinella calida176, Globorotalia cavernula176, Globorotalia ungulata177, Orcadia riedeli178, Tenuitellita fleisheri179). As such additional information is not always provided, the validated taxonomy has been subsequently mapped onto a “lumped taxonomy” comprising 46 species and categories that could be recognized in all datasets. The names used for all formally described taxa follow Schiebel and Hemleben5 and references therein, as expanded by Morard et al.172, and revised by Brummer and Kucera7 and references therein.

In most cases, the mapping of synonyms onto the validated taxonomy and the contraction of the validated taxonomy onto the lumped taxonomy was straightforward and the procedure can be understood directly from the synonym lists provided (available at Zenodo171). There are two notable exceptions, which require explanation. The first concerns the treatment of coiling variants in the abundant and variable genus Neogloboquadrina. In the high latitudes, oppositely coiled N. pachyderma have often, but not always, been recognized and counted separately. Darling et al.180 confirmed that the coiling variants represent genetically distinct lineages, so that sinistral specimens are assigned to N. pachyderma and dextral specimens to N. incompta. Where coiling direction was not recorded, the counts are reported as the sum of both species (n_pachyderma_any). The second exception concerns the species Globigerinoides ruber, where the presence of pink- and white-pigmented specimens and the erroneous synonymization of G. elongatus with G. ruber resulted in complex and often ambiguous taxonomic attributions. The nomenclature of G. ruber in the FORCIS database follows the concept of Morard et al.172, with pink-pigmented specimens, when counted separately, being named G. ruber ruber, and non-pigmented specimens being attributed either to G. ruber albus with inflated chambers, or G. elongatus with compressed chambers. Where the distinction between G. ruber albus and G. elongatus has not been made, the counts are reported as the sum of both species (g_ruber_albus_or_elongatus). In cases where not even the shell pigmentation has been considered, we only report the count for all three categories together (g_ruber_any).
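The two exceptions can be summarized as a small decision sketch. The labels n_pachyderma_any, g_ruber_albus_or_elongatus and g_ruber_any appear in the text; the remaining labels, the function names, and the attribute values ("sinistral", "pink", "inflated", and so on) are assumptions made for illustration and may differ from the column names in the published files.

```python
def label_neogloboquadrina(coiling=None):
    """Assign a category from coiling direction; unknown coiling gives the summed category."""
    if coiling == "sinistral":
        return "n_pachyderma"  # assumed label
    if coiling == "dextral":
        return "n_incompta"  # assumed label
    return "n_pachyderma_any"  # label given in the text


def label_ruber(pigment=None, chambers=None):
    """Assign a G. ruber / G. elongatus category from pigmentation and chamber shape."""
    if pigment == "pink":
        return "g_ruber_ruber"  # assumed label
    if pigment == "white":
        if chambers == "inflated":
            return "g_ruber_albus"  # assumed label
        if chambers == "compressed":
            return "g_elongatus"  # assumed label
        return "g_ruber_albus_or_elongatus"  # label given in the text
    return "g_ruber_any"  # pigmentation not recorded; label given in the text


labels = [
    label_neogloboquadrina("dextral"),
    label_neogloboquadrina(),
    label_ruber("white"),
    label_ruber(),
]
```

As a simplification, the sketch falls back to the broadest category whenever pigmentation is missing, even if chamber shape was recorded.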

Extracting data

The hierarchical structure of the database, split into different related tables, facilitates swift extraction of large merged data volumes. It is possible to retrieve count data and/or metadata separately and to apply filters to extract specific sub-datasets.

As SQL was used only to develop and quality check the FORCIS database, the finalized version was exported to “.csv” files and made available on Zenodo to facilitate handling of the data by users. An R package (https://frbcesab.github.io/forcis/) complements these files, providing basic functions to extract the data from the different files based on different taxonomy levels and to harmonize the species counts into a unique count type.

In the final published database, data from the different sampling devices were put into separate “.csv” files. In addition, the CPR data from the Southern Hemisphere have been separated from the CPR data collected in the Northern Hemisphere, as the data structure is different (species-level resolved counts vs. binned total counts, respectively). Each of the five “.csv” files contains metadata and original species counts.

Updated versions of the database will be released in “.csv” format. We foresee a continuous update of the database depending on the number of new datasets published. The labels of updated versions of the released “.csv” files will contain the date of their publication and versioning number.

Data Records

The FORCIS database is published as five “.csv” files composed of data from four types of sampling devices, i.e., plankton tows, plankton pump, CPR (one “.csv” file each for data from the Southern and the Northern Hemisphere), and moored sediment traps, and the associated dataset is uploaded on the Zenodo repository171 (Fig. 2). These files encompass more than 188,000 subsamples including ~157,000 CPR (since 1991), ~22,000 net (since 1910), ~9,000 sediment trap (since 1978), and 400 pump (since 1985) subsamples (Table 1).

The data in FORCIS are presented as follows: each row in the database is a subsample (i.e., one single plankton aliquot collected within a water depth range, time interval, and size fraction, at a single location) associated with 1) “block 1”: the metadata (i.e., location, date, depth, cast, environmental data of this record), and 2) “block 2”: the original data as reported in the data sources (abundance and/or diversity). The FORCIS database metadata has a hierarchical structure (Fig. 3): first, all sites are assigned to a site_id associated with the coordinates (site_lon_start_decimal and site_lon_end_decimal) and site_ocean_basin. Then, for each profile collected at a given site, a profile_id is attributed, based on the profile_date_time (time of the collection) and coordinates (Table 2). The depth range (profile_depth_min and profile_depth_max) of each profile, the availability of environmental data including ambient seawater chemistry (profile_env_data_availability and profile_chemical_data_availability), and profile_season are given. Information regarding the different cast_id used for each profile_id is provided in the metadata block, such as cast_sampling_device_name, cast_min_depth, cast_max_depth, and cast_mesh_size of the plankton tow. For each individual sample, a sample_id is assigned, including depth range (sample_min_depth and sample_max_depth), sample_volume_filtered (for net data), coordinates (sample_lon and sample_lat), sample_segment_length (for CPR data), date of sampling (sample_date_time_start and sample_date_time_end), and in situ temperature and salinity data (sample_in_situ_temperature and sample_in_situ_salinity). Each sample can be divided into different subsample_id based on their size (subsample_size_fraction_min, subsample_size_fraction_max) and/or on whether tests were filled or empty (subsample_living_or_dead). Other information is also reported in this table, such as subsample_count_type, subsample_sieved_or_measured, subsample_storage_type and subsample_splitting_type. The contributors who provided the data are given in the column contributors, and the source of their data (ref_id and source) is reported for each subsample.

Each subsample is associated with its corresponding counts that could be either the abundance of a species or the total number of Foraminifera specimens (i.e., those not identified at the species level), and reported in the table count. The species names are kept as they were reported in the original data source and listed as species names in block 2.

Two taxonomic levels (level 1 “validated taxonomy” and level 2 “lumped taxonomy”) can be generated in two separate blocks (block 3 for taxonomy level 1, and block 4 for taxonomy level 2; Fig. 3).

Technical Validation

The compilation of ~188,000 subsamples resulted in a high number of counts in the FORCIS database (more than 1,300,000 species counts and ~1,200,000 non-zero counts), compared to fossil planktonic Foraminifera databases such as ForCenS (~4,000 subsamples and ~60,000 counts), which reports data on planktonic Foraminifera found in surface sediment samples181. The Triton database182 holds ~500,000 non-zero counts of planktonic Foraminifera occurrences during the Cenozoic. However, the FORCIS database holds a lower number of samples compared to the COPEPOD database (~400,000), which is a global-coverage database of zooplankton abundance, phytoplankton abundance, and zooplankton biomass data183.

Data coverage varies temporally and spatially, but is highest after 1990 and in the Northern Hemisphere (Fig. 5). The plankton net dataset presents the widest temporal (from 1910 until 2017) and spatial ranges (from 61° S to 86° N, and 180° W to 180° E, Table 1). The sediment trap dataset includes data from 1978 to 2018, from 65° S to 77° N, and 177° W to 179° E. The CPR dataset covers the subtropical to polar oceans, from 30° N to 79° N, and 79° W to 20° E in the Northern Hemisphere, and from 77° S to 40° S, and 180° W to 180° E in the Southern Hemisphere. All CPR samples included here30,31 were collected from 1991 to 2018 (Fig. 5A). The pump dataset has the smallest regional coverage, ranging from 22° S to 53° N, and 39° W to 143° E.

Fig. 5

Number of subsamples collected by CPR (A), plankton net, plankton pump and sediment traps (B) per year in the FORCIS database.

Despite more than a century of work, large parts of the ocean have remained unsampled for planktonic Foraminifera, e.g., the Southern Pacific Ocean (Fig. 2). The temporal coverage of the FORCIS database exposes a low sampling effort especially during the time period before 1960, with only ~1,000 subsamples collected between 1910 and 1960 (Fig. 5B). In addition, few datasets are available from certain seasons, such as winter data from high latitudes due to the lack of sampling campaigns.

The FORCIS database comprises an extensive coverage of the Northern Hemisphere (Fig. 2A), especially of the North Atlantic Ocean. In contrast, plankton tows and sediment traps from the Southern Ocean are sparse due to difficulties associated with sampling in remote and stormy regions. However, despite these temporal and spatial gaps, the amount of data in FORCIS covers broad swaths of the global ocean and facilitates comparison of changes in distribution and diversity within and between different provinces over time (Fig. 2B).

Although FORCIS contains fewer species per sample than the coretop synthesis ForCenS (6 vs. 15), it contains more species than ForCenS when using the same taxonomic level in both databases, i.e., 46 vs. 40 species, respectively. The main reason for this difference is the coarser size fraction in ForCenS, which is limited to ≥150 μm181 vs. the finer size fractions in FORCIS that extend down to 30 μm; only these latter finer size fractions include small-sized species such as S. globigerus, N. vivans, O. riedeli, T. clarkei, T. fleisheri and T. parkerae, which are not included in ForCenS181,184.

Moreover, more species are documented in FORCIS compared to core-top sediment databases (e.g., CLIMAP185, Brown Foraminiferal Database186, ForCenS), and the use of species names is not fully complementary between this study and the earlier databases. In addition, thin shells of small-sized species such as O. riedeli and T. parkerae may dissolve during settling in the water column before reaching the ocean floor and are therefore not present in ForCenS187,188.

Usage Notes

Filtering of data in the FORCIS database allows the user to select particular datasets (e.g., by latitude, longitude, season, ocean basin, year). Seasons were distinguished between the Northern Hemisphere (defining Autumn by September, October, November; Winter by December, January, February; Spring by March, April, May; and Summer by June, July, August) and the Southern Hemisphere (defining Spring by September, October, November; Summer by December, January, February; Autumn by March, April, May; and Winter by June, July, August). The type of original count data was kept in FORCIS as reported in the original study (raw, number concentration (ind/m3), percentage concentration (%), bin or fluxes). Data are presented as counts of the identified specimens or total abundance of all the species found in the sample including unidentified specimens. In the latter case, the count is reported in the column unidentified_specimens.
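The seasonal definition can be written as a short lookup. This is a minimal sketch, assuming that austral Autumn spans March to May (mirroring boreal Spring) and that sites on the equator are treated as Northern Hemisphere; the function name `season` is hypothetical and the handling of the equator is not specified in the text.

```python
NORTHERN = ["Winter", "Spring", "Summer", "Autumn"]
SOUTHERN = ["Summer", "Autumn", "Winter", "Spring"]


def season(month, latitude):
    """Return the meteorological season for a month (1-12) and a latitude in degrees."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # (month % 12) // 3 maps Dec-Feb to 0, Mar-May to 1, Jun-Aug to 2, Sep-Nov to 3.
    index = (month % 12) // 3
    names = NORTHERN if latitude >= 0 else SOUTHERN  # equator handling is an assumption
    return names[index]


october_north = season(10, 45.0)
october_south = season(10, -45.0)
```

The same month therefore yields opposite seasons on either side of the equator, which matters when profiles from both hemispheres are pooled in a seasonal analysis.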

In most cases, the number concentration is given by the data contributors; in others, the sampled volume of seawater could be calculated for vertical tows as the surface area of the net times the depth interval. When the total number of Foraminifera or the volume of sampled seawater is not provided, the number concentration cannot be calculated (see column subsample_absolute_abundance_available). The number concentrations reported in FORCIS are raw numbers corrected for split and the filtered volume when available, but are not standardized for either the mesh size or the sieve size fraction. This is important since different mesh and sieve sizes significantly affect number and percentage concentrations (e.g., Berger, 1969).
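The volume and concentration calculation for a vertical tow can be illustrated with a worked sketch. The function and parameter names are hypothetical, and the split correction assumes that `split_fraction` is the fraction of the sample that was actually counted (for example 0.5 for a half split); the original studies may define splits differently.

```python
def filtered_volume(net_area_m2, min_depth_m, max_depth_m):
    """Volume sampled by a vertical tow: net area times the depth interval (m3)."""
    return net_area_m2 * (max_depth_m - min_depth_m)


def number_concentration(raw_count, volume_m3, split_fraction=1.0):
    """Raw count corrected for split and divided by filtered volume (ind/m3)."""
    if volume_m3 is None or volume_m3 <= 0:
        return None  # concentration cannot be calculated without a volume
    return (raw_count / split_fraction) / volume_m3


# Hypothetical tow: 0.25 m2 net opening hauled from 100 m to the surface,
# 50 individuals counted in a half split.
volume = filtered_volume(0.25, 0, 100)
concentration = number_concentration(50, volume, split_fraction=0.5)
```

With these example values the tow filters 25 m3 and the concentration is 4 ind/m3. No correction for mesh size or sieve size fraction is applied, in line with the text.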

The column subsample_count_type gives the type of count reported in the database. All the counts reported as 0 (zero) in the original study were kept in the FORCIS database, which means that the respective species was not found in the sample. However, the absence of species has not always been consistently recorded because of different counting procedures (e.g., researchers working in the polar areas have not consistently reported the absence of tropical species). To express this, the column subsample_all_shells_present_were_counted helps the user to identify in which datasets a species may have been present but was not counted. For subsamples with “complete” taxonomic coverage, the entry in this column is “true”.

All counts without clear location (nine subsamples) and/or date of sampling (274 subsamples) were kept in the database even though they cannot be used directly for spatial and time-series analyses. A note has been associated with the corresponding subsamples.

The number_of_species_counted was calculated when all the species were counted in the subsample and provided for both levels of taxonomy (in block 3 and block 4) based on the number of planktonic Foraminifera species observed in each subsample. The number of benthic species was included in FORCIS when given in the original data source but is not included in the calculation of Foraminifera diversity.
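A minimal sketch of this calculation follows, assuming that a species is "observed" when its count is available and greater than zero. The function signature and the key "benthics" are hypothetical; the flag corresponds to the column subsample_all_shells_present_were_counted described above.

```python
def number_of_species_counted(counts, all_shells_counted, excluded=("benthics",)):
    """Count planktonic species with a non-zero count; None if coverage is incomplete."""
    if not all_shells_counted:
        return None  # richness is only defined for complete taxonomic coverage
    return sum(
        1
        for name, value in counts.items()
        if name not in excluded and value is not None and value > 0
    )


# Hypothetical subsample: None stands for NA, "benthics" is an assumed column name.
subsample = {"g_bulloides": 12, "n_incompta": 0, "g_ruber_any": None, "benthics": 4}
richness = number_of_species_counted(subsample, all_shells_counted=True)
```

Zero counts and NA values both leave the richness unchanged here, but for different reasons: the former records a confirmed absence, the latter a lack of information.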

Finally, the FORCIS database will be open for any new data entry, and the FORCIS project warmly welcomes any new data published or provided by any contributor by submitting the data through our website (https://forcis.cerege.fr/).