Point Schema¶
All data indexed by Argovis that can be cast as point data - in general, any piece of data attached to a set of spacetime coordinates - are represented by the JSON schema described in this document. Argovis’ point data schema is hierarchical, in that all such schema inherit from the base point data schema; this creates two main advantages:
By requiring all point data to encode their common features like coordinates, IDs and other universal metadata identically, it is trivial to include new data products in co-location APIs that search on these parameters.
Flexibility to represent arbitrary point data is maintained by allowing subject matter experts for a given measurement class to extend the base point data schema with additional schema that represent their specific instrument or measurement.
In this way, we try to capture the flexibility of schema-less databases like MongoDB, while imposing just enough structure to capture the data validation opportunities afforded by strictly defined schema.
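The inheritance described above can be pictured as plain dictionary composition. The following is a minimal sketch, not the actual Argovis implementation: `BASE_POINT_SCHEMA` is a heavily abbreviated, hypothetical stand-in for the base schema, and `extend_schema` is an illustrative helper name.

```python
import copy

# Hypothetical, heavily abbreviated stand-in for the base point schema.
BASE_POINT_SCHEMA = {
    "bsonType": "object",
    "required": ["_id", "geolocation"],
    "properties": {
        "_id": {"bsonType": "string"},
        "geolocation": {"bsonType": "object"},
    },
}


def extend_schema(base, extra_required, extra_properties):
    """Return a new schema made of the base plus extension-specific keys."""
    schema = copy.deepcopy(base)  # never mutate the shared base schema
    schema["required"] = schema["required"] + list(extra_required)
    schema["properties"].update(extra_properties)
    return schema


# An extension only declares what it adds on top of the common keys.
argo_schema = extend_schema(
    BASE_POINT_SCHEMA,
    extra_required=["cycle_number"],
    extra_properties={"cycle_number": {"bsonType": "int"}},
)
```

Because every extension starts from the same base, code that only touches the common keys can treat all extensions alike.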
How to read this schema¶
Each entry in the schema fragments below contains a few keys:
type: the primitive type, format, or object description of a valid entry for this field
description: short comment on what this variable is
fill value (optional): what this should be filled by if absent
current vocabulary (optional): current set of possible values for this key, with explanations as required.
Schema enforcement & population¶
All these schema are enforced via MongoDB’s built-in schema validation; those schema validation rules are defined in the pointSchema.py script at https://github.com/argovis/db-schema.
Once an empty point collection is generated by the script above, it is populated by pipelines that translate from the formats of upstream data providers, to MongoDB-appropriate JSON that matches these schemas. Each schema extension section below contains links to the code backing these pipelines.
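As a rough illustration of how such validation rules are attached to a collection, the sketch below wraps a few key definitions in MongoDB's `$jsonSchema` form and passes them as the `validator` option when creating the collection. This is not the content of pointSchema.py; the helper names are hypothetical, only two keys are shown, and `db` is assumed to be a pymongo `Database` object.

```python
def build_validator(required, properties):
    """Wrap key definitions in the $jsonSchema form MongoDB validates against."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": list(required),
            "properties": dict(properties),
        }
    }


def create_point_collection(db, name, validator):
    """Create an empty, validated collection.

    `db` is assumed to be a pymongo Database; its create_collection method
    accepts a `validator` option that MongoDB applies to inserts and updates.
    """
    return db.create_collection(name, validator=validator)


# Two keys only, for brevity; the real schema declares many more.
validator = build_validator(
    required=["_id", "basin"],
    properties={"_id": {"bsonType": "string"}, "basin": {"bsonType": "int"}},
)
```

With a validator in place, a population pipeline that emits a malformed record gets an error at write time rather than silently storing bad data.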
Base point data schema¶
This section describes the base point data schema from which all other point data in Argovis must inherit.
Required keys¶
All point data MUST include the following keys:
_id
type: string
description: a globally unique identifier for this record.
basin
type: int
description: integer index of basin. Can be provided by Argovis based on lat/lon in geolocation.
fill value: -1, used if reported lon/lat are on land.
data_type
type: string
description: token indicating the general class of data
current vocabulary: oceanicProfile, tropicalCyclone
date_updated_argovis
type: ISO 8601 UTC datestring, example 1999-12-31T23:59:59Z
description: time the record was added to Argovis
geolocation
type: geojson Point object
description: geojson Point tagging the lon/lat of this record.
fill value:
{"type": "Point", "coordinates": [0, -90]}
source_info.source
type: array of strings
description: data origin, typically used to label major project subdivisions
current vocabulary: defined per project, see schema extensions below.
timestamp
type: ISO 8601 UTC datestring, example 1999-12-31T23:59:59Z
description: time the record measurement was made at.
fill value: 9999-01-01T00:00:00Z
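To make the required keys concrete, here is a minimal sketch of a record that carries all of them, together with a small checker. The record values are illustrative only (the `_id`, basin index and coordinates are made up), `missing_required` is a hypothetical helper, and treating `source_info` as a single sub-document holding `source` is an assumption based on the dotted key name.

```python
REQUIRED_KEYS = [
    "_id",
    "basin",
    "data_type",
    "date_updated_argovis",
    "geolocation",
    "source_info.source",
    "timestamp",
]

# Illustrative values only; none of these describe a real record.
example_record = {
    "_id": "example_0001",
    "basin": 1,
    "data_type": "oceanicProfile",
    "date_updated_argovis": "1999-12-31T23:59:59Z",
    "geolocation": {"type": "Point", "coordinates": [-30.5, 45.2]},  # lon, lat
    "source_info": {"source": ["argo_core"]},  # assumed nesting
    "timestamp": "1999-12-31T23:59:59Z",
}


def missing_required(record, required=REQUIRED_KEYS):
    """Return the required keys (dotted paths allowed) absent from record."""
    missing = []
    for dotted in required:
        node = record
        for part in dotted.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                missing.append(dotted)
                break
    return missing
```

A pipeline could run a check like this before insertion to report which key is absent, instead of relying only on the database's validation error.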
Optional Keys¶
All point data MAY include the following keys; if meaningful data is available for any of these keys, it should be represented in the format described.
country
type: string
description: ISO 3166-1 country code.
data
type: array of non-nested JSON documents
description: array indexes depth / altitude; individual documents are key/value pairs describing measurements made. Pressure or altitude must be present as one of the document keys. Example: {pres: 4.7, psal: 33.987, temp: 1.107, temp_argoqc: 1}
current vocabulary: defined per project, see schema extensions below.
data_center
type: string
description: entity responsible for processing this record, once received.
current vocabulary: defined per project, see schema extensions below.
data_keys (mandatory if data is present)
type: array of strings
description: a complete list of all the keys found in any document in the data object.
data_warning
type: array of strings
description: short string tokens indicating possible problems with this record.
current vocabulary:
degenerate_levels: data is reported twice for a given pressure / altitude level in a way that cannot be readily resolved
missing_basin: unable to determine meaningful basin code, despite having a meaningful lat / lon (edge case in basins lookup grid)
missing_location: one or both of longitude and latitude are missing
missing_timestamp: no date or time of measurement associated with this profile.
doi
type: string
description: DOI for this record.
instrument
type: string
description: string token describing the device used to make this measurement, like profiling_float, ship_ctd etc.
current vocabulary: TBD
pi_name
type: array of strings
description: name(s) of principal investigator(s)
platform_id
type: string
description: unique identifier for the platform or device responsible for making the measurements included in this record.
platform_type
type: string
description: make or model of the platform.
current vocabulary: TBD
source_info.data_keys_source
type: array of strings
description: list of measurement parameters as found in the source file
source_info.date_updated_source
type: ISO 8601 UTC datestring, example 1999-12-31T23:59:59Z
description: date and time the upstream source file for this record was last modified
source_info.source_url
type: string
description: URL to download the original file from which the Argovis record was derived.
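The relationship between data and data_keys described above can be sketched as follows: data_keys is the union of the keys seen in any level document, and every level should carry a pressure or altitude key. The helper names are hypothetical, the second level's values are invented for illustration, and `"altitude"` is an assumed key name since the schema text does not spell one out.

```python
# "pres" appears in the schema's own example; "altitude" is an assumed name.
VERTICAL_KEYS = ("pres", "altitude")


def derive_data_keys(data):
    """Collect every key found in any level document, sorted for stability."""
    keys = set()
    for level in data:
        keys.update(level.keys())
    return sorted(keys)


def levels_missing_vertical(data, vertical_keys=VERTICAL_KEYS):
    """Return indexes of level documents lacking a pressure / altitude key."""
    return [
        i
        for i, level in enumerate(data)
        if not any(key in level for key in vertical_keys)
    ]


data = [
    {"pres": 4.7, "psal": 33.987, "temp": 1.107, "temp_argoqc": 1},
    {"pres": 9.8, "temp": 1.102, "temp_argoqc": 1},  # psal absent at this level
]
data_keys = derive_data_keys(data)  # ['pres', 'psal', 'temp', 'temp_argoqc']
```

Note that a key only has to appear in one level document to be listed in data_keys, which is why `psal` is included even though the second level lacks it.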
Argo profile schema extension¶
All Argo data in Argovis is described as the union of the base point data schema and the following.
Population pipeline¶
The point collection is populated with Argo data via the pipeline described at https://github.com/argovis/ifremer-sync.
Base point schema vocabularies¶
The following keys from the base point schema have these vocabularies for Argovis:
data keys:
“bbp470”
“bbp532”
“bbp700”
“bbp700_2”
“bisulfide”
“cdom”
“chla”
“cndx”
“cp660”
“down_irradiance380”
“down_irradiance412”
“down_irradiance442”
“down_irradiance443”
“down_irradiance490”
“down_irradiance555”
“down_irradiance670”
“downwelling_par”
“doxy”
“doxy2”
“doxy3”
“molar_doxy”
“nitrate”
“ph_in_situ_total”
“pres”
“psal”
“psal_sfile”
“temp”
“temp_sfile”
“turbidity”
“up_radiance412”
“up_radiance443”
“up_radiance490”
“up_radiance555”
and all the same again with “_argoqc” appended for the corresponding QC measurements.
data_center: “AO”, “BO”, “CS”, “HZ”, “IF”, “IN”, “JA”, “KM”, “KO”, “ME”, “NM”
source_info.source: “argo_core”, “argo_bgc” and “argo_deep”
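The "_argoqc" naming rule for Argo data keys lends itself to a short sketch: each measurement key has a QC companion formed by appending the suffix. The function names below are hypothetical helpers, not part of any Argovis library.

```python
ARGO_QC_SUFFIX = "_argoqc"


def with_qc_keys(measurement_keys):
    """Interleave each measurement key with its QC companion key."""
    keys = []
    for key in measurement_keys:
        keys.append(key)
        keys.append(key + ARGO_QC_SUFFIX)
    return keys


def split_qc(data_keys):
    """Separate measurement keys from their QC companions."""
    measurements = [k for k in data_keys if not k.endswith(ARGO_QC_SUFFIX)]
    qc = [k for k in data_keys if k.endswith(ARGO_QC_SUFFIX)]
    return measurements, qc


expanded = with_qc_keys(["pres", "temp"])
# expanded == ['pres', 'pres_argoqc', 'temp', 'temp_argoqc']
```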
Required keys¶
cycle_number
type: int
description: probe cycle index
Optional keys¶
data_keys_mode
type: non-nested JSON document
description: JSON document with keys matching the entries of data_keys, and values indicating the variable’s data mode
current vocabulary: R (realtime), A (realtime adjusted), or D (delayed mode).
fleetmonitoring
type: string
description: URL for this float at https://fleetmonitoring.euro-argo.eu/float/
geolocation_argoqc
type: int
description: Argo’s position QC flag
fill value: -1
oceanops
type: string
description: URL for this float at https://www.ocean-ops.org/board/wa/Platform
positioning_system
type: string
description: positioning system for this float.
vocabulary: see Argo ref table 9
profile_direction
type: string
description: whether the profile was gathered as the float ascended or descended
current vocabulary: A (ascending) or D (descending).
timestamp_argoqc
type: int
description: Argo’s date QC flag
fill value: -1
vertical_sampling_scheme
type: string
description: sampling scheme for this profile.
current vocabulary: see Argo ref table 16
wmo_inst_type
type: string
description: instrument type as indexed by Argo.
current vocabulary: see Argo ref table 8
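Since data_keys_mode is keyed by the entries of data_keys, the two can be cross-checked. The sketch below is one possible check, assuming the mode values are stored as the single-letter Argo data mode codes R, A and D; `check_data_keys_mode` is a hypothetical helper name.

```python
# Assumption: modes are stored as single-letter Argo data mode codes.
VALID_MODES = {"R", "A", "D"}


def check_data_keys_mode(data_keys, data_keys_mode):
    """Return human-readable problems; an empty list means consistent."""
    problems = []
    for key in data_keys:
        if key not in data_keys_mode:
            problems.append(f"{key}: no data mode")
        elif data_keys_mode[key] not in VALID_MODES:
            problems.append(f"{key}: unknown mode {data_keys_mode[key]!r}")
    for key in data_keys_mode:
        if key not in data_keys:
            problems.append(f"{key}: not in data_keys")
    return problems


consistent = check_data_keys_mode(
    ["pres", "temp"], {"pres": "D", "temp": "D"}
)  # no problems reported
```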
GO-SHIP profile schema extension¶
All GO-SHIP data in Argovis is described as the union of the base point data schema and the following.
Population pipeline¶
The point collection is populated with GO-SHIP data via the pipeline described at TBD
Base point schema vocabularies¶
The following keys from the base point schema have the following vocabularies for Argovis:
data keys:
“ammonium_btl”
“ammonium_btl_woceqc”
“bottle_latitude_btl”
“bottle_longitude_btl”
“bottle_number_btl”
“bottle_number_btl_woceqc”
“bottle_time_btl”
“carbon_tetrachloride_btl”
“carbon_tetrachloride_btl_woceqc”
“cfc_113_btl”
“cfc_113_btl_woceqc”
“cfc_11_btl”
“cfc_11_btl_woceqc”
“cfc_12_btl”
“cfc_12_btl_woceqc”
“chlorophyll_a_btl”
“chlorophyll_a_btl_woceqc”
“chlorophyll_a_ug_kg_btl”
“chlorophyll_a_ug_kg_btl_woceqc”
“ctd_beamcp_ctd”
“ctd_beamcp_ctd_woceqc”
“ctd_fluor_arbitrary_ctd”
“ctd_fluor_ctd”
“ctd_fluor_ctd_woceqc”
“ctd_fluor_raw_btl”
“ctd_fluor_raw_btl_woceqc”
“ctd_fluor_raw_ctd”
“ctd_fluor_raw_ctd_woceqc”
“ctd_number_of_observations_ctd”
“ctd_pressure_raw_btl”
“ctd_temperature_unk_ctd”
“ctd_temperature_unk_ctd_woceqc”
“ctd_transmissometer_ctd”
“ctd_transmissometer_ctd_woceqc”
“ctd_transmissometer_raw_btl”
“ctd_transmissometer_raw_btl_woceqc”
“ctd_transmissometer_raw_ctd”
“ctd_transmissometer_raw_ctd_woceqc”
“del_carbon_13_dic_btl”
“del_carbon_13_dic_btl_woceqc”
“del_carbon_14_dic_btl”
“del_carbon_14_dic_btl_woceqc”
“del_carbon_14_dic_error_btl”
“del_oxygen_18_btl”
“del_oxygen_18_btl_woceqc”
“del_oxygen_18_error_btl”
“delta_helium_3_btl”
“delta_helium_3_btl_woceqc”
“delta_helium_3_error_btl”
“dissolved_organic_carbon_btl”
“dissolved_organic_carbon_btl_woceqc”
“dissolved_organic_nitrogen_btl”
“dissolved_organic_nitrogen_btl_woceqc”
“doxy_btl”
“doxy_btl_woceqc”
“doxy_ctd”
“doxy_ctd_woceqc”
“fco2_btl”
“fco2_btl_woceqc”
“fco2_temperature_btl”
“helium_btl”
“helium_btl_woceqc”
“helium_error_btl”
“hplc_placeholder_btl_woceqc”
“methyl_chloroform_btl”
“methyl_chloroform_btl_woceqc”
“neon_btl”
“neon_btl_woceqc”
“neon_error_btl”
“nitrate_btl”
“nitrate_btl_woceqc”
“nitrite_btl”
“nitrite_btl_woceqc”
“nitrite_nitrate_btl”
“nitrite_nitrate_btl_woceqc”
“nitrous_oxide_btl”
“nitrous_oxide_btl_woceqc”
“oxygen_btl”
“oxygen_btl_woceqc”
“oxygen_ml_l_btl”
“oxygen_ml_l_btl_woceqc”
“par_ctd”
“par_ctd_woceqc”
“partial_co2_temperature_btl”
“partial_pressure_of_co2_btl”
“partial_pressure_of_co2_btl_woceqc”
“particulate_organic_carbon_btl”
“particulate_organic_carbon_btl_woceqc”
“particulate_organic_nitrogen_btl”
“particulate_organic_nitrogen_btl_woceqc”
“ph_sws_btl”
“ph_sws_btl_woceqc”
“ph_temperature_btl”
“ph_total_h_scale_btl”
“ph_total_h_scale_btl_woceqc”
“phaeophytin_btl”
“phaeophytin_btl_woceqc”
“phaeophytin_ug_l_btl”
“phaeophytin_ug_l_btl_woceqc”
“phosphate_btl”
“phosphate_btl_woceqc”
“potential_temperature_68_btl”
“potential_temperature_c_btl”
“pres”
“psal_btl”
“psal_btl_woceqc”
“psal_ctd”
“psal_ctd_woceqc”
“radium_226_btl”
“radium_226_btl_woceqc”
“radium_228_btl”
“radium_228_btl_woceqc”
“ref_temperature_btl”
“ref_temperature_btl_woceqc”
“ref_temperature_c_btl”
“ref_temperature_c_btl_woceqc”
“rev_pressure_btl”
“rev_pressure_btl_woceqc”
“rev_temperature_90_btl”
“rev_temperature_90_btl_woceqc”
“rev_temperature_btl”
“rev_temperature_btl_woceqc”
“rev_temperature_c_btl”
“rev_temperature_c_btl_woceqc”
“salinity_btl”
“salinity_btl_woceqc”
“sample_btl”
“sample_ctd”
“silicate_btl”
“silicate_btl_woceqc”
“sulfur_hexifluoride_btl”
“sulfur_hexifluoride_btl_woceqc”
“temperature_btl”
“temperature_btl_woceqc”
“temperature_ctd”
“temperature_ctd_woceqc”
“total_alkalinity_btl”
“total_alkalinity_btl_woceqc”
“total_carbon_btl”
“total_carbon_btl_woceqc”
“total_dissolved_nitrogen_btl”
“total_dissolved_nitrogen_btl_woceqc”
“tritium_btl”
“tritium_btl_woceqc”
“tritium_error_btl”
data_center: CCHDO
source_info.source:
“cchdo_bats”,
“cchdo_carimed”,
“cchdo_climode”,
“cchdo_clivar”,
“cchdo_dimes”,
“cchdo_dimes uk2.5”,
“cchdo_flepvar”,
“cchdo_go-bgc”,
“cchdo_go-ship”,
“cchdo_hot”,
“cchdo_hydrostation s”,
“cchdo_ipy”,
“cchdo_line-w”,
“cchdo_natre”,
“cchdo_orchestra”,
“cchdo_other”,
“cchdo_pre-woce”,
“cchdo_race:trax”,
“cchdo_socat”,
“cchdo_soccom”,
“cchdo_the agulhas current time-series experiment”,
“cchdo_the arctic observing network (aon)”,
“cchdo_tictoc”,
“cchdo_trophic bats, leg 1”,
“cchdo_ushydro”,
“cchdo_woce”
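The GO-SHIP data keys listed above follow a visible naming pattern: an optional `_woceqc` suffix on QC keys, preceded by `_btl` or `_ctd`. A minimal sketch of splitting a key along that pattern follows. Reading `_btl` and `_ctd` as marking bottle and CTD data is an interpretation of the names rather than something this schema states, and `parse_goship_key` is a hypothetical helper.

```python
WOCE_QC_SUFFIX = "_woceqc"
SOURCE_SUFFIXES = ("_btl", "_ctd")  # interpreted as bottle / CTD (assumption)


def parse_goship_key(key):
    """Split a data key into (variable, source, is_qc) by its suffixes."""
    is_qc = key.endswith(WOCE_QC_SUFFIX)
    if is_qc:
        key = key[: -len(WOCE_QC_SUFFIX)]
    source = None
    for suffix in SOURCE_SUFFIXES:
        if key.endswith(suffix):
            source = suffix[1:]  # drop the leading underscore
            key = key[: -len(suffix)]
            break
    return key, source, is_qc


parsed = parse_goship_key("doxy_btl_woceqc")  # ('doxy', 'btl', True)
```

Keys without a source suffix, such as `pres`, come back with `source` set to `None`.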
Required keys¶
expocode
type: string
description:
Optional keys¶
cast
type: int
description:
cchdo_cruise_id
type: int
description:
station
type: string
description:
woce_lines
type: array of strings
description: