.. _point_schema:

Point Schema
============

All data indexed by Argovis which can be cast as *point data* - in general, any piece of data attached to a set of spacetime coordinates - are represented by the JSON schema described in this document. Argovis' point data schema is *hierarchical*, in that all such schema inherit from the base point data schema; this creates two main advantages:

- By requiring that all point data encode their common features like coordinates, IDs and other universal metadata identically, it is trivial to include new data products in co-location APIs that search on these parameters.
- Flexibility to represent arbitrary point data is maintained by allowing subject matter experts for a given measurement class to extend the base point data schema with additional schema that represent their specific instrument or measurement.

In this way, we try to capture the flexibility of schema-less databases like MongoDB, while imposing just enough structure to capture the data validation opportunities afforded by strictly defined schema.

How to read this schema
-----------------------

Each entry in the schema fragments below contains a few keys:

- **type:** the primitive type, format, or object description of a valid entry for this field
- **description:** short comment on what this variable is
- **fill value** (optional): what this should be filled by if absent
- **current vocabulary** (optional): current set of possible values for this key, with explanations as required.

Schema enforcement & population
-------------------------------

All these schema are enforced via MongoDB's built-in schema validation; those schema validation rules are defined in the ``pointSchema.py`` script at https://github.com/argovis/db-schema. Once an empty ``point`` collection is generated by the script above, it is populated by pipelines that translate from the formats of upstream data providers to MongoDB-appropriate JSON that matches these schemas.
Each schema extension section below contains links to the code backing these pipelines.

Base point data schema
----------------------

This section describes the base point data schema from which all other point data in Argovis *must* inherit.

Required keys
+++++++++++++

All point data MUST include the following keys:

- ``_id``

  - **type:** string
  - **description:** a globally unique identifier for this record.

- ``basin``

  - **type:** int
  - **description:** integer index of basin. Can be provided by Argovis based on lat/lon in ``geolocation``.
  - **fill value:** -1, used if reported lon/lat are on land.

- ``data_type``

  - **type:** string
  - **description:** token indicating the general class of data
  - **current vocabulary:** ``oceanicProfile``, ``tropicalCyclone``

- ``date_updated_argovis``

  - **type:** ISO 8601 UTC datestring, example ``1999-12-31T23:59:59Z``
  - **description:** time the record was added to Argovis

- ``geolocation``

  - **type:** GeoJSON Point object
  - **description:** GeoJSON Point tagging the lon/lat of this record.
  - **fill value:** ``{"type": "Point", "coordinates": [0, -90]}``

- ``source_info.source``

  - **type:** array of strings
  - **description:** data origin, typically used to label major project subdivisions
  - **current vocabulary:** defined per project, see schema extensions below.

- ``timestamp``

  - **type:** ISO 8601 UTC datestring, example ``1999-12-31T23:59:59Z``
  - **description:** time the measurement was made.
  - **fill value:** ``9999-01-01T00:00:00Z``

Optional keys
+++++++++++++

All point data MAY include the following keys; if meaningful data is available for any of these keys, it should be represented in the format described.

- ``country``

  - **type:** string
  - **description:** ISO 3166-1 country code.

- ``data``

  - **type:** array of non-nested JSON documents
  - **description:** array indexes depth / altitude; individual documents are key/value pairs describing measurements made.
    Pressure or altitude must be present as one of the document keys. Example: ``{pres: 4.7, psal: 33.987, temp: 1.107, temp_argoqc: 1}``
  - **current vocabulary:** defined per project, see schema extensions below.

- ``data_center``

  - **type:** string
  - **description:** entity responsible for processing this record, once received.
  - **current vocabulary:** defined per project, see schema extensions below.

- ``data_keys`` **mandatory if** ``data`` **is present**

  - **type:** array of strings
  - **description:** a complete list of all the keys found in any document in the ``data`` object.

- ``data_warning``

  - **type:** array of strings
  - **description:** short string tokens indicating possible problems with this record.
  - **current vocabulary:**

    - ``degenerate_levels``: data is reported twice for a given pressure / altitude level in a way that cannot be readily resolved
    - ``missing_basin``: unable to determine meaningful basin code, despite having a meaningful lat / lon (edge case in basins lookup grid)
    - ``missing_location``: one or both of longitude and latitude are missing
    - ``missing_timestamp``: no date or time of measurement associated with this profile.

- ``doi``

  - **type:** string
  - **description:** DOI for this record.

- ``instrument``

  - **type:** string
  - **description:** string token describing the device used to make this measurement, like ``profiling_float``, ``ship_ctd`` etc.
  - **current vocabulary:** TBD

- ``pi_name``

  - **type:** array of strings
  - **description:** name(s) of principal investigator(s)

- ``platform_id``

  - **type:** string
  - **description:** unique identifier for the platform or device responsible for making the measurements included in this record.

- ``platform_type``

  - **type:** string
  - **description:** make or model of the platform.
  - **current vocabulary:** TBD

- ``source_info.data_keys_source``

  - **type:** array of strings
  - **description:** list of measurement parameters as found in the source file

- ``source_info.date_updated_source``

  - **type:** ISO 8601 UTC datestring, example ``1999-12-31T23:59:59Z``
  - **description:** date and time the upstream source file for this record was last modified

- ``source_info.source_url``

  - **type:** string
  - **description:** URL to download the original file from which the Argovis record was derived.

Argo profile schema extension
-----------------------------

All Argo data in Argovis is described as the union of the base point data schema and the following.

Population pipeline
+++++++++++++++++++

The ``point`` collection is populated with Argo data via the pipeline described at https://github.com/argovis/ifremer-sync.

.. _argo_vocab:

Base point schema vocabularies
++++++++++++++++++++++++++++++

The following keys from the base point schema have these vocabularies for Argovis:

- ``data`` keys:

  - "bbp470"
  - "bbp532"
  - "bbp700"
  - "bbp700_2"
  - "bisulfide"
  - "cdom"
  - "chla"
  - "cndx"
  - "cp660"
  - "down_irradiance380"
  - "down_irradiance412"
  - "down_irradiance442"
  - "down_irradiance443"
  - "down_irradiance490"
  - "down_irradiance555"
  - "down_irradiance670"
  - "downwelling_par"
  - "doxy"
  - "doxy2"
  - "doxy3"
  - "molar_doxy"
  - "nitrate"
  - "ph_in_situ_total"
  - "pres"
  - "psal"
  - "psal_sfile"
  - "temp"
  - "temp_sfile"
  - "turbidity"
  - "up_radiance412"
  - "up_radiance443"
  - "up_radiance490"
  - "up_radiance555"
  - and all the same again with "_argoqc" appended for the corresponding QC measurements.
- ``data_center``: "AO", "BO", "CS", "HZ", "IF", "IN", "JA", "KM", "KO", "ME", "NM"
- ``source_info.source``: "argo_core", "argo_bgc" and "argo_deep"

Required keys
+++++++++++++

- ``cycle_number``

  - **type:** int
  - **description:** probe cycle index

Optional keys
+++++++++++++

- ``data_keys_mode``

  - **type:** non-nested JSON document
  - **description:** JSON document with keys matching the entries of ``data_keys``, and values indicating the variable's data mode
  - **current vocabulary:** ``R`` (realtime), ``A`` (realtime adjusted), or ``D`` (delayed mode).

- ``fleetmonitoring``

  - **type:** string
  - **description:** URL for this float at https://fleetmonitoring.euro-argo.eu/float/

- ``geolocation_argoqc``

  - **type:** int
  - **description:** Argo's position QC flag
  - **fill value:** -1

- ``oceanops``

  - **type:** string
  - **description:** URL for this float at https://www.ocean-ops.org/board/wa/Platform

- ``positioning_system``

  - **type:** string
  - **description:** positioning system for this float.
  - **current vocabulary:** see Argo ref table 9

- ``profile_direction``

  - **type:** string
  - **description:** whether the profile was gathered as the float ascended or descended
  - **current vocabulary:** ``A`` (ascending) or ``D`` (descending).

- ``timestamp_argoqc``

  - **type:** int
  - **description:** Argo's date QC flag
  - **fill value:** -1

- ``vertical_sampling_scheme``

  - **type:** string
  - **description:** sampling scheme for this profile.
  - **current vocabulary:** see Argo ref table 16

- ``wmo_inst_type``

  - **type:** string
  - **description:** instrument type as indexed by Argo.
  - **current vocabulary:** see Argo ref table 8

GO-SHIP profile schema extension
--------------------------------

All GO-SHIP data in Argovis is described as the union of the base point data schema and the following.

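
As a rough illustration of what "union" means for a schema extension, the sketch below merges the base schema's required keys with the required keys that each extension adds in this document. The names ``BASE_REQUIRED``, ``EXTENSION_REQUIRED``, ``required_keys`` and the labels ``argo`` and ``goship`` are hypothetical and chosen only for this example; the actual validation rules are the ones defined in ``pointSchema.py``.

```python
# Hypothetical illustration only; not taken from pointSchema.py.

# Required keys of the base point data schema, as listed in this document.
BASE_REQUIRED = [
    "_id",
    "basin",
    "data_type",
    "date_updated_argovis",
    "geolocation",
    "source_info.source",
    "timestamp",
]

# Additional required keys per extension, as listed in this document.
# The dictionary labels "argo" and "goship" are made up for this sketch.
EXTENSION_REQUIRED = {
    "argo": ["cycle_number"],
    "goship": ["expocode"],
}


def required_keys(extension):
    """Return the base required keys followed by the extension's own."""
    return BASE_REQUIRED + EXTENSION_REQUIRED[extension]


print(required_keys("goship"))
```

Optional keys would combine in the same way: every base key remains available to an extension, and the extension only adds to it.
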

Population pipeline
+++++++++++++++++++

The ``point`` collection is populated with GO-SHIP data via the pipeline described at TBD

Base point schema vocabularies
++++++++++++++++++++++++++++++

The following keys from the base point schema have the following vocabularies for Argovis:

- ``data`` keys:

  - "ammonium_btl"
  - "ammonium_btl_woceqc"
  - "bottle_latitude_btl"
  - "bottle_longitude_btl"
  - "bottle_number_btl"
  - "bottle_number_btl_woceqc"
  - "bottle_time_btl"
  - "carbon_tetrachloride_btl"
  - "carbon_tetrachloride_btl_woceqc"
  - "cfc_113_btl"
  - "cfc_113_btl_woceqc"
  - "cfc_11_btl"
  - "cfc_11_btl_woceqc"
  - "cfc_12_btl"
  - "cfc_12_btl_woceqc"
  - "chlorophyll_a_btl"
  - "chlorophyll_a_btl_woceqc"
  - "chlorophyll_a_ug_kg_btl"
  - "chlorophyll_a_ug_kg_btl_woceqc"
  - "ctd_beamcp_ctd"
  - "ctd_beamcp_ctd_woceqc"
  - "ctd_fluor_arbitrary_ctd"
  - "ctd_fluor_ctd"
  - "ctd_fluor_ctd_woceqc"
  - "ctd_fluor_raw_btl"
  - "ctd_fluor_raw_btl_woceqc"
  - "ctd_fluor_raw_ctd"
  - "ctd_fluor_raw_ctd_woceqc"
  - "ctd_number_of_observations_ctd"
  - "ctd_pressure_raw_btl"
  - "ctd_temperature_unk_ctd"
  - "ctd_temperature_unk_ctd_woceqc"
  - "ctd_transmissometer_ctd"
  - "ctd_transmissometer_ctd_woceqc"
  - "ctd_transmissometer_raw_btl"
  - "ctd_transmissometer_raw_btl_woceqc"
  - "ctd_transmissometer_raw_ctd"
  - "ctd_transmissometer_raw_ctd_woceqc"
  - "del_carbon_13_dic_btl"
  - "del_carbon_13_dic_btl_woceqc"
  - "del_carbon_14_dic_btl"
  - "del_carbon_14_dic_btl_woceqc"
  - "del_carbon_14_dic_error_btl"
  - "del_oxygen_18_btl"
  - "del_oxygen_18_btl_woceqc"
  - "del_oxygen_18_error_btl"
  - "delta_helium_3_btl"
  - "delta_helium_3_btl_woceqc"
  - "delta_helium_3_error_btl"
  - "dissolved_organic_carbon_btl"
  - "dissolved_organic_carbon_btl_woceqc"
  - "dissolved_organic_nitrogen_btl"
  - "dissolved_organic_nitrogen_btl_woceqc"
  - "doxy_btl"
  - "doxy_btl_woceqc"
  - "doxy_ctd"
  - "doxy_ctd_woceqc"
  - "fco2_btl"
  - "fco2_btl_woceqc"
  - "fco2_temperature_btl"
  - "helium_btl"
  - "helium_btl_woceqc"
  - "helium_error_btl"
  - "hplc_placeholder_btl_woceqc"
  - "methyl_chloroform_btl"
  - "methyl_chloroform_btl_woceqc"
  - "neon_btl"
  - "neon_btl_woceqc"
  - "neon_error_btl"
  - "nitrate_btl"
  - "nitrate_btl_woceqc"
  - "nitrite_btl"
  - "nitrite_btl_woceqc"
  - "nitrite_nitrate_btl"
  - "nitrite_nitrate_btl_woceqc"
  - "nitrous_oxide_btl"
  - "nitrous_oxide_btl_woceqc"
  - "oxygen_btl"
  - "oxygen_btl_woceqc"
  - "oxygen_ml_l_btl"
  - "oxygen_ml_l_btl_woceqc"
  - "par_ctd"
  - "par_ctd_woceqc"
  - "partial_co2_temperature_btl"
  - "partial_pressure_of_co2_btl"
  - "partial_pressure_of_co2_btl_woceqc"
  - "particulate_organic_carbon_btl"
  - "particulate_organic_carbon_btl_woceqc"
  - "particulate_organic_nitrogen_btl"
  - "particulate_organic_nitrogen_btl_woceqc"
  - "ph_sws_btl"
  - "ph_sws_btl_woceqc"
  - "ph_temperature_btl"
  - "ph_total_h_scale_btl"
  - "ph_total_h_scale_btl_woceqc"
  - "phaeophytin_btl"
  - "phaeophytin_btl_woceqc"
  - "phaeophytin_ug_l_btl"
  - "phaeophytin_ug_l_btl_woceqc"
  - "phosphate_btl"
  - "phosphate_btl_woceqc"
  - "potential_temperature_68_btl"
  - "potential_temperature_c_btl"
  - "pres"
  - "psal_btl"
  - "psal_btl_woceqc"
  - "psal_ctd"
  - "psal_ctd_woceqc"
  - "radium_226_btl"
  - "radium_226_btl_woceqc"
  - "radium_228_btl"
  - "radium_228_btl_woceqc"
  - "ref_temperature_btl"
  - "ref_temperature_btl_woceqc"
  - "ref_temperature_c_btl"
  - "ref_temperature_c_btl_woceqc"
  - "rev_pressure_btl"
  - "rev_pressure_btl_woceqc"
  - "rev_temperature_90_btl"
  - "rev_temperature_90_btl_woceqc"
  - "rev_temperature_btl"
  - "rev_temperature_btl_woceqc"
  - "rev_temperature_c_btl"
  - "rev_temperature_c_btl_woceqc"
  - "salinity_btl"
  - "salinity_btl_woceqc"
  - "sample_btl"
  - "sample_ctd"
  - "silicate_btl"
  - "silicate_btl_woceqc"
  - "sulfur_hexifluoride_btl"
  - "sulfur_hexifluoride_btl_woceqc"
  - "temperature_btl"
  - "temperature_btl_woceqc"
  - "temperature_ctd"
  - "temperature_ctd_woceqc"
  - "total_alkalinity_btl"
  - "total_alkalinity_btl_woceqc"
  - "total_carbon_btl"
  - "total_carbon_btl_woceqc"
  - "total_dissolved_nitrogen_btl"
  - "total_dissolved_nitrogen_btl_woceqc"
  - "tritium_btl"
  - "tritium_btl_woceqc"
  - "tritium_error_btl"

- ``data_center``: ``CCHDO``
- ``source_info.source``:

  - "cchdo_bats"
  - "cchdo_carimed"
  - "cchdo_climode"
  - "cchdo_clivar"
  - "cchdo_dimes"
  - "cchdo_dimes uk2.5"
  - "cchdo_flepvar"
  - "cchdo_go-bgc"
  - "cchdo_go-ship"
  - "cchdo_hot"
  - "cchdo_hydrostation s"
  - "cchdo_ipy"
  - "cchdo_line-w"
  - "cchdo_natre"
  - "cchdo_orchestra"
  - "cchdo_other"
  - "cchdo_pre-woce"
  - "cchdo_race:trax"
  - "cchdo_socat"
  - "cchdo_soccom"
  - "cchdo_the agulhas current time-series experiment"
  - "cchdo_the arctic observing network (aon)"
  - "cchdo_tictoc"
  - "cchdo_trophic bats, leg 1"
  - "cchdo_ushydro"
  - "cchdo_woce"

Required keys
+++++++++++++

- ``expocode``

  - **type:** string
  - **description:**

Optional keys
+++++++++++++

- ``cast``

  - **type:** int
  - **description:**

- ``cchdo_cruise_id``

  - **type:** int
  - **description:**

- ``station``

  - **type:** string
  - **description:**

- ``woce_lines``

  - **type:** array of strings
  - **description:**
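
To make the base point data schema's rules concrete, here is a minimal client-side sketch that checks a record for the base required keys, the rule that ``data_keys`` is mandatory when ``data`` is present, and the ISO 8601 UTC datestring format. It is not the MongoDB validator from ``pointSchema.py``; the helper names ``get_dotted`` and ``check_base_point``, the example record, and the reading of ``source_info.source`` as a ``source`` key nested inside a ``source_info`` document are all assumptions made for illustration.

```python
from datetime import datetime

# Required keys of the base point data schema, as listed in this document.
BASE_REQUIRED = [
    "_id",
    "basin",
    "data_type",
    "date_updated_argovis",
    "geolocation",
    "source_info.source",
    "timestamp",
]


def get_dotted(record, dotted_key):
    """Look up 'a.b' as record['a']['b'] (assumed nesting); None if absent."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def check_base_point(record):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for key in BASE_REQUIRED:
        if get_dotted(record, key) is None:
            problems.append("missing required key: " + key)
    # data_keys is mandatory whenever data is present.
    if "data" in record and "data_keys" not in record:
        problems.append("data present without data_keys")
    # Datestrings are expected to look like 1999-12-31T23:59:59Z.
    for key in ("timestamp", "date_updated_argovis"):
        value = record.get(key)
        if isinstance(value, str):
            try:
                datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            except ValueError:
                problems.append("bad datestring in " + key)
    return problems


# Made-up record built from the fill values and examples in this document.
example = {
    "_id": "example_001",
    "basin": -1,
    "data_type": "oceanicProfile",
    "date_updated_argovis": "1999-12-31T23:59:59Z",
    "geolocation": {"type": "Point", "coordinates": [0, -90]},
    "source_info": {"source": ["argo_core"]},
    "timestamp": "9999-01-01T00:00:00Z",
    "data": [{"pres": 4.7, "psal": 33.987, "temp": 1.107, "temp_argoqc": 1}],
}

print(check_base_point(example))
```

The example record deliberately omits ``data_keys``, so the only problem reported is the missing ``data_keys`` entry; adding ``data_keys`` with the four keys used in ``data`` clears it.
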