statcodelists: use standard, language-independent variable codes to help international data interoperability and machine reuse in R

An R data package with all the SDMX standard codelists

Visit the documentation website of statcodelists on [statcodelists.dataobservatory.eu/](https://statcodelists.dataobservatory.eu/).

The goal of statcodelists is to promote the reuse and exchange of statistical information and related metadata by making the internationally standardized SDMX code lists available to R users. SDMX, the Statistical Data and Metadata eXchange, has been published as an ISO International Standard (ISO 17369). The metadata definitions, including the code lists, are updated regularly according to the standard. The authoritative version of the code lists made available in this package is published at https://sdmx.org/?page_id=3215/.

Purpose

Cross-domain concepts in the SDMX framework describe concepts relevant to many, if not all, statistical domains. SDMX recommends using these concepts whenever feasible in SDMX structures and messages to promote the reuse and exchange of statistical information and related metadata between organisations.

Code lists are predefined sets of terms from which some statistical coded concepts take their values. SDMX cross-domain code lists are used to support cross-domain concepts. What are these cross-domain coded concepts?

  • Geographical codes, like NL = the Netherlands in the CL_AREA code list.
  • Standard industry codes, like J631 for Data processing, hosting and related activities in NACE Rev 2, which is used in Europe. Beware: it is J592 in Australia and New Zealand, see CL_ACTIVITY_ANZSIC06.
  • Occupations, like OC2521 for Database designers and administrators in CL_OCCUPATIONS.
  • Time formatting standards, like CCYY for annual data series in CL_TIME_FORMAT.
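In practice, a code list behaves like a lookup table from codes to labels. The following is a minimal sketch in base R, assuming a hand-made, four-row stand-in for a code list such as CL_AREA. The data frame `area_codes`, its column names `id` and `name`, and the English labels are illustrative assumptions, not the contents of the real code list.

```r
# Hypothetical miniature stand-in for a code list such as CL_AREA.
# The real code list is much longer; column names and labels are assumptions.
area_codes <- data.frame(
  id   = c("NL", "JP", "MK", "US"),
  name = c("Netherlands", "Japan", "North Macedonia", "United States"),
  stringsAsFactors = FALSE
)

# Codes as they might appear in a dataset; "XX" is deliberately not in the list.
observed <- c("JP", "NL", "XX")

# match() returns the row position of each code, or NA when it is unknown,
# so unknown codes end up with an NA label.
labels <- area_codes$name[match(observed, area_codes$id)]

# A logical vector that flags which observed codes are in the code list.
is_valid <- observed %in% area_codes$id

print(data.frame(code = observed, label = labels, valid = is_valid))
```

The same pattern can be used to validate a column of codes before publishing a dataset: any `FALSE` in `is_valid` points to a value that is not part of the code list.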

Check out the available code lists on the package homepage.

The use of common code lists helps users work more efficiently: it eases maintenance and reduces the need for mapping systems and interfaces that deliver data and metadata to them. An obvious advantage of the code systems is that you can retrieve data from national sources regardless of the natural language used, whether in North Macedonia, Japan, the U.S. or the Netherlands. While the data labels may change to be locally human-readable, computers and geeks can read the codes and understand them immediately, provided that they use the standard codes.
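A small sketch can illustrate why codes matter more than labels when combining sources. It assumes two made-up extracts, one labelled in Dutch and one in English, that share the same area codes. The data frames, column names and numbers are placeholders invented for this example, not real statistics.

```r
# Two made-up extracts: same areas, labels in different languages.
# All values are placeholders, not real statistics.
source_dutch <- data.frame(
  geo     = c("NL", "US"),
  label   = c("Nederland", "Verenigde Staten"),
  value_a = c(1, 2),
  stringsAsFactors = FALSE
)
source_english <- data.frame(
  geo     = c("US", "NL"),
  label   = c("United States", "Netherlands"),
  value_b = c(20, 10),
  stringsAsFactors = FALSE
)

# Joining on the human-readable labels finds no common rows.
by_label <- merge(source_dutch, source_english, by = "label")

# Joining on the language-independent codes matches every area.
by_code <- merge(
  source_dutch[, c("geo", "value_a")],
  source_english[, c("geo", "value_b")],
  by = "geo"
)

print(nrow(by_label))
print(by_code)
```

The label-based join returns zero rows, while the code-based join pairs each area correctly even though the rows arrive in a different order.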

Our data observatories are rolling out SDMX coding across all datasets to help data ingestion, interoperability, findability and reuse. statcodelists can help you use standard SDMX codes in your R workflow, both for downloading data from statistical agencies and for producing publication-ready datasets that the rest of the world (and even APIs) will understand.

Installation

You can install statcodelists from CRAN:

install.packages("statcodelists")
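After installation, one way to see what the package ships is to list its datasets with base R. This is a hedged sketch: it assumes the code lists are stored as regular package datasets and that one of them is named CL_AREA, as this README suggests. The example checks both assumptions at run time and does nothing if the package is not installed.

```r
# Assumption: statcodelists is installed; otherwise the example is skipped.
pkg_installed <- requireNamespace("statcodelists", quietly = TRUE)

if (pkg_installed) {
  # data(package = ) lists the datasets a package ships without loading them.
  shipped <- utils::data(package = "statcodelists")$results[, "Item"]
  print(utils::head(shipped))

  # CL_AREA is named in this README; load it only if it is really there.
  if ("CL_AREA" %in% shipped) {
    env <- new.env()
    utils::data(list = "CL_AREA", package = "statcodelists", envir = env)
    print(utils::head(env$CL_AREA))
  }
}
```

Loading into a separate environment keeps the global workspace clean; in interactive use, attaching the package with `library(statcodelists)` is the more common route.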

Further recommended code values for expressing general statistical concepts like not applicable, etc., can be found in section Generic codes of the Guidelines for the creation and management of SDMX Cross-Domain Code Lists.

For further code lists used by reliable statistical agencies but not harmonized at the SDMX level, please consult the SDMX Global Registry Codelists page.

The creator of this package is not affiliated with SDMX, and this package has not been endorsed by SDMX.

Code of Conduct

Please note that the statcodelists project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Daniel Antal
Developer of open-source statistical software

My research interests include reproducible social science, economics and finance.
