Integrity Checker Tool (ICT)

Editor's note

The text below needs a thorough revision, because various changes have not yet been taken into account.

The Integrity Checker Tool (ICT) tests the integrity of (a selection of) the data in files that are curated within a particular scope, i.e. the SAF, the MRGs, and curated texts. The integrity checking of other data, e.g. formatted texts, such as HRGs, is outside the scope of the ICT.

In order for the ICT to be used optimally, it will assume for specific kinds of files that the integrity of other files is guaranteed, as follows:

| When testing a ... | the integrity is assumed of |
|---|---|
| MRG | SAF |
| curated file | MRG and SAF |

The idea behind this is to enable curators to only test changes they have made rather than testing the entire set of files.

Editor's Note

As the tool hasn't been made, and no practical experience has been gained, many of these optimizations may not work in the first versions.

Editor's Note

There's a lot of duplication in syntax specs. For example, the SAF spec and MRG spec define the regex for various kinds of tags all over the place. It would be nice to have a way by which syntax can be specified in one location that is 'naturally predictable' so that both readers and maintainers of the documentation can easily find it. One way might be to include the syntax in a 'popover', i.e. that we define stuff with particular syntax as a concept and have the syntax be included in its hoverText.

Installing the Tool

The tool can be installed from the command line and made globally available.

This section will be written once there is an actual tool to install.

We expect that the installation command will be something like:

npm install -g @tno-terminology-design/ict
Before running the tool from the command line, make sure you have met the necessary prerequisites depending on your operating environment.

  1. Node.js and NPM: Ensure Node.js and NPM are installed.
  2. Global Installation: If you have installed the package globally, confirm the global NPM modules path by running npm config get prefix. The global modules are usually stored under <prefix>/node_modules on Windows, and under <prefix>/lib/node_modules on Unix-like systems.
  3. Environment Variables: Add the path to the global NPM binaries to your system's PATH environment variable. On Windows, this should be <prefix>; on Unix-like systems, it is <prefix>/bin. To add it to PATH on Windows, you can edit your environment variables or run set PATH=%PATH%;<prefix> in CMD.

Calling the Tool

The behavior of the ICT can be configured per call, e.g. by a configuration file and/or command-line parameters.

The command-line syntax is as follows:

ict [ <scopedir> ] <cmd> [ <paramlist> ]
Where:

<scopedir> (optional) specifies the scopedir of the scope whose integrity is to be tested. If <scopedir> is omitted and a configuration file is used, its value is read from that file. If <scopedir> is specified in neither place, the current directory is assumed to be the scopedir.
In this document, we use the term "this scopedir" to refer to the value of <scopedir>, and "this scope" to refer to the associated scope.

<cmd> is one of the following commands:
  • -saf: check the integrity of the scope's SAF. It does not take any further parameters.
  • -mrg: check the integrity of (one of) the scope's MRG(s). Additional parameters can be used, e.g. to specify a particular version of the MRG to be checked.
  • -txt: check the integrity of this scope's curated files. Additional parameters can be used, e.g. to select a particular subset of these files.
  • -all: check all curated files within this scope. Additional parameters may be used, e.g. to skip the checking of specific files.

<paramlist> is a list of parameters that provide further specifications for what the ICT should be checking.
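
The command-line syntax above could be split into its three parts along the following lines. This is only a sketch: the names parse_ict_args and IctCall are hypothetical, and the rule that a first argument which is not a known command is taken to be <scopedir> is an assumption, since the tool does not exist yet.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four commands listed in this section.
VALID_COMMANDS = {"-saf", "-mrg", "-txt", "-all"}


@dataclass
class IctCall:
    """Hypothetical container for one parsed ICT invocation."""
    scopedir: Optional[str]  # None: fall back to the config file, then the current directory
    cmd: str
    params: List[str] = field(default_factory=list)


def parse_ict_args(argv: List[str]) -> IctCall:
    """Split the arguments that follow the program name into scopedir, cmd and paramlist."""
    args = list(argv)
    scopedir = None
    # Assumption: a first argument that is not a known command is <scopedir>.
    if args and args[0] not in VALID_COMMANDS:
        scopedir = args.pop(0)
    if not args or args[0] not in VALID_COMMANDS:
        raise ValueError("expected one of: " + ", ".join(sorted(VALID_COMMANDS)))
    cmd = args.pop(0)
    # Everything after the command is passed on unchanged as <paramlist>.
    return IctCall(scopedir=scopedir, cmd=cmd, params=args)
```

With this sketch, a call without <scopedir> yields scopedir=None, which leaves the fallback to the configuration file or the current directory to a later step.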

Parameters (Command-line arguments)

Editor's Note

The current set of parameters is just an initial suggestion. We'll need to see what will actually be needed in practice.

Legend

The columns in the following table are defined as follows:

  1. Key is the text to be used as a key.
  2. Value represents the kind of value to be used.
  3. Req'd specifies whether (Y) or not (n) the field is required to be present when the tool is being called. If (always) required, it MUST be present either in the configuration file or as a command-line parameter.
  4. Cmds specifies the <cmd> value(s) with which the parameter is used: if the ICT is called with such a <cmd>, then this parameter will be used by the tool as described. A * in this field indicates that this parameter can be used with every command.
  5. Description specifies the meaning of the Value field, and other things you may need to know, e.g. why it is needed, a required syntax, etc.

| Key | Value | Req'd | Cmds | Description |
|---|---|---|---|---|
| config | <path> | n | * | Path (including the filename) of the tool's (YAML) configuration file. This file contains the default key-value pairs to be used. Allowed keys (and the associated values) are documented in this table. Command-line arguments override key-value pairs specified in the configuration file. This parameter SHOULD NOT appear in the configuration file itself. |
| scopedir | <path> | Y | * | Path to the scopedir within which the tool is to operate, i.e.: this scopedir. |
| syntax | | n | * | This argument has no value. If present, the syntax of all (YAML) fields in the file is checked against their specifications (see e.g. SAF specs, terminology construction, MRG specs, Curated Texts, TermRefs). |
| vsntag | <vsntag> | | -mrg | Versiontag that is used to select the version of the MRG to be checked. The MRG that is selected will either have <vsntag> as the contents of the field terminology.vsntag, or as an element in the list of terminology.altvsntags. |
| term | <term> | n | -txt | Term that identifies a particular curated file. The curated file whose (front-matter) field term matches this parameter will be integrity-checked. |
| grouptags | <grouptags> | n | -txt | List of grouptags. Every curated file whose (front-matter) field grouptags has an element that also appears as an element in the <grouptags> list will be integrity-checked. |
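
The selection rules for vsntag, term and grouptags in the table above might be sketched as follows. The function names are hypothetical, and two behaviors are assumptions that the text does not settle: when both term and grouptags are given, a file is selected if it matches either one, and when neither is given, all files are selected. The sketch works on data that has already been parsed from YAML.

```python
from typing import Dict, List, Optional


def mrg_matches_vsntag(mrg: dict, vsntag: str) -> bool:
    """True if <vsntag> equals terminology.vsntag or occurs in terminology.altvsntags."""
    terminology = mrg.get("terminology") or {}
    return vsntag == terminology.get("vsntag") or vsntag in (terminology.get("altvsntags") or [])


def select_curated_files(front_matters: Dict[str, dict],
                         term: Optional[str] = None,
                         grouptags: Optional[List[str]] = None) -> List[str]:
    """Return the names of the curated files to check.

    front_matters maps a file name to its parsed front-matter (a dict).
    """
    if term is None and not grouptags:
        return sorted(front_matters)  # assumption: no selector means "check everything"
    wanted = set(grouptags or [])
    selected = []
    for name, front_matter in front_matters.items():
        by_term = term is not None and front_matter.get("term") == term
        by_group = bool(wanted & set(front_matter.get("grouptags") or []))
        if by_term or by_group:  # assumption: the two selectors combine as a union
            selected.append(name)
    return sorted(selected)
```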

Integrity Checks

The checks that are done on files depend on the kind of file that is being checked. This section lists the tests for the various kinds of files. Every file is assumed to be part of this scope, and reside in the associated scopedir (i.e.: this scopedir).

SAF integrity

The SAF must be a file that contains valid YAML syntax.

The integrity of a SAF requires the following conditions to be satisfied for the keys in the scope section:

  • scopedir must point to the directory in which the SAF is stored for public use (i.e. in this scopedir).
  • curatedir, when appended to the value of "scopedir/", must point to the directory that stores the curated files.
  • glossarydir must point to an existing directory.
  • mrgfile must be an existing file in directory "scopedir/" (note that an empty terminology is still a terminology that can have an MRG).
  • hrgfile must be an existing file in directory "scopedir/" (note that an empty terminology is still a terminology that can have a HRG).
  • license must be an existing file in the directory pointed to by scopedir.
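
The file and directory conditions in the list above could be checked roughly as follows. This is a sketch under assumptions: check_saf_scope_paths is a hypothetical name, the scope section is passed in as an already parsed dict, all paths are resolved against a local copy of this scopedir, and the check on the scopedir key itself is left out.

```python
from pathlib import Path
from typing import List


def check_saf_scope_paths(scope: dict, local_scopedir: Path) -> List[str]:
    """Return one message per path in the SAF's scope section that does not exist."""
    problems = []
    # Keys that must name an existing directory (assumed relative to the scopedir).
    for key in ("curatedir", "glossarydir"):
        value = scope.get(key)
        if not value or not (local_scopedir / value).is_dir():
            problems.append(f"{key}: no such directory under scopedir")
    # Keys that must name an existing file in the scopedir.
    for key in ("mrgfile", "hrgfile", "license"):
        value = scope.get(key)
        if not value or not (local_scopedir / value).is_file():
            problems.append(f"{key}: no such file under scopedir")
    return problems
```

An empty result means that all five keys point at something that exists; it says nothing about the contents of those files.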

The integrity of a SAF requires the following conditions to be satisfied for every element in the scopes section:

  • scopetags must be a nonempty list of scopetags.
  • scopedir must be a valid URL that points to an existing directory resource.

The integrity of a SAF requires the following conditions to be satisfied for every element in the versions section:

  • vsntag SHOULD NOT appear as an element in the altvsntags field of this version element, and it MUST NOT appear in the vsntag or altvsntags fields of any other element in the versions section.
  • altvsntags must be a (possibly empty) list of versiontags, each of which SHOULD NOT appear in the vsntag field of this version element, and MUST NOT appear in the vsntag or altvsntags fields of any other element in the versions section.
  • termselection must be a non-empty list of term selection instructions.
  • status SHOULD be a non-empty field.
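
The conditions on the versions section can be illustrated with a small sketch. It assumes that violations of MUST conditions are reported as errors and violations of SHOULD conditions as warnings; the function name check_versions and the message texts are hypothetical.

```python
from typing import List, Tuple


def check_versions(versions: List[dict]) -> Tuple[List[str], List[str]]:
    """Check the elements of a SAF's versions section; returns (errors, warnings)."""
    errors, warnings = [], []
    for i, version in enumerate(versions):
        vsntag = version.get("vsntag")
        alt = version.get("altvsntags") or []
        # SHOULD NOT: a vsntag repeated in the altvsntags of the same element.
        if vsntag is not None and vsntag in alt:
            warnings.append(f"versions[{i}]: vsntag '{vsntag}' also appears in altvsntags")
        # termselection must be a non-empty list.
        termselection = version.get("termselection")
        if not isinstance(termselection, list) or not termselection:
            errors.append(f"versions[{i}]: termselection must be a non-empty list")
        # status SHOULD be non-empty.
        if not version.get("status"):
            warnings.append(f"versions[{i}]: status is empty")
        # MUST NOT: a versiontag shared with any other element (each pair is compared once).
        own_tags = {vsntag, *alt}
        own_tags.discard(None)
        for j in range(i + 1, len(versions)):
            other = versions[j]
            other_tags = {other.get("vsntag"), *(other.get("altvsntags") or [])}
            other_tags.discard(None)
            for tag in sorted(own_tags & other_tags):
                errors.append(f"versiontag '{tag}' is used by versions[{i}] and versions[{j}]")
    return errors, warnings
```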

MRG integrity

The integrity checking for MRG files assumes that the integrity conditions of a SAF file are satisfied, and that the MRG file itself contains valid YAML syntax.

The integrity checking comprises every (group of) test(s) as specified in this sub-section.

The MRG MUST have sections named terminology, scopes, and entries.

Integrity checks for the terminology section include:

  • scopedir must point to the directory in which the SAF is stored for public use (i.e. in this scopedir).
  • vsntag must be a versiontag that SHOULD NOT appear as an element in the altvsntags field.
  • altvsntags must be a (possibly empty) list of versiontags, none of which appear in the vsntag field.
  • license must be an existing file in the directory pointed to by scopedir.

Integrity checks for the scopes section include:

  • scopetags must be a nonempty list of scopetags.
  • scopedir must be a valid URL that points to an existing directory resource other than the scopedir of the current scope. This directory MUST contain a SAF. (Open question: do we need an option to test the integrity of such SAFs?)

Integrity checks for the entries section consist of one part that is generic for all entries, and another part that depends on the value of the termType field, so that e.g. entries of type concept and entries of type pattern can be checked differently. The checks that every entry must pass include the following:

  • scopetag MUST also appear as the value of terminology.scopetag, or as an element in one of the scopes.scopetags elements.
  • termType SHOULD be tbd.
  • grouptags MUST be a list of grouptag elements.
  • license MUST be an existing file in the directory pointed to by scopedir.
  • status SHOULD match an element in the list scope.statuses of the SAF.
  • locator, if specified, MUST have a readable resource (file) at scopedir/curatedir/locator, where scopedir and curatedir are specified in the SAF.
  • navurl, if specified, MUST return an HTML-resource when specified as the URL in an HTTP(S) request method GET or HEAD.
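
Most of the generic entry checks above can be sketched as one function. This is an illustration only: check_mrg_entry is a hypothetical name, the MRG and SAF are assumed to be already parsed, a missing grouptags field is treated as an empty list (an assumption), and the termType and navurl checks are omitted because the former is still to be determined and the latter needs network access.

```python
from pathlib import Path
from typing import List, Tuple


def check_mrg_entry(entry: dict, mrg: dict, saf: dict,
                    local_scopedir: Path) -> Tuple[List[str], List[str]]:
    """Run the generic checks on one MRG entry; returns (errors, warnings)."""
    errors, warnings = [], []

    # scopetag MUST be terminology.scopetag or occur in one of the scopes.scopetags lists.
    known = {(mrg.get("terminology") or {}).get("scopetag")}
    for scope in mrg.get("scopes") or []:
        known.update(scope.get("scopetags") or [])
    known.discard(None)
    if entry.get("scopetag") not in known:
        errors.append(f"unknown scopetag: {entry.get('scopetag')!r}")

    # grouptags MUST be a list (assumption: a missing field counts as an empty list).
    if not isinstance(entry.get("grouptags", []), list):
        errors.append("grouptags is not a list")

    # license MUST be an existing file in the scopedir.
    license_name = entry.get("license")
    if not license_name or not (local_scopedir / license_name).is_file():
        errors.append("license does not name an existing file in scopedir")

    # status SHOULD match an element of scope.statuses in the SAF.
    scope_section = saf.get("scope") or {}
    if entry.get("status") not in (scope_section.get("statuses") or []):
        warnings.append(f"status {entry.get('status')!r} is not listed in scope.statuses")

    # locator, if specified, MUST be a file at scopedir/curatedir/locator.
    locator = entry.get("locator")
    if locator:
        curatedir = scope_section.get("curatedir", "")
        if not (local_scopedir / curatedir / locator).is_file():
            errors.append(f"locator {locator!r} not found in curatedir")

    return errors, warnings
```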

For specific kinds of MRG entries, the following additional constraints MUST be satisfied:


The following constraints MUST hold for MRG entries of type concept:

  • if a glossaryText contains a TermRef, then the TermRef SHOULD be resolvable (reference to the term-ref-integrity checks).
  • hoverText MUST NOT contain any TermRef, nor any other markdown links.
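
The hoverText constraint could be tested with a regular expression, as in the sketch below. It assumes that TermRefs are written in the same bracket-and-parenthesis form as inline markdown links, so that one pattern covers both; any other TermRef syntax would need an additional pattern. The function name and the example TermRef in the comment are hypothetical.

```python
import re
from typing import List

# Matches inline markdown links of the form [text](target).
# Assumption: a TermRef such as [actor](actor@demo) has this same shape.
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\([^)]*\)")


def hovertext_violations(entry: dict) -> List[str]:
    """Return every link-like fragment found in the entry's hoverText."""
    hover_text = entry.get("hoverText") or ""
    return [match.group(0) for match in MARKDOWN_LINK.finditer(hover_text)]
```

An entry passes this check when the returned list is empty.
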
Editor's note

Checks need to be added to ensure congruence between terms and any synonyms that are defined for them. For example, they should have the same value in various fields, e.g. termType and isa (but not glossaryText or synonymOf).

Curated Text integrity

The integrity of any curated text file requires the integrity conditions of the MRG file to be satisfied, as well as the following conditions:

  • TBD

Concepts

The integrity of any curated text file that has termType: concept requires the integrity conditions of a curated text file to be satisfied, as well as the following conditions:

  • TBD

Patterns

The integrity of any curated text file that has termType: pattern requires the integrity conditions of a curated text file to be satisfied, as well as the following conditions:

  • TBD

Processing, Errors and Warnings

The ICT starts by reading its command line and its configuration file. If the command line has a key that is also found in the configuration file, the command-line key-value pair takes precedence. The resulting set of key-value pairs is tested for proper syntax and validity. Every syntax error and every invalid value that is found will be logged.
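
The precedence rule described above amounts to a dictionary merge in which command-line values overwrite configuration-file values. The sketch below assumes both sources have already been parsed into dicts; merge_and_check and the log message texts are hypothetical, and the validity test is limited to the keys listed in the parameter table.

```python
from typing import Dict, List, Tuple

# Keys taken from the parameter table in this document.
ALLOWED_KEYS = {"config", "scopedir", "syntax", "vsntag", "term", "grouptags"}


def merge_and_check(config_file: Dict[str, object],
                    command_line: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Merge both sources (command line wins) and log simple validity problems."""
    log = []
    # The config parameter SHOULD NOT appear in the configuration file itself.
    if "config" in config_file:
        log.append("warning: 'config' should not appear in the configuration file")
    settings = dict(config_file)
    settings.update(command_line)  # command-line pairs take precedence
    for key in sorted(settings):
        if key not in ALLOWED_KEYS:
            log.append(f"error: unknown key '{key}'")
    return settings, log
```

For example, a scopedir given on the command line replaces the scopedir from the configuration file, while keys that appear only in the configuration file are kept.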

Then, the ICT TBD

The ICT logs every error- and/or warning condition that it comes across while processing its configuration file, command-line parameters, and input files, in a way that helps tool-operators and document authors to identify and fix such conditions.

Deploying the Tool

The ICT comes with documentation that enables developers to ascertain its correct functioning (e.g. by using a test set of files, test scripts that exercise its parameters, etc.), and also enables them to deploy the tool in a git repo and to author or modify CI pipelines that use that deployment.

Discussion Notes

This section lists the topics that may need further discussion.