Integrity Checker Tool (ICT)
The text below needs a thorough revision, because various changes have not yet been taken into account.
The Integrity Checker Tool (ICT) tests the integrity of (a selection of) the data in files that are curated within a particular scope, i.e. the SAF, the MRGs, and curated texts. The integrity checking of other data, e.g. formatted texts, such as HRGs, is outside the scope of the ICT.
In order for the ICT to be used optimally, it will assume for specific kinds of files that the integrity of other files is guaranteed, as follows:
When testing a ... | the integrity is assumed of |
---|---|
MRG | SAF |
curated file | MRG and SAF |
The idea behind this is to enable curators to only test changes they have made rather than testing the entire set of files.
Because the tool has not been built yet and no practical experience has been gained, many of these optimizations may not work in the first versions.
There is a lot of duplication in the syntax specs. For example, the SAF spec and the MRG spec each define the regexes for various kinds of tags in several places. It would be nice to specify syntax in one 'naturally predictable' location, so that both readers and maintainers of the documentation can easily find it. One way might be to include the syntax in a 'popover', i.e. to define anything that has a particular syntax as a concept and include that syntax in its hoverText.
Installing the Tool
This section will be written once there is an actual tool to install.
We expect that the tool can be installed from the command line, and made globally available, by executing something like
npm install -g @tno-terminology-design/ict
Before running the tool from the command line, make sure you have met the necessary prerequisites for your operating environment.
CMD.exe (Windows)
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: If you have installed the package globally, confirm the global NPM modules path by running
  npm config get prefix
  The global modules are usually stored under <prefix>/node_modules.
- Environment Variables: Add the path to global NPM binaries to your system's PATH environment variable. This should be <prefix> on Windows. To add it to PATH, you can edit your environment variables or run
  set PATH=%PATH%;<prefix>
  in CMD.
PowerShell (Windows)
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: Check the global NPM modules path as in CMD.
- Environment Variables: Update the PATH environment variable as in CMD. You can also use
  $env:Path += ";<prefix>"
  to update the PATH temporarily in the current PowerShell session.
Bash (Linux/Mac)
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: If globally installed, run
  npm config get prefix
  to get the global modules path, usually <prefix>/lib/node_modules.
- Environment Variables: Add the <prefix>/bin directory to your PATH if it's not already there. You can do this by adding
  export PATH=$PATH:<prefix>/bin
  to your ~/.bashrc or ~/.zshrc file.
Calling the Tool
The behavior of the ICT can be configured per call e.g. by a configuration file and/or command-line parameters. Examples include specifications for:
The command-line syntax is as follows:
ict [ <scopedir> ] <cmd> [ <paramlist> ]
Where: | |
---|---|
<scopedir> | (optional) specifies the scopedir of the scope whose integrity is to be tested. If <scopedir> is omitted and a configuration file is used, its value is read from that file. In cases where <scopedir> isn't specified at all, the current directory is assumed to be the scopedir. In this document, we use the term "this scopedir" to refer to the value of <scopedir>, and "this scope" to refer to the associated scope. |
<cmd> | The following commands are valid:
|
<paramlist> | a list of parameters that provide further specifications for what the ICT should be checking. |
Parameters (Command-line arguments)
The current set of parameters is just an initial suggestion. We'll need to see what will actually be needed in practice.
Legend
The columns in the following table are defined as follows:
- Key is the text to be used as a key.
- Value represents the kind of value to be used.
- Req'd specifies whether (Y) or not (n) the field is required to be present when the tool is being called. If (always) required, it MUST be present either in the configuration file or as a command-line parameter.
- Cmds specifies a <cmd> value: if the ICT is called with this <cmd>, then this parameter will be used by the tool as described. A * in this field indicates that this parameter can be used with every command.
- Description specifies the meaning of the Value field, and other things you may need to know, e.g. why it is needed, a required syntax, etc.
Key | Value | Req'd | Cmds | Description |
---|---|---|---|---|
config | <path> | n | * | Path (including the filename) of the tool's (YAML) configuration file. This file contains the default key-value pairs to be used. Allowed keys (and the associated values) are documented in this table. Command-line arguments override key-value pairs specified in the configuration file. This parameter SHOULD NOT appear in the configuration file itself. |
scopedir | <path> | Y | * | Path to the scopedir within which the tool is to operate, i.e.: this scopedir. |
syntax | | n | * | This argument has no value. If present, the syntax of all (YAML) fields in the file is checked against their specifications (see e.g. SAF specs, terminology construction, MRG specs, Curated Texts, TermRefs). |
vsntag | <vsntag> | | -mrg | versiontag that is used to select the version of the MRG to be checked. The MRG that is selected will either have <vsntag> as the contents of the field terminology.vsntag, or as an element in the list of terminology.altvsntags. |
term | <term> | n | -txt | term that identifies a particular curated file. The curated file, whose (front-matter) field term matches this parameter, will be integrity-checked. |
grouptags | <grouptags> | n | -txt | List of grouptags. Every curated file, whose (front-matter) field grouptags has an element that also appears as an element in the <grouptags> list, will be integrity-checked. |
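The selection rules for vsntag, term and grouptags in the table above can be sketched as simple predicates. This is only an illustration under assumptions: the MRG and the front matter of a curated file are assumed to be already parsed into dictionaries, and the function names are hypothetical, not part of the ICT.

```python
from typing import Any


def mrg_matches_vsntag(mrg: dict[str, Any], vsntag: str) -> bool:
    """True if vsntag equals terminology.vsntag or is listed in terminology.altvsntags."""
    terminology = mrg.get("terminology") or {}
    altvsntags = terminology.get("altvsntags") or []
    return terminology.get("vsntag") == vsntag or vsntag in altvsntags


def file_matches_term(front_matter: dict[str, Any], term: str) -> bool:
    """True if the (front-matter) field term matches the term parameter."""
    return front_matter.get("term") == term


def file_shares_grouptag(front_matter: dict[str, Any], grouptags: list[str]) -> bool:
    """True if at least one grouptag of the file also appears in the grouptags parameter."""
    file_tags = front_matter.get("grouptags") or []
    return any(tag in grouptags for tag in file_tags)
```

How the term and grouptags selections combine when both are given is not specified above, which is why the sketch keeps them as separate predicates.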
Integrity Checks
The checks that are done on files depend on the kind of file that is being checked. This section lists the tests for the various kinds of files. Every file is assumed to be part of this scope, and to reside in the associated scopedir (i.e.: this scopedir).
SAF integrity
The SAF must be a file that contains valid YAML syntax.
The integrity of a SAF requires the following conditions to be satisfied for the keys in the scope section:
- scopedir must point to the directory in which the SAF is stored for public use (i.e. in this scopedir).
- curatedir, when appended to the value of "scopedir/", must point to the directory that stores the curated files.
- glossarydir must point to an existing directory.
- mrgfile must be an existing file in directory "scopedir/" (note that an empty terminology is still a terminology that can have an MRG).
- hrgfile must be an existing file in directory "scopedir/" (note that an empty terminology is still a terminology that can have an HRG).
- license must be an existing file in the directory pointed to by scopedir.
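As a minimal sketch, the existence checks on the scope section could look as follows. It assumes that the SAF has already been parsed into a dictionary, that this scopedir is available as a local directory, and that glossarydir is relative to the scopedir; the function name and the message texts are hypothetical. The check that scopedir itself points to the publicly stored SAF is left out.

```python
from pathlib import Path
from typing import Any


def check_saf_scope_section(scope: dict[str, Any], scopedir: Path) -> list[str]:
    """Return a list of problems found in the 'scope' section of a parsed SAF."""
    problems: list[str] = []

    # Keys that must name an existing directory (assumed relative to scopedir).
    for key in ("curatedir", "glossarydir"):
        value = scope.get(key)
        if not value or not (scopedir / value).is_dir():
            problems.append(f"{key}: {value!r} is not an existing directory in {scopedir}")

    # Keys that must name an existing file in scopedir.
    for key in ("mrgfile", "hrgfile", "license"):
        value = scope.get(key)
        if not value or not (scopedir / value).is_file():
            problems.append(f"{key}: {value!r} is not an existing file in {scopedir}")

    return problems
```

Returning a list of messages, rather than raising on the first failure, matches the requirement that every problem found is logged.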
The integrity of a SAF requires the following conditions to be satisfied for every element in the scopes section:
- scopetags must be a nonempty list of scopetags.
- scopedir must be a valid URL that points to an existing directory resource.
The integrity of a SAF requires the following conditions to be satisfied for every element in the versions section:
- vsntag SHOULD NOT appear as an element in the altvsntags field of this version element, and it MUST NOT appear in the vsntag or altvsntags fields of any other element in the versions section.
- altvsntags must be a (possibly empty) list of versiontags, each of which SHOULD NOT appear in the vsntag field of the element, and MUST NOT appear in the vsntag or altvsntags fields of any other element in the versions section.
- termselection must be a non-empty list of term selection instructions.
- status SHOULD be a non-empty field.
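The conditions on the versions section can be sketched as below, with MUST-level violations reported as errors and SHOULD-level ones as warnings. This is an illustration, not the tool's implementation: it assumes the versions section is already parsed into a list of dictionaries, treats a missing altvsntags field as an empty list, and uses a hypothetical function name and message texts.

```python
from typing import Any


def check_versions(versions: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the 'versions' section of a parsed SAF."""
    errors: list[str] = []
    warnings: list[str] = []

    def tags_of(version: dict[str, Any]) -> set[str]:
        # All versiontags an element claims: its vsntag plus its altvsntags.
        tags = set(version.get("altvsntags") or [])
        if version.get("vsntag"):
            tags.add(version["vsntag"])
        return tags

    for i, version in enumerate(versions):
        altvsntags = version.get("altvsntags") or []
        if version.get("vsntag") in altvsntags:
            warnings.append(f"versions[{i}]: vsntag also appears in its own altvsntags")

        # A versiontag MUST NOT be shared with any other element (each pair reported once).
        for j, other in enumerate(versions[i + 1:], start=i + 1):
            for tag in sorted(tags_of(version) & tags_of(other)):
                errors.append(f"versions[{i}] and versions[{j}] share versiontag {tag!r}")

        if not version.get("termselection"):
            errors.append(f"versions[{i}]: termselection must be a non-empty list")
        if not version.get("status"):
            warnings.append(f"versions[{i}]: status should be non-empty")

    return errors, warnings
```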
MRG integrity
The integrity checking for MRG files assumes that the integrity conditions of a SAF file are satisfied, and that the MRG file itself contains valid YAML syntax.
The integrity checking comprises every (group of) test(s) as specified in this sub-section.
The MRG MUST have sections named terminology, scopes, and entries.
Integrity checks for the terminology section include:
- scopedir must point to the directory in which the SAF is stored for public use (i.e. in this scopedir).
- vsntag must be a versiontag that SHOULD NOT appear as an element in the altvsntags field.
- altvsntags must be a (possibly empty) list of versiontags, none of which appear in the vsntag field.
- license must be an existing file in the directory pointed to by scopedir.
Integrity checks for the scopes section include:
- scopetags must be a nonempty list of scopetags.
- scopedir must be a valid URL that points to an existing directory resource other than the scopedir of the current scope. This directory MUST contain a SAF.
Do we need an option to test the integrity of such SAFs?
Integrity checks for the entries section consist of one part that is generic for all entries, and another part that depends on the value of the termType field (so that e.g. entries of type concept and of type pattern can have different checks). The checks that every entry must pass include the following:
- scopetag MUST also appear as the value of terminology.scopetag, or as an element in one of the scopes.scopetags elements.
- termType SHOULD be tbd.
- grouptags MUST be a list of grouptag elements.
- license MUST be an existing file in the directory pointed to by scopedir.
- status SHOULD match an element in the list scope.statuses of the SAF.
- locator, if specified, MUST have a readable resource (file) at scopedir/curatedir/locator, where scopedir and curatedir are specified in the SAF.
- navurl, if specified, MUST return an HTML-resource when used as the URL in an HTTP(S) GET or HEAD request.
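Two of the generic entry checks, the scopetag lookup and the status match, can be sketched as follows. The sketch assumes that the MRG and the SAF are already parsed into dictionaries with the section and field names used above; the function names and message texts are hypothetical.

```python
from typing import Any


def known_scopetags(mrg: dict[str, Any]) -> set[str]:
    """Collect terminology.scopetag and every element of every scopes[*].scopetags list."""
    tags: set[str] = set()
    own = (mrg.get("terminology") or {}).get("scopetag")
    if own:
        tags.add(own)
    for scope in mrg.get("scopes") or []:
        tags.update(scope.get("scopetags") or [])
    return tags


def check_entry(
    entry: dict[str, Any], mrg: dict[str, Any], saf: dict[str, Any]
) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the scopetag and status fields of one MRG entry."""
    errors: list[str] = []
    warnings: list[str] = []

    scopetag = entry.get("scopetag")
    if scopetag not in known_scopetags(mrg):
        errors.append(f"scopetag {scopetag!r} is not defined in terminology or scopes")

    statuses = (saf.get("scope") or {}).get("statuses") or []
    status = entry.get("status")
    if status not in statuses:
        warnings.append(f"status {status!r} is not listed in scope.statuses of the SAF")

    return errors, warnings
```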
For specific kinds of MRG entries, the following additional constraints MUST be satisfied:
- Terms
- Concepts
- Relations
- Mental Models
The following constraints MUST hold for MRG entries of type concept:
- if a glossaryText contains a TermRef, then the TermRef SHOULD be resolvable (reference to the term-ref-integrity checks).
- hoverText MUST NOT contain any TermRef, nor any other markdown links.
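A rough sketch of the hoverText constraint is given below. It only detects inline markdown links of the form [text](target), on the assumption that TermRefs written in that form are caught by the same pattern; reference-style links and any other TermRef notation would need additional patterns. The function name is hypothetical.

```python
import re
from typing import Any

# Inline markdown link: [show text](target). Assumption: TermRefs in this form match too.
MARKDOWN_LINK = re.compile(r"\[[^\]]*\]\([^)]*\)")


def hovertext_problems(entry: dict[str, Any]) -> list[str]:
    """Return one message per inline markdown link found in the entry's hoverText."""
    hover_text = entry.get("hoverText") or ""
    return [
        f"hoverText contains a link: {match.group(0)}"
        for match in MARKDOWN_LINK.finditer(hover_text)
    ]
```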
As header fields for term termTypes need to be discussed, we do not yet specify any constraints.
Header fields for termType: relation
As relations need to be discussed, we do not yet specify any constraints.
As patterns need to be discussed, we do not yet specify any constraints.
Checks need to be added to ensure congruence between terms and any synonyms that are defined for them. For example, they should have the same value in various fields, e.g., termType, isa (but not glossaryText or synonymOf).
Curated Text integrity
The integrity of any curated text file requires the integrity conditions of the MRG file to be satisfied, as well as the following conditions:
- TBD
Concepts
The integrity of any curated text file that has termType: concept requires the integrity conditions of a curated text file to be satisfied, as well as the following conditions:
- TBD
Patterns
The integrity of any curated text file that has termType: pattern requires the integrity conditions of a curated text file to be satisfied, as well as the following conditions:
- TBD
Processing, Errors and Warnings
The ICT starts by reading its command-line and configuration file. If the command-line has a key that is also found in the configuration file, the command-line key-value pair takes precedence. The resulting set of key-value pairs is tested for proper syntax and validity. Every improper syntax and every invalidity found will be logged.
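The precedence rule described above can be sketched as a simple merge. This is a minimal illustration that assumes both sources have already been parsed into dictionaries; the function names are hypothetical, and silently dropping a config key found in the configuration file is an assumption (the table only says it SHOULD NOT appear there).

```python
from typing import Any

# Keys marked as required (Req'd = Y) in the parameter table.
REQUIRED_KEYS = ("scopedir",)


def effective_settings(config_file: dict[str, Any], cli_args: dict[str, Any]) -> dict[str, Any]:
    """Merge key-value pairs; command-line values override configuration-file values."""
    # Assumption: a 'config' key inside the configuration file is ignored.
    settings = {key: value for key, value in config_file.items() if key != "config"}
    # Only command-line parameters that were actually supplied take part in the override.
    settings.update({key: value for key, value in cli_args.items() if value is not None})
    return settings


def missing_required(settings: dict[str, Any]) -> list[str]:
    """Return the required keys that are absent from the merged settings."""
    return [key for key in REQUIRED_KEYS if key not in settings]
```

For example, a scopedir given on the command line replaces the one from the configuration file, while a vsntag that only appears in the configuration file is kept.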
Then, the ICT TBD
The ICT logs every error and/or warning condition that it comes across while processing its configuration file, command-line parameters, and input files, in a way that helps tool operators and document authors to identify and fix such conditions.
Deploying the Tool
The ICT comes with documentation that enables developers to ascertain its correct functioning (e.g. by using a test set of files, test scripts that exercise its parameters, etc.), and also enables them to deploy the tool in a git repo and author/modify CI-pipes to use that deployment.
Discussion Notes
This section lists the topics that may need further discussion