
Machine Readable Glossary Generation Tool

Editor's note

Documentation needs to be adjusted for:

  • Converting formPhrases: MRGT will write expanded formPhrase macros into MRGEntry formPhrases field

The Machine Readable Glossary generation Tool (MRGT) generates Machine Readable Glossaries (MRGs) for one specific, or all terminology versions that are curated within a specific scope. MRGs come in a specific, well-defined format. They contain some meta-data, followed by a list of so-called MRG entries, one for every term in its scope, which represent concepts and other semantic units that are known within that scope.

The (newly generated) MRG(s) are meant to be processed by the other tools in the toolbox, regardless of whether such tools are called from within the context of another scope. Because an MRG contains every term that is used in the scope, and includes all the relevant meta-data, it serves as the single, authoritative source of that (version of the) scope's terminology.

Installing the Tool

The tool can be installed from the command line and made globally available by executing

npm install -g @tno-terminology-design/mrgt
Before running the tool from the command line, make sure you have met the necessary prerequisites depending on your operating environment.

  1. Node.js and NPM: Ensure Node.js and NPM are installed.
  2. Global Installation: If you have installed the package globally, confirm the global NPM modules path by running npm config get prefix. The global modules are usually stored under <prefix>/node_modules.
  3. Environment Variables: Add the path to global NPM binaries to your system's PATH environment variable. This should be <prefix> on Windows. To add to PATH, you can edit your environment variables or run set PATH=%PATH%;<prefix> in the CMD.

Calling the Tool

The behavior of the MRGT can be configured per call e.g. by a configuration file and/or command-line parameters. The command-line syntax is as follows:

mrgt [ <paramlist> ]

where <paramlist> is an (optional) list of parameters.

Legend

The columns in the following table are defined as follows:

  1. Parameter specifies the parameter and further specifications
  2. Req'd specifies whether (Y) or not (n) the field is required to be present when the tool is being called. If required, it MUST either be present in the configuration file, or as a command-line parameter.
  3. Description specifies the meaning of the Value field, and other things you may need to know, e.g. why it is needed, a required syntax, etc.

If a configuration file is used, the long version of the parameter must be used (without the preceding --).

Key                        Req'd  Description
-c, --config <path>        n      Path (including the filename) of the tool's (YAML) configuration file.
-h, --help                 n      display help for command.
-o, --onNotExist <action>  n      The action in case a vsntag was specified, but wasn't found in the SAF.
-s, --scopedir <path>      n      Path of the scope directory from which the tool is called.
-v, --vsntag <vsntag>      n      Versiontag for which the MRG needs to be (re)generated.
-V, --version              n      output the version number of the tool.

The <action> parameter can take the following values:

<action>  Description
'throw'   an error is thrown (an exception is raised), and processing will stop.
'warn'    a message is displayed (and logged) and processing continues.
'log'     a message is written to a log(file) and processing continues.
'ignore'  processing continues as if nothing happened.
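
The four actions above can be sketched as a small dispatcher. This is only an illustration of the described behavior, not the MRGT's actual code: the names `OnNotExist`, `Reporter`, and `handleNotExist` are hypothetical, and the reporter object stands in for whatever display and logging facilities the tool really uses.

```typescript
// Hypothetical names; none of these identifiers are taken from the MRGT code base.
type OnNotExist = "throw" | "warn" | "log" | "ignore";

interface Reporter {
  display(message: string): void; // message shown to the user
  log(message: string): void;     // message written to a log(file)
}

// Returns true when processing may continue; throws for the 'throw' action.
function handleNotExist(action: OnNotExist, message: string, reporter: Reporter): boolean {
  if (action === "throw") {
    throw new Error(message); // processing stops
  }
  if (action === "warn") {
    reporter.display(message); // 'warn' displays the message ...
  }
  if (action === "warn" || action === "log") {
    reporter.log(message); // ... and both 'warn' and 'log' write it to the log
  }
  return true; // 'ignore' does nothing and simply continues
}
```

Passing the reporter as an argument keeps the sketch free of assumptions about where messages end up.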

Running the Tool

One run of the MRGT either

  • generates an MRG for one specific terminology version within the current scope (which is the case when the version parameter was specified), or it
  • generates multiple MRGs, i.e., one for every version of the terminology that is curated within the current scope (which is the case when the version parameter is omitted).

Running the tool comprises the following phases (see Note 1):

  1. Constructing a provisional MRG;
  2. Post-processing the entries in that provisional MRG;
  3. Creating/overwriting MRG file(s) in the glossarydir of the current scope.

Phase 1: constructing a provisional MRG

Generating an MRG for a particular version of a terminology starts by reading the SAF of the scope within which that terminology is curated, which exists in the scopedir that was provided as one of the calling parameters. If a vsntag argument is provided, the MRGT searches the versions section of the SAF to find the corresponding entry. This corresponding entry has the value of the vsntag parameter either in its vsntag field, or as one of the elements in its altvsntags field. If the SAF does not have a corresponding entry, the action specified in the onNotExist parameter determines whether or not (and how) to proceed.
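
The lookup described above might look like the following minimal sketch. It assumes the versions section has already been parsed into an array of objects; the `SafVersion` interface models only the two fields mentioned in the text, and `findVersion` is a hypothetical helper name.

```typescript
// Only the two fields named in the text are modelled; real SAF entries have more.
interface SafVersion {
  vsntag: string;
  altvsntags?: string[];
}

// Hypothetical helper: returns the first version entry whose `vsntag` equals the
// requested tag or whose `altvsntags` contains it, and undefined if there is none.
function findVersion(versions: SafVersion[], requested: string): SafVersion | undefined {
  const matches = versions.filter(
    (v) => v.vsntag === requested || (v.altvsntags ?? []).indexOf(requested) !== -1
  );
  return matches[0];
}
```

When the result is undefined, the caller would apply the action given by the onNotExist parameter.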

In this phase, for every terminology version that is to be created, one provisional MRG is created, that contains a provisional MRG entry for every term contained in the particular version of the terminology. This provisional MRG entry either contains:

The Term Selection Instruction syntax specifies precisely how provisional MRGs are created.

After a provisional MRG entry is created, the following modifications are made:

The result is a set of regularized form phrases, which is then used to produce the formPhrases field in the MRG entry.

tip

An MRG SHOULD NOT have two (or more) MRG entries that have a same element in their formPhrases field, because that would mean that the form phrase is ambiguous, as it refers to two different semantic units.
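
A check for such ambiguous form phrases could be sketched as follows. This is an illustration under stated assumptions: entries are assumed to carry a `term` field that identifies them, and the names `FormPhraseEntry` and `findAmbiguousFormPhrases` are hypothetical.

```typescript
// Assumed minimal shape of an MRG entry for the purpose of this check.
interface FormPhraseEntry {
  term: string;
  formPhrases?: string[];
}

// Maps every form phrase that occurs in more than one entry to the terms that use it.
function findAmbiguousFormPhrases(entries: FormPhraseEntry[]): { [phrase: string]: string[] } {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  const owners: { [phrase: string]: string[] } = Object.create(null);
  for (const entry of entries) {
    for (const phrase of entry.formPhrases ?? []) {
      const terms = owners[phrase] ?? [];
      if (terms.indexOf(entry.term) === -1) {
        terms.push(entry.term); // count each term once per phrase
      }
      owners[phrase] = terms;
    }
  }
  const ambiguous: { [phrase: string]: string[] } = {};
  for (const phrase of Object.keys(owners)) {
    const terms = owners[phrase];
    if (terms !== undefined && terms.length > 1) {
      ambiguous[phrase] = terms;
    }
  }
  return ambiguous;
}
```

An empty result means that every form phrase refers to exactly one entry.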

Storing a provisional MRG in the glossarydir

When the creation of a provisional MRG is complete, a filename mrg.<scopetag>.<vsntag>.yaml is constructed, where:

If a file with that name already exists in the glossarydir of the current scope, it will be deleted. Then, a new file with that name will be created, which will contain:

Then, if the <vsntag> part of the filename equals the value of the defaultvsn field in the scope section of the SAF, a copy of that file is created in the glossarydir whose filename is mrg.<scopetag>.yaml, which is the name by which the default MRG of the current scope is referred to.

Next, the MRGT will create a copy of the MRG file for every versiontag that exists in the altvsntags-field of the element in the versions section of the SAF from which the MRG was generated. The copy will contain the same MRG as the file that has just been written. The name of this copied file is mrg.<scopetag>.<altvsntag>.yaml, which is the same name as the MRG file, except that the <vsntag> part of that filename is replaced with the value of the versiontag found in the altvsntags-field.
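
The naming rules above can be summarized in a short sketch. The function `mrgFilenames` and the `VersionTags` interface are hypothetical; the sketch only computes file names and performs no file operations.

```typescript
// Only the fields needed for naming are modelled here.
interface VersionTags {
  vsntag: string;
  altvsntags?: string[];
}

// Returns the name of the MRG file and the names of the copies that accompany it.
function mrgFilenames(
  scopetag: string,
  version: VersionTags,
  defaultvsn?: string
): { primary: string; copies: string[] } {
  const primary = `mrg.${scopetag}.${version.vsntag}.yaml`;
  const copies: string[] = [];
  if (version.vsntag === defaultvsn) {
    copies.push(`mrg.${scopetag}.yaml`); // the default MRG of the scope
  }
  for (const alt of version.altvsntags ?? []) {
    copies.push(`mrg.${scopetag}.${alt}.yaml`); // one copy per alternative versiontag
  }
  return { primary, copies };
}
```

For example, with a hypothetical scopetag `myscope`, a version tagged `v1` with altvsntags `latest`, and `defaultvsn` set to `v1`, the sketch yields `mrg.myscope.v1.yaml` plus the copies `mrg.myscope.yaml` and `mrg.myscope.latest.yaml`.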

Phase 2: post processing Synonyms

This phase starts only after all provisional MRGs that the MRGT was instructed to build in this run have been created, and the corresponding files have been added to the glossarydir of the current scope. This allows post-processing, e.g. of synonyms, to use the newly generated provisional MRG entries.

When a provisional MRG entry in (one of) the created provisional MRGs has a synonymOf field that contains a term identifier, this will now refer to either

Effectively, this means that whenever a term is defined as a synonym of some other term, the corresponding MRG entry will have all fields of this other term, except for those that were specified in the header of the term that is defined as a synonym of that other term.
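
The effect described above amounts to a field-level merge in which the synonym's own header fields take precedence. A minimal sketch, assuming entries are plain key/value objects (the names `MrgFields` and `resolveSynonym` are hypothetical):

```typescript
type MrgFields = { [field: string]: unknown };

// `synonymHeader` holds the fields specified in the header of the synonym term;
// `target` is the MRG entry of the term it is a synonym of.
function resolveSynonym(synonymHeader: MrgFields, target: MrgFields): MrgFields {
  // Start from all fields of the target, then let the synonym's own fields win.
  return { ...target, ...synonymHeader };
}
```

Neither input object is modified; the merged entry is a new object.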

Phase 3: post processing other fields

Now, all provisional MRG entries in all provisional MRGs are processed so as to become usable from the context within which they have been selected. Every field in such a provisional MRG entry whose name (when converted into lowercase) matches any of the field names in the table below is discarded, after which the fields in that table are added with the contents as specified. The MRGT run is concluded after all these modifications have been written to their appropriate MRG files.

Field       Value(s) that are assigned to the fields
scopetag    overwrite the scopetag field with the scopetag field as found in the scope section of the SAF.
locator     path, relative to scopedir/curatedir/, of the file that contains the (header of) the curated text.
navurl      (localized) path to which browsers navigate in order to see the rendered version of the curated text.
headingids  a list of the markdown headings and/or heading ids that are found in the body of the curated text. Note that this body can be either in the curated text file or in a separate body file.
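
The discard-then-add step can be sketched as follows. This is an illustration, not the tool's implementation: `applyGeneratedFields` and the two types are hypothetical names, and the values for the four fields are assumed to have been computed beforehand.

```typescript
type EntryFields = { [field: string]: unknown };

interface GeneratedFields {
  scopetag: string;
  locator: string;
  navurl: string;
  headingids: string[];
}

const GENERATED_FIELD_NAMES = ["scopetag", "locator", "navurl", "headingids"];

// Drops every field whose lowercased name matches a generated field name,
// then adds the generated fields with their newly computed values.
function applyGeneratedFields(entry: EntryFields, generated: GeneratedFields): EntryFields {
  const kept: EntryFields = {};
  for (const key of Object.keys(entry)) {
    if (GENERATED_FIELD_NAMES.indexOf(key.toLowerCase()) === -1) {
      kept[key] = entry[key];
    }
  }
  return { ...kept, ...generated };
}
```

Because the comparison is done on the lowercased name, a stray field such as `NavURL` is removed as well.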

The following sections elaborate on the construction of (the contents) of some of these fields.

The navurl field is constructed by concatenating website/navpath/curatedir/id, where website, navpath and curatedir are given by the contents of the respective fields in the scope section of the SAF.

The id part is one of the following:

  1. if the scope section of the SAF contains the field bodyFileID, then its contents specifies the name of the field that is expected to exist in the header of the curated text, and its value will become the id part. Thus, static site generators such as Docusaurus, which uses the id field to specify this value, can be accommodated.
  2. if the SAF does not specify the bodyFileID field, then id will become the name of the file that contains the rendered version of the body-file as specified in the bodyFile field in the header of the curated text file, or, if that field is empty or non-existent, the name of the curated text file itself.
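
The two rules above, together with the concatenation website/navpath/curatedir/id, might be sketched as follows. Several details are assumptions made for illustration only: that "the name of the file" means the file name without directory and extension, that a missing header field falls back to rule 2, and that duplicate slashes are trimmed when joining. The names `ScopeSection`, `CuratedHeader`, `fileStem`, `navurlId`, and `buildNavurl` are hypothetical.

```typescript
interface ScopeSection {
  website: string;
  navpath: string;
  curatedir: string;
  bodyFileID?: string;
}

type CuratedHeader = { [field: string]: string | undefined };

// Assumption: the id is the file name without its directory and extension.
function fileStem(path: string): string {
  return path.replace(/^.*[\\\/]/, "").replace(/\.[^.]*$/, "");
}

function navurlId(scope: ScopeSection, header: CuratedHeader, curatedFile: string): string {
  // Rule 1: bodyFileID names the header field whose value becomes the id.
  const fromHeader = scope.bodyFileID ? header[scope.bodyFileID] : undefined;
  if (fromHeader) {
    return fromHeader;
  }
  // Rule 2 (also used here as a fallback, which is an assumption): use the
  // body file named in the header, or else the curated text file itself.
  const bodyFile = header["bodyFile"];
  return fileStem(bodyFile ? bodyFile : curatedFile);
}

function buildNavurl(scope: ScopeSection, header: CuratedHeader, curatedFile: string): string {
  const parts = [scope.website, scope.navpath, scope.curatedir, navurlId(scope, header, curatedFile)];
  return parts
    .map((p, i) => (i === 0 ? p.replace(/\/+$/, "") : p.replace(/^\/+|\/+$/g, "")))
    .filter((p) => p.length > 0)
    .join("/");
}
```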

Constructing the headingids field

The headingids field is constructed by finding all markdown headings in the body-file (or the curated text file if there is no separate body file), and making a list out of them.

Example of Markdown Headers and their `headingid` fields

Markdown headings are only recognized when they are preceded by number signs (#) at the beginning of a line. The alternative syntax, which uses sequences of = or - characters on the next line, is ignored.

Here is an example of a markdown header:


## This is a Markdown Header

This header will result in the text this-is-a-markdown-header being added as an element in the headingids field.
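
A minimal sketch of this extraction is shown below. The slug rule used here (lowercase, with runs of non-alphanumeric characters replaced by a hyphen) is an assumption that happens to reproduce the example above; the tool's actual rule may differ. Explicit heading ids and headings inside code blocks are not handled, and the function names are hypothetical.

```typescript
// Assumed slug rule: lowercase, non-alphanumeric runs become single hyphens.
function slugifyHeading(text: string): string {
  return text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Collects ids for headings that start with one to six '#' at the beginning of a line.
// Headings underlined with '=' or '-' are ignored, as described in the text.
function extractHeadingIds(markdown: string): string[] {
  const ids: string[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^#{1,6}\s+(.*\S)\s*$/.exec(line);
    if (match) {
      ids.push(slugifyHeading(match[1] ?? ""));
    }
  }
  return ids;
}
```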

Phase 4: checking the result

The last step consists of checking crucial properties that MRGs are relied on to have, and raising appropriate exceptions in case something is wrong. This helps curators that check the log outputs to become aware of things they may need to fix before these MRGs are further used (or published).

In this step, the following checks are done (as a minimum):

Exceptions, Warnings, and Logging

Editor's note

This section needs to be reviewed/revised so as to enable a consistent way of error checking and logging, similar to what is done in the TRRT

The general principle is that the MRGT helps its users to do their jobs. This means that errors that terminate the processing are kept to a minimum, that warnings (perhaps at different 'levels' of detail/severity) are output whenever possible (yet may be limited by command-line options), and that texts are tailored for the envisaged users of the tool.

The MRGT logs conditions that prevent it from properly:

Also, the MRGT provides suggestions that help tool-operators (curators) to not only identify, but also fix any problems.

The MRGT comes with documentation that enables developers to ascertain its correct functioning (e.g. by using a test set of files, test scripts that exercise its parameters, etc.), and also enables them to deploy the tool in a git repo and author/modify CI-pipes to use that deployment.

Notes


  1. The MRGT MUST NOT start by overwriting files that contain an MRG, as they should remain available as a (possible) source for copying MRG entries from during the construction of one or more provisional MRGs. Writing the actual files should be done after all provisional MRGs have been constructed.