
Term Reference Resolution Tool (TRRT)

The Term Ref Resolution Tool (TRRT) is a TEv2 text conversion tool that takes files containing so-called TermRefs as input, and outputs (a copy of) these files in which the TermRefs have been converted into renderable refs.

While TermRefs have a default syntax, alternative syntaxes can be used by choosing another (predefined) syntax, or by creating your own syntax (i.e. an interpreter that conforms to the TRRT interpreter profile) and configuring it for use by the TRRT.

Renderable refs do not have a default structure, but there are various predefined converters that can be chosen, and subsequently specified for use by the TRRT.

Examples of TermRef conversions

Consider the TermRef [the purpose of actors](actor#purpose@essif-lab). Using one of the predefined converters, it can be converted into, for example:

[the purpose of actors](https://essif-lab.github.io/framework/docs/terms/actor#purpose)

which is text that a markdown interpreter will render as the text "the purpose of actors", hyperlinked to https://essif-lab.github.io/framework/docs/terms/actor#purpose.

Installing the Tool

The tool can be installed from the command line and made globally available by executing

npm install -g @tno-terminology-design/trrt

Before running the tool from the command line, make sure you have met the necessary prerequisites depending on your operating environment.

  1. Node.js and NPM: Ensure Node.js and NPM are installed.
  2. Global Installation: If you have installed the package globally, confirm the global NPM modules path by running npm config get prefix. The global modules are usually stored under <prefix>/node_modules.
  3. Environment Variables: Add the path to global NPM binaries to your system's PATH environment variable. This should be <prefix> on Windows. To add to PATH, you can edit your environment variables or run set PATH=%PATH%;<prefix> in the CMD.

Calling the Tool

The behavior of the TRRT can be configured per call e.g. by a configuration file and/or command line parameters. The command line syntax is as follows:

trrt [ <paramlist> ] [ <globpattern> ]

where:

  • <paramlist> is an (optional) list of parameters, as specified in the table below.
  • <globpattern> (optional) specifies a set of (input) files that are to be processed. If a configuration file is used, its contents may specify an additional set of input files to be processed.

Legend

The columns in the following table are defined as follows:

  1. Parameter specifies the parameter and further specifications
  2. Req'd specifies whether (Y) or not (n) the field is required to be present when the tool is being called. If required, it MUST either be present in the configuration file, or as a command line parameter.
  3. Description specifies the meaning of the parameter's value, and other things you may need to know, e.g. why it is needed, a required syntax, etc.

If a configuration file is used, the long version of the parameter must be used (without the preceding --).

| Parameter | Req'd | Description |
|---|---|---|
| -V, --version | n | Output the version number of the tool. |
| -c, --config <path> | n | Path (including the filename) of the tool's (YAML) configuration file. |
| -o, --output <dir> | Y | (Root) directory for output files to be written. |
| -s, --scopedir <path> | Y | Path of the scope directory where the SAF is located. |
| -int, --interpreter <type> or <regex> | n | Specifies the interpreter to be used to detect TermRefs. This can either be a predefined interpreter, or a regex. See TRRT Converters for details. |
| -con[n], --converter[n] <type> or <hexpr> (see note 1 at the end of this page) | n | Specifies the converter to be used to produce the converted TermRef. This can either be a predefined converter, or a handlebars expression. See TRRT Converters for details. |
| -f, --force | n | Allow overwriting of existing files. |
| -h, --help | n | Display help for command. |

Term Ref Resolution

All text conversion tools, including the TRRT, convert (input) text files into results (output text files) by locating particular text patterns, doing some processing, and constructing texts that are used to replace the located text patterns. This process is illustrated in the figure below, and further explained in the page TEv2 Text Conversion:

Figure 1: The (generic) parts of a Text Conversion. The figure shows the generic text conversion pattern on which the toolbox is based.

The following subsections specify the particulars of the TRRT: the interpreter profile, its predefined interpreters, the intermediate processing, the construction of its converter profile and its predefined converters.

TRRT Interpreter Profile

The interpreter profile of the TRRT consists of the following named capturing groups that are used by the predefined interpreters.

Legend

  1. Group specifies the name of the capturing group;
  2. Req'd specifies whether (Y) or not (n, or F) the group is required to have non-empty contents. The F means that we reserve this field for Future Use.
  3. Description specifies the meaning (purpose) for which the contents of the capturing group will be used.

| Group | Req'd | Description |
|---|---|---|
| showtext | Y | The text in a TermRef that the author expects to be rendered. |
| type | n | The term type of the semantic unit that is referred to. |
| term | n | The term of the semantic unit that is referred to. |
| trait | n | A text that is expected to correspond with one of the headingids in the MRG entry of the semantic unit that is referred to. |
| scopetag | n | The scopetag that identifies the scope that curates the semantic unit that is referred to. |
| vsntag | n | A versiontag that identifies the terminology that contains the semantic unit that is referred to. |

Note that the names of some of these capturing groups do not correspond 1-1 with the names of moustache variables in the converter profile of the TRRT.

TRRT Predefined Interpreters

The following sections specify the predefined interpreters for the TRRT.

The default Interpreter

The most general form of the default interpreter syntax is:

[show text](termType:term#trait@scopetag:vsntag)

where:

  • show text (required) is the text that will be highlighted/emphasized to indicate it is linked. It must not contain the characters @ or ] (this is needed to distinguish TermRefs from regular markdown links).
  • termType (optional) is a term type. It need not be specified if the term field is (already) a unique identifier for the semantic unit that is being referred to.
  • term (optional) is a term. It need not be specified if the term can be derived from the showtext, as specified in the section on Finding an MRG Entry (bullet 2.ii).
  • trait (optional) refers to a particular characteristic of the semantic unit. It need not be specified if the reference is not to a particular characteristic. If it is specified, it must be a heading id of the section in the body of a curated text that describes the characteristic.
  • scopetag:vsntag (optional) is a terminology-identifier. If not specified, its value is taken to be the default terminology of the current scope.

For completeness, below is the regex that defines the default interpreter for the TRRT.

(?:(?<=[^`\\])|^)\[(?=[^@\]]+\]\([#a-z0-9_-]*@[:a-z0-9_-]*\))(?<showtext>[^\n\]@]+)\]\((?:(?:(?<type>[a-z0-9_-]*):)?)(?:(?<id>[a-z0-9_-]*)?(?:#(?<trait>[a-z0-9_-]*))?)?@(?<scopetag>[a-z0-9_-]*)(?::(?<vsntag>[a-z0-9_-]*))?\)
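
To see which named capturing groups this regex populates, the following is a minimal sketch in Python. It assumes only that the regex above is used unchanged, apart from rewriting the named groups into Python's `(?P<name>...)` spelling; the helper name `find_termrefs` is hypothetical and is not part of the TRRT. Note that the regex calls the group for the term `id`, whereas the interpreter profile calls it `term`.

```python
import re

# The default-interpreter regex from the text above. The only change is that
# named groups use Python's (?P<name>...) spelling instead of (?<name>...).
DEFAULT_TERMREF = re.compile(
    r"(?:(?<=[^`\\])|^)\[(?=[^@\]]+\]\([#a-z0-9_-]*@[:a-z0-9_-]*\))"
    r"(?P<showtext>[^\n\]@]+)\]\("
    r"(?:(?:(?P<type>[a-z0-9_-]*):)?)"
    r"(?:(?P<id>[a-z0-9_-]*)?(?:#(?P<trait>[a-z0-9_-]*))?)?"
    r"@(?P<scopetag>[a-z0-9_-]*)(?::(?P<vsntag>[a-z0-9_-]*))?\)"
)


def find_termrefs(text: str) -> list:
    """Return the named capturing groups of every TermRef found in text."""
    return [match.groupdict() for match in DEFAULT_TERMREF.finditer(text)]


sample = (
    "Consider [the purpose of actors](actor#purpose@essif-lab) "
    "and a regular link such as [a link](https://example.com)."
)
for groups in find_termrefs(sample):
    # Only the first link is reported; the regular link has no '@' part
    # and is therefore not recognized as a TermRef.
    print(groups)
```

Groups that do not take part in a match (here: `type` and `vsntag`) are reported as `None` by Python.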

The alt (alternative) Interpreter

It is convenient for authors to be able to use the '@scopetag' part of a (default) TermRef immediately behind the show text within the square brackets ([ and ]), and leave out the parentheses and the text in between if all the other items are omitted.

This is particularly useful in the vast majority of cases, in which the term can be derived from the showtext by the default processing and no trait is specified. Examples of this are [definition](@), or [TermRef](@).

The usefulness becomes even greater as the TRRT also implements more sophisticated ways to derive a term from a show text, e.g. to accommodate plural forms (of nouns), or conjugated forms (of verbs).


This alternative notation assumes that the showtext part of a TermRef does not contain the @ character. Some authors will nevertheless want to use an email address as the showtext part of a regular link, e.g. as in [rieks.joosten@tno.nl](mailto:rieks.joosten@tno.nl). Since scopetags should not contain .-characters, [rieks.joosten@tno.nl] does not qualify as a TermRef in our syntax. Authors should use angle brackets to link to email addresses, as in <rieks.joosten@tno.nl>.

This leads to an alternative notation that can be used in addition to the previously specified notation. Here is the alternative syntax and its equivalent counterpart:

| alt syntax | Equivalent default syntax |
|---|---|
| [show text@] | [show text](@) |
| [show text@scopetag] | [show text](showtext@scopetag) |
| [show text@scopetag:vsntag](term#trait) | [show text](term#trait@scopetag:vsntag) |

In the last row of the above table, term and #trait are optional. Thus, [definition@]() is equivalent to the alt syntax [definition@] and to the default syntax [definition](@).

For completeness, below is the regex that defines the alt interpreter for the TRRT.

(?:(?<=[^`\\])|^)\[(?=[^@\]]+@[:a-z0-9_-]*\](?:\([#a-z0-9_-]+\))?)(?<showtext>[^\n\]@]+?)@(?<scopetag>[a-z0-9_-]*)(?::(?<vsntag>[a-z0-9_-]*?))?\](?:\((?:(?:(?<type>[a-z0-9_-]+):)?)(?<id>[a-z0-9_-]*)(?:#(?<trait>[a-z0-9_-]*?))?\))
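
The equivalence between the alt and the default notation can be illustrated by rewriting one into the other. The sketch below is only an illustration under stated assumptions: the pattern `ALT_TERMREF` is a hypothetical, simplified pattern written for this example (it is not the TRRT regex above, and it ignores term types and the backtick/backslash guard), and `alt_to_default` is a hypothetical helper name.

```python
import re

# Hypothetical, simplified pattern for the alt notation (NOT the TRRT regex):
# [showtext@scopetag:vsntag], optionally followed by (term#trait).
ALT_TERMREF = re.compile(
    r"\[(?P<showtext>[^\n\]@]+)@(?P<tag>[a-z0-9_-]*(?::[a-z0-9_-]*)?)\]"
    r"(?:\((?P<inner>[#a-z0-9_-]*)\))?"
)


def alt_to_default(text: str) -> str:
    """Rewrite alt-syntax TermRefs into an equivalent default-syntax form."""

    def rewrite(match: re.Match) -> str:
        # 'inner' is term#trait; it is empty when the parentheses are absent.
        inner = match["inner"] or ""
        return f"[{match['showtext']}]({inner}@{match['tag']})"

    return ALT_TERMREF.sub(rewrite, text)


print(alt_to_default("A [definition@] and [actors@essif-lab:v1](actor#purpose)."))
```

For the second row of the table, this sketch produces [show text](@scopetag) and leaves the term out, relying on the rule that an omitted term is derived from the showtext. A regular link such as [rieks.joosten@tno.nl](mailto:rieks.joosten@tno.nl) is left untouched, because the part after the @ contains a .-character.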

Processing

The purpose of the TRRT is to allow input texts to contain TermRefs that refer to a semantic unit, and convert them into renderable refs that exhibit more information about such semantic units.

To do that, the TRRT uses the interpreter to locate successive TermRefs in its input files, and for each of them, processes the named capturing groups that the interpreter populates. Then, it will attempt to find the MRG entry that documents the semantic unit to which the TermRef refers. When found (without ambiguities), it will populate moustache variables as specified in its converter profile, and use the specified converter to produce the text by which the TermRef will be replaced.

Finding the MRG entry associated with a TermRef

  1. Get the MRG file that is expected to contain the MRG entry, by resolving the terminology identifier that consists of the named capturing groups scopetag:vsntag. Note that if scopetag wasn't populated, the default scope is assumed, and if vsntag isn't populated, the default version is used.
  2. Locate the MRG entry in this MRG, using the values of the named capturing groups termtype and term, as follows.
    1. Initialize this step by selecting all MRG entries from the MRG (the idea is to limit the number of selected entries step by step, until there is no more than one).
    2. Process the named capturing groups, as follows.
      1. If termtype is specified, then remove all entries except those whose termtype field equals the specified value of termtype.
      2. Remove all entries except those that produce a match according to the following process:
        1. normalize the text in the named capturing group term, or, if that is not specified, the named capturing group showtext, as follows:
          1. convert the text to lowercase;
          2. remove any leading and/or trailing spaces.
        2. there is a match with an MRG entry if the result of this conversion is either a form phrase that appears in the formPhrases field of that MRG entry, or if the result matches the term field of that MRG entry (see note 2 at the end of this page).
      3. If the remaining set of entries includes more than one element, then keep only the entries whose termType field contains the value specified by the defaulttype field as specified in the terminology section of the MRG.
  3. If the remaining set of entries is either empty (not found), or contains multiple entries (ambiguous TermRef), an appropriate exception must be raised (and logged), and conversion of (only!) this TermRef is discontinued.
  4. If the remaining set of entries contains precisely one element, its fields will be made available as moustache variables for further processing by the converter.
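
Steps 2 to 4 above can be sketched as follows. This is a minimal illustration, not the TRRT implementation: it assumes that MRG entries have already been loaded into Python dictionaries with the fields term, termType and formPhrases (step 1, resolving the MRG file, is left out), it treats "contains the value" in step 2.ii.c as equality, and the function name `find_mrg_entry` and the sample entries are hypothetical.

```python
def find_mrg_entry(entries, default_type, showtext, term=None, termtype=None):
    """Select the single MRG entry a TermRef refers to, or raise LookupError."""
    # Step 2.i: start with all entries of the MRG.
    candidates = list(entries)

    # Step 2.ii.a: if a term type is given, keep only entries of that type.
    if termtype:
        candidates = [e for e in candidates if e.get("termType") == termtype]

    # Step 2.ii.b: normalize the term (or, if absent, the showtext) and keep
    # entries for which it is a form phrase or equals the term field.
    needle = (term or showtext).lower().strip()
    candidates = [
        e for e in candidates
        if needle in e.get("formPhrases", []) or needle == e.get("term")
    ]

    # Step 2.ii.c: if still ambiguous, prefer entries of the default type.
    if len(candidates) > 1:
        candidates = [e for e in candidates if e.get("termType") == default_type]

    # Step 3: not found, or ambiguous.
    if len(candidates) != 1:
        raise LookupError(f"{len(candidates)} MRG entries match '{needle}'")

    # Step 4: exactly one entry remains.
    return candidates[0]


# Hypothetical sample data, for illustration only.
sample_entries = [
    {"term": "actor", "termType": "concept", "formPhrases": ["actor", "actors"]},
    {"term": "actor", "termType": "image", "formPhrases": ["actor"]},
]
print(find_mrg_entry(sample_entries, "concept", "Actors"))
```

Raising an exception keeps the failure local to one TermRef, so that a caller can log it and continue with the next TermRef, as step 3 requires.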

Editor's note

The Porter Stemming Algorithm is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems. The mentioned site links to lots of freely usable code that the TRRT might want to consider using.

Perhaps the TRRT may use this tool as a means for generating the term field from the showtext if necessary. However, we would first need to experiment with that to see whether, and to what extent, this conversion does what it is expected to do.

TRRT Converter Profile

The converter profile of the TRRT consists of a set of moustache variables that are populated from the following sources.

Note that MRG entries may have fields that are not required by the TEv2 specifications, but by the curator(s) of the terminology to which such MRG entries belong. For example, the curator(s) of the TEv2 terminologies have specified that MRG entries could have the fields glossaryTerm and glossaryText. These fields are then also available as moustache variables as part of the converter profile for the TRRT.

TRRT Predefined Converters

The following tabs specify the predefined converters for the TRRT.

The markdown-link converter is defined by the following handlebars expression.

[{{showtext}}]({{navurl}}{{#if trait}}#{{trait}}{{/if}})
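
To make the effect of this expression concrete, the sketch below mimics it in plain Python. It does not use the Handlebars library and is not how the TRRT evaluates converters: the function name `markdown_link` is hypothetical, the navurl value is an assumption inferred from the example at the top of this page, and unlike Handlebars' `{{...}}` the sketch does not HTML-escape values.

```python
def markdown_link(variables: dict) -> str:
    """Mimic [{{showtext}}]({{navurl}}{{#if trait}}#{{trait}}{{/if}})."""
    link = f"[{variables['showtext']}]({variables['navurl']}"
    # {{#if trait}}: a missing or empty trait adds nothing to the link.
    if variables.get("trait"):
        link += f"#{variables['trait']}"
    return link + ")"


# Assumed moustache variable values for the TermRef
# [the purpose of actors](actor#purpose@essif-lab).
print(markdown_link({
    "showtext": "the purpose of actors",
    "navurl": "https://essif-lab.github.io/framework/docs/terms/actor",
    "trait": "purpose",
}))
```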

Errors and Warnings

The TRRT starts by reading its command line and optional configuration file. If the command line has a key that is also found in the configuration file, the command line key-value pair takes precedence. The resulting set of key-value pairs is tested for proper syntax and validity. Every improper syntax and every invalidity found will be logged. Improper syntax may be e.g. an invalid globpattern. Invalid conditions include non-existing directories or files, lack of write-permissions where needed, etc.

The TRRT logs every error- and/or warning condition that it comes across while processing its configuration file, command line parameters, and input files, in a way that helps tool operators and document authors to identify and fix such conditions.
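
The precedence rule described above, combined with the two parameters marked as required in the parameter table (output and scopedir), can be sketched as follows. This is an illustration under assumptions: both sources are assumed to be already parsed into dictionaries keyed by the long parameter names, and the function name `effective_settings` is hypothetical.

```python
REQUIRED_KEYS = ("output", "scopedir")  # marked 'Y' in the parameter table


def effective_settings(config_file: dict, command_line: dict) -> dict:
    """Merge both sources; command line values win over configuration file values."""
    # Ignore command line options that were not actually given.
    given = {key: value for key, value in command_line.items() if value is not None}
    merged = {**config_file, **given}

    # A required parameter must come from at least one of the two sources.
    missing = [key for key in REQUIRED_KEYS if not merged.get(key)]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    return merged


settings = effective_settings(
    {"output": "docs", "scopedir": ".", "interpreter": "default"},
    {"output": "build", "interpreter": None},
)
print(settings)
```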

Deploying the Tool

The TRRT comes with documentation that enables developers to ascertain its correct functioning (e.g. by using a test set of files, test scripts that exercise its parameters, etc.), and also enables them to deploy the tool in a git repo and author/modify CI-pipes to use that deployment.


  1. Multiple converters may be specified by appending a number to the parameter key, e.g. converter[1]: <template> and converter[2]: <template>, where the number n is the termid occurrence count from which to start using a specific converter during resolution of a file. Using converter without a number is equivalent to using converter[0].
  2. Matching with the term field enables one to specify form phrases that include a space, yet use a term that has replaced the space with a -.
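
One possible reading of note 1 is sketched below: for each occurrence of a termid in a file, use the converter with the highest number n that does not exceed the occurrence count. This reading is an assumption, as is the choice to count the first occurrence as 1; the function name `pick_converter` and the sample templates are hypothetical.

```python
def pick_converter(converters: dict, occurrence: int) -> str:
    """Return the converter whose number n is the highest one <= occurrence.

    converters maps n to a template; plain 'converter' is stored under 0.
    occurrence is how often this termid has been seen in the file so far,
    assumed here to be 1 for the first occurrence.
    """
    eligible = [n for n in converters if n <= occurrence]
    if not eligible:
        raise LookupError(f"no converter applies to occurrence {occurrence}")
    return converters[max(eligible)]


# Hypothetical configuration: link the first occurrence, plain text afterwards.
converters = {
    0: "[{{showtext}}]({{navurl}})",
    2: "{{showtext}}",
}
for count in (1, 2, 3):
    print(count, pick_converter(converters, count))
```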