Term Reference Resolution Tool (TRRT)

The Term Reference Resolution Tool (TRRT) is a TEv2 text conversion tool. It takes files that contain so-called TermRefs as input, and outputs (a copy of) these files in which the TermRefs have been converted into renderable refs.

While TermRefs have a default syntax, alternative syntaxes can be used by choosing another (predefined) syntax, or by creating your own syntax (i.e. an interpreter that conforms to the TRRT interpreter profile) and configuring it for use by the TRRT.

Renderable refs do not have a default structure, but there are various predefined converters that can be chosen and configured for use by the TRRT.

Examples of TermRef conversions

Consider the TermRef [the purpose of actors](actor#purpose@essif-lab). The following examples show some of the predefined forms that it can be converted into:

Markdown

[the purpose of actors](https://essif-lab.github.io/framework/docs/terms/actor#purpose)

which is text that a markdown interpreter will render as the text "the purpose of actors", hyperlinked to https://essif-lab.github.io/framework/docs/terms/actor#purpose.

HTML

<a href="https://essif-lab.github.io/framework/docs/terms/actor#purpose">
  the purpose of actors
</a>

which is code that will render the text "the purpose of actors" as a hyperlink that, when clicked, navigates to the purpose section of the page that documents (the semantic unit called) actor.

HTML with hovertext

<a href="https://essif-lab.github.io/framework/docs/terms/actor#purpose"
    title="Actor: an Entity that can act (do things/execute Actions), e.g. people, machines, but not Organizations">
  the purpose of actors
</a>

which is code that will render the text "the purpose of actors" as a hyperlink. When a user hovers over the hyperlink, a popup appears showing the text "Actor: an Entity that can act (do things/execute Actions), e.g. people, machines, but not Organizations". When a user clicks on it, it will navigate to the purpose section of the page that documents the semantic unit called actor.

Customizing

By devising one's own converter, it is possible to create arbitrary customized renderable refs. Suppose you have a React component that supports the use of the tags <Term ...> and </Term> that support tooltip and linking functionality. You could create a converter that would produce the following:

<Term popup="An Actor is someone or something that can act, i.e. actually do things, execute actions, such as people or machines." reference="actor">
  the purpose of actors
</Term>

Installing the Tool

The tool can be installed from the command line and made globally available by executing

npm install -g @tno-terminology-design/trrt

Before running the tool from the command line, make sure you have met the necessary prerequisites depending on your operating environment.

CMD.exe (Windows)

Node.js and NPM: Ensure Node.js and NPM are installed.
Global Installation: If you have installed the package globally, confirm the global NPM modules path by running npm config get prefix. The global modules are usually stored under <prefix>/node_modules.
Environment Variables: Add the path to global NPM binaries to your system's PATH environment variable. This should be <prefix> on Windows. To add to PATH, you can edit your environment variables or run set PATH=%PATH%;<prefix> in the CMD.

PowerShell (Windows)

Node.js and NPM: Ensure Node.js and NPM are installed.
Global Installation: Check the global NPM modules path as in CMD.
Environment Variables: Update the PATH environment variable as in CMD. You can also use $env:Path += ";<prefix>" to update the PATH temporarily in the current PowerShell session.

Bash (Linux/Mac)

Node.js and NPM: Ensure Node.js and NPM are installed.
Global Installation: If globally installed, run npm config get prefix to get the global modules path, usually <prefix>/lib/node_modules.
Environment Variables: Add the <prefix>/bin directory to your PATH if it's not already. You can do this by adding export PATH=$PATH:<prefix>/bin to your ~/.bashrc or ~/.zshrc file.

Calling the Tool

The behavior of the TRRT can be configured per call e.g. by a configuration file and/or command line parameters. The command line syntax is as follows:

trrt [ <paramlist> ] [ <globpattern> ]

where:

<paramlist> is an (optional) list of parameters, as specified in the table below.
<globpattern> (optional) specifies a set of (input) files that are to be processed. If a configuration file is used, its contents may specify an additional set of input files to be processed.

Legend

The columns in the following table are defined as follows:

Parameter specifies the parameter and further specifications
Req'd specifies whether (Y) or not (n) the field is required to be present when the tool is being called. If required, it MUST either be present in the configuration file, or as a command line parameter.
Description specifies the meaning of the Value field, and other things you may need to know, e.g. why it is needed, a required syntax, etc.

If a configuration file is used, the long version of the parameter must be used (without the preceding --).

Parameter	Req'd	Description
`-V`, `--version`	n	Output the version number of the tool.
`-c`, `--config <path>`	n	Path (including the filename) of the tool's (YAML) configuration file.
`-o`, `--output <dir>`	Y	(Root) directory for output files to be written.
`-s`, `--scopedir <path>`	Y	Path of the scope directory where the SAF is located.
`-int`, `--interpreter <type> or <regex>`	n	Specifies the interpreter to be used to detect TermRefs. This can either be a predefined interpreter, or a regex. See TRRT Predefined Interpreters for details.
`-con[n]`, `--converter[n] <type> or <hexpr>`¹	n	Specifies the converter to be used to produce the converted TermRef. This can either be a predefined converter, or a handlebars expression. See TRRT Predefined Converters for details.
`-f`, `--force`	n	Allow overwriting of existing files.
`-h`, `--help`	n	Display help for command.

Term Ref Resolution

All text conversion tools, including the TRRT, convert (input) text files into results (output text files) by locating particular text patterns, doing some processing, and constructing texts that are used to replace the located text patterns. This process is illustrated in the figure below, and further explained in the page TEv2 Text Conversion:

[Figure 1: The (generic) parts of a Text Conversion, i.e. the generic text conversion pattern on which the toolbox is based.]

The following subsections specify the particulars of the TRRT: the interpreter profile, its predefined interpreters, the intermediate processing, the construction of its converter profile and its predefined converters.

TRRT Interpreter Profile

The interpreter profile of the TRRT consists of the following named capturing groups that are used by the predefined interpreters.

Legend

Group name of the capturing group;
Req'd specifies whether (Y) or not (n, or F) the group is required to have non-empty contents. The F means that we reserve this field for Future Use.
Description specifies the meaning (purpose) for which the contents of the capturing group will be used.

Group	Req'd	Description
`showtext`	Y	The text in a TermRef that the author expects to be rendered.
`type`	n	The term type of the semantic unit that is referred to.
`term`	n	The term of the semantic unit that is referred to.
`trait`	n	A text that is expected to correspond with one of the `headingids` in the MRG entry of the semantic unit that is referred to.
`scopetag`	n	The scopetag that identifies the scope that curates the semantic unit that is referred to.
`vsntag`	n	A versiontag that identifies the terminology that contains the semantic unit that is referred to.

info

Note that the names of some of these capturing groups do not correspond 1-1 with the names of moustache variables in the converter profile of the TRRT.

TRRT Predefined Interpreters

The following sections specify the predefined interpreters for the TRRT.

The `default` Interpreter

The most general form of the default interpreter syntax is:

[show text](termType:term#trait@scopetag:vsntag)

where:

show text (required) is the text that will be highlighted/emphasized to indicate it is linked. It must not contain the characters @ or ] (this is needed to distinguish TermRefs from regular markdown links).
termType (optional) is a term type. It need not be specified if the term field is (already) a unique identifier for the semantic unit that is being referred to.
term (optional) is a term. It need not be specified if the term can be derived from the showtext, as specified in the section on Finding an MRG Entry (bullet 2.ii).
trait (optional) refers to a particular characteristic of the semantic unit. It need not be specified if the reference is not to a particular characteristic. If it is specified, it must be a heading id of the section in the body of a curated text that describes the characteristic.
scopetag:vsntag (optional) is a terminology-identifier. If not specified, its value is taken to be the default terminology of the current scope.

For completeness, below is the regex that defines the default interpreter for the TRRT.

(?:(?<=[^`\\])|^)\[(?=[^@\]]+\]\([#a-z0-9_-]*@[:a-z0-9_-]*\))(?<showtext>[^\n\]@]+)\]\((?:(?:(?<type>[a-z0-9_-]*):)?)(?:(?<id>[a-z0-9_-]*)?(?:#(?<trait>[a-z0-9_-]*))?)?@(?<scopetag>[a-z0-9_-]*)(?::(?<vsntag>[a-z0-9_-]*))?\)

The `alt` (alternative) Interpreter

It is convenient for authors to be able to use the '@scopetag' part of a (default) TermRef immediately behind the show text within the square brackets ([ and ]), and leave out the parentheses and the text in between if all the other items are omitted.

This is particularly useful in the vast majority of cases, where the term can be derived from the showtext by the default processing and no trait is specified. Examples of this are [definition](@), or [TermRef](@).

The usefulness becomes even greater as the TRRT also implements more sophisticated ways to derive a term from a show text, e.g. to accommodate plural forms (of nouns), or conjugated forms (of verbs).

info

This alternative notation assumes that the showtext part of a TermRef does not contain the @ character. It is likely that some authors will want to use an email address as the showtext part of a regular link, e.g. as in [rieks.joosten@tno.nl](mailto:rieks.joosten@tno.nl). Since scopetags should not contain . characters, [rieks.joosten@tno.nl] is not recognized as the bracketed part of a TermRef in our syntax. Even so, authors should use angle brackets to link to email addresses, as in <rieks.joosten@tno.nl>.

This leads to an alternative notation that can be used in addition to the previously specified notation. Here is the alternative syntax and its equivalent counterpart:

`alt` syntax	Equivalent `default` syntax
[`show text`@]	[`show text`](@)
[`show text`@`scopetag`]	[`show text`](`showtext`@`scopetag`)
[`show text`@`scopetag`:`vsntag`](`term`#`trait`)	[`show text`](`term`#`trait`@`scopetag`:`vsntag`)

In the last row of the above table, term and #trait are optional. Thus, [definition@]() is equivalent to the alt syntax [definition@] and to the default syntax [definition](@).

For completeness, below is the regex that defines the alt interpreter for the TRRT.

(?:(?<=[^`\\])|^)\[(?=[^@\]]+@[:a-z0-9_-]*\](?:\([#a-z0-9_-]+\))?)(?<showtext>[^\n\]@]+?)@(?<scopetag>[a-z0-9_-]*)(?::(?<vsntag>[a-z0-9_-]*?))?\](?:\((?:(?:(?<type>[a-z0-9_-]+):)?)(?<id>[a-z0-9_-]*)(?:#(?<trait>[a-z0-9_-]*?))?\))

Processing

The purpose of the TRRT is to allow input texts to contain TermRefs that refer to a semantic unit, and convert them into renderable refs that exhibit more information about such semantic units.

To do that, the TRRT uses the interpreter to locate successive TermRefs in its input files, and for each of them, processes the named capturing groups that the interpreter populates. Then, it will attempt to find the MRG entry that documents the semantic unit to which the TermRef refers. When found (without ambiguities), it will populate moustache variables as specified in its converter profile, and use the specified converter to produce the text by which the TermRef will be replaced.

Finding the MRG entry associated with a TermRef

Get the MRG file that is expected to contain the MRG entry, by resolving the terminology identifier that consists of the named capturing groups scopetag:vsntag. Note that if scopetag wasn't populated, the default scope is assumed, and if vsntag isn't populated, the default version is used.
Locate the MRG entry in this MRG, using the values of the named capturing groups type and term, as follows.
1. Initialize this step by selecting all MRG entries from the MRG (the idea is to limit the number of selected entries step by step, until there is no more than one).
2. Process the named capturing groups, as follows.
  1. If type is specified, then remove all entries except those whose termType field equals the specified value of type.
  2. Remove all entries except those that produce a match according to the following process:
    1. normalize the text in the named capturing group term, or, if that is not specified, the named capturing group showtext, as follows:
      1. convert the text to lowercase;
      2. remove any leading and/or trailing spaces.
    2. there is a match with an MRG entry if the result of this conversion is either a form phrase that appears in the formPhrases field of that MRG entry, or if the result matches the term field of that MRG entry.²
  3. If the remaining set of entries includes more than one element, then keep only the entries whose termType field contains the value specified by the defaulttype field as specified in the terminology section of the MRG.
If the remaining set of entries is either empty (not found), or contains multiple entries (ambiguous TermRef), an appropriate exception must be raised (and logged), and conversion of (only!) this TermRef is discontinued.
If the remaining set of entries contains precisely one element, its fields will be made available as moustache variables for further processing by the converter.
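The selection steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the TRRT's implementation: MRG entries are modelled as plain dictionaries, formPhrases is assumed to be a list of strings, and the names `find_mrg_entry` and `TermRefError` are hypothetical.

```python
from typing import Dict, List, Optional

Entry = Dict[str, object]  # assumed shape of an MRG entry for this sketch


class TermRefError(Exception):
    """Hypothetical exception for a TermRef that is not found or ambiguous."""


def find_mrg_entry(
    entries: List[Entry],
    showtext: str,
    term: Optional[str] = None,
    termtype: Optional[str] = None,
    defaulttype: Optional[str] = None,
) -> Entry:
    # Step 1: start with all MRG entries and narrow the selection down.
    candidates = list(entries)

    # Step 2.i: keep only entries of the requested term type, if one is given.
    if termtype:
        candidates = [e for e in candidates if e.get("termType") == termtype]

    # Step 2.ii: normalise term (or showtext when term is absent) and keep
    # entries whose term or one of whose form phrases equals the result.
    needle = (term or showtext).lower().strip()
    candidates = [
        e for e in candidates
        if e.get("term") == needle or needle in (e.get("formPhrases") or [])
    ]

    # Step 2.iii: if still ambiguous, prefer entries of the default type.
    if len(candidates) > 1:
        candidates = [e for e in candidates if e.get("termType") == defaulttype]

    if not candidates:
        raise TermRefError(f"no MRG entry found for '{needle}'")
    if len(candidates) > 1:
        raise TermRefError(f"ambiguous TermRef '{needle}'")
    return candidates[0]


if __name__ == "__main__":
    mrg = [
        {"term": "actor", "termType": "concept", "formPhrases": ["actors"]},
        {"term": "actor", "termType": "image", "formPhrases": []},
    ]
    print(find_mrg_entry(mrg, "Actors", defaulttype="concept"))
```

Raising an exception per TermRef lets the caller log the problem and continue with the next TermRef, in line with the rule that conversion is discontinued only for the TermRef at hand.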

Editor's note

The Porter Stemming Algorithm is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems. The mentioned site links to lots of freely usable code that the TRRT might want to consider using.

Perhaps the TRRT may use this tool as a means for generating the term field from the showtext if necessary. However, we would first need to experiment with that to see whether, and to what extent, this conversion does what it is expected to do.

TRRT Converter Profile

The converter profile of the TRRT consists of a set of moustache variables that are populated from the following sources.

The named capturing groups as specified by the interpreter profile of the TRRT. Since only the named capturing groups showtext and trait are useful for a converter, they are made available as moustache variables {{showtext}} and {{trait}} respectively.
The fields in the MRG entry of the semantic unit that the TermRef refers to. Each of the fields in that MRG entry is available as a moustache variable.

Note that MRG entries may have fields that are not required by the TEv2 specifications, but that are required by the curator(s) of the terminology to which such MRG entries belong. For example, the curator(s) of the TEv2 terminologies have specified that MRG entries could have the fields glossaryTerm and glossaryText. These fields are then also available as moustache variables as part of the converter profile for the TRRT.

TRRT Predefined Converters

The following subsections specify the predefined converters for the TRRT: markdown-link, html-link, html-hovertext-link, and html-glossarytext-link.

The `markdown-link` Converter

The markdown-link converter is defined by the following handlebars expression.

[{{showtext}}]({{navurl}}{{#if trait}}#{{trait}}{{/if}})

The `html-link` Converter

The html-link converter is defined by the following handlebars expression.

<a href="{{navurl}}{{#if trait}}#{{trait}}{{/if}}">{{showtext}}</a>
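What the markdown-link and html-link expressions above produce can be mimicked with plain Python string formatting. This is a rough equivalent rather than a Handlebars rendering: the function names are made up for this sketch, the sample navurl value is an assumption, and the HTML escaping that Handlebars applies to `{{...}}` values is not reproduced.

```python
from typing import Optional


def markdown_link(showtext: str, navurl: str, trait: Optional[str] = None) -> str:
    """Plain-Python equivalent of the markdown-link expression (no escaping)."""
    anchor = f"#{trait}" if trait else ""  # mirrors {{#if trait}}#{{trait}}{{/if}}
    return f"[{showtext}]({navurl}{anchor})"


def html_link(showtext: str, navurl: str, trait: Optional[str] = None) -> str:
    """Plain-Python equivalent of the html-link expression (no escaping)."""
    anchor = f"#{trait}" if trait else ""
    return f'<a href="{navurl}{anchor}">{showtext}</a>'


if __name__ == "__main__":
    # Assumed navurl for the actor entry, taken from the examples earlier on.
    navurl = "https://essif-lab.github.io/framework/docs/terms/actor"
    print(markdown_link("the purpose of actors", navurl, "purpose"))
    print(html_link("the purpose of actors", navurl, "purpose"))
    print(markdown_link("actors", navurl))
```

When no trait was captured, the `#` separator is left out as well, so the link points to the top of the page that documents the semantic unit.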

The `html-hovertext-link` Converter

The html-hovertext-link converter is defined by the following handlebars expression (newlines and whitespaces have been added for better readability, and should be ignored).

<a href="{{localize navurl}}{{#if trait}}#{{trait}}{{/if}}" 
  title="{{#if hoverText}}
           {{hoverText}}
         {{else}}
           {{#if glossaryTerm}}
             {{glossaryTerm}}
           {{else}}
             {{capFirst term}}
           {{/if}}
           : {{noRefs glossaryText type='markdown'}}
         {{/if}}"
>{{showtext}}</a>

This converter uses the following functions.

localize: replaces the URL in its argument (i.e., navurl) with a (shorter) version, by removing the protocol and host parts, in case the resource is located on the same site.
capFirst: capitalizes the first character of every word found in its argument.
noRefs: replaces every reference (in this case all markdown links) that it finds in the text of its argument (i.e., in the glossaryText) with {{capFirst showtext}}.
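The behaviour described for these three functions can be approximated in Python as shown below. The sketch follows the descriptions above only: the names are Python-style renderings, the `site` parameter of `localize` is an assumption (how the TRRT determines the current site is not specified here), and `no_refs` handles markdown links only.

```python
import re
from urllib.parse import urlsplit

# A markdown link: [showtext](target). Used by no_refs below.
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")


def cap_first(text: str) -> str:
    """Capitalize the first character of every word in text."""
    return re.sub(r"(^|\s)(\S)", lambda m: m.group(1) + m.group(2).upper(), text)


def no_refs(text: str) -> str:
    """Replace every markdown link in text with its capitalized showtext."""
    return MARKDOWN_LINK.sub(lambda m: cap_first(m.group(1)), text)


def localize(navurl: str, site: str) -> str:
    """Drop protocol and host from navurl if it is located on site.

    The site argument is a hypothetical way to pass in the current site.
    """
    url, base = urlsplit(navurl), urlsplit(site)
    if (url.scheme, url.netloc) != (base.scheme, base.netloc):
        return navurl  # resource lives elsewhere: keep the full URL
    local = url.path or "/"
    if url.query:
        local += "?" + url.query
    if url.fragment:
        local += "#" + url.fragment
    return local


if __name__ == "__main__":
    print(cap_first("the purpose of actors"))
    print(no_refs("an [entity](@) that can act"))
    print(localize("https://essif-lab.github.io/framework/docs/terms/actor",
                   "https://essif-lab.github.io"))
```

Stripping references from the glossary text matters because the result ends up inside an HTML title attribute, where markdown links would show up as raw markup.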

The `html-glossarytext-link` Converter

The html-glossarytext-link converter is defined by the following handlebars expression (newlines and whitespaces have been added for better readability, and should be ignored):

<a href="{{localize navurl}}{{#if trait}}#{{trait}}{{/if}}"
    title="{{capFirst term}}: {{noRefs glossaryText type='markdown'}}"
  >{{showtext}}
</a>

This converter uses the following functions.

localize: replaces the URL in its argument (i.e., navurl) with a (shorter) version, by removing the protocol and host parts, in case the resource is located on the same site.
capFirst: capitalizes the first character of every word found in its argument.
noRefs: replaces every reference (in this case all markdown links) that it finds in the text of its argument (i.e., in the glossaryText) with {{capFirst showtext}}.

Errors and Warnings

The TRRT starts by reading its command line and optional configuration file. If the command line has a key that is also found in the configuration file, the command line key-value pair takes precedence. The resulting set of key-value pairs is tested for proper syntax and validity. Every improper syntax and every invalidity found will be logged. Improper syntax may be e.g. an invalid globpattern. Invalid conditions include non-existing directories or files, lack of write-permissions where needed, etc.

The TRRT logs every error and warning condition that it comes across while processing its configuration file, command line parameters, and input files, in a way that helps tool operators and document authors to identify and fix such conditions.
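The precedence rule and the check for required parameters can be sketched as follows. The key names output and scopedir come from the parameter table (long names without the leading --). The function names, and the convention that an option not given on the command line is represented as None, are assumptions made for this illustration.

```python
from typing import Dict, List, Optional

# Parameters marked "Y" in the Req'd column of the parameter table.
REQUIRED_KEYS = ("output", "scopedir")


def merge_settings(
    config_file: Dict[str, Optional[str]],
    command_line: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Combine both sources; command line values win over the config file."""
    merged = dict(config_file)
    # Assumption: options absent from the command line are passed as None.
    merged.update({k: v for k, v in command_line.items() if v is not None})
    return merged


def missing_required(settings: Dict[str, Optional[str]]) -> List[str]:
    """Return the required keys that are absent or empty in settings."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


if __name__ == "__main__":
    from_file = {"output": "docs", "scopedir": ".", "interpreter": "default"}
    from_cli = {"output": "build", "scopedir": None}
    settings = merge_settings(from_file, from_cli)
    print(settings)
    print(missing_required(settings))
```

Checking the required keys only after merging reflects the rule that a required parameter may come from either the configuration file or the command line.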

Deploying the Tool

The TRRT comes with documentation that enables developers to ascertain its correct functioning (e.g. by using a test set of files, test scripts that exercise its parameters, etc.), and also enables them to deploy the tool in a git repo and author/modify CI pipelines to use that deployment.

¹ Multiple converters may be specified by appending a number to the parameter key, e.g., converter[1]: <template> and converter[2]: <template>. In converter[n], n is the termid occurrence count from which to start using that converter during resolution of a file. Using converter without a number is equal to using converter[0].
² Matching with the term field enables one to specify form phrases that include a space, yet use a term that has replaced the space with a -.
