Term Reference Resolution Tool (TRRT)
The Term Ref Resolution Tool (TRRT) is a TEv2 text conversion tool that takes files that contain so-called TermRefs as inputs, and that outputs (a copy of) these files in which these TermRefs are converted into renderable refs.
While TermRefs have a default syntax, alternative syntaxes can be used by choosing another (predefined) syntax, or by creating your own syntax (i.e. an interpreter that conforms to the TRRT interpreter profile) and configuring it for use by the TRRT.
Renderable refs do not have a default structure, but there are various predefined converters that can be chosen, and subsequently specified for use by the TRRT.
Examples of TermRef conversions
Consider the TermRef [the purpose of actors](actor#purpose@essif-lab). The following list shows some of the predefined forms into which it can be converted:
- Markdown
- HTML
- HTML with hovertext
- Customizing
Markdown:
[the purpose of actors](https://essif-lab.github.io/framework/docs/terms/actor#purpose)
which is text that a markdown interpreter will render as the text "the purpose of actors", hyperlinked to the path https://essif-lab.github.io/framework/docs/terms/actor#purpose.
HTML:
<a href="https://essif-lab.github.io/framework/docs/terms/actor#purpose">
the purpose of actors
</a>
which is code that will render the text "the purpose of actors" as a hyperlink that, when clicked, will navigate to the purpose section of the page that documents (the semantic unit called) actor.
HTML with hovertext:
<a href="https://essif-lab.github.io/framework/docs/terms/actor#purpose"
title="Actor: an Entity that can act (do things/execute Actions), e.g. people, machines, but not Organizations">
the purpose of actors
</a>
which is code that will render the text "the purpose of actors" as a hyperlink. When a user hovers over the hyperlink, a popup appears showing the text "Actor: an Entity that can act (do things/execute Actions), e.g. people, machines, but not Organizations". When a user clicks on it, it will navigate to the purpose section of the page that documents the semantic unit called actor.
Customizing:
By devising one's own converter, it is possible to create arbitrary customized renderable refs. Suppose you have a React component that supports the use of the tags <Term ...> and </Term>, which support tooltip and linking functionality. You could create a converter that would produce the following:
<Term popup="An Actor is someone or something that can act, i.e. actually do things, execute actions, such as people or machines." reference="actor">
the purpose of actors
</Term>
Installing the Tool
The tool can be installed from the command line and made globally available by executing
npm install -g @tno-terminology-design/trrt
Before running the tool from the command line, make sure you have met the necessary prerequisites depending on your operating environment.
- CMD.exe (Windows)
- PowerShell (Windows)
- Bash (Linux/Mac)
CMD.exe (Windows):
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: If you have installed the package globally, confirm the global NPM modules path by running npm config get prefix. The global modules are usually stored under <prefix>/node_modules.
- Environment Variables: Add the path to global NPM binaries to your system's PATH environment variable. This should be <prefix> on Windows. To add it to PATH, you can edit your environment variables or run set PATH=%PATH%;<prefix> in CMD.
PowerShell (Windows):
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: Check the global NPM modules path as in CMD.
- Environment Variables: Update the PATH environment variable as in CMD. You can also use $env:Path += ";<prefix>" to update the PATH temporarily in the current PowerShell session.
Bash (Linux/Mac):
- Node.js and NPM: Ensure Node.js and NPM are installed.
- Global Installation: If globally installed, run npm config get prefix to get the global modules path, usually <prefix>/lib/node_modules.
- Environment Variables: Add the <prefix>/bin directory to your PATH if it's not already there. You can do this by adding export PATH=$PATH:<prefix>/bin to your ~/.bashrc or ~/.zshrc file.
Calling the Tool
The behavior of the TRRT can be configured per call e.g. by a configuration file and/or command line parameters. The command line syntax is as follows:
trrt [ <paramlist> ] [ <globpattern> ]
where:
- <paramlist> is an (optional) list of parameters, as specified in the table below.
- <globpattern> (optional) specifies a set of (input) files that are to be processed. If a configuration file is used, its contents may specify an additional set of input files to be processed.
Legend
The columns in the following table are defined as follows:
- Parameter specifies the parameter and further specifications.
- Req'd specifies whether (Y) or not (n) the field is required to be present when the tool is being called. If required, it MUST be present either in the configuration file or as a command line parameter.
- Description specifies the meaning of the parameter's value, and other things you may need to know, e.g. why it is needed, a required syntax, etc.
If a configuration file is used, the long version of the parameter must be used (without the preceding --).
Parameter | Req'd | Description |
---|---|---|
-V , --version | n | Output the version number of the tool. |
-c , --config <path> | n | Path (including the filename) of the tool's (YAML) configuration file. |
-o , --output <dir> | Y | (Root) directory for output files to be written. |
-s , --scopedir <path> | Y | Path of the scope directory where the SAF is located. |
-int , --interpreter <type> or <regex> | n | Specifies the interpreter to be used to detect TermRefs. This can either be a predefined interpreter, or a regex. See TRRT Predefined Interpreters for details. |
-con[n] , --converter[n] <type> or <hexpr> | n | Specifies the converter to be used to produce the converted TermRef. This can either be a predefined converter, or a handlebars expression. See TRRT Predefined Converters for details, and footnote 1 at the end of this page for the meaning of [n]. |
-f , --force | n | Allow overwriting of existing files. |
-h , --help | n | Display help for command. |
Term Ref Resolution
All text conversion tools, including the TRRT, convert (input) text files into results (output text files) by locating particular text patterns, doing some processing, and constructing texts that are used to replace the located text patterns. This process is illustrated in the figure below, and further explained in the page TEv2 Text Conversion:
Figure 1: The (generic) parts of a Text Conversion
The following subsections specify the particulars of the TRRT: the interpreter profile, its predefined interpreters, the intermediate processing, the construction of its converter profile and its predefined converters.
TRRT Interpreter Profile
The interpreter profile of the TRRT consists of the following named capturing groups that are used by the predefined interpreters.
Legend
- Group specifies the name of the capturing group.
- Req'd specifies whether (Y) or not (n, or F) the group is required to have non-empty contents. The F means that we reserve this field for Future Use.
- Description specifies the meaning (purpose) for which the contents of the capturing group will be used.
Group | Req'd | Description |
---|---|---|
showtext | Y | The text in a TermRef that the author expects to be rendered. |
type | n | The term type of the semantic unit that is referred to. |
term | n | The term of the semantic unit that is referred to. |
trait | n | A text that is expected to correspond with one of the headingids in the MRG entry of the semantic unit that is referred to. |
scopetag | n | The scopetag that identifies the scope that curates the semantic unit that is referred to. |
vsntag | n | A versiontag that identifies the terminology that contains the semantic unit that is referred to. |
Note that the names of some of these capturing groups do not correspond 1-1 with the names of moustache variables in the converter profile of the TRRT.
TRRT Predefined Interpreters
The following sections specify the predefined interpreters for the TRRT.
The default Interpreter
The most general form of the default interpreter syntax is:
[show text](termType:term#trait@scopetag:vsntag)
where:
- show text (required) is the text that will be highlighted/emphasized to indicate it is linked. It must not contain the characters @ or ] (this is needed to distinguish TermRefs from regular markdown links).
- termType (optional) is a term type. It need not be specified if the term field is (already) a unique identifier for the semantic unit that is being referred to.
- term (optional) is a term. It need not be specified if the term can be derived from the showtext, as specified in the section on Finding an MRG Entry (bullet 2.ii).
- trait (optional) refers to a particular characteristic of the semantic unit. It need not be specified if the reference is not to a particular characteristic. If it is specified, it must be a heading id of the section in the body of a curated text that describes the characteristic.
- scopetag:vsntag (optional) is a terminology-identifier. If not specified, its value is taken to be the default terminology of the current scope.
For completeness, below is the regex that defines the default interpreter for the TRRT.
(?:(?<=[^`\\])|^)\[(?=[^@\]]+\]\([#a-z0-9_-]*@[:a-z0-9_-]*\))(?<showtext>[^\n\]@]+)\]\((?:(?:(?<type>[a-z0-9_-]*):)?)(?:(?<id>[a-z0-9_-]*)?(?:#(?<trait>[a-z0-9_-]*))?)?@(?<scopetag>[a-z0-9_-]*)(?::(?<vsntag>[a-z0-9_-]*))?\)
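To see which capturing groups the regex above populates, the following is a minimal sketch in Python. It assumes nothing about the TRRT's own implementation: the regex is transcribed from above, with the named groups rewritten from (?<name>...) to Python's (?P<name>...) notation, and the helper name find_termrefs is made up for this illustration. Note that the regex names the group for the term id, whereas the interpreter profile calls it term.

```python
import re

# The default-interpreter regex from above; only the named-group notation
# is changed to Python's (?P<name>...) form.
DEFAULT_TERMREF = re.compile(
    r"(?:(?<=[^`\\])|^)\[(?=[^@\]]+\]\([#a-z0-9_-]*@[:a-z0-9_-]*\))"
    r"(?P<showtext>[^\n\]@]+)\]\("
    r"(?:(?:(?P<type>[a-z0-9_-]*):)?)"
    r"(?:(?P<id>[a-z0-9_-]*)?(?:#(?P<trait>[a-z0-9_-]*))?)?"
    r"@(?P<scopetag>[a-z0-9_-]*)(?::(?P<vsntag>[a-z0-9_-]*))?\)"
)


def find_termrefs(text):
    """Return one dict of named capturing groups per TermRef found in text."""
    return [match.groupdict() for match in DEFAULT_TERMREF.finditer(text)]


# The TermRef used in the examples at the top of this page.
refs = find_termrefs("Consider [the purpose of actors](actor#purpose@essif-lab).")
print(refs[0])

# A regular markdown link has no '@' inside the parentheses, so it is skipped.
print(find_termrefs("Visit [TNO](https://www.tno.nl) for details."))
```

Groups that do not take part in a match (here type and vsntag) come back as None, which a caller would treat as "not specified".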
The alt (alternative) Interpreter
It is convenient for authors to be able to put the '@scopetag' part of a (default) TermRef immediately behind the show text, within the square brackets ([ and ]), and to leave out the parentheses and the text in between if all the other items are omitted.
This is particularly useful in the vast majority of cases, where the default processing of showtext yields the term and no trait is specified. Examples of this are [definition](@) and [TermRef](@).
The usefulness becomes even greater as the TRRT also implements more sophisticated ways to derive a term from a show text, e.g. to accommodate plural forms (of nouns) or conjugated forms (of verbs).
This alternative notation assumes that the showtext part of a TermRef does not contain the @ character. Some authors will likely want to use an email address as the showtext part of a regular link, e.g. as in [rieks.joosten@tno.nl](mailto:rieks.joosten@tno.nl). Since scopetags should not contain . characters, [rieks.joosten@tno.nl] does not qualify as a showtext in our syntax. Authors should use angle brackets to link to email addresses, as in <rieks.joosten@tno.nl>.
This leads to an alternative notation that can be used in addition to the previously specified notation. Here is the alternative syntax and its equivalent counterpart:
alt syntax | Equivalent default syntax |
---|---|
[show text@] | [show text](@) |
[show text@scopetag] | [show text](showtext@scopetag) |
[show text@scopetag:vsntag](term#trait) | [show text](term#trait@scopetag:vsntag) |
In the last row of the above table, term and #trait are optional. Thus, [definition@]() is equivalent to the alt syntax [definition@] and to the default syntax [definition](@).
For completeness, below is the regex that defines the alt interpreter for the TRRT.
(?:(?<=[^`\\])|^)\[(?=[^@\]]+@[:a-z0-9_-]*\](?:\([#a-z0-9_-]+\))?)(?<showtext>[^\n\]@]+?)@(?<scopetag>[a-z0-9_-]*)(?::(?<vsntag>[a-z0-9_-]*?))?\](?:\((?:(?:(?<type>[a-z0-9_-]+):)?)(?<id>[a-z0-9_-]*)(?:#(?<trait>[a-z0-9_-]*?))?\))
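The following sketch applies the alt regex above to a few inputs in Python. As an assumption of this illustration, the regex is transcribed with Python's (?P<name>...) notation for named groups; the helper name parse_alt and the version tag v1 are made up, and the sketch only exercises forms that include the parentheses.

```python
import re

# The alt-interpreter regex from above; only the named-group notation
# is changed to Python's (?P<name>...) form.
ALT_TERMREF = re.compile(
    r"(?:(?<=[^`\\])|^)\[(?=[^@\]]+@[:a-z0-9_-]*\](?:\([#a-z0-9_-]+\))?)"
    r"(?P<showtext>[^\n\]@]+?)@(?P<scopetag>[a-z0-9_-]*)(?::(?P<vsntag>[a-z0-9_-]*?))?\]"
    r"(?:\((?:(?:(?P<type>[a-z0-9_-]+):)?)(?P<id>[a-z0-9_-]*)(?:#(?P<trait>[a-z0-9_-]*?))?\))"
)


def parse_alt(text):
    """Return the named groups of the first alt-syntax TermRef, or None."""
    match = ALT_TERMREF.search(text)
    return match.groupdict() if match else None


# Full form: scopetag and (hypothetical) vsntag inside the square brackets.
print(parse_alt("[the purpose of actors@essif-lab:v1](actor#purpose)"))

# Empty parentheses: term and trait are left to the default processing.
print(parse_alt("[definition@]()"))

# An email address used as link text is not mistaken for a TermRef.
print(parse_alt("[rieks.joosten@tno.nl](mailto:rieks.joosten@tno.nl)"))
```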
Processing
The purpose of the TRRT is to allow input texts to contain TermRefs that refer to a semantic unit, and convert them into renderable refs that exhibit more information about such semantic units.
To do that, the TRRT uses the interpreter to locate subsequent TermRefs in its input files, and for each of them, processes the named capturing groups that the interpreter populates. Then, it will attempt to find the MRG entry that documents the semantic unit to which the term ref refers. When found (without ambiguities), it will populate moustache variables as specified in its converter profile, and use the specified converter to produce the text by which the TermRef will be replaced.
Finding the MRG entry associated with a TermRef
1. Get the MRG file that is expected to contain the MRG entry, by resolving the terminology identifier that consists of the named capturing groups scopetag:vsntag. Note that if scopetag wasn't populated, the default scope is assumed, and if vsntag isn't populated, the default version is used.
2. Locate the MRG entry in this MRG, using the values of the named capturing groups termtype and term, as follows.
   i. Initialize this step by selecting all MRG entries from the MRG (the idea is to limit the number of selected entries step by step, until there is no more than one).
   ii. Process the named capturing groups, as follows.
      - If termtype is specified, then remove all entries except those whose termtype field equals the specified value of termtype.
      - Remove all entries except those that produce a match according to the following process:
         - normalize the text in the named capturing group term, or, if that is not specified, the named capturing group showtext, as follows:
            - convert the text to lowercase;
            - remove any leading and/or trailing spaces.
         - there is a match with an MRG entry if the result of this conversion is either a form phrase that appears in the formPhrases field of that MRG entry, or if the result matches the term field of that MRG entry (see footnote 2).
      - If the remaining set of entries includes more than one element, then keep only the entries whose termType field contains the value specified by the defaulttype field as specified in the terminology section of the MRG.
   iii. If the remaining set of entries is either empty (not found), or contains multiple entries (ambiguous TermRef), an appropriate exception must be raised (and logged), and conversion of (only!) this TermRef is discontinued.
   iv. If the remaining set of entries contains precisely one element, its fields will be made available as moustache variables for further processing by the converter.
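The selection steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the tool's actual code: MRG entries are modelled as plain dicts, the field names termType, term and formPhrases follow the text above, the function name find_mrg_entry and the sample entries are hypothetical, and loading the MRG file itself (step 1) is left out.

```python
def find_mrg_entry(entries, showtext, term=None, termtype=None, defaulttype=None):
    """Return the single MRG entry a TermRef refers to, or raise LookupError."""
    # Start from all entries and narrow the selection down step by step.
    candidates = list(entries)
    if termtype:
        candidates = [e for e in candidates if e.get("termType") == termtype]
    # Normalize `term`, or `showtext` when no term was captured.
    wanted = (term or showtext).lower().strip()
    candidates = [
        e for e in candidates
        if wanted == e.get("term") or wanted in e.get("formPhrases", [])
    ]
    # Still ambiguous: fall back on the default term type of the terminology.
    if len(candidates) > 1 and defaulttype:
        candidates = [e for e in candidates if e.get("termType") == defaulttype]
    if not candidates:
        raise LookupError(f"no MRG entry found for '{wanted}'")
    if len(candidates) > 1:
        raise LookupError(f"ambiguous TermRef: {len(candidates)} entries match '{wanted}'")
    return candidates[0]


# Hypothetical MRG entries, for illustration only.
mrg_entries = [
    {"term": "actor", "termType": "concept", "formPhrases": ["actor", "actors"]},
    {"term": "actor", "termType": "image", "formPhrases": ["actor"]},
    {"term": "party", "termType": "concept", "formPhrases": ["party", "parties"]},
]

entry = find_mrg_entry(mrg_entries, showtext=" Actors ", defaulttype="concept")
print(entry["term"], entry["termType"])
```

Raising an exception for the "not found" and "ambiguous" cases lets the caller log the problem and skip only the offending TermRef, as the last two steps require.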
The Porter Stemming Algorithm is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems. The site of the algorithm links to lots of freely usable code that the TRRT might want to consider using.
Perhaps the TRRT could use this algorithm as a means for generating the term field from the showtext if necessary. However, we would first need to experiment with it to see whether, and to what extent, this conversion does what it is expected to do.
TRRT Converter Profile
The converter profile of the TRRT consists of a set of moustache variables that are populated from the following sources.
- The named capturing groups as specified by the interpreter profile of the TRRT. Since only the named capturing groups showtext and trait are useful for a converter, they are made available as moustache variables {{showtext}} and {{trait}} respectively.
- The fields in the MRG entry of the semantic unit that the TermRef refers to. Each of the fields in that MRG entry is available as a moustache variable.
Note that MRG entries may have fields that are not required by the TEv2 specifications, but by the curator(s) of the terminology to which such MRG entries belong. For example, the curator(s) of the TEv2 terminologies have specified that MRG entries could have the fields glossaryTerm and glossaryText. These fields are then also available as moustache variables as part of the converter profile for the TRRT.
TRRT Predefined Converters
The following tabs specify the predefined converters for the TRRT.
- markdown-link
- html-link
- html-hovertext-link
- html-glossarytext-link
The markdown-link Converter
The markdown-link converter is defined by the following handlebars expression.
[{{showtext}}]({{navurl}}{{#if trait}}#{{trait}}{{/if}})
The html-link Converter
The html-link converter is defined by the following handlebars expression.
<a href="{{navurl}}{{#if trait}}#{{trait}}{{/if}}">{{showtext}}</a>
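To make concrete what the markdown-link and html-link expressions produce, here is a small Python sketch that mimics them with plain string formatting. It is an illustration only: the TRRT evaluates handlebars templates, the function names are made up, and unlike a real handlebars engine this sketch does not HTML-escape any values. The navurl value is taken from the examples at the top of this page.

```python
def markdown_link(showtext, navurl, trait=None):
    """Mimic [{{showtext}}]({{navurl}}{{#if trait}}#{{trait}}{{/if}})."""
    anchor = f"#{trait}" if trait else ""  # the {{#if trait}} block
    return f"[{showtext}]({navurl}{anchor})"


def html_link(showtext, navurl, trait=None):
    """Mimic <a href="{{navurl}}{{#if trait}}#{{trait}}{{/if}}">{{showtext}}</a>."""
    anchor = f"#{trait}" if trait else ""
    return f'<a href="{navurl}{anchor}">{showtext}</a>'


navurl = "https://essif-lab.github.io/framework/docs/terms/actor"
print(markdown_link("the purpose of actors", navurl, trait="purpose"))
print(html_link("actors", navurl))
```

When trait is empty or absent, the #-anchor is omitted entirely, which is what the {{#if trait}} block in both expressions achieves.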
The html-hovertext-link Converter
The html-hovertext-link converter is defined by the following handlebars expression (newlines and whitespace have been added for better readability, and should be ignored).
<a href="{{localize navurl}}{{#if trait}}#{{trait}}{{/if}}"
title="{{#if hoverText}}
{{hoverText}}
{{else}}
{{#if glossaryTerm}}
{{glossaryTerm}}
{{else}}
{{capFirst term}}
{{/if}}
: {{noRefs glossaryText type='markdown'}}
{{/if}}"
>{{showtext}}</a>
This converter uses the following functions.
- localize: replaces the URL in its argument (i.e., navurl) with a (shorter) version by removing the protocol and host parts in case the resource is located on the same site.
- capFirst: capitalizes the first character of every word found in its argument.
- noRefs: replaces every reference (in this case all markdown links) that it finds in the text of its argument (i.e., in the glossaryText) with {{capFirst showtext}}.
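The behaviour described for these three functions can be approximated in Python as shown below. This is a hedged sketch rather than the tool's implementation: the snake_case names, the site_host parameter (the sketch needs some way to know what "the same site" is) and the simple markdown-link pattern are all assumptions made for this illustration.

```python
import re
from urllib.parse import urlsplit


def localize(navurl, site_host):
    """Drop protocol and host when navurl points at site_host (assumed parameter)."""
    parts = urlsplit(navurl)
    if parts.netloc != site_host:
        return navurl
    local = parts.path or "/"
    if parts.query:
        local += "?" + parts.query
    if parts.fragment:
        local += "#" + parts.fragment
    return local


def cap_first(text):
    """Capitalize the first character of every whitespace-separated word."""
    return re.sub(r"(^|\s)(\S)", lambda m: m.group(1) + m.group(2).upper(), text)


# Simplified pattern for markdown links: [showtext](target)
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")


def no_refs(text):
    """Replace every markdown link in text with its capitalized show text."""
    return MARKDOWN_LINK.sub(lambda m: cap_first(m.group(1)), text)


print(localize("https://essif-lab.github.io/framework/docs/terms/actor", "essif-lab.github.io"))
print(cap_first("an entity that can act"))
print(no_refs("an [entity](entity@) that can [act](action@)"))
```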
The html-glossarytext-link Converter
The html-glossarytext-link converter is defined by the following handlebars expression (newlines and whitespace have been added for better readability, and should be ignored):
<a href="{{localize navurl}}{{#if trait}}#{{trait}}{{/if}}"
title="{{capFirst term}}: {{noRefs glossaryText type='markdown'}}"
>{{showtext}}
</a>
This converter uses the following functions.
- localize: replaces the URL in its argument (i.e., navurl) with a (shorter) version by removing the protocol and host parts in case the resource is located on the same site.
- capFirst: capitalizes the first character of every word found in its argument.
- noRefs: replaces every reference (in this case all markdown links) that it finds in the text of its argument (i.e., in the glossaryText) with {{capFirst showtext}}.
Errors and Warnings
The TRRT starts by reading its command line and optional configuration file. If the command line has a key that is also found in the configuration file, the command line key-value pair takes precedence. The resulting set of key-value pairs is tested for proper syntax and validity. Every improper syntax and every invalidity found will be logged. Improper syntax may be e.g. an invalid globpattern. Invalid conditions include non-existing directories or files, lack of write-permissions where needed, etc.
The TRRT logs every error- and/or warning condition that it comes across while processing its configuration file, command line parameters, and input files, in a way that helps tool operators and document authors to identify and fix such conditions.
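The precedence rule described above (command line over configuration file) and the check for required parameters can be sketched as follows. This is an illustrative Python sketch under assumptions: settings are modelled as plain dicts keyed by the long parameter names, a value of None stands for "not given on the command line", and the function names and directory values are hypothetical. The required keys output and scopedir are the ones marked Y in the parameter table.

```python
REQUIRED_KEYS = ("output", "scopedir")  # marked 'Y' in the parameter table


def effective_settings(config_file, command_line):
    """Merge both sources; a key given on the command line takes precedence."""
    merged = dict(config_file)
    merged.update({key: value for key, value in command_line.items() if value is not None})
    return merged


def missing_required(settings):
    """Return the required keys that are absent or empty, for logging."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


# Hypothetical values, for illustration only.
config_file = {"output": "docs", "interpreter": "default", "force": False}
command_line = {"output": "build", "scopedir": None, "force": True}

settings = effective_settings(config_file, command_line)
print(settings)
print(missing_required(settings))
```

In this example the command line overrides output and force, interpreter is taken from the configuration file, and scopedir is reported as missing because neither source provides it.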
Deploying the Tool
The TRRT comes with documentation that enables developers to ascertain its correct functioning (e.g. by using a test set of files, test scripts that exercise its parameters, etc.), and also enables them to deploy the tool in a git repo and author/modify CI-pipes to use that deployment.
1. Multiple converters may be specified by appending a number to the parameter key, e.g., converter[1]: <template> and converter[2]: <template>, where n is the termid occurrence count from which to start using a specific converter during resolution of a file. Using converter, without a number, is equal to using converter[0].
2. Matching with the term field enables one to specify form phrases that include a space, yet use a term that has replaced the space with a -.
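One possible reading of the converter[n] note above is that, for the k-th occurrence of a term in a file, the converter with the highest n that does not exceed k is used. The Python sketch below illustrates that reading only; it is an assumption and not confirmed behaviour of the TRRT, and the function name pick_converter and the 1-based occurrence count are made up for this example.

```python
def pick_converter(converters, occurrence):
    """Pick the converter whose start count n is the highest one <= occurrence.

    `converters` maps n to a converter name or template; `occurrence` is the
    (assumed 1-based) count of how often the term has occurred in the file.
    """
    eligible = [n for n in converters if n <= occurrence]
    if not eligible:
        raise LookupError(f"no converter configured for occurrence {occurrence}")
    return converters[max(eligible)]


# First occurrence gets a hovertext link, later occurrences a plain link.
converters = {1: "html-hovertext-link", 2: "html-link"}
for occurrence in (1, 2, 3):
    print(occurrence, pick_converter(converters, occurrence))
```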