5. Unstructured Text Analytics, Dashboards, and Widgets

The use of Unstructured Text Analytics has two overarching goals:

  • fact extraction

  • classification

Fact extraction uses a rules-based system whose foundation is the set of dictionaries discussed in the dictionary training section. This section discusses developing widgets and specialized dashboards that are based on dictionary matches. NoviLens can also generate columns from dictionary matches, which allows the user to create new “features” for machine learning and classification.

To review the dictionary function: a dictionary is a group of terms that establishes a “type”. For instance, in the following COVID-19 example, “Resp” is a set of terms that describe respiratory symptoms, while “pathology” is a set of pathology-related terms.

The widgets for dictionaries can display either a single dictionary value or a dictionary pair. The value of a dictionary pair is its specificity.

[Figure: Two Widgets: Dictionary Pair and Single Dictionary]
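The difference between a single-dictionary match and a dictionary-pair match can be sketched in a few lines of Python. This is only an illustration of the idea, not NoviLens code: the terms placed in “Resp” and “pathology”, the function names, and the choice to require both dictionaries in the same sentence are all assumptions made for the example.

```python
import re

# Hypothetical dictionaries: each name is a "type", each set holds its terms.
# The terms themselves are invented for illustration.
DICTIONARIES = {
    "Resp": {"cough", "shortness of breath", "wheezing"},
    "pathology": {"pneumonia", "fibrosis", "lesion"},
}

def matched_types(sentence, dictionaries=DICTIONARIES):
    """Return the names of all dictionaries with at least one term in the sentence."""
    text = sentence.lower()
    hits = set()
    for name, terms in dictionaries.items():
        for term in terms:
            # Word boundaries keep "cough" from matching inside longer words.
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                hits.add(name)
                break
    return hits

def matches_pair(sentence, first, second):
    """True only when both dictionaries match the same sentence (assumed scope)."""
    return {first, second} <= matched_types(sentence)

sentences = [
    "Patient reports a persistent cough.",
    "Imaging shows pneumonia with a persistent cough.",
    "No symptoms were recorded.",
]

# Single dictionary: every sentence that mentions a respiratory term.
single = [s for s in sentences if "Resp" in matched_types(s)]

# Dictionary pair: the narrower set where both types occur together.
pair = [s for s in sentences if matches_pair(s, "Resp", "pathology")]
```

In this sketch the single dictionary selects two sentences while the pair selects only one, which is the sense in which a pair is more specific.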

5.1. The Sentence Table

When NoviLens produces a dictionary match, a new table is constructed in the background. This table is automatically linked to the primary table in the dashboard. It is visible when configuring new widgets. View the following video to see how to use this feature.
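As a rough mental model of such a linked table, the sketch below builds one row per sentence-level dictionary match and keeps the identifier of the originating row so the two tables can be joined. The column names (`row_id`, `sentence`, `dictionary`, `term`), the naive sentence splitter, and the sample data are assumptions for illustration; the table NoviLens builds internally may be structured differently.

```python
import re

def build_sentence_table(primary_rows, text_field, dictionaries):
    """Return one row per (sentence, dictionary term) match, keyed to the primary row."""
    table = []
    for row in primary_rows:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", row[text_field].strip())
        for sentence in sentences:
            lowered = sentence.lower()
            for name, terms in dictionaries.items():
                for term in sorted(terms):
                    if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                        table.append({
                            "row_id": row["id"],   # link back to the primary table
                            "sentence": sentence,
                            "dictionary": name,
                            "term": term,
                        })
    return table

# Hypothetical primary table with one free-text field.
primary = [
    {"id": 1, "notes": "Patient has a cough. Scan shows pneumonia."},
    {"id": 2, "notes": "Routine visit."},
]
dictionaries = {"Resp": {"cough"}, "pathology": {"pneumonia"}}

sentence_table = build_sentence_table(primary, "notes", dictionaries)
```

Because every row carries `row_id`, a widget built on the sentence table can filter or group the primary table through that key.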

5.2. Using REGEX with Terms to Find Facts

Searching documents for sentences that contain specific facts can be accomplished in NoviLens UTA by combining the Advanced Dictionary function with the Dictionary Table widget. For example, consider searching a set of Securities and Exchange Commission documents for monetary amounts. In this case, create a REGEX for money and save it as a dictionary named “Money”: REGEX:\$ \d+. Combine it with a second dictionary named “money_units” that contains the terms billion, million, thousand, and hundred. The corpus is then searched for both dictionaries, and a widget is built for the dictionary pair.

The following shows how adding the second dictionary, money_units, allows billions to be selected from the general dollar values in the Bar Widget on the right.

[Figure: Using REGEX with Dictionary to Extract Facts]
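The pairing of a REGEX dictionary with a term dictionary can be approximated outside the product as follows. This is a minimal sketch under stated assumptions: the money pattern is written for Python's `re` module and is somewhat broader than the REGEX given above (it allows an optional space, thousands separators, and decimals), and the sample sentences are invented.

```python
import re
from collections import Counter

# Assumed pattern: "$", optional space, digits with optional commas and decimals.
MONEY = re.compile(r"\$\s?\d[\d,]*(?:\.\d+)?")

# The money_units dictionary from the text.
MONEY_UNITS = {"billion", "million", "thousand", "hundred"}

def money_facts(sentence):
    """Return (amounts, units) found in one sentence."""
    amounts = MONEY.findall(sentence)
    lowered = sentence.lower()
    units = [u for u in sorted(MONEY_UNITS)
             if re.search(r"\b" + u + r"\b", lowered)]
    return amounts, units

# Invented sentences standing in for filing text.
sentences = [
    "Revenue rose to $4.2 billion in 2019.",
    "The filing fee was $250.",
    "Net income was $310 million.",
]

# Pair match: the sentence has a money amount AND the unit "billion".
billions = [s for s in sentences
            if money_facts(s)[0] and "billion" in money_facts(s)[1]]

# Counts per unit among sentences with a money match, similar to a bar widget.
unit_counts = Counter(u for s in sentences
                      if money_facts(s)[0]
                      for u in money_facts(s)[1])
```

The “Money” match alone returns all three sentences; requiring the “billion” term from money_units narrows the result to the first one.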

In creating a Dashboard for Fact Extraction, combining a Dictionary Table with a Dictionary Table Multiple Select Widget provides a rapid method of isolating facts of interest. This is illustrated in the following video:

5.3. Dashboards Generated from Dictionary Matches

The Dictionary Match Dashboard is a specialty Dashboard that groups data between tables based on dictionary matches in text fields. Your widget options are restricted to dictionary-based selections. You can generate subset widgets by selecting the associated tables related to the primary dictionary table. Again, these widgets will be related to the dictionaries.

You cannot use the Dictionary Match Table to link quantitative data. For that function, NoviLens uses the dictionary match to add a new column to the base table. This action is performed by generating a new field in the selected data table.
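Conceptually, adding a dictionary-based column amounts to flagging each row of the base table according to whether its text field matches the dictionary, so that the flag can sit alongside quantitative fields. The sketch below illustrates that idea only; the function name, the 0/1 encoding, and the sample table are assumptions rather than a description of how NoviLens implements it.

```python
import re

def add_dictionary_column(rows, text_field, name, terms):
    """Add a 0/1 column `name` flagging rows whose text contains any dictionary term."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE)
                for t in terms]
    for row in rows:
        row[name] = int(any(p.search(row[text_field]) for p in patterns))
    return rows

# Hypothetical base table with one quantitative field and one text field.
base_table = [
    {"id": 1, "age": 64, "notes": "Persistent cough and fever."},
    {"id": 2, "age": 41, "notes": "Routine visit."},
    {"id": 3, "age": 72, "notes": "Wheezing on exertion."},
]

add_dictionary_column(base_table, "notes", "Resp", {"cough", "wheezing"})

# The new column can now be combined with quantitative data,
# for example the mean age of rows flagged by the dictionary.
flagged_ages = [r["age"] for r in base_table if r["Resp"] == 1]
mean_flagged_age = sum(flagged_ages) / len(flagged_ages)
```

A column of this kind can also serve as an input feature for the classification use described earlier in this section.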

5.4. Generating a Dictionary-Based Data Field