N-grams (n = 1 to 4) were created from an English-language corpus of news, blog and Twitter text. Stopwords were kept, but very infrequent n-grams were removed for performance reasons. Unigrams are included because the app also has an auto-complete feature: given just a few characters, it suggests completions.
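The table-building step described above can be sketched as follows. This is a minimal illustration, not the app's actual code; the tokenized corpus, the `max_n` limit and the `min_count` pruning threshold are stand-ins for whatever the real pipeline used.

```python
from collections import Counter

def build_ngram_counts(tokens, max_n=4, min_count=2):
    """Count n-grams (n = 1..max_n) over a token list, then drop
    very infrequent n-grams to keep the tables small and fast."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[n][tuple(tokens[i:i + n])] += 1
    # Prune n-grams seen fewer than min_count times (performance/memory)
    return {n: Counter({g: c for g, c in cnt.items() if c >= min_count})
            for n, cnt in counts.items()}

# Toy "corpus" standing in for the news/blogs/Twitter text
tokens = "the cat sat on the mat the cat sat".split()
counts = build_ngram_counts(tokens, max_n=4, min_count=2)
```

With `min_count=2`, the bigram `("the", "cat")` (seen twice) survives while `("sat", "on")` (seen once) is pruned.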
A variant of the Stupid Backoff method (Brants et al., 2007) was implemented: if the model does not find a matching quadgram, it “backs off” to the tri-, bi- or unigram model.
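The back-off lookup can be sketched like this. It is a simplified illustration of the idea, assuming the n-gram tables are dicts of `Counter` objects keyed by tuple; the toy counts below are invented, and the full Stupid Backoff score (with a fixed back-off discount) is reduced here to “take the most frequent continuation at the longest matching order”.

```python
from collections import Counter

# Toy n-gram tables (counts[n] maps n-gram tuples to frequencies);
# in the app these come from the news/blogs/Twitter corpus.
counts = {
    1: Counter({("the",): 3, ("cat",): 2, ("sat",): 2}),
    2: Counter({("the", "cat"): 2, ("cat", "sat"): 2}),
    3: Counter({("the", "cat", "sat"): 2}),
    4: Counter(),
}

def predict_next(context, counts, max_n=4):
    """Stupid Backoff-style lookup: try the longest n-gram whose
    prefix matches the end of the context; if no match, back off
    to the next shorter order, down to unigrams."""
    for n in range(max_n, 0, -1):
        prefix = tuple(context[-(n - 1):]) if n > 1 else ()
        cands = Counter({g[-1]: c for g, c in counts[n].items()
                         if g[:-1] == prefix})
        if cands:
            return cands.most_common(1)[0][0]
    return None
```

For the context `["the", "cat"]` no quadgram matches, so the lookup backs off to the trigram table and suggests `"sat"`; for an unseen context it falls all the way back to the most frequent unigram.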
Furthermore, words in the first part of a bigram, trigram or quadgram (i.e., all words except the last) that were not among the ~6,000 most frequent unigrams were masked with “&lt;UNK&gt;”. The same masking was applied to the user’s input string. Thus, if the user inputs “I saw the Titanic”, it might be matched to “saw the Gladiator movie” (suggestion = “movie”), provided that both “Titanic” and “Gladiator” were masked.
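The masking step can be sketched as below. The four-word `VOCAB` is a stand-in for the ~6,000 most frequent unigrams; in the app, masking is applied to the prefix words of stored n-grams (the suggested last word stays unmasked) and to the user's input before lookup.

```python
# Stand-in for the ~6000 most frequent unigrams
VOCAB = {"i", "saw", "the", "movie"}

def mask(tokens, vocab=VOCAB):
    """Replace out-of-vocabulary words with <UNK> so that rare words
    in the user's input and in stored n-gram prefixes line up."""
    return ["<UNK>" if t.lower() not in vocab else t.lower() for t in tokens]

masked_input = mask("I saw the Titanic".split())
```

Both “I saw the &lt;UNK&gt;” and the stored prefix “saw the &lt;UNK&gt;” now share the tail “saw the &lt;UNK&gt;”, so the quadgram “saw the &lt;UNK&gt; movie” matches and “movie” is suggested.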