2021 is a banner year for AI Natural Language Processing
Google-acquired AI company DeepMind has tried to master the natural language processing that machines use to understand human language
Universities have been facing challenges in evaluating their students remotely. The practical difficulties of organising remote examinations have resulted in some interesting alternatives, such as open-book exams and research-based essay writing. The quality of submissions will likely improve if students can consult books for reference.
The same applies to Artificial Intelligence. If an AI system can consult a memory bank for reference instead of memorising our languages, its output could be better. This is what DeepMind claims to have achieved with RETRO. Having gained credibility by teaching AI to self-learn games like Go and to predict complex protein structures, the Google-acquired AI company DeepMind has now turned to the natural language processing that machines use to understand human language. Pre-trained language models produce text by predicting the words and sentences that should come next in their response.
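The prediction task itself can be illustrated in miniature. The sketch below is a toy bigram counter, nothing like a real pre-trained model with billions of parameters, but the job is the same: given what has been said, guess what word should come next. The corpus and function names are illustrative.

```python
# Toy next-word predictor: count which word follows which in a tiny
# corpus, then predict the most frequent follower. Real language
# models learn this from billions of parameters, not raw counts,
# but the underlying task -- predict what comes next -- is the same.
from collections import Counter, defaultdict

corpus = "the car is fast . the car is on the move . the dog is on the mat ."

counts = defaultdict(Counter)
words = corpus.split()
for prev, nxt in zip(words, words[1:]):
    counts[prev][nxt] += 1  # tally: word `nxt` followed word `prev`

def predict_next(word: str) -> str:
    """Return the word most often seen after `word` in the corpus."""
    return counts[word].most_common(1)[0][0]
```

Here `predict_next("the")` returns "car", simply because "car" follows "the" more often than any other word in this tiny corpus.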
DeepMind’s RETRO is a model whose performance is enhanced by an external resource – a massive text corpus of some 2 trillion words. To put that in perspective, it would take 175 people a lifetime of continuous reading to get through it.
When the model generates text, it consults this external resource to make its predictions more accurate. Researchers claim that such a design makes it easier to understand what the AI infers and where it picks up bias. It also costs less to train, making it more accessible to organisations.
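The lookup step might be sketched roughly as follows. This is only a word-overlap toy with an invented three-passage database, not DeepMind’s actual architecture (RETRO retrieves neighbouring passages via neural embeddings and feeds them into a retrieval-conditioned transformer), but it shows the idea: fetch the closest passage from an external store and let the model condition on it rather than memorise it.

```python
# Toy retrieval-augmented lookup (illustrative only, not RETRO itself):
# before answering, fetch the database passage that best matches the
# query and prepend it as context.

DATABASE = [  # stand-in for RETRO's trillion-word external corpus
    "The 2020 Olympics were postponed to 2021 because of the pandemic.",
    "RETRO retrieves passages from a two-trillion-word text database.",
    "GPT-3 has 175 billion parameters.",
]

def tokens(text: str) -> set:
    """Lowercase word set with basic punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def retrieve(query: str) -> str:
    """Return the passage sharing the most words with the query."""
    return max(DATABASE, key=lambda p: len(tokens(query) & tokens(p)))

def augmented_prompt(query: str) -> str:
    """Prepend the retrieved passage so the model can consult it."""
    return f"Context: {retrieve(query)}\nQuestion: {query}"
```

Asked "How many parameters does GPT-3 have?", the sketch retrieves the GPT-3 passage as context instead of relying on anything baked into the model.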
Natural Language Processing has been an uphill task for machines, given our complex languages. Billions of research dollars have been poured into language models.
Last year, OpenAI’s language model GPT-3 showed that a computer could respond in complex and meaningful sentences. Language models had tended to falter beyond a narrow scope; GPT-3 demonstrated that if a language model was scaled up enough, it could be versatile. The larger the number of parameters, or internal configuration values, the higher the accuracy of the model. Other large organisations have since developed their own large models. There has been a gold rush of sorts to create bigger and better generalist language models loaded with billions of parameters.
In late 2021, NVIDIA and Microsoft developed the Megatron-Turing NLG 530B model, trained on the entire English Wikipedia, 63 million English news articles, 38GB of Reddit discussions, GitHub, books from Project Gutenberg and so forth. With a whopping 530 billion parameters, the model is fully trained and ready to perform inference. Notably, that is three times GPT-3’s 175 billion parameters, leaving other large language models far behind. Google and the Beijing Academy of Artificial Intelligence have built models exceeding a trillion parameters.
While large language models are trending, they are becoming too cumbersome to train and to serve in downstream AI applications such as digital assistants. They are also guzzlers of data, compute power and energy. Researchers are perpetually strapped for labelled training examples. Imagine trying to train such a model to identify banking fraud: it would take forever to label conversations that point to a new type of fraud.
AI research teams are trying to get around this problem. DeepMind’s RETRO is one such attempt, with a mere 7 billion parameters. Another approach is few-shot learning, as successfully demonstrated by GPT-3, which uses a very small set of labelled examples to train the model. This was transformational because GPT-3 could be trained with as few as 16 examples.
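In GPT-3’s case, few-shot learning largely means placing those labelled examples directly in the prompt and letting the model complete the pattern, with no retraining at all. A minimal sketch of that prompt construction, using an invented sentiment task and made-up example reviews:

```python
# Toy few-shot prompt builder (illustrative; the task and examples
# are invented). A handful of labelled examples are laid out in the
# prompt, followed by the new input; a large language model would
# then complete the pattern by predicting the missing label.

EXAMPLES = [
    ("I love this product", "positive"),
    ("Terrible customer service", "negative"),
    ("Works exactly as described", "positive"),
    ("Broke after one day", "negative"),
]

def few_shot_prompt(query: str) -> str:
    """Format the labelled examples plus the new input as one prompt."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in EXAMPLES]
    lines.append(f"Review: {query}\nSentiment:")  # label left blank
    return "\n\n".join(lines)
```

The prompt ends mid-pattern (`Sentiment:`), so the model’s most natural continuation is the label itself – which is why a dozen or so examples can stand in for a full training run.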
Few-shot learning is being explored by Meta (formerly Facebook) for content moderation on its social media platforms, which requires rapid policy enforcement. As harmful content keeps evolving, Meta struggles to find sufficient labelled examples. It has deployed a new AI model that is first trained on massive, freely available generic text corpora. It is then trained on previously labelled policy data. Lastly, it is trained on concise text describing a new policy.
It uses a new framework called Entailment as Few-Shot Learner. Entailment involves establishing a logical consequence between sentences. For instance, if ‘the car is speeding’, then ‘the car is on the move’ must also be true. Simply put, the AI tool can recognise a hate message because it understands the policy that the content is violating. The tool has been used to quickly detect and remove hate speech and posts casting doubt on vaccines.
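The reformulation can be sketched as follows. Each policy label is rewritten as a plain-language hypothesis, and the system asks which hypothesis the post entails. In the sketch, `nli_score` is only a word-overlap placeholder for the real pretrained entailment model Meta would use, and the labels and hypotheses are invented for illustration.

```python
# Sketch of the Entailment-as-Few-Shot-Learner reformulation
# (illustrative; label names and hypotheses are invented). Instead of
# a dedicated classifier per policy, each label becomes a hypothesis
# sentence, and an entailment scorer picks the best-supported one.

LABEL_HYPOTHESES = {
    "hate_speech": "This post attacks a person or group.",
    "vaccine_misinfo": "This post casts doubt on vaccine safety.",
    "benign": "This post is harmless.",
}

def nli_score(premise: str, hypothesis: str) -> float:
    """Placeholder entailment scorer based on word overlap; a real
    system would use a pretrained natural-language-inference model."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h)

def classify(post: str) -> str:
    """Pick the label whose hypothesis the post most strongly entails."""
    return max(LABEL_HYPOTHESES, key=lambda lbl: nli_score(post, LABEL_HYPOTHESES[lbl]))
```

The appeal of this design is that enforcing a brand-new policy only requires writing a new hypothesis sentence, not labelling thousands of fresh examples.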
2021 has been a banner year for natural language processing, as organisations compete for bigger and better models. The written word has never been more critical in making AI useful for society. Will there be a finish line in this race for natural language model supremacy? Not anytime soon. But AI will certainly become more intelligent, a better assistant, and more accessible to the average person. Each of these efforts is a stepping stone to move the needle, even if just a little. Eventually, we can expect intelligent machines so discreet that they would go unnoticed till they failed, much like electricity.
Shalini Verma is CEO of PIVOT technologies, a Dubai-based cognitive innovation company. She tweets @shaliniverma1.