Neural Network-based Language Model (NLM)
A Neural Network-based Language Model (NLM) is a language model that is a neural text-to-text sequence model.
- Context:
- It can be produced by a Neural Language Modeling System (that can solve a neural LM training task).
- It can range from (typically) being a Pretrained Neural Language Model (LM) to being an Untrained Neural Language Model (LM).
- It can range from being a Character-Level Neural Network-based LM to being a Word/Token-Level Neural Network-based LM.
- It can range from being a Forward Neural Network-based Language Model to being a Backward Neural Network-based Language Model to being a Bi-Directional Neural Network-based Language Model.
- It can range from (typically) being a Deep NNet-based LM (such as a large NLM) to being a Shallow NNet-based LM.
- It can range from being a Uni-Lingual NLM to being a Multi-Lingual NLM.
- ...
- Example(s):
- a Bigram Neural Language Model (where the previous word is used to predict the current word), as sketched in the code example after this list.
- an RNN-based Language Model, such as: ELMo.
- a Transformer-based Language Model, such as: GPT-2, a BERT-based model, and Turing-NLG.
- a Universal Language Model Fine-tuning for Text Classification (ULMFiT) model.
- a 3rd-Party NLM, such as: OpenAI NLM, Google NLM, Microsoft NLM, ...
- ...
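The following is a minimal sketch (in PyTorch; the class and parameter names are illustrative, not taken from any of the systems listed above) of a bigram neural language model: the embedding of the previous token is projected to a score for every word in the vocabulary.

```python
import torch
import torch.nn as nn

class BigramNeuralLM(nn.Module):
    """Predicts the next token from the embedding of the previous token only."""
    def __init__(self, vocab_size: int, embed_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # previous-token embedding
        self.out = nn.Linear(embed_dim, vocab_size)       # scores over the next token

    def forward(self, prev_tokens: torch.Tensor) -> torch.Tensor:
        # prev_tokens: (batch,) token ids; returns (batch, vocab_size) logits
        return self.out(self.embed(prev_tokens))

# Usage: a distribution over the next token given the previous one.
model = BigramNeuralLM(vocab_size=1000)
probs = torch.softmax(model(torch.tensor([5, 42])), dim=-1)
```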
- Counter-Example(s):
- a Text-Substring Probability Function,
- an N-Gram Language Model,
- an Exponential Language Model,
- a Cache Language Model (Jelinek et al., 1991),
- a Bag-Of-Concepts Model (Cambria & Hussain, 2012),
- a Positional Language Model (Lv & Zhai, 2009),
- a Structured Language Model (Chelba and Jelinek, 2000),
- a Random Forest Language Model (Xu, 2005),
- a Bayesian Language Model (Teh, 2006),
- a Class-based Language Model (Brown et al., 1992),
- a Maximum Likelihood-based Language Model (Goldberg, 2015),
- a Query Likelihood Model,
- a Factored Language Model.
- See: Language Modeling Task, Language Modeling System, Natural Language Representation Dataset, Language Modeling Benchmark, Artificial Neural Network, Natural Language Processing Task, Natural Language Understanding Task, Natural Language Inference Task.
References
2017
- (Daniluk et al., 2017) ⇒ Michał Daniluk, Tim Rocktäschel, Johannes Welbl, and Sebastian Riedel. (2017). "Frustratingly Short Attention Spans in Neural Language Modeling.” In: Proceedings of ICLR 2017.
- QUOTE: Neural language models predict the next token using a latent representation of the immediate token history. Recently, various methods for augmenting neural language models with an attention mechanism over a differentiable memory have been proposed. For predicting the next token, these models query information from a memory of the recent history which can facilitate learning mid- and long-range dependencies. However, conventional attention mechanisms used in memory-augmented neural language models produce a single output vector per time step.
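A minimal sketch (PyTorch; the function and tensor names are assumptions, not the authors' code) of the conventional attention mechanism the quote refers to: the current latent representation is matched against a memory of recent hidden states, and a single context vector is returned per time step.

```python
import torch

def attend_over_memory(query: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
    # query:  (batch, dim)        latent representation of the current time step
    # memory: (batch, steps, dim) hidden states of the recent token history
    scores = torch.bmm(memory, query.unsqueeze(-1)).squeeze(-1)  # (batch, steps) match scores
    weights = torch.softmax(scores, dim=-1)                      # attention weights over the history
    return torch.bmm(weights.unsqueeze(1), memory).squeeze(1)    # (batch, dim): one output vector per step
```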
2015
- (Karpathy, 2015) ⇒ Andrej Karpathy. (2015). "The Unreasonable Effectiveness of Recurrent Neural Networks.” Blog post, May 21, 2015.
- QUOTE: ... By the way, together with this post I am also releasing code on Github that allows you to train character-level language models based on multi-layer LSTMs. You give it a large chunk of text and it will learn to generate text like it one character at a time. ...
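A minimal sketch (PyTorch; not the released char-rnn code) of the kind of model the post describes: a multi-layer LSTM character-level language model, where character embeddings are fed through stacked LSTM layers and projected to logits over the next character.

```python
import torch
import torch.nn as nn

class CharLSTMLM(nn.Module):
    """Multi-layer LSTM that predicts text one character at a time."""
    def __init__(self, n_chars: int, embed_dim: int = 64, hidden_dim: int = 256, layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(n_chars, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_chars)

    def forward(self, chars: torch.Tensor, state=None):
        # chars: (batch, seq_len) character ids
        h, state = self.lstm(self.embed(chars), state)
        return self.out(h), state  # (batch, seq_len, n_chars) logits over the next character
```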
2003
- (Bengio et al., 2003a) ⇒ Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. (2003). "A Neural Probabilistic Language Model." In: Journal of Machine Learning Research, 3(6).
- QUOTE: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. ... We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that the proposed approach allows to take advantage of longer contexts.
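A minimal sketch (PyTorch; dimensions are illustrative, and the optional direct embedding-to-output connections of the original model are omitted) of the feed-forward architecture described by Bengio et al. (2003): the embeddings of the previous n-1 words are concatenated, passed through a tanh hidden layer, and mapped to a distribution over the next word.

```python
import torch
import torch.nn as nn

class FeedForwardNLM(nn.Module):
    """Feed-forward neural probabilistic language model over a fixed context window."""
    def __init__(self, vocab_size: int, context: int = 4, embed_dim: int = 60, hidden_dim: int = 100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(context * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_words: torch.Tensor) -> torch.Tensor:
        # context_words: (batch, context) ids of the previous n-1 words
        x = self.embed(context_words).flatten(1)      # concatenate the context embeddings
        return self.out(torch.tanh(self.hidden(x)))   # (batch, vocab_size) logits over the next word
```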