METALEARNING MACHINES AND
RECURSIVE SELF-IMPROVEMENT
Overview article: T. Schaul and J. Schmidhuber. Metalearning. Scholarpedia, 5(6):4650, 2010.
OOPS paper: J. Schmidhuber. Optimal Ordered Problem Solver. Machine Learning, 54:211-254, 2004.
More on metalearning, Gödel machines, and OOPS in the Scholarpedia article on Universal Search (2007).
Overview slides on four methods for recursive self-improvement since 1987 (prepared for ICML 2015).
Metalearning (or Meta-Learning) means learning the credit assignment method itself through self-modifying code. Metalearning may be the most ambitious but also the most rewarding goal of machine learning. There are few limits to what a good metalearner will learn. Where appropriate it will learn to learn by analogy, by chunking, by planning, by subgoal generation, by combinations thereof - you name it.
Schmidhuber's Gödel machine is the first fully self-referential optimal universal metalearner; it typically uses the somewhat `less universal' Optimal Ordered Problem Solver (OOPS) to find provably optimal self-improvements.
Full publication list (with additional HTML and PDF links)
Reinforcement Learning Economies
Hierarchical Learning & Subgoal Generation
Schmidhuber's CoTeSys group
14. J. Schmidhuber. A general method for incremental self-improvement and multiagent learning. In X. Yao, editor, Evolutionary Computation: Theory and Applications, chapter 3, pages 81-123. Scientific Publ. Co., Singapore, 1999 (submitted 1996).
13. J. Schmidhuber, J. Zhao, and N. Schraudolph. Reinforcement learning with self-modifying policies. In S. Thrun and L. Pratt, eds., Learning to Learn, pages 293-309. Kluwer, 1997.
12. J. Schmidhuber, J. Zhao, and M. Wiering. Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning, 28:105-130, 1997.
11. J. Zhao and J. Schmidhuber. Solving a complex prisoner's dilemma with self-modifying policies. In From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, 1998.
10. J. Schmidhuber and J. Zhao and M. Wiering. Simple principles of metalearning. Technical Report IDSIA-69-96, IDSIA, June 1996.
9. M. Wiering and J. Schmidhuber. Solving POMDPs using Levin search and EIRA. In L. Saitta, ed., Machine Learning: Proceedings of the 13th International Conference (ICML 1996), pages 534-542. Morgan Kaufmann, San Francisco, CA, 1996.
8. J. Schmidhuber. Environment-independent reinforcement acceleration (invited talk at Hong Kong University of Science and Technology). Technical Note IDSIA-59-95, IDSIA, June 1995.
7. J. Schmidhuber. Beyond "Genetic Programming": Incremental Self-Improvement. In J. Rosca, ed., Proc. Workshop on Genetic Programming at ML95, pages 42-49. National Resource Lab for the Study of Brain and Behavior, 1995.
6. J. Schmidhuber. On learning how to learn learning strategies. Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München, November 1994.
5. J. Schmidhuber. A neural network that embeds its own meta-levels. In Proc. of the International Conference on Neural Networks '93, San Francisco. IEEE, 1993.
4. J. Schmidhuber. An introspective network that can learn to run its own weight change algorithm. In Proc. of the Intl. Conf. on Artificial Neural Networks, Brighton, pages 191-195. IEE, 1993.
3. J. Schmidhuber. A self-referential weight matrix. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 446-451. Springer, 1993.
2. J. Schmidhuber. Steps towards `self-referential' learning. Technical Report CU-CS-627-92, Dept. of Comp. Sci., University of Colorado at Boulder, November 1992.
1. J. Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Diploma thesis, Institut für Informatik, Technische Universität München, 1987. Here Genetic Programming (GP) is applied to itself, to recursively evolve better GP methods.