METALEARNING MACHINES AND
RECURSIVE SELF-IMPROVEMENT
Overview article: T. Schaul and J. Schmidhuber. Metalearning. Scholarpedia, 5(6):4650, 2010.
OOPS paper: J. Schmidhuber. Optimal Ordered Problem Solver. Machine Learning, 54:211-254, 2004.
More on metalearning, Gödel machines, and OOPS in the Scholarpedia article on Universal Search (2007).
Overview slides on four methods for recursive self-improvement since 1987 (prepared for ICML 2015).
Metalearning (or Meta-Learning) means learning the credit assignment method itself through self-modifying code. Metalearning may be the most ambitious but also the most rewarding goal of machine learning. There are few limits to what a good metalearner will learn. Where appropriate it will learn to learn by analogy, by chunking, by planning, by subgoal generation, by combinations thereof - you name it.
Schmidhuber's Gödel machine is the first fully self-referential optimal universal metalearner; it typically uses the somewhat `less universal' Optimal Ordered Problem Solver (OOPS) to find provably optimal self-improvements.
Full publication list (with additional HTML and PDF links)
Reinforcement Learning Economies
Hierarchical Learning & Subgoal Generation
Schmidhuber's CoTeSys group
14. J. Schmidhuber. A general method for incremental self-improvement and multiagent learning. In X. Yao, editor, Evolutionary Computation: Theory and Applications, chapter 3, pages 81-123. Scientific Publ. Co., Singapore, 1999 (submitted 1996).
13. J. Schmidhuber, J. Zhao, and N. Schraudolph. Reinforcement learning with self-modifying policies. In S. Thrun and L. Pratt, eds., Learning to Learn, pages 293-309. Kluwer, 1997.
12. J. Schmidhuber, J. Zhao, and M. Wiering. Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning, 28:105-130, 1997.
11. J. Zhao and J. Schmidhuber. Solving a complex prisoner's dilemma with self-modifying policies. In From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, 1998.
10. J. Schmidhuber and J. Zhao and M. Wiering. Simple principles of metalearning. Technical Report IDSIA-69-96, IDSIA, June 1996.
9. M. Wiering and J. Schmidhuber. Solving POMDPs using Levin search and EIRA. In L. Saitta, ed., Machine Learning: Proceedings of the 13th International Conference (ICML 1996), pages 534-542. Morgan Kaufmann, San Francisco, CA, 1996.
8. J. Schmidhuber. Environment-independent reinforcement acceleration (invited talk at Hong Kong University of Science and Technology). Technical Note IDSIA-59-95, IDSIA, June 1995.
7. J. Schmidhuber. Beyond "Genetic Programming": Incremental Self-Improvement. In J. Rosca, ed., Proc. Workshop on Genetic Programming at ML95, pages 42-49. National Resource Lab for the Study of Brain and Behavior, 1995.
6. J. Schmidhuber. On learning how to learn learning strategies. Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München, November 1994.
5. J. Schmidhuber. A neural network that embeds its own meta-levels. In Proc. of the International Conference on Neural Networks '93, San Francisco. IEEE, 1993.
4. J. Schmidhuber. An introspective network that can learn to run its own weight change algorithm. In Proc. of the Intl. Conf. on Artificial Neural Networks, Brighton, pages 191-195. IEE, 1993.
3. J. Schmidhuber. A self-referential weight matrix. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 446-451. Springer, 1993.
2. J. Schmidhuber. Steps towards `self-referential' learning. Technical Report CU-CS-627-92, Dept. of Comp. Sci., University of Colorado at Boulder, November 1992.
1. J. Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Diploma thesis, Institut für Informatik, Technische Universität München, 1987. Here Genetic Programming (GP) is applied to itself, to recursively evolve better GP methods.