Laurent Orseau
See the README for introductory examples for the LTS+CM algorithm on several domains such as Rubik’s cube and Sokooban.
In Levin Tree Search with Context Models (LTS+CM), context models are crucial for guiding the search. Contexts are pieces of information extracted from the environment or state, which are then used to predict which actions to take (the policy). Contexts are grouped into mutually exclusive sets called mutex sets. The number of mutex sets is assumed fixed during the search, but the number of contexts per mutex sets does not need to be known.
For efficiency though, each context is encoded into a Racket fixnum — let’s call this the context’s fixnum. The learnable parameters β of the context models are stored in a Context DataBase (CDB), as a matrix where each row is a context fixnum and each column is an action. Each mutex set is associated with a hash table where the key is the context fixnum, and the value is the row in the βmatrix.
Isn’t encoding into a fixnum restrictive? Not really. Having too many contexts per mutex sets can be detrimental to learning. For example, suppose that we have N datapoints and that each mutex sets has C contexts. Due to the mutual exclusion nature of mutex sets, each datapoint is associated with exactly one context per mutex set. Hence, each context may receive about N/C datapoints. As a crude rule of thumb, each context may need a few dozens of datapoints for their associated β parameters to learn good values. In practice, context occurrences within a mutex set are more likely to follow a Zipf distribution, which changes the argument a little, but the main idea still stands. Thus, if C is large, N must be large too.
Example: Relative Tiling with board-relative-tiling/collect
Tiling is a common strategy for generating contexts, especially in grid-based environments. The lts-cm/byte-board module provides dedicated functions for this. The following example demonstrates how to use board-relative-tiling/collect to extract contexts from a small board.
;Let's convert the string into a board of integers:[(#\X)0][(#\space)1][(#\P)2][(#\G)3]
;Let's find the player P=2:#:collect!displayln#:rowp-row#:colp-col#:max-valuemax-val#:pad-valuepad-val#:row-dist1#:col-dist1#:row-span2#:col-span2)6
9
101
151
Tile 1Tile 2Tile 3Tile 4
┌──┐┌──┐┌──┐┌──┐
│XX││XX││ P││P │
│ P││P ││││ G│
└──┘└──┘└──┘└──┘
fixnum of tile 1 = (((wall × size) + wall) × size + empty) × size + player
= (((0 × 4) + 0) × 4 + 1) × 4 + 2
= 6
;Tile 1:6
;Tile 2:9
Note that encodings are local to each mutex set, that is, even if two active contexts of different tiles have the same fixnums, they will map to different rows in the βmatrix.
#:collect!collect!#:rowp-row#:colp-col#:max-valuemax-val#:pad-valuepad-val#:row-dist1#:col-dist1#:row-span2#:col-span2);Obtain the collected fixnums:> (collect!)'(6 9 101 151)
procedure
( make-list-collector )→(->procedure? )
> (collect!'a)> (collect!1)> (collect!'(xyz));Obtain the collected elements:> (collect!)'(a 1 (x y z))
procedure
vec:fxvector?
> (collect!1)> (collect!2)> (collect!3);Obtain the collected elements:> vec(fxvector 1 2 3)
procedure
> (collect!1)> (collect!2)> (collect!3);Obtain the collected elements:> (collect!)(fxvector 1 2 3)
This module provides utilities for encoding lists of natural numbers into a single fixnum, and vice-versa. This is essential for creating compact representations that can be used as keys in hash tables or for other efficient processing. Technically, the encoding scheme is akin to representing a number in a mixed radix system, where each position can have a different base (size).
procedure
( naturals->fixnum intssizes[n])→fixnum?
ints:(listofnatural?)
Each integer i from ints must be less than its corresponding s in sizes (i.e., 0<= i< s). The function folds from left to right, effectively computing (((n * size_0 + int_0) * size_1 + int_1) * ...).
143145
143145
procedure
sizes[ #:remainderremainder])n-orig:fixnum?remainder:(or/c#t#f'check-0'cons)='check-0
'check-0 (default): Raises an error if the remainder is not zero, ensuring the fixnum is fully decoded by the given sizes.
#f: The remainder is discarded.
#t: Returns two values: the list of decoded naturals and the remainder.
'cons: The remainder is cons ’ed onto the beginning of the resulting list of naturals.
'(0 2 1 2 2 0 1 0)
'(0 2 1)
5589
'(3)
'(5 3)
'(1 0 1 1 1)
syntax
( naturals->fixnum* [nfixnum?0][[valnatint?][sizeposint?]]...+)
316
;Starting with a base value:316
This module provides utilities for working with 2D boards represented by byte strings A board is a structure holding a flat byte string along with its dimensions.
procedure
( list->board lstn-cols)→board?
n-cols:exact-positive-integer?
procedure
( board->string aboard)→string?
aboard:board?
┌─┬─┬─┐
│0│1│2│
├─┼─┼─┤
│3│4│5│
├─┼─┼─┤
│6│7│8│
└─┴─┴─┘
procedure
( board->list aboard)→(listofbyte? )
aboard:board?
procedure
( board-in-bounds? brdrowcol)→boolean?
brd:board?row:exact-integer?col:exact-integer?
procedure
( board-set! aboardrowcolval)→void?
aboard:board?row:exact-integer?col:exact-integer?val:byte?
syntax
( board-index aboardrowcol)
procedure
( board-copy brd)→board?
brd:board?
procedure
( board->bytes aboard)→bytes?
aboard:board?
procedure
#:collect!collect!#:rowrow0#:colcol0[ #:max-valuemax-value#:pad-valuepad-value#:row-distrow-dist#:col-distcol-dist#:row-spanrow-spanbrd:board?row0:exact-integer?col0:exact-integer?
The number of tiles (mutex sets) generated by such a tiling is (row-dist × 2 + 1 - row-span) × (col-dist × 2 + 1 - col-span).
For each tile, the cells of the tile are encoded into a single fixnum using naturals->fixnum with size = (+ max-value1). If a cell of a tile is outside the boundaries of the board, the pad-value is used in place of the cell’s value. The resulting code is passed to collect!.
This module implements the Δ-Secant line search algorithm for the paper “Line Search for Convex Minimization”.
The function convex-line-search returns the lowest point found of a given convex function between two initial points when a stopping criterion is satisfied.
The function quasi-exact-line-search build upon convex-line-search to ensure sufficient progress is made, and is intended to be used within an optimization algorithm such as gradient descent or Frank-Wolfe.
procedure
xleftxright[ #:yleftyleft#:xqxq#:yqyq#:y-tolerancereal?#:stop-whenstop-when#:callbackcallback]) → dict?f:(->real?real?)xleft:real?xright:real?yleft:real?=(fxleft)yq:real?=(fxq)real?:y-tolerance=1e-10
'iter: Number of iterations performed.
'xlow and 'ylow: lowest point found — usually these are the quantities of interest.
'xgap and 'ygap: upper bounds on |xlow - x*| and |ylow - x*|.
'x- and 'x+: x-interval containing x*.
'ya and 'yb: The minimum of these two values is a lower bound on y*.
'pts: The 5 points around x*. See paper.
The arguments yleft and yq MUST be equal to (fxleft) and (fxq).
The argument xq is the first point within [xleft, xright] to be queried.
The argument stop-when controls when the algorithm should terminate. By default, it terminates when the y-distance to the minimum ('ygap) is provably less than y-tolerance.
The argument callback can be used to monitor the progress of the line search.
(list
'(iter . 23)
(list
'pts
(pt 0.9999892887337545 1.1473122458246348e-10)
(pt 0.9999966814872121 1.1012527123440112e-11)
(pt 1.0000016825306512 2.8309093923890705e-12)
(pt 1.0000135688568499 1.8411387621275733e-10)
(pt 1.000055351641258 3.063804189957421e-9))
'(x- . 0.9999972646480525)
'(x+ . 1.0000109385368174)
'(xgap . 1.3673888764942355e-5)
'(ya . -2.9452989812498634e-11)
'(yb . -1.1960640832144743e-11)
'(ygap . 3.2283899204887706e-11)
'(xlow . 1.0000016825306512)
'(ylow . 2.8309093923890705e-12))
'((iter . 7) (xlow . 1.0095288532116586) (ylow . 9.079904352933715e-5))
'((iter . 15)
(xgap . 1.5371377058688084e-11)
(ygap . 9.722889160457271e-12)
(xlow . -2.796517824676535e-12)
(ylow . 1.0000000000055929))
procedure
[ xleftxright#:yleftyleft#:xqxq#:yqyq#:jac^2jac^2#:cc#:callbackcallback]) → dict?jac^2:(or/c#fpositive-real?)=#fc:positive-real?=1.0
Moreover, by contrast to convex-line-search, if the minimum is found to be at xright, the range [xleft, xright] is quadrupled to the right and the line search continues, and so on. This means that for example the call (quasi-exact-line-search/ 12) loops forever. To prevent this quadrupling behaviour, one can force the function f to be increasing at xright, for eaxmple with (λ (x)(if (< x2)(/ x)+inf.0)) instead of / .
The argument jac^2, if provided, should be the squared 2-norm of the jacobian (aka the gradient or derivative) at xleft. This information may be used to speed up the search.
See convex-line-search for the description of the returned dictionary, and of the other arguments.
(keep-keys'(iterxlowylow)))'(((iter . 2) (xlow . 1.47265625) (ylow . 0.2234039306640625))
((iter . 4) (xlow . 0.7788681457319648) (ylow . 0.04889929697201954))
((iter . 6) (xlow . 1.0095288532116586) (ylow . 9.079904352933715e-5)))