Context Navigation

← Previous Changeset
Next Changeset →

Changeset 174

Timestamp:

Jan 23, 2008, 12:50:49 AM (18 years ago)

Author:

neil.c.c.brown

Message:

Added sections on code organisation and monads

File:

1 edited

docs/trunk/hacking-guide/tock-intro.tex (modified) (1 diff)

Legend:

Unmodified

Added

Removed

docs/trunk/hacking-guide/tock-intro.tex

r173	r174
29	29	\section{Finding the Right Place}
30	30
31	%TODO explain code organisation
31	Tock's modules are currently arranged into four directories. They are:
32
33	\begin{enumerate}
34	\item ``common'' -- All modules that are used by many/most parts of the program.
35	\item ``frontends'' -- All modules relating to lexing, parsing, preprocessing occam and Rain, as well
36	as early steps in compilation like resolving names, checking types, etc.
37	\item ``transformations'' -- All modules relating to transforming the tree either for simplicity,
38	efficiency or simply to remove elements (e.g. parallel assignment) not supported by the backends.
39	\item ``backends'' -- All modules related to the final step of turning AST into actual code.
40	\end{enumerate}
41
42	Tests are in the same directory as the thing they test. The separation is by no means hard-and-fast,
43	or perfect, but it's better than nothing.
44
45	The directories should provide a quick idea of where to find what you are interested in. Data types
46	and functions common to the whole compiler, such as the AST definition and type helper functions
47	are in ``common''. The other parts of the compiler are in the obvious order (frontends, transformations,
48	backends).
49
50	The \lstinline\|Main\| module in the main tock directory is the actual module for the tock executable.
51	It merely deals with the command-line options and strings together the various passes according to
52	the options given.
53
54	If you want to add a new frontend or backend, then add a new command-line option (look in the modules
55	\lstinline\|Main\| and \lstinline\|CompState\|) for it and handle the option accordingly in the \lstinline\|Main\|
56	module. To add a new pass, add it to the appropriate place in the list in the \lstinline\|PassList\|
57	module, or add it to the appropriate pass-item already listed there (e.g. \lstinline\|simplifyTypes\|).
32	58
33	59	\section{Understand the Existing Code}
34	60
35	%TODO give a brief explanation of the monadic code
61	There are (unfortunately, but realistically) several barriers to understanding the existing Tock code:
62
63	\begin{enumerate}
64	\item It's written in Haskell.
65	\item It makes heavy use of monads.
66	\item It uses generics.
67	\item You need to understand the AST well.
68	\item The C and C++ backends are quite dense and tricky.
69	\end{enumerate}
70
71	The last point is somewhat unavoidable, without an inspired re-write. Knowledge of occam will help
72	a lot with understanding the AST, except perhaps for the \lstinline\|Structured\| item (see below).
73	Haskell knowledge can be solved with a book or two (or other web resources); monads are also covered
74	in a section below.
75
76	\subsection{A.Structured}
77
78	%TODO
79
80	\subsection{Monadic Code}
81
82	A monad is a particular sort of type in Haskell. The most common (and in a way, the most complex) Haskell
83	monad is \lstinline\|IO\|. As an example, here is the type of the \lstinline\|readFile\| function:
84
85	\begin{lstlisting}
86	readFile :: FilePath -> IO String
87	\end{lstlisting}
88
89	The return value here is \lstinline\|IO String\|. You should read this as ``an IO action that yields a value
90	of type String''. One mistake that newcomers to monads often make is to immediately try to work out
91	how to get the String value out of the IO monad. This is actually the wrong way of looking at it; you
92	should instead look to lift all your other code into the IO monad.
93
94	Note also that the function is \textit{not} of type: \lstinline\|IO FilePath -> IO String\|. This would be
95	read as taking an IO action that yields a FilePath, and giving back an IO action that yields a String.
96	This type of function is rarely seen; usually all the arguments will be non-monadic, and only the final
97	(return) value will be monadic.
98
99	Unlike typical functional programming, monadic actions must be explicitly ordered. The easiest way to do
100	this is to use a \lstinline\|do\| block.
101
102	We will now look at a real example from the current Tock codebase. This is the \lstinline\|doProcess\| function
103	inside the \lstinline\|removeParAssign\| function in the \lstinline\|SimplifyProcs\| module. Its purpose
104	is to turn parallel assignments into multiple (sequential) single assignments.
105
106	\begin{lstlisting}
107	doProcess :: A.Process -> PassM A.Process
108	doProcess (A.Assign m vs@(_:_:_) (A.ExpressionList _ es))
109	= do ts <- mapM typeOfVariable vs
110	specs <- sequence [makeNonceVariable "assign_temp" m t A.VariableName A.Original \| t <- ts]
111	let temps = [A.Variable m n \| A.Specification _ n _ <- specs]
112	let first = [A.Assign m [v] (A.ExpressionList m [e]) \| (v, e) <- zip temps es]
113	let second = [A.Assign m [v] (A.ExpressionList m [A.ExprVariable m v']) \| (v, v') <- zip vs temps]
114	return $ A.Seq m $ foldl (\s spec -> A.Spec m spec s) (A.Several m (map (A.OnlyP m) (first ++ second))) specs
115	doProcess p = doGeneric p
116	\end{lstlisting}
117
118	The type of this function is to take a \lstinline\|Process\| (the \lstinline \|A.\| is simply because it's an AST
119	fragment; the AST module is always imported as A) and returns a monadic action in the PassM monad that will yield
120	a Process.
121
122	The function has a standard (i.e. unrelated to monads) Haskell pattern-match as its header. The direct value of
123	the function is a \lstinline\|do\| block; a \lstinline\|do\| block is technically of type `monadic action'.
124
125	The indentation rule of the do block is fairly simple; each item in the do block should have the same level as
126	indentation, and finishes when something is found with less indentation (fairly standard Haskell rules).
127
128	The first line (\lstinline\|ts <- mapM typeOfVariable vs\|) is already somewhat complex. Firstly, the type of
129	\lstinline\|typeOfVariable\| is:
130
131	\begin{lstlisting}
132	typeOfVariable :: (CSM m, Die m) => A.Variable -> m A.Type
133	\end{lstlisting}
134
135	CSM and Die are two typeclasses to which PassM belongs. So in our case, \lstinline\|m\| can be \lstinline\|PassM\|.
136	Therefore the effective type for us is \lstinline\|A.Variable -> PassM A.Type\|. \lstinline\|mapM\| is a monadic
137	version of \lstinline\|map\|:
138
139	\begin{lstlisting}
140	mapM :: Monad m => (a -> m b) -> [a] -> m [b]
141	\end{lstlisting}
142
143	It basically takes a monadic function, and applies it to each element of the given list, returning a monadic
144	action that will yield the mapped elements.
145
146	So in our case, \lstinline\|mapM typeOfVariable\| will have type \lstinline\|[A.Variable] -> PassM [A.Type].
147	The argument is then \lstinline\|vs\|. The notation \lstinline\|ts <-\| means that the value yielded
148	by the monadic action is labelled as \lstinline\|ts\|. You may think of it as the monadic version of the
149	\lstinline\|let\| notation in normal Haskell. Note that \lstinline\|ts\| is of type \lstinline\|[A.Type]\|; it
150	is not monadic. The action is actually performed by this statement, and the result is put into \lstinline\|ts\|.
151
152	The second line is again interesting. The list is a standard Haskell list comprehension. The type of
153	\lstinline\|makeNonceVariable\| is:
154
155	\begin{lstlisting}
156	makeNonceVariable :: CSM m => String -> Meta -> A.Type -> A.NameType -> A.AbbrevMode -> m A.Specification
157	\end{lstlisting}
158
159	The list comprehension is therefore of type \lstinline\|[PassM A.Specification]\|; that is, it is a list
160	of monadic actions, each of which will yield a Specification. This is then given to the sequence function
161	which has this type:
162
163	\begin{lstlisting}
164	sequence :: Monad m => [m a] -> m [a]
165	\end{lstlisting}
166
167	In other words, sequence takes a list of monadic actions, performs each of them (in sequence!) and returns
168	the resulting list of elements (inside the monad, of course). So sequence performs all our actions and gives
169	us back a list of Specifications. This list is then labelled as \lstinline\|specs\|.
170
171	The next three lines in our code fragment begin with \lstinline\|let\|. These are all non-monadic lines, and
172	are just plain Haskell. Note that there is a slight difference between the \lstinline\|let\| notation inside
173	a \lstinline\|do\| block to the normal \lstinline\|let\|..\lstinline\|in\| notation; there is no \lstinline\|in\|
174	keyword in the version in the \lstinline\|do\| block. This is a technicality, but one that can trip you up;
175	if you add the \lstinline\|in\| keyword you'll get a not-very-helpful parser error.
176
177	Finally, the last line of our function features \lstinline\|return\|. \lstinline\|return\| is a standard monadic
178	function that is used very frequently. Its type is:
179
180	\begin{lstlisting}
181	return :: Monad m => a -> m a
182	\end{lstlisting}
183
184	All it does is turn a plain (non-monadic) value into a simple monadic action that yields the value. More
185	simply, it lifts the value into the monad. In our function we have a plain value, but we need to lift it
186	inside the monad to satisfy the types. This is very complex in relation to the types, so is perhaps best
187	viewed as an analogue to C or Java's return statement.
188
189	Note that you there is nor really an opposite of the return function (something of type \lstinline\|m a -> a\|).
190	Values can never really be `freed' from the monad; everything around them just gets lifted inside.
191
192	\subsection{The PassM monad}
193
194	It has been mentioned above that the PassM monad is the most common monad. Here is it's actual type:
195
196	\begin{lstlisting}
197	type PassM = ErrorT ErrorReport (StateT CompState IO)
198	\end{lstlisting}
199
200	The PassM monad is a mixture of three monads; an error monad, a state monad, and the IO monad. The error
201	monad allows for exception-like mechanisms. In Tock, we throw an error whenever we can proceed no further
202	with the compilation. Examples include parser errors, type errors and parallel safety problems. The
203	state in question is the CompState type (see the module of the same name), which holds things like the
204	name-type dictionary (aka symbol table). The IO monad is included for various small reasons, such as
205	being able to read in files.
206
207	I have mentioned numerous times that you should put things inside the monad, not look to sneak them out.
208	The way it really works is that you combine all your actions inside the PassM monad into one giant
209	PassM monadic action. Then this action is run. For a simple example we will look at the testPass
210	function:
211
212	\begin{lstlisting}
213	runPass ::
214	PassM b -- ^ The actual pass.
215	-> CompState -- ^ The state to use to run the pass.
216	-> IO (CompState, Either ErrorReport b) -- ^ The resultant state, and either an error or the successful outcome of the pass.
217	runPass actualPass startState = (liftM (\(x,y) -> (y,x))) (runStateT (runErrorT actualPass) startState)
218	\end{lstlisting}
219
220	The first part of the function is just to reverse the order of the pair (this is real code). Perhaps
221	the most instructive part is the type. Given a PassM action and a starting state, the function will give
222	you back (still inside the IO monad, admittedly) the resulting state, and either an error (if there
223	was an error during the action) or the final result of the monadic action.
224
225	This is where the monad is unwrapped; the monadic action finally yields a plain value. But in Tock,
226	the plain value is only used in the \lstinline\|Main\| module. The whole of the rest of the program,
227	obviously or not, runs inside the PassM monad.
228
229	\subsection{Generics}
230
231	%TODO
36	232
37	233	\section{Add Your Own Code}

Note: See TracChangeset for help on using the changeset viewer.

Context Navigation

Changeset 174

Legend:

docs/trunk/hacking-guide/tock-intro.tex

Download in other formats: