- Timestamp:
- Jan 23, 2008, 12:50:49 AM (18 years ago)
- Author:
- neil.c.c.brown
- Message:
-
Added sections on code organisation and monads
- File:
-
- 1 edited
- docs/trunk/hacking-guide/tock-intro.tex (modified) (1 diff)
Legend:
- Unmodified
- Added
- Removed
-
docs/trunk/hacking-guide/tock-intro.tex
r173 r174 29 29 \section{Finding the Right Place} 30 30 31 %TODO explain code organisation 31 Tock's modules are currently arranged into four directories. They are: 32 33 \begin{enumerate} 34 \item ``common'' -- All modules that are used by many/most parts of the program. 35 \item ``frontends'' -- All modules relating to lexing, parsing, preprocessing occam and Rain, as well 36 as early steps in compilation like resolving names, checking types, etc. 37 \item ``transformations'' -- All modules relating to transforming the tree either for simplicity, 38 efficiency or simply to remove elements (e.g. parallel assignment) not supported by the backends. 39 \item ``backends'' -- All modules related to the final step of turning AST into actual code. 40 \end{enumerate} 41 42 Tests are in the same directory as the thing they test. The separation is by no means hard-and-fast, 43 or perfect, but it's better than nothing. 44 45 The directories should provide a quick idea of where to find what you are interested in. Data types 46 and functions common to the whole compiler, such as the AST definition and type helper functions 47 are in ``common''. The other parts of the compiler are in the obvious order (frontends, transformations, 48 backends). 49 50 The \lstinline|Main| module in the main tock directory is the actual module for the tock executable. 51 It merely deals with the command-line options and strings together the various passes according to 52 the options given. 53 54 If you want to add a new frontend or backend, then add a new command-line option (look in the modules 55 \lstinline|Main| and \lstinline|CompState|) for it and handle the option accordingly in the \lstinline|Main| 56 module. To add a new pass, add it to the appropriate place in the list in the \lstinline|PassList| 57 module, or add it to the appropriate pass-item already listed there (e.g. \lstinline|simplifyTypes|). 32 58 33 59 \section{Understand the Existing Code} 34 60 35 %TODO give a brief explanation of the monadic code 61 There are (unfortunately, but realistically) several barriers to understanding the existing Tock code: 62 63 \begin{enumerate} 64 \item It's written in Haskell. 65 \item It makes heavy use of monads. 66 \item It uses generics. 67 \item You need to understand the AST well. 68 \item The C and C++ backends are quite dense and tricky. 69 \end{enumerate} 70 71 The last point is somewhat unavoidable, without an inspired re-write. Knowledge of occam will help 72 a lot with understanding the AST, except perhaps for the \lstinline|Structured| item (see below). 73 Haskell knowledge can be solved with a book or two (or other web resources); monads are also covered 74 in a section below. 75 76 \subsection{A.Structured} 77 78 %TODO 79 80 \subsection{Monadic Code} 81 82 A monad is a particular sort of type in Haskell. The most common (and in a way, the most complex) Haskell 83 monad is \lstinline|IO|. As an example, here is the type of the \lstinline|readFile| function: 84 85 \begin{lstlisting} 86 readFile :: FilePath -> IO String 87 \end{lstlisting} 88 89 The return value here is \lstinline|IO String|. You should read this as ``an IO action that yields a value 90 of type String''. One mistake that newcomers to monads often make is to immediately try to work out 91 how to get the String value out of the IO monad. This is actually the wrong way of looking at it; you 92 should instead look to lift all your other code into the IO monad. 93 94 Note also that the function is \textit{not} of type: \lstinline|IO FilePath -> IO String|. This would be 95 read as taking an IO action that yields a FilePath, and giving back an IO action that yields a String. 96 This type of function is rarely seen; usually all the arguments will be non-monadic, and only the final 97 (return) value will be monadic. 98 99 Unlike typical functional programming, monadic actions must be explicitly ordered. The easiest way to do 100 this is to use a \lstinline|do| block. 101 102 We will now look at a real example from the current Tock codebase. This is the \lstinline|doProcess| function 103 inside the \lstinline|removeParAssign| function in the \lstinline|SimplifyProcs| module. Its purpose 104 is to turn parallel assignments into multiple (sequential) single assignments. 105 106 \begin{lstlisting} 107 doProcess :: A.Process -> PassM A.Process 108 doProcess (A.Assign m vs@(_:_:_) (A.ExpressionList _ es)) 109 = do ts <- mapM typeOfVariable vs 110 specs <- sequence [makeNonceVariable "assign_temp" m t A.VariableName A.Original | t <- ts] 111 let temps = [A.Variable m n | A.Specification _ n _ <- specs] 112 let first = [A.Assign m [v] (A.ExpressionList m [e]) | (v, e) <- zip temps es] 113 let second = [A.Assign m [v] (A.ExpressionList m [A.ExprVariable m v']) | (v, v') <- zip vs temps] 114 return $ A.Seq m $ foldl (\s spec -> A.Spec m spec s) (A.Several m (map (A.OnlyP m) (first ++ second))) specs 115 doProcess p = doGeneric p 116 \end{lstlisting} 117 118 The type of this function is to take a \lstinline|Process| (the \lstinline |A.| is simply because it's an AST 119 fragment; the AST module is always imported as A) and returns a monadic action in the PassM monad that will yield 120 a Process. 121 122 The function has a standard (i.e. unrelated to monads) Haskell pattern-match as its header. The direct value of 123 the function is a \lstinline|do| block; a \lstinline|do| block is technically of type `monadic action'. 124 125 The indentation rule of the do block is fairly simple; each item in the do block should have the same level as 126 indentation, and finishes when something is found with less indentation (fairly standard Haskell rules). 127 128 The first line (\lstinline|ts <- mapM typeOfVariable vs|) is already somewhat complex. Firstly, the type of 129 \lstinline|typeOfVariable| is: 130 131 \begin{lstlisting} 132 typeOfVariable :: (CSM m, Die m) => A.Variable -> m A.Type 133 \end{lstlisting} 134 135 CSM and Die are two typeclasses to which PassM belongs. So in our case, \lstinline|m| can be \lstinline|PassM|. 136 Therefore the effective type for us is \lstinline|A.Variable -> PassM A.Type|. \lstinline|mapM| is a monadic 137 version of \lstinline|map|: 138 139 \begin{lstlisting} 140 mapM :: Monad m => (a -> m b) -> [a] -> m [b] 141 \end{lstlisting} 142 143 It basically takes a monadic function, and applies it to each element of the given list, returning a monadic 144 action that will yield the mapped elements. 145 146 So in our case, \lstinline|mapM typeOfVariable| will have type \lstinline|[A.Variable] -> PassM [A.Type]. 147 The argument is then \lstinline|vs|. The notation \lstinline|ts <-| means that the value yielded 148 by the monadic action is labelled as \lstinline|ts|. You may think of it as the monadic version of the 149 \lstinline|let| notation in normal Haskell. Note that \lstinline|ts| is of type \lstinline|[A.Type]|; it 150 is not monadic. The action is actually performed by this statement, and the result is put into \lstinline|ts|. 151 152 The second line is again interesting. The list is a standard Haskell list comprehension. The type of 153 \lstinline|makeNonceVariable| is: 154 155 \begin{lstlisting} 156 makeNonceVariable :: CSM m => String -> Meta -> A.Type -> A.NameType -> A.AbbrevMode -> m A.Specification 157 \end{lstlisting} 158 159 The list comprehension is therefore of type \lstinline|[PassM A.Specification]|; that is, it is a list 160 of monadic actions, each of which will yield a Specification. This is then given to the sequence function 161 which has this type: 162 163 \begin{lstlisting} 164 sequence :: Monad m => [m a] -> m [a] 165 \end{lstlisting} 166 167 In other words, sequence takes a list of monadic actions, performs each of them (in sequence!) and returns 168 the resulting list of elements (inside the monad, of course). So sequence performs all our actions and gives 169 us back a list of Specifications. This list is then labelled as \lstinline|specs|. 170 171 The next three lines in our code fragment begin with \lstinline|let|. These are all non-monadic lines, and 172 are just plain Haskell. Note that there is a slight difference between the \lstinline|let| notation inside 173 a \lstinline|do| block to the normal \lstinline|let|..\lstinline|in| notation; there is no \lstinline|in| 174 keyword in the version in the \lstinline|do| block. This is a technicality, but one that can trip you up; 175 if you add the \lstinline|in| keyword you'll get a not-very-helpful parser error. 176 177 Finally, the last line of our function features \lstinline|return|. \lstinline|return| is a standard monadic 178 function that is used very frequently. Its type is: 179 180 \begin{lstlisting} 181 return :: Monad m => a -> m a 182 \end{lstlisting} 183 184 All it does is turn a plain (non-monadic) value into a simple monadic action that yields the value. More 185 simply, it lifts the value into the monad. In our function we have a plain value, but we need to lift it 186 inside the monad to satisfy the types. This is very complex in relation to the types, so is perhaps best 187 viewed as an analogue to C or Java's return statement. 188 189 Note that you there is nor really an opposite of the return function (something of type \lstinline|m a -> a|). 190 Values can never really be `freed' from the monad; everything around them just gets lifted inside. 191 192 \subsection{The PassM monad} 193 194 It has been mentioned above that the PassM monad is the most common monad. Here is it's actual type: 195 196 \begin{lstlisting} 197 type PassM = ErrorT ErrorReport (StateT CompState IO) 198 \end{lstlisting} 199 200 The PassM monad is a mixture of three monads; an error monad, a state monad, and the IO monad. The error 201 monad allows for exception-like mechanisms. In Tock, we throw an error whenever we can proceed no further 202 with the compilation. Examples include parser errors, type errors and parallel safety problems. The 203 state in question is the CompState type (see the module of the same name), which holds things like the 204 name-type dictionary (aka symbol table). The IO monad is included for various small reasons, such as 205 being able to read in files. 206 207 I have mentioned numerous times that you should put things inside the monad, not look to sneak them out. 208 The way it really works is that you combine all your actions inside the PassM monad into one giant 209 PassM monadic action. Then this action is run. For a simple example we will look at the testPass 210 function: 211 212 \begin{lstlisting} 213 runPass :: 214 PassM b -- ^ The actual pass. 215 -> CompState -- ^ The state to use to run the pass. 216 -> IO (CompState, Either ErrorReport b) -- ^ The resultant state, and either an error or the successful outcome of the pass. 217 runPass actualPass startState = (liftM (\(x,y) -> (y,x))) (runStateT (runErrorT actualPass) startState) 218 \end{lstlisting} 219 220 The first part of the function is just to reverse the order of the pair (this is real code). Perhaps 221 the most instructive part is the type. Given a PassM action and a starting state, the function will give 222 you back (still inside the IO monad, admittedly) the resulting state, and either an error (if there 223 was an error during the action) or the final result of the monadic action. 224 225 This is where the monad is unwrapped; the monadic action finally yields a plain value. But in Tock, 226 the plain value is only used in the \lstinline|Main| module. The whole of the rest of the program, 227 obviously or not, runs inside the PassM monad. 228 229 \subsection{Generics} 230 231 %TODO 36 232 37 233 \section{Add Your Own Code}
Note:
See TracChangeset
for help on using the changeset viewer.