Monday, November 1, 2010
Type conversion in ParaSail
Allowing for user-defined type conversion is complex because it is an NxN problem: you want to allow any of N different types to be converted to any other, and each pair might involve a different set of operations. ParaSail addresses this problem in the following ways:
- Allows use of the [[ ... ]] operation, which converts the operand to a universal type.
- The user defines which universal type(s) a given type converts to by defining the "to_univ" operator(s). Once the value has been converted to a universal type, it will be implicitly converted to any other type which defines a "from_univ" operator from that same universal type.
- Using a prefix can disambiguate if necessary: T2::[[T1_Obj]] will convert T1_Obj first to a universal type using T1's "to_univ" operator, and then to T2 using T2's "from_univ" operator.
- Allows use of the target type's name as a function name.
- This will convert between two types that are structurally equivalent (that is, the same module parameters were used in their defining instantiation), but which were distinguished by one or both being defined as new. For example:
- type T1 is new Vector<Integer<>> and type T2 is new Vector<Integer<>> define two distinct types because of the use of new, but T2(T1_Obj) and T1(T2_Obj) will convert T1_Obj to type T2, and T2_Obj to type T1, respectively.
- The target type's name can also convert an object from some source type if the user defines a "convert" operator for either of the two types, and this operator's input and output parameter types match the source and target of the conversion. Note that this matching might be thanks to the input or output formal being itself parameterized. For example, the interface Int_Vector below provides a "convert" operator that converts to any other instance of Int_Vector:
interface Int_Vector<Element is Integer<>> is
    ...
    operator "convert"(X : Int_Vector)
        -> Result_Type is Int_Vector<Target_Element is Integer<>>;
end interface Int_Vector;
These capabilities seem to provide enough flexibility for the user to define the desired explicit conversion functions. Note that the only implicit conversion in ParaSail is from a universal type to a type with an appropriate "from_univ" operator.
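As a concrete illustration of the universal-conversion route above, here is a minimal sketch (not from the original post); the types Meters and Feet, their operator signatures, and the choice of Univ_Integer as the universal type are all assumptions made for the example:
interface Meters<> is
    operator "from_univ"(Univ : Univ_Integer) -> Meters;
    operator "to_univ"(Val : Meters) -> Univ_Integer;
    ...
end interface Meters;

interface Feet<> is
    operator "from_univ"(Univ : Univ_Integer) -> Feet;
    operator "to_univ"(Val : Feet) -> Univ_Integer;
    ...
end interface Feet;

var M : Meters := 10;          // implicit conversion: "from_univ" applied to the universal literal
var F : Feet := Feet::[[M]];   // explicit: "to_univ" on Meters, then "from_univ" on Feet
The Feet:: prefix is only needed when the target type cannot be determined from context.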
A virtual machine for ParaSail with picothread scheduling
As we have worked toward having an initial compiler/interpreter for ParaSail, we have decided to define a ParaSail Virtual Machine (PSVM) which will support the kind of very light-weight threading structure (picothreading) needed to evaluate small constructs, such as parameters, in parallel. We have decided to turn each ParaSail operation into a single PSVM routine, even if its execution will involve multiple threads executing bits and pieces of the code for the operation. Each PSVM routine is identified by a unique index, the routine index, and is represented by a routine header and a sequence of PSVM instructions.
The ParaSail Virtual Machine instructions use relatively conventional addressing modes to refer to operands in memory. Memory is presumed to be organized into areas, each area being a contiguous sequence of 64-bit words. While executing, a PSVM routine has access to the following areas:
- Parameter_Area: An area for input and output parameters. Output parameters, if any, come first, followed by input parameters. Parameters may be a single 64-bit value, or may be a 64-bit pointer to a larger object.
- Local_Area: An area for storing local variables, and parameter lists being built up to pass as a Parameter_Area to a called routine. The first word of a local area is a link to the local area of the enclosing block or operation, in the presence of nesting; this sort of link is generally called a static link. The second word of a local area is a link to the associated parameter area.
- Type_Area: Each type, which is an instance of a module, is represented by an area containing the actual module parameters, and a table of operations. Each operation in the operation table is represented by a pair: routine index and type area. In many cases the type area for an operation is the same as the enclosing Type_Area, but for inherited operations, the type area for the operation would refer to the type area for the super-class from which the operation was inherited. The actual module parameters are most frequently other types (represented by a pointer to their type area), but can be values, object references, or operations.
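As a worked illustration of this layout (the word numbering from zero, and the call R := F(X, Y), are assumptions invented for the example), the two per-call areas might look like this:
// Parameter_Area for a call  R := F(X, Y)   (outputs first, then inputs):
//   word 0 : R, the output    (a 64-bit value, or a pointer to a larger object)
//   word 1 : X, first input
//   word 2 : Y, second input
//
// Local_Area of the routine performing the call:
//   word 0 : static link      (local area of the enclosing block or operation)
//   word 1 : link to this routine's own Parameter_Area
//   word 2+: local variables, plus parameter lists being built up for callees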
Here is a sampling of ParaSail Virtual Machine instructions:
Move_Word, Store_Int_Literal, Store_String_Literal, Store_Real_Literal, Jump, Conditional_Jump, Call, Return, Parallel_Call, Parallel_Block, Parallel_Wait.
Operands are generally identified with object locators, except for literal operands which are identified either by their actual value, or by an index into a table of literals.
Note that there are no instructions for doing normal arithmetic or logical operations; these are all implemented by calling routines. There is a set of built-in routines for the typical arithmetic and logical operations, on operands of various sizes.
The more interesting instructions are the Parallel_Call, Parallel_Block, and Parallel_Wait. Parallel_Call is essentially equivalent to Call, where the calling routine computes the input parameters and places them in a parameter area, and then calls the routine. The difference with a Parallel_Call is that the caller also identifies a picothread master and allocates a small area for a picothread control block (pTCB), and the instruction completes without waiting for the call to complete. Instead, the Parallel_Call instruction adds the pTCB to a queue waiting to be serviced by one of the virtual processors (vCPUs) executing the ParaSail program. When the caller thread has finished its own work and wants to wait for the Parallel_Call to complete, it uses the Parallel_Wait instruction, identifying the same picothread master as was specified in the Parallel_Call instruction. This suspends the calling thread until all of the parallel picothreads associated with the picothread master complete.
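As a hypothetical source-level illustration (this example is not from the original post), the two operands of "+" in the recursive call below could each be evaluated by its own picothread; under the scheme just described, the compiler might emit a Parallel_Call for each operand, both associated with a common picothread master, followed by a Parallel_Wait before performing the addition:
function Fib(N : Univ_Integer) -> Univ_Integer is
    if N <= 1 then
        return N;
    else
        // Each operand below may become a Parallel_Call sharing one
        // picothread master; a Parallel_Wait precedes the final "+".
        return Fib(N - 1) + Fib(N - 2);
    end if;
end function Fib;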
The Parallel_Block instruction is very similar to the Parallel_Call instruction, except that it identifies instructions that are part of the current routine, rather than calling a separate routine. The execution of these instructions is performed by a separate picothread, which has its own pTCB, and local area. The static link of the local area for the Parallel_Block picothread refers to the local area of the thread invoking the Parallel_Block instruction, allowing the nested picothread to use up-level references to reach the local variables of the enclosing picothread.
The Return instruction is used to complete processing of both a Parallel_Call and a Parallel_Block, and the Parallel_Wait is used to wait for either kind of parallel activity.
We recently completed a prototype implementation of the ParaSail Virtual Machine, including the picothread scheduling. We learned some interesting lessons along the way. Initially, a vCPU that was executing a picothread that performed a Parallel_Wait was itself suspended. That quickly exhausted the pool of vCPUs, and led us to start dynamically creating new vCPUs. That in turn caused the overall stack space to grow dramatically, since each vCPU needed its own heavy-weight threading context in the underlying operating system, along with its own stack.
At this point, we concluded that a vCPU that executed a Parallel_Wait instruction should itself service the queue of waiting pTCBs whenever the picothread master it was waiting for was not yet complete. That significantly reduced the number of vCPUs needed, but it still wasn't as efficient as it could be. As originally implemented, the queue of waiting pTCBs was first-in, first-out (FIFO). After looking at various traces of execution, however, we realized that the most recently created pTCB was always the first one to be awaited. Hence, we concluded that the pTCB queue should be last-in, first-out (LIFO): a vCPU with cycles to spare should preferentially service the most recently queued pTCB, since that one is most likely associated with a picothread master that is already being awaited, and servicing it first reduces the number of picothreads left suspended at a Parallel_Wait instruction. After making this final change, even a heavily recursive algorithm throwing off lots of parallel picothreads was handled efficiently.
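In outline (this is a paraphrase of the behavior just described, not actual PSVM or implementation code), a vCPU that reaches a Parallel_Wait now behaves roughly as follows:
// On executing Parallel_Wait for master M:
//   while M still has incomplete picothreads:
//     if the pTCB queue is non-empty:
//       pop the most recently queued pTCB (LIFO) and execute it to completion
//     else:
//       suspend this vCPU until M completes
// Servicing the newest pTCB first favors exactly the work that a waiting
// master is most likely blocked on, so fewer picothreads remain suspended.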
Tuesday, September 14, 2010
Notation for "intervals" in ParaSail
It is quite common in a precondition or postcondition to want to indicate that the value of a variable must be within some interval of values (aka "range" of values), such as 1..10, or 0..+inf. When dealing with real values (e.g. floating point values) as opposed to integer values, it is often desirable to represent an open or half-open interval, where a boundary value is not included in the specified interval. For example, to specify that X must be in the interval 0.0 .. 10.0, but not including zero itself, the notation "(0.0 .. 10.0]" is sometimes used, where "(" and ")" represent an open (exclusive) boundary, while "[" and "]" represent a closed (inclusive) boundary.
For ParaSail, because "()" and "[]" are already used in the syntax, we have adopted a different notation for open and half-open intervals:
0.0 .. 10.0 closed interval
0.0 <.. 10.0 half-open on left
0.0 ..< 10.0 half-open on right
0.0 <..< 10.0 open on both sides
Hence, one can write in an annotation:
{ X in A <.. B }
as a short-hand for
{ A < X and then X <= B }
with the additional difference that X is only evaluated once (though that will rarely matter in an annotation).
Like the other relational operators, these interval operators are defined automatically once the "=?" operator has been appropriately defined.
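For instance, here is a small hypothetical operation that uses the interval notation in an annotation; the operation name, its body syntax, and the placement of the annotation are assumptions made for this sketch, not taken from the original post:
function Interpolate(A, B, Frac : Univ_Real) -> Univ_Real is
    { Frac in 0.0 .. 1.0 }   // closed interval: 0.0 <= Frac and then Frac <= 1.0
    return A + Frac * (B - A);
end function Interpolate;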
Wednesday, September 1, 2010
Finalization (destructors) and multi-thread exits in ParaSail
Languages such as C++ and Ada that support both finalization (destructors) and exceptions generally face some degree of distributed overhead due to the complex interactions between these two features. Since exceptions can arise virtually anywhere, if there is finalization to be performed (e.g. destructors to be called) on the way "out" of a scope, there is generally a need for some kind of per-thread cleanup list to be walked as part of propagating exceptions, or the need for numerous implicit exception handlers and a precise indication of exactly how much initialization has been performed (e.g. constructors that have been completed) at any given moment.
As indicated in an earlier post, we have tentatively decided to do without exceptions in ParaSail, but the exit ... with feature can result in potentially numerous scopes being exited in the various other threads which are all being terminated as a result of some thread initiating an exit from a compound statement such as a block or loop. We have also suggested that ParaSail will support some notion of finalization, if an "end" operator is associated with the type of an object, such that on exiting the scope of the object, the "end" operator will be invoked automatically. So the question is: must these "end" operator calls be performed on all of the local objects of the threads being terminated as a side effect of an exit by some parallel thread? Our tentative answer is "no."
To avoid race conditions between a thread being terminated and the code executing after an exit, we believe that we need to restrict a thread that might be prematurely terminated, so that it can update only objects local to the exitable construct. The only code of an exitable construct that would be permitted to update objects declared outside the construct would be the code following an "exit ... with" or the code following "end ... with".
For example, here is the exitable construct we used in an earlier post:
const Result : optional Tree_Id;
for T => Root then T.Left || T.Right
  while T not null concurrent loop
    if T.Value == Desired_Value then
        // Found desired node, exit with its identifier
        exit loop with (Result => T.Id);
    end if;
end loop with (Result => null);
In this example, we might have many threads all executing inside the loop concurrently. To avoid race conditions, we would not allow any of these threads to update objects declared outside the loop, because they might be terminated at any moment, and the update might be disrupted in the middle. However, we would allow the code following the "exit loop with" to update Result, as well as the code following "end loop with." This is safe because only one of these is ever executed for a given execution of the loop, and once we begin executing such code it won't be disrupted by some other thread of the same loop initiating an exit.
Note that the code following an exit ... with or end ... with might be disrupted by a thread exiting some enclosing construct, but this code would not be allowed to update objects outside that enclosing construct, thereby avoiding the race condition.
Given this rule that code in a compound statement with multiple threads may not update objects declared outside the compound statement, if there is a chance that at least one of those threads might perform an exit ... with, we can simplify the finalization problem. There is no need to invoke the "end" operator on an object if that operator cannot possibly affect objects that will survive the exit statement.
Thinking more specifically about the "end" operator, and the general lack of access to global variables within an operation, what exactly can an "end" operator do? The answer is that an "end" operator cannot really do anything unless the object being finalized includes a reference of some sort to an object that will outlive the finalization. We will talk more in a future posting about what, if any, restrictions exist on incorporating a reference to one object as part of another object, but for now let us presume that such objects can exist, but that they cannot be overwritten by a whole-object assignment statement (i.e. they are not "assignable").
What would be the purpose of such an object with an embedded reference? One use would be to perform a kind of "mark/release" style of resource allocation. On entry to a scope, a "mark" object could be created with a reference to an enclosing resource manager of some sort. Inside the scope, allocations of the resource would be mediated by the "mark" object, such that when exiting the scope, the "end" operator applied to the "mark" object could automatically release all of the resources allocated on behalf of objects local to the scope.
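A minimal sketch of such a mark object's interface might look as follows; the module name Resource_Mark, the Resource_Manager type, and the parameter modes are assumptions, while Create_Mark and the "end" operator follow the usage in the example below:
interface Resource_Mark<> is
    function Create_Mark(var Mgr : concurrent Resource_Manager; ...) -> Resource_Mark;
        // (An overloading taking an enclosing Resource_Mark, for nested
        //  marks as in the example below, is omitted here.)
    operator "end"(var M : Resource_Mark);
        // Invoked automatically on scope exit; releases everything
        // allocated through M back to the enclosing manager.
    ...
end interface Resource_Mark;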
Now returning to our original question about whether finalization needs to be performed on all objects local to a thread that is being prematurely terminated -- if we presume that the "end" operators are performing something analogous to a "release" operation, then we can see how we could safely skip all intermediary release operations so long as we perform the ones associated with the actual compound statement being exited. This also presumes that the only references permitted from an object local to a multi-threaded compound statement with an exit, to an object declared outside the compound statement, are references to concurrent objects that are themselves local to the innermost enclosing multi-threaded-with-exit compound statement. For example:
var Enclosing_Obj1 : concurrent T := ...
...
*Block1*
block
    var Mark1 := Create_Mark(Enclosing_Obj1, ...);
    // Mark1 contains a reference to Enclosing_Obj1
    // and has an "end" operator which performs a release operation.
    for I in 1..10 concurrent loop
        var Mark2 := Create_Mark(Mark1, ...);
        // Here we have an inner mark
        Allocate_Resource(Mark2, ...);
        // Here we allocate some resource using Mark2 to mediate the
        // allocation from Mark1, which in turn is mediating allocation
        // from Enclosing_Obj1. The marks allow the resources to be
        // automatically released on block exit as a side effect of
        // finalization via the "end" operator.
        if This_Looks_Good(I, ...) then
            exit block Block1 with (Result => I);
            // This terminates any other threads inside Block1.
        end if;
    end loop;
end block Block1 with (Result => 0);
Now if some thread is deep within a call on "This_Looks_Good" when some other thread initiates the block exit, the only object that gets explicitly finalized will be Mark1. The multiple Mark2 objects (one for each thread of the concurrent loop) will not be finalized, as presumably performing an "end" on Mark1 will also release any allocations that were mediated by one of the Mark2 objects.
The bottom line is that when a tree of threads is terminated by an exit from a multi-threaded compound statement, the only finalization that needs to be performed is for the objects immediately local to the compound statement being exited. Objects local to the various threads being terminated need not be separately finalized. This avoids the kind of complex interaction that exists between finalization and exception propagation, including avoiding the overhead of maintaining precise initialization information, and avoiding performing a lot of potentially wasteful finalization.
Tuesday, August 24, 2010
No exceptions in ParaSail, but exitable multi-thread constructs
We have been mulling over the idea of exceptions in ParaSail, and have pretty firmly concluded that they aren't worth the trouble. In a highly parallel language, with lots of threads, exception propagation across threads becomes a significant issue, and that is a nasty area in general. Also, exceptions can introduce many additional paths into a program, making thorough testing that much harder. And the whole business of declaring what exceptions might be propagated, and then deciding what to do if some other exception is propagated can create numerous maintenance headaches.
There is a feature in ParaSail as currently designed which provides some of the same capabilities of exceptions, but is particularly suited to parallel programming. This is the "exit with" statement, which allows a construct to be exited with one or more values specified as results, and at the same time terminating any other threads currently executing within the construct. For example, here is a loop implementing a parallel search of a tree with the first thread finding the desired node exiting and killing off all of the other threads as part of the "exit ... with" statement:
const Result : optional Tree_Id;
for T => Root then T.Left || T.Right
  while T not null concurrent loop
    if T.Value == Desired_Value then
        // Found desired node, exit with its identifier
        exit loop with (Result => T.Id);
    end if;
end loop with (Result => null);
This declares a Result object of type Tree_Id. It then walks the tree in parallel, starting at Root and continuing with T.Left and T.Right concurrently. It continues until it reaches "null" on each branch, or some node is found with its Value component matching the Desired_Value. The value of Result at the end indicates the identifier of the node having the desired Value, or null to indicate that no node was found. The presence of optional in the declaration for Result indicates that its value might be null.
Supporting this kind of intentional "race" seems important in parallel programming, as many problems are amenable to a divide and conquer approach, but it is important that as soon as a solution is found, no further time is wasted searching other parts of the solution space. The "end ... with" phrase allows the specification of one or more results if the construct ends normally, as opposed to via an "exit ... with" (in this case, ending normally means all threads reach a null branch in the walk of the tree without finding the desired node). Effectively the "exit ... with" skips over the "end ... with" phrase.
So how does this all relate to exceptions? Well given the "exit ... with" capability, one can establish two or more threads, one which monitors for a failure condition, and the others which do the needed computation. The thread monitoring for a failure condition performs an "exit ... with" if it detects a failure, with the result indicating the nature of the failure, and as a side-effect killing off any remaining computation threads. If the normal computation succeeds, then an "exit ... with" giving the final result will kill off the monitoring thread. Note that the "exit ... with" statements must occur textually within the construct being exited, so it is visible whether such a premature exit can occur, unlike an exception which can arise deep within a call tree and be propagated out many levels.
As an example of the kind of failure condition which might be amenable to this kind of monitoring, imagine a resource manager object which provides up to some fixed maximum of some kind of resource (e.g. storage) to code within a block. This resource manager (which is presumably of a concurrent type) could be passed down to operations called within the block for their use. Meanwhile, a separate monitoring thread would be created immediately within the block; it would call an operation on the resource manager that suspends the thread until the resource runs out, at which point it would be awakened with an appropriate indication of the resource exhaustion, along with any other information that might be helpful in later diagnosis. On return from this Wait_For_Exhaustion operation, the monitoring thread would do an "exit block with (Result => Failure, ...)" or equivalent, to indicate that the computation required more resources than were provided. The code following the block would then be able to take appropriate action.
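Here is a rough sketch of that pattern; the names Outcome, Resource_Manager, Create, Wait_For_Exhaustion, Do_Computation, and Failure are invented for the example, and the use of "||" to fork the monitoring thread from the computation (and the grouping of statements within each arm) is an informal assumption rather than the post's own code:
const Result : optional Outcome;
var Mgr : Resource_Manager := Create(...);
*Monitored*
block
    // Monitoring thread: blocks until the resource is exhausted.
    Wait_For_Exhaustion(Mgr);
    exit block Monitored with (Result => Failure)
  ||
    // Computation thread: does the real work, drawing resources from Mgr.
    exit block Monitored with (Result => Do_Computation(Mgr, ...));
end block Monitored;
// Code here examines Result to see whether the computation completed
// or ran out of resources.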
Monday, August 23, 2010
Initial implementation model for ParaSail types
ParaSail supports object-oriented programming, and as such there needs to be some kind of run-time representation for types to support dispatching calls (aka virtual function calls, method calls, etc.). Most "normal" objects need no run-time type information associated with them, but an object or parameter of a polymorphic type (such as "Int_Expr+") needs some run-time type information to support dispatching calls on its operations. In addition, each operation of an interface generally needs an implicit parameter identifying its "enclosing" type, to gain access to the module parameters, given that every module is effectively a "generic" module.
Because ParaSail supports multiple inheritance of interfaces (including ad hoc interface matching -- see blog entry on that topic), a simple table of operations for each type with a compile-time-known slot number in the table doesn't work as it would if ParaSail only supported single inheritance. Instead we adopt the notion of a "type view," which consists of a pointer to the overall table of operations of the type, as well as a "slot-number mapping" to provide the particular "view" of the type. The slot-number mapping is a simple vector of operation slot numbers for the operation table, indexed by the interface slot number of the particular interface through which the type is being "viewed." For example, presume we have an interface "Assignable" with only two operations, say, Assign and Copy, with interface slot numbers one and two. Given some type with many operations, where the operation slot numbers 21 and 33 are for Assign and Copy respectively, the Assignable "view" of the type would have a slot-number mapping of:
[1 => 21, 2 => 33]
The actual operation table would be a vector of pairs, each pair being a reference to the type from which the code for the operation was inherited, and the reference to the actual code for the operation. Hence, the operation table for Type_C could be imagined as something like:
[1 => (Type_A, Op1_Code),
2 => (Type_B, Op2_Code),
...,
21 => (Type_A, Assign_Code),
...,
33 => (Type_C, Copy_Code),
...]
Here we assume that the code for Op1 and Assign originated with Type_A, the code for Op2 originated with Type_B, and the code for Copy originated with Type_C.
In addition to an operation table, a type would have a table of module actuals, one for each module formal parameter. Module actuals could be themselves "type views," operations, or (constant) objects.
If an interface declared externally-visible components (const or var), these would be represented for uniformity by operations that take a reference to an object of the interface's type and return a reference to the component. This allows multiple inheritance of such (at least conceptual) components from one or more implemented interfaces, though the actual offset of the components within the object (or whether they are true components in the case of consts) might very well differ from the corresponding components of the interface.
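For example, here is a hypothetical sketch of a visible component and the reference-returning operation it is conceptually equivalent to; the Point interface and the "ref" notation for reference parameters and results are assumptions made for this illustration:
interface Point<> is
    var X : Univ_Real;    // externally-visible component
    ...
end interface Point;

// For uniformity, the component is represented as if it were declared:
//     function X(P : ref Point) -> ref Univ_Real;
// so another implementation (or an extension) of the interface may store
// or compute the component however it chooses.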
When an operation is called, in addition to its explicit parameters, it would be passed a reference to the type from which it originated. This would allow it to gain access to the other operations of its module, as well as to the module actuals. Because ParaSail allows the type from which a type is extended to be overridden (see blog entry on ParaSail extension, inheritance, and polymorphism), the operation table may vary depending on which type is the base for extension (since the type from which a given operation originates could vary). Hence an operation doesn't necessarily "know" where one of the other operations of its module originates (presuming the operation is inherited rather than redefined directly within the module's class).
Because each type has its own operation table with pointers to the code for each operation, it is possible for some of the operations to be specially compiled for that type, rather than simply reusing the "generic" code generated when the module was first compiled. This allows the code to incorporate information about the module actuals (as an optimization), rather than having to use the information only at run-time which is what the "generic" code would do. Hence, in the example above, the Copy_Code for Type_C might incorporate directly information about the module actuals or the operation table for Type_C, rather than having to index into the table of module actuals or into the operation table at run-time.
Other implementation models are of course possible, but this one seems to have the advantage of uniformity and flexibility, with constant-time run-time performance for making a dispatching call.
Ad hoc interface matching in ParaSail
When a module is instantiated in ParaSail to form a type, actual parameters must be specified for each module formal parameter. When the module formal is itself a type, the module actual must be a type that matches the formal appropriately. For example, given:
interface Set<Element_Type is Assignable<>> is
    function Union(Left, Right : Set) -> Set;
    function Unit_Set(Elem : Element_Type) -> Set;
    ...
end interface Set;
we will want to be able to write:
type My_Int_Set is Set<My_Integer_Type>;
Now the question is whether My_Integer_Type must be based on an interface that is explicitly indicated as extending or implementing Assignable, or whether we should simply require that My_Integer_Type have all of the operations expected of a type based on Assignable. In other words, does an actual type match a formal type only if it extends or implements the specified interface, or is ad hoc matching permitted, where at the point of instantiation a check is made that all of the required operations are present? A similar question comes up when converting an object of a particular type to a polymorphic type (such as Assignable+).
Our initial instincts were to require that the actual type (or the type being converted) explicitly extend or implement the formal type (or the target type). However, this will tend to cause a proliferation of interfaces being added to the list of implements for any given module, such as Assignable or Comparable or Numeric or ..., and still the actual subset of operations of interest might not be mentioned explicitly.
Given the above concern, we now favor allowing ad hoc matching, both when matching module actuals to module formals and when converting from a normal type to a polymorphic type. There are some potential problems during maintenance if the operations of an interface change, but those problems exist anyway, since any code that makes calls on individual operations defined in an interface may suffer maintenance issues when operations change. Note, however, that adding operations to the actual type, or removing them from the formal type, does not create maintenance issues.
In general, there are many fewer places where modules are instantiated, or where conversions to polymorphic types occur, than there are normal calls on operations. Moreover, defining and using (presumably abstract) interfaces such as Assignable or Comparable as module formals, if they capture exactly which operations the module needs, could reduce rather than increase the maintenance burden when deciding whether to change an existing operation of some interface.
So the question above about instantiating Set with My_Integer_Type is answered as follows: My_Integer_Type must provide the operations defined in the interface Assignable, but it need not directly or indirectly extend or implement Assignable. This generally means that the list of interfaces specified in the implements list is there primarily for documentation, and for imposing a linkage such that if additional operations are added to the implemented interface, the need to define those operations is known at the point of module definition, rather than being deferred until some point of use. This also implies that when ad hoc matching is used, there is a presumption that the module formal (or target type) is very stable and new operations are not expected to be added to it. A type like Assignable<> certainly qualifies.
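Here is a small hypothetical sketch of what ad hoc matching permits, assuming (as in the implementation-model post above) that Assignable<> requires only Assign and Copy; the My_Integer_Type interface and its operation signatures are invented for the example:
interface My_Integer_Type<> is
    // Note: no "implements Assignable<>" clause here.
    function Copy(Right : My_Integer_Type) -> My_Integer_Type;
    ...   // plus Assign and anything else Assignable<> requires
end interface My_Integer_Type;

type My_Int_Set is Set<My_Integer_Type>;
    // Legal under ad hoc matching: at this instantiation the compiler
    // checks that My_Integer_Type supplies all of Assignable's operations,
    // even though My_Integer_Type never names Assignable.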