Monday, December 23, 2013
Concurrency vs. Parallelism
Over the past few years there seems to have been an increasing number of discussions of the difference between concurrency and parallelism. These discussions didn't seem very convincing at first, but over time a useful distinction did begin to emerge. So here is another attempt to distinguish the two:
- Concurrent programming constructs allow a programmer to simplify their program by using multiple (heavy-weight) threads to reflect the natural concurrency in the problem domain. Because restructuring the program in this way can provide significant benefits in understandability, relatively heavy weight constructs are acceptable. Performance is also not necessarily as critical, so long as it is predictable.
- Parallel programming constructs allow a programmer to divide and conquer a problem, using multiple (light-weight) threads to work in parallel on independent parts of the problem. Such restructuring does not necessarily simplify the program, and as such parallel programming constructs need to be relatively light weight, both syntactically and at run-time. If the constructs introduce too much overhead, they defeat the purpose.
ParaSail is focused on providing parallel programming constructs, but concurrent objects are useful for more traditional concurrent programming approaches, particularly when coupled with explicit use of the "||" operator. It is an interesting question whether some higher-level process-like construct might be useful.
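For example, here is a minimal sketch (the variable and function names are illustrative, in the style of the Word_Count examples later on this page) of two statements explicitly run in parallel with "||":

    var Left : Univ_Integer
    var Right : Univ_Integer
    Left := Word_Count(S[1 .. Half], Seps)
      || Right := Word_Count(S[Half <.. |S|], Seps)
    const Sum := Left + Right  // both halves complete before this line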
Thursday, November 21, 2013
First release of Javallel, Parython; new release 5.1 of ParaSail and Sparkel
The ParaSail family of languages is growing, with two more additions now available for experimentation. We have made a new release 5.1 which includes all four members of the family -- ParaSail itself, Sparkel based on the SPARK subset of Ada, Javallel based on Java, and Parython based on Python. Binaries plus examples for these are all available in a single (large) download:
http://bit.ly/ps51bin
As before, if you are interested in sources, visit:
http://sparkel.org
The biggest change in ParaSail was a rewrite of the region-based storage manager (the same storage manager is actually used for all four languages), to dramatically reduce the contention between cores/processors related to storage management. The old implementation was slower, yet still had a number of race conditions. This one is faster and (knock on wood) free of (at least those ;-) race conditions.
As far as how Javallel relates to Java, here are some of the key differences:
- Classes require a "class interface" to declare their visible operations and fields
- There is no notion of a "this" parameter -- all parameters must be declared
- There is no notion of "static" -- a method is effectively static if it doesn't have any parameters whose type matches the enclosing class; no variables are static
- You only say "public" once, in the class, separating the private stuff (before the word "public") from the implementation of the visible methods.
- Semi-colons are optional at the end of a line
- Parentheses are optional around the condition of an "if"
- "for" statements use a different syntax; e.g:
- for I in 1..10 [forward | reverse | parallel] { ... }
- "{" and "}" are mandatory for all control structures
- You can give a name to the result of a method via:
- Vector createVec(...) as Result { ... Result = blah; ... }
- You have to say "T+" rather than simply "T" if you want to accept actual parameters that are of any subclass of T (aka polymorphic). "T" by itself only allows actuals of exactly class T.
- Object declarations must start with "var," "final," or "ref" corresponding to variable objects, final objects, or ref objects (short-lived renames).
- There are no special constructors; any method that returns a value of the enclosing class is effectively a constructor; objects may be created inside a method using a tuple-like syntax "(a => 2, b => 3)" whose type is determined from context
- X.Foo(Y) is equivalent to Foo(X, Y)
- Top-level methods are permitted, to simplify creating a "main" method
- uses "and then" and "or else" instead of "&&" and "||"; uses "||" to introduce explicit parallelism.
- "synchronized" applies to classes, not to methods or blocks
- Enumeration literals start with "#"
There are examples in javallel_examples/*.jl?, which should give a better idea of what Javallel is really like. Parython examples are in parython_examples/*.pr?
Thursday, November 14, 2013
Using ParaSail as a Modeling Language for Distributed Systems
The ACM HILT 2013 conference just completed in Pittsburgh, and we had some great tutorials, keynotes, and sessions on model-based engineering, as well as on formal methods applied to both modeling and programming languages. One of the biggest challenges identified was integrating complex systems with components defined in various modeling or domain-specific languages, along with an overall architecture, which might be specified in a language like AADL or SysML or might just be sketches on a whiteboard somewhere. A big part of the challenge is that different languages have different underlying semantic models, with different type systems, different notions of time, different concurrency and synchronization models (if any), etc. The programming language designer in me wants to somehow bring these various threads (so to speak) together in a well-defined semantic framework, ideally founded on a common underlying language.
One way to start is by asking how can you "grow" a programming language into a modeling language (without killing it ;-)? ParaSail has some nice features that might fit well at the modeling level, in that its pointer-free, implicitly parallel control and data semantics are already at a relatively high level, and don't depend on a single-address-space view, nor a central-processor-oriented view. As an aside, Sebastian Burckhardt from Microsoft Research gave a nice talk on "cloud sessions" at the recent SPLASH/OOPSLA conference in Indianapolis (http://splashcon.org/2013/program/991, http://research.microsoft.com/apps/pubs/default.aspx?id=163842), and we chatted afterward about what a perfect fit the ParaSail pointer-free type model was to the Cloud Sessions indefinitely-persistent data model. Modeling often abstracts away some of the details of the distribution and persistence of processing and data, so the friendliness of the ParaSail model to Cloud Sessions might also bode well for its friendliness to modeling of other kinds of long-lived distributed systems.
ParaSail's basic model is quite simple, involving parameterized modules, with separate definition of interface and implementation, types as instantiations of modules, objects as instances of types, and operations defined as part of defining modules, operating on objects. Polymorphism is possible in that an object may be explicitly identified as having a polymorphic type (denoted by T+ rather than simply T) and then the object carries a run-time type identifier, and the object can hold a value of any type that implements the interface defined by the type T, including T itself (if T is not an abstract type), as well as types that provide in their interface all the same operations defined in T's interface.
So how does this model relate to a modeling language like Simulink or a Statemate? Is a Simulink "block" a module, a type, an object, or an operation (or something entirely different)? What about a box on a state-machine chart? For Simulink, one straightforward answer is that a Simulink block is a ParaSail object. The associated type of the block object defines a set of operations or parameter values that determine how it is displayed, how it is simulated, how it is code-generated, how it is imported/exported using some XML-like representation, etc. A Simulink graph would be an object as well, being an instance of a directed graph type, with a polymorphic type, say "Simulink_Block+," being the type of the elements in the graph (e.g. DGraph<Simulink_Block+>).
Clearly it would be useful to define new block types using the Simulink-like modeling language itself, rather than having to "drop down" to the underlying programming language. One could imagine a predefined block type "User_Defined_Block" used to represent such blocks, where the various display/simulation/code-generation/import/export operations would be defined in a sub-language that is itself graphical, but relies on some additional (built-in) block types specifically designed for defining such lower-level operations. Performing code-generation on these graphs defining the various primitive operations of this new user-defined block type would ideally create code in the underlying programming language (e.g. ParaSail) that mimics closely what a (ParaSail) programmer might have written to define a new block type directly in the (ParaSail) programming language. This begins to become somewhat of a "meta" programming language, which always makes my head spin a little...
A practical issue at the programming language level when you go in this direction is that what was a simple interface/implementation module model may need to support "sub-modules" along various dimensions. In particular, there may be sets of operations associated with a given type devoted to relatively distinct problems, such as display vs. code generation, and it might be useful to allow both the interface, and certainly the implementation, of a block-type-defining module to be broken up into sub-modules. The ParaSail design includes this notion, which we have called "interface extensions" (which is a bit ambiguous, so the term "interface add-on" might be clearer). These were described in:
http://parasail-programming-language.blogspot.com/2010/08/eliminating-need-for-visitor-pattern-in.html
but have not as of yet been implemented. Clearly interface add-ons, for say, [#display] or [#code_gen], could help separate out the parts of the definition of a given block type.
A second dimension for creating sub-modules would be alternative implementations of the same interface, with automatic selection of the particular implementation based on properties of the parameters specified when the module is instantiated. In particular, each implementation might have its own instantiation "preconditions" which indicate what must be true about the actual module parameters provided before a given implementation is chosen. In addition, there needs to be some sort of preference rule to use when more than one implementation's preconditions are satisfied by a given instantiation. For example, presume we have one implementation of an Integer interface that handles 32-bit ranges of integers, a second that handles 64-bit ranges, and one that handles an infinite range. Clearly the 32-bit implementation would have a precondition that the range required be within +/- 2^31, the 64-bit one would require the range be within +/- 2^63, and the infinite-range-handling implementation would have no precondition. If we were to instantiate this Integer module with a 25-bit range, the preconditions of all three implementations would be satisfied, but there would presumably be a preference for the 32-bit implementation over the other two. The approach we have considered for this is to allow a numeric "preference" level to be specified when providing an implementation of a module along with the implementation "preconditions," with the default level being "0" and the default precondition being "#true." The compiler would choose, among the implementations whose preconditions are satisfied, the one with the maximum preference level. It would complain if there were a tie, requiring the user to specify explicitly which implementation of the module is to be chosen at the point of instantiation.
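To make this concrete, here is a purely hypothetical sketch of what two such implementations might look like, reusing the {> ... <} annotation syntax mentioned elsewhere on this blog; none of this selection syntax is implemented:

    // Hypothetical syntax only -- implementation selection is not yet implemented
    class Integer {> Preference => 1; First >= -2**31 and then Last < 2**31 <} is
        // 32-bit representation, chosen when the requested range fits
        ...
    end class Integer

    class Integer {> Preference => 0 <} is  // default precondition is #true
        // infinite-range representation, the fallback
        ...
    end class Integer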
Wednesday, November 6, 2013
Tech talk and Tutorial at SPLASH 2013 on parallel programming
I gave an 80-minute "tech talk" and a 3-hour tutorial on parallel programming last week at SPLASH 2013 in Indianapolis. The audiences were modest but enthusiastic.
The tech talk was entitled:
Living without Pointers: Bringing Value Semantics to Object-Oriented Parallel Programming
Here is the summary:
The heavy use of pointers in modern object-oriented programming languages interferes with the ability to adapt computations to the new distributed and/or multicore architectures. This tech talk will introduce an alternative pointer-free approach to object-oriented programming, which dramatically simplifies adapting computations to the new hardware architectures. This tech talk will illustrate the pointer-free approach by demonstrating the transformation of two popular object-oriented languages, one compiled, and one scripting, into pointer-free languages, gaining easier adaptability to distributed and/or multicore architectures, while retaining power and ease of use.
Here is a link to the presentation: http://bit.ly/sp13pf
The tutorial was entitled:
Multicore Programming using Divide-and-Conquer and Work Stealing
Here is the summary for the tutorial:
This tutorial will introduce attendees to the various paradigms for creating algorithms that take advantage of the growing number of multicore processors, while avoiding excessive synchronization overhead. Many of these approaches build upon the basic divide-and-conquer approach, while others might be said to use a multiply-and-conquer paradigm. Attendees will also learn the theory and practice of "work stealing," a multicore scheduling approach which is being adopted in numerous multicore languages and frameworks, and how classic work-list algorithms can be restructured to take best advantage of the load balancing inherent in work stealing. Finally, the tutorial will investigate some of the tradeoffs associated with different multicore storage management approaches, including task-local garbage collection and region-based storage management.
Here is a link to the presentation: http://bit.ly/sp13mc
Comments welcome!
-Tuck
Monday, September 16, 2013
Parallelizing Python and Java
Designing and implementing ParaSail has been a fascinating process. Achieving parallelism and safety at the same time by eliminating rather than adding features has worked out better than we originally expected. One of the lessons of the process has been that just a small number of key ideas are sufficient to achieve safe and easy parallelism. Probably the biggest is the elimination of pointers, with the substitution of expandable objects. The other big one is the elimination of direct access to global variables, instead requiring that a variable be passed as a (var or in out) parameter if a function is going to update it. Other important ones include the elimination of parameter aliasing, and the replacement of exception handling with a combination of more pre-condition checking at compile time and a more task-friendly event handling mechanism at run time.
So the question now is whether some of these same key ideas can be applied to existing languages, to produce something with much the same look and feel of the original, while moving toward a much more parallel-, multicore-, human-friendly semantics.
Over the past year we have been working on a language inspired by the verifiable SPARK subset of Ada, now tentatively dubbed Sparkel (for ELegant, parallEL, SPARK). For those interested, this experimental language now has its own website:
http://www.sparkel.org
Sparkel has essentially all of the power of ParaSail, but with a somewhat more SPARK-like/Ada-like look and feel. We will be giving a talk on Sparkel at the upcoming HILT 2013 conference on High Integrity Language Technology in Pittsburgh in November:
http://www.sigada.org/conf/hilt2013
HILT 2013 is open for registration, and it promises to be a great conference, with great talks, tutorials, and panels about model checking, model-based engineering, formal methods, SMT solvers, safe parallelism, etc. (disclaimer -- please note the name of the program chair).
At the upcoming OOPSLA/SPLASH conference, we are giving a talk about applying these same principles to Python and Java. Super-secret code names for the results of this effort are Parython and Javallel. The announcement of this talk is on the splashcon.org web site:
http://splashcon.org/2013/program/tutorials-tech-talks/853-living-without-pointers-bringing-value-semantics-to-object-oriented-parallel-programming
If you are coming to OOPSLA/SPLASH, please stop by to hear the results. We will also be adding entries over the next couple of months about some of the lessons learned in working to create a safe parallel language when starting quite explicitly from an existing language.
Thursday, September 12, 2013
Revision 4.7 of ParaSail alpha release now available
We are pleased to release alpha revision 4.7 of the ParaSail compiler and virtual machine, available at the same URL:
http://bit.ly/Mx9DRb
This release includes a large number of bug fixes, plus the following enhancements:
More support for operations on polymorphic types, including binary operations, where it is an error if the two operands have different underlying types, unless the operator is "=?". "=?" is a special case: rather than raising an error, it returns #unordered if its two polymorphic operands have different underlying types. This now allows us to define a type like Set and have it work as desired, that is, as a Set holding (almost) any type.
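For example, here is a minimal sketch (the variable names are illustrative) of comparing two polymorphic objects with different underlying types:

    var A : Imageable+ := 42            // underlying type Univ_Integer
    var B : Imageable+ := "forty-two"   // underlying type Univ_String
    case A =? B of
      [#unordered] => Println("different underlying types")
      [..] => Println("comparable values")
    end case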
We now have a preference for non-generic operations when there would otherwise be ambiguity between a normal operation and a generic operation of the same name. This is relevant for the "|" operator on Univ_String. We have now added To_String and From_String on Univ_String (these are identity operations), which means Univ_String now implements the "Imageable<>" interface. The new preference rule prevents this from creating ambiguity on "|".
We now allow {> ... <} for annotations as a substitute for { ... }. This will allow us to eventually use { ... } for Set/Map constructors in the Parython dialect. The new {> ... <} syntax makes annotations a bit more obvious, which seems like a good thing.
Even when still using "then"/"is"/"loop"/"of" instead of ":", we have made "end blah" optional. Presumably project-specific rules might require "end blah" if the construct is too many lines long (e.g. more than 20 lines).
Highlighting information for the ConTEXT source text editor is under share/tools/... in a file named ConTEXT_ParaSail.chl, courtesy of ParaSail user Arie van Wingerden. Similar information for the "geshi" highlighting system (used by Wikipedia) is in a file called geshi_parasail.php.
Case statements over polymorphic objects are now supported where the case choice has an associated identifier, such as in:
var E : Expr+ := ...
case E of
[L : Leaf] => ...
[U : Unary] => ...
[B : Binary] => ...
[..] =>
end case;
Note that the specified types for the case choices must be non-polymorphic types at the moment. In a later release we will support having choices with polymorphic types, such as:
[SE : Simple_Expr+] => ...
where presumably Simple_Expr is an extension of Expr.
We have added function types, of the form:
func(X : Integer; Y : Float) -> Integer
Basically the same syntax as a "func" declaration but without the func's identifier. To declare a "func" that takes another "func" as a parameter, you would now use syntax like:
func Apply
    (Op : func(X : Integer) -> Integer; S : Vector<Integer>)
    -> Vector<Integer>
rather than the former syntax:
func Apply
    (func Op(X : Integer) -> Integer; S : Vector<Integer>)
    -> Vector<Integer>
The syntax for lambda expressions has been simplified, so you only specify the names of the inputs, with no mention of the types of the inputs or the outputs. For example:
lambda (Y) -> 2*Y
is a lambda expression that doubles its input. A lambda expr can be passed as a parameter so long as the type of the formal parameter is a compatible "func" type. So for example, given the above definition of "Apply", you could write:
Apply (lambda(Y)->2*Y, [1, 2, 3])
and expect to get "[2, 4, 6]" as the result (presuming "Apply" does the natural thing).
We now share the lock between a polymorphic object and the non-polymorphic object it contains; we also share the lock between an object and its parent part, if any. "continue loop" statements that exit blocks are now handled. The source code has been restructured to more easily support new parsers on top of the same underlying language semantics, in support of "Parython" and other language parallelization efforts.
When a "new" type is declared (type T is new Integer<...>) we allow additional operations to be declared immediately thereafter in the same scope, and have them be visible wherever the type is later used, just as though a new module had been defined as an extension of the old module, and the new operations had been declared inside that new module.
When a "new" type is declared (type T is new Integer<...>) we allow additional operations to be declared immediately thereafter in the same scope, and have them be visible wherever the type is later used, just as though a new module had been defined as an extension of the old module, and the new operations had been declared inside that new module.
When an expression does not resolve, we now provide additional diagnostics to help explain why.
Wednesday, July 3, 2013
ParaSail web site now available
We now have a (non-blog) web site for ParaSail:
http://www.parasail-lang.org
We will use this in the future as the official starting point for reaching resources for ParaSail, and in particular, getting to the latest documentation and downloads. We will also be creating some examples, and ideally an online Read-Eval-Print-Loop (REPL) for trying your own examples. If you have questions or comments about this new website, the Google Group for ParaSail is the best place to post those comments/questions:
http://groups.google.com/group/parasail-programming-language
Thursday, April 18, 2013
Systems Programming with Go, Rust, and ParaSail
Here is a paper that goes with the talk I am giving this coming Tuesday, April 23, at DESIGN West (aka the Embedded Systems Conference) in San Jose, comparing Go, Rust, and ParaSail.
If you are in the Bay Area, come on down. The talk is ESC-218, on Tuesday April 23 from 2:00-3:00PM in Salon 3: http://www.ubmdesign.com/sanjose/schedule-builder/session-id/98
NOTE: Some browsers are having trouble with this version of the paper. It was cut and pasted from a Word document, which is always dicey. A PDF version of this paper is available on the ParaSail Google Group:
http://groups.google.com/group/parasail-programming-language
Systems Programming in the Distributed, Multicore World with Go, Rust, and ParaSail
S. Tucker Taft
AdaCore
24 Muzzey Street, 3rd Floor
Lexington, MA 02421
taft@adacore.com
Abstract
The distributed, multicore train is stopping for no programmer, and especially the systems programmer will need to be ready to hop on to the distributed parallel programming paradigm to keep their systems running as efficiently as possible on the latest hardware environments. There are three new systems programming languages that have appeared in the last few years which are attempting to provide a safe, productive, and efficient parallel programming capability. Go is a new language from Google, Rust is a new language from Mozilla Research, and ParaSail is a new language from AdaCore. This talk will describe the challenges these languages are trying to address, and the various similar and differing choices that have been made to solve these challenges.
Keywords: multicore, distributed, parallel programming, systems programming language, Go language, Rust language, ParaSail language
1. Introduction
The distributed, multicore train is stopping for no programmer, and especially the systems programmer will need to be ready to hop on to the distributed parallel programming paradigm to keep their systems running as efficiently as possible on the latest hardware environments. There are three new systems programming languages that have appeared in the last few years which are attempting to provide a safe, productive, and efficient parallel programming capability. Go is a new language from Google [1], Rust is a new language from Mozilla [2], and ParaSail is a new language from AdaCore [3][4].
The designers of Go, Rust, and ParaSail are all facing a common challenge -- how to help programmers address the new distributed and multicore architectures, without the complexity of programming going past what is manageable by the professional, yet still human, programmer. All programming languages evolve, and as a rule, they tend to get more complex, not less so. If every time a new hardware architecture becomes important, the programming language is enhanced to provide better support for that architecture, the language can become totally unwieldy, even for the best programmers. When the architecture changes radically, as with the new massively distributed and/or multicore/manycore architectures, this may mean that the language no longer hangs together at all, and instead has become a federation of sublanguages, much like a house that has been added onto repeatedly with a different style for each addition.
Because of the complexity curse associated with language evolution, when there is a significant shift in the hardware landscape, there is a strong temptation to start over in programming language design. After many years of a relatively stable programming language world, we now see a new burst of activity on the language design front, inspired in large part by the sense that our existing mainstream languages are either not going to be supportive enough, or that they are becoming simply too complex in trying to support both the old and new hardware paradigms through a series of language extensions.
2. Go from Google
Go, Rust, and ParaSail all emerged over the past few years, each with its own approach to managing complexity while supporting parallelism. Go from Google is the brainchild of Rob Pike and his colleagues at Google. Rob was at Bell Labs in the early Unix and C days, and in many ways Go inherits the C tradition of simplicity and power. Unlike C, storage management has been taken over by the Go run-time through a general-purpose garbage collection approach, but like C, care is still needed in other areas to ensure overall program safety.
From the multicore perspective, Go uses goroutines for structuring large computations into a set of smaller potentially parallel computations. Goroutines are easy to create -- essentially any stand-alone call on a function or a method can be turned into a goroutine by simply prefixing it with the word "go." Once a goroutine is spawned, it executes independently of the rest of the code. A goroutine is allowed to outlive the function that spawns it, thanks to the support of the garbage collector; local variables of the spawning function will live as long as necessary if they are visible to the spawned goroutine.
For goroutines to be useful, they need to communicate their results back to the spawning routine. This is generally done using strongly-typed channels in Go. A channel can be passed as a parameter as part of spawning a goroutine, and then as the goroutine performs its computation it can send one or more results into the channel. Meanwhile, at some appropriate point after spawning the goroutine, the spawner can attempt to receive one or more values from the channel. A channel can be unbuffered, providing synchronous communication between sender and receiver, or it can provide a buffer of a specified size, effectively creating a message queue.
Communication between goroutines can also go directly through shared global variables. However, some sort of synchronization through channels or explicit locks is required to ensure that the shared variables are updated and read in an appropriate sequence.
Here is an example Go program that counts the number of words in a string, presuming they are separated by one or more separator characters, dividing multi-character strings in half and passing the halves off to goroutines for recursive word counts:
    // assumes: import "strings"
    func Word_Count(s string, separators string) int {
        slen := len(s)
        switch slen {
        case 0:
            return 0 // Empty string
        case 1: // single-char string
            if strings.ContainsRune(separators, rune(s[0])) {
                return 0 // A single separator
            } else {
                return 1 // A single non-separator
            }
        default: // divide string and recurse
            half_len := slen / 2
            // Create two chans and two goroutines
            left_sum := make(chan int)
            right_sum := make(chan int)
            go func() { left_sum <- Word_Count(s[:half_len], separators) }()
            go func() { right_sum <- Word_Count(s[half_len:], separators) }()
            // Read the partial sums,
            // and adjust the total if a word was divided
            if strings.ContainsRune(separators, rune(s[half_len-1])) ||
                strings.ContainsRune(separators, rune(s[half_len])) {
                // No adjustment needed
                return <-left_sum + <-right_sum
            } else {
                // Minus 1 because a word was divided
                return <-left_sum + <-right_sum - 1
            }
        }
    }
2.1 Unique Features of Go
Go has some unusual features. Whether a declaration is exported is determined by whether or not its name begins with an upper-case letter (as defined by Unicode); if the declaration is a package-level declaration or is the declaration of a field or a method, then if the name starts with an upper-case letter, the declaration is visible outside the current package.
Every Go source file begins with a specification of the package it is defining (possibly only in part). One source file may import declarations from another by specifying the path to the file that contains the declarations, but within the importing code the exported declarations of the imported code are referenced using the imported file's package name, which need not match the imported file's filename. Of course, projects would typically establish standard naming conventions which would align source file names and package names somehow.
Go provides a reflection capability, which is used, for example, to convert an object of an arbitrary type into a human-readable representation. The "%v" format in Go's version of printf does this, allowing arbitrarily complex structs to be written out with something as simple as:

    fmt.Printf("%v", my_complex_object)

Printf is implemented in Go itself, using the "reflect" package.
There are no uninitialized variables in Go. If a variable is not explicitly initialized when declared, it is initialized by default to the zero of its type, where each type has an appropriately-defined zero, typically either the zero numeric value or the nil (aka "null") pointer value, or some composite object with all components having their appropriate zero value.
2.2 What Go Leaves Out
Because complexity was a major concern during all three of these language designs, some of the most important design decisions were about what to leave out of the language. Here we mention some of the features that Go does not have.
Go does not permit direct cyclic dependencies between packages. However, the Go interface capability permits the construction of recursive control or data structures that cross packages, because an interface declared in one package can be implemented by a type declared in another package without either package being directly dependent on the other.
Like C, Go has no generic template facility. There are some built-in type constructors, such as array, map, and chan, which are effectively parameterized type constructors, but there is no way for the user to create their own such type constructor. Unlike C, there is no macro facility which might be used to create something like a parameterized type. Nevertheless, Go's flexible interface and reflection capabilities allow the creation of complex data structures that depend only on the presence of a user-provided method such as Hash and the DeepEqual function of the reflect package.
Go does not allow user-defined operators. Various operators are built in for the built-in types, such as int and float32. Interestingly enough, Go does include built-in complex types (complex64 and complex128) with appropriate operators.
Go does not have exceptions. However, functions and methods may return multiple results, and often errors are represented by having a second return value called error that is non-nil on error. Unlike in C, you cannot ignore such an extra result unless you explicitly assign it to the blank identifier "_". When things go really wrong in Go, a run-time panic ensues, and presumably during development, you are tossed into a debugger.
3. Rust from Mozilla Research
The modern web browser represents one of the most complex and critical pieces of software of the internet era. The browser is also a place where performance is critical, and there are many opportunities for using parallelism as a web page is "rendered." The Rust language arose originally as a personal project by one of the engineers at Mozilla Research (Graydon Hoare), and has now grown into a Mozilla-sponsored research effort. Rust has been designed to help address the complexity of building components of a modern browser-centric software infrastructure, in the context of the new distributed multicore hardware environment.
Like Go, Rust has chosen to simplify storage management by building garbage collection into the language. Unlike Go, Rust has chosen to restrict garbage collection to per-task heaps, and to adopt a unique-ownership policy for data that can be exchanged between tasks. What this means is that data that can be shared between tasks is visible to only one of them at a time, and only through a single pointer at a time (hence an owning pointer). This eliminates the possibility of data races between tasks, and eliminates the need for a garbage collector for this global data-exchange heap. When an owning pointer is discarded, the storage designated by the pointer may be immediately reclaimed -- so no garbage accumulates in the global exchange heap.
Here is a Rust version of the Word Count program, recursing on multi-character strings with subtasks encapsulated as futures computing the subtotals of each string slice:
    fn Word_Count(S: &str, Separators: &str) -> uint {
        let Len = S.len();
        match Len {
            0 => return 0,  // Empty string
            1 =>            // one-char string
                return if Separators.contains(S[0]) { 0 }
                       else { 1 },  // 0 or 1 words
            _ => {          // Divide and recurse
                let Half_Len = Len/2;
                let Left_Sum = future::spawn {
                    || Word_Count(S.slice(0, Half_Len-1), Separators) };
                let Right_Sum = future::spawn {
                    || Word_Count(S.slice(Half_Len, Len-1), Separators) };
                // Adjust the sum if a word is divided
                if Separators.contains(S[Half_Len]) ||
                   Separators.contains(S[Half_Len+1]) {
                    // No adjustment needed
                    return Left_Sum.get() + Right_Sum.get();
                } else {
                    // Subtract one because a word was divided
                    return Left_Sum.get() + Right_Sum.get() - 1;
                }
            }
        }
    }
Rust does not have special syntax for spawning a "task" (Rust's equivalent of a "goroutine") nor for declaring the equivalent of a "channel," but instead relies on its generic template facility and a run-time library of threading and synchronization capabilities. In the above example, we illustrate the use of futures, which are essentially a combination of a task and an unbuffered channel used to capture the result of a computation. There are several other mechanisms for spawning and coordinating tasks, but they all depend on the basic tasking model, as mentioned above, where each task has its own garbage-collected heap for task-local computation (manipulated by what Rust calls managed pointers), plus access via owning pointers to data that can be shared between tasks (by sending an owning pointer in a message).
3.1 Rust Memory Management Performance
One of the major advantages of the Rust approach to memory management is that garbage collection is local to a single task. By contrast, in Go each garbage-collector thread has to operate on data that is potentially visible to all goroutines, requiring a garbage-collection algorithm that synchronizes properly with all of the active goroutines, as well as with any other concurrent garbage-collector threads (presuming garbage collection itself needs to take advantage of parallel processing to keep up with multithreaded garbage generation).
In Rust, a conventional single-threaded garbage-collector algorithm is adequate, because any given garbage collector is working on a single per-task heap. Furthermore, storage visible via owning pointers needs no garbage collection at all, as releasing an owning pointer means that the associated storage can also be released immediately.
3.2 The Costs of Safety and Performance
One of the downsides of added safety and performance can be added complexity. As we have seen, Rust has added safety by allowing access to sharable data only via pointers that give exclusive access to one task at a time, and added performance because garbage collection is single-threaded. However, as a result, Rust needs several kinds of pointers. In fact, there are four kinds of pointers in Rust: managed pointers (identified by '@' as a prefix on a type) for per-task garbage-collected storage, owning pointers (identified by '~') for data that is sharable between tasks, borrowed pointers (identified by '&') that can temporarily refer to either per-task or sharable data, and raw pointers (identified by '*'), analogous to typical C pointers, with no guarantees of safety.
4. ParaSail from AdaCore
The ParaSail language from AdaCore takes support for parallelism one step further than Go or Rust, by treating all expression evaluation in the language as implicitly parallel, while also embracing full type and data-race safety. Rather than adding complexity to accomplish this, the explicit goal for ParaSail was to achieve safety and pervasive parallelism by simplifying the language, removing many of the features that make safe, implicit parallelism harder.
4.1 What ParaSail Leaves Out
Some of the features left out of ParaSail include the following:
- No pointers
- No global variables
- No run-time exception handling
- No explicit threads, no explicit locking nor signaling
- No explicit heap, no garbage collection needed
- No parameter aliasing
So what is left? ParaSail provides a familiar class-and-interface-based object-oriented programming model, with mutable objects and assignment statements. But ParaSail also supports a highly functional style of programming, aided by the lack of global variables, where the only variable data visible to a function is via parameters explicitly specified as var parameters. This means that the side-effects of a function are fully captured by its parameter profile, which together with the lack of parameter aliasing allows the compiler to verify easily whether two computations can safely be performed in parallel.
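For example, here is a minimal sketch (the names are illustrative) of a function whose only side-effect is captured in its profile:

    func Increment(var Counter : Univ_Integer; Amount : Univ_Integer) is
        Counter := Counter + Amount
    end func Increment

Because Counter must be passed explicitly as a var parameter, the compiler can see exactly which objects a given call might update, and can run calls that update disjoint objects in parallel.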
By design, every expression in ParaSail can be evaluated in parallel. If two parts of the same expression might conflict, the expression is not a legal ParaSail expression. The net effect is that the compiler can choose where and when to insert parallelism strictly based on what makes the most sense from a performance point of view. Here, for example, is the Word Count example in ParaSail, where parallelism is implicit in the recursive calls on Word_Count, without any explicit action by the programmer:
    func Word_Count
        (S : Univ_String;
         Separators : Countable_Set<Univ_Character> := [' '])
      -> Univ_Integer is
        case |S| of
          [0] => return 0       // Empty string
          [1] =>
            if S[1] in Separators then
                return 0        // No words
            else
                return 1        // One word
            end if
          [..] =>               // Divide and recurse
            const Half_Len := |S|/2
            const Sum := Word_Count(S[1 .. Half_Len], Separators) +
              Word_Count(S[Half_Len <.. |S|], Separators)
            if S[Half_Len] in Separators or else
              S[Half_Len+1] in Separators then
                return Sum      // No adjustment needed
            else
                return Sum - 1  // Adjust sum because a word was divided
            end if
        end case
    end func Word_Count
Although there is no explicit use of a parallel construct, the sum of the two recursive calls on Word_Count can be evaluated in parallel, with the compiler automatically creating a picothread for each recursive call, waiting for their completion, and then summing the results, without the programmer having to add explicit directives.
4.2 Implicit and Explicit Parallelism in ParaSail
Explicit parallelism may be specified if desired in ParaSail, or the programmer can simply rely on the compiler to insert it where it makes the most sense. The general philosophy is that the semantics are parallel by default, and the programmer needs to work a bit harder if they want to force sequential semantics. For example, statements in ParaSail can be separated as usual with ";" (which is implicit at the end of the line when appropriate), or by "||" if the programmer wants to request explicit parallelism, or by "then" if the programmer wants to disallow implicit parallelism. By default the compiler will evaluate two statements in parallel if there are no data dependences between them.
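A minimal sketch of the three separators (F and G stand for any two calls with no data dependences between the statements):

    X := F(A); Y := G(B)        // ";"    -- compiler may run these in parallel
    X := F(A) || Y := G(B)      // "||"   -- explicit parallelism requested
    X := F(A) then Y := G(B)    // "then" -- sequential semantics forced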
As another example of ParaSail's implicit and explicit parallelism, the iterations of "for I in 1..10 loop" are by default executed in any order, including in parallel if there are no data dependences between the loop iterations, while "for I in 1..10 forward loop" or "for I in 1..10 reverse loop" may be specified to prevent parallel evaluation, and "for I in 1..10 concurrent loop" may be used to specify that parallel evaluation is desired, with it being an error if there are any data dependences between the iterations. In all these cases, the compiler will ensure that any parallel evaluation is safe and data-race free, and will complain if there are potential race conditions when parallel evaluation semantics are specified.
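For instance, a minimal sketch (A is assumed to be a vector indexed 1..10):

    for I in 1 .. 10 concurrent loop
        A[I] := A[I] * 2   // distinct elements, so no data dependences;
                           // the iterations may all run in parallel
    end loop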
4.3 Simplicity breeds Simplicity in ParaSail
There is somewhat of a virtuous cycle that occurs when a programming language is simplified, in that one simplification can lead to another. By eliminating pointers and a global heap from ParaSail, the language can provide fully automatic storage management without the need for a garbage collector. Objects in ParaSail have value semantics, meaning that assignment copies the value of the right-hand side into the left-hand side, with no sharing of data. A built-in move operation is provided for moving the value of the right-hand side into the left-hand side, along with a swap for swapping values, thereby reducing the cost of copying while still preserving value semantics.
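Here is a minimal sketch of the three forms of assignment (Vector<Univ_Integer> is an assumed instantiation):

    var A : Vector<Univ_Integer> := [1, 2, 3]
    var B : Vector<Univ_Integer> := [4, 5, 6]
    B := A    // copy: B gets a copy of A's value
    B <== A   // move: A's value moves into B, leaving A null
    A <=> B   // swap: the two values are exchanged, with no copying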
Every type in ParaSail has an extra value, called null, which is used to represent an empty or otherwise uninitialized value. An object or component may have a null value of its type T only if it is declared to be "optional T". Optional components may be used to implement trees and linked lists, without the use of explicit pointers, and without the potential sharing issues associated with pointers, even if behind the scenes the compiler uses pointers to implement optional objects or components. The availability of optional components effectively allows an object to grow and shrink, meaning that dynamic structures like hash tables, and higher-level notions such as maps and sets, can be implemented in ParaSail without any explicit pointers, with the advantages of purely value semantics.
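For example, here is a minimal sketch of a binary tree built from optional components rather than pointers (the module name and its parameter are illustrative):

    interface Tree<Payload_Type is Assignable<>> is
        var Payload : Payload_Type
        var Left : optional Tree := null
        var Right : optional Tree := null
    end interface Tree

Assigning a subtree into Left or Right makes the enclosing object grow; setting a component back to null lets its storage be reclaimed immediately.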
The lack of explicit pointers means that all objects in ParaSail effectively live on the stack, even though they may still grow and shrink. Each scope has its own region, effectively a local heap that expands and contracts to hold the objects associated with the scope. No garbage collector is needed because when an object goes out of scope, or an object or one of its components is set back to null, the associated storage may be immediately reclaimed, much like storage designated by owning pointers in Rust. By simplifying the type model, the storage management in ParaSail is dramatically simpler and of higher performance, with no complex parallel garbage collection algorithm required.
5. Implementation Status and Conclusions
The programming language design world has been rejuvenated by the new challenges of distributed and multicore architectures. Three new programming languages designed for building industrial-strength systems have emerged: Go, Rust, and ParaSail. Each of these languages tries to make parallel programming simpler and safer, while still providing the level of power and performance needed for the most critical systems development tasks.
From an implementation status point of view, Go is the most stable of these three new languages, with two compilers available, and a number of systems built with Go now being deployed. Rust and ParaSail are still under development, but both are available for trial use, with Rust having early on achieved the compiler bootstrap milestone, where the Rust compiler is itself implemented in Rust. All three languages provide opportunities for the professional programmer to expand their horizons, and see what sort of new paradigms and idioms will become more important as we leave behind the era of simple sequential processing, and enter the era of massively parallel, widely distributed computing.
References
[2] The Rust Programming Language, Mozilla Research, http://www.rust-lang.org
[3] ParaSail: Less is More with Multicore, S. Tucker Taft, http://www.embedded.com/design/other/4375616/ParaSail--Less-is-more-with-multicore (retrieved 4/1/2013).
[4] Designing ParaSail: A New Programming Language, S. Tucker Taft, http://parasail-programming-language.blogspot.com