Maybe this is old news to some of you, but I was quite surprised when I discovered this.
The essential problem is that gensyms created in arc are not unique symbols; they are merely built from a string prefix and a counter. If someone happens to use a symbol with the same name as the printed representation of the gensym, there is a name conflict.
(define (ar-gensym)
  (set! ar-gensym-count (+ ar-gensym-count 1))
  (string->symbol (string-append "gs" (number->string ar-gensym-count))))
In arc, if gensyms were truly unique, the last part of the following would fail.
arc> (= g (uniq))
gs1767
arc> (eval `(= ,g "foo"))
"foo"
arc> (eval g)
"foo"
arc> gs1767 ; this would result in an error if g was not 'gs1767
"foo"
arc> (= gs1767 "bar")
"bar"
arc> gs1767
"bar"
arc> (eval g) ; this would still be "foo" if g was a unique symbol
"bar"
This behavior isn't surprising considering that uniq just increments a counter and creates a new symbol from the resulting string. (And a comment above the definition admits the current implementation is something of a joke.)
This problem does not occur in mzscheme; symbols created with gensym, and symbols that happen to have the same name, are distinct and separate.
> (define g (gensym))
> g
g2
> (eval (list 'define g "foo"))
> (eval g)
"foo"
> g2
reference to undefined identifier: g2
> (define g2 "bar")
> g2
"bar"
> (eval g)
"foo"
Changing the definition of 'ar-gensym in ac.scm to the following seems to fix half the problem:
(define ar-gensym gensym)
The part fixed is for local variables declared as function parameters (and therefore 'let and 'with). However, for global variables, the problem persists because of arc's explicit addition of a prefix, which results in a symbol that does not act as a gensym.
(define (ac-global-name s)
  (string->symbol (string-append "_" (symbol->string s))))
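To make the effect of the prefix concrete, here is a small sketch that can be pasted into a plain mzscheme/Racket REPL (not the Arc one). It reuses the 'ac-global-name definition above; the names g and prefixed are only for illustration, and it assumes gensym returns an uninterned symbol, as it does in mzscheme.

```scheme
;; Same definition as the one quoted from ac.scm above.
(define (ac-global-name s)
  (string->symbol (string-append "_" (symbol->string s))))

(define g (gensym))                   ; uninterned symbol
(define prefixed (ac-global-name g))  ; rebuilt via string->symbol, so interned

;; An interned symbol is eq? to whatever string->symbol returns for its name.
(eq? prefixed (string->symbol (symbol->string prefixed)))  ; #t
;; The gensym itself is not.
(eq? g (string->symbol (symbol->string g)))                ; #f
```

So once the prefix is added, the global ends up stored under an ordinary interned symbol that anyone can type, which is why the gensym stops protecting anything at global scope.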
I believe that global gensyms will work properly if we simply don't attempt to add the prefix. I don't know of any way to definitively determine if a symbol is a gensym, but my best guess at this point is to convert the symbol to a string and back again, and if the two symbols are equal, then the original is an interned symbol (and thus not a gensym). The following change to 'ac-global-name seems to fix the behavior of the global gensym test case above.
(define (ac-global-name s)
  (if (equal? s (string->symbol (symbol->string s)))
      (string->symbol (string-append "__" (symbol->string s)))
      s))
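For reference, the round-trip test can be pulled out into a helper and checked on its own at a mzscheme/Racket REPL. The name interned-by-roundtrip? is made up for this sketch, and the expected values assume mzscheme/Racket semantics, where gensym and string->uninterned-symbol both return uninterned symbols. It uses eq? rather than equal?; the two should agree on symbols.

```scheme
;; Hypothetical helper: #t when s is the interned symbol for its own name.
(define (interned-by-roundtrip? s)
  (eq? s (string->symbol (symbol->string s))))

(interned-by-roundtrip? 'foo)                               ; #t
(interned-by-roundtrip? (gensym))                           ; #f
(interned-by-roundtrip? (string->uninterned-symbol "foo"))  ; #f
```

Note that this only says "not interned", not "made by gensym": any uninterned symbol is treated the same way, whatever created it. Newer Racket releases also have a symbol-interned? predicate, but I haven't checked whether the mzscheme version Arc runs on provides it.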
So I guess my question is, is this (assuming the last hack is correct) the right way to solve the gensym issue? If so, how do you determine if a symbol is a gensym or not? (Is there a better way to determine if a symbol has been interned?) Are there any other reasons why pg didn't want to use mzscheme's 'gensym?
If people like the idea of using mzscheme's gensyms, and if they think this hack is good enough, I'll put it up on Anarki.
Also, I hope pg reads this and either uses my fix or fixes it himself, because I really would like working gensyms in the official distro.
By that line of thinking, almost everything in arc is broken, from macros to equality checks (what, no EQL support?), because arc is always willing to accept a risk of rare bugs/inefficiencies in exchange for simplicity.
-----
Perhaps the current design is slightly simpler (but not by much, because my changes remove about as much code as they add). And personally, I think it is less intuitive for gensyms to be converted to globals and interned into the main arc namespace than for them to be just left alone when assigned in a global context.
But either way, the whole point of gensyms in the first place is to protect from unintended variable capture, right? I don't really see how a design which basically fails to do that can be considered acceptable. Especially when every other Lisp I have ever used ensures somehow that gensyms do not interfere with other variables.
Much of what might be called broken in arc is either just broken until someone gets around to fixing it (like first class macros), or intentionally designed that way and not broken at all (like unhygienic macros). The former will presumably be fixed at some point, and the latter, well, isn't as "broken" as people claim it to be (ask kennytilton for figures on how many macros he has written).
Gensyms are the former, as indicated by the comment above the implementation of ar-gensym. On the other hand, one might argue that gensyms are the latter because unintended variable capture doesn't occur very often. I won't directly argue against this, but consider that CL uses unhygienic macros, yet goes to the trouble of ensuring safe gensyms. (In fact, it can partly get away with unhygienic macros because it ensures unique gensyms.)
Also, I think there is a matter of elegance involved in an issue's classification as "to be fixed" or "not really broken". I can't conclusively prove that gensyms increase the elegance of code that uses them, it is just a gut feeling I have. And likewise I feel unhygienic macros decrease elegance (but this may be slightly more arbitrary, since I have never actually written a hygienic macro).
There is always an issue of risk vs. gain, but in this case, I consider the gain to be greater than the risk/cost. Your opinion may differ.
P.S. I think it would settle things pretty decisively if pg told us what he intends to do with gensyms.... pg?
-----
In practice, I wouldn't believe this to be that much of a problem. Who would want to mimic the abominable symbols generated by arc with the current methods anyway, intentionally or otherwise?
-----
(define (ar-gensym)
  (set! ar-gensym-count (+ ar-gensym-count 1))
  (string->uninterned-symbol (string-append "gs" (number->string ar-gensym-count))))
-----
But doing all of that is equivalent to just creating gensyms as done now, with a `__' prefix -- so if you really want that last property then nothing needs to be changed (perhaps only the `gs' prefix)...
-----
Looking at my code I realize it is inconsistent with respect to underscores, but that was because I did testing on both official and Anarki, and forgot to check the code I pasted into the submission.
I'm not sure if that has anything to do with your comment about three underscores or not. (I'd appreciate an explanation either way.)
-----
Anarki's FFI is based on mzscheme's (of course), which itself imports symbols beginning with an underscore (_int, _pointer, _string, ...). These can clash with Arc's names (and they actually do, the string function for example). To overcome this issue, I added an underscore to Arc's names in Anarki (Arc's string is now mzscheme's __string).
Reading the original ac-global-name code and the corrected one I saw yours had one more underscore. I thought it was part of the correction, but it seems you just give arc2's original ac-global-name and Anarki's corrected code. Hence the mistake. Sorry for that noise :)
-----