If a challenge says that input will be in the form of a string, what is acceptable?
Various languages have different ways of implementing strings, so here's what I've "borrowed" from Wikipedia:
In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable.
The reason I ask is that my most frequently used language here is Java. Of course, Java has a String class, and that's what I've been using. However, I've run into a couple of situations where a char[] would be better.
In my opinion, while a char[] is not a String, it is a "string".
Say I have a challenge that reads:
Write a function that takes a string (printable ASCII) as input, adds 1 to each character, and returns it (don't worry about overflow)
While this is a trivial example, String is crap here.
- It's not mutable, so I need a new one to return.
- To access a character you have to use .charAt(i) instead of [i]
- .length() is two bytes longer than .length
- Once you add, you have to cast it back to char or it converts to an integer.
An example of both types:
String a(String a){String b="";for(int i=0;i<a.length();b+=(char)(a.charAt(i++)+1));return b;}
char[]a(char[]a){for(int i=0;i<a.length;a[i++]++);return a;}
I haven't looked much to see if I can reduce either of these, but the point remains: there's no way the String version is coming out ahead.
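For readers who don't golf, the same two approaches can be written out long-hand. This is only a sketch to make the differences listed above visible, not the asker's code; the names IncrementDemo, incString and incChars are made up for illustration.

```java
class IncrementDemo {
    // String version: strings are immutable, so a new one has to be built.
    static String incString(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            // char + int yields an int, so it must be cast back to char
            out.append((char) (s.charAt(i) + 1));
        }
        return out.toString();
    }

    // char[] version: the array is modified in place and returned.
    static char[] incChars(char[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i]++; // ++ on a char needs no explicit cast
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(incString("HAL"));              // IBM
        System.out.println(incChars("HAL".toCharArray())); // IBM
    }
}
```

Both methods do the same work; the char[] one simply avoids the method calls, the cast and the second buffer, which is where the byte savings in the golfed versions come from.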
Of course, in some situations String has the upper hand. If you need indexOf(), trim(), or easy conversion from literals, you'd want that.
Since both a String and a char[] are strings by most reasonable definitions of the word, I believe it shouldn't matter which is used. In addition, having to choose between the two makes for a different golf process than sticking to one or the other.
Don't get me wrong. I'm not saying that I've seen any argument/debate about this on the site, but I've personally held back from using char[] where it asks for a string, and I just got around to wondering why. Either way, I think it would make a good Standard Definition, and I'd like some input.
5 Answers
I agree with the Wikipedia definition. It's a sequence of characters. As the name suggests, it's one-dimensional. I program a lot in C and sometimes in Pascal, which implement strings in different ways.
C doesn't have a string type, only char[], with a string by convention being terminated by a zero byte.
Original Pascal has a String type which has a maximum length of 255, because the length of s[] is stored in s[0].
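The two layouts described above can be modelled with plain byte arrays. The following Java sketch is an illustration only (neither C nor Pascal code); the class and method names are hypothetical, and it assumes ASCII input so that one character is one byte.

```java
import java.nio.charset.StandardCharsets;

class StringLayouts {
    // C-style: the bytes of the text followed by a terminating zero byte.
    static byte[] toZeroTerminated(String s) {
        byte[] text = s.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[text.length + 1]; // last byte stays 0
        System.arraycopy(text, 0, out, 0, text.length);
        return out;
    }

    // Length-prefixed: byte 0 holds the length, so at most 255 characters fit.
    static byte[] toLengthPrefixed(String s) {
        byte[] text = s.getBytes(StandardCharsets.US_ASCII);
        if (text.length > 255) throw new IllegalArgumentException("too long");
        byte[] out = new byte[text.length + 1];
        out[0] = (byte) text.length;
        System.arraycopy(text, 0, out, 1, text.length);
        return out;
    }

    // Length of the zero-terminated form: scan until the zero byte.
    static int zeroTerminatedLength(byte[] b) {
        int n = 0;
        while (b[n] != 0) n++;
        return n;
    }

    // Length of the length-prefixed form: a single read.
    static int lengthPrefixedLength(byte[] b) {
        return b[0] & 0xFF; // mask because Java bytes are signed
    }

    public static void main(String[] args) {
        System.out.println(zeroTerminatedLength(toZeroTerminated("abc"))); // 3
        System.out.println(lengthPrefixedLength(toLengthPrefixed("abc"))); // 3
    }
}
```

Either array is still just a sequence of characters plus some bookkeeping, which is the point the answer is making.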
More recently, more advanced versions of these languages have been released, with more advanced string types. I frequently see C++ answers with both std::string and char[], and I see no problem with that.
Brainfuck and assembly language have no types, yet clearly both can handle "strings" according to the Wikipedia definition. So I would define it as a sequence of characters and leave it at that.
What I do think question posters need to define is what a character is, i.e. whether only ASCII or full Unicode can be expected as input. I can think of one recent example where a poster stated that input would be ASCII only, but included a Euro sign in a test case (he later changed it to a dollar sign). This is perhaps even more important for defining the winning criterion. Where questions specify that program length is by characters, it is good to see that many posters are now linking pages like https://mothereff.in/byte-counter in their questions, which removes any ambiguity.
- Careful, C++ doesn't have a String class, it has a std::string class. – Mooing Duck, Oct 4, 2014 at 0:49
- "Pascal has a String type which has a maximum length of 255" What? Can't Pascal count higher than 8 bits?? – cat, Apr 23, 2016 at 1:41
- @cat As explained above, in Pascal the string was, at the machine level, an array of bytes, and the length of the string is stored in the first byte, s[0]. Back in those days 1 character = 1 byte. The Pascal system had the disadvantage that string length was limited to 255. C, on the other hand, had no length limit on strings, but had the disadvantage that a string could not contain a byte of value 0, as this marked the end of the string. Also, finding the length of the string was less computationally expensive in Pascal than in C. Since then Pascal has expanded and C has evolved into C++ and C#. – Level River St, Apr 23, 2016 at 9:09
- I've only just realised (from codegolf.stackexchange.com/a/246190/53748) that this is often being used to allow a list of single-character strings in languages that do have a string type (Python, Jelly, 05AB1E, etc.). I'm not sure if this was the intention. Should we update this post to allow that (a retrospective fit), add a new answer, or start to disallow it under this answer? (Bearing in mind that this already has +24/-0, of which we do not know how many considered this fact.) – Jonathan Allan, Apr 13, 2022 at 12:07
- ...actually not Jelly: in Jelly a string type does not exist, only lists of characters (except for when a long-standing, and sometimes used, interpreter bug with multiplication forces some to exist). – Jonathan Allan, Apr 13, 2022 at 12:17
- Redirecting observers to vote on: codegolf.meta.stackexchange.com/a/8963/53748 – Jonathan Allan, Apr 13, 2022 at 20:02
- For the record: Pascal as published in the year 1970 (it was never named "Original Pascal") does not have a string data type. You can specify string literals as arguments to write/writeLn, but that's about it. This was one of the early points of criticism. It was Turbo Pascal, a dialect of Pascal, that introduced the string data type you describe. Borland created various other string data types, yet never adopted the ISO standard 10206 string schema. – Kai Burghardt, Sep 9, 2023 at 18:00
If it looks like a string and acts like a string, it's a string
Examples of strings:
- Native string types: C++ strings, Java Strings, JavaScript Strings, etc.
- Native character types: C/C++ chars and wchar_ts, Java chars and Characters, etc.
- An iterable (tuple, array, vector, list, etc.) of characters or length-1 strings: C/C++/Java/etc. char[]s, a list of length-1 strings in Python ([x for x in s]), etc.
- An iterable of byte values (between 0 and either 127 or 255, inclusive, depending on whether or not extended ASCII is supported): Python 3's bytes type, brainfuck's tape (when given ASCII input)
- An iterable of integers representing Unicode code points (SWI-Prolog does this)
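To make the list above concrete, here is a small Java sketch that holds the same text in several of those shapes. It is an illustration under assumptions, not part of the proposed rule: the class and method names are invented, and the byte form assumes ASCII input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class StringShapes {
    // An array of characters.
    static char[] asChars(String s) {
        return s.toCharArray();
    }

    // A list of length-1 strings (for non-empty BMP input).
    static List<String> asPieces(String s) {
        return Arrays.asList(s.split(""));
    }

    // An array of byte values, assuming the text is ASCII.
    static byte[] asBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // An array of Unicode code points.
    static int[] asCodePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "Hi!";
        System.out.println(Arrays.toString(asChars(s)));      // [H, i, !]
        System.out.println(asPieces(s));                      // [H, i, !]
        System.out.println(Arrays.toString(asBytes(s)));      // [72, 105, 33]
        System.out.println(Arrays.toString(asCodePoints(s))); // [72, 105, 33]
    }
}
```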
- I don't see the argument that a Java char "looks like a string and acts like a string". Can you elaborate? – Peter Taylor, Apr 13, 2016 at 22:02
- @PeterTaylor It stores text data. It can only store a single character, but that's not much different from a char[1] or a length-1 string. – user45941, Apr 13, 2016 at 22:03
- Regarding your last point, in SWI-Prolog code strings have this behavior: A = `−`. (minus symbol) returns A = [8722] – Fatalize, Apr 14, 2016 at 10:05
- @Fatalize Ah, I knew I had seen a language that did that, but couldn't remember what it was. – user45941, Apr 14, 2016 at 19:09
- I think if a question says to take a string as input, it needs to use the language's default representation or type. So a "string" in Python needs to be such that it could be preceded by a print statement and give the correct output without an additional slice or join like an array would need. – mbomb007, Apr 15, 2016 at 21:31
- In C#, can I output an IEnumerable<char> (a sequence of chars) when there is an explicit String class? I think that I shouldn't, but this definition will allow it. – aloisdg, Jul 6, 2016 at 22:20
- How do you feel about generators that produce characters or length-1 strings? Would this be a [potentially infinite length] string? – Sparr, Dec 19, 2017 at 0:21
- @Sparr Why wouldn't it be? It looks like a string and quacks like a string. – user45941, Dec 19, 2017 at 0:51
- @Mego The potentially infinite length, the lack of a way to get its length, and the lack of O(1) indexing might make it un-string-like enough? – Sparr, Dec 25, 2017 at 21:19
- @Sparr BF doesn't have any of those things. Is a sequence of bytes on BF's tape not a string? – user45941, Dec 25, 2017 at 21:31
- Good question. I'd have to survey a bunch of BF questions... I recall seeing many where string input was expected to specifically come from stdin, and string output to stdout, never referring to the original or final state of the tape. – Sparr, Dec 27, 2017 at 7:05
A string is also a list of single character strings. For example:
["H", "e", "l", "l", "o", ",", " ", "w", "o", "r", "l", "d", "!"]
While technically different, this is not thematically different from a list of characters. Therefore I believe it agrees with the current consensus, which follows the Wikipedia definition of "a sequence of characters". I don't believe this format encodes any more information than a plain string does.
Sometimes certain operations in higher level languages are more convenient when performed on a list in this manner, and I don't see why we should automatically limit some of the opportunities that people can have for golfing or otherwise improving their score.
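One way to check the claim that a list of single-character strings carries no extra information is a round trip. The Java sketch below is an assumption-laden illustration (the names PiecesRoundTrip, split and join are hypothetical, and it treats each Java char as one character, so it ignores surrogate pairs).

```java
import java.util.ArrayList;
import java.util.List;

class PiecesRoundTrip {
    // Break a string into a list of single-character strings.
    static List<String> split(String s) {
        List<String> out = new ArrayList<>();
        for (char c : s.toCharArray()) {
            out.add(String.valueOf(c));
        }
        return out;
    }

    // Concatenate the pieces back into one string.
    static String join(List<String> pieces) {
        return String.join("", pieces);
    }

    public static void main(String[] args) {
        List<String> pieces = split("Hello, world!");
        System.out.println(pieces.size());  // 13
        System.out.println(join(pieces));   // Hello, world!
    }
}
```

Because joining the pieces reproduces the original text exactly, the two forms are interchangeable for the purposes of a challenge.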
You're over-thinking the problem. Unless the challenge specifically states that you must use the native "string" type of the language, you get to interpret what "string" means in a way that's most advantageous to your implementation.
- This was my first instinct, too. However, I've seen several cases where people ask about integers (32/64/arbitrary) or what counts as truthy, so I thought there may be a better interpretation than "whatever you want". – Geobits, Sep 23, 2014 at 13:50
If a language makes a distinction between char[] and String, I think a good question to ask is "Does a char[] print like a string?" If it is possible to print a char[] so that it looks like a String, then you can use it. Many questions give the option of printing or returning the output, so I think the returned output should look similar to the printed output if it were printed.
Let's say that the challenge allowed you to write a function that returns a string. Java, for example, has both a String and a char[]. If you use System.out.println() to print the char[], then it is formatted and looks the same as if you had printed a String (Ideone example). This means that it will be acceptable for the method to return either a String or a char[].
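The behaviour described above depends on which println overload the compiler picks. The following Java sketch shows both outcomes; it is an illustration only, with hypothetical names (PrintCheck, viaOverload, viaObject), and the exact text after [C@ varies from run to run.

```java
class PrintCheck {
    // Resolves to String.valueOf(char[]), which copies the characters.
    static String viaOverload(char[] a) {
        return String.valueOf(a);
    }

    // Casting to Object selects String.valueOf(Object), i.e. a.toString().
    static String viaObject(char[] a) {
        return String.valueOf((Object) a);
    }

    public static void main(String[] args) {
        char[] a = {'a', 'b', 'c'};
        System.out.println(a);          // println(char[]) overload: prints abc
        System.out.println((Object) a); // println(Object): prints [C@ followed by a hash
        System.out.println(viaOverload(a).equals("abc"));    // true
        System.out.println(viaObject(a).startsWith("[C@"));  // true
    }
}
```

So a char[] prints like a String only when it reaches a method that has a dedicated char[] overload; anywhere it is treated as a plain Object, the array's default toString() is used instead.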
Here is some pseudocode to demonstrate how this can be tested. You are allowed to modify the print method, such as using println, but you can't modify the golfed method's output before it is fed into the print.
class tester{
    main(n){
        print(f(n));
    }
    char[] f(n){
        //golf code here
    }
}
If there is a language that does not allow a char[] to be easily printed, meaning that it always looks like [a, b, c] or ['a', 'b', 'c'] regardless of the type of printing method you use, then I don't think char[] is an acceptable substitute.
- The Java example is questionable. Compare what happens without relying on overloads. – Peter Taylor, Apr 13, 2016 at 22:00
- char[] with .toCharArray() and then convert it back with new String(carray)?