10
\$\begingroup\$

Let's say I have a plain text string that I want to split into a string array, using an integer array containing the indexes of this string where I want it to be cut.

Example:

Input string: "001ABCD2T"

Indexes array: {3, 7, 8}

Expected resulting string array: {"001", "ABCD", "2", "T"}

Here's how I do it for now:

string line = "001ABCD2T";
int[] markers = new int[] {3, 7, 8};
for (int i = -1; i < markers.Length; i++)
{
 string value = String.Empty;
 if (i == -1)
 value = line.Substring(0, markers[0]);
 else if (i == markers.Length - 1)
 value = line.Substring(markers[i], line.Length - markers[i]);
 else
 value = line.Substring(markers[i], markers[i + 1] - markers[i]);
 Console.WriteLine(value);
}

The result is as expected, yet it is pretty eye-soring. Is there a cleaner way to write it, or even better, an already implemented method I didn't find?

asked Sep 3, 2014 at 13:47
\$\endgroup\$

5 Answers 5

10
\$\begingroup\$

Disclaimer: I'm not actually very familiar with C#. I'll also refrain from the style review and let someone more familiar with common practices take a crack.

You can prepend 0 and append line.Length to your array, simplifying the logic. There's probably a better way, but I got it working with a List:

List<int> list = new List<int>();
list.Add(0);
list.AddRange(markers);
list.Add(line.Length);
for (int i = 0; i < list.Count - 1; i++)
{
 string value = line.Substring(list[i], list[i + 1] - list[i]);
 Console.WriteLine(value);
}
answered Sep 3, 2014 at 14:44
\$\endgroup\$
3
  • \$\begingroup\$ Eh, that's an idea to start with, I like that. I'm waiting a bit for some other answers before accepting, but it already simplifies the code a lot. \$\endgroup\$ Commented Sep 3, 2014 at 14:53
  • 2
    \$\begingroup\$ @Kilazur No pressure to accept. In fact, accepting now may even discourage some people from writing more reviews, so by all means, leave it open! \$\endgroup\$ Commented Sep 3, 2014 at 14:58
  • \$\begingroup\$ No worries about that :p \$\endgroup\$ Commented Sep 3, 2014 at 18:35
9
\$\begingroup\$

I would write an extension method:

public static class StringExtensions
{
 public static IEnumerable<string> SplitByIndex(this string @string, params int[] indexes)
 {
 var previousIndex = 0;
 foreach (var index in indexes.OrderBy(i => i))
 {
 yield return @string.Substring(previousIndex, index - previousIndex);
 previousIndex = index;
 }
 yield return @string.Substring(previousIndex);
 }
}

And use it like this:

foreach (var part in "001ABCD2T".SplitByIndex(3, 7, 8))
{
 Console.WriteLine(part);
}

Explanation:

It's better to encapsulate this splitting algorithm in a method so we can re-use the same code later from another place. I like to use extension methods for general-purpose methods like this. I usually implement such functionality in some common assembly in a separate namespace (for example, MyCoolProject.Extensions.String). This way, anyone who needs these extension methods can simply use the corresponding namespace in his code, and others will not be troubled by unnecessary methods.

As for the algorithm itself, it does not change our index array and it yields only one substring at a time. Plus, it's a bit more readable in my opinion because we do not use indexes to access index array; we're just iterating through it.

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
answered Sep 3, 2014 at 19:36
\$\endgroup\$
10
  • 1
    \$\begingroup\$ alright, why would you do that? \$\endgroup\$ Commented Sep 3, 2014 at 20:08
  • 1
    \$\begingroup\$ Given the context provided there is no reason to use an extension method. \$\endgroup\$ Commented Sep 3, 2014 at 20:48
  • 1
    \$\begingroup\$ Would this answer constitute a reasonable checkin log message? From @200_Success's answer \$\endgroup\$ Commented Sep 3, 2014 at 21:12
  • \$\begingroup\$ @Malachi Well, if the OP's code is written at one place and needs to be moved so it can be reused, it could be an useful answer I think. But by itself, no I don't think it is a reasonable check in log message. \$\endgroup\$ Commented Sep 3, 2014 at 23:46
  • \$\begingroup\$ Why the indexes.OrderBy(i => i)? With that I'd expect the following to result in exactly the same: "001ABCD2T".SplitByIndex(3, 7, 8), "001ABCD2T".SplitByIndex(8, 3, 7), "001ABCD2T".SplitByIndex(7, 8, 3); which is probably not how it should work. \$\endgroup\$ Commented Sep 4, 2014 at 6:25
3
\$\begingroup\$

Assuming a forbidden token (cannot exist in the input string) is alright, here's a goofy idea idea using String.Split().

string line = "001ABCD2T";
int[] markers = { 3, 7, 8 };
const string tokenToSplitBy = "~";
int insertionCount = 0;
foreach (int index in markers)
 line = line.Insert(index + insertionCount++, tokenToSplitBy);
string[] resultArray = line.Split(new[]{tokenToSplitBy}, StringSplitOptions.RemoveEmptyEntries);
foreach (string result in resultArray)
 Console.WriteLine(result);
answered Sep 4, 2014 at 2:57
\$\endgroup\$
6
  • \$\begingroup\$ Fancy and simple, dunno why I didn't think about that. Yet the performance will drop if I have to work with thousands of lines with great length :p But clever overall. \$\endgroup\$ Commented Sep 4, 2014 at 8:17
  • \$\begingroup\$ Agreed ... but the "something cleaner" challenge part of the question sort of de-prioritized the performance angle a bit. \$\endgroup\$ Commented Sep 4, 2014 at 10:14
  • \$\begingroup\$ Absolutely right, hence the appreciation; yet I hope you understand I will prefer an answer that fix my initial issue AND is optimization-friendly. \$\endgroup\$ Commented Sep 4, 2014 at 10:18
  • \$\begingroup\$ This way is really cute and I love it! Maybe I'll ask a candidate to explain what it does in a technical interview... :-) \$\endgroup\$ Commented Sep 4, 2014 at 15:03
  • \$\begingroup\$ Note that from a theoretical standpoint, this is going to be very inefficient (something like \$O(n^2)\,ドル when the original solution is \$O(n)\$). But I don't know whether this will affect actual performance, especially for small inputs. \$\endgroup\$ Commented Sep 11, 2014 at 18:13
2
\$\begingroup\$

You could do this without rebuilding the index array or any conditionals. Simply handle the first substring and the last substring outside of the loop:

string line = "001ABCD2T";
int[] markers = new int[] { 3, 7, 8 };
List<string> output = new List<string>{line.Substring(0, markers[0])};;
int i = 0;
int limit = markers.Length - 1;
for (; i < limit; i++)
{
 output.Add(line.Substring(markers[i], markers[i + 1] - markers[i]));
}
output.Add(line.Substring(markers[i], line.Length - markers[i]));
answered Sep 3, 2014 at 18:34
\$\endgroup\$
2
\$\begingroup\$

Whenever I see multiple calls to same method in if-else (or switch-case) sequence, I think about using few variables to make it single call. Conditional/ternary operator ?: may as well be used instead of if-else (but some would not see that as readability enhancement).

string line = "001ABCD2T";
int[] markers = new int[] {3, 7, 8};
for (int i = -1, m = markers.Legth-1; i < markers.Length; i++)
{
 int from = i < 0 ? 0 : markers[i];
 int to = i == m ? line.Length : markers[i + 1];
 string value = line.Substring(from, to-from);
 Console.WriteLine(value);
}

We can further think about rewriting it by remembering indexes from previous iteration:

for (int i = 0, from = 0; i <= markers.Length; i++)
{
 int to = i == markers.Length ? line.Length : markers[i];
 string value = line.Substring(from, to-from);
 from = to;
 Console.WriteLine(value);
}
answered Sep 3, 2014 at 15:03
\$\endgroup\$
5
  • \$\begingroup\$ Okay, what are we doing here and why? \$\endgroup\$ Commented Sep 3, 2014 at 20:49
  • \$\begingroup\$ The code should be clear enough for what, if you understand ternary operator. Why? It extractes the proper indexes (from and to) to use single call to Substring. \$\endgroup\$ Commented Sep 3, 2014 at 21:03
  • 1
    \$\begingroup\$ Would this answer constitute a reasonable checkin log message? From @200_Success's answer \$\endgroup\$ Commented Sep 3, 2014 at 21:12
  • \$\begingroup\$ Given the simplicity of the question, I don't see how you can even think about that meta-post. I would never think about changing such code, because it works and is not slow. So, in short, I really don't see your point. \$\endgroup\$ Commented Sep 3, 2014 at 21:26
  • \$\begingroup\$ The quality of answers on our site matter. further discussion should be had in The Second Monitor \$\endgroup\$ Commented Sep 3, 2014 at 21:29

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.