I'm trying to pass a programming challenge where we are to replace all instances of a substring in a string until there are none left. So an input like
Pibabakekezza bake
Has output
Pizza
And if there is nothing left say so. I fail an unknown test case because I take longer than 1 second.
//Rextester.Program.Main is the entry point for your code. Don't change it.
//Compiler version 4.0.30319.17929 for Microsoft (R) .NET Framework 4.5
using System;
using System.Runtime;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Runtime.CompilerServices;
namespace Rextester
{
public class Program
{
public static void Main(string[] args)
{
GCLatencyMode oldMode = GCSettings.LatencyMode;
// Make sure we can always go to the catch block,
// so we can set the latency mode back to `oldMode`
RuntimeHelpers.PrepareConstrainedRegions();
try
{
GCSettings.LatencyMode = GCLatencyMode.LowLatency;
//Your code goes here
string s =Console.ReadLine();
string b =Console.ReadLine();
while(s.Contains(b))
{
s=s.Replace(b, "");
}
if(s!="")
{
Console.WriteLine(s);
}
else
{
Console.WriteLine("FRULA");
}
}
finally
{
// ALWAYS set the latency mode back
GCSettings.LatencyMode = oldMode;
}
}
}
}
Restrictions:
1≤|bigString|≤1 000 000
1≤|substring|≤36
bigString and the substring string consist of uppercase and lowercase letters of the English alphabet and digits 0, 1, ..., 9. The characters in the substring are all different.
CPU Time limit: 1 second Memory limit: 1024 MB
9 Answers 9
As stated by 200_success +1 string is immutable
StringBuilder also has Replace but getting it to loop is trickier
You would need to time this
string s = "Pibabakekezza";
string r = "bake";
StringBuilder sb = new StringBuilder(s);
while (sb.Length != sb.Replace(r, "").Length) { }
Debug.WriteLine(sb.ToString());
A character can only be:
- start of replace
- in replace
- other
Not real clean code - this is more just the algorithm
I know should use { }
on every if else
Store partial matches in a Stack
I tested with a very large hard input and it was 40 milliseconds
public static void QuickReplaceTest()
{
string s = "Pibabakekezza";
string r = "bake";
Debug.WriteLine(QuickReplace2(s, r));
Debug.WriteLine(QuickReplace3(s, r));
s = "babakekePbakeibabbakeakebakekezbakezabake";
r = "b";
Debug.WriteLine(QuickReplace2(s, r));
Debug.WriteLine(QuickReplace3(s, r));
StringBuilder sb = new StringBuilder();
sb.Append("piz");
for (int i = 0; i < 100000; i++)
sb.Append("qrsuvw");
//sb.Append("piz");
for (int i = 0; i < 100000; i++)
sb.Append("xyz");
sb.Append("za");
s = sb.ToString();
sb.Clear();
r = "qrsuvwxyz";
Debug.WriteLine(QuickReplace2(s, r));
Debug.WriteLine(QuickReplace3(s, r));
s = s.Insert(400000, "pie");
Debug.WriteLine(QuickReplace2(s, r));
Debug.WriteLine(QuickReplace3(s, r));
}
public static string QuickReplace2(string s, string r)
{
if(string.IsNullOrEmpty(r))
return string.Empty;
if (string.IsNullOrEmpty(s))
return string.Empty;
s = s.Trim();
r = r.Trim();
if (r.Length > s.Length)
return string.Empty;
if (r.Length == 1)
return(s.Replace(r, ""));
StringBuilder sb = new StringBuilder(s.Length / 2);
char[] rArray = r.ToCharArray();
int rArrayLen = rArray.Length;
Stack<byte> partialMatches = new Stack<byte>();
byte curMatchCount = 0;
Stopwatch sw = new Stopwatch();
sw.Start();
foreach(char c in s)
{
//Debug.WriteLine(c);
if (c == rArray[0])
{
if (curMatchCount > 0)
partialMatches.Push(curMatchCount);
curMatchCount = 1;
}
else if (c == rArray[curMatchCount])
{
curMatchCount++;
if (curMatchCount == rArrayLen)
{
if (partialMatches.Count == 0)
curMatchCount = 0;
else
curMatchCount = partialMatches.Pop();
}
}
else
{
//need to unload the stack
if (partialMatches.Count > 0)
{
foreach (int count in partialMatches.Reverse())
for (int i = 0; i < count; i++)
sb.Append(rArray[i]);
partialMatches.Clear();
}
for (int i = 0; i < curMatchCount; i++)
sb.Append(rArray[i]);
curMatchCount = 0;
sb.Append(c);
}
}
if (partialMatches.Count > 0)
{
foreach (int count in partialMatches.Reverse())
for (int i = 0; i < count; i++)
sb.Append(rArray[i]);
}
for (int i = 0; i < curMatchCount; i++)
sb.Append(rArray[i]);
sw.Stop();
Debug.WriteLine(sw.ElapsedMilliseconds.ToString());
return sb.ToString();
}
public static string QuickReplace3(string s, string r)
{
if (string.IsNullOrEmpty(r))
return string.Empty;
if (string.IsNullOrEmpty(s))
return string.Empty;
s = s.Trim();
r = r.Trim();
if (r.Length > s.Length)
return string.Empty;
if (r.Length == 1)
return (s.Replace(r, ""));
StringBuilder sb = new StringBuilder(s.Length / 2);
char[] rArray = r.ToCharArray();
int rArrayLen = rArray.Length;
Stack<byte> partialMatches = new Stack<byte>();
byte curMatchCount = 0;
Stopwatch sw = new Stopwatch();
sw.Start();
foreach (char c in s)
{
sb.Append(c);
if (c == rArray[0])
{
if (curMatchCount > 0)
partialMatches.Push(curMatchCount);
curMatchCount = 1;
}
else if (c == rArray[curMatchCount])
{
curMatchCount++;
if (curMatchCount == rArrayLen)
{
sb.Length -= rArrayLen;
if (partialMatches.Count == 0)
curMatchCount = 0;
else
curMatchCount = partialMatches.Pop();
}
}
else
{
curMatchCount = 0;
partialMatches.Clear();
}
}
sw.Stop();
Debug.WriteLine(sw.ElapsedMilliseconds.ToString());
return sb.ToString();
}
-
\$\begingroup\$ Thanks! Unfortunately this is a little slower than my original post. Original code passes up to the 4th to last test case. This passes up to the 5th to last test case. \$\endgroup\$Seth Kitchen– Seth Kitchen2017年02月09日 01:13:08 +00:00Commented Feb 9, 2017 at 1:13
-
1\$\begingroup\$ @SethKitchen Wow, I was not sure. Need to try for a new approach. \$\endgroup\$paparazzo– paparazzo2017年02月09日 01:19:56 +00:00Commented Feb 9, 2017 at 1:19
-
\$\begingroup\$ +1: really fast :). Little bug: Input "babakekePbakeibabbakeakebakekezbakezabake" / "b" produces a
IndexOutOfRangeException.
\$\endgroup\$JanDotNet– JanDotNet2017年02月09日 14:35:54 +00:00Commented Feb 9, 2017 at 14:35 -
\$\begingroup\$ @JanDotNet Yes it breaks if the replace is only 1 character \$\endgroup\$paparazzo– paparazzo2017年02月09日 14:50:23 +00:00Commented Feb 9, 2017 at 14:50
-
\$\begingroup\$ If replace is only one character then can just use a single pass of String.Replace("r",""); \$\endgroup\$paparazzo– paparazzo2017年02月10日 10:18:55 +00:00Commented Feb 10, 2017 at 10:18
Messing around with the GC settings is a big red flag. If you think that's necessary for a programming challenge then you should instead look at the algorithm.
Sylvain Boisse makes a very good observation about the importance of the guarantee that
The characters in the substring are all different
and a good suggestion to use a stack. Observe that if we encounter a character which neither starts a potential match nor continues the current potential match then it blocks all (unremoved) characters before it from being part of a word, since they would have to span it. Therefore the stack can be cleared, and the code can be remarkably simple.
static string RepeatedRemoval(string initial, string removed)
{
if (removed.Length == 0) return initial;
StringBuilder sb = new StringBuilder(initial.Length);
char startRemoved = removed[0];
int[] stack = new int[initial.Length + 1];
int stackIdx = 0;
foreach (char ch in initial)
{
sb.Append(ch);
// Since all characters in removed are distinct, a match for removed[0] starts a new match
if (ch == startRemoved) stack[++stackIdx] = 1;
// A match for the next character expected extends the current match
else if (ch == removed[stack[stackIdx]]) ++stack[stackIdx];
// And any other character blocks everything up to now from being part of a match
else stackIdx = 0;
if (stack[stackIdx] == removed.Length)
{
--stackIdx;
sb.Length -= removed.Length;
}
}
return sb.ToString();
}
A minor optimisation can be obtained by handling the case removed.Length == 1
specially.
static string RepeatedRemoval(string initial, string removed)
{
if (removed.Length == 0) return initial;
StringBuilder sb = new StringBuilder(initial.Length);
char startRemoved = removed[0];
if (removed.Length == 1)
{
foreach (char ch in initial)
{
if (ch != startRemoved) sb.Append(ch);
}
return sb.ToString();
}
// The main case
int[] stack = new int[initial.Length + 1];
int stackIdx = 0;
foreach (char ch in initial)
{
sb.Append(ch);
// Since all characters in removed are distinct, a match for removed[0] starts a new match
if (ch == startRemoved) stack[++stackIdx] = 1;
else if (ch == removed[stack[stackIdx]])
{
// Extend the match by one char, and if it's a full match then pop.
if (++stack[stackIdx] == removed.Length)
{
--stackIdx;
sb.Length -= removed.Length;
}
}
// This char blocks everything up to now from being part of a match
else stackIdx = 0;
}
return sb.ToString();
}
-
\$\begingroup\$ Oh, I see now. If a match gets broken, then all stacked streaks must be broken. Yeah my handling of stack braking is unnecessarily complex then. +1! \$\endgroup\$MrBrushy– MrBrushy2017年02月09日 16:03:07 +00:00Commented Feb 9, 2017 at 16:03
-
\$\begingroup\$ Thanks this is the fastest solution given that works for all testcases! Yeah changing GC settings is just part of my C# programming contest template. Might as well be as fast as possible :) \$\endgroup\$Seth Kitchen– Seth Kitchen2017年02月09日 22:30:46 +00:00Commented Feb 9, 2017 at 22:30
Strings in C# are immutable, so every time you do s=s.Replace(b, "")
you are constructing a new object. To reduce memory churn, you should use StringBuilder.Replace()
instead.
You say it is an unknown test case, so suppose the bigString is equal to 500,000 "a"s followed by 500,000 "b"s, and the subString is "ab". It will take 500,000 iterations for the string to be completely reduced, and on the nth iteration it will take 500,000 - n steps to locate the substring if we're searching from the beginning as the Contains and Replace methods presumably do. But then the total number is steps is quadratic or about 1.25e11 which would explain why it is taking more than a second on an ordinary CPU. Maybe there is a different algorithm that can handle this case faster?
-
\$\begingroup\$ Yes this is true and is probably why I am not passing. I will try to think of a better algorithm \$\endgroup\$Seth Kitchen– Seth Kitchen2017年02月09日 05:13:15 +00:00Commented Feb 9, 2017 at 5:13
OK I'm no good at C#, I'll present an alternative solution is pseudo-code, using a Stack of matches.
The key is :
The characters in the substring are all different
Which means if we encounter the first char of the substring, it must be a new match starting (not the continuation of a previous match)
List<int> nonMatch = new List()
Stack<List<int>> partialMatches = new Stack<List<int>>()
List<int> currentPartialMatch = new List<int>(substring.length)
for i = i to bigString.length
if currentPartialMatch.size() == substring.length
// Full-match found! dropping it...
currentPartialMatch = partialMatches.pop()
if currentPartialMatch == null
currentPartialMatch = new List<int>(substring.length)
endIf
endIf
if bigString.charAt(i) == substring.charAt(0)
// A new sustring may to be starting
partialMatches.push(currentPartialMatch) // save the current match
currentPartialMatch = new List<int>(substring.length()) // start a new one
partialMatch.add(i)
else if bigString.charAt(i) == substring.charAt(currentPartialMatch.size())
// The current substring match is continuing
currentPartialMatch.add(i)
else
// The match streak is broken! We must un-pile all partial matches until the current char matches it
bool currentCharMatches = false
while !currentCharMatches and !partialMatches.isEmpty()
if bigString.charAt(i) == substring.charAt(currentPartialMatch.length))
// Found a match
currentCharMatches = true
currentPartialMatch.add(i)
breakWhile
else
// Still not matching
nonMatch.addAll(currentPartialMatch) // Dump the current (broken) match
// Pick up the previous matching streak (if any)
currentPartialMatch = partialMatches.pop()
endIf
endWhile
if currentPartialMatch == null
// No partial matches found, the stack is exhausted...
currentPartialMatch = new List<int>(substring.length()) // start a new one
nonMatch.add(i) // Dump the current character
endIf
endIf
endFor
StringBuilder builder = new StringBuilder(
// Reconstruct the list of non matches
for index in nonMatch.sorted()
builder.apend(bigString.charAt(index))
return builder.toString()
Quick-and-dirty complexity analysis:
- The
for
runs in linear time, the sorting inlog(n)
, so overallO(n)
time. - Memory-wise, each index is stored only once in only one List, so
O(n)
memory
If anyone want to edit this to make it more C#-friendly...
-
1\$\begingroup\$ The observation about the importance of distinct characters in the removed string and the way it enables using a stack is very good, but there's no need to use such a complicated stack. It suffices to store lengths of partial matches in the stack and pop completed matches from the end of the string builder. \$\endgroup\$Peter Taylor– Peter Taylor2017年02月09日 15:51:27 +00:00Commented Feb 9, 2017 at 15:51
-
\$\begingroup\$ @PeterTaylor That was my first assumption as well, but I'm not sure about that anymore. Since a partial Match could be interrupted by another, then to get the exact characters concerned by a match, you need to store them all. They might not be consecutive so a range would not suffice. I'm not sure, I'd need to run this, but my gut feeling is it is required. I hope I'm wrong though because that would be more elegant ^^ \$\endgroup\$MrBrushy– MrBrushy2017年02月09日 16:00:33 +00:00Commented Feb 9, 2017 at 16:00
It is possible to use a linked list of chars and navigate with the linked list nodes. That should be fast and uses minimal memory consumption.
public static string RemoveWordFromString(string content, string wordToRemove)
{
var chars = new LinkedList<char>(content);
var pointer = chars.First;
var wordLength = wordToRemove.Length;
do
{
while (IsMatch(pointer, wordToRemove))
{
pointer = Remove(pointer, wordLength);
pointer = MovePointerBackward(pointer, wordLength - 1);
}
pointer = pointer.Next;
} while (pointer != null);
return new String(chars.Select(c => c).ToArray());
}
private static LinkedListNode<char> MovePointerBackward(LinkedListNode<char> node, int length)
{
for (int i = 0; i < length; i++)
if (node.Previous != null)
node = node.Previous;
return node;
}
private static LinkedListNode<char> Remove(LinkedListNode<char> node, int length)
{
var list = node.List;
var prevNode = node.Previous;
LinkedListNode<char> nextNode = node.Next;
for (int i = 0; i < length; i++)
{
nextNode = node.Next;
list.Remove(node);
node = nextNode;
}
return prevNode ?? node;
}
private static bool IsMatch(LinkedListNode<char> node, string wordToRemove)
{
if (node == null)
return false;
var currentNode = node;
for (int i = 0; i < wordToRemove.Length; i++)
{
if (currentNode?.Value != wordToRemove[i])
return false;
currentNode = currentNode.Next;
}
return true;
}
-
1\$\begingroup\$ +1 I think key is to move backwards by word-to-removes length (-1) as you did. \$\endgroup\$SchreiberLex– SchreiberLex2017年02月09日 12:48:17 +00:00Commented Feb 9, 2017 at 12:48
-
\$\begingroup\$ Yes. StringBuilder.Replace prevents from unnecessary string creation, but it iterates each time over the whole string... And you are right - go backwards by length-1 is enough ;) I'll fix that. Thanks! \$\endgroup\$JanDotNet– JanDotNet2017年02月09日 13:08:26 +00:00Commented Feb 9, 2017 at 13:08
-
\$\begingroup\$ I ran the (hard) test case in my answer and got 775 milliseconds \$\endgroup\$paparazzo– paparazzo2017年02月09日 14:17:07 +00:00Commented Feb 9, 2017 at 14:17
-
\$\begingroup\$ @Paparazzi: You are right: Your solution is over 10 times faster! But I suppose that the linked list consumes less memory. \$\endgroup\$JanDotNet– JanDotNet2017年02月09日 14:33:42 +00:00Commented Feb 9, 2017 at 14:33
-
\$\begingroup\$ @Paparazzi: I compared the memory consumption of our solutions (using this method) and your solution needs ~25% less memory ;) \$\endgroup\$JanDotNet– JanDotNet2017年02月09日 14:45:01 +00:00Commented Feb 9, 2017 at 14:45
I suggest the following algorithm:
static string RemoveAll(string haystack, string needle)
{
var sb = new StringBuilder();
foreach (var c in haystack) {
sb.Append(c);
if (EndsWith(sb, needle))
sb.Length -= needle.Length;
}
return sb.ToString();
}
static bool EndsWith(StringBuilder sb, string needle)
{
if (sb.Length < needle.Length)
return false;
for (var i = 0; i < needle.Length; i++)
if (sb[sb.Length - needle.Length + i] != needle[i])
return false;
return true;
}
It doesn't even exploit the fact that needle
contains only unique characters, but could do so for further optimization.
If you want to trade readability for speed, you can use this code:
static string RemoveAllFast(string haystack, string needle)
{
if (needle == "")
return haystack;
var sb = new char[haystack.Length];
var sbLength = 0;
var lastChar = needle[needle.Length - 1];
var needleStart = needle.Substring(0, needle.Length - 1);
var needleStartLength = needleStart.Length;
foreach (var c in haystack) {
if (c == lastChar && EndsWithFast(sb, sbLength, needleStart)) {
sbLength -= needleStartLength;
} else {
sb[sbLength++] = c;
}
}
return new string(sb, 0, sbLength);
}
static bool EndsWithFast(char[] sb, int sbLength, string needle)
{
var needleLength = needle.Length;
if (sbLength < needleLength)
return false;
for (var i = 0; i < needleLength; i++) {
if (sb[sbLength - needleLength + i] != needle[i])
return false;
}
return true;
}
It takes about 35% of the time of the above code. On the other hand, the beauty and simplicity of the main idea is completely hidden between all the small optimizations.
-
\$\begingroup\$ Using
foreach (char c in haystack)
instead offor (char c : haystack)
makes the code understandable for a C# compiler ;). However the algorithm doesn't work... tryRemoveAll("PizzababakekePbakeibabbakeakebakekezbakezabakePizza", "bake")
for example. Without the string 'Pizza' at the start and end, your code throws an exception. \$\endgroup\$JanDotNet– JanDotNet2017年02月10日 06:46:58 +00:00Commented Feb 10, 2017 at 6:46 -
\$\begingroup\$ Thanks for catching this. Obviously I meant to check
sb.EndsWith
instead ofhaystack.EndsWith
. :) \$\endgroup\$Roland Illig– Roland Illig2017年02月10日 06:56:33 +00:00Commented Feb 10, 2017 at 6:56 -
\$\begingroup\$ Unfortunately,
StringBuilder
doesn't have a methodEndsWith
;). However, usingsb.ToString().EndsWith(needle)
seems to work (It should be possible to implement a fastEndsWith
for StringBuilder). Didn't checked the performance - but IMHO it is the most elegant way right now... +1 \$\endgroup\$JanDotNet– JanDotNet2017年02月10日 07:17:34 +00:00Commented Feb 10, 2017 at 7:17 -
\$\begingroup\$ @JanDotNet: Did you intentionally put a
\u200B\u200C
into your test string (between the last twozz
)? \$\endgroup\$Roland Illig– Roland Illig2017年02月10日 07:19:02 +00:00Commented Feb 10, 2017 at 7:19 -
\$\begingroup\$ It seems so. wasn't intentional.. sorry ;) However a solid algorithm should handle that correctly! ;P \$\endgroup\$JanDotNet– JanDotNet2017年02月10日 07:32:46 +00:00Commented Feb 10, 2017 at 7:32
I'm not exactly sure if I understand the challenge, but couldn't you just count the length of the string and just grab the left and right? Basically, ignore all the stuff in the middle, regardless of it's contents?
public static void MySillySolution()
{
string s = "Pibabakekezza";
string r;
StringBuilder sb = new StringBuilder();
sb.Append("piz");
for (int i = 0; i < 100000; i++)
sb.Append("qrsuvw");
for (int i = 0; i < 100000; i++)
sb.Append("xyz");
sb.Append("za");
s = sb.ToString();
sb.Clear();
r = "qrsuvwxyz";
Stopwatch sw = new Stopwatch();
sw.Start();
string solution = s.Substring(0, 3) + s.Substring(s.Length-2, 2);
Debug.WriteLine(solution);
sw.Stop();
Debug.WriteLine(sw.ElapsedMilliseconds.ToString());
Debug.WriteLine(sb.ToString());
}
I'm getting near instantaneous results with this solution... But I feel like I have totally missed the understanding of the challenge...
-
1\$\begingroup\$ What about strings like 'babakekePbakeibabbakeakebakekezbakezabake'? ;) \$\endgroup\$JanDotNet– JanDotNet2017年02月09日 17:10:41 +00:00Commented Feb 9, 2017 at 17:10
-
1\$\begingroup\$ @Seth Kitchen -- maybe you clarify the challenge a bit more? \$\endgroup\$Svek– Svek2017年02月09日 17:12:37 +00:00Commented Feb 9, 2017 at 17:12
-
\$\begingroup\$ @Svek yes you misunderstand. The substring doesn't have to be in the middle. as JanDotNet has said. The substring could be at the front in the middle and at the end or anywhere in between \$\endgroup\$Seth Kitchen– Seth Kitchen2017年02月09日 21:31:13 +00:00Commented Feb 9, 2017 at 21:31
QuickReplace 2 is a little faster than repeated removal, but fails some test cases.
Thank you all for the amazing help and suggestions. I ran the best two solutions in release. I will keep this updated if more answers are given :):
Starting watch for repeated removal:
RepeatedRemoval took:30
RepeatedRemoval answer: pizza
Starting watch for QuickReplace2:
QuickReplace2 took:18
QuickReplace2 answer: pizza
Starting watch for Original:
Original took:394065
Original answer: pizza
Explore related questions
See similar questions with these tags.
the characters in the substring are all different
(you're not using this hint). When the string is fully browsed, all remaining characters are the answer. This doesn't even need to split/instantiate strings. Sorry, not proficient enough in C# to propose a full answer. \$\endgroup\$