Check syntax (string and character comparison)

Question 1

I would like to know if my function that checks if a string syntax is correct. It's currently very messy, but seems to do the job.

//Example Valid String
[12345678]
{
something
whatever
}

Function:

private static bool CheckIfSyntaxIsValid(string input)
 {
 int mode = 1;
 char last = ' ';
 foreach (char c in input)
 {
 if (last == '{' && c == '}')
 {
 last = ' ';
 mode = 1;
 continue;
 }
 if (last == '{' && c == '{')
 return false;
 if (last == '[' && c == ']')
 {
 last = ' ';
 mode = 2;
 continue;
 }
 if (last == '[' && !Char.IsDigit(c))
 return false;
 if (c == '[')
 {
 if (mode == 2)
 return false;
 last = c;
 mode = 1;
 }
 if (c == '{')
 {
 if (mode == 1)
 return false;
 last = c;
 mode = 2;
 }
 if (c == ']' || c == '}')
 return false;
 }
 if (last != ' ')
 return false;
 return true;
 }

The rule is basically:

[] must contain only numbers/digits
It must start with [] before {}, and they must come after each other

So, [] {} [] {} etc. Anything else is simply invalid.

Okay, not everything else is invalid. Text outside of those are ignored, so the only thing that matters is that the bracket orders are correct, etc.

Question 2

Is this for validating a config file or something? Is there a reason why you want to validate the string instead of parsing it?

Question 3

No clue what parsing means in this context sadly, if there is any other way to do this i am all ear. This simply validates a file before it's processed to make sure it's at least syntax correct.

Question 4

So that I can better understand what you are trying to do, could you explain why you consider that example to be "valid"? It's a valid ... what? A valid product description? A valid patient record? Your example seems very fuzzy and contrived.

Question 5

Valid for compilation in a Compiler. All Code is within {}, and the Address for each code is within [] (right above the code. That's the system structure i want to validate:) (If the code is invalid and whatnot is of no concern within this validation)

Question 6

Would [208208]{ something, {whatever}} be valid, or invalid input?

Question 7

Let me post another answer because this is completely different from the first one.

You want to have it less messy so how about this solution?

I introduced a simple Token class that stores the token char and its position.

private class Token
{
 public Token(char value, int position)
 {
 Value = value;
 Position = position;
 }
 public char Value { get; }
 public int Position { get; }
}

The new SyntaxService does the following now:

it finds all tokens
it checks if the number is even and divisible by 4 ([]{}.Length)
it checks if each batch of four tokens has the valid sequence []{}
it checks if the attribute isn't empty
it checks if the attribute contains all digits
it checks if the {} block is not-empty

Example:

public class SyntaxService
{
 private const string separators = "[]{}";
 public bool ValidateSyntax(string value)
 {
 var tokens = FindTokens(value).ToList();
 return ValidateTokens(value, tokens);
 }
 private IEnumerable<Token> FindTokens(string value)
 {
 for (int i = 0; i < value.Length; i++)
 {
 if (separators.Contains(value[i]))
 {
 yield return new Token(value[i], i);
 }
 }
 }
 private bool ValidateTokens(string value, IList<Token> tokens)
 {
 var isValidTokenCount = tokens.Count % separators.Length == 0;
 if (!isValidTokenCount)
 {
 return false;
 }
 var substring = new Func<int, int, string>((start, end) => value.Substring(start + 1, end - start - 1));
 for (var i = 0; i < tokens.Count; i += separators.Length)
 {
 var batch = tokens.SkipFast(i).Take(separators.Length).ToArray();
 var isValidTokenSequence = batch.Select(x => x.Value).SequenceEqual(separators);
 if (!isValidTokenSequence)
 {
 return false;
 }
 const int squareBracketLeft = 0;
 const int squareBracketRight = 1;
 const int curlyBracketLeft = 2;
 const int curlyBracketRight = 3;
 var attribute = substring(batch[squareBracketLeft].Position, batch[squareBracketRight].Position);
 if (string.IsNullOrWhiteSpace(attribute))
 {
 return false;
 }
 if (!attribute.All(x => Char.IsDigit(x)))
 {
 return false;
 }
 var content = substring(batch[curlyBracketLeft].Position, batch[curlyBracketRight].Position);
 if (string.IsNullOrWhiteSpace(content))
 {
 return false;
 }
 }
 return true;
 }
 private class Token
 {
 public Token(char value, int position)
 {
 Value = value;
 Position = position;
 }
 public char Value { get; }
 public int Position { get; }
 }
}

My helper extension for SkipFast for lists:

static class Extensions
{
 public static IEnumerable<T> SkipFast<T>(this IList<T> values, int index)
 {
 for (var i = index; i < values.Count; i++)
 {
 yield return values[i];
 }
 }
}

Usage:

var isValid = new SyntaxService().ValidateSyntax(text);

This seems to work even better then the original method. It correctly recognizes invalid syntax where the original one thinks it's valid.

Question 8

I will mark this as an Answer cause it gives much insight and i can learn from it in this context and take useful things from it:)!

Question 9

I suggest making the code more expressive by encapsulating the conditions as lambdas before the loop.

You should avoid magic numbers and for the mode create constants or an enum.

In the following example the {} stand for a block and [] for an attribute.

private static bool CheckIfSyntaxIsValid(string input)
{
 var mode = 1;
 var last = ' ';
 var isEndOfEmptyBlock = new Func<char, bool>(c => last == '{' && c == '}');
 var isBeginOfNestedBlock = new Func<char, bool>(c => last == '{' && c == '{');
 var isEndOfEmptyAttribute = new Func<char, bool>(c => last == '[' && c == ']');
 var isNotDigitInsideAttribute = new Func<char, bool>(c => last == '[' && !Char.IsDigit(c));
 var isAttributeBegin = new Func<char, bool>(c => c == '[');
 var isAttributeEnd = new Func<char, bool>(c => c == ']');
 var isBlockBegin = new Func<char, bool>(c => c == '{');
 var isBlockEnd = new Func<char, bool>(c => c == '}');
 foreach (char c in input)
 {
 if (isEndOfEmptyBlock(c))
 {
 last = ' ';
 mode = 1;
 continue;
 }
 if (isBeginOfNestedBlock(c))
 {
 return false;
 }
 if (isEndOfEmptyAttribute(c))
 {
 last = ' ';
 mode = 2;
 continue;
 }
 if (isNotDigitInsideAttribute(c))
 {
 return false;
 }
 if (isAttributeBegin(c))
 {
 if (mode == 2)
 {
 return false;
 }
 last = c;
 mode = 1;
 }
 if (isBlockBegin(c))
 {
 if (mode == 1)
 {
 return false;
 }
 last = c;
 mode = 2;
 }
 if (isAttributeEnd(c) || isBlockEnd(c))
 {
 return false;
 }
 }
 if (last != ' ')
 {
 return false;
 }
 return true;
}

Question 10

This does simplify it in a sense, didn't think you could split it up in functions as variables like that, it's still very messy though (not your fault), i will give this a +1 for now at least:)

Question 11

@Zerowalker maybe if you could explain the "mode" we could do more for you ;-)

Question 12

Mode doesn't really mean anything, it's just a value so i can keep track with the "ifs". Cause if you look at my Example text, you can see that there is a order of things.

Question 13

I noticed that using Func instead of manually writing the "bool if things" gives quite a huge performance hit (not that It really matters in my case). Though didn't think it was that huge (more than twice as slow in my tests)

Question 14

@Zerowalker that's interesting, I didn't test it for performance, I wouldn't even expect it to be so bad. How big are your strings that you validate? I need to digg deeper and maybe learn something new too ;-)

t3chb0t t3chb0t 44.7k9 gold badges84 silver badges190 bronze badges · Accepted Answer · 2016-11-14 21:04:58Z

Let me post another answer because this is completely different from the first one.

You want to have it less messy so how about this solution?

I introduced a simple Token class that stores the token char and its position.

private class Token
{
 public Token(char value, int position)
 {
 Value = value;
 Position = position;
 }
 public char Value { get; }
 public int Position { get; }
}

The new SyntaxService does the following now:

it finds all tokens
it checks if the number is even and divisible by 4 ([]{}.Length)
it checks if each batch of four tokens has the valid sequence []{}
it checks if the attribute isn't empty
it checks if the attribute contains all digits
it checks if the {} block is not-empty

Example:

public class SyntaxService
{
 private const string separators = "[]{}";
 public bool ValidateSyntax(string value)
 {
 var tokens = FindTokens(value).ToList();
 return ValidateTokens(value, tokens);
 }
 private IEnumerable<Token> FindTokens(string value)
 {
 for (int i = 0; i < value.Length; i++)
 {
 if (separators.Contains(value[i]))
 {
 yield return new Token(value[i], i);
 }
 }
 }
 private bool ValidateTokens(string value, IList<Token> tokens)
 {
 var isValidTokenCount = tokens.Count % separators.Length == 0;
 if (!isValidTokenCount)
 {
 return false;
 }
 var substring = new Func<int, int, string>((start, end) => value.Substring(start + 1, end - start - 1));
 for (var i = 0; i < tokens.Count; i += separators.Length)
 {
 var batch = tokens.SkipFast(i).Take(separators.Length).ToArray();
 var isValidTokenSequence = batch.Select(x => x.Value).SequenceEqual(separators);
 if (!isValidTokenSequence)
 {
 return false;
 }
 const int squareBracketLeft = 0;
 const int squareBracketRight = 1;
 const int curlyBracketLeft = 2;
 const int curlyBracketRight = 3;
 var attribute = substring(batch[squareBracketLeft].Position, batch[squareBracketRight].Position);
 if (string.IsNullOrWhiteSpace(attribute))
 {
 return false;
 }
 if (!attribute.All(x => Char.IsDigit(x)))
 {
 return false;
 }
 var content = substring(batch[curlyBracketLeft].Position, batch[curlyBracketRight].Position);
 if (string.IsNullOrWhiteSpace(content))
 {
 return false;
 }
 }
 return true;
 }
 private class Token
 {
 public Token(char value, int position)
 {
 Value = value;
 Position = position;
 }
 public char Value { get; }
 public int Position { get; }
 }
}

My helper extension for SkipFast for lists:

static class Extensions
{
 public static IEnumerable<T> SkipFast<T>(this IList<T> values, int index)
 {
 for (var i = index; i < values.Count; i++)
 {
 yield return values[i];
 }
 }
}

Usage:

var isValid = new SyntaxService().ValidateSyntax(text);

This seems to work even better then the original method. It correctly recognizes invalid syntax where the original one thinks it's valid.

I will mark this as an Answer cause it gives much insight and i can learn from it in this context and take useful things from it:)!

Stack Exchange Network

Check syntax (string and character comparison)

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Check syntax (string and character comparison)

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions