replaced http://codereview.stackexchange.com/ with https://codereview.stackexchange.com/

edited Apr 13, 2017 at 12:40

If you don't know what Rubberduck is: Rubberduck is a COM add-in for the VBE / VBA's IDE that I'm building with ...@RubberDuck. I have a branch where I've burned the whole parser namespace and replaced it with ANTLR-generated code.

edited tags

edited May 7, 2015 at 6:15

Mathieu Guindon

asked Feb 4, 2015 at 7:22

Mathieu Guindon

Rubberduck VBA Parser, Episode VI: Return of the Abstraction

VBA comment syntax is fun... and VBA line continuation makes it even more fun.

Picture a VBA module like this:

Rem this is an old-style comment.
' this is a more standard comment
Rem this _
 is _
 a _
 multiline _
 comment
 
Private Sub Foo() ' this _
 is _
 also _
 a _
 multiline _
 comment _
 _
...don't do this at home.
 
End Sub
'@TestMethod
Private Sub Bar()
 ' todo: call Foo
End Sub

(no wonder syntax highlighting is getting confused!)
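To make the two comment forms above concrete, here is a minimal sketch of how the start of a comment might be located on a single line. `CommentDetection` and `TryFindCommentStart` are hypothetical names made up for this illustration; this is not Rubberduck's actual code. It assumes a comment starts either at an apostrophe outside a string literal, or at a `Rem` keyword that is the first token of a statement.

```csharp
using System;

public static class CommentDetection
{
    // Hypothetical helper (an assumption for illustration, not Rubberduck's code).
    // Returns true and the zero-based index where the comment starts.
    public static bool TryFindCommentStart(string line, out int index)
    {
        var inString = false;
        for (var i = 0; i < line.Length; i++)
        {
            var c = line[i];
            if (c == '"')
            {
                // A doubled quote inside a literal toggles twice, so the state stays correct.
                inString = !inString;
            }
            else if (!inString && (c == '\'' || IsRemKeywordAt(line, i)))
            {
                index = i;
                return true;
            }
        }

        index = -1;
        return false;
    }

    // "Rem" only counts when it is the first token of a statement: at the start of
    // the line (ignoring indentation) or right after a ':' separator, and when it is
    // followed by whitespace or the end of the line.
    private static bool IsRemKeywordAt(string line, int i)
    {
        if (string.Compare(line, i, "Rem", 0, 3, StringComparison.OrdinalIgnoreCase) != 0)
        {
            return false;
        }

        var end = i + 3;
        if (end < line.Length && !char.IsWhiteSpace(line[end]))
        {
            return false;
        }

        for (var j = i - 1; j >= 0; j--)
        {
            if (line[j] == ':') return true;
            if (!char.IsWhiteSpace(line[j])) return false;
        }

        return true;
    }
}
```

With this sketch, an apostrophe inside a string literal (as in `x = "it's"`) is skipped, and an identifier such as `Remainder` is not mistaken for a `Rem` comment. Continued lines are a separate concern and are not handled here.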

The only problem is that the .g4 VB6 grammar file I'm using to generate the parser does not support comments. So I ended up [re-]inserting an abstraction layer between ANTLR's IParseTree and the rest of Rubberduck... albeit very differently this time.

I added two methods to the IRubberduckParser interface:

/// <summary>
/// Parses all code modules in specified project.
/// </summary>
/// <returns>Returns an <c>IParseTree</c> for each code module in the project; the qualified module name being the key.</returns>
IEnumerable<VbModuleParseResult> Parse(VBProject vbProject);
IEnumerable<CommentNode> ParseComments(VBComponent vbComponent);

The VbModuleParseResult class encapsulates a module's IParseTree and its CommentNodes:

public class VbModuleParseResult
{
    public VbModuleParseResult(QualifiedModuleName qualifiedName, IParseTree parseTree, IEnumerable<CommentNode> comments)
    {
        _qualifiedName = qualifiedName;
        _parseTree = parseTree;
        _comments = comments;
    }

    private readonly QualifiedModuleName _qualifiedName;
    public QualifiedModuleName QualifiedName { get { return _qualifiedName; } }

    private IParseTree _parseTree;
    public IParseTree ParseTree { get { return _parseTree; } }

    private IEnumerable<CommentNode> _comments;
    public IEnumerable<CommentNode> Comments { get { return _comments; } }
}

That object is returned by the VBParser methods that the rest of Rubberduck uses (via IRubberduckParser):

public class VBParser : IRubberduckParser
{
    public INode Parse(string projectName, string componentName, string code)
    {
        var result = Parse(code);
        var walker = new ParseTreeWalker();

        var listener = new VBTreeListener(projectName, componentName);
        walker.Walk(listener, result);
        return listener.Root;
    }

    public IParseTree Parse(string code)
    {
        var input = new AntlrInputStream(code);
        var lexer = new VisualBasic6Lexer(input);
        var tokens = new CommonTokenStream(lexer);
        var parser = new VisualBasic6Parser(tokens);

        var result = parser.startRule();
        return result;
    }

    public IEnumerable<VbModuleParseResult> Parse(VBProject project)
    {
        return project.VBComponents.Cast<VBComponent>()
            .Select(component => new VbModuleParseResult(new QualifiedModuleName(project.Name, component.Name),
                Parse(component.CodeModule.ToString()), ParseComments(component)));
    }

Here we are, the IEnumerable<CommentNode> ParseComments(VBComponent component) implementation:

    public IEnumerable<CommentNode> ParseComments(VBComponent component)
    {
        var code = component.CodeModule.Code();
        var qualifiedName = new QualifiedModuleName(component.Collection.Parent.Name, component.Name);
        var commentBuilder = new StringBuilder();
        var continuing = false;
        var startLine = 0;
        var startColumn = 0;
        for (var i = 0; i < code.Length; i++)
        {
            var line = code[i];
            var index = 0;
            if (continuing || line.HasComment(out index))
            {
                startLine = continuing ? startLine : i;
                startColumn = continuing ? startColumn : index;
                var commentLength = line.Length - index;
                continuing = line.EndsWith("_");
                if (!continuing)
                {
                    commentBuilder.Append(line.Substring(index, commentLength).TrimStart());
                    var selection = new Selection(startLine + 1, startColumn + 1, i + 1, line.Length);
                    var result = new CommentNode(commentBuilder.ToString(), new QualifiedSelection(qualifiedName, selection));
                    commentBuilder.Clear();

                    yield return result;
                }
                else
                {
                    // ignore line continuations in comment text:
                    commentBuilder.Append(line.Substring(index, commentLength).TrimStart());
                }
            }
        }
    }
}
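To show the continuation handling in isolation, here is a simplified sketch of the same loop working on plain strings instead of a VBComponent. `CommentJoiner` and `JoinComments` are hypothetical names used only for this illustration. The sketch makes three assumptions that differ from the method above: it finds comments with a naive `IndexOf('\'')` (so it ignores `Rem` and string literals), it treats only a trailing underscore preceded by a space as a continuation, and it strips that marker from the joined text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CommentJoiner
{
    // Hypothetical, simplified stand-in for ParseComments: yields one string
    // per comment, with continued lines joined by single spaces.
    public static IEnumerable<string> JoinComments(IEnumerable<string> lines)
    {
        var builder = new StringBuilder();
        var continuing = false;

        foreach (var line in lines)
        {
            // Naive detection (assumption): the first apostrophe starts the comment.
            // While continuing, the whole line belongs to the comment.
            var index = continuing ? 0 : line.IndexOf('\'');
            if (index < 0)
            {
                continue;
            }

            var text = line.Substring(index).Trim();

            // Assumption: a continuation marker is an underscore preceded by a space.
            continuing = line.EndsWith(" _", StringComparison.Ordinal);
            if (continuing)
            {
                // Drop the trailing marker so it does not end up in the comment text.
                text = text.Substring(0, text.Length - 1).TrimEnd();
            }

            if (text.Length > 0)
            {
                if (builder.Length > 0) builder.Append(' ');
                builder.Append(text);
            }

            if (!continuing)
            {
                yield return builder.ToString();
                builder.Clear();
            }
        }

        // A module that ends on a continued line still yields what was collected.
        if (builder.Length > 0)
        {
            yield return builder.ToString();
        }
    }
}
```

Feeding it a procedure header with a comment continued over several lines, followed by `End Sub` and `'@TestMethod`, should yield two strings: the joined multiline comment and the annotation. Position tracking (the Selection in the method above) is left out to keep the sketch short.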

The code works perfectly:

[Image: debugging session showing the parsed CommentNodes]

...but am I correct to read this last method and think there might be room for improvement? Does anything else jump out at you?

c# parsing