Rubberduck VBA Parser, Episode VI: Return of the Abstraction
VBA comment syntax is fun... and VBA line continuation makes it even more fun.
Picture a VBA module like this:
Rem this is an old-style comment.
' this is a more standard comment
Rem this _
is _
a _
multiline _
comment

Private Sub Foo() ' this _
is _
also _
a _
multiline _
comment _
_
...don't do this at home.
End Sub

'@TestMethod
Private Sub Bar()
    ' todo: call Foo
End Sub
(no wonder syntax highlighting is getting confused!)
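To make the continuation rule concrete, here is a minimal sketch (not Rubberduck code) that folds physical lines into logical lines. The `LogicalLines` class and its `Fold` method are hypothetical names used only for illustration, and the sketch assumes the continuation marker is an underscore preceded by a space at the end of a line, which is stricter than a plain "ends with underscore" check.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class LogicalLines
{
    // Joins each run of continued physical lines into one logical line.
    // Assumption: a line continues when it ends with " _" (trailing whitespace ignored).
    public static IEnumerable<string> Fold(IEnumerable<string> physicalLines)
    {
        var buffer = new StringBuilder();
        foreach (var line in physicalLines)
        {
            var trimmed = line.TrimEnd();
            if (trimmed.EndsWith(" _", StringComparison.Ordinal))
            {
                // Drop the underscore but keep the space that separates the two parts.
                buffer.Append(trimmed, 0, trimmed.Length - 1);
                continue;
            }

            buffer.Append(trimmed);
            yield return buffer.ToString();
            buffer.Clear();
        }

        // A dangling continuation on the last line still yields what was collected.
        if (buffer.Length > 0)
        {
            yield return buffer.ToString();
        }
    }
}
```

Fed the `Rem this _` / `is _` / ... lines from the module above, `Fold` would hand back the whole comment as a single line, which is what makes a comment that started several lines earlier hard for a line-by-line highlighter to spot.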
If you don't know what Rubberduck is: Rubberduck is a COM add-in for the VBE / VBA's IDE that I'm building with ...@RubberDuck. I have a branch where I've burned the whole parser namespace and replaced it with ANTLR-generated code.
The only problem is that the .g4 VB6 grammar file I'm using to generate the parser does not support comments. So I ended up [re-]inserting an abstraction layer between ANTLR's IParseTree and the rest of Rubberduck, albeit very differently this time.
I added two methods to the IRubberduckParser interface:
/// <summary>
/// Parses all code modules in specified project.
/// </summary>
/// <returns>Returns an <c>IParseTree</c> for each code module in the project; the qualified module name being the key.</returns>
IEnumerable<VbModuleParseResult> Parse(VBProject vbProject);
IEnumerable<CommentNode> ParseComments(VBComponent vbComponent);
The VbModuleParseResult class encapsulates a module's IParseTree and its CommentNodes:
public class VbModuleParseResult
{
    public VbModuleParseResult(QualifiedModuleName qualifiedName, IParseTree parseTree, IEnumerable<CommentNode> comments)
    {
        _qualifiedName = qualifiedName;
        _parseTree = parseTree;
        _comments = comments;
    }

    private readonly QualifiedModuleName _qualifiedName;
    public QualifiedModuleName QualifiedName { get { return _qualifiedName; } }

    private IParseTree _parseTree;
    public IParseTree ParseTree { get { return _parseTree; } }

    private IEnumerable<CommentNode> _comments;
    public IEnumerable<CommentNode> Comments { get { return _comments; } }
}
That object is returned by the VBParser methods that the rest of Rubberduck uses (via IRubberduckParser):
public class VBParser : IRubberduckParser
{
    public INode Parse(string projectName, string componentName, string code)
    {
        var result = Parse(code);
        var walker = new ParseTreeWalker();
        var listener = new VBTreeListener(projectName, componentName);
        walker.Walk(listener, result);
        return listener.Root;
    }

    public IParseTree Parse(string code)
    {
        var input = new AntlrInputStream(code);
        var lexer = new VisualBasic6Lexer(input);
        var tokens = new CommonTokenStream(lexer);
        var parser = new VisualBasic6Parser(tokens);
        var result = parser.startRule();
        return result;
    }

    public IEnumerable<VbModuleParseResult> Parse(VBProject project)
    {
        return project.VBComponents.Cast<VBComponent>()
            .Select(component => new VbModuleParseResult(new QualifiedModuleName(project.Name, component.Name),
                Parse(component.CodeModule.ToString()), ParseComments(component)));
    }
And here is the IEnumerable<CommentNode> ParseComments(VBComponent component) implementation:
    public IEnumerable<CommentNode> ParseComments(VBComponent component)
    {
        var code = component.CodeModule.Code();
        var qualifiedName = new QualifiedModuleName(component.Collection.Parent.Name, component.Name);

        var commentBuilder = new StringBuilder();
        var continuing = false;

        var startLine = 0;
        var startColumn = 0;

        for (var i = 0; i < code.Length; i++)
        {
            var line = code[i];
            var index = 0;

            if (continuing || line.HasComment(out index))
            {
                startLine = continuing ? startLine : i;
                startColumn = continuing ? startColumn : index;

                var commentLength = line.Length - index;

                continuing = line.EndsWith("_");
                if (!continuing)
                {
                    commentBuilder.Append(line.Substring(index, commentLength).TrimStart());
                    var selection = new Selection(startLine + 1, startColumn + 1, i + 1, line.Length);

                    var result = new CommentNode(commentBuilder.ToString(), new QualifiedSelection(qualifiedName, selection));
                    commentBuilder.Clear();

                    yield return result;
                }
                else
                {
                    // ignore line continuations in comment text:
                    commentBuilder.Append(line.Substring(index, commentLength).TrimStart());
                }
            }
        }
    }
}
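ParseComments leans on a HasComment extension method that isn't shown here. For readers following along, the sketch below is one way such a method could look; it is an assumption written to match the `line.HasComment(out index)` call, not the actual Rubberduck implementation. It reports the position of the first apostrophe outside a string literal, or of a `Rem` keyword at the start of a statement. The `CommentDetection` class name and the private `IsRemKeyword` helper are made up for the example.

```csharp
using System;

// Hypothetical sketch; the real extension method may differ.
public static class CommentDetection
{
    // Returns true and the zero-based comment start index, or false and -1.
    public static bool HasComment(this string line, out int index)
    {
        var inString = false;
        var atStatementStart = true; // true until the first non-blank character of a statement

        for (var i = 0; i < line.Length; i++)
        {
            var c = line[i];
            if (inString)
            {
                // A doubled quote ("") toggles twice and so stays inside the literal.
                if (c == '"') inString = false;
                continue;
            }
            if (c == '"')
            {
                inString = true;
                atStatementStart = false;
                continue;
            }
            if (c == '\'' || (atStatementStart && IsRemKeyword(line, i)))
            {
                index = i;
                return true;
            }
            if (c == ':') atStatementStart = true; // statement separator
            else if (!char.IsWhiteSpace(c)) atStatementStart = false;
        }

        index = -1;
        return false;
    }

    // "Rem" counts only as a whole word: followed by whitespace or the end of the line.
    private static bool IsRemKeyword(string line, int start)
    {
        return string.Compare(line, start, "Rem", 0, 3, StringComparison.OrdinalIgnoreCase) == 0
            && (start + 3 == line.Length || char.IsWhiteSpace(line[start + 3]));
    }
}
```

With something like this in place, an apostrophe inside a string literal (as in `s = "it's"`) is not mistaken for a comment, and an identifier such as `Remove` is not mistaken for `Rem`.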
The code works perfectly:
[Image: debugging session showing the parsed CommentNodes]
...but am I correct to read this last method and think there might be room for improvement? Does anything else jump out at you?