The Rubberduck project is coming along pretty nicely, and for the next version the parsing strategy is getting yet another revision, and this time around it really feels like it's done right.
Or is it?
The idea is to walk the parse tree once to locate every single declaration.
Here are the types of declarations we're talking about:
namespace Rubberduck.Parsing.Symbols
{
public enum DeclarationType
{
Module,
Class,
Procedure,
Function,
PropertyGet,
PropertyLet,
PropertySet,
Parameter,
Variable,
Constant,
Enumeration,
EnumerationMember,
Event,
UserDefinedType,
UserDefinedTypeMember,
LibraryFunction
}
}
And here's what a Declaration
looks like:
namespace Rubberduck.Parsing.Symbols
{
/// <summary>
/// Defines a declared identifier.
/// </summary>
public class Declaration
{
public Declaration(QualifiedMemberName qualifiedName, string parentScope,
string identifierName, string asTypeName, bool isSelfAssigned,
Accessibility accessibility, DeclarationType declarationType, ParserRuleContext context)
{
_qualifiedName = qualifiedName;
_parentScope = parentScope;
_identifierName = identifierName;
_asTypeName = asTypeName;
_isSelfAssigned = isSelfAssigned;
_accessibility = accessibility;
_declarationType = declarationType;
_selection = context.GetSelection();
_context = context;
}
private readonly QualifiedMemberName _qualifiedName;
public QualifiedMemberName QualifiedName { get { return _qualifiedName; } }
private readonly ParserRuleContext _context;
public ParserRuleContext Context { get { return _context; } }
private readonly IList<IdentifierReference> _references = new List<IdentifierReference>();
public IEnumerable<IdentifierReference> References { get { return _references; } }
public void AddReference(IdentifierReference reference)
{
_references.Add(reference);
}
private readonly Selection _selection;
/// <summary>
/// Gets a <c>Selection</c> representing the position of the declaration in the code module.
/// </summary>
/// <remarks>
/// Returns <c>default(Selection)</c> for module identifiers.
/// </remarks>
public Selection Selection { get { return _selection; } }
/// <summary>
/// Gets an <c>int</c> representing the VBProject the declaration is made in.
/// </summary>
/// <remarks>
/// This property is intended to differenciate identically-named VBProjects.
/// </remarks>
public int ProjectHashCode { get { return _qualifiedName.ModuleScope.ProjectHashCode; } }
/// <summary>
/// Gets the name of the VBProject the declaration is made in.
/// </summary>
public string ProjectName { get { return _qualifiedName.ModuleScope.ProjectName; } }
/// <summary>
/// Gets the name of the VBComponent the declaration is made in.
/// </summary>
public string ComponentName { get { return _qualifiedName.ModuleScope.ModuleName; } }
private readonly string _parentScope;
/// <summary>
/// Gets the parent scope of the declaration.
/// </summary>
public string ParentScope { get { return _parentScope; } }
private readonly string _identifierName;
/// <summary>
/// Gets the declared name of the identifier.
/// </summary>
public string IdentifierName { get { return _identifierName; } }
private readonly string _asTypeName;
/// <summary>
/// Gets the name of the declared type.
/// </summary>
/// <remarks>
/// This value is <c>null</c> if not applicable,
/// and <c>Variant</c> if applicable but unspecified.
/// </remarks>
public string AsTypeName { get { return _asTypeName; } }
private readonly bool _isSelfAssigned;
/// <summary>
/// Gets a value indicating whether the declaration is a joined assignment (e.g. "As New xxxxx")
/// </summary>
public bool IsSelfAssigned { get { return _isSelfAssigned; } }
private readonly Accessibility _accessibility;
/// <summary>
/// Gets a value specifying the declaration's visibility.
/// This value is used in determining the declaration's scope.
/// </summary>
public Accessibility Accessibility { get { return _accessibility; } }
private readonly DeclarationType _declarationType;
/// <summary>
/// Gets a value specifying the type of declaration.
/// </summary>
public DeclarationType DeclarationType { get { return _declarationType; } }
/// <summary>
/// Returns a string representing the scope of an identifier.
/// </summary>
public string Scope
{
get
{
switch (_declarationType)
{
case DeclarationType.Class:
case DeclarationType.Module:
return _qualifiedName.ModuleScope.ProjectName + "." + _qualifiedName.ModuleScope.ModuleName;
case DeclarationType.Procedure:
case DeclarationType.Function:
case DeclarationType.PropertyGet:
case DeclarationType.PropertyLet:
case DeclarationType.PropertySet:
return _qualifiedName.ModuleScope.ProjectName + "." + _qualifiedName.ModuleScope.ModuleName + "." + _identifierName;
default:
return _parentScope;
}
}
}
public override bool Equals(object obj)
{
if (!(obj is Declaration))
{
return false;
}
return GetHashCode() == ((Declaration)obj).GetHashCode();
}
public override int GetHashCode()
{
return string.Concat(ProjectHashCode.ToString(), ProjectName, ComponentName, _parentScope, _identifierName).GetHashCode();
}
}
Then we walk the parse tree once more, to locate every single identifier reference - here's what an IdentifierReference
looks like:
namespace Rubberduck.Parsing.Symbols
{
public struct IdentifierReference
{
public IdentifierReference(string projectName, string componentName, string identifierName,
Selection selection, bool isAssignment)
{
_projectName = projectName;
_componentName = componentName;
_identifierName = identifierName;
_selection = selection;
_isAssignment = isAssignment;
}
private readonly string _projectName;
public string ProjectName { get { return _projectName; } }
private readonly string _componentName;
public string ComponentName { get { return _componentName; } }
private readonly string _identifierName;
public string IdentifierName { get { return _identifierName; } }
private readonly Selection _selection;
public Selection Selection { get { return _selection; } }
private readonly bool _isAssignment;
public bool IsAssignment { get { return _isAssignment; } }
}
}
"Walking the tree" is done with an ANTLR TreeWalker
, and uses a listener - a class that calls virtual methods whenever it enters or exits a grammar rule.
So I have 2 "listeners" - one that "listens" for declarations (see it on GitHub)
...and another that "listens" for usages - this is the class I'm particularly interested in getting reviewed, since this is where all of Rubberduck's parsing power comes from: this class is responsible for assigning each identifier reference to the correct declaration, depending on the scope it's found in.
namespace Rubberduck.Parsing.Symbols
{
public class IdentifierReferenceListener : VBABaseListener
{
private readonly Declarations _declarations;
private readonly int _projectHashCode;
private readonly string _projectName;
private readonly string _componentName;
private string _currentScope;
public IdentifierReferenceListener(VBComponentParseResult result, Declarations declarations)
: this(result.QualifiedName.ProjectHashCode, result.QualifiedName.ProjectName, result.QualifiedName.ModuleName, declarations)
{ }
public IdentifierReferenceListener(int projectHashCode, string projectName, string componentName, Declarations declarations)
{
_projectHashCode = projectHashCode;
_projectName = projectName;
_componentName = componentName;
_declarations = declarations;
SetCurrentScope();
}
private string ModuleScope { get { return _projectName + "." + _componentName; } }
/// <summary>
/// Sets current scope to module-level.
/// </summary>
private void SetCurrentScope()
{
_currentScope = ModuleScope;
}
/// <summary>
/// Sets current scope to specified module member.
/// </summary>
/// <param name="name">The name of the member owning the current scope.</param>
private void SetCurrentScope(string name)
{
_currentScope = _projectName + "." + _componentName + "." + name;
}
public override void EnterSubStmt(VBAParser.SubStmtContext context)
{
SetCurrentScope(context.ambiguousIdentifier().GetText());
}
public override void ExitSubStmt(VBAParser.SubStmtContext context)
{
SetCurrentScope();
}
public override void EnterFunctionStmt(VBAParser.FunctionStmtContext context)
{
SetCurrentScope(context.ambiguousIdentifier().GetText());
}
public override void ExitFunctionStmt(VBAParser.FunctionStmtContext context)
{
SetCurrentScope();
}
public override void EnterPropertyGetStmt(VBAParser.PropertyGetStmtContext context)
{
SetCurrentScope(context.ambiguousIdentifier().GetText());
}
public override void ExitPropertyGetStmt(VBAParser.PropertyGetStmtContext context)
{
SetCurrentScope();
}
public override void EnterPropertyLetStmt(VBAParser.PropertyLetStmtContext context)
{
SetCurrentScope(context.ambiguousIdentifier().GetText());
}
public override void ExitPropertyLetStmt(VBAParser.PropertyLetStmtContext context)
{
SetCurrentScope();
}
public override void EnterPropertySetStmt(VBAParser.PropertySetStmtContext context)
{
SetCurrentScope(context.ambiguousIdentifier().GetText());
}
public override void ExitPropertySetStmt(VBAParser.PropertySetStmtContext context)
{
SetCurrentScope();
}
public override void EnterAmbiguousIdentifier(VBAParser.AmbiguousIdentifierContext context)
{
if (IsDeclarativeContext(context))
{
// skip declarations
return;
}
var selection = context.GetSelection();
EnterIdentifier(context, selection);
}
public override void EnterCertainIdentifier(VBAParser.CertainIdentifierContext context)
{
// skip declarations
if (IsDeclarativeContext(context))
{
return;
}
var selection = context.GetSelection();
EnterIdentifier(context, selection);
}
private void EnterIdentifier(RuleContext context, Selection selection)
{
var name = context.GetText();
var matches = _declarations[name].Where(IsInScope);
var declaration = GetClosestScope(matches);
if (declaration != null)
{
var isAssignment = IsAssignmentContext(context);
var reference = new IdentifierReference(_projectName, _componentName, name, selection, isAssignment);
declaration.AddReference(reference);
// note: non-matching names are not necessarily undeclared identifiers, e.g. "String" in "Dim foo As String".
}
}
private bool IsInScope(Declaration declaration)
{
return declaration.Scope == _currentScope
|| declaration.Scope == ModuleScope
|| IsGlobalField(declaration)
|| IsGlobalProcedure(declaration);
}
private Declaration GetClosestScope(IEnumerable<Declaration> declarations)
{
// this method (as does the rest of Rubberduck) assumes the VBA code is compilable.
var matches = declarations as IList<Declaration> ?? declarations.ToList();
var currentScope = matches.FirstOrDefault(declaration => declaration.Scope == _currentScope);
if (currentScope != null)
{
return currentScope;
}
var moduleScope = matches.FirstOrDefault(declaration => declaration.Scope == ModuleScope);
if (moduleScope != null)
{
return moduleScope;
}
// multiple matches in global scope??
return matches.FirstOrDefault();
}
private bool IsGlobalField(Declaration declaration)
{
// a field isn't a field if it's not a variable or a constant.
if (declaration.DeclarationType != DeclarationType.Variable ||
declaration.DeclarationType != DeclarationType.Constant)
{
return false;
}
// a field is only global if it's declared as Public or Global in a standard module.
var moduleMatches = _declarations[declaration.ComponentName].ToList();
var modules = moduleMatches.Where(match => match.DeclarationType == DeclarationType.Module);
// Friend members are only visible within the same project.
var isSameProject = declaration.ProjectHashCode == _projectHashCode;
// todo: verify that this isn't overkill. Friend modifier has restricted legal usage.
return modules.Any()
&& (declaration.Accessibility == Accessibility.Global
|| declaration.Accessibility == Accessibility.Public
|| (isSameProject && declaration.Accessibility == Accessibility.Friend));
}
private bool IsGlobalProcedure(Declaration declaration)
{
// a procedure is global if it's a Sub or Function (properties are never global).
// since we have no visibility on module attributes,
// we must assume a class member can be called from a default instance.
if (declaration.DeclarationType != DeclarationType.Procedure ||
declaration.DeclarationType != DeclarationType.Function)
{
return false;
}
// Friend members are only visible within the same project.
var isSameProject = declaration.ProjectHashCode == _projectHashCode;
// implicit (unspecified) accessibility makes a member Public,
// so if it's in the same project, it's public whenever it's not explicitly Private:
return isSameProject && declaration.Accessibility == Accessibility.Friend
|| declaration.Accessibility != Accessibility.Private;
}
private bool IsAssignmentContext(RuleContext context)
{
var parent = context.Parent;
return parent.Parent.Parent is VBAParser.SetStmtContext // object reference assignment
|| parent.Parent.Parent is VBAParser.LetStmtContext // value assignment
|| parent is VBAParser.ForNextStmtContext // treat For loop as an assignment
// todo: verify that we're not missing anything here (likely)
;
}
private bool IsDeclarativeContext(VBAParser.AmbiguousIdentifierContext context)
{
return IsDeclarativeParentContext(context.Parent);
}
private bool IsDeclarativeContext(VBAParser.CertainIdentifierContext context)
{
return IsDeclarativeParentContext(context.Parent);
}
private bool IsDeclarativeParentContext(RuleContext parentContext)
{
return parentContext is VBAParser.VariableSubStmtContext
|| parentContext is VBAParser.ConstSubStmtContext
|| parentContext is VBAParser.ArgContext
|| parentContext is VBAParser.SubStmtContext
|| parentContext is VBAParser.FunctionStmtContext
|| parentContext is VBAParser.PropertyGetStmtContext
|| parentContext is VBAParser.PropertyLetStmtContext
|| parentContext is VBAParser.PropertySetStmtContext
|| parentContext is VBAParser.TypeStmtContext
|| parentContext is VBAParser.TypeStmt_ElementContext
|| parentContext is VBAParser.EnumerationStmtContext
|| parentContext is VBAParser.EnumerationStmt_ConstantContext
|| parentContext is VBAParser.DeclareStmtContext
|| parentContext is VBAParser.EventStmtContext;
}
}
}
Is there any way this code could be made cleaner?
Here's how I'm using this API - here, in the VariableNotAssignedInspection
class:
public IEnumerable<CodeInspectionResultBase> GetInspectionResults(VBProjectParseResult parseResult)
{
var declarations = parseResult.Declarations.Items.Where(declaration =>
declaration.DeclarationType == DeclarationType.Variable
&& !parseResult.Declarations.Items.Any(item =>
item.IdentifierName == declaration.AsTypeName
&& item.DeclarationType == DeclarationType.UserDefinedType) // UDT variables don't need to be assigned
&& !declaration.IsSelfAssigned
&& !declaration.References.Any(reference => reference.IsAssignment));
foreach (var issue in declarations)
{
yield return new VariableNotAssignedInspectionResult(string.Format(Name, issue.IdentifierName), Severity, issue.Context, issue.QualifiedName.ModuleScope);
}
}
I like this API now, ...but am I going to curse at it in 3 months?
3 Answers 3
I can't see it because of some missing classes but are the types being returned also immutable? You've made types like Declaration
nicely immutable but if the complex objects themselves are mutable then that is still worrisome.
Unless you want this behaviour so you can easily change semantics, but I doubt it considering that would cause a lot of inconsistencies.
public override bool Equals(object obj)
{
if (!(obj is Declaration))
{
return false;
}
return GetHashCode() == ((Declaration)obj).GetHashCode();
}
Now this is something I don't like. First of all there's the obvious: you are casting twice. Instead use the as
+ == null
idiom to only use one cast.
However what really stubs my pinkytoe is the fact that you are implementing equality in your GetHashCode()
which is against all conventions, including the one from Geneva.
Two equal objects are supposed to return true
for Equals()
. Two distinct objects are supposed to return false
for Equals()
. Two distinct objects can have the same GetHashCode()
. By definition this means that you would consider two objects as equal when they aren't; they just have the same hashcode.
Implementing equality in your GetHashCode()
instead is not an excuse, that's on par with genocide.
public override int GetHashCode()
{
return string.Concat(ProjectHashCode.ToString(), ProjectName, ComponentName, _parentScope, _identifierName).GetHashCode();
}
I like the approach taken here although I'm not sure how scientifically correct this is. More common is to use primes and stuff.
private readonly string _projectName;
public string ProjectName { get { return _projectName; } }
These can be simply turned into
public string ProjectName { get; private set; }
I assume you did this to preserve uniformity with the Selection
property? I would rather look into making Selection
immutable, once again.
EnterSubStmt
Bleigh.
// a field isn't a field if it's not a variable or a constant.
if (declaration.DeclarationType != DeclarationType.Variable ||
declaration.DeclarationType != DeclarationType.Constant)
{
return false;
}
Boolean algebra! This should be &&
. Right now it will return true if the type is a variable or a constant or anything else.
It would be useful to write this in the positive sense: "a field is a field if it's a variable or a constant".
This same idea is behind the following snippet:
// a procedure is global if it's a Sub or Function (properties are never global).
// since we have no visibility on module attributes,
// we must assume a class member can be called from a default instance.
if (declaration.DeclarationType != DeclarationType.Procedure ||
declaration.DeclarationType != DeclarationType.Function)
{
return false;
}
return isSameProject && declaration.Accessibility == Accessibility.Friend
|| declaration.Accessibility != Accessibility.Private;
Personally I would prefer to have some brackets in here to clarify the order of operations, simply because this stuff is easy to make mistakes in.
return parentContext is VBAParser.VariableSubStmtContext
|| parentContext is VBAParser.ConstSubStmtContext
|| parentContext is VBAParser.ArgContext
|| parentContext is VBAParser.SubStmtContext
|| parentContext is VBAParser.FunctionStmtContext
|| parentContext is VBAParser.PropertyGetStmtContext
|| parentContext is VBAParser.PropertyLetStmtContext
|| parentContext is VBAParser.PropertySetStmtContext
|| parentContext is VBAParser.TypeStmtContext
|| parentContext is VBAParser.TypeStmt_ElementContext
|| parentContext is VBAParser.EnumerationStmtContext
|| parentContext is VBAParser.EnumerationStmt_ConstantContext
|| parentContext is VBAParser.DeclareStmtContext
|| parentContext is VBAParser.EventStmtContext;
Instead of doing all these casts, I would just create an anonymous array with all the types and checking if it contains the type of parentContext
.
In response to the request for unit-testing: if you can create your own Declarations
objects then you're already good to go. Perhaps create small snippets of code, pull them through ANTLR and use them to verify the workings of each property/method. The thing about unit-testing code-trees like this is that your tests will mainly be about testing small snippets of code and seeing if they're parsed correctly; you'll see the same approach in the Roslyn source.
If you do want to test the private methods, you can use a PrivateObject
to expose them.
-
2\$\begingroup\$ +1 - Great answer, particularly on spotting the equals implementation. I disagree with changing the readonly properties to private setter as it does subtly change the behaviour (internally at least) but most of the time it's good enough :) \$\endgroup\$RobH– RobH2015年03月24日 21:15:44 +00:00Commented Mar 24, 2015 at 21:15
-
\$\begingroup\$ In C# 6 you can make read only properties by omitting the
set
and assigning the value in the declaration statement or in a constructor. \$\endgroup\$Johnbot– Johnbot2015年03月25日 09:20:03 +00:00Commented Mar 25, 2015 at 9:20 -
\$\begingroup\$ FWIW the string-hash
GetHashCode
implementation was successfully broken with some very nasty evil code (see this GitHub issue) \$\endgroup\$Mathieu Guindon– Mathieu Guindon2016年01月25日 14:48:24 +00:00Commented Jan 25, 2016 at 14:48
First of all - I like it. It's definitely readable and the naming and style seem good. I'm going to focus on a single point really (at least for now):
Is ModuleScope
a (削除) bird (削除ここまで) class or a (削除) plane (削除ここまで) string?
It looks like it's both:
private string ModuleScope { get { return _projectName + "." + _componentName; } }
vs
case DeclarationType.Module:
return _qualifiedName.ModuleScope.ProjectName + "." + _qualifiedName.ModuleScope.ModuleName;
It looks like you would benefit massively from a single Scope
object.
public class Scope
{
private readonly string _projectName;
private readonly string _componentName;
private readonly string _name;
public string ProjectName { get { return _projectName; } }
public string ComponentName { get { return _componentName; } }
public string Name { get { return _name; } }
private string ModuleScope { get { return _projectName + "." + _componentName; } }
public Scope(string projectName, string componentName, string name = null)
{
_projectName = projectName;
_componentName = componentName;
_name = name;
}
public Scope CreateNestedScope(string name)
{
if (_name != null)
{
throw new InvalidOperationException();
}
return new Scope(_projectName, _componentName, name);
}
public override string ToString()
{
return string.IsNullOrEmpty(name)
? ModuleScope
: ModuleScope + "." + name;
}
}
Then in your IdentifierReferenceListener
class. Create a private Scope
property called ModuleScope
or even DefaultScope
.
private readonly Scope _defaultScope;
private Scope _currentScope;
public IdentifierReferenceListener(int projectHashCode, string projectName, string componentName, Declarations declarations)
{
_projectHashCode = projectHashCode;
_declarations = declarations;
_defaultScope = new Scope(projectName, componentName);
}
Then in each of your methods, instead of calling SetCurrentScope
with the name you do:
_currentScope = _defaultScope.CreateNestedScope(/* some value */);
It seems to me that it would keep all of your string mangling in a single place.
P.s. Code typed into browser so might not compile but I hope it illustrates my point :)
-
2\$\begingroup\$ Ooh I like this! \$\endgroup\$Mathieu Guindon– Mathieu Guindon2015年03月24日 16:30:26 +00:00Commented Mar 24, 2015 at 16:30
public struct IdentifierReference
You should not be using structs unless you have a very good reason to. MSDN has the following guidelines for choosing between structs and classes:
√ CONSIDER defining a struct instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects.
X AVOID defining a struct unless the type has all of the following characteristics:
- It logically represents a single value, similar to primitive types (int, double, etc.).
- It has an instance size under 16 bytes.
- It is immutable.
- It will not have to be boxed frequently.
Your struct exceeds the recommended size both on 32bit and even more so on 64bit where pointers are 8 bytes.
Also from MSDN, this time on struct design:
√ DO ensure that a state where all instance data is set to zero, false, or null (as appropriate) is valid.
-
\$\begingroup\$ Very valid point here. Definitely changing it to a
class
- thanks! \$\endgroup\$Mathieu Guindon– Mathieu Guindon2015年03月25日 15:19:01 +00:00Commented Mar 25, 2015 at 15:19
DoSomething
in 5 different modules, this code assigns all references toDoSomething
to whichever module is returned byFirstOrDefault
inGetClosestScope
. I've modified it quite a bit to address this issue, but the most pressing problem is that this code needs a major refactoring, and some implementation of aResolveMemberCall
method. \$\endgroup\$