- 
  Notifications
 
You must be signed in to change notification settings  - Fork 1.3k
 
Atlas search POC #956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atlas search POC #956
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integration looks good! I left a few minor comments and suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I created a separate AggregateFluentBuilder class with extension methods rather than modifying AggregateFluent was that it was necessary in order to have Atlas Search in a separate assembly. Now that we're integrating it into the driver, we might want to reconsider this decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the namespace here be MongoDB.Driver.Search, given that it's in a subdirectory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this breaks the convention. The intention was to make Search easily discoverable in intelligence, but still organize all multiple search file in same place. Now that we got IAggregateFluent.Search we can fix the namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I introduced a new SearchBuilders class rather than extending the existing Builders class was that it isn't possible to add static extension methods to a class. This seems like another decision that would be worth reconsidering now that we're no longer forced into it. I imagine that it would be a better experience for customers not to have to keep track of type different classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good point, I think it needs to be merged with Builders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This should be part of Builders<T> rather than a separate class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're restricting it to just Text for now and not adding support for any of the other operators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Text restriction is just for this POC, the intention is to support all operators.
Co-authored-by: David Golub <david.golub@mongodb.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking good. I like the overall direction. @dgolub raises some valid points that we should address. (Thank you, @dgolub!)
A few broad comments...
- We need more testing.
 - We should determine if we can integration test given that 
$searchonly works on Atlas. Good question for the Java team since they implemented$searchalready. - Support for additional operators beyond 
Text. - Support for LINQ.
 - Investigate whether it is possible to provide a friendlier error message than 
MongoServerError: $search not enabled! Enable Search by setting serverParameter mongotHost to a valid "host:port" string. 
With respect to LINQ, we can probably limit it to chaining off of AsQueryable(). While it would be nice to support in where comprehensions, it adds complexity because $search must be the first operator in the pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This should be part of Builders<T> rather than a separate class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the overall approach integrates with how we support existing pipeline stages.
I've added a few comments here and there on the POC code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refactoring changes the error message a tiny bit. Could be considered a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same might apply to some of the other refactorings.
Do we really want one Ensure method calling another? Or should each Ensure method be complete by itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think code reuse makes things simpler and cleaner, where appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course! But it changes the error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's a very minor change in this case.
We also need to change messages in IsNullOr* anyway at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally should call WithPipeline and _pipeline.Search
... but I realize _pipeline.Search is not implemented yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, added PipelineDefinitionBuilder.Search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with removing private backing fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a classic usage for auto properties.
The only purpose of this class is to expose those properties, so we can let it do just that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My counter argument is that most of our classes already have private fields using the _field or __staticField naming convention.
In many cases (such as when validating in setters) an explicit backing field is required.
Why not just standardize on explicit backing fields and that way we can always know by glancing at the top of a class what state the class maintains.
Otherwise we have to tediously search the source file to see if there are any autoproperties we didn't know about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the key here is finding a good balance between following the existing style and evolving our code base to use new language features and mainstream standards. We have already adopted some mainstream conventions like string interpolation, nameof, local methods and other, despite having most of the code written differently. I hope we continue to do so.
Autoproperties are a very good example of a "mainstream" and very loved feature by C# community (which continues to evolve), and I believe it is perceived as sort of "default" in C# dev process.
Like with fields, all properties (auto implemented or not) through the source file, they will always reside in same section, like fields. Given that, and the fact the major part of code is cleaned up, the "state" of the class appears more clear to me. Autoproperties indicate that this is just a simple data holders, no custom properties, no special relation between property and field that needs to be inspected.
For more complicated classes, where inspecting class state/functionality becomes is more complicated, and I find using VS built-in class/document explorers very useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of irrelevant refactoring in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good opportunity for minor refactoring. For example if staying with fields, readonly needs to be added anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the parameters be named highlightOptions and countOptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JSON fields in $search are called highlight and count, but I have no objection to this change if you feel that it will make it easier for customers to understand what the parameters do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that the server uses the shorter names.
It's a conflict between our naming conventions and the server's.
I made this comment because normally a parameter names count would be of type int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variable is not not just the query... I suggest renaming it to something like renderedArgs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed query to searchDefinition, and renderedQuery to renderedSearchDefinition. @rstam do you think this is a better naming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this builder is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you well know I don't like expression bodied methods, and this code is a prime example of why. The bad formatting makes the bracketed lines "look" like a body but its NOT!
I would write this like this:
public HighlightOptions<TDocument> Options(
 PathDefinition<TDocument> path,
 int? maxCharsToExamine = null,
 int? maxNumPassages = null)
{
 return new()
 {
 Path = path,
 MaxCharsToExamine = maxCharsToExamine,
 MaxNumPassages = maxNumPassages
 };
}
or if you insist on using an expression body for something that is not a one-liner at least format it so that it is more apparent that the bracketed code is NOT a body:
public HighlightOptions<TDocument> Options(
 PathDefinition<TDocument> path,
 int? maxCharsToExamine = null,
 int? maxNumPassages = null)
 => new()
 {
 Path = path,
 MaxCharsToExamine = maxCharsToExamine,
 MaxNumPassages = maxNumPassages
 };
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored all to use the last styling suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know there's talk about moving all these classes up one level in the namespace, but maybe we should leave them here.
Being in their own namespace could help in avoiding future name collisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All search code was moved under MongoDB.Driver.Search, and the intention is to keep it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message will be wrong when value is not null but not between.
The error message won't mention that null is valid.
This is probably a reason for each Ensure method to be self contained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good catch.
I still think that we should and can easily reuse the code.
We got similar behaviour in the old IsNull* methods. I suggest to fix them to in a separate ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that the server uses the shorter names.
It's a conflict between our naming conventions and the server's.
I made this comment because normally a parameter names count would be of type int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My counter argument is that most of our classes already have private fields using the _field or __staticField naming convention.
In many cases (such as when validating in setters) an explicit backing field is required.
Why not just standardize on explicit backing fields and that way we can always know by glancing at the top of a class what state the class maintains.
Otherwise we have to tediously search the source file to see if there are any autoproperties we didn't know about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course! But it changes the error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of a class where we have to use explicit backing fields.
I would like us to standardize on this approach even in cases where autoproperties could be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes there always will be cases where non trivial setter is required.
That should not deter us from using autoproperties for simpler classes. And especially for cases where set is private or init only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't think => should be indented one more tab stop?
Better yet use a method body... :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't think { should be indented on tab stop further than => new()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I think => should stay in the line above. We can just evolve the standard we are using already (which is different than my original code here):
We already use:
T Method() => new T();
or
T Method() =>
 new T(1, 2, 3,....); { a = 1 }
for longer lines.
Evolving this further:
T Method() =>
 new T(1, 2, 3,....)
 {
 a = 1,
 b = 2,
 c = 3
 };
and in our case
T Method() =>
 new()
 {
 a = 1,
 b = 2,
 c = 3
 }
With this option it's clear that this is object creation and not a body.
Follow up work in CSHARP-4440.
Based on @dgolub project
Sample usage:
coll.Aggregate().Search(SearchBuilders<Person>.Search.Text(x => x.Title, "cat")).ToList()