Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Atlas search POC #956

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
BorisDog wants to merge 6 commits into mongodb:master from BorisDog:search_poc
Closed

Atlas search POC #956

BorisDog wants to merge 6 commits into mongodb:master from BorisDog:search_poc

Conversation

@BorisDog
Copy link
Contributor

@BorisDog BorisDog commented Nov 18, 2022

Based on @dgolub project

Sample usage:
coll.Aggregate().Search(SearchBuilders<Person>.Search.Text(x => x.Title, "cat")).ToList()

Copy link
Contributor

@dgolub dgolub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integration looks good! I left a few minor comments and suggestions.

/// <summary>
/// Extension methods for <see cref="IAggregateFluent{TResult}"/>.
/// </summary>
public static class AggregateFluentBuilder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I created a separate AggregateFluentBuilder class with extension methods rather than modifying AggregateFluent was that it was necessary in order to have Atlas Search in a separate assembly. Now that we're integrating it into the driver, we might want to reconsider this decision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, removed.

using MongoDB.Bson;
using MongoDB.Driver.Core.Misc;

namespace MongoDB.Driver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the namespace here be MongoDB.Driver.Search, given that it's in a subdirectory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this breaks the convention. The intention was to make Search easily discoverable in intelligence, but still organize all multiple search file in same place. Now that we got IAggregateFluent.Search we can fix the namespace.

JamesKovacs reacted with thumbs up emoji
/// A static helper class containing various builders pertaining to Atlas Search.
/// </summary>
/// <typeparam name="TDocument">The type of the document.</typeparam>
public static class SearchBuilders<TDocument>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I introduced a new SearchBuilders class rather than extending the existing Builders class was that it isn't possible to add static extension methods to a class. This seems like another decision that would be worth reconsidering now that we're no longer forced into it. I imagine that it would be a better experience for customers not to have to keep track of type different classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point, I think it needs to be merged with Builders.

Copy link
Contributor

@JamesKovacs JamesKovacs Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This should be part of Builders<T> rather than a separate class.

Text(query, new ExpressionFieldDefinition<TDocument>(path), fuzzy, score);
}

internal sealed class TextSearchDefinition<TDocument> : SearchDefinition<TDocument>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're restricting it to just Text for now and not adding support for any of the other operators?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text restriction is just for this POC, the intention is to support all operators.

BorisDog and others added 2 commits November 23, 2022 09:22
Copy link
Contributor

@JamesKovacs JamesKovacs left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looking good. I like the overall direction. @dgolub raises some valid points that we should address. (Thank you, @dgolub!)

A few broad comments...

  • We need more testing.
  • We should determine if we can integration test given that $search only works on Atlas. Good question for the Java team since they implemented $search already.
  • Support for additional operators beyond Text.
  • Support for LINQ.
  • Investigate whether it is possible to provide a friendlier error message than MongoServerError: $search not enabled! Enable Search by setting serverParameter mongotHost to a valid "host:port" string.

With respect to LINQ, we can probably limit it to chaining off of AsQueryable(). While it would be nice to support in where comprehensions, it adds complexity because $search must be the first operator in the pipeline.

/// A static helper class containing various builders pertaining to Atlas Search.
/// </summary>
/// <typeparam name="TDocument">The type of the document.</typeparam>
public static class SearchBuilders<TDocument>
Copy link
Contributor

@JamesKovacs JamesKovacs Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This should be part of Builders<T> rather than a separate class.

Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the overall approach integrates with how we support existing pipeline stages.

I've added a few comments here and there on the POC code.

return value;
}
public static long IsGreaterThanOrEqualToZero(long value, string paramName) =>
IsGreaterThanOrEqualTo(value, 0, paramName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactoring changes the error message a tiny bit. Could be considered a breaking change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same might apply to some of the other refactorings.

Do we really want one Ensure method calling another? Or should each Ensure method be complete by itself?

Copy link
Contributor Author

@BorisDog BorisDog Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think code reuse makes things simpler and cleaner, where appropriate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! But it changes the error messages.

Copy link
Contributor Author

@BorisDog BorisDog Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a very minor change in this case.
We also need to change messages in IsNullOr* anyway at some point.

SearchCountOptions count = null,
bool returnStoredSource = false)
{
return AppendStage(PipelineStageDefinitionBuilder.Search(query, highlight, indexName, count, returnStoredSource));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally should call WithPipeline and _pipeline.Search

... but I realize _pipeline.Search is not implemented yet.

Copy link
Contributor Author

@BorisDog BorisDog Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added PipelineDefinitionBuilder.Search

private static IndexKeysDefinitionBuilder<TDocument> __index = new IndexKeysDefinitionBuilder<TDocument>();
private static ProjectionDefinitionBuilder<TDocument> __projection = new ProjectionDefinitionBuilder<TDocument>();
private static SortDefinitionBuilder<TDocument> __sort = new SortDefinitionBuilder<TDocument>();
private static UpdateDefinitionBuilder<TDocument> __update = new UpdateDefinitionBuilder<TDocument>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with removing private backing fields.

Copy link
Contributor Author

@BorisDog BorisDog Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a classic usage for auto properties.
The only purpose of this class is to expose those properties, so we can let it do just that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My counter argument is that most of our classes already have private fields using the _field or __staticField naming convention.

In many cases (such as when validating in setters) an explicit backing field is required.

Why not just standardize on explicit backing fields and that way we can always know by glancing at the top of a class what state the class maintains.

Otherwise we have to tediously search the source file to see if there are any autoproperties we didn't know about.

Copy link
Contributor Author

@BorisDog BorisDog Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the key here is finding a good balance between following the existing style and evolving our code base to use new language features and mainstream standards. We have already adopted some mainstream conventions like string interpolation, nameof, local methods and other, despite having most of the code written differently. I hope we continue to do so.
Autoproperties are a very good example of a "mainstream" and very loved feature by C# community (which continues to evolve), and I believe it is perceived as sort of "default" in C# dev process.

Like with fields, all properties (auto implemented or not) through the source file, they will always reside in same section, like fields. Given that, and the fact the major part of code is cleaned up, the "state" of the class appears more clear to me. Autoproperties indicate that this is just a simple data holders, no custom properties, no special relation between property and field that needs to be inspected.

For more complicated classes, where inspecting class state/functionality becomes is more complicated, and I find using VS built-in class/document explorers very useful.

*/

using MongoDB.Driver.Search;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of irrelevant refactoring in this file.

Copy link
Contributor Author

@BorisDog BorisDog Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good opportunity for minor refactoring. For example if staying with fields, readonly needs to be added anyway.

HighlightOptions<TInput> highlight = null,
string indexName = null,
SearchCountOptions count = null,
bool returnStoredSource = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the parameters be named highlightOptions and countOptions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JSON fields in $search are called highlight and count, but I have no objection to this change if you feel that it will make it easier for customers to understand what the parameters do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that the server uses the shorter names.

It's a conflict between our naming conventions and the server's.

I made this comment because normally a parameter names count would be of type int.

operatorName,
(s, sr, linqProvider) =>
{
var renderedQuery = query.Render(s, sr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is not not just the query... I suggest renaming it to something like renderedArgs.

Copy link
Contributor Author

@BorisDog BorisDog Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed query to searchDefinition, and renderedQuery to renderedSearchDefinition. @rstam do you think this is a better naming?

/// A build for highlighting options.
/// </summary>
/// <typeparam name="TDocument">The type of the document.</typeparam>
public sealed class HighlightOptionsBuilder<TDocument>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this builder is needed.

Path = path,
MaxCharsToExamine = maxCharsToExamine,
MaxNumPassages = maxNumPassages
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you well know I don't like expression bodied methods, and this code is a prime example of why. The bad formatting makes the bracketed lines "look" like a body but its NOT!

I would write this like this:

public HighlightOptions<TDocument> Options(
 PathDefinition<TDocument> path,
 int? maxCharsToExamine = null,
 int? maxNumPassages = null)
{
 return new()
 {
 Path = path,
 MaxCharsToExamine = maxCharsToExamine,
 MaxNumPassages = maxNumPassages
 };
}

or if you insist on using an expression body for something that is not a one-liner at least format it so that it is more apparent that the bracketed code is NOT a body:

public HighlightOptions<TDocument> Options(
 PathDefinition<TDocument> path,
 int? maxCharsToExamine = null,
 int? maxNumPassages = null)
 => new()
 {
 Path = path,
 MaxCharsToExamine = maxCharsToExamine,
 MaxNumPassages = maxNumPassages
 };

Copy link
Contributor Author

@BorisDog BorisDog Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored all to use the last styling suggestion.

using MongoDB.Bson;
using MongoDB.Driver.Core.Misc;

namespace MongoDB.Driver.Search
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know there's talk about moving all these classes up one level in the namespace, but maybe we should leave them here.

Being in their own namespace could help in avoiding future name collisions.

Copy link
Contributor Author

@BorisDog BorisDog Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All search code was moved under MongoDB.Driver.Search, and the intention is to keep it that way.

{
if (value != null)
{
Ensure.IsBetween(value.Value, min, max, paramName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message will be wrong when value is not null but not between.

The error message won't mention that null is valid.

This is probably a reason for each Ensure method to be self contained.

Copy link
Contributor Author

@BorisDog BorisDog Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good catch.
I still think that we should and can easily reuse the code.
We got similar behaviour in the old IsNull* methods. I suggest to fix them to in a separate ticket.

HighlightOptions<TInput> highlight = null,
string indexName = null,
SearchCountOptions count = null,
bool returnStoredSource = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that the server uses the shorter names.

It's a conflict between our naming conventions and the server's.

I made this comment because normally a parameter names count would be of type int.

private static IndexKeysDefinitionBuilder<TDocument> __index = new IndexKeysDefinitionBuilder<TDocument>();
private static ProjectionDefinitionBuilder<TDocument> __projection = new ProjectionDefinitionBuilder<TDocument>();
private static SortDefinitionBuilder<TDocument> __sort = new SortDefinitionBuilder<TDocument>();
private static UpdateDefinitionBuilder<TDocument> __update = new UpdateDefinitionBuilder<TDocument>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My counter argument is that most of our classes already have private fields using the _field or __staticField naming convention.

In many cases (such as when validating in setters) an explicit backing field is required.

Why not just standardize on explicit backing fields and that way we can always know by glancing at the top of a class what state the class maintains.

Otherwise we have to tediously search the source file to see if there are any autoproperties we didn't know about.

return value;
}
public static long IsGreaterThanOrEqualToZero(long value, string paramName) =>
IsGreaterThanOrEqualTo(value, 0, paramName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! But it changes the error messages.

{
private int? _maxEdits;
private int? _prefixLength;
private int? _maxExpansions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of a class where we have to use explicit backing fields.

I would like us to standardize on this approach even in cases where autoproperties could be used.

Copy link
Contributor Author

@BorisDog BorisDog Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there always will be cases where non trivial setter is required.
That should not deter us from using autoproperties for simpler classes. And especially for cases where set is private or init only.

PathDefinition<TDocument> path,
int? maxCharsToExamine = null,
int? maxNumPassages = null)
=> new()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't think => should be indented one more tab stop?

Better yet use a method body... :)

int? maxCharsToExamine = null,
int? maxNumPassages = null)
=> new()
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't think { should be indented on tab stop further than => new()?

Copy link
Contributor Author

@BorisDog BorisDog Dec 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think => should stay in the line above. We can just evolve the standard we are using already (which is different than my original code here):
We already use:
T Method() => new T();
or

T Method() =>
 new T(1, 2, 3,....); { a = 1 }

for longer lines.
Evolving this further:

T Method() =>
 new T(1, 2, 3,....)
 {
 a = 1,
 b = 2,
 c = 3
 };

and in our case

T Method() =>
 new()
 {
 a = 1,
 b = 2,
 c = 3
 }

With this option it's clear that this is object creation and not a body.

Copy link
Contributor Author

BorisDog commented Dec 8, 2022

Follow up work in CSHARP-4440.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

@rstam rstam rstam left review comments

@JamesKovacs JamesKovacs JamesKovacs requested changes

+1 more reviewer

@dgolub dgolub dgolub left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

AltStyle によって変換されたページ (->オリジナル) /