carlosthe19916
May 29, 2025
Collaborator

@jcrossley3 as promised here we go. In our internal chat I mentioned I wanted to propose 3 changes. This is one of them. I will create 2 separate discussions, or add the other points as comments in this same discussion. But be patient with me.

1. Full text search

Context:

Let's assume we have 2 Advisories (I changed the identifier of those files manually)

document_id
cve-1.json	CVE-1
cve-2.json	document_id=CVE-1

image

Current status:

q is defined as q={full_text_string}&{other_operators}

Happy path: Search for CVE-1 then we do q=CVE-1
- The output contains the expected entity. Great
Not happy path. We want to search for the advisory cve-2.json whose document_id is actually the string document_id=CVE-1. In this case the query the client will create is q=document_id=CVE-1
- The output WONT be the cve-2.json even if what I have set in q perfectly matches the name of the second Advisory and this is a problem

Problem

If q is defined as q={full_text_string}&{other_operators} what defines {full_text_string} is actually unknown as there isn't a way to tell whether {full_text_string} was actually a string the client wanted to search by or it was a set of operations like {field}{operator}{value}

I manipulated the content of the Advisory and injected a valid query in the name of the Advisory. In cases like "Labels" where users can actually write whatever they want then we might increase the risk of these errors.

This was also already reported by QE https://issues.redhat.com/browse/TC-2399 although I think we never went deep enough to uncover it.

Screencast.From.2025年05月29日.17-50-24.mp4

Proposal

If q is defined as q={full_text_string}&{other_operators} then {full_text_string} is actually breaking the whole principle of {field}{operator}{value}. We could take 2 strategies:

First proposal. Make {full_text_string} a key value pair within the q definition. E.g. q=full_text={full_text_string}&{other_operators}. Here is a more concrete example.
- q=full_text='document_id=CVE-1'&creation=yesterday. Notice that now it is clear that full_text has a value that is a string and cannot be confused by an actual operator. I added single quotes for our human eyes to see the difference but the implementation might or might not contain it. Actually quotes for string is another problem I will describe further down this document.
Second proposal. Extract {full_text_string} out of the q definition E.g. full_text=any_Value&q=all_q_definition. A more concrete example will be:
- http://localhost:8080/api/v2/advisory?full_text_string=any_string&q=all_q_definition`

In both previous proposals the full search feature needs a key value operation.

Impact

Full text search is something mainly our UI uses for now. I have no prove but I am inclined to think that the CAMP team uses more the q definition as {field}{operator}{value} rather than full text search since it is done more for discovery.

Unless we find a better way of dealing with the problem I described we will need to accept that our q needs to change a bit to avoid those errors.

Replies: 4 comments 13 replies

bobmcwhirter
May 29, 2025

fwiw, I'd always thought full-text meant "I have no idea what tags/parameters to use, so just find me anything mentioning cheddar, anywhere it appears"

then again, I don't actually use this stuff.

4 replies

@carlosthe19916

carlosthe19916 May 29, 2025
Collaborator Author

I think the definition of what full-text search means is clear. The way of expressing/writing it in the URL is what I wanted to discuss.

@bobmcwhirter

bobmcwhirter May 29, 2025

I meant either q=<text> XOR q=<fancy-query-parameters> but never both together.

@carlosthe19916

carlosthe19916 May 30, 2025
Collaborator Author

I agree

@jcrossley3

jcrossley3 May 30, 2025
Maintainer

The trick there is recognizing whether we have a full text search or a filter by analyzing the content, e.g. if there's an unescaped = in there, it's probably a filter.

If we don't want to have to escape operators in user input, I think we'd have to break the q parameter into search and filter, maybe.

jcrossley3
May 29, 2025
Maintainer

To me, this is a validation problem, similar to preventing SQL injection attacks. We know what our operators are, and we know our users want to perform a full-text search. Therefore, we need to escape all of our operators as a part of validating the user input so that the backend doesn't misinterpret a full-text search for a filter.

Instead of this:
q=full_text='document_id=CVE-1'&creation=yesterday
Just do this:
q=document_id\=CVE-1&creation=yesterday

That \= turns a filter into a full-text search for the string "document_id=CVE-1", which should find your advisory.

If we simply replace every =, ~, <, and > with \=, \~, \<, and \> in the user's input, I think we can keep the existing q syntax for specifying either full-text-searches, filters, or both.

0 replies

carlosthe19916
May 29, 2025
Collaborator Author

2. Loosen a bit `Q` => `SQL`

After discussing the label query definition #1652 and internally I thought that maybe we could loosen a bit the dependency between the internal models and the query definition.

Current status

The q definition is translated into SQL.

Problem

The q query is limited to the structure of the internal models/tables. q is an external interface exposed to clients that should be defined in a way that is easy to be consumed regardless of the internal implementation.

From the client point of view this is how I'd like to query by labels this way:

q=labels=label1&label2&label3

or using the OR operator:

q=labels=label1|label2|label3

Currently it was suggested internally by @jcrossley3 to query like this:

q=labels:app=something&labels:key=value

To me, the internal implementation although can influence on the query definition should not limit it.

Proposal

Allow q to be defined in a way that does not necessarily match the internal models. E.g. allow q=labels=label1|label2|label3

Leave q as it is in terms of how it is defined in the URL
In the backend allow extraction of key/value operators and leave the rest untouched. E.g.
Step1: define query q=labels=label1|label2|label3&name=jim
Step2: extract/delete from the query a key/value operator.
- Extract labels and map it to some object
- q ends up as q=name=jim because labels was extracted/deleted
Step3:
- q is processed with the current SQL translators
- labels was extracted to another variable therefore can be manually added to the final SQL query

Concrete example:
If q is defined as q=labels=label1|label2|label3&name=jim

let q=`labels=label1|label2|label3&name=jim`
let labels = q.extract("labels");
let new_q=q.delete("labels");
let output = fetch_sboms(new_q, labels);

Here is even a concrete fully working function that allows the labels to be defined outside q.

https://github.com/trustification/trustify/blob/main/modules/fundamental/src/sbom/service/sbom.rs#L116-L122

long story story: let's leave the q SQL mapping, but allow certain scenarios where fields would not match the db models and extract them and manually introduce them to the SQL scripts.

This way we are able to define a consumable api interface that relies 100% on the current sql mapping but if wanted/needed allow definitions that do not match the db

4 replies

@jcrossley3

jcrossley3 May 29, 2025
Maintainer

It sounds like you're asking for the ability to extract a parameter from the q parameter, right? Why not just pass the additional parameter in like the fetch_sboms function you cite expects? What's the benefit of passing it in the q?

@carlosthe19916

carlosthe19916 May 30, 2025
Collaborator Author

yes, the ability to extract and remove a parameter from q is what I'd like to see.

Carlos: I'd like to query SBOM's labels as q=labels=label1|label2|label3&name=quarkus. It should be equivalent to query Advisories by vulnerability e.g. q=severity=high|medium|low. To me this is a valid query.
Jim: q=labels=label1|label2|label3&name=quarkus is invalid, you need to use q=labels:app=something&labels:key=value because our labels internally are defined as a JSONB type.
Carlos: The internal definition of a model/table should not limit the q definition.

Proposal:

How to make q=labels=label1|label2|label3&name=quarkus work even though the internal db definition does not allow it?

let q=`q=labels=label1|label2|label3&name=quarkus`
let labels = q.extract("labels"); // labels='label1|label2|label3'
let new_q=q.delete("labels"); // new_q=name=quarkus
// let's parse the 'labels' string with any method we have in mind
let parsed_labed: Vec<Label> = parse(labels);
let output = fetch_sboms(new_q, parsed_labed);

Long story short:

Allow a key being extracted and later removed from q
Apply the current SQL mapping to the remaining q
Allow manual parsing of fields that do not match the DB tables.

@carlosthe19916

carlosthe19916 May 30, 2025
Collaborator Author

I think I am reconsidering my proposal xD. Would it make sense to define the label selector outside q? E.g. http://localhost:8080/api/v2/sbom?label_selector=some_text&q=another_text This would be similar to the kubernetes label selector rest api. If q has the limitation of "needs to match the db tables/columns", label_selector would not be limited by it as it is outside q, it will require manual translation into sql queries though.

@jcrossley3

jcrossley3 May 30, 2025
Maintainer

How to make q=labels=label1|label2|label3&name=quarkus work even though the internal db definition does not allow it?

We can make that work the way you expect. And I know you don't like the answer being determined by how the data is stored, but because it is stored as a map, we need to answer this question: what should happen when the values for those labels differ?

If we stored labels in an array -- no values, just "tags" -- the question would be moot, because the above query already works as you expect. But because they're in a map, we can just assume their values will be "". But they might not be. Should they still be returned? If the label1 value for one advisory is "foo" and "bar" for another one, should both be returned by your query?

If you say "yes", I think I can make that work.

carlosthe19916
Jun 4, 2025
Collaborator Author

@jcrossley3 your input made me think that while it is true that new releases should not break previously working client, that should not stop us to gradually introduce enhancements in the long term.

What would you say if we gradually move fields defined within q to their own HTTP Query Parameters? E.g.

http://localhost:8080/api/v2/sbom?q=severity=critical|medium&color=red|blue

would become:

http://localhost:8080/api/v2/sbom?severity="critical|medium"&color="red|blue"

Notice that both Query parameters are outside of q therefore they will be documented in the Openapi definition and we avoid obscuring the q definition.

The only design rule would be to name the query parameters according to what operator we will apply. E.g. Just like we use in sorting sort=name:asc we could do:

queryParamerter or queryParamter:eq to denote equal. E.g. /api/v2/sbom?severity:eq="critical|medium"
queryParamter:not_eq to denote not equal. E.g. /api/v2/sbom?severity:not_eq="critical|medium"
queryParamerter:lt to denote "less" than. E.g. /api/v2/sbom?creation:lt="2021-12-12"
queryParamerter:gt to denote "greater" than. E.g. /api/v2/sbom?creation:gt="2021-12-12"
queryParamerter:like to denote "greater" than. E.g. /api/v2/sbom?name:like="some string"

I know the easiest answer will be NO. But just sharing here in case it might help in the future.

5 replies

@jcrossley3

jcrossley3 Jun 4, 2025
Maintainer

I know you're tired of me asking, but what problem is this solving exactly? 😄

@carlosthe19916

carlosthe19916 Jun 4, 2025
Collaborator Author

No worries, that is a valid question... I think the problem of q is that it does not expose exactly which filters I can use within q.
If we were to extract progressively whatever filter inside q outside it and make them first class citizens in the HTTP Query Parameters world, then all those fields will appear in Swagger UI and in the Openapi definition.

I think that is a big problem worth solving, even if the majority disagree :) .

Screenshot From 2025年06月04日 16-50-28

@jcrossley3

jcrossley3 Jun 4, 2025
Maintainer

That's fair. And I agree it's a problem. There is an internal function that returns the valid fields, but it's not exposed in any structured way. At present, if you pass an invalid field in a filter, e.g. q=foo=bar, the error message will contain a list of the valid fields, but having them in the openapi definition would be an improvement.

@carlosthe19916

carlosthe19916 Jun 4, 2025
Collaborator Author

And just to repeat, I meant to do it progressively, not necessarily today but move towards a more mature API. The current q can coexist with the new query parameters.

@carlosthe19916

carlosthe19916 Jun 12, 2025
Collaborator Author

@jcrossley3 as everything works for the time being I forgot to follow up on this Discussion.
I feel like we (the team) could define regular meetings or populate regularly any ADR to see how to open the possibility of enhancing these things.

List any current problem we might think of
Solve a real a real problem if any

Would be nice to hold a beer and discuss about these things in good faith :)

q Query enhancements #1670

Uh oh!

carlosthe19916 May 29, 2025 Collaborator

1. Full text search

Context:

Current status:

Problem

Proposal

Impact

Replies: 4 comments · 13 replies

Uh oh!

bobmcwhirter May 29, 2025

Uh oh!

carlosthe19916 May 29, 2025 Collaborator Author

Uh oh!

bobmcwhirter May 29, 2025

Uh oh!

carlosthe19916 May 30, 2025 Collaborator Author

Uh oh!

jcrossley3 May 30, 2025 Maintainer

Uh oh!

Uh oh!

jcrossley3 May 29, 2025 Maintainer

Uh oh!

Uh oh!

carlosthe19916 May 29, 2025 Collaborator Author

2. Loosen a bit Q => SQL

Current status

Problem

Proposal

Uh oh!

Uh oh!

jcrossley3 May 29, 2025 Maintainer

Uh oh!

carlosthe19916 May 30, 2025 Collaborator Author

Uh oh!

carlosthe19916 May 30, 2025 Collaborator Author

Uh oh!

jcrossley3 May 30, 2025 Maintainer

Uh oh!

Uh oh!

carlosthe19916 Jun 4, 2025 Collaborator Author

Uh oh!

jcrossley3 Jun 4, 2025 Maintainer

Uh oh!

carlosthe19916 Jun 4, 2025 Collaborator Author

Uh oh!

jcrossley3 Jun 4, 2025 Maintainer

Uh oh!

carlosthe19916 Jun 4, 2025 Collaborator Author

Uh oh!

carlosthe19916 Jun 12, 2025 Collaborator Author

`q` Query enhancements #1670

carlosthe19916
May 29, 2025
Collaborator

Replies: 4 comments 13 replies

bobmcwhirter
May 29, 2025

carlosthe19916 May 29, 2025
Collaborator Author

carlosthe19916 May 30, 2025
Collaborator Author

jcrossley3 May 30, 2025
Maintainer

jcrossley3
May 29, 2025
Maintainer

carlosthe19916
May 29, 2025
Collaborator Author

2. Loosen a bit `Q` => `SQL`

jcrossley3 May 29, 2025
Maintainer

carlosthe19916 May 30, 2025
Collaborator Author

carlosthe19916 May 30, 2025
Collaborator Author

jcrossley3 May 30, 2025
Maintainer

carlosthe19916
Jun 4, 2025
Collaborator Author

jcrossley3 Jun 4, 2025
Maintainer

carlosthe19916 Jun 4, 2025
Collaborator Author

jcrossley3 Jun 4, 2025
Maintainer

carlosthe19916 Jun 4, 2025
Collaborator Author

carlosthe19916 Jun 12, 2025
Collaborator Author