I have been trying to get all column vs. table regardless of alias used, for example, assume this query
SELECT e.name,
(SELECT COUNT(*) FROM projects AS p WHERE p.employee_id = e.employee_id) AS project_count,
(SELECT COUNT(*) FROM tasks AS p WHERE p.employee_id = e.employee_id) AS task_count
FROM employees AS e;
Considering that alias name can be reused across different scopes I can't simply collect details like column, table and alias and do the work at last How I can do that, below is my try If you have any suggestion or any other algorithm please let me know thanks.
In the code below, I’m resolving tables for a given alias at the end of each scope.
ScopedData
class is used to store data on different query scopes. When exiting from that scope I remove the data associated with the current scope by calliing pop()
.
SqlTokenVisitor
class extending TablesNamesFinder
sligtly differs from it's parent. I've overriden one of it's visit()
methods to change the order of visiting.
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Supplier;
import net.sf.jsqlparser.JSQLParserException;
import net.sf.jsqlparser.expression.Expression;
import net.sf.jsqlparser.parser.CCJSqlParserUtil;
import net.sf.jsqlparser.schema.Column;
import net.sf.jsqlparser.schema.Table;
import net.sf.jsqlparser.statement.Statement;
import net.sf.jsqlparser.statement.select.Join;
import net.sf.jsqlparser.statement.select.LateralSubSelect;
import net.sf.jsqlparser.statement.select.ParenthesedFromItem;
import net.sf.jsqlparser.statement.select.ParenthesedSelect;
import net.sf.jsqlparser.statement.select.PlainSelect;
import net.sf.jsqlparser.statement.select.Select;
import net.sf.jsqlparser.statement.select.SelectItem;
import net.sf.jsqlparser.statement.select.SelectVisitor;
import net.sf.jsqlparser.statement.select.WithItem;
import net.sf.jsqlparser.util.TablesNamesFinder;
class ScopedData<T> extends LinkedList<T> {
private final Supplier<T> creator;
private ScopedData<T> nonEmptyScope() {
if (this.isEmpty()) {
throw new IllegalStateException("Currently scope is empty");
} else {
return this;
}
}
public ScopedData(Supplier<T> creator) {
this.creator = creator;
}
public void enterScope() {
this.push(creator.get());
}
public void exitScope() {
this.nonEmptyScope().pop();
}
public T current() {
return this.nonEmptyScope().peek();
}
}
class SqlTokenVisitor<T> extends TablesNamesFinder<T> {
// I have overridden this method because I have changed the order of visiting from in my class
@Override
public <S> T visit(PlainSelect plainSelect, S context) {
List<WithItem> withItemsList = plainSelect.getWithItemsList();
if (withItemsList != null && !withItemsList.isEmpty()) {
for (WithItem withItem : withItemsList) {
withItem.accept((SelectVisitor<?>) this, context);
}
}
if (plainSelect.getFromItem() != null) {
plainSelect.getFromItem().accept(this, context);
}
if (plainSelect.getJoins() != null) {
visitJoins(plainSelect.getJoins(), context);
}
if (plainSelect.getWhere() != null) {
plainSelect.getWhere().accept(this, context);
}
if (plainSelect.getHaving() != null) {
plainSelect.getHaving().accept(this, context);
}
if (plainSelect.getOracleHierarchical() != null) {
plainSelect.getOracleHierarchical().accept(this, context);
}
if (plainSelect.getSelectItems() != null) {
for (SelectItem<?> item : plainSelect.getSelectItems()) {
item.accept(this, context);
}
}
return null;
}
private <S> T visitJoins(List<Join> joins, S context) {
for (Join join : joins) {
join.getFromItem().accept(this, context);
join.getRightItem().accept(this, context);
for (Expression expression : join.getOnExpressions()) {
expression.accept(this, context);
}
}
return null;
}
public SqlTokenVisitor() {
this.init(true);
}
}
/**
* Scoped visitor helps to process query in scoped manner for example assume this query,
*
* ```
* SELECT e.name, // new scope (select 1)
* (SELECT COUNT(*) FROM projects AS p WHERE p.employee_id = e.employee_id // new scope (select 2)) AS project_count,
* (SELECT COUNT(*) FROM tasks AS p WHERE p.employee_id = e.employee_id // new scope (select 3)) AS task_count
* FROM employees AS e;
* ```
*
* parse_statement(select_1):
* push_state(stack, select_1)
* parse_statement(select_2),
* push_state(stack, select_2)
* use_state(stack)
* pop_state(stack)
* parse_statement(select_3)
* push_state(stack, select_3)
* use_state(stack)
* pop_state(stack)
* use_state(stack)
* pop_state(stack)
*
* we are doing pop since used data no more needed also it helps to reuse the same variable
* in different scope. Another use of stack is assume
*
* ```
* select a.name, (select a.name from table1 as a) from table2 as a
* ```
*
* upon pushing state of sub select it hides the outer variable `a` since it is located at
* bottom and we look up symbol from top of stack which is good since we need to use nearest
* matching variable.
*
* In below implementation we simply call enterScope before visit and exitScope
* after visit Since the parser parses the Elements as tree we can simply
* use that,
*
* enteringScope() <- 1
* select
* |
* |-- enteringScope() <- 2
* |-- select
* |-- exitingScope() <- 2
* |
* |-- enteringScope() <- 3
* |-- select
* |-- exitingScope() <- 3
* |
* exitingScope() <- 1
*/
abstract class ScopedVisitor<T> extends SqlTokenVisitor<T> {
@Override
public <S> T visit(PlainSelect select, S context) {
this.enteringScope();
super.visit(select, context);
this.exitingScope();
return null;
}
public abstract void enteringScope();
public abstract void exitingScope();
}
class ColumnVisitor extends ScopedVisitor<Void> {
private void processScope(ScopeData scopeData) {
for (Column column : scopeData.columns) {
String resolvedTable = this.resolveAlias(column.getTable()).getName();
System.out.println(column.getColumnName() + " : " + resolvedTable);
}
}
private Table resolveAlias(Table alias) {
// Try to find alias table in scopes
for (ScopeData data : scopedData) {
Table resolved = resolveAlias(data.tables, alias);
if (resolved != null) {
return resolved;
}
}
// if no alias found then it is original table
return alias;
}
public Table resolveAlias(List<Table> tables, Table alias) {
// since if one table alias matches with another tabe name we can resolve it as table
for (Table table : tables) {
if (table.getAlias() != null && table.getAlias().getName().equals(alias.getName())) {
return table;
}
}
return null;
}
@Override
public <S> Void visit(LateralSubSelect lateralSubSelect, S context) {
// we are traking other table that are not actual table but alias
if (lateralSubSelect.getAlias() != null) {
this.scopedData.current().otherTables.add(lateralSubSelect.getAlias().getName());
}
super.visit(lateralSubSelect, context);
return null;
}
@Override
public <S> Void visit(WithItem withItem, S context) {
this.scopedData.current().otherTables.add(withItem.getAlias().getName());
super.visit(withItem, context);
return null;
}
@Override
public <S> Void visit(ParenthesedSelect select, S context) {
if (select.getAlias() != null) {
this.scopedData.current().otherTables.add(select.getAlias().getName());
}
super.visit(select, context);
return null;
}
@Override
public <S> Void visit(ParenthesedFromItem parenthesis, S context) {
if (parenthesis.getAlias() != null) {
this.scopedData.current().otherTables.add(parenthesis.getAlias().getName());
}
super.visit(parenthesis, context);
return null;
}
@Override
public <S> Void visit(Column column, S context) {
super.visit(column, context);
if (column.getTable() != null) {
this.scopedData.current().columns.add(column);
}
return null;
}
@Override
public <S> Void visit(Table table, S context) {
super.visit(table, context);
this.scopedData.current().tables.add(table);
return null;
}
@Override
public void enteringScope() {
this.scopedData.enterScope();
}
@Override
public void exitingScope() {
this.processScope(this.scopedData.current());
this.scopedData.exitScope();
}
private static class ScopeData {
public final List<Column> columns = new LinkedList<>();
public final List<Table> tables = new LinkedList<>();
public final List<String> otherTables = new LinkedList<>();
}
private final ScopedData<ScopeData> scopedData = new ScopedData<>(ScopeData::new);
}
public class Main {
public static void main(String[] args) throws JSQLParserException {
String query = "SELECT e.name, \n" + //
" (SELECT COUNT(*) FROM projects AS p WHERE p.employee_id = e.employee_id) AS project_count,\n" + //
" (SELECT COUNT(*) FROM tasks AS p WHERE p.employee_id = e.employee_id) AS task_count\n" + //
"FROM employees AS e;";
Statement stmt = CCJSqlParserUtil.parse(query);
stmt.accept(new ColumnVisitor());
}
}
Output
employee_id : projects
employee_id : employees
employee_id : tasks
employee_id : employees
name : employees
You can find JSqlParser in github
1 Answer 1
Names should Communicate Intent
Doesn't it look confusing, ScopedData
having a generic parameter ScopedData
?
private final ScopedData<ScopeData> scopedData = new ScopedData<>(ScopeData::new);
Field names in ScopeData
are not very clear as well:
public final List<Table> tables = new LinkedList<>();
public final List<String> otherTables = new LinkedList<>();
It's impossible to discern just by looking at these fields (and without reading methods) that otherTables
is meant to hold table aliases. Why not be kind to the code reader (who is often you, just at a later time) and call it tableAliases
?
By the way, scopedData
field appears at the very bottom of the class ColumnVisitor
which is not aligned with code-conventions conventions of the Java language. It's more intuitive for the code-reader when fields are introduced before they are being used, not the opposite. And there's also a Sonar rule enforcing this order of class members
Is a Collection
vs Has a Collection
You implemented ScopedData
as a subclass of java.util.LinkedList
.
At the same time, you clearly have intention is to give ScopedData
business behavior though which it's state should be manipulated. However, this class also inherits numerous methods from LinkedList
which can update its state and doesn't communicate business intent.
Additionally, the relationship between ScopedData
and LinkedList
are not polymorphic, because in order to leverage behavior that specific to ScopedData
you have to use its concrete type, not super type. If you declare it as a List
, Queue
or Deque
it would be a LinkedList
in disguise (more over, ScopedData
is not supposed to be used like that)
In this case it makes more sense to use composition by declaring a field of type Deque
(that's the JDK type you need, when a stack data structure is required) and exposing only methods that reveal business intent.
By the way, it looks LinkedList
is your go-to collection. But in most of the scenarios it's not a good choice. ArrayDeque
is a more performant Deque
/ Queue
, and ArrayList
would do better when you need a List
(unless you're updating a collection while iterating over it via the means iterator, which is the only case when LinkedList
shines)
Generic implementations vs Specific implementations
Implementations which are tailored to tackle specific problems are usually simpler and have clear intent.
On the other hand, implementing a generic solution requires a firm understanding of use cases to address them properly. When you have only one use case, or you even only started exploring the problem, a generic solution might be not the best time-investment (because abstractions you came up with at this stage will be suboptimal at best).
Although your code isn't doing much at the moment, you chose the way of generic implementation for ScopedData
. As a result you have a customized stack, which is meant to work with a dynamically provided type.
scopedData.current().otherTables.add(...);
scopedData.current().columns.add(...);
By design, you're always working with the data on the top of the stack representing query score (i.e. your current()
). And aren't columns and aliases is something that you'll be dialing with in other use-cases?
If you have chosen the route of specific solution, these lines might have look like this:
scopedData.addAliace(...);
scopedData.addColumn(...);
In case if you want to keep ScopedData
implementation generic, then at the very least you should give domain-specific methods to types it's meant to work with instead of treating them as "bags of values" with no encapsulation and useful behavior:
scopedData.current().addAliace(...);
scopedData.current().addColumn(...);
-
\$\begingroup\$ Hi! thanks for your answer is my logic for collecting fields at end of each scope is good or need to improve it? \$\endgroup\$srilakshmikanthanp– srilakshmikanthanp2024年11月11日 11:50:24 +00:00Commented Nov 11, 2024 at 11:50
-
\$\begingroup\$ @srilakshmikanthanp The overall approach on how to resolve aliases looks fine to me (if that what you're asking). \$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024年11月11日 12:22:45 +00:00Commented Nov 11, 2024 at 12:22
-
\$\begingroup\$ @srilakshmikanthanp And you should thoroughly unit-test your solution. By the way, why did you choose to work with JSqlParser, instead of a more hi-level and well-documented tool? \$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024年11月11日 20:29:38 +00:00Commented Nov 11, 2024 at 20:29
-
\$\begingroup\$ I can't find hi-level tool and i need to do it in java since i am doing some kind of column restriction from querying in web application. \$\endgroup\$srilakshmikanthanp– srilakshmikanthanp2024年11月12日 06:10:30 +00:00Commented Nov 12, 2024 at 6:10
-
\$\begingroup\$ @srilakshmikanthanp That's more down-to-earth than I expected (I thought are tinkering with your custom-built ORM or something which is backed by this parsing library). If you need to restrict access to sensitive data, then you're treating this problem in a wrong way. It requires putting in place proper Application Security mechanisms (give your user roles or authorities, apply API-level security restricting access to certain end-points, apply method-level security in your services if needed). \$\endgroup\$Alexander Ivanchenko– Alexander Ivanchenko2024年11月12日 12:00:18 +00:00Commented Nov 12, 2024 at 12:00