Getting all column vs. table regardless of alias using JSqlParser

Question 1

I have been trying to get all column vs. table regardless of alias used, for example, assume this query

SELECT e.name, 
 (SELECT COUNT(*) FROM projects AS p WHERE p.employee_id = e.employee_id) AS project_count,
 (SELECT COUNT(*) FROM tasks AS p WHERE p.employee_id = e.employee_id) AS task_count
FROM employees AS e;

Considering that alias name can be reused across different scopes I can't simply collect details like column, table and alias and do the work at last How I can do that, below is my try If you have any suggestion or any other algorithm please let me know thanks.

In the code below, I’m resolving tables for a given alias at the end of each scope.

ScopedData class is used to store data on different query scopes. When exiting from that scope I remove the data associated with the current scope by calliing pop().

SqlTokenVisitor class extending TablesNamesFinder sligtly differs from it's parent. I've overriden one of it's visit() methods to change the order of visiting.

import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Supplier;
import net.sf.jsqlparser.JSQLParserException;
import net.sf.jsqlparser.expression.Expression;
import net.sf.jsqlparser.parser.CCJSqlParserUtil;
import net.sf.jsqlparser.schema.Column;
import net.sf.jsqlparser.schema.Table;
import net.sf.jsqlparser.statement.Statement;
import net.sf.jsqlparser.statement.select.Join;
import net.sf.jsqlparser.statement.select.LateralSubSelect;
import net.sf.jsqlparser.statement.select.ParenthesedFromItem;
import net.sf.jsqlparser.statement.select.ParenthesedSelect;
import net.sf.jsqlparser.statement.select.PlainSelect;
import net.sf.jsqlparser.statement.select.Select;
import net.sf.jsqlparser.statement.select.SelectItem;
import net.sf.jsqlparser.statement.select.SelectVisitor;
import net.sf.jsqlparser.statement.select.WithItem;
import net.sf.jsqlparser.util.TablesNamesFinder;
class ScopedData<T> extends LinkedList<T> {
 private final Supplier<T> creator;
 private ScopedData<T> nonEmptyScope() {
 if (this.isEmpty()) {
 throw new IllegalStateException("Currently scope is empty");
 } else {
 return this;
 }
 }
 public ScopedData(Supplier<T> creator) {
 this.creator = creator;
 }
 public void enterScope() {
 this.push(creator.get());
 }
 public void exitScope() {
 this.nonEmptyScope().pop();
 }
 public T current() {
 return this.nonEmptyScope().peek();
 }
}
class SqlTokenVisitor<T> extends TablesNamesFinder<T> {
 // I have overridden this method because I have changed the order of visiting from in my class
 @Override
 public <S> T visit(PlainSelect plainSelect, S context) {
 List<WithItem> withItemsList = plainSelect.getWithItemsList();
 if (withItemsList != null && !withItemsList.isEmpty()) {
 for (WithItem withItem : withItemsList) {
 withItem.accept((SelectVisitor<?>) this, context);
 }
 }
 if (plainSelect.getFromItem() != null) {
 plainSelect.getFromItem().accept(this, context);
 }
 if (plainSelect.getJoins() != null) {
 visitJoins(plainSelect.getJoins(), context);
 }
 if (plainSelect.getWhere() != null) {
 plainSelect.getWhere().accept(this, context);
 }
 if (plainSelect.getHaving() != null) {
 plainSelect.getHaving().accept(this, context);
 }
 if (plainSelect.getOracleHierarchical() != null) {
 plainSelect.getOracleHierarchical().accept(this, context);
 }
 if (plainSelect.getSelectItems() != null) {
 for (SelectItem<?> item : plainSelect.getSelectItems()) {
 item.accept(this, context);
 }
 }
 return null;
 }
 private <S> T visitJoins(List<Join> joins, S context) {
 for (Join join : joins) {
 join.getFromItem().accept(this, context);
 join.getRightItem().accept(this, context);
 for (Expression expression : join.getOnExpressions()) {
 expression.accept(this, context);
 }
 }
 return null;
 }
 public SqlTokenVisitor() {
 this.init(true);
 }
}
/**
 * Scoped visitor helps to process query in scoped manner for example assume this query,
 * 
 * ```
 * SELECT e.name, // new scope (select 1)
 * (SELECT COUNT(*) FROM projects AS p WHERE p.employee_id = e.employee_id // new scope (select 2)) AS project_count, 
 * (SELECT COUNT(*) FROM tasks AS p WHERE p.employee_id = e.employee_id // new scope (select 3)) AS task_count
 * FROM employees AS e;
 * ```
 * 
 * parse_statement(select_1):
 * push_state(stack, select_1)
 * parse_statement(select_2),
 * push_state(stack, select_2)
 * use_state(stack)
 * pop_state(stack)
 * parse_statement(select_3)
 * push_state(stack, select_3)
 * use_state(stack)
 * pop_state(stack)
 * use_state(stack)
 * pop_state(stack)
 * 
 * we are doing pop since used data no more needed also it helps to reuse the same variable 
 * in different scope. Another use of stack is assume
 * 
 * ```
 * select a.name, (select a.name from table1 as a) from table2 as a
 * ```
 * 
 * upon pushing state of sub select it hides the outer variable `a` since it is located at
 * bottom and we look up symbol from top of stack which is good since we need to use nearest
 * matching variable.
 * 
 * In below implementation we simply call enterScope before visit and exitScope 
 * after visit Since the parser parses the Elements as tree we can simply
 * use that,
 * 
 * enteringScope() <- 1
 * select
 * |
 * |-- enteringScope() <- 2
 * |-- select
 * |-- exitingScope() <- 2
 * |
 * |-- enteringScope() <- 3
 * |-- select
 * |-- exitingScope() <- 3
 * |
 * exitingScope() <- 1
 */
abstract class ScopedVisitor<T> extends SqlTokenVisitor<T> {
 @Override
 public <S> T visit(PlainSelect select, S context) {
 this.enteringScope();
 super.visit(select, context);
 this.exitingScope();
 return null;
 }
 public abstract void enteringScope();
 public abstract void exitingScope();
}
class ColumnVisitor extends ScopedVisitor<Void> {
 private void processScope(ScopeData scopeData) {
 for (Column column : scopeData.columns) {
 String resolvedTable = this.resolveAlias(column.getTable()).getName();
 System.out.println(column.getColumnName() + " : " + resolvedTable);
 }
 }
 private Table resolveAlias(Table alias) {
 // Try to find alias table in scopes
 for (ScopeData data : scopedData) {
 Table resolved = resolveAlias(data.tables, alias);
 if (resolved != null) {
 return resolved;
 }
 }
 // if no alias found then it is original table
 return alias;
 }
 
 public Table resolveAlias(List<Table> tables, Table alias) {
 // since if one table alias matches with another tabe name we can resolve it as table
 for (Table table : tables) {
 if (table.getAlias() != null && table.getAlias().getName().equals(alias.getName())) {
 return table;
 }
 }
 return null;
 }
 @Override
 public <S> Void visit(LateralSubSelect lateralSubSelect, S context) {
 // we are traking other table that are not actual table but alias
 if (lateralSubSelect.getAlias() != null) {
 this.scopedData.current().otherTables.add(lateralSubSelect.getAlias().getName());
 }
 super.visit(lateralSubSelect, context);
 return null;
 }
 @Override
 public <S> Void visit(WithItem withItem, S context) {
 this.scopedData.current().otherTables.add(withItem.getAlias().getName());
 super.visit(withItem, context);
 return null;
 }
 @Override
 public <S> Void visit(ParenthesedSelect select, S context) {
 if (select.getAlias() != null) {
 this.scopedData.current().otherTables.add(select.getAlias().getName());
 }
 super.visit(select, context);
 return null;
 }
 @Override
 public <S> Void visit(ParenthesedFromItem parenthesis, S context) {
 if (parenthesis.getAlias() != null) {
 this.scopedData.current().otherTables.add(parenthesis.getAlias().getName());
 }
 super.visit(parenthesis, context);
 return null;
 }
 @Override
 public <S> Void visit(Column column, S context) {
 super.visit(column, context);
 if (column.getTable() != null) {
 this.scopedData.current().columns.add(column);
 }
 return null;
 }
 @Override
 public <S> Void visit(Table table, S context) {
 super.visit(table, context);
 this.scopedData.current().tables.add(table);
 return null;
 }
 @Override
 public void enteringScope() {
 this.scopedData.enterScope();
 }
 @Override
 public void exitingScope() {
 this.processScope(this.scopedData.current());
 this.scopedData.exitScope();
 }
 private static class ScopeData {
 public final List<Column> columns = new LinkedList<>();
 public final List<Table> tables = new LinkedList<>();
 public final List<String> otherTables = new LinkedList<>();
 }
 private final ScopedData<ScopeData> scopedData = new ScopedData<>(ScopeData::new);
}
public class Main {
 public static void main(String[] args) throws JSQLParserException {
 String query = "SELECT e.name, \n" + //
 " (SELECT COUNT(*) FROM projects AS p WHERE p.employee_id = e.employee_id) AS project_count,\n" + //
 " (SELECT COUNT(*) FROM tasks AS p WHERE p.employee_id = e.employee_id) AS task_count\n" + //
 "FROM employees AS e;";
 Statement stmt = CCJSqlParserUtil.parse(query);
 stmt.accept(new ColumnVisitor());
 }
}

Output

employee_id : projects
employee_id : employees
employee_id : tasks
employee_id : employees
name : employees

You can find JSqlParser in github

Question 2

Names should Communicate Intent

Doesn't it look confusing, ScopedData having a generic parameter ScopedData?

private final ScopedData<ScopeData> scopedData = new ScopedData<>(ScopeData::new);

Field names in ScopeData are not very clear as well:

public final List<Table> tables = new LinkedList<>();
public final List<String> otherTables = new LinkedList<>();

It's impossible to discern just by looking at these fields (and without reading methods) that otherTables is meant to hold table aliases. Why not be kind to the code reader (who is often you, just at a later time) and call it tableAliases?

^{By the way, scopedData field appears at the very bottom of the class ColumnVisitor which is not aligned with code-conventions conventions of the Java language. It's more intuitive for the code-reader when fields are introduced before they are being used, not the opposite. And there's also a Sonar rule enforcing this order of class members}

Is a `Collection` vs Has a `Collection`

You implemented ScopedData as a subclass of java.util.LinkedList.

At the same time, you clearly have intention is to give ScopedData business behavior though which it's state should be manipulated. However, this class also inherits numerous methods from LinkedList which can update its state and doesn't communicate business intent.

Additionally, the relationship between ScopedData and LinkedList are not polymorphic, because in order to leverage behavior that specific to ScopedData you have to use its concrete type, not super type. If you declare it as a List, Queue or Deque it would be a LinkedList in disguise (more over, ScopedData is not supposed to be used like that)

In this case it makes more sense to use composition by declaring a field of type Deque (that's the JDK type you need, when a stack data structure is required) and exposing only methods that reveal business intent.

^{By the way, it looks LinkedList is your go-to collection. But in most of the scenarios it's not a good choice. ArrayDeque is a more performant Deque / Queue, and ArrayList would do better when you need a List (unless you're updating a collection while iterating over it via the means iterator, which is the only case when LinkedList shines)}

Generic implementations vs Specific implementations

Implementations which are tailored to tackle specific problems are usually simpler and have clear intent.

On the other hand, implementing a generic solution requires a firm understanding of use cases to address them properly. When you have only one use case, or you even only started exploring the problem, a generic solution might be not the best time-investment (because abstractions you came up with at this stage will be suboptimal at best).

Although your code isn't doing much at the moment, you chose the way of generic implementation for ScopedData. As a result you have a customized stack, which is meant to work with a dynamically provided type.

scopedData.current().otherTables.add(...); 
scopedData.current().columns.add(...);

By design, you're always working with the data on the top of the stack representing query score (i.e. your current()). And aren't columns and aliases is something that you'll be dialing with in other use-cases?

If you have chosen the route of specific solution, these lines might have look like this:

scopedData.addAliace(...); 
scopedData.addColumn(...);

In case if you want to keep ScopedData implementation generic, then at the very least you should give domain-specific methods to types it's meant to work with instead of treating them as "bags of values" with no encapsulation and useful behavior:

scopedData.current().addAliace(...); 
scopedData.current().addColumn(...);

Question 3

Hi! thanks for your answer is my logic for collecting fields at end of each scope is good or need to improve it?

Question 4

@srilakshmikanthanp The overall approach on how to resolve aliases looks fine to me (if that what you're asking).

Question 5

@srilakshmikanthanp And you should thoroughly unit-test your solution. By the way, why did you choose to work with JSqlParser, instead of a more hi-level and well-documented tool?

Question 6

I can't find hi-level tool and i need to do it in java since i am doing some kind of column restriction from querying in web application.

Question 7

@srilakshmikanthanp That's more down-to-earth than I expected (I thought are tinkering with your custom-built ORM or something which is backed by this parsing library). If you need to restrict access to sensitive data, then you're treating this problem in a wrong way. It requires putting in place proper Application Security mechanisms (give your user roles or authorities, apply API-level security restricting access to certain end-points, apply method-level security in your services if needed).

Alexander Ivanchenko Alexander Ivanchenko 3,3695 silver badges22 bronze badges · Answer 1 · 2024-11-11 10:48:47Z

Names should Communicate Intent

Doesn't it look confusing, ScopedData having a generic parameter ScopedData?

private final ScopedData<ScopeData> scopedData = new ScopedData<>(ScopeData::new);

Field names in ScopeData are not very clear as well:

public final List<Table> tables = new LinkedList<>();
public final List<String> otherTables = new LinkedList<>();

It's impossible to discern just by looking at these fields (and without reading methods) that otherTables is meant to hold table aliases. Why not be kind to the code reader (who is often you, just at a later time) and call it tableAliases?

^{By the way, scopedData field appears at the very bottom of the class ColumnVisitor which is not aligned with code-conventions conventions of the Java language. It's more intuitive for the code-reader when fields are introduced before they are being used, not the opposite. And there's also a Sonar rule enforcing this order of class members}

Is a `Collection` vs Has a `Collection`

You implemented ScopedData as a subclass of java.util.LinkedList.

At the same time, you clearly have intention is to give ScopedData business behavior though which it's state should be manipulated. However, this class also inherits numerous methods from LinkedList which can update its state and doesn't communicate business intent.

Additionally, the relationship between ScopedData and LinkedList are not polymorphic, because in order to leverage behavior that specific to ScopedData you have to use its concrete type, not super type. If you declare it as a List, Queue or Deque it would be a LinkedList in disguise (more over, ScopedData is not supposed to be used like that)

In this case it makes more sense to use composition by declaring a field of type Deque (that's the JDK type you need, when a stack data structure is required) and exposing only methods that reveal business intent.

^{By the way, it looks LinkedList is your go-to collection. But in most of the scenarios it's not a good choice. ArrayDeque is a more performant Deque / Queue, and ArrayList would do better when you need a List (unless you're updating a collection while iterating over it via the means iterator, which is the only case when LinkedList shines)}

Generic implementations vs Specific implementations

Implementations which are tailored to tackle specific problems are usually simpler and have clear intent.

On the other hand, implementing a generic solution requires a firm understanding of use cases to address them properly. When you have only one use case, or you even only started exploring the problem, a generic solution might be not the best time-investment (because abstractions you came up with at this stage will be suboptimal at best).

Although your code isn't doing much at the moment, you chose the way of generic implementation for ScopedData. As a result you have a customized stack, which is meant to work with a dynamically provided type.

scopedData.current().otherTables.add(...); 
scopedData.current().columns.add(...);

By design, you're always working with the data on the top of the stack representing query score (i.e. your current()). And aren't columns and aliases is something that you'll be dialing with in other use-cases?

If you have chosen the route of specific solution, these lines might have look like this:

scopedData.addAliace(...); 
scopedData.addColumn(...);

In case if you want to keep ScopedData implementation generic, then at the very least you should give domain-specific methods to types it's meant to work with instead of treating them as "bags of values" with no encapsulation and useful behavior:

scopedData.current().addAliace(...); 
scopedData.current().addColumn(...);

Hi! thanks for your answer is my logic for collecting fields at end of each scope is good or need to improve it?
@srilakshmikanthanp The overall approach on how to resolve aliases looks fine to me (if that what you're asking).
@srilakshmikanthanp And you should thoroughly unit-test your solution. By the way, why did you choose to work with JSqlParser, instead of a more hi-level and well-documented tool?
I can't find hi-level tool and i need to do it in java since i am doing some kind of column restriction from querying in web application.
@srilakshmikanthanp That's more down-to-earth than I expected (I thought are tinkering with your custom-built ORM or something which is backed by this parsing library). If you need to restrict access to sensitive data, then you're treating this problem in a wrong way. It requires putting in place proper Application Security mechanisms (give your user roles or authorities, apply API-level security restricting access to certain end-points, apply method-level security in your services if needed).

Stack Exchange Network

Getting all column vs. table regardless of alias using JSqlParser

1 Answer 1

Names should Communicate Intent

Is a `Collection` vs Has a `Collection`

Generic implementations vs Specific implementations

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Getting all column vs. table regardless of alias using JSqlParser

1 Answer 1

Names should Communicate Intent

Is a Collection vs Has a Collection

Generic implementations vs Specific implementations

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Is a `Collection` vs Has a `Collection`