4.x
  • 3.x
  • 2.1
  • Dev
  • Archived Download

    Distribute Hint

    Overview

    The Distribute hint is used to control the shuffle method for joins.

    Syntax

    • Supports specifying the Distribute Type for the right table, which can be either [shuffle] or [broadcast], and should be written before the right table in the Join.
    • Supports an arbitrary number of Distribute Hints.
    • When encountering a Distribute Hint that cannot correctly generate a plan, the system will not display an error. It will make the best effort to apply the hint, and the final Distribute method will be shown in the EXPLAIN output.

    Examples

    Used in Combination with Ordered Hint

    Fix the Join order to the textual sequence, and then specify the expected Distribute method for the Join. For example:

    Before using:

    mysql>explain shape planselectcount(*)from t1 join t2 on t1.c1 = t2.c2;
    +----------------------------------------------------------------------------------+
    |Explain String(Nereids Planner)|
    +----------------------------------------------------------------------------------+
    | PhysicalResultSink |
    |--hashAgg[GLOBAL] |
    |----PhysicalDistribute[DistributionSpecGather] |
    |------hashAgg[LOCAL] |
    |--------PhysicalProject |
    |----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
    |------------PhysicalProject |
    |--------------PhysicalOlapScan[t1] |
    |------------PhysicalDistribute[DistributionSpecHash] |
    |--------------PhysicalProject |
    |----------------PhysicalOlapScan[t2] |
    +----------------------------------------------------------------------------------+

    After using:

    mysql>explain shape planselect/*+ ordered */count(*)from t2 join[broadcast] t1 on t1.c1 = t2.c2;
    +----------------------------------------------------------------------------------+
    |Explain String(Nereids Planner)|
    +----------------------------------------------------------------------------------+
    | PhysicalResultSink |
    |--hashAgg[GLOBAL] |
    |----PhysicalDistribute[DistributionSpecGather] |
    |------hashAgg[LOCAL] |
    |--------PhysicalProject |
    |----------hashJoin[INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=() |
    |------------PhysicalProject |
    |--------------PhysicalOlapScan[t2] |
    |------------PhysicalDistribute[DistributionSpecReplicated] |
    |--------------PhysicalProject |
    |----------------PhysicalOlapScan[t1] |
    ||
    | Hint log: |
    | Used: ORDERED |
    | UnUsed: |
    | SyntaxError: |
    +----------------------------------------------------------------------------------+

    The Explain Shape Plan will display information related to the Distribute operator. Specifically:

    • DistributionSpecReplicated indicates that the corresponding data will be replicated to all BE nodes.
    • DistributionSpecGather indicates that the data will be gathered to the FE node.
    • DistributionSpecHash indicates that the data will be distributed to different BE nodes based on a specific hashKey and algorithm. Used in Combination with Leading Hint

    When writing SQL queries, you can specify the corresponding DISTRIBUTE method for each JOIN operation while using the LEADING hint. Below is a specific example demonstrating how to mix Distribute Hint and Leading Hint in an SQL query.

    explain shape plan
    select
    nation,
    o_year,
    sum(amount)as sum_profit
    from
    (
    select
    /*+ leading(orders shuffle {lineitem shuffle part} shuffle {supplier broadcast nation} shuffle partsupp) */
    n_name as nation,
    extract(yearfrom o_orderdate)as o_year,
    l_extendedprice *(1- l_discount)- ps_supplycost * l_quantity as amount
    from
    part,
    supplier,
    lineitem,
    partsupp,
    orders,
    nation
    where
    s_suppkey = l_suppkey
    and ps_suppkey = l_suppkey
    and ps_partkey = l_partkey
    and p_partkey = l_partkey
    and o_orderkey = l_orderkey
    and s_nationkey = n_nationkey
    and p_name like'%green%'
    )as profit
    groupby
    nation,
    o_year
    orderby
    nation,
    o_year desc;

    Summary

    The Distribute hint is a commonly used hint for controlling the join shuffle method, allowing manual specification of shuffle or broadcast distribution methods. Proper use of the Distribute hint can meet on-site tuning needs for join shuffle methods, increasing the flexibility of system control.

    AltStyle によって変換されたページ (->オリジナル) /