Star Schema Benchmark
Star Schema Benchmark(SSB) is a lightweight performance test set in the data warehouse scenario. SSB provides a simplified star schema data based on TPC-H, which is mainly used to test the performance of multi-table JOIN query under star schema. In addition, the industry usually flattens SSB into a wide table model (Referred as: SSB flat) to test the performance of the query engine.
This document mainly introduces the performance of Doris on the SSB 1000G test set.
We tested 13 queries on the SSB standard test dataset based on Apache Doris version 2.0.15.1.
1. Hardware Environment
| Hardware | Configuration Instructions |
|---|---|
| Number of Machines | 4 Aliyun Virtual Machine (1FE,3BEs) |
| CPU | Intel Xeon (Ice Lake) Platinum 8369B 32C |
| Memory | 128G |
| Disk | Enterprise SSD (PL0) |
2. Software Environment
- Doris Deployed 3BEs and 1FE
- Kernel Version: Linux version 5.15.0-101-generic
- OS version: Ubuntu 20.04 LTS (Focal Fossa)
- Doris software version: Apache Doris 2.0.15.1
- JDK: openjdk version "1.8.0_352-352"
3. Test Data Volume
| SSB Table Name | Rows | Annotation |
|---|---|---|
| lineorder | 5,999,989,709 | Commodity Order Details |
| customer | 30,000,000 | Customer Information |
| part | 2,000,000 | Parts Information |
| supplier | 2,000,000 | Supplier Information |
| dates | 2,556 | Date |
| lineorder_flat | 5,999,989,709 | Wide Table after Data Flattening |
4. SSB Flat Test Results
Here we use Apache Doris 2.0.15.1 for comparative testing. In the test, we use Query Time(ms) as the main performance indicator. The test results are as follows:
| Query | Doris 2.0.15.1 (ms) |
|---|---|
| q1.1 | 80 |
| q1.2 | 10 |
| q1.3 | 110 |
| q2.1 | 1680 |
| q2.2 | 1210 |
| q2.3 | 1060 |
| q3.1 | 2010 |
| q3.2 | 1560 |
| q3.3 | 600 |
| q3.4 | 10 |
| q4.1 | 2380 |
| q4.2 | 190 |
| q4.3 | 120 |
| Total | 11020 |
5. Standard SSB Test Results
Here we use Apache Doris 2.0.15.1 for comparative testing. In the test, we use Query Time(ms) as the main performance indicator. The test results are as follows:
| Query | Doris 2.0.15.1 (ms) |
|---|---|
| q1.1 | 330 |
| q1.2 | 80 |
| q1.3 | 80 |
| q2.1 | 1780 |
| q2.2 | 1970 |
| q2.3 | 1510 |
| q3.1 | 4000 |
| q3.2 | 1720 |
| q3.3 | 1510 |
| q3.4 | 160 |
| q4.1 | 4010 |
| q4.2 | 840 |
| q4.3 | 400 |
| Total | 19390 |
6. Environment Preparation
Please first refer to the official documentation to install and deploy Apache Doris first to obtain a Doris cluster which is working well(including at least 1 FE 1 BE, 1 FE 3 BEs is recommended).
7. Data Preparation
7.1 Download and Install the SSB Data Generation Tool.
Execute the following script to download and compile the ssb-tools tool.
sh bin/build-ssb-dbgen.sh
After successful installation, the dbgen binary will be generated under the ssb-dbgen/ directory.
7.2 Generate SSB Test Set
Execute the following script to generate the SSB dataset:
sh bin/gen-ssb-data.sh -s 1000
Note 1: Check the script help via
sh gen-ssb-data.sh -h.Note 2: The data will be generated under the
ssb-data/directory with the suffix.tbl. The total file size is about 600GB and may need a few minutes to an hour to generate.Note 3: A standard test data set of 100G is generated by default.
7.3 Create Table
7.3.1 Prepare the doris-cluster.conf File.
Before import the script, you need to write the FE’s ip port and other information in the doris-cluster.conf file.
The file is located under ${DORIS_HOME}/tools/ssb-tools/conf/.
The content of the file includes FE's ip, HTTP port, user name, password and the DB name of the data to be imported:
# Any of FE host
export FE_HOST='127.0.0.1'
# http_port in fe.conf
export FE_HTTP_PORT=8030
# query_port in fe.conf
export FE_QUERY_PORT=9030
# Doris username
export USER='root'
# Doris password
export PASSWORD=''
# The database where SSB tables located
export DB='ssb'
7.3.2 Execute the Following Script to Generate and Create the SSB Table:
sh bin/create-ssb-tables.sh -s 1000
Or copy the table creation statements in create-ssb-tables.sql and create-ssb-flat-table.sql and then execute them in the MySQL client.
7.4 Import data
We use the following command to complete all data import of SSB test set and SSB FLAT wide table data synthesis and then import into the table.
sh bin/load-ssb-data.sh
7.5 Checking Imported data
selectcount(*)from part;
selectcount(*)from customer;
selectcount(*)from supplier;
selectcount(*)from dates;
selectcount(*)from lineorder;
selectcount(*)from lineorder_flat;
7.6 Query Test
- SSB-Flat Query Statement: ssb-flat-queries
- Standard SSB Queries: ssb-queries
7.6.1 SSB FLAT Test for SQL
--Q1.1
SELECTSUM(LO_EXTENDEDPRICE * LO_DISCOUNT)AS revenue
FROM lineorder_flat
WHERE
LO_ORDERDATE >=19930101
AND LO_ORDERDATE <=19931231
AND LO_DISCOUNT BETWEEN1AND3
AND LO_QUANTITY <25;
--Q1.2
SELECTSUM(LO_EXTENDEDPRICE * LO_DISCOUNT)AS revenue
FROM lineorder_flat
WHERE
LO_ORDERDATE >=19940101
AND LO_ORDERDATE <=19940131
AND LO_DISCOUNT BETWEEN4AND6
AND LO_QUANTITY BETWEEN26AND35;
--Q1.3
SELECTSUM(LO_EXTENDEDPRICE * LO_DISCOUNT)AS revenue
FROM lineorder_flat
WHERE
weekofyear(LO_ORDERDATE)=6
AND LO_ORDERDATE >=19940101
AND LO_ORDERDATE <=19941231
AND LO_DISCOUNT BETWEEN5AND7
AND LO_QUANTITY BETWEEN26AND35;
--Q2.1
SELECT
SUM(LO_REVENUE),(LO_ORDERDATE DIV10000)ASYEAR,
P_BRAND
FROM lineorder_flat
WHERE P_CATEGORY ='MFGR#12'AND S_REGION ='AMERICA'
GROUPBYYEAR, P_BRAND
ORDERBYYEAR, P_BRAND;
--Q2.2
SELECT
SUM(LO_REVENUE),(LO_ORDERDATE DIV10000)ASYEAR,
P_BRAND
FROM lineorder_flat
WHERE
P_BRAND >='MFGR#2221'
AND P_BRAND <='MFGR#2228'
AND S_REGION ='ASIA'
GROUPBYYEAR, P_BRAND
ORDERBYYEAR, P_BRAND;
--Q2.3
SELECT
SUM(LO_REVENUE),(LO_ORDERDATE DIV10000)ASYEAR,
P_BRAND
FROM lineorder_flat
WHERE
P_BRAND ='MFGR#2239'
AND S_REGION ='EUROPE'
GROUPBYYEAR, P_BRAND
ORDERBYYEAR, P_BRAND;
--Q3.1
SELECT
C_NATION,
S_NATION,(LO_ORDERDATE DIV10000)ASYEAR,
SUM(LO_REVENUE)AS revenue
FROM lineorder_flat
WHERE
C_REGION ='ASIA'
AND S_REGION ='ASIA'
AND LO_ORDERDATE >=19920101
AND LO_ORDERDATE <=19971231
GROUPBY C_NATION, S_NATION,YEAR
ORDERBYYEARASC, revenue DESC;
--Q3.2
SELECT
C_CITY,
S_CITY,(LO_ORDERDATE DIV10000)ASYEAR,
SUM(LO_REVENUE)AS revenue
FROM lineorder_flat
WHERE
C_NATION ='UNITED STATES'
AND S_NATION ='UNITED STATES'
AND LO_ORDERDATE >=19920101
AND LO_ORDERDATE <=19971231
GROUPBY C_CITY, S_CITY,YEAR
ORDERBYYEARASC, revenue DESC;
--Q3.3
SELECT
C_CITY,
S_CITY,(LO_ORDERDATE DIV10000)ASYEAR,
SUM(LO_REVENUE)AS revenue
FROM lineorder_flat
WHERE
C_CITY IN('UNITED KI1','UNITED KI5')
AND S_CITY IN('UNITED KI1','UNITED KI5')
AND LO_ORDERDATE >=19920101
AND LO_ORDERDATE <=19971231
GROUPBY C_CITY, S_CITY,YEAR
ORDERBYYEARASC, revenue DESC;
--Q3.4
SELECT
C_CITY,
S_CITY,(LO_ORDERDATE DIV10000)ASYEAR,
SUM(LO_REVENUE)AS revenue
FROM lineorder_flat
WHERE
C_CITY IN('UNITED KI1','UNITED KI5')
AND S_CITY IN('UNITED KI1','UNITED KI5')
AND LO_ORDERDATE >=19971201
AND LO_ORDERDATE <=19971231
GROUPBY C_CITY, S_CITY,YEAR
ORDERBYYEARASC, revenue DESC;
--Q4.1
SELECT(LO_ORDERDATE DIV10000)ASYEAR,
C_NATION,
SUM(LO_REVENUE - LO_SUPPLYCOST)AS profit
FROM lineorder_flat
WHERE
C_REGION ='AMERICA'
AND S_REGION ='AMERICA'
AND P_MFGR IN('MFGR#1','MFGR#2')
GROUPBYYEAR, C_NATION
ORDERBYYEARASC, C_NATION ASC;
--Q4.2
SELECT(LO_ORDERDATE DIV10000)ASYEAR,
S_NATION,
P_CATEGORY,
SUM(LO_REVENUE - LO_SUPPLYCOST)AS profit
FROM lineorder_flat
WHERE
C_REGION ='AMERICA'
AND S_REGION ='AMERICA'
AND LO_ORDERDATE >=19970101
AND LO_ORDERDATE <=19981231
AND P_MFGR IN('MFGR#1','MFGR#2')
GROUPBYYEAR, S_NATION, P_CATEGORY
ORDERBY
YEARASC,
S_NATION ASC,
P_CATEGORY ASC;
--Q4.3
SELECT(LO_ORDERDATE DIV10000)ASYEAR,
S_CITY,
P_BRAND,
SUM(LO_REVENUE - LO_SUPPLYCOST)AS profit
FROM lineorder_flat
WHERE
S_NATION ='UNITED STATES'
AND LO_ORDERDATE >=19970101
AND LO_ORDERDATE <=19981231
AND P_CATEGORY ='MFGR#14'
GROUPBYYEAR, S_CITY, P_BRAND
ORDERBYYEARASC, S_CITY ASC, P_BRAND ASC;
7.6.2 SSB Standard Test for SQL
--Q1.1
SELECTSUM(lo_extendedprice * lo_discount)AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_year =1993
AND lo_discount BETWEEN1AND3
AND lo_quantity <25;
--Q1.2
SELECTSUM(lo_extendedprice * lo_discount)AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_yearmonth ='Jan1994'
AND lo_discount BETWEEN4AND6
AND lo_quantity BETWEEN26AND35;
--Q1.3
SELECT
SUM(lo_extendedprice * lo_discount)AS REVENUE
FROM lineorder, dates
WHERE
lo_orderdate = d_datekey
AND d_weeknuminyear =6
AND d_year =1994
AND lo_discount BETWEEN5AND7
AND lo_quantity BETWEEN26AND35;
--Q2.1
SELECTSUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_category ='MFGR#12'
AND s_region ='AMERICA'
GROUPBY d_year, p_brand
ORDERBY p_brand;
--Q2.2
SELECTSUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_brand BETWEEN'MFGR#2221'AND'MFGR#2228'
AND s_region ='ASIA'
GROUPBY d_year, p_brand
ORDERBY d_year, p_brand;
--Q2.3
SELECTSUM(lo_revenue), d_year, p_brand
FROM lineorder, dates, part, supplier
WHERE
lo_orderdate = d_datekey
AND lo_partkey = p_partkey
AND lo_suppkey = s_suppkey
AND p_brand ='MFGR#2239'
AND s_region ='EUROPE'
GROUPBY d_year, p_brand
ORDERBY d_year, p_brand;
--Q3.1
SELECT
c_nation,
s_nation,
d_year,
SUM(lo_revenue)AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND c_region ='ASIA'
AND s_region ='ASIA'
AND d_year >=1992
AND d_year <=1997
GROUPBY c_nation, s_nation, d_year
ORDERBY d_year ASC, REVENUE DESC;
--Q3.2
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue)AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND c_nation ='UNITED STATES'
AND s_nation ='UNITED STATES'
AND d_year >=1992
AND d_year <=1997
GROUPBY c_city, s_city, d_year
ORDERBY d_year ASC, REVENUE DESC;
--Q3.3
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue)AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND(
c_city ='UNITED KI1'
OR c_city ='UNITED KI5'
)
AND(
s_city ='UNITED KI1'
OR s_city ='UNITED KI5'
)
AND d_year >=1992
AND d_year <=1997
GROUPBY c_city, s_city, d_year
ORDERBY d_year ASC, REVENUE DESC;
--Q3.4
SELECT
c_city,
s_city,
d_year,
SUM(lo_revenue)AS REVENUE
FROM customer, lineorder, supplier, dates
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_orderdate = d_datekey
AND(
c_city ='UNITED KI1'
OR c_city ='UNITED KI5'
)
AND(
s_city ='UNITED KI1'
OR s_city ='UNITED KI5'
)
AND d_yearmonth ='Dec1997'
GROUPBY c_city, s_city, d_year
ORDERBY d_year ASC, REVENUE DESC;
--Q4.1
SELECT
d_year,
c_nation,
SUM(lo_revenue - lo_supplycost)AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region ='AMERICA'
AND s_region ='AMERICA'
AND(
p_mfgr ='MFGR#1'
OR p_mfgr ='MFGR#2'
)
GROUPBY d_year, c_nation
ORDERBY d_year, c_nation;
--Q4.2
SELECT
d_year,
s_nation,
p_category,
SUM(lo_revenue - lo_supplycost)AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND c_region ='AMERICA'
AND s_region ='AMERICA'
AND(
d_year =1997
OR d_year =1998
)
AND(
p_mfgr ='MFGR#1'
OR p_mfgr ='MFGR#2'
)
GROUPBY d_year, s_nation, p_category
ORDERBY d_year, s_nation, p_category;
--Q4.3
SELECT
d_year,
s_city,
p_brand,
SUM(lo_revenue - lo_supplycost)AS PROFIT
FROM dates, customer, supplier, part, lineorder
WHERE
lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND lo_partkey = p_partkey
AND lo_orderdate = d_datekey
AND s_nation ='UNITED STATES'
AND(
d_year =1997
OR d_year =1998
)
AND p_category ='MFGR#14'
GROUPBY d_year, s_city, p_brand
ORDERBY d_year, s_city, p_brand;