@zhangwenchao-123
zhangwenchao-123
commented
Oct 21, 2025
- Add the module name, JIRA# to PR/commit and description.
- Add tests for the change.
The following two operator delete functions are not looked up in the MADlib library because they were not added to the library script file:
void operator delete (void *ptr, std::size_t sz) noexcept;
void operator delete[](void *ptr, std::size_t sz) noexcept;
Both functions were previously missing.
Is there a need to add /usr/local/cbdb/bin?
Have we used this path?
I have added the path /usr/local/cloudberry/bin.
I believe the line just after this, "$ENV{GPHOME}/bin", will catch most scenarios. Users will be sourcing cloudberry-env.sh (Cloudberry 3+) or greenplum_path.sh (Cloudberry 2).
Cool idea!
Remove these lines?
OK
Cloudberry 3.0 has not been released yet, so can we remove this file?
Apache MADlib should be able to build against both the REL_2_STABLE and main (3.0.0) branches. I believe it is better to keep support for Cloudberry 3.0. As main (3.0) has not been released, support for 3.0 could be labelled as experimental.
That makes sense. Thanks!
Agree with ed.
We also need to add the standard Apache license header to the new files, including FindCloudberry.cmake, FindCloudberry_1.cmake, and the other new files.
fixed
@edespino
edespino
left a comment
❌ What's Missing (Critical Issues)
1. No main CMakeLists.txt for Cloudberry:
- src/ports/cloudberry/CMakeLists.txt doesn't exist
- This file should mirror the structure of
src/ports/greenplum/CMakeLists.txt (13KB, ~300 lines)
- Should define: port configuration, source files, SQL handling,
build functions, and version management
2. Not integrated into the build system:
- src/ports/CMakeLists.txt only contains:
add_subdirectory(postgres)
add_subdirectory(greenplum)
- Missing: add_subdirectory(cloudberry)
3. No CloudberryUtils.cmake:
- Greenplum has GreenplumUtils.cmake with utility functions
- May need similar utilities for Cloudberry-specific features
🔍 Current State
CMake configuration completed but:
- Cloudberry was NOT detected (the FindCloudberry code was never executed)
- Only PostgreSQL and Greenplum detection ran
- Build directory shows only postgres/ and greenplum/ subdirectories
However, there IS a Cloudberry installation:
- Location: /usr/local/cloudberry/
- Version: Based on PostgreSQL 14.4 with GP_VERSION_NUM 30000 (Cloudberry v3.0.0)
- This matches the src/ports/cloudberry/3/ directory structure
📊 Summary
The Cloudberry port is partially implemented. The detection logic and
version-specific configs exist, but they're not wired into the build
system.
To complete the implementation, you would need:
1. Create src/ports/cloudberry/CMakeLists.txt (modeled after Greenplum's)
2. Add add_subdirectory(cloudberry) to src/ports/CMakeLists.txt
3. Potentially create CloudberryUtils.cmake for Cloudberry-specific features
4. Test the full build process with Cloudberry detection
edespino
commented
Oct 22, 2025
Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) and other source documentation files? We will need to review these as well.
edespino
commented
Oct 22, 2025
As @tuhaihe mentioned regarding ASF headers, running the Apache Release Audit Tool (RAT) from the root of the MADlib source (mvn apache-rat:check) reports the following:
❯ head -30 target/rat.txt
*****************************************************
Summary
-------
Generated at: 2025-10-21T18:46:08-07:00
Notes: 4
Binaries: 5
Archives: 0
Standards: 311
Apache Licensed: 307
Generated Documents: 0
JavaDocs are generated and so license header is optional
Generated files do not required license headers
4 Unknown Licenses
*******************************
Unapproved licenses:
src/ports/cloudberry/cmake/FindCloudberry.cmake
src/ports/cloudberry/cmake/FindCloudberry_1.cmake
src/ports/cloudberry/cmake/FindCloudberry_2.cmake
src/ports/cloudberry/cmake/FindCloudberry_3.cmake
*******************************
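The list of flagged files in the report above can also be extracted programmatically, for example to fail a pre-commit check. The following is only a minimal sketch: the function name `unapproved_files` is hypothetical, and it assumes the report keeps the layout shown above, with an "Unapproved licenses:" line followed by one path per line and closed by a line of asterisks.

```python
def unapproved_files(report_text):
    """Return the paths listed under 'Unapproved licenses:' in a RAT report.

    Assumes the section ends at the next line made up only of asterisks.
    """
    files = []
    in_section = False
    for raw in report_text.splitlines():
        line = raw.strip()
        if line.startswith("Unapproved licenses:"):
            in_section = True
            continue
        if in_section:
            if line and set(line) == {"*"}:
                break  # closing row of asterisks ends the section
            if line:
                files.append(line)
    return files


# Excerpt of the report quoted in this comment.
REPORT = """\
4 Unknown Licenses
*******************************
Unapproved licenses:
src/ports/cloudberry/cmake/FindCloudberry.cmake
src/ports/cloudberry/cmake/FindCloudberry_1.cmake
src/ports/cloudberry/cmake/FindCloudberry_2.cmake
src/ports/cloudberry/cmake/FindCloudberry_3.cmake
*******************************
"""

for path in unapproved_files(REPORT):
    print(path)
```

In practice the text would be read from target/rat.txt after running mvn apache-rat:check.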
zhangwenchao-123
commented
Oct 22, 2025
> Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) and other source documentation files? We will need to review these as well.
No, have not. Should we update this website?
tuhaihe
commented
Oct 22, 2025
> > Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) and other source documentation files? We will need to review these as well.
> No, have not. Should we update this website?
Yes, we should update the related description on the website. I’d like to help with this.
zhangwenchao-123
commented
Oct 22, 2025
> > > Have you looked at the website updates (https://madlib.apache.org - https://github.com/apache/madlib-site) and other source documentation files? We will need to review these as well.
> > No, have not. Should we update this website?
> Yes, we should update the related description on the website. I’d like to help with this.
Nice!
Force-pushed from 00d24da to e6d3c42
zhangwenchao-123
commented
Oct 22, 2025
I have fixed all of the problems mentioned above, including the missing license headers.
edespino
commented
Oct 22, 2025
PR Review: Cloudberry MADlib Build Issues
CMake Configuration Command
cmake \
-DCLOUDBERRY_3_PG_CONFIG=/usr/local/cloudberry/bin/pg_config \
-DCMAKE_C_COMPILER=gcc \
-DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
-DCLOUDBERRY_3_EXECUTABLE=/usr/local/cloudberry/bin/postgres \
..
CMake Configuration Error
Error:
CMake Error at src/CMakeLists.txt:202 (add_library):
Cannot find source file:
/home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/Compatibility.hpp
Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h
.hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc
CMake Error at src/CMakeLists.txt:202 (add_library):
No SOURCES given to target: madlib_cloudberry_3
CMake Generate step failed. Build files cannot be regenerated correctly.
Location: Referenced in src/ports/cloudberry/CMakeLists.txt:61
Observation: The directory /home/cbadmin/bom-parts/madlib/src/ports/cloudberry/dbconnector/ does not exist, while the equivalent Greenplum
directory does exist at /home/cbadmin/bom-parts/madlib/src/ports/greenplum/dbconnector/ containing:
- Compatibility.hpp
- dbconnector.hpp
Additional Build Errors (After Manual Directory Creation)
After manually creating the missing directory and copying files from Greenplum, cmake succeeded but compilation fails with multiple errors in
Compatibility.hpp:
- AggState API change: aggcontext member doesn't exist (suggests aggcontexts)
- WindowState renamed: T_WindowState not declared (suggests T_WindowAggState)
- Missing function: format_procedure not declared
- Function conflict: Ambiguous AggCheckCallContext - both the compatibility shim and PostgreSQL's native version exist
These errors indicate API differences between Greenplum's PostgreSQL base and Cloudberry's PostgreSQL base.
edespino
commented
Oct 22, 2025
@zhangwenchao-123 - Unless absolutely necessary, there is no need to force push additional PR commits. This will allow us to view the PR history easily.
Force-pushed from 00de02c to 1aad3dd
Fix SEGFAULT memory bugs
There are weird SEGFAULT bugs caused by custom allocation being erroneously paired with std::free (it should be paired with the custom free), and we have been unable to solve them. This is a workaround.
Force-pushed from 6b73eaa to c725ce5
Yes, some other commits have not been picked yet. I will continue to complete this PR and test it.
@edespino
edespino
left a comment
I have a few changes to consider. I have more testing of this PR to perform.
Please do not force push changes to this PR. I want to be able to follow the history of this work. Force pushing is not helping.
Cloudberry DB should be Apache Cloudberry
fixed
message("-- Detected Cloudberry") should be message("-- Detected Apache Cloudberry")
OK
Assume Cloudberry will stick to semantic versioning should be Assume Apache Cloudberry will stick to semantic versioning
fixed
Is this symlink needed? I believe we should only be providing support for the Apache Cloudberry 2 & 3 (future) releases.
+1.
fixed
"Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)" should be "Apache Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)" ?
I am not entirely sure about this.
"Cloudberry" alone is enough to achieve our goal, but "Apache Cloudberry" is more accurate, so it may be the better choice.
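To make the trade-off concrete, here is a small sketch comparing the two patterns. It assumes the pattern is applied with an unanchored search (as `re.search` does), and both version banners are hypothetical strings made up for illustration; the real output of a Cloudberry server may differ.

```python
import re

# The two candidate patterns from the review comment.
SHORT = re.compile(r"Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)")
STRICT = re.compile(r"Apache Cloudberry[a-zA-Z\s]*(\d+\.\d+\.\d+)")


def detect_version(pattern, banner):
    """Return the captured x.y.z version, or None if the pattern does not match."""
    match = pattern.search(banner)
    return match.group(1) if match else None


# Hypothetical banners, for illustration only.
APACHE_BANNER = "PostgreSQL 14.4 (Apache Cloudberry 2.0.0 build dev) on x86_64-pc-linux-gnu"
LEGACY_BANNER = "PostgreSQL 14.4 (Cloudberry Database 1.6.0 build dev) on x86_64-pc-linux-gnu"

for name, pattern in (("short", SHORT), ("strict", STRICT)):
    for banner in (APACHE_BANNER, LEGACY_BANNER):
        print(name, detect_version(pattern, banner))
```

Under these assumptions the shorter pattern matches both banners, while the stricter one only matches a banner that spells out "Apache Cloudberry".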
Why is this empty file needed?
I noticed this when I ran the Apache Release Audit tool (mvn apache-rat:check).
In Apache Cloudberry, it's not needed. I will remove it.
fixed
Testing Note: Encountering TypeError: decoding str is not supported when installing on PostgreSQL 14.19.
Root Cause:
For Postgres 13+ (line 1347), libdir is already decoded to a string via .decode(), but line 1349 attempts to decode it again with
str(libdir.strip(), encoding='utf-8'), which fails because you cannot decode a string that's already been decoded.
Recommended Solution:
Ensure libdir is always decoded to a string before line 1349, then simply strip and append the path:
libdir = subprocess.check_output(['pg_config','--libdir'])
if ((portid == 'greenplum' and is_rev_gte(dbver_split, get_rev_num('7.0'))) or
        (portid == 'postgres' and is_rev_gte(dbver_split, get_rev_num('13.0')))):
    libdir = libdir.decode()
else:
    libdir = libdir.decode('utf-8')
libdir = libdir.strip() + '/postgresql'
This ensures libdir is consistently a string for all code paths (older and newer versions), eliminating the type inconsistency that causes the error.
Request for Review:
Please validate this fix works correctly for both:
- Older versions (Postgres <13, Greenplum <7) where subprocess.check_output() returns bytes
- Newer versions (Postgres 13+, Greenplum 7+) where explicit decoding is needed
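The failure mode and a type-tolerant variant of the fix can be reproduced outside madpack. This is a minimal sketch, not the actual madpack code: `normalize_libdir` is a hypothetical helper name, and the sample path is made up.

```python
def normalize_libdir(raw):
    """Return pg_config --libdir output as a str with '/postgresql' appended.

    Accepts either bytes (as returned by subprocess.check_output) or an
    already-decoded str, so the value is decoded at most once.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return raw.strip() + "/postgresql"


# Reproduce the reported error: calling str() with an encoding on a value
# that is already a str raises TypeError.
already_decoded = b"/usr/lib/postgresql/14/lib\n".decode()
try:
    str(already_decoded.strip(), encoding="utf-8")
    double_decode_error = None
except TypeError as exc:
    double_decode_error = str(exc)

print(double_decode_error)
print(normalize_libdir(b"/usr/lib/postgresql/14/lib\n"))
print(normalize_libdir(already_decoded))
```

Checking the type once keeps the version-specific branches from having to agree on whether the value has been decoded yet.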
cool!
zhangwenchao-123
commented
Oct 24, 2025
All of the comments above have been addressed, and I have tested the changes on Cloudberry 3.0.
tuhaihe
commented
Oct 29, 2025
Hi @zhangwenchao-123 could you rebase your commits on the latest madlib2-master? Let's see if the CI can pass successfully. Thanks!
zhangwenchao-123
commented
Oct 30, 2025
> Hi @zhangwenchao-123 could you rebase your commits on the latest madlib2-master? Let's see if the CI can pass successfully. Thanks!
The NOTICE file check failed. I have fixed it and am testing whether CI passes now.
I noticed that these two files, FindCloudberry_2.cmake and FindCloudberry_3.cmake, are both symbolic links to FindCloudberry.cmake. Should we create them as regular ASCII text files, as is done for GP / PG? FYI.
fix in e665a9f
Based on the new codebase, I can build and deploy MADlib into the Cloudberry 2.0 and 3.0 (main) gpdemo databases:
- Build the Cloudberry gpdemo environment following the docs
- Build and deploy MADlib
## Download this PR change
git clone https://github.com/apache/madlib.git
cd madlib
git fetch origin pull/627/head:zhangwenchao-123/support_cloudberry
git switch zhangwenchao-123/support_cloudberry
## Set Python env
sudo alternatives --install /usr/bin/python python /usr/bin/python3 1
## Install required dependencies in the Cloudberry Dev container
sudo dnf install boost-devel -y
sudo dnf install -y graphviz # for docs
sudo dnf install --enablerepo=crb doxygen -y # for docs
pip install mock pandas numpy xgboost scikit-learn pyyaml pyxb-x pypmml
##
cd ~/madlib
mkdir build ; cd build
## for Cloudberry 3.0
cmake \
-DCLOUDBERRY_3_PG_CONFIG=$GPHOME/bin/pg_config \
-DCMAKE_C_COMPILER=gcc \
-DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
-DCLOUDBERRY_3_EXECUTABLE=$GPHOME/bin/postgres \
..
## for Cloudberry 2.0
cmake \
-DCLOUDBERRY_2_PG_CONFIG=$GPHOME/bin/pg_config \
-DCMAKE_C_COMPILER=gcc \
-DCMAKE_CXX_COMPILER=g++ \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local/madlib \
-DCLOUDBERRY_2_EXECUTABLE=$GPHOME/bin/postgres \
..
## Make, deploy, and run test
make -j$(nproc)
./src/bin/madpack -p cloudberry -c gpadmin@localhost:7000/postgres install
./src/bin/madpack -p cloudberry -c gpadmin@localhost:7000/postgres install-check
If something is wrong, please correct me. Thanks!
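Since the two cmake invocations above differ only in the major version embedded in the CLOUDBERRY_<N>_PG_CONFIG and CLOUDBERRY_<N>_EXECUTABLE options, the command line can be generated rather than copied. The following is only a sketch for scripting the configure step: `cloudberry_cmake_args` is a hypothetical helper, and it assumes it is run from the build directory with GPHOME pointing at the Cloudberry installation.

```python
import posixpath


def cloudberry_cmake_args(major, gphome, install_prefix="/usr/local/madlib"):
    """Build the cmake argument list shown above for Cloudberry 2 or 3."""
    if major not in (2, 3):
        raise ValueError("only Cloudberry major versions 2 and 3 are covered here")
    bindir = posixpath.join(gphome, "bin")
    return [
        "cmake",
        f"-DCLOUDBERRY_{major}_PG_CONFIG={bindir}/pg_config",
        "-DCMAKE_C_COMPILER=gcc",
        "-DCMAKE_CXX_COMPILER=g++",
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DCMAKE_INSTALL_PREFIX={install_prefix}",
        f"-DCLOUDBERRY_{major}_EXECUTABLE={bindir}/postgres",
        "..",
    ]


# Print the command for a Cloudberry 3 install under /usr/local/cloudberry.
print(" ".join(cloudberry_cmake_args(3, "/usr/local/cloudberry")))
```

The returned list can be passed directly to subprocess.run(args, check=True) from inside the build directory.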
Maybe we can move the Python3 compatibility code into src/ports/postgres/modules/pmml/__init__.py_in to avoid the SQL-side code interfering with M4 macro expansion?
This change has been successfully tested in the Cloudberry environment and MADlib Jenkins CI.
fix in e665a9f
We also need to add the ASF license header to this file:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
fix in d018805
tuhaihe
commented
Feb 11, 2026
Hi @zhangyue1818, thanks for your contribution. However, when I tested this PR on Cloudberry 2.0 and the upcoming Cloudberry 2.1 release, one test case failed:
[gpadmin@cdw build]$ ./src/bin/madpack -p cloudberry -c gpadmin@localhost:7000/postgres install-check
madpack.py: INFO : Detected Apache Cloudberry version 2.0.0.
TEST CASE RESULT|Module: array_ops|array_ops.ic.sql_in|PASS|Time: 74 milliseconds
TEST CASE RESULT|Module: bayes|bayes.ic.sql_in|PASS|Time: 320 milliseconds
TEST CASE RESULT|Module: crf|crf_test_small.ic.sql_in|PASS|Time: 285 milliseconds
TEST CASE RESULT|Module: crf|crf_train_small.ic.sql_in|PASS|Time: 285 milliseconds
TEST CASE RESULT|Module: elastic_net|elastic_net.ic.sql_in|PASS|Time: 190 milliseconds
TEST CASE RESULT|Module: linalg|svd.ic.sql_in|PASS|Time: 572 milliseconds
TEST CASE RESULT|Module: linalg|matrix_ops.ic.sql_in|PASS|Time: 822 milliseconds
TEST CASE RESULT|Module: linalg|linalg.ic.sql_in|PASS|Time: 76 milliseconds
TEST CASE RESULT|Module: pmml|pmml.ic.sql_in|PASS|Time: 452 milliseconds
TEST CASE RESULT|Module: prob|prob.ic.sql_in|PASS|Time: 28 milliseconds
TEST CASE RESULT|Module: svm|svm.ic.sql_in|PASS|Time: 315 milliseconds
TEST CASE RESULT|Module: tsa|arima.ic.sql_in|PASS|Time: 1074 milliseconds
TEST CASE RESULT|Module: stemmer|porter_stemmer.ic.sql_in|PASS|Time: 34 milliseconds
TEST CASE RESULT|Module: conjugate_gradient|conj_grad.ic.sql_in|PASS|Time: 142 milliseconds
TEST CASE RESULT|Module: knn|knn.ic.sql_in|PASS|Time: 175 milliseconds
TEST CASE RESULT|Module: lda|lda.ic.sql_in|PASS|Time: 246 milliseconds
TEST CASE RESULT|Module: stats|correlation.ic.sql_in|PASS|Time: 182 milliseconds
TEST CASE RESULT|Module: stats|mw_test.ic.sql_in|PASS|Time: 42 milliseconds
TEST CASE RESULT|Module: stats|pred_metrics.ic.sql_in|PASS|Time: 255 milliseconds
TEST CASE RESULT|Module: stats|chi2_test.ic.sql_in|PASS|Time: 37 milliseconds
TEST CASE RESULT|Module: stats|anova_test.ic.sql_in|PASS|Time: 47 milliseconds
TEST CASE RESULT|Module: stats|t_test.ic.sql_in|PASS|Time: 42 milliseconds
TEST CASE RESULT|Module: stats|cox_prop_hazards.ic.sql_in|PASS|Time: 211 milliseconds
TEST CASE RESULT|Module: stats|ks_test.ic.sql_in|PASS|Time: 84 milliseconds
TEST CASE RESULT|Module: stats|robust_and_clustered_variance_coxph.ic.sql_in|PASS|Time: 355 milliseconds
TEST CASE RESULT|Module: stats|wsr_test.ic.sql_in|PASS|Time: 46 milliseconds
TEST CASE RESULT|Module: stats|f_test.ic.sql_in|PASS|Time: 38 milliseconds
TEST CASE RESULT|Module: utilities|utilities.ic.sql_in|PASS|Time: 115 milliseconds
TEST CASE RESULT|Module: utilities|pivot.ic.sql_in|PASS|Time: 119 milliseconds
TEST CASE RESULT|Module: utilities|path.ic.sql_in|PASS|Time: 159 milliseconds
TEST CASE RESULT|Module: utilities|transform_vec_cols.ic.sql_in|PASS|Time: 156 milliseconds
TEST CASE RESULT|Module: utilities|text_utilities.ic.sql_in|PASS|Time: 126 milliseconds
TEST CASE RESULT|Module: utilities|sessionize.ic.sql_in|PASS|Time: 105 milliseconds
TEST CASE RESULT|Module: utilities|encode_categorical.ic.sql_in|PASS|Time: 186 milliseconds
TEST CASE RESULT|Module: utilities|minibatch_preprocessing.ic.sql_in|PASS|Time: 186 milliseconds
TEST CASE RESULT|Module: assoc_rules|assoc_rules.ic.sql_in|FAIL|Time: 568 milliseconds
madpack.py: ERROR : Failed executing /tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp
madpack.py: ERROR : Check the log at /tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.log
TEST CASE RESULT|Module: convex|lmf.ic.sql_in|PASS|Time: 297 milliseconds
TEST CASE RESULT|Module: convex|mlp.ic.sql_in|PASS|Time: 507 milliseconds
TEST CASE RESULT|Module: deep_learning|keras_model_arch_table.ic.sql_in|PASS|Time: 149 milliseconds
TEST CASE RESULT|Module: glm|glm.ic.sql_in|PASS|Time: 906 milliseconds
TEST CASE RESULT|Module: graph|graph.ic.sql_in|PASS|Time: 1343 milliseconds
TEST CASE RESULT|Module: linear_systems|sparse_linear_sytems.ic.sql_in|PASS|Time: 132 milliseconds
TEST CASE RESULT|Module: linear_systems|dense_linear_sytems.ic.sql_in|PASS|Time: 125 milliseconds
TEST CASE RESULT|Module: recursive_partitioning|decision_tree.ic.sql_in|PASS|Time: 252 milliseconds
TEST CASE RESULT|Module: recursive_partitioning|random_forest.ic.sql_in|PASS|Time: 322 milliseconds
TEST CASE RESULT|Module: regress|robust.ic.sql_in|PASS|Time: 193 milliseconds
TEST CASE RESULT|Module: regress|logistic.ic.sql_in|PASS|Time: 249 milliseconds
TEST CASE RESULT|Module: regress|linear.ic.sql_in|PASS|Time: 31 milliseconds
TEST CASE RESULT|Module: regress|clustered.ic.sql_in|PASS|Time: 189 milliseconds
TEST CASE RESULT|Module: regress|multilogistic.ic.sql_in|PASS|Time: 323 milliseconds
TEST CASE RESULT|Module: regress|marginal.ic.sql_in|PASS|Time: 457 milliseconds
TEST CASE RESULT|Module: sample|balance_sample.ic.sql_in|PASS|Time: 139 milliseconds
TEST CASE RESULT|Module: sample|train_test_split.ic.sql_in|PASS|Time: 166 milliseconds
TEST CASE RESULT|Module: sample|sample.ic.sql_in|PASS|Time: 20 milliseconds
TEST CASE RESULT|Module: sample|stratified_sample.ic.sql_in|PASS|Time: 112 milliseconds
TEST CASE RESULT|Module: summary|summary.ic.sql_in|PASS|Time: 148 milliseconds
TEST CASE RESULT|Module: kmeans|kmeans.ic.sql_in|PASS|Time: 661 milliseconds
TEST CASE RESULT|Module: pca|pca.ic.sql_in|PASS|Time: 1475 milliseconds
TEST CASE RESULT|Module: pca|pca_project.ic.sql_in|PASS|Time: 528 milliseconds
TEST CASE RESULT|Module: validation|cross_validation.ic.sql_in|PASS|Time: 332 milliseconds
INFO: Log files saved in /tmp/madlib.7qnxdkya
[gpadmin@cdw build]$ cat /tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.log
-- Switch to test user:
SET ROLE "madlib_210_installcheck_postgres";
SET
-- Set SEARCH_PATH for install-check:
SET search_path=madlib_installcheck_assoc_rules,madlib;
SET
/* ----------------------------------------------------------------------- *//**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
*//* ----------------------------------------------------------------------- */
---------------------------------------------------------------------------
-- Rules:
-- ------
-- 1) Any DB objects should be created w/o schema prefix,
-- since this file is executed in a separate schema context.
-- 2) There should be no DROP statements in this script, since
-- all objects created in the default schema will be cleaned-up outside.
---------------------------------------------------------------------------
---------------------------------------------------------------------------
-- Setup:
---------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION assoc_array_eq
(
arr1 TEXT[],
arr2 TEXT[]
)
RETURNS BOOL AS $$
SELECT COUNT(*) = array_upper($1, 1) AND array_upper($1, 1) = array_upper($2, 1)
FROM (SELECT unnest($1) id) t1, (SELECT unnest($2) id) t2
WHERE t1.id = t2.id;
$$ LANGUAGE sql IMMUTABLE;
CREATE FUNCTION
CREATE OR REPLACE FUNCTION install_test() RETURNS VOID AS $$
declare
result1 TEXT;
result2 TEXT;
result3 TEXT;
result_maxiter TEXT;
res madlib.assoc_rules_results;
output_schema TEXT;
output_table TEXT;
total_rules INT;
total_time INTERVAL;
begin
DROP TABLE IF EXISTS test_data1;
CREATE TABLE test_data1 (
trans_id INT
, product INT
);
DROP TABLE IF EXISTS test_data2;
CREATE TABLE test_data2 (
trans_id INT
, product VARCHAR
);
INSERT INTO test_data1 VALUES (1,1);
INSERT INTO test_data1 VALUES (1,2);
INSERT INTO test_data1 VALUES (3,3);
INSERT INTO test_data1 VALUES (8,4);
INSERT INTO test_data1 VALUES (10,1);
INSERT INTO test_data1 VALUES (10,2);
INSERT INTO test_data1 VALUES (10,3);
INSERT INTO test_data1 VALUES (19,2);
INSERT INTO test_data2 VALUES (1, 'beer');
INSERT INTO test_data2 VALUES (1, 'diapers');
INSERT INTO test_data2 VALUES (1, 'chips');
INSERT INTO test_data2 VALUES (2, 'beer');
INSERT INTO test_data2 VALUES (2, 'diapers');
INSERT INTO test_data2 VALUES (3, 'beer');
INSERT INTO test_data2 VALUES (3, 'diapers');
INSERT INTO test_data2 VALUES (4, 'beer');
INSERT INTO test_data2 VALUES (4, 'chips');
INSERT INTO test_data2 VALUES (5, 'beer');
INSERT INTO test_data2 VALUES (6, 'beer');
INSERT INTO test_data2 VALUES (6, 'diapers');
INSERT INTO test_data2 VALUES (6, 'chips');
INSERT INTO test_data2 VALUES (7, 'beer');
INSERT INTO test_data2 VALUES (7, 'diapers');
DROP TABLE IF EXISTS test1_exp_result;
CREATE TABLE test1_exp_result (
ruleid integer,
pre text[],
post text[],
support double precision,
confidence double precision,
lift double precision,
conviction double precision
) ;
DROP TABLE IF EXISTS test2_exp_result;
CREATE TABLE test2_exp_result (
ruleid integer,
pre text[],
post text[],
support double precision,
confidence double precision,
lift double precision,
conviction double precision
) ;
INSERT INTO test1_exp_result VALUES (7, '{3}', '{1}', 0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
INSERT INTO test1_exp_result VALUES (4, '{2}', '{1}', 0.40000000000000002, 0.66666666666666674, 1.6666666666666667, 1.8000000000000003);
INSERT INTO test1_exp_result VALUES (1, '{1}', '{2,3}', 0.20000000000000001, 0.5, 2.4999999999999996, 1.6000000000000001);
INSERT INTO test1_exp_result VALUES (9, '{2,3}', '{1}', 0.20000000000000001, 1, 2.4999999999999996, 0);
INSERT INTO test1_exp_result VALUES (6, '{1,2}', '{3}', 0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
INSERT INTO test1_exp_result VALUES (8, '{3}', '{2}', 0.20000000000000001, 0.5, 0.83333333333333337, 0.80000000000000004);
INSERT INTO test1_exp_result VALUES (5, '{1}', '{2}', 0.40000000000000002, 1, 1.6666666666666667, 0);
INSERT INTO test1_exp_result VALUES (2, '{3}', '{2,1}', 0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
INSERT INTO test1_exp_result VALUES (10, '{3,1}', '{2}', 0.20000000000000001, 1, 1.6666666666666667, 0);
INSERT INTO test1_exp_result VALUES (3, '{1}', '{3}', 0.20000000000000001, 0.5, 1.2499999999999998, 1.2);
INSERT INTO test2_exp_result VALUES (7, '{chips,diapers}', '{beer}', 0.2857142857142857, 1, 1, 0);
INSERT INTO test2_exp_result VALUES (2, '{chips}', '{diapers}', 0.2857142857142857, 0.66666666666666663, 0.93333333333333324, 0.85714285714285698);
INSERT INTO test2_exp_result VALUES (1, '{chips}', '{diapers,beer}', 0.2857142857142857, 0.66666666666666663, 0.93333333333333324, 0.85714285714285698);
INSERT INTO test2_exp_result VALUES (6, '{diapers}', '{beer}', 0.7142857142857143, 1, 1, 0);
INSERT INTO test2_exp_result VALUES (4, '{beer}', '{diapers}', 0.7142857142857143, 0.7142857142857143, 1, 1);
INSERT INTO test2_exp_result VALUES (3, '{chips,beer}', '{diapers}', 0.2857142857142857, 0.66666666666666663, 0.93333333333333324, 0.85714285714285698);
INSERT INTO test2_exp_result VALUES (5, '{chips}', '{beer}', 0.42857142857142855, 1, 1, 0);
res = madlib.assoc_rules (.1, .5, 'trans_id', 'product', 'test_data1','madlib_installcheck_assoc_rules', false);
RETURN;
end $$ language plpgsql;
CREATE FUNCTION
---------------------------------------------------------------------------
-- Test
---------------------------------------------------------------------------
SELECT install_test();
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: table "test_data1" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'trans_id' as the Apache Cloudberry data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: table "test_data2" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'trans_id' as the Apache Cloudberry data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: table "test1_exp_result" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'ruleid' as the Apache Cloudberry data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: table "test2_exp_result" does not exist, skipping
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'ruleid' as the Apache Cloudberry data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING: terminating connection because of crash of another server process (seg0 slice3 172.17.0.6:7002 pid=45213)
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING: terminating connection because of crash of another server process (seg0 slice1 172.17.0.6:7002 pid=45202)
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING: terminating connection because of crash of another server process (seg0 172.17.0.6:7002 pid=45137)
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING: writer gang of current global transaction is lost
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: WARNING: Any temporary tables for this session have been dropped because the gang was disconnected (session id = 596)
psql:/tmp/madlib.7qnxdkya/assoc_rules/assoc_rules.ic.sql_in.tmp:154: ERROR: DTX RollbackAndReleaseCurrentSubTransaction dispatch failed
CONTEXT: PL/Python function "assoc_rules"
PL/pgSQL function install_test() line 93 at assignment
tuhaihe
commented
Feb 11, 2026
Hi @zhangyue1818, thanks for your contribution. I tested this PR on Cloudberry 2.0 and the upcoming Cloudberry 2.1 release, and one test case failed:
The error above occurred in a Docker container environment. I retested the MADlib installation and install-check on Cloudberry 2.0 and 2.1 running in a virtual machine, and all tests (including assoc_rules) passed without errors.
Thanks again.
Add bounds checking before accessing unique value arrays to prevent out-of-bounds reads in the SparseData operation loop.

Problem: In op_sdata_by_sdata(), the loop increments indices i and j to traverse the unique values in the left and right SparseData structures. After incrementing, the code immediately accesses vals->data[i] and vals->data[j] in the next iteration without verifying that i and j are within bounds (i.e., < unique_value_count). This could lead to reading beyond the allocated array boundaries.

Solution: Add explicit bounds checking after the index increments and before accessing the arrays. The check breaks the loop if either index reaches or exceeds the respective unique_value_count, preventing invalid memory access. The fix is placed after the index increment logic (lines 1088-1101) and before reading run_length values and accessing the vals arrays, ensuring all subsequent array operations are safe.
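The pattern described in this commit message can be illustrated with a minimal sketch. This is not the MADlib source: `RleVec` and `rle_add` are hypothetical, simplified stand-ins for `SparseData` and `op_sdata_by_sdata()`, and the run-length layout is an assumption made for illustration. The point is only the placement of the check: after the indices are incremented and before any array is read at the new index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified run-length-encoded vector (not MADlib's SparseData):
 * vals[k] is repeated runs[k] times, for k in [0, unique_value_count). */
typedef struct {
    const double *vals;
    const int    *runs;
    int           unique_value_count;
} RleVec;

/* Element-wise sum of two RLE vectors, written uncompressed into out.
 * Returns the number of elements written (at most out_cap).
 * Run lengths are assumed to be non-negative. */
size_t rle_add(const RleVec *left, const RleVec *right,
               double *out, size_t out_cap)
{
    int i = 0, j = 0;
    int left_remaining, right_remaining;
    size_t written = 0;

    assert(left != NULL && right != NULL && out != NULL);
    if (left->unique_value_count <= 0 || right->unique_value_count <= 0)
        return 0;

    left_remaining  = left->runs[0];
    right_remaining = right->runs[0];

    for (;;) {
        /* Emit the part where the two current runs overlap. */
        int overlap = left_remaining < right_remaining
                          ? left_remaining : right_remaining;
        for (int k = 0; k < overlap && written < out_cap; k++)
            out[written++] = left->vals[i] + right->vals[j];
        left_remaining  -= overlap;
        right_remaining -= overlap;

        /* Advance whichever run(s) were exhausted. */
        if (left_remaining == 0)
            i++;
        if (right_remaining == 0)
            j++;

        /* Bounds check AFTER the increments and BEFORE reading runs[] or
         * vals[] at the new index; without it, the reads below could go
         * past the end of either array. */
        if (i >= left->unique_value_count || j >= right->unique_value_count)
            break;
        if (written >= out_cap)
            break;

        if (left_remaining == 0)
            left_remaining = left->runs[i];
        if (right_remaining == 0)
            right_remaining = right->runs[j];
    }
    return written;
}
```

In this sketch, two vectors of different logical lengths simply stop at the shorter one. Whether the real function should stop silently or report an error in that case is a design decision this example does not try to settle.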
zhangyue1818
commented
Feb 11, 2026
Fixed in b57e5a9.
tuhaihe
commented
Feb 11, 2026
Thanks! I tested again, and it now runs well in both the Docker and virtual machine environments.