I need an aggregate function that MySQL doesn't provide.
I would like it to be in MySQL's flavor of SQL (that is, not in C).
How do I do this? What I'm stuck on is creating an aggregate function -- the docs don't seem to mention how this is done.
Examples of desired usage of a product
function:
mysql> select product(col) as a from `table`;
+------+
| a |
+------+
| 144 |
+------+
1 row in set (0.00 sec)
mysql> select col, product(col) as a from `table` group by col;
+-----+------+
| col | a |
+-----+------+
| 6 | 36 |
| 4 | 4 |
+-----+------+
2 rows in set (0.01 sec)
4 Answers 4
According to the documentation http://dev.mysql.com/doc/refman/5.5/en/adding-udf.html it's only possible to write aggregate functions in C. Sorry!
-
Either C or C++. Not SQL, anyway.Mike Sherrill 'Cat Recall'– Mike Sherrill 'Cat Recall'2012年10月18日 00:49:53 +00:00Commented Oct 18, 2012 at 0:49
-
1I presume any language which can generate binary libraries in the platform-supported binary format with C calling conventions.Colin 't Hart– Colin 't Hart2012年10月18日 07:30:12 +00:00Commented Oct 18, 2012 at 7:30
-
I don't know. It was documented as "C or C++ (or another language that can use C calling conventions)" in version 5.0. The docs dropped "or another language that can use C calling conventions" in version 5.1. That's a strange phrase to drop.Mike Sherrill 'Cat Recall'– Mike Sherrill 'Cat Recall'2012年10月18日 10:18:33 +00:00Commented Oct 18, 2012 at 10:18
-
is it available in recent mysql versions now (after few years) ?Dinesh– Dinesh2017年06月23日 16:01:32 +00:00Commented Jun 23, 2017 at 16:01
I don't know if there is way to define a new aggregate function, not without messing with MySQL source code.
But if your numbers are all positive, you may well derive from the arithmetic identity:
log( product( Ai ) ) = sum( log( Ai ) )
that you can use EXP(SUM(LOG(x)))
to calculate PRODUCT(x)
. Test in SQL-Fiddle:
SELECT EXP(SUM(LOG(a))) AS product
FROM t ;
SELECT col, EXP(SUM(LOG(a))) AS product
FROM t
GROUP BY col ;
When the data can have 0s, it gets a bit more complicated:
SELECT (NOT EXISTS (SELECT 1 FROM t WHERE a = 0))
* EXP(SUM(LOG(a))) AS p
FROM t
WHERE a > 0 ;
SELECT d.col,
(NOT EXISTS (SELECT 1 FROM t AS ti WHERE ti.col = d.col AND ti.a = 0))
* COALESCE(EXP(SUM(LOG(t.a))),1) AS p
FROM
( SELECT DISTINCT col
FROM t
) AS d
LEFT JOIN
t ON t.col = d.col
AND t.a > 0
GROUP BY d.col ;
Tested at SQL-Fiddle
For other DBMS, that do not have MySQL's auto-conversion of boolean values to integers, the
(NOT EXISTS (SELECT ...))
should be replaced with:
(CASE WHEN EXISTS (SELECT 1...) THEN 0 ELSE 1 END)
Specifically for Oracle, a few more changes will be needed, without changing the logic of the answer, only because Oracle does not follow strict ANSI standard in some areas. Tested at SQL-Fiddle-2
-
2That's sweet. High school math has come back to haunt me. +1 !!!RolandoMySQLDBA– RolandoMySQLDBA2012年04月30日 21:49:31 +00:00Commented Apr 30, 2012 at 21:49
-
1Cool math, but I actually wanted to know how to create an aggregate function in general.
product
was just supposed to be one example of several.Matt Fenwick– Matt Fenwick2012年04月30日 23:01:50 +00:00Commented Apr 30, 2012 at 23:01 -
This is pretty cool, but doesn't work if any of the values are zero, since log(0) is undefined.jameshfisher– jameshfisher2014年04月30日 17:30:12 +00:00Commented Apr 30, 2014 at 17:30
-
@jameshfisher Correct. One can write easily the extra condition, checking for zeros (where the product would be zero of course). I didn't think at the time it was necessary to add that complication.ypercubeᵀᴹ– ypercubeᵀᴹ2014年04月30日 17:40:19 +00:00Commented Apr 30, 2014 at 17:40
-
It's not clear to me how best to add that condition. We can't add the condition in the inner function of the values: since we want that
PRODUCT(..., 0, ...) = 0
, we want thatEXP(SUM(..., f(0), ...)) = 0
, for somef
that we choose, but to satisfy this, we need thatSUM(..., f(0), ...) = LOG(0)
-- again thwarted by the same issue that log(0) is undefined. We need to check for the presence of zero in some other way, e.g.MIN(ABS(a)) = 0
. So we'd haveSELECT CASE WHEN MIN(ABS(a)) = 0 THEN 0 ELSE EXP(SUM(LOG(a))) END AS product
. Is this the kind of thing you were thinking of?jameshfisher– jameshfisher2014年04月30日 17:56:05 +00:00Commented Apr 30, 2014 at 17:56
In the interests of learning how to fish, I have successfully compiled and installed a "Hello, World!" UDF (user-defined function) for MySQL found here. The hello_world.so file (after being complied with gcc -shared -o hello_world.so -I /usr/include/mysql hello_world.c
) should be stored in /usr/lib/mysql/plugins/ with 755 permissions on Ubuntu linux systems. [The "-I /usr/include/mysql" is the path to the mysql header files; I found my code wouldn't compile without this parameter, but YMMV.]
The program does nothing but print out the string "Hello, World!" for each record in the resulting dataset of a query, but that's all it is supposed to do. I will try to write a SMALL aggregate function in the next few days. There is an example of an aggregate function which computes the average cost of a group of price and quantity records; the SMALL function shouldn't be that different from that function in the end.
Hope this helps.
Technically not a user defined aggregate function, and not optimal, but this works and doesn't need to create an UDF: use GROUP_CONCAT()
in your query which will make your query an aggregate one and will consolidate all values in one comma-separated string, and then write a SQL function that processes that string.
Example (function based on this answer)::
DELIMITER $$
CREATE FUNCTION `replace_multiple`(_text text, _from text, _to text) RETURNS text CHARSET utf8mb4
DETERMINISTIC
BEGIN
DECLARE _nextfrom TEXT DEFAULT NULL;
DECLARE _nextto TEXT DEFAULT NULL;
DECLARE _nextlenfrom INT DEFAULT NULL;
DECLARE _nextlento INT DEFAULT NULL;
DECLARE _valuefrom TEXT DEFAULT NULL;
DECLARE _valueto TEXT DEFAULT NULL;
iterator:
LOOP
-- exit the loop if the list seems empty or was null;
-- this extra caution is necessary to avoid an endless loop in the proc.
IF CHAR_LENGTH(TRIM(_from)) = 0 OR _from IS NULL THEN
LEAVE iterator;
END IF;
-- capture the next value from the list
SET _nextfrom = SUBSTRING_INDEX(_from,',',1);
SET _nextto = SUBSTRING_INDEX(_to,',',1);
-- save the length of the captured value; we will need to remove this
-- many characters + 1 from the beginning of the string
-- before the next iteration
SET _nextlenfrom = CHAR_LENGTH(_nextfrom);
SET _nextlento = CHAR_LENGTH(_nextto);
-- trim the value of leading and trailing spaces, in case of sloppy CSV strings
SET _valuefrom = TRIM(_nextfrom);
SET _valueto = TRIM(_nextto);
-- use the extracted value
SET _text = replace(_text, _valuefrom, _valueto);
-- rewrite the original string using the `INSERT()` string function,
-- args are original string, start position, how many characters to remove,
-- and what to "insert" in their place (in this case, we "insert"
-- an empty string, which removes _nextlen + 1 characters)
SET _from = INSERT(_from,1,_nextlenfrom + 1,'');
SET _to = INSERT(_to,1,_nextlento + 1,'');
END LOOP;
RETURN _text;
END$$
DELIMITER ;
Table t:
id | fromThis | toThat |
---|---|---|
1 | 1 | one |
2 | 2 | two |
3 | 3 | three |
Query:
SELECT replace_multiple('my 2 cents', GROUP_CONCAT(fromThis), GROUP_CONCAT(toThat))
FROM t;
Will translate to:
SELECT replace_multiple('my 2 cents', '1,2,3', 'one,two,three')
Which for this specific custom function will return my two cents