I have a table which contains stock data for various companies. The data goes back as far as 2003, and there are approx 40M rows for each timeframe.
CREATE TABLE stocks_data.bars (
    timeframe varchar(3) NOT NULL,
    "timestamp" timestamp NOT NULL,
    "open" float8 NULL,
    high float8 NULL,
    low float8 NULL,
    "close" float8 NULL,
    volume int8 NULL,
    "security" varchar(12) NOT NULL,
    ext bool NOT NULL DEFAULT false,
    realtime bool NOT NULL DEFAULT false,
    CONSTRAINT bars_pkey PRIMARY KEY ("timestamp", security, timeframe)
);
I want to perform some technical calculations on these rows for each ticker symbol ("AAPL", for example). I am using plpython3u to wrap these technical calculation functions. Now say I want to calculate a stochastic RSI on all 40M rows, separately for each ticker (the "security" column).
What would be the ideal approach for handling and passing around this data in postgres? Current naive function:
-- function should take in a list of numbers, then calculate the stochRSI and return these values
CREATE OR REPLACE FUNCTION test_01(input double precision[])
RETURNS TABLE (k double precision, d double precision)
AS $$
import numpy as np
import talib
# talib expects a numpy array of doubles, not a plain Python list
k, d = talib.STOCHRSI(np.asarray(input, dtype=float))
# return one (k, d) pair per input row
return zip(k.tolist(), d.tolist())
$$ LANGUAGE plpython3u;
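One way to get a full series into the function is to collect the closes per security with `array_agg` first, then apply the function laterally (a sketch; the `ORDER BY "timestamp"` inside the aggregate keeps each series chronological):

```sql
SELECT s.security, f.*
FROM (
    SELECT security,
           array_agg(close ORDER BY "timestamp") AS closes
    FROM stocks_data.bars
    WHERE timeframe = '1d'
    GROUP BY security
) s,
LATERAL test_01(s.closes) f;
```

This gives the function one array per security instead of one scalar per row, which is what the lateral-join attempt below is missing.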
I have tried:
- Using the procedural function as a window function, which fails because it is not a window function:
select *, test_01(close) over (partition by "security") as tech
from stocks_data.bars bars
group by bars."security";
- Using a lateral join, but this fails because the function is passed a single value per row, while the stochastic RSI must be calculated over a series of values. There is also no way to separate the calculations by "security" here.
SELECT f.*
FROM stocks_data.bars bars, test_01(bars.close) f
WHERE bars.timeframe = '1d';
Are there any working & performant options here?
- Write an aggregate function, where the state transition function gathers the values in an array, and use your function as the "final function". You'd have to return the result as an array rather than a table. – Laurenz Albe, Nov 20, 2023
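A sketch of that aggregate approach (the names `stochrsi_final` and `stochrsi_agg` are illustrative, and talib is assumed to be installed in the server's Python environment):

```sql
-- Final function: receives the accumulated array and returns the result
-- as a 2-D array {k-values, d-values} rather than a set of rows.
CREATE OR REPLACE FUNCTION stochrsi_final(vals double precision[])
RETURNS double precision[]
AS $$
import numpy as np
import talib
k, d = talib.STOCHRSI(np.asarray(vals, dtype=float))
return [k.tolist(), d.tolist()]
$$ LANGUAGE plpython3u;

CREATE AGGREGATE stochrsi_agg(double precision) (
    sfunc     = array_append,   -- state transition: collect values
    stype     = double precision[],
    finalfunc = stochrsi_final
);

SELECT security,
       stochrsi_agg(close ORDER BY "timestamp") AS stochrsi
FROM stocks_data.bars
WHERE timeframe = '1d'
GROUP BY security;
```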
- stackoverflow.com/questions/13790028/… – Charlieface, Nov 20, 2023
- In my testing, appending to an array (using array_cat) is taking way too long to complete. Calculating the AVG on 40M rows takes about 2.5 seconds, while appending and then computing takes over 3 minutes. – Eoin Fitzpatrick, Nov 21, 2023
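The built-in `array_agg` uses an optimized internal state and is usually much faster than a custom aggregate built on `array_append`/`array_cat`. Another option (a sketch, not benchmarked; the function name is illustrative) is to keep everything in one plpython3u call, iterating over per-security arrays with `plpy.cursor`:

```sql
CREATE OR REPLACE FUNCTION stochrsi_all(tf text)
RETURNS TABLE (security varchar, k double precision, d double precision)
AS $$
import numpy as np
import talib

plan = plpy.prepare("""
    SELECT security, array_agg(close ORDER BY "timestamp") AS closes
    FROM stocks_data.bars
    WHERE timeframe = 1ドル
    GROUP BY security
""", ["text"])
# plpy.cursor fetches group rows incrementally instead of all at once
for row in plpy.cursor(plan, [tf]):
    k, d = talib.STOCHRSI(np.asarray(row["closes"], dtype=float))
    for kv, dv in zip(k.tolist(), d.tolist()):
        yield (row["security"], kv, dv)
$$ LANGUAGE plpython3u;
```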
- You can group your data by the "security" column and pass the grouped data to your function. This way, your function receives all the close prices for a given security as an array. – TSCAmerica.com, Nov 21, 2023