
I am developing a PostgreSQL procedure using the plpython3u extension, which lets you use the Python procedural language inside PostgreSQL.

With the following code using plpy, I am able to retrieve data from a table and put it into a pandas DataFrame.

CREATE OR REPLACE PROCEDURE public.plpy_proc_clas_full()
LANGUAGE 'plpython3u'
AS $BODY$
    import pandas as pd

    # PLyResult behaves like a list of dicts
    data_lt = plpy.execute('SELECT "key", "value" FROM public."<your-table>" ORDER BY "key"')
    data_df_x = pd.DataFrame.from_records(data_lt)['key']
    data_df_y = pd.DataFrame.from_records(data_lt)['value']
    df = pd.concat([data_df_x, data_df_y], axis=1).values

    return df  # NB: a PROCEDURE is expected to return None; declare a FUNCTION to return a value
$BODY$;

But how can I write the pandas DataFrame back to a table (for example, after a few data manipulations in Python)?

asked Aug 31, 2023 at 9:04

2 Answers

The documentation shows an example in a rather unexpected section, on trapping errors:

CREATE FUNCTION insert_fraction(numerator int, denominator int) RETURNS text AS $$
from plpy import spiexceptions
try:
    plan = plpy.prepare("INSERT INTO fractions (frac) VALUES (1ドル / 2ドル)", ["int", "int"])
    plpy.execute(plan, [numerator, denominator])
except spiexceptions.DivisionByZero:
    return "denominator cannot equal zero"
except spiexceptions.UniqueViolation:
    return "already have that fraction"
except plpy.SPIError as e:
    return "other error, SQLSTATE %s" % e.sqlstate
else:
    return "fraction inserted"
$$ LANGUAGE plpython3u;

That suggests you need to map the dataframe back to records and bind each one to the placeholders (1,ドル 2ドル) of an INSERT ... VALUES ... statement run via plpy.execute.
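
Concretely, that per-row binding could look like this (a sketch only: the dataframe df and its numerator/denominator columns are assumptions, feeding the fractions example above):

# Prepare the parameterized INSERT once, then bind each dataframe row
# to the 1ドル/2ドル placeholders; plpy handles typing and quoting.
plan = plpy.prepare(
    "INSERT INTO fractions (frac) VALUES (1ドル / 2ドル)",
    ["int", "int"],
)
for row in df.itertuples(index=False):
    plpy.execute(plan, [row.numerator, row.denominator])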

As with all basic DB interactions, you could also dynamically construct the full text of a static INSERT statement with the values inlined as literals, based on that dataframe, and plpy.execute that. While simple and intuitive, it will be slower (the data is rewritten and reformatted as text instead of being passed directly) and less robust (no automatic quoting or sanitization, for example).
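
A hedged sketch of that dynamic variant (again assuming the fractions table and a one-column dataframe df; plpy.quote_nullable does the quoting that naive string interpolation would not):

# Build one multi-row INSERT as literal SQL text. plpy.quote_nullable()
# safely quotes each value; plain "%s" interpolation would be injectable.
values_sql = ", ".join("(%s)" % plpy.quote_nullable(str(v)) for v in df["frac"])
plpy.execute("INSERT INTO fractions (frac) VALUES " + values_sql)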

answered Aug 31, 2023 at 9:47

After some time, I read a few articles that used libraries such as psycopg2 to open a separate connection to the destination database.

Personally, I see no reason to open a second connection while manipulating data (in and out) within the same runtime and database.

I came up with a solution that does the job, although I am not sure about performance on huge tables: it iterates over the dataframe and issues one INSERT statement per row, so it is not a bulk insert.

CREATE OR REPLACE PROCEDURE public.plpy_proc_test()
LANGUAGE 'plpython3u'
AS $BODY$
    import pandas as pd

    # PLyResult behaves like a list of dicts
    data_lt = plpy.execute('SELECT "key", "value" FROM public."ml_train" ORDER BY "key"')
    data_df_x = pd.DataFrame.from_records(data_lt)['key']
    data_df_y = pd.DataFrame.from_records(data_lt)['value']

    df = pd.concat([data_df_x, data_df_y], axis=1)

    # prepare the statement once, outside the loop
    plan = plpy.prepare('INSERT INTO test (y_hat, y_act) VALUES (1,ドル 2ドル)', ['numeric', 'numeric'])
    for index, row in df.iterrows():
        plpy.execute(plan, [row['key'], row['value']])
$BODY$;
answered Aug 31, 2023 at 9:50

2 Comments

Inserting row by row does not seem very efficient.
True, but I haven't found any other alternative for now.
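
One bulk alternative (a sketch, assuming PL/Python's list-to-SQL-array parameter mapping and the same test table as above) is to pass each dataframe column as a single array parameter and let unnest() expand the arrays back into rows server-side, so the whole dataframe travels in one statement:

# One round trip: each column is sent as a numeric[] parameter and
# unnest() zips the two arrays back into rows inside the INSERT.
plan = plpy.prepare(
    'INSERT INTO test (y_hat, y_act) SELECT * FROM unnest(1,ドル 2ドル)',
    ['numeric[]', 'numeric[]'],
)
plpy.execute(plan, [df['key'].tolist(), df['value'].tolist()])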
