
I am developing a PostgreSQL procedure using the plpython3u extension, which lets you use the Python procedural language inside PostgreSQL.

With the following code using plpy, I am able to retrieve data from a table and put it into a pandas DataFrame.

CREATE OR REPLACE PROCEDURE public.plpy_proc_clas_full()
LANGUAGE 'plpython3u'
AS $BODY$
    import pandas as pd

    # PLyResult behaves like a list of dicts
    data_lt = plpy.execute('SELECT "key", "value" FROM public."<your-table>" ORDER BY "key"')
    data_df_x = pd.DataFrame.from_records(data_lt)['key']
    data_df_y = pd.DataFrame.from_records(data_lt)['value']
    df = pd.concat([data_df_x, data_df_y], axis=1).values

    return df  # NB: a PROCEDURE is expected to return None; declare a FUNCTION to return a value
$BODY$;

But how can I write the pandas DataFrame back to a table (for example, after a few data manipulations in Python)?

asked Aug 31, 2023 at 9:04

2 Answers

The documentation shows an example in a rather unexpected section, on trapping errors:

CREATE FUNCTION insert_fraction(numerator int, denominator int) RETURNS text AS $$
from plpy import spiexceptions
try:
    plan = plpy.prepare("INSERT INTO fractions (frac) VALUES (1ドル / 2ドル)", ["int", "int"])
    plpy.execute(plan, [numerator, denominator])
except spiexceptions.DivisionByZero:
    return "denominator cannot equal zero"
except spiexceptions.UniqueViolation:
    return "already have that fraction"
except plpy.SPIError as e:
    return "other error, SQLSTATE %s" % e.sqlstate
else:
    return "fraction inserted"
$$ LANGUAGE plpython3u;

That suggests you need to map the dataframe back to records and bind each one to the placeholders (1,ドル 2ドル) of an INSERT ... VALUES ... statement run via plpy.execute.
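
Concretely, that per-row binding could look like this (a sketch only: the dataframe df and its numerator/denominator columns are assumptions, feeding the fractions example above):

# Prepare the parameterized INSERT once, then bind each dataframe row
# to the 1ドル/2ドル placeholders; plpy handles typing and quoting.
plan = plpy.prepare(
    "INSERT INTO fractions (frac) VALUES (1ドル / 2ドル)",
    ["int", "int"],
)
for row in df.itertuples(index=False):
    plpy.execute(plan, [row.numerator, row.denominator])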

As with all basic DB interactions, you could also dynamically construct the full text of a static INSERT statement with the values inlined as literals, based on that dataframe, and plpy.execute that. While simple and intuitive, it will be slower (the data is rewritten and reformatted as text instead of being passed directly) and less robust (no automatic quoting or sanitization, for example).
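
A hedged sketch of that dynamic variant (again assuming the fractions table and a one-column dataframe df; plpy.quote_nullable does the quoting that naive string interpolation would not):

# Build one multi-row INSERT as literal SQL text. plpy.quote_nullable()
# safely quotes each value; plain "%s" interpolation would be injectable.
values_sql = ", ".join("(%s)" % plpy.quote_nullable(str(v)) for v in df["frac"])
plpy.execute("INSERT INTO fractions (frac) VALUES " + values_sql)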

answered Aug 31, 2023 at 9:47

After some time, I read a few articles that used libraries such as psycopg2 to open a separate connection to the destination database.

Personally, I see no reason to open a second connection while manipulating data (in and out) within the same runtime and database.

I came up with a solution that does the job, although I am not sure about performance on huge tables: it iterates over the dataframe and issues one INSERT statement per row, so it is not a bulk insert.

CREATE OR REPLACE PROCEDURE public.plpy_proc_test()
LANGUAGE 'plpython3u'
AS $BODY$
    import pandas as pd

    # PLyResult behaves like a list of dicts
    data_lt = plpy.execute('SELECT "key", "value" FROM public."ml_train" ORDER BY "key"')
    data_df_x = pd.DataFrame.from_records(data_lt)['key']
    data_df_y = pd.DataFrame.from_records(data_lt)['value']

    df = pd.concat([data_df_x, data_df_y], axis=1)

    # prepare the statement once, outside the loop
    plan = plpy.prepare('INSERT INTO test (y_hat, y_act) VALUES (1,ドル 2ドル)', ['numeric', 'numeric'])
    for index, row in df.iterrows():
        plpy.execute(plan, [row['key'], row['value']])
$BODY$;
answered Aug 31, 2023 at 9:50

2 Comments

Inserting row by row does not seem very efficient.
True, but I haven't found any other alternative for now.
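
One bulk alternative (a sketch, assuming PL/Python's list-to-SQL-array parameter mapping and the same test table as above) is to pass each dataframe column as a single array parameter and let unnest() expand the arrays back into rows server-side, so the whole dataframe travels in one statement:

# One round trip: each column is sent as a numeric[] parameter and
# unnest() zips the two arrays back into rows inside the INSERT.
plan = plpy.prepare(
    'INSERT INTO test (y_hat, y_act) SELECT * FROM unnest(1,ドル 2ドル)',
    ['numeric[]', 'numeric[]'],
)
plpy.execute(plan, [df['key'].tolist(), df['value'].tolist()])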
