0

I am storing high-frequency price data in Postgres. The table is indexed and the query correctly uses the index, which I can confirm by EXPLAIN ANALYZE result:

Index Scan using ix_minute_contract_id_source_id_bar_open_dt on bars_minute tb (cost=0.58..19899291.37 rows=1 width=52) (actual time=1.418..3169.181 rows=4724529 loops=1)
 Index Cond: ((contract_id = 10726) AND (source_id = 6) AND (bar_open_dt >= '2010-01-01 00:00:00'::timestamp without time zone) AND (bar_open_dt <= '2022-09-30 00:00:00'::timestamp without time zone))
 Filter: (sub_source_id IS NULL)
Planning Time: 0.139 ms
Execution Time: 3249.658 ms

Then when reading the data from Python takes about 30 seconds. From the logs I can see:

4,724,529 rows retrieved starting from 1 in 41 s 117 ms (execution: 4 s 395 ms, fetching: 36 s 722 ms)

So my understanding is that EXPLAIN ANALYZE is quite accurate in calculating execution time but it does not take into account fetching time.

From Does PostgreSQL Explain Analyze Execution Time consider Fetching time I understand that the latter is influenced by memory, CPU and network. Is there a way to profile the fetching part to trace what exactly slows down the query?

UPDATE 1

Adding table definition and client example:

create table bars_minute
(
 id bigserial constraint pk_bars_minute primary key,
 contract_id integer not null constraint fk_bars_minute_contract_id_contracts references contracts,
 source_id integer not null constraint fk_bars_minute_source_id_sources references sources,
 bar_open_dt timestamp not null,
 price_open double precision,
 price_high double precision,
 price_low double precision,
 price_close double precision, 
 constraint uq_bars_minute_contract_id unique (contract_id, source_id, bar_open_dt)
);
create index ix_minute_contract_id_source_id_bar_open_dt on bars_minute (contract_id, source_id, bar_open_dt);

And client:

conn = psycopg2.connect("dbname='db' user='user' host='localhost' password='pwd'")
sql = """
 SELECT bar_open_dt, price_open, price_high, price_low, price_close
 FROM bars_minute
 where contract_id = 1
 AND bar_open_dt >= '2010-01-01'
 AND bar_open_dt <= '2022-09-30'
 AND source_id = 1
 ORDER BY bar_open_dt;
"""
cur = conn.cursor()
cur.execute(sql)
rows = cur.fetchall()
df = pd.DataFrame(rows)

UPDATE 2

As @jjanes suggested that the bottleneck might be in PyCharm, I indeed confirmed that the query in PyCharm console takes 2x time vs Python client. I was originally using SQLAlchemy connection with psycopg2 driver, which would take 17 seconds. I then tried using psycopg2 directly (again, as @jjanes suggested), which resulted in:

cur = conn.cursor()
cur.execute(sql) # 3 seconds
rows = cur.fetchall() # 5 seconds
df = pd.DataFrame(rows) # 3 seconds

totalling in 11 seconds.

Is it right to assume at this point that this is as good as it can be given the EXPLAIN ANALYZE results?

asked Oct 27, 2022 at 15:44
8
  • Are the client and server on the same computer, or is the query traversing a network? Commented Oct 27, 2022 at 15:58
  • Yes, the client and the server are on the same computer Commented Oct 27, 2022 at 16:07
  • That is not a familiar log message to me. What log was it, and what did you do to configure it? Commented Oct 27, 2022 at 17:02
  • That’s PyCharm/DataGrip console output message (when running queries in PyCharm console) Commented Oct 27, 2022 at 17:07
  • This is almost certainly a bottleneck in pycharm/DataGrip, so you would need to do the profiling there. So you would need to show the client code, and tag the question with those, except it also is not really a DBA topic then. Commented Oct 27, 2022 at 17:29

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.