
I'm trying to read a JSON API with the Postgres COPY ... FROM PROGRAM command.

When I call

copy _temp from program 'curl -uri "https://url"';

it works fine, but the API has paging, so I need to loop through multiple calls.

When I call it like this:

_command := 'curl -uri "https://url?&page='||(10::text)||'"';
copy _temp from program _command;

I get

ERROR: syntax error at or near "_command"

You can't even concatenate in place, like this:

copy _temp from program 'curl -uri "https://url?&page='||(10::text)||'"';

Percent-placeholder replacement, in the RAISE NOTICE style, doesn't work either.
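For example, something along these lines (a sketch of one variant I tried) gets rejected the same way:

copy _temp from program format('curl -uri "https://url?&page=%s"', 10);
-- syntax error again: PROGRAM only accepts a string literal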

What gives? PROGRAM takes a literal string in single quotes, so what is the difference between specifying a literal string and using a text or varchar variable holding the same value? There doesn't seem to be any program data type (a ::program cast does nothing), so what am I missing?

The docs say 'it is best to use a fixed command string', not that you can only use a fixed command string...

How do I use a dynamic command string?

Paul White
asked Jan 16, 2018 at 8:29

2 Answers


If you want to use dynamic commands, you need the right context for them, and plain SQL is not one.

One possibility is PL/pgSQL and its EXECUTE functionality. Use a DO block to get into that context:

DO $$
DECLARE
 page integer;
BEGIN
 FOR page IN SELECT i FROM generate_series(1, 10) AS t(i)
 LOOP
 EXECUTE format(
 $cmd$COPY _temp FROM PROGRAM 'curl -uri "https://url?&page=%s"'$cmd,ドル
 page::text);
 END LOOP;
END; $$;
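If the page range is a fixed span like this, the integer range form of FOR does the same job without the generate_series query (a minimal variant of the block above; the loop variable is declared implicitly):

DO $$
BEGIN
 FOR page IN 1..10 LOOP -- range FOR implicitly declares "page"
 EXECUTE format(
 $cmd$COPY _temp FROM PROGRAM 'curl -uri "https://url?&page=%s"'$cmd,ドル
 page);
 END LOOP;
END; $$;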

The other way is to run the loop on the shell side. The following example uses bash:

for i in $(seq 1 10); do
 psql -c "COPY _temp FROM PROGRAM 'curl -uri \"https://url?&page=${i}\"'"
done

Notes:

  • I am using two levels of dollar-quoting ($$ and $cmd$) to prevent quote multiplication.
  • I am also using format() instead of the messy concatenation.
  • The URI of the curl command has to stay quoted, because COPY ... FROM PROGRAM hands the command string to the shell, which would otherwise act on the & in the query string; in the bash example that means escaping the inner quotes.
Andriy M
answered Jan 16, 2018 at 9:48

On the matter of crawling

If the website has paging, you're not going to want to do this in psql, nor in the shell. The reason for that is simple: you (normally) won't know where to stop; that's one of the tricks of paging (a detection sketch follows this list).

  • Some servers return 404 when you go too far.
  • Some servers return 500 (though they shouldn't).
  • Some servers return 200 and tell you there are no more results.
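That said, if you do insist on staying inside the database, the third case is the one you can detect cheaply, because PL/pgSQL can report how many rows each EXECUTEd COPY loaded. A rough sketch, assuming the program emits no lines at all for an empty page (your API may instead emit an empty JSON array, which would still load a row):

DO $$
DECLARE
 page integer := 1;
 loaded bigint;
BEGIN
 LOOP
 EXECUTE format(
 $cmd$COPY _temp FROM PROGRAM 'curl -uri "https://url?&page=%s"'$cmd,ドル
 page);
 GET DIAGNOSTICS loaded = ROW_COUNT; -- rows the COPY just loaded
 EXIT WHEN loaded = 0; -- first empty page: stop
 page := page + 1;
 END LOOP;
END; $$;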

At the point that you're doing this, you need to start looking into web crawling. Factors to consider:

  • Crawling in serial or in parallel
  • Creating an HTTP request
  • Parsing the response header (unless your framework provides it)
  • Parsing the response body (if you need to process HTML)
  • Handling failure (retry, etc.)

I'm of the opinion that the best compromise in this department is Node/Rx.js/request-promise/Cheerio: I do it a lot.

JSON API?

I think there would be a lot of disagreement on what a "proper JSON API" is. And there are other drawbacks, regardless of what you want to call it:

  • Your db session will be tied up waiting for curl.
  • The transaction is subject to any server-wide timeout limits.
  • User functions execute serially within the session, so you won't get to take advantage of parallel requests (asynchronous crawling).
  • You'll be setting up and tearing down a new TCP connection each time you run curl.
answered Jan 17, 2018 at 2:23
