I'm trying to read a JSON API with the Postgres copy from program command.
When I call
copy _temp from program 'curl -uri "https://url"';
it works fine but the API has paging and I need to loop through multiple calls.
When I call it like this:
_command := 'curl -uri "https://url?&page=' || (10::text) || '"';
copy _temp from program _command;
I get
ERROR: syntax error at or near "_command"
You can't even concatenate in place, like:
copy _temp from program 'curl -uri "https://url?&page='||(10::text)||'"';
Percent-style parameter substitution, as in RAISE NOTICE, doesn't work either.
What gives? The PROGRAM argument is a literal string in single quotes, so what is the difference between specifying a literal string and using a text or varchar variable? There doesn't seem to be any program data type (a ::program cast does nothing), so what am I missing?
In the docs it says 'it is best to use a fixed command string', not that you can only use a fixed command string...
How do I use a dynamic command string?
2 Answers
If you want to use dynamic commands, you need the right context for them, and plain SQL is not one.
One possibility is PL/pgSQL and its EXECUTE functionality. Use DO to get into that context:
DO $$
DECLARE page integer;
BEGIN
FOR page IN SELECT i FROM generate_series(1, 10) AS t(i)
LOOP
EXECUTE format($cmd$ COPY _temp
FROM PROGRAM 'curl -uri "https://url?&page=%s"'$cmd$,
page::text);
END LOOP;
END; $$;
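For completeness, a minimal sketch of the staging table the example writes into; the single text column is an assumption, since the real layout depends on what the API returns:

-- Hypothetical staging table: with COPY's default text format, each line of
-- curl's output lands in one row. Adjust columns and types to the real payload.
CREATE TABLE IF NOT EXISTS _temp (line text);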
The other way is to run the loop on the shell side. The following example uses bash:
for i in $(seq 1 10); do
  psql -c "COPY _temp FROM PROGRAM 'curl -uri \"https://url?&page=${i}\"'"
done
Notes:
- I am using two levels of dollar-quoting ($$ and $cmd$) to prevent quote multiplication.
- I am also using format() instead of the messy concatenation.
- In the shell example the double quotes around the URI have to be escaped (\"), because the psql -c argument is already double-quoted; without quoting the URI, the server's shell would misinterpret the & in the query string.
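As a quick sanity check, you can look at the command string format() builds for a single page before handing it to EXECUTE (the URL is the same placeholder as above):

SELECT format($cmd$COPY _temp FROM PROGRAM 'curl -uri "https://url?&page=%s"'$cmd$, 5);
-- => COPY _temp FROM PROGRAM 'curl -uri "https://url?&page=5"'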
On the matter of crawling
If the website has paging, you're not going to want to do this in psql or the shell. The reason for that is simple: you (normally) won't know where to stop. That's one of the tricks of paging.
- Some servers return 404 when you go too far.
- Some servers return 500 (though they shouldn't).
- Some servers return 200 and tell you there are no more results.
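If you do stay on the database side, the last case can sometimes be turned into a stop condition. A minimal sketch, assuming the API returns an empty body (no lines at all) once you are past the last page, so the COPY imports zero rows; GET DIAGNOSTICS picks up the rows processed by the dynamic COPY:

DO $$
DECLARE
    page integer := 0;
    imported bigint;
BEGIN
    LOOP
        page := page + 1;
        EXECUTE format($cmd$COPY _temp
            FROM PROGRAM 'curl -uri "https://url?&page=%s"'$cmd$, page);
        GET DIAGNOSTICS imported = ROW_COUNT;  -- rows the last COPY pulled in
        EXIT WHEN imported = 0                 -- empty page: assume no more results
              OR page >= 1000;                 -- arbitrary safety cap
    END LOOP;
END; $$;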
At the point that you're doing this, you need to start looking into web crawling. Factors to consider:
- Crawling in serial, or parallel
- Creating an HTTP request
- Parsing the response header (unless your framework provides it)
- Parsing the response body (if you need to process HTML)
- Handling failure (retries, etc.)
I'm of the opinion that the best compromise in this department is Node/Rx.js/request-promise/Cheerio: I do it a lot.
JSON API?
I think there would be a lot of disagreement about what a "proper JSON API" is. And there are other drawbacks, regardless of what you want to call it:
- Your DB session will be tied up waiting for curl.
- The transaction is subject to any server-wide timeout limits.
- The user functions execute serially within the session, so you won't get to take advantage of parallel requests (asynchronous crawling).
- You'll be opening a new TCP connection each time you run curl.