Updating a PostgreSQL Table With Different Random Dates

Question 1

I have two tables in PostgreSQL, Employee and Leave like following:

CREATE TEMP TABLE employee_table AS SELECT
 id::int,
 name::text
FROM ( VALUES
 (1, 'John' ),
 (2, 'David') 
) AS t(id, name);
CREATE TEMP TABLE leave_table AS SELECT
 id::int,
 leave_date::date,
 emp_id::int
FROM ( VALUES
 (1, '01/10/1993' ,1),
 (2, Null ,1),
 (3, Null ,1),
 (4, '02/12/1990' ,2),
 (5, Null ,2),
 (6, Null ,2) 
) AS t(ID,Leave_Date,Emp_ID);

I want to update the Leave table and set the Leave_Date column to a random date maybe 01/01/2000 for each employee but if an employee has more than one Null entry in Leave table, I want to update his null entries with two different dates which means one employee should not have two same Leave_Date value in Leave table and after update my leave table should look like following:

ID Leave_Date Emp_ID 
1 01/10/1993 1 
2 01/01/2000 1 
3 01/01/2001 1

As shown above, initially john had two null entries in Leave table and the result shows that those entries are update with two different dates. Is there any way to this?

Question 2

so we're never updating non-null dates?

Question 3

Also when you say random date? you mean 1/1 of a random year?

Question 4

no we never update non-null dates

Question 5

First you never gave a test case for when there is only one date and it is null, so we create that.

INSERT INTO employee_table (id, name) VALUES (3, 'Evan Carroll');
INSERT INTO leave_table VALUES ( 10, null, 3 );

Then we run a command to check whether or not an emp_id has more than one entry in leave_table. The results are in that derived table. We update accordingly. Here we generate a date that represents the year start between 1900-2020. Just update this for what you mean by "random date" you didn't define it in your question.

UPDATE leave_table
SET leave_date = CASE
 WHEN t.count = 1 OR t.count IS NULL
 THEN '01/01/2000'::date
 ELSE '1/1/1900'::date + ('1 year'::interval*floor(random()*120))
END
FROM (
 SELECT emp_id, count(*) FROM leave_table
 WHERE leave_date IS NULL
 GROUP BY emp_id
) AS t
WHERE leave_date IS NULL
AND t.emp_id = leave_table.emp_id;

Then we have it

TABLE leave_table;
 id | leave_date | emp_id 
----+------------+--------
 1 | 1993年01月10日 | 1
 4 | 1990年02月12日 | 2
 2 | 1964年01月01日 | 1
 3 | 1929年01月01日 | 1
 5 | 1933年01月01日 | 2
 6 | 1902年01月01日 | 2
 10 | 2000年01月01日 | 3

Now, as @McNets pointed out yesterday, I am kind of cheating. Instead, try this (much more complex query) which suffices the question's update his [emp_id] null entries with two different dates

WITH t AS (
 SELECT
 id,
 emp_id,
 leave_date,
 count(*) OVER (PARTITION BY emp_id) AS max_nulls,
 row_number() OVER (PARTITION BY emp_id)
 FROM leave_table
 WHERE leave_table.leave_date IS NULL
)
UPDATE leave_table
SET leave_date = CASE
 WHEN t.max_nulls = 1 OR t.max_nulls IS NULL
 THEN '01/01/2000'::date
 ELSE date_series_emp.ds
END
FROM t
INNER JOIN (
 SELECT distinct_emps.emp_id,
 gs.ds,
 count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
 FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
 CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
 AS gs(ds)
) AS date_series_emp
 USING ( emp_id, row_number )
WHERE t.id = leave_table.id;

Breaking it apart, the CTE does this

 SELECT
 id,
 emp_id,
 leave_date,
 count(*) FILTER (WHERE leave_date IS NULL) OVER (PARTITION BY emp_id) AS max_nulls,
 row_number() OVER (PARTITION BY emp_id)
 FROM leave_table

That generates how many nulls are in the set, and row numbers from within the set that we can join on for a 1:1 with the update query,

 id │ emp_id │ leave_date │ max_nulls │ row_number 
────┼────────┼────────────┼───────────┼────────────
 2 │ 1 │ │ 2 │ 1
 3 │ 1 │ │ 2 │ 2
 5 │ 2 │ │ 2 │ 1
 6 │ 2 │ │ 2 │ 2
 10 │ 3 │ │ 1 │ 1

The only other tricky part is the inner-join select,

 SELECT distinct_emps.emp_id,
 gs.ds,
 count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
 FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
 CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
 AS gs(ds)

There we're taking the distinct emp_ids, and joining them on a sequence of dates that you're calling random. We count(*) over that sequence to give it a corresponding random number from within the cardinal sequences generated.

Then we join this to the table and perform the update..

This method does have a one drawback, if the input size ever exhausts your pool of "random dates" (only 1081 of them), the update on rows past that max won't be performed at all.

Question 6

It is updating the null values with the dates less than 2000 right? if it is then the answer is exactly what i want.

Question 7

Do you need the dates to be unique on empid

Question 8

Carrol: This is exactly what I need sir, but in some cases it generates random dates which is higher than year 2000, can you please tell me how should I modify the query so it can only generate random dates which are not higher than 1999 or 2000, I mean the year must be smaller than 2000.

Question 9

Could you define "random" dates? And, do you need them to be unique on empid? Is a random date the start of any month before 1/1/2000?

Question 10

yes I want them to be unique on empid sir, by random dates I mean, if an employee has two or maybe three or maybe all of its entries are null entries in leave table, all of an its null entires must be filled up with different dates and the dates should be between 01/01/1900 to suppose 01/01/1990.

Question 11

Initially I thought it can be done using a single SQL command (And maybe someone knows a good solution to solve this.):

update Leave
set Leave_Date = (select coalesce(max(l1.Leave_Date) + interval '1 day', '1900-01-01'::date) 
 from Leave l1 
 where l1.Emp_ID = Leave.Emp_ID)
where Leave_Date is null;

But, Postgres does not COMMIT all changes until all records has been updated, and max(l1.Leave_Date) returns always the same value.

I've used a FOR-LOOP inside a function.

As @EvanCarroll commented yesterday, I didn't use a random date and the question says: With Different Random Dates

Well, now instead of simply add one day to the max date, the function adds a random number of days. By now, I've used 9 days, however this behavior can be easily modified. (For instance: (interval '1 day' * (random() * 9 + 21)::int))

CREATE OR REPLACE FUNCTION Leave_Update_Null_Dates()
RETURNS integer 
AS
$$
DECLARE
 lv RECORD;
BEGIN
 FOR lv IN SELECT * FROM Leave where Leave_Date is NULL LOOP
 UPDATE Leave
 SET Leave_Date = (select coalesce(max(l1.Leave_Date) + (interval '1 day' * (random() * 9)::int), '1990-01-01'::date) 
 from Leave l1 
 where l1.Emp_ID = lv.Emp_ID)
 WHERE ID = lv.ID;
 END LOOP;
 RETURN 0;
END;
$$
LANGUAGE 'plpgsql';
select Leave_Update_Null_Dates();

This is the result:

+----+---------------------+--------+
| id | leave_date | emp_id |
+----+---------------------+--------+
| 1 | 01.10.1993 00:00:00 | 1 |
+----+---------------------+--------+
| 2 | 06.10.1993 00:00:00 | 1 |
+----+---------------------+--------+
| 3 | 08.10.1993 00:00:00 | 1 |
+----+---------------------+--------+
| 4 | 12.02.1990 00:00:00 | 2 |
+----+---------------------+--------+
| 5 | 20.02.1990 00:00:00 | 2 |
+----+---------------------+--------+
| 6 | 25.02.1990 00:00:00 | 2 |
+----+---------------------+--------+
| 7 | 01.01.1990 00:00:00 | 3 |
+----+---------------------+--------+

You can check it here: http://rextester.com/BWZ31281

I hope this helps you.

Question 12

Single SQL command: dba.stackexchange.com/a/159476/2639

Question 13

"But, I suppose Postgres does not COMMIT all changes until all records has been updated" - of course it doesn't. A statement sees (and has to see) a consistent view of the tables involved from the moment the statement started. Everything else would be terrible.

Question 14

Ill update with a solution to that tonight

Question 15

Updated my answer to do exactly that. BTW @McNets your answer isn't exactly doing what the question called. You're generating unique dates, but not "random" dates by any means. ;)

Question 16

I will edit mine next year!! BTW, HAPPY NEW YEAR @EvanCarroll

Evan Carroll Evan Carroll 65.7k50 gold badges259 silver badges510 bronze badges · Accepted Answer · 2016-12-29 16:21:54Z

First you never gave a test case for when there is only one date and it is null, so we create that.

INSERT INTO employee_table (id, name) VALUES (3, 'Evan Carroll');
INSERT INTO leave_table VALUES ( 10, null, 3 );

Then we run a command to check whether or not an emp_id has more than one entry in leave_table. The results are in that derived table. We update accordingly. Here we generate a date that represents the year start between 1900-2020. Just update this for what you mean by "random date" you didn't define it in your question.

UPDATE leave_table
SET leave_date = CASE
 WHEN t.count = 1 OR t.count IS NULL
 THEN '01/01/2000'::date
 ELSE '1/1/1900'::date + ('1 year'::interval*floor(random()*120))
END
FROM (
 SELECT emp_id, count(*) FROM leave_table
 WHERE leave_date IS NULL
 GROUP BY emp_id
) AS t
WHERE leave_date IS NULL
AND t.emp_id = leave_table.emp_id;

Then we have it

TABLE leave_table;
 id | leave_date | emp_id 
----+------------+--------
 1 | 1993年01月10日 | 1
 4 | 1990年02月12日 | 2
 2 | 1964年01月01日 | 1
 3 | 1929年01月01日 | 1
 5 | 1933年01月01日 | 2
 6 | 1902年01月01日 | 2
 10 | 2000年01月01日 | 3

Now, as @McNets pointed out yesterday, I am kind of cheating. Instead, try this (much more complex query) which suffices the question's update his [emp_id] null entries with two different dates

WITH t AS (
 SELECT
 id,
 emp_id,
 leave_date,
 count(*) OVER (PARTITION BY emp_id) AS max_nulls,
 row_number() OVER (PARTITION BY emp_id)
 FROM leave_table
 WHERE leave_table.leave_date IS NULL
)
UPDATE leave_table
SET leave_date = CASE
 WHEN t.max_nulls = 1 OR t.max_nulls IS NULL
 THEN '01/01/2000'::date
 ELSE date_series_emp.ds
END
FROM t
INNER JOIN (
 SELECT distinct_emps.emp_id,
 gs.ds,
 count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
 FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
 CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
 AS gs(ds)
) AS date_series_emp
 USING ( emp_id, row_number )
WHERE t.id = leave_table.id;

Breaking it apart, the CTE does this

 SELECT
 id,
 emp_id,
 leave_date,
 count(*) FILTER (WHERE leave_date IS NULL) OVER (PARTITION BY emp_id) AS max_nulls,
 row_number() OVER (PARTITION BY emp_id)
 FROM leave_table

That generates how many nulls are in the set, and row numbers from within the set that we can join on for a 1:1 with the update query,

 id │ emp_id │ leave_date │ max_nulls │ row_number 
────┼────────┼────────────┼───────────┼────────────
 2 │ 1 │ │ 2 │ 1
 3 │ 1 │ │ 2 │ 2
 5 │ 2 │ │ 2 │ 1
 6 │ 2 │ │ 2 │ 2
 10 │ 3 │ │ 1 │ 1

The only other tricky part is the inner-join select,

 SELECT distinct_emps.emp_id,
 gs.ds,
 count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
 FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
 CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
 AS gs(ds)

There we're taking the distinct emp_ids, and joining them on a sequence of dates that you're calling random. We count(*) over that sequence to give it a corresponding random number from within the cardinal sequences generated.

Then we join this to the table and perform the update..

This method does have a one drawback, if the input size ever exhausts your pool of "random dates" (only 1081 of them), the update on rows past that max won't be performed at all.

It is updating the null values with the dates less than 2000 right? if it is then the answer is exactly what i want.
Carrol: This is exactly what I need sir, but in some cases it generates random dates which is higher than year 2000, can you please tell me how should I modify the query so it can only generate random dates which are not higher than 1999 or 2000, I mean the year must be smaller than 2000.
Could you define "random" dates? And, do you need them to be unique on empid? Is a random date the start of any month before 1/1/2000?
yes I want them to be unique on empid sir, by random dates I mean, if an employee has two or maybe three or maybe all of its entries are null entries in leave table, all of an its null entires must be filled up with different dates and the dates should be between 01/01/1900 to suppose 01/01/1990.

Stack Exchange Network

Updating a PostgreSQL Table With Different Random Dates

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Updating a PostgreSQL Table With Different Random Dates

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions