5

I have two tables in PostgreSQL, Employee and Leave like following:

CREATE TEMP TABLE employee_table AS SELECT
 id::int,
 name::text
FROM ( VALUES
 (1, 'John' ),
 (2, 'David') 
) AS t(id, name);
CREATE TEMP TABLE leave_table AS SELECT
 id::int,
 leave_date::date,
 emp_id::int
FROM ( VALUES
 (1, '01/10/1993' ,1),
 (2, Null ,1),
 (3, Null ,1),
 (4, '02/12/1990' ,2),
 (5, Null ,2),
 (6, Null ,2) 
) AS t(ID,Leave_Date,Emp_ID);

I want to update the Leave table and set the Leave_Date column to a random date maybe 01/01/2000 for each employee but if an employee has more than one Null entry in Leave table, I want to update his null entries with two different dates which means one employee should not have two same Leave_Date value in Leave table and after update my leave table should look like following:

ID Leave_Date Emp_ID 
1 01/10/1993 1 
2 01/01/2000 1 
3 01/01/2001 1 

As shown above, initially john had two null entries in Leave table and the result shows that those entries are update with two different dates. Is there any way to this?

Evan Carroll
65.7k50 gold badges259 silver badges510 bronze badges
asked Dec 29, 2016 at 10:08
3
  • so we're never updating non-null dates? Commented Dec 29, 2016 at 16:12
  • Also when you say random date? you mean 1/1 of a random year? Commented Dec 29, 2016 at 16:12
  • no we never update non-null dates Commented Dec 31, 2016 at 4:43

2 Answers 2

6

First you never gave a test case for when there is only one date and it is null, so we create that.

INSERT INTO employee_table (id, name) VALUES (3, 'Evan Carroll');
INSERT INTO leave_table VALUES ( 10, null, 3 );

Then we run a command to check whether or not an emp_id has more than one entry in leave_table. The results are in that derived table. We update accordingly. Here we generate a date that represents the year start between 1900-2020. Just update this for what you mean by "random date" you didn't define it in your question.

UPDATE leave_table
SET leave_date = CASE
 WHEN t.count = 1 OR t.count IS NULL
 THEN '01/01/2000'::date
 ELSE '1/1/1900'::date + ('1 year'::interval*floor(random()*120))
END
FROM (
 SELECT emp_id, count(*) FROM leave_table
 WHERE leave_date IS NULL
 GROUP BY emp_id
) AS t
WHERE leave_date IS NULL
AND t.emp_id = leave_table.emp_id;

Then we have it

TABLE leave_table;
 id | leave_date | emp_id 
----+------------+--------
 1 | 1993年01月10日 | 1
 4 | 1990年02月12日 | 2
 2 | 1964年01月01日 | 1
 3 | 1929年01月01日 | 1
 5 | 1933年01月01日 | 2
 6 | 1902年01月01日 | 2
 10 | 2000年01月01日 | 3

Now, as @McNets pointed out yesterday, I am kind of cheating. Instead, try this (much more complex query) which suffices the question's update his [emp_id] null entries with two different dates

WITH t AS (
 SELECT
 id,
 emp_id,
 leave_date,
 count(*) OVER (PARTITION BY emp_id) AS max_nulls,
 row_number() OVER (PARTITION BY emp_id)
 FROM leave_table
 WHERE leave_table.leave_date IS NULL
)
UPDATE leave_table
SET leave_date = CASE
 WHEN t.max_nulls = 1 OR t.max_nulls IS NULL
 THEN '01/01/2000'::date
 ELSE date_series_emp.ds
END
FROM t
INNER JOIN (
 SELECT distinct_emps.emp_id,
 gs.ds,
 count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
 FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
 CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
 AS gs(ds)
) AS date_series_emp
 USING ( emp_id, row_number )
WHERE t.id = leave_table.id;

Breaking it apart, the CTE does this

 SELECT
 id,
 emp_id,
 leave_date,
 count(*) FILTER (WHERE leave_date IS NULL) OVER (PARTITION BY emp_id) AS max_nulls,
 row_number() OVER (PARTITION BY emp_id)
 FROM leave_table

That generates how many nulls are in the set, and row numbers from within the set that we can join on for a 1:1 with the update query,

 id │ emp_id │ leave_date │ max_nulls │ row_number 
────┼────────┼────────────┼───────────┼────────────
 2 │ 1 │ │ 2 │ 1
 3 │ 1 │ │ 2 │ 2
 5 │ 2 │ │ 2 │ 1
 6 │ 2 │ │ 2 │ 2
 10 │ 3 │ │ 1 │ 1

The only other tricky part is the inner-join select,

 SELECT distinct_emps.emp_id,
 gs.ds,
 count(*) OVER (PARTITION BY emp_id ORDER BY random()) AS row_number
 FROM ( SELECT DISTINCT emp_id FROM leave_table ) AS distinct_emps
 CROSS JOIN generate_series('1/1/1900'::date, '1/1/1990'::date, '1 month')
 AS gs(ds)

There we're taking the distinct emp_ids, and joining them on a sequence of dates that you're calling random. We count(*) over that sequence to give it a corresponding random number from within the cardinal sequences generated.

Then we join this to the table and perform the update..

This method does have a one drawback, if the input size ever exhausts your pool of "random dates" (only 1081 of them), the update on rows past that max won't be performed at all.

answered Dec 29, 2016 at 16:21
9
  • It is updating the null values with the dates less than 2000 right? if it is then the answer is exactly what i want. Commented Dec 31, 2016 at 5:54
  • Do you need the dates to be unique on empid Commented Dec 31, 2016 at 6:01
  • Carrol: This is exactly what I need sir, but in some cases it generates random dates which is higher than year 2000, can you please tell me how should I modify the query so it can only generate random dates which are not higher than 1999 or 2000, I mean the year must be smaller than 2000. Commented Dec 31, 2016 at 6:28
  • Could you define "random" dates? And, do you need them to be unique on empid? Is a random date the start of any month before 1/1/2000? Commented Dec 31, 2016 at 6:31
  • yes I want them to be unique on empid sir, by random dates I mean, if an employee has two or maybe three or maybe all of its entries are null entries in leave table, all of an its null entires must be filled up with different dates and the dates should be between 01/01/1900 to suppose 01/01/1990. Commented Dec 31, 2016 at 6:52
2

Initially I thought it can be done using a single SQL command (And maybe someone knows a good solution to solve this.):

update Leave
set Leave_Date = (select coalesce(max(l1.Leave_Date) + interval '1 day', '1900-01-01'::date) 
 from Leave l1 
 where l1.Emp_ID = Leave.Emp_ID)
where Leave_Date is null;

But, Postgres does not COMMIT all changes until all records has been updated, and max(l1.Leave_Date) returns always the same value.

I've used a FOR-LOOP inside a function.

As @EvanCarroll commented yesterday, I didn't use a random date and the question says: With Different Random Dates

Well, now instead of simply add one day to the max date, the function adds a random number of days. By now, I've used 9 days, however this behavior can be easily modified. (For instance: (interval '1 day' * (random() * 9 + 21)::int))

CREATE OR REPLACE FUNCTION Leave_Update_Null_Dates()
RETURNS integer 
AS
$$
DECLARE
 lv RECORD;
BEGIN
 FOR lv IN SELECT * FROM Leave where Leave_Date is NULL LOOP
 UPDATE Leave
 SET Leave_Date = (select coalesce(max(l1.Leave_Date) + (interval '1 day' * (random() * 9)::int), '1990-01-01'::date) 
 from Leave l1 
 where l1.Emp_ID = lv.Emp_ID)
 WHERE ID = lv.ID;
 END LOOP;
 RETURN 0;
END;
$$
LANGUAGE 'plpgsql';
select Leave_Update_Null_Dates();

This is the result:

+----+---------------------+--------+
| id | leave_date | emp_id |
+----+---------------------+--------+
| 1 | 01.10.1993 00:00:00 | 1 |
+----+---------------------+--------+
| 2 | 06.10.1993 00:00:00 | 1 |
+----+---------------------+--------+
| 3 | 08.10.1993 00:00:00 | 1 |
+----+---------------------+--------+
| 4 | 12.02.1990 00:00:00 | 2 |
+----+---------------------+--------+
| 5 | 20.02.1990 00:00:00 | 2 |
+----+---------------------+--------+
| 6 | 25.02.1990 00:00:00 | 2 |
+----+---------------------+--------+
| 7 | 01.01.1990 00:00:00 | 3 |
+----+---------------------+--------+

You can check it here: http://rextester.com/BWZ31281

I hope this helps you.

answered Dec 29, 2016 at 12:57
6
  • 1
    Single SQL command: dba.stackexchange.com/a/159476/2639 Commented Dec 29, 2016 at 16:22
  • 2
    "But, I suppose Postgres does not COMMIT all changes until all records has been updated" - of course it doesn't. A statement sees (and has to see) a consistent view of the tables involved from the moment the statement started. Everything else would be terrible. Commented Dec 29, 2016 at 16:41
  • Ill update with a solution to that tonight Commented Dec 29, 2016 at 16:54
  • Updated my answer to do exactly that. BTW @McNets your answer isn't exactly doing what the question called. You're generating unique dates, but not "random" dates by any means. ;) Commented Dec 31, 2016 at 3:44
  • I will edit mine next year!! BTW, HAPPY NEW YEAR @EvanCarroll Commented Dec 31, 2016 at 13:37

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.