3

I'm using SQL Server 2008 R2, building a fact table for SSAS.

I can therefore accept solutions to this via either SQL query to SSIS task.

Let's say I have a fact table that looks a little like this, its a record of when people have eaten cakes, and how many (for illustrative purposes).

Surrogate Key | CakeType | PersonKey | CakesEaten | Date | HadSeconds
1 20 1234 4 20120412 0
2 41 2345 3 20120401 0 
3 25 3456 4 20120408 0
4 39 2345 2 20120402 0
5 20 4567 3 20120409 0
6 89 9876 2 20120405 0
7 31 1579 4 20120408 0
8 29 2709 3 20120402 0
9 27 3456 3 20120410 0
10 26 2345 2 20120408 0

CakeType and PersonKey and Date are all foreign keys to dimensions.

CakesEaten and HadSeconds are measures.

Techie type question: What I want to do, is to mark a fact as 'HadSeconds' where a Person has a record in the table within 2 days of the previous fact.

Business type question: Show me the instances of people going to eat cake when they've already had some cake in the previous 2 days.

For example, Person '2345' has 3 records in this fact table, IDs 2, 4, 10.

I want to mark 'HadSeconds' for ID '4' as a '1', because the Date for ID 4 is within 2 days of ID2, therefore the person ate cake in quick succession, and this instance needs to be marked as 'HadSeconds'. ID '10' has a datestamp which is 6 days after ID '4', therefore the 'HadSeconds' column for ID 10 does not need to be marked.

Note that only records were a Person has multiple entries in the fact need to be considered.

The best (lol) thing? This fact table has approx. 60 million records in it.

What's the most efficient way to achieve this? Breaking stuff down into temporary tables? Sorting? Cursors (ew...), Aggregration tasks in SSIS...

asked Apr 12, 2012 at 19:59

1 Answer 1

3

Just an UPDATE using EXISTS will be your best bet:

UPDATE c
SET HadSeconds = 1
FROM Caketable C
WHERE EXISTS (SELECT 1
 FROM CakeTable C2
 WHERE c2.personid = c.personid
 AND c2.date > DATEADD(day, -2, c.date))

If you think this will run too long, you can always break it down by date range, cake type, etc to do smaller batches.

If you have a composite index on personid, date this should be pretty fast.

You could also do a self-join but I'm pretty confident the EXISTS will run a lot faster:

UPDATE c
SET HadSeconds = 1
FROM Caketable C
INNER JOIN CakeTable C2
 ON c2.personid = c.personid
 AND c2.date > DATEADD(day, -2, c.date)
answered Apr 12, 2012 at 20:25
6
  • Missing a closing bracket round the subquery in that code snippet, (also I think you mean c2.date instead of cd.date but that self-join seems to be the way forward I think... Cheers! Commented Apr 12, 2012 at 20:27
  • @JamesLove fixed the syntax. Let me know if you want a self join example Commented Apr 12, 2012 at 20:28
  • ? I thought that was a self join inside the EXISTS... what else did you have in mind otherwise? Commented Apr 12, 2012 at 20:29
  • It's not a JOIN, it's an exists. I'll post an example. Commented Apr 12, 2012 at 20:30
  • @JamesLove The exists should be faster since it doesn't JOIN all the non-matching records, just returns a boolean if there is a match. Commented Apr 12, 2012 at 20:31

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.