22
\$\begingroup\$

A discussion arose not long ago on the 2nd Monitor about how much reputation has been lost due to the reputation caps. There are a number of queries on the SEDE which try to address this:

All of these queries (there are others as well) appear to suffer from the same two flaws... ( ... until now with the query in this question .... )

  • the user's actual reputation is affected by things that are complicated, like posts being deleted, migrated on to, and off from the site, the starting rep bonus, and other things.
  • that you cannot calculate a person's lost reputation without knowing the order in which the votes happened.

To understand why the post migration and deletion is important, you have to understand that reputation caps happens on a daily cycle. The number of days that people max their rep is much lower (for non-Skeets) than the number of times things are deleted, etc. There are things that happen on days other than rep-max days, that affect your reputation, and mean that voting records for those things are not available on the Data Explorer. Your user record records your total reputation, but there is no way to sum up all the reputation events and re-create that. Thus, any attempts to figure out how much rep is lost will come up short, or long, and this leads to funny things like negative-lost-reputation.

To understand why the order is significant, consider the following poor user 'Bob':

  1. Lucky day, and ends up at exactly 200 rep from a good answer.
  2. Someone upvotes again, and he loses 10 rep to the cap.
  3. Someone downvotes and he loses 2 (198)
  4. Someone downvotes again, he loses another 2
  5. Now his rep is 196 and the day ends ... sorry, no mortarboard for you, and you lost 10 rep

If the order was different, then it could be lucky 'Bill' instead:

  1. Lucky day, and ends up at exactly 200 rep from a good answer.
  2. Someone downvotes and he loses 2 (198)
  3. Someone downvotes again, he loses another 2 (196)
  4. Someone upvotes again, and he scores 4, and loses 6 rep to the cap.
  5. Now his rep is 200 and the day ends ... Celebrate! Mortarboard for you!

These calculations are more common than others, but, if you were to think of the implications of vote orders when the user offers a bounty... it's big.

The most accurate way to calculate the daily rep requires recreating the order of reputation events. There are still some problems though:

  • there is no way to find out when people downvote answers, which is a -1 rep hit.
  • there is no way to find out if a user hit rep-max, lost some rep, and then an answer which he got reputation for was later deleted, and the reputation lost.
  • if the user rep-maxes on their very first day, their joining bonus makes the process impossible.

So, inspired by the complexities of actually calculating the cap, I put together this query which rebuilds the progression of reputation events for either a specific user, or for each user that has previously won the Mortarboard badge.

For Review:

How can this code be improved:

  • cursors are notoriously inefficient, but is there a better way?
  • are there missing edge cases?

Any suggestions for improvement are welcome.... (and feel free to fork the query and take it for a spin!)

declare @userid as int = ##UserId:int?-1##
declare @userrep as int
declare @epoch as date = '1 jan 2099'
declare @pdate as date
declare @rdate as date
declare @rtime as datetime
declare @raction as varchar(10)
declare @rrep as int
declare @rcnt as int
declare @rtmp as int
declare @message as varchar(250)
declare @rollingrep as int
declare @rollingbonus as int
declare @rollingcap as int
;
create table #REPDAY (
 UserId int not null,
 Reputation int not null,
 RepDate DATE not null,
 CapLimited int not null,
 UnCapped int not null,
 Discarded int not null)
;
-- loop through all users who have earned mortarboard
-- which is an appoximate list of people who have lost rep.
-- or, if a userid is specified, use that user.
declare TOPUSERS cursor for
 select Users.Id as UserId, Users.Reputation as Reputation
 from Users, Badges
 where @userid < 0
 and badges.Name = 'Mortarboard'
 and Badges.UserId = Users.id
 UNION ALL
 select Id, Reputation
 from Users
 where Id = @userid
open TOPUSERS
fetch next from TOPUSERS into @userid, @userrep
while @@FETCH_STATUS = 0
begin
 -- for each user, reset the accumulators
 select @pdate = @epoch -- something different
 select @rcnt = 0
 select @rollingrep = 0
 select @rollingbonus = 0
 select @rollingcap = 0
 declare USERVOTES cursor for
 -- NOTE: convert to DATE truncates the Time-part!!!
 select convert(DATE, tstamp) as tdate, tstamp, action, rep
 from (
 --Approved suggested edits.
 select se.ApprovalDate as tstamp, 'edit' as action, 2 as rep
 from SuggestedEdits se
 where se.OwnerUserId = @userid
 and se.ApprovalDate is not null
 UNION ALL
 -- Up and Down votes on Q's and A's
 select v.CreationDate as tstamp,
 'invote' as action,
 case when v.VoteTypeId = 1 then 15
 when v.VoteTypeId = 2 then p.PostTypeId * 5 -- cheeky 5 or 10 for question/answer
 when v.VoteTypeId = 3 then -2
 when v.VoteTypeId = 8 then v.BountyAmount
 else 0
 end as rep
 from Posts p, Votes v
 where v.PostId = p.Id
 and p.OwnerUserId = @userid
 and p.PostTypeId in (1,2)
 and v.VoteTypeId in (1,2,3,8)
 UNION ALL
 -- Bounties that were offered....
 select v.CreationDate as tstamp, 'bounty' as action, -1 * v.BountyAmount as rep
 from Votes v
 where v.VoteTypeId = 9
 and v.UserId = @userid
 ) as tdata
 order by tstamp
 open USERVOTES
 fetch next from USERVOTES into @rdate, @rtime, @raction, @rrep
 while @@FETCH_STATUS = 0
 begin
 if @pdate <> @rdate and @rcnt > 0
 begin
 -- break point, new day, save old day's data
 insert into #repday
 values (@userid, @userrep, @pdate, @rollingrep, @rollingbonus, @rollingcap)
 -- reset our accumulators
 select @rollingrep = 0, @rollingcap = 0, @rollingbonus = 0, @pdate = @rdate
 end
 select @rcnt = @rcnt + 1
 if @rrep > 10 or @raction = 'bounty' 
 -- things that score > 10 (accept) are not capped - assume bonus never <= 10
 select @rollingbonus = @rollingbonus + @rrep
 else
 -- things that score 10 or less are subject to cap
 select @rollingrep = @rollingrep + @rrep
 if @rollingrep > 200
 begin
 -- last action passed the cap... but how much past?
 select @rtmp = @rollingrep - 200
 -- set the value back to 200, and add the difference to the lost rep
 select @rollingrep = 200,
 @rollingcap = @rollingcap + @rtmp
 select @message = 'Vote Maxed ' + Convert(Varchar(20), @rdate) 
 + ' ' + @raction + ' rep ' + Convert(varchar(3), @rrep)
 + ' ' + Convert(varchar(3), @rtmp) + ' rep wasted'
 print @message
 end
 fetch next from USERVOTES into @rdate, @rtime, @raction, @rrep
 end
 close USERVOTES
 deallocate USERVOTES
 -- save away the last day's values that were not break-processed
 insert into #repday
 values (@userid, @userrep, @rdate, @rollingrep, @rollingbonus, @rollingcap)
 select @message = 'Replayed user ' + Convert(varchar(10), @userid) 
 + ' with rep ' + Convert(varchar(10), @userrep)
 + ' and repcnt ' + Convert(varchar(10), @rcnt)
 print @message
 fetch next from TOPUSERS into @userid, @userrep
END
close TOPUSERS
deallocate TOPUSERS
;
with UserLost as (
 select UserId as LostId,
 sum(Discarded) as LostAmt
 from #REPDAY
 group by UserId
)
select UserId as [User Link],
 Reputation as [Current Rep],
 Convert(Varchar(12), RepDate, 107) as [Date],
 CapLimited as [Actual Regular],
 UnCapped as [Accepts Bonuses],
 CapLimited + UnCapped as [Day Rep],
 Discarded as [Lost Rep],
 LostAmt as [Total Lost],
 Convert(Decimal(8,2), 100.0 * (convert(real,LostAmt) / convert(real,Reputation + LostAmt)) ) as [Lost%]
from #REPDAY,
 UserLost
where LostId = UserId
 and ( CapLimited + UnCapped >= 200 -- days which count for Mortarboard
 or Discarded > 0) -- days with lost rep
order by Reputation DESC, RepDate ASC
asked Feb 1, 2014 at 3:09
\$\endgroup\$

1 Answer 1

10
\$\begingroup\$

Going through this code again, it appears there are a few things that could be improved.

Use #Temp Table for User Selection

Instead of doing a cheap-shot UNION select to get the set of users to process, the right way would be to create a temp table, and then conditionally populate it:

create table #TopUsers (
 UserId int not null,
 Reputation int not null)
if @userid < 0
begin
 insert into #TopUsers
 select Users.Id as UserId, Users.Reputation as Reputation
 from Users, Badges
 where badges.Name = 'Mortarboard'
 and Badges.UserId = Users.id
end else begin
 insert into #TopUsers
 select Id, Reputation
 from Users
 where Id = @userid
end
declare TOPUSERS cursor for
 select *
 from #TopUsers

This approach is more verbose, but the intent of the code is clearer and it makes it more maintainable.

Explicit vs. Implicit JOIN

Explicit joins have been available in SQL since the SQL-92 standard. The code in this question uses the coding standards from the older SQL-89 standard.

There is nothing technically wrong with the implicit join syntax used in the code, but it is old-fashioned, and using explicit join syntax allows the join conditions in larger queries to be more obvious... changes would be, for example:

select Users.Id as UserId, Users.Reputation as Reputation
from Users, Badges
where @userid < 0
 and badges.Name = 'Mortarboard'
 and Badges.UserId = Users.id

would become:

select Users.Id as UserId, Users.Reputation as Reputation
from Users
inner join Badges
 on Badges.UserId = Users.id
where @userid < 0
 and badges.Name = 'Mortarboard'
answered Feb 3, 2014 at 22:57
\$\endgroup\$
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.