I need help cleaning up this monstrosity. Basically this is a user helper method that generates an array of cumulative user sign ups per week (start of week == Wednesday) starting 2016年06月15日.
So something like this:
[ 1, 5, 9, 22, 34 ]
Where sign ups each week were 1, 4, 4, 13, 12.
In production, this is taking 5 seconds to run (and only 550 users over 7 weeks are present).
def get_user_array
@total_date_array = Array.new
# Remember that the range will go as far as the last Sunday, see next step
@date_array = (Date.parse("2016-06-15")..Date.parse(Time.now.strftime("%F"))).select(&:wednesday?)
@date_array.each do |x|
start = x - 7
finish = x
@range = start...finish
@local_user_array = Array.new
User.all.reject{|u| !u.valid? }.map {|u| @local_user_array << "Yes" if @range === Date.parse(u.created_at.strftime("%F"))}
@total_date_array << @local_user_array
end
# Now get the current weeks users
@current_week_start = @date_array.last
@current_week_end = Date.parse(Time.now.strftime("%F"))
@current_week_range = @current_week_start..@current_week_end
@current_local_user_array = Array.new
User.all.reject{|u| !u.valid? }.each do |u|
@current_week_range === Date.parse(u.created_at.strftime("%F")) ? @current_local_user_array << "Yes" : false
end
@total_date_array << @current_local_user_array
## Now create the array....Phew!
sum = 0
@total_date_array = @total_date_array.map { |a| Array(a).count("Yes") }.map{ |x| sum += x }
end
I am fairly inexperienced. I guessing that switching a lot of the code to SQL rather than keeping it in memory is one way to begin.
1 Answer 1
Ok, so there's a lot to cover here.
get_user_array
is a terrible name. What does "getting a user array" mean? Is it an array of users? A user's array? Be descriptive.Why all the instance variables (
@...
)? This method is a simple helper that has no need to persist state between invocations. Don't use instance variables for this!You appear to be laboring under a lot of misunderstandings of, well, fairly basic things to be frank.
Don't use
#map
unless your intention is to transform an array 1-to-1 into a new array. You're using#map
a lot, but only once for an actual mapping operation.You're doing a lot of pointless conversion back and forth. For instance,
Array(a).count
whena
is already an array.
You also do stuff likeDate.parse(u.created_at.strftime("%F"))
which means you're taking aDate
, making it into aString
, and then back to aDate
again. I'm guessing you may be doing that on purpose to drop the time-part, but doesn't it strike you as an incredibly complicated way of doing that? Why involve strings when going fromDate
toDate
? This sort of thing should cause you to reexamine your approach, because it just doesn't smell right.User.all.reject{|u| !u.valid? }
for one, what you want is#select
- not#reject
. Right now you're saying "do not give me users that are not valid"; with#select
, you'd be saying "give me valid users". I.e.User.all.select(&:valid?)
.
However, you should really be doing neither! Selecting a subset of records based on given criteria is what databases are for. YourUser
model should ideally have a scope that'd allow you to just get valid users in one go, e.g.User.valid.all
.
However, you don't really want that either, since you're still pulling every single user for every single week that's passed since mid-June 2016. That's crazy. If you have ~500 user records and it's now been ~7 weeks since your zero-day, you're instantiating 3,500 records only to discard the vast majority with a#reject
call! As time passes and user base grows, this gets worse in a hurry. Yes, Rails is smart enough to cache some stuff, but jeez! Doesn't that strike you as a fundamentally bad approach?
Luckily, you can do more stuff in the database. Most, in fact (probably all of it, with more SQL, but I'll leave that for someone better qualified to figure out).
Supposing your User
model has a valid
scope that restricts queries to only valid records (thus avoid the horrendous #reject
stuff), you could do:
START_DATE = "2016-06-15".freeze
# ...
per_week = User
.valid
.where("created_at >= ?", START_DATE)
.order("created_at")
.group("FLOOR(DATEDIFF(created_at, '#{START_DATE}') / 7)")
.count
Note: I couldn't get proper query interpolation to work in the group
call, so it's just plain string interpolation. Only use this when you 100% control the value to be inserted, otherwise you'll leave yourself open to injection attacks.
This will produce a SQL query that
- Calculates the number of weeks between the
created_at
timestamp, and the start date - Groups rows sharing the same week
- Counts the number of rows in each group
The result you get back is something like:
{
0 => 1,
1 => 4,
2 => 4,
3 => 13,
4 => 12
}
To then do the cumulative summation you can do:
sum = 0
cumulative_per_week = per_week.map(&:last).map { |_, count| sum += count }
Now, cumulative_per_week
will be [1, 5, 9, 22, 34]
.
I'd also recommend splitting logic into two methods: One for sign-ups per week, and one for cumulative sign-ups per week:
START_DATE = "2016-06-15".freeze
def signups_per_week
User
.valid
.where("created_at >= ?", START_DATE)
.order("created_at")
.group("FLOOR(DATEDIFF(created_at, '#{START_DATE}') / 7)")
.count
.values
end
def signups_per_week_cumulative
sum = 0
signups_per_week.map { |count| sum += count }
end
-
\$\begingroup\$ Wow. It's good to get a dressing down every now and then. This is really helpful and has provided me with a lot to work on. I do have one question, however. In order to search the database and retrieve only "completed" records (completed is based upon some custom criteria), am I right it saying that you have no choice but to pull back every single object in the DB to examine it (i.e. there is no easy way around it)? \$\endgroup\$GhostRider– GhostRider2016年08月08日 13:39:17 +00:00Commented Aug 8, 2016 at 13:39
-
\$\begingroup\$ @GhostRider Depends on the criteria. I don't know what "validity" entails for your
User
records, but I'd be surprised if it can't be expressed as aWHERE
clause in the query. In the above, I assume there's avalid
scope you can call, which would add that clause and make the database handle it. If I knew what constitutes a "valid user record", I could be more specific. If is super complex, then consider adding a column to the DB just to store the record's state, and set it to true (or whatever) when the record's valid, then use that in your query. Don't load data you don't need or want. \$\endgroup\$Flambino– Flambino2016年08月08日 14:08:09 +00:00Commented Aug 8, 2016 at 14:08 -
\$\begingroup\$ That's exactly where I was headed - creating a Boolean column in the DB for "valid". It is complex because users may or may not be "valid" depending on some of the data they have supplied i.e. If they answer one way to one question then several other questions must be answered. I'm looking into writing raw sql that will do it for me but I think I may have to add that column. As a novice, thanks. \$\endgroup\$GhostRider– GhostRider2016年08月08日 14:18:17 +00:00Commented Aug 8, 2016 at 14:18
-
\$\begingroup\$ @GhostRider Sounds like the right approach. Think of it this way, you're going to be fetching those records infinitely more times than you write them, so you'll want to be able to fetch them instantly, but you can take your time writing them. And here, validity is a 1st order criterium, so it should be front and center. Do the validity check upfront and store it when you do the rare record-write, and then you can skip it when you do the thousands of subsequent record-reads. \$\endgroup\$Flambino– Flambino2016年08月08日 14:52:23 +00:00Commented Aug 8, 2016 at 14:52
Explore related questions
See similar questions with these tags.