Showing posts with label as.POSIXlt(). Show all posts
Showing posts with label as.POSIXlt(). Show all posts
Tuesday, October 13, 2009
Example 7.15: A more complex sales graphic
[フレーム]
The plot of Amazon sales rank over time generated in example 7.14 leaves questions. From a software perspective, we'd like to make the plot prettier, while we can embellish the plot to inform our interpretation about how the rank is calculated.For the latter purpose, we'll create an indicator of whether the rank was recorded in nighttime (eastern US time) or not. Then we'll color the nighttime ranks differently than the daytime ranks.
SAS
In SAS, we use the timepart function to extract the time of day from the salestime variable which holds the date and time. This is a value measured in seconds since midnight, and we use some conditional logic (section 1.4.11) to identify hours before 8 AM or after 6 PM.
data sales2;
set sales;
if timepart(salestime) lt (8 * 60 * 60) or
timepart(salestime) gt (18 * 60 * 60) then night=1;
else night = 0;
run;
Then we can make the plot. We use the axis statement (sections 5.3.7 and 5.3.8) to specify the axis ranges, rotate some labels and headers, and remove the minor tick marks. Note that since the x-axis is a date-time variable, we have to specify the axis range using date-time data, here read in using formats (section A.6.4) and request labels every three days by requesting an interval of three days' worth of seconds. We also use the symbol statement (section 5.2.2, 5.3.11) to specify shapes and colors for the plotted points.
axis1 order = ("09AUG2009/12:00:00"dt to
"27AUG2009/12:00:00"dt by 259200)
minor = none;
axis2 order=(30000 to 290000 by 130000) label=(angle=90)
value=(angle=90) minor=none;
symbol1 i=none v=dot c=red h=.3;
symbol2 i=none v=dot c=black h=.3;
Finally, we request the plot using proc gplot. The a*b=c syntax (as in section 5.6.2) will result in different symbols for each value of the new night variable, and the symbols we just defined will be used. The haxis and vaxis options are used to associate each axis with the axis definitions specified in the axis statements.
proc gplot data=sales2;
plot rank*salestime=night / haxis=axis1 vaxis=axis2;
format salestime dtdate5.;
run; quit;
R
In R, we make a new variable reflecting the date-time at the midnight before we started collecting data. We then coerce the time values to numeric values using the as.numeric() function (section 1.4.2), while subtracting that midnight value. Next, we mod by 24 (using the %% operator, section B.4.3) and lastly round to the integer value (section 1.8.4) to get the hour of measurement. There's probably a more elegant way of doing this in R, but this works.
midnight <- as.POSIXlt("2009-08-09 00:00:00 EDT")
timeofday <- round(as.numeric(timeval-midnight)%%24,0)
Next, we prepare for making a nighttime indicator by intializing a vector with 0. Then we assign a value of 1 when the corresponding element of the hour of measurement vector has a value in the correct range.
night <- rep(0,length(timeofday))
night[timeofday < 8 | timeofday> 18] <- 1
Finally, we're ready to make the plot. We begin by setting up the axes, using the type="n" option (section 5.1.1) to prevent any data being plotted. Next, we plot the nighttime ranks by conditioning the plot vector on the value of the nighttime indicator vector; we then repeact for the daytime values, additionally specifying a color for these points. Lastly, we add a legend to the plot (section 5.2.14).
plot(timeval, rank, type="n")
points(timeval[night==1], rank[night==1], pch=20)
points(timeval[night==0], rank[night==0], pch=20,
col="red")
legend(as.POSIXlt("2009-08-22 00:00:00 EDT"), 250000,
legend=c("day", "night"), col=c("black", "red"),
pch=c(20, 20))
Interpretation: It appears that Amazon's ranking function adjusts for the pre-dawn hours, most likely to reflect a general lack of activity. (Note that these ranks are from Amazon.com. In all likelihood, Amazon.co.uk and other local Amazon sites adjust for local time similarly.) Perhaps some recency in sales allows a decline in rank for some books during these hours? In addition, we see that most sales of this book, as inferred from the discontinuous drops (improvement) in rank, tend to happen near the beginning of the day, or mid-day, rather than at night.
Subscribe to:
Comments (Atom)