
Monday, January 5, 2015

Example 2015.1: Time to refinance?

In the US, it's typical to borrow a fairly substantial portion of the cost of a new house from a bank. The cost of these loans, the mortgage rate, varies over time depending on what the financial wizards see in their crystal balls. What this means is that when mortgage rates go down, the cost of living in your own house can magically decrease: you take out a new loan at the lower rate, pay off the old loan with it, and then pay off only the new loan at the lower rate. Mortgage rate calculators are easy to find on the web, if you don't mind those sites collecting your data and bombarding you with ads once their cookies start tracking you.

Instead, you can use SAS or R to calculate what you might pay for a new loan at various posted rates. Both packages offer sophisticated tools if you're interested in the remaining principal or in the proportion of each payment that goes to principal. Here, we just want to check the monthly payment.

R
We'll begin by writing a little function to calculate the monthly payment from the principal, interest rate (in per cent), and term (in years) of the loan. This is basic stuff, but the code here is adapted from a function written by Thomas Girke of UC Riverside.
mortgage <- function(principal = 300000, rate = 3.875, term = 30) {
  J <- rate/(12 * 100)  # monthly interest rate, as a proportion
  N <- 12 * term        # total number of monthly payments
  # standard annuity formula for the fixed monthly payment
  principal * J / (1 - (1 + J)^(-N))
}
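A quick sanity check against the first loan in the bank's table below:

> mortgage(principal = 275000, rate = 3.875, term = 30)
[1] 1293.152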
To compare the monthly costs for a series of loans offered by a local bank, we'll input the bank's loans as a data frame. To save typing, we'll use the rep() function to generate the term of the loan and the points.
offers = data.frame(
  principal = rep(275000, times = 9),
  term = rep(c(30, 20, 15), each = 3),
  points = rep(c(0, 1, 2), times = 3),
  rate = c(3.875, 3.75, 3.5, 3.625, 3.5, 3.375, 3, 2.875, 2.75))
> offers
  principal term points  rate
1    275000   30      0 3.875
2    275000   30      1 3.750
3    275000   30      2 3.500
4    275000   20      0 3.625
5    275000   20      1 3.500
6    275000   20      2 3.375
7    275000   15      0 3.000
8    275000   15      1 2.875
9    275000   15      2 2.750
(Points are an up-front cost a borrower can pay to lower the mortgage rate for the loan.) With the data and function in hand, it's easy to add the monthly cost to the data frame:
offers$monthly = with(offers, mortgage(rate=rate, term=term, principal=principal))
> offers
  principal term points  rate  monthly
1    275000   30      0 3.875 1293.152
2    275000   30      1 3.750 1273.568
3    275000   30      2 3.500 1234.873
4    275000   20      0 3.625 1612.610
5    275000   20      1 3.500 1594.889
6    275000   20      2 3.375 1577.282
7    275000   15      0 3.000 1899.100
8    275000   15      1 2.875 1882.611
9    275000   15      2 2.750 1866.210
In theory, each of these offers is fair, and borrowers should choose based on the monthly cost they can afford, as well as on whether they see more value in having money in hand to spend on a better quality of life, to invest in savings, or to pay off the house sooner. Financial professionals often discuss things like the total dollars spent, or the total spent on interest vs. principal, as well.
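As a rough illustration of the points trade-off (our own back-of-the-envelope sketch, assuming the usual convention that one point costs 1% of the principal), paying one point on the 30-year loan saves about $20 a month, so it takes on the order of 140 payments to recoup the up-front cost:

point_cost = 0.01 * 275000      # one point on this loan: $2750
saving = 1293.152 - 1273.568    # monthly saving, 0 points vs. 1 point
point_cost / saving             # roughly 140 months to break even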

SAS
The SAS/ETS package provides the LOAN procedure, which can produce the detailed analyses mentioned above. For a simple calculation like this one, we can use the mort function in the data step. Given three of its four parameters (principal, payment, rate, and term), it finds and returns the one left missing. To enter the data in a manner similar to R, we'll use array statements and do loops.
data t;
  principal = 275000;
  array te [3] (30, 20, 15);
  array po [3] (0, 1, 2);
  array ra [9] (.03875, .0375, .035, .03625, .035,
                .03375, .03, .02875, .0275);
  do i = 1 to 3;
    do j = 1 to 3;
      term = te[i];
      points = po[j];
      rate = ra[3 * (i-1) + j];
      monthly = mort(principal, ., rate/12, term*12);
      output;
    end;
  end;
run;
proc print noobs data = t;
  var principal term points rate monthly;
run;
principal    term    points       rate    monthly
   275000      30         0    0.03875    1293.15
   275000      30         1    0.03750    1273.57
   275000      30         2    0.03500    1234.87
   275000      20         0    0.03625    1612.61
   275000      20         1    0.03500    1594.89
   275000      20         2    0.03375    1577.28
   275000      15         0    0.03000    1899.10
   275000      15         1    0.02875    1882.61
   275000      15         2    0.02750    1866.21
An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission. If you read this on another aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page other than as noted above, the aggregator is violating the terms by which we publish our work.

Monday, April 14, 2014

Example 2014.4: Hilbert Matrix

Rick Wicklin showed how to make a Hilbert matrix in SAS/IML. Rick has a nice discussion of these matrices and why they might be interesting; the value of H_{r,c} is 1/(r+c-1). We show how to make this matrix in the data step and in R. We also show that Rick's method for displaying fractions in SAS/IML works in PROC PRINT, and how they can be displayed in R.

SAS
In the SAS data step, we'll use an array to make the columns of the matrix and do loops to put values into each cell in a row, outputting each row into the data set before incrementing the row value. This is a little awkward, but it at least preserves the simple 1/(r+c-1) function of the cell values. We use a macro variable for the dimension to keep the approach general.
%let n = 5;
data h;
  array val [&n] v1 - v&n;
  do r = 1 to &n;
    do c = 1 to &n;
      val[c] = 1/(r + c - 1);
    end;
    output;
  end;
run;
To print the resulting matrix, we use the fract format, as Rick demonstrated. Pretty nice! The noobs option in the proc print statement suppresses the row number that would otherwise be shown.
proc print data = h noobs; 
var v1 - v5;
format _all_ fract.; 
run;
 v1     v2     v3     v4     v5
  1    1/2    1/3    1/4    1/5
1/2    1/3    1/4    1/5    1/6
1/3    1/4    1/5    1/6    1/7
1/4    1/5    1/6    1/7    1/8
1/5    1/6    1/7    1/8    1/9


R
As is so often the case in R, a solution can be generated in one line using nesting. Also as usual, though, it's a bit unnatural; we unpack it briefly just below.
n = 5
1/sapply(1:n,function(x) x:(x+n-1))
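Unpacked, with our own annotations and object names:

m <- sapply(1:n, function(x) x:(x + n - 1))  # column x holds x:(x+n-1), so m[r, c] = r + c - 1
h <- 1/m                                     # element-wise reciprocals give the Hilbert matrix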
A more straightforward approach follows. It's certainly less efficient, though efficiency would seem a non-issue in any application of this code. It's also the kind of code that R aficionados would scoff at. It does have the virtue of perfect clarity, however. We begin by defining an empty matrix, then simply loop through the cells of the matrix, assigning values one by one.
n = 5
h1 = matrix(nrow = n, ncol = n)
for (r in 1:n) {
  for (c in 1:n)
    h1[r, c] = 1/(r + c - 1)
}
To display the fractions, we use the fractions() function from the MASS package that's distributed with R.
> library(MASS)
> fractions(h1)
     [,1] [,2] [,3] [,4] [,5]
[1,]   1  1/2  1/3  1/4  1/5
[2,] 1/2  1/3  1/4  1/5  1/6
[3,] 1/3  1/4  1/5  1/6  1/7
[4,] 1/4  1/5  1/6  1/7  1/8
[5,] 1/5  1/6  1/7  1/8  1/9



Monday, May 14, 2012

Example 9.31: Exploring multiple testing procedures

In example 9.30 we explored the effects of adjusting for multiple testing using the Bonferroni and Benjamini-Hochberg (or false discovery rate, FDR) procedures. At the time we claimed that it would probably be inappropriate to take the adjusted p-values from the FDR method out of their context. In this entry we attempt to explain our misgivings about this practice.

The FDR procedure is described in Benjamini and Hochberg (JRSSB, 1995); in the multiple-testing literature it is usually classified as a "step-up" procedure, though it works through the ordered p-values from largest to smallest. Put simply, the procedure has the following steps:

0. Choose the familywise alpha.
1. Rank-order the unadjusted p-values.
2. Beginning with the largest of the M ordered p-values, p(M):
2a. if p(m) <= alpha*(m/M), then reject all tests 1 ... m;
2b. if not, set m = m-1.
3. Repeat steps 2a and 2b until the condition is met,
or until p(1) > alpha/M.

where M is the number of tests. The "adjusted p-value" based on this procedure is the smallest familywise alpha under which the current test would have been rejected. To calculate this, we can modify the routine above:

1. Rank-order the unadjusted p-values.
2. For the ordered p-values p(m), stepping m from M down to 1:
2a. candidate ap(m) = p(m) * (M/m);
2b. if candidate ap(m) > ap(m+1), then ap(m) = ap(m+1), taking ap(M+1) = 1;
2c. else ap(m) = candidate ap(m).

where ap(m) refers to the adjusted p-value corresponding to the mth ordered unadjusted p-value. It's interesting to note that the adjusted p-value for the Mth ordered test is the same as the unadjusted p-value, while the candidate adjusted p-value for the smallest test is the Bonferroni adjusted p-value. The primary difficulty with taking these p-values (as opposed to the test results) out of context is captured in steps 2b and 2c. They imply that the p-value for a given test may be lowered by other observed p-values in the family of tests. It's also true that the adjusted p-value depends on the number of tests included in the family, but this seems somewhat less troubling.
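To make the routine concrete, here is a minimal R sketch of the adjustment (the function name and the cummin() shortcut for steps 2b and 2c are ours); it reproduces the values of p.adjust(p, "fdr"), which we use in the R section below:

fdr_adjust <- function(p) {
  M <- length(p)
  o <- order(p)                # step 1: rank-order the p-values
  ap <- p[o] * M/(1:M)         # step 2a: candidate adjusted p-values
  ap <- rev(cummin(rev(ap)))   # steps 2b/2c: carry the running minimum down from m = M
  pmin(1, ap)[order(o)]        # cap at 1 and restore the original ordering
}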

To examine the impact of the procedure on the adjusted p-values for the individual tests, we'll compare the candidate ap(m) from step 2a against the actual ap(m). Our sense is that to the degree these are different, the adjusted p-value should not be extracted from the context of the observed family of tests.


SAS
Our SAS code relies heavily on the array statement (section 1.11.5). We loop through the p-values from largest to smallest, calculating the candidate FDR p-value as above before arriving at the final adjusted p-value. To compare the values conveniently, we make a new data set stacking two copies of the original data set, renaming first the candidate and then the adjusted p-values to a common set of names. The in = data set option creates a temporary variable that identifies which data set an observation was read from; here it denotes which copy of the same data set (and thus which set of p-values) was used.

data fdr;
  array pvals [10] pval1 - pval10
    (.001 .001 .001 .001 .001 .03 .035 .04 .05 .05);
  array cfdrpvals [10] cfdr1 - cfdr10;
  array fdrpvals [10] fdr1 - fdr10;
  fdrpvals[10] = pvals[10];
  do i = 9 to 1 by -1;
    cfdrpvals[i] = pvals[i] * 10/i;
    if cfdrpvals[i] > fdrpvals[i+1] then fdrpvals[i] = fdrpvals[i+1];
    else fdrpvals[i] = cfdrpvals[i];
  end;
run;

data compare;
  set fdr (in = cfdr rename = (cfdr1=c1 cfdr2=c2 cfdr3=c3 cfdr4=c4
             cfdr5=c5 cfdr6=c6 cfdr7=c7 cfdr8=c8 cfdr9=c9))
      fdr (in = fdr rename = (fdr1=c1 fdr2=c2 fdr3=c3 fdr4=c4 fdr5=c5
             fdr6=c6 fdr7=c7 fdr8=c8 fdr9=c9));
  if cfdr then adjustment = "Candidate fdr";
  if fdr then adjustment = "Final fdr";
run;

proc print data = compare; var adjustment c1-c9; run;

adjustment       c1    c2     c3     c4    c5   c6   c7   c8    c9

Candidate fdr  0.010  .005  .0033  .0025  .002  .05  .05  .05  .055
Final fdr      0.002  .002  .0020  .0020  .002  .05  .05  .05  .050

(We omit the last p-value because the adjustment does not affect it.) The result shows that for many of the tests in this family, the final FDR p-value is substantially smaller than the candidate value. To this degree, the FDR p-value depends on the observed p-values of the other tests in the family, and ought not to be removed from their context. We would recommend caution in displaying FDR p-values in such settings, given readers' propensity to use them as if they were ordinary p-values, safely adjusted for multiple testing.

R
Comparison of the R and SAS code may make SAS programmers weep. The candidate values are easily calculated, and can be presented with the final p-values in one step using the p.adjust() function: three lines of code, albeit incorporating multiple functions in each line. (It could sensibly be done in two, calculating the candidate p-values within the rbind() function call; see the sketch after the output.) Note especially the line calculating the candidate p-values, in which vectorization allows a for loop to be avoided in a very natural fashion.

fakeps = c(rep(.2, 5), 6, 7, 8, 10, 10)/200
cfdr = fakeps * 10/(1:10)
rbind(cfdr, fdr=p.adjust(fakeps, "fdr"))[,1:9]

      [,1]  [,2]   [,3]   [,4]  [,5] [,6] [,7] [,8]   [,9]
cfdr 0.010 0.005 0.0033 0.0025 0.002 0.05 0.05 0.05 0.0556
fdr  0.002 0.002 0.0020 0.0020 0.002 0.05 0.05 0.05 0.0500
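The two-line version mentioned above folds the candidate calculation into the rbind() call:

fakeps = c(rep(.2, 5), 6, 7, 8, 10, 10)/200
rbind(cfdr = fakeps * 10/(1:10), fdr = p.adjust(fakeps, "fdr"))[, 1:9]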



Monday, April 30, 2012

Example 9.29: the perils of for loops

A recent exchange on the R-sig-teaching list featured a discussion of how best to teach R to new students. The initial post included an exercise to write a function that, given n, draws an n-row triangle made up of "*", noting that for a beginner this might require two for loops. For example, in pseudo-code:

for i = 1 to n
  for j = 1 to i
    print "*"

Unfortunately, as several folks (including Richard M. Heiberger and R. Michael Weylandt) noted, for loops in general are not the best way to take full advantage of R. In this entry, we review two solutions they proposed which fit within the R philosophy.

Richard's solution uses the outer() function to generate a 5x5 matrix of logical values indicating whether the row number is greater than or equal to the column number. Next, the ifelse() function is used to replace TRUE with "*" and FALSE with a blank.

> ifelse(outer(1:5, 1:5, `>=`), "*", " ")
[,1] [,2] [,3] [,4] [,5]
[1,] "*" " " " " " " " "
[2,] "*" "*" " " " " " "
[3,] "*" "*" "*" " " " "
[4,] "*" "*" "*" "*" " "
[5,] "*" "*" "*" "*" "*"

Michael's solution uses the lapply() function to call an anonymous function for each value from 1 to 5; each call prints one row of stars with cat(). This returns a list rather than a matrix, but accomplishes the same task.

> lapply(1:5, function(x) cat(rep("*", x), "\n"))
*
* *
* * *
* * * *
* * * * *
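As an aside (our own variant, not from the original exchange), the rows can also be built as strings first, so that the result is a character vector rather than a printing side effect:

rows <- sapply(1:5, function(x) paste(rep("*", x), collapse = " "))
cat(rows, sep = "\n")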

While this exercise is of little practical value, it does illustrate some important points, and it provides far more efficient as well as more elegant ways of accomplishing the task. For those interested in more, another resource is the R Inferno project of Patrick Burns.

SAS
We demonstrate a SAS data step solution mainly to call out some useful features and cautions. In all likelihood a PROC IML matrix-based solution would be more elegant.

data test;
  array star [5] $ star1 - star5;
  do i = 1 to 5;
    star[i] = "*";
    output;
  end;
run;

proc print noobs; var star1 - star5; run;

star1    star2    star3    star4    star5

  *
  *        *
  *        *        *
  *        *        *        *
  *        *        *        *        *

In particular, note the $ in the array statement, which allows the variables to contain characters; by default, variables created by an array statement are numeric. In addition, note the reference to a sequentially suffixed list of variables using the single-hyphen shortcut; this would help in generalizing to n rows. Finally, note that we were able to avoid a second do loop (SAS's primary iterative looping syntax) mainly by luck: within a data step iteration, the most recently assigned value of a variable is retained, so each output keeps all the previous "*"s in place when moving on to the next row. In general, this behavior can cause trouble, but here it works in our favor.



