\$\begingroup\$

project is the data frame. For the purposes of this code, HOUSE.NO is a character column and NO..OF.FAMILY.MEMBER is an integer column. My aim was to find the house numbers that are repeated, check whether the number of family members reported for each occurrence of a house matches, and identify the sets for which it doesn't.

x<-1
matr<-NULL
matr2<-NULL
matr3<-NULL
r<-NULL
index<-NULL
repeat{
 y<-project$HOUSE.NO[-x]==project$HOUSE.NO[x]
 if (any(y)){
 r<-which(grepl(project$HOUSE.NO[x],project$HOUSE.NO))
 if(length(r)==2){
 check<-project$NO..OF.FAMILY.MEMBER[r[1]]!=project$NO..OF.FAMILY.MEMBER[r[2]]
 if(check){matr<-c(matr,r)}
 }
 if (length(r)==3){
 check2<-length(levels(factor(project$NO..OF.FAMILY.MEMBER[c(r[1],r[2],r[3])])))>1
 if(check2){matr2<-c(matr2,r)}
 }
 if (length(r)==4){
 check3<-length(levels(factor(project$NO..OF.FAMILY.MEMBER[c(r[1],r[2],r[3],r[4])])))>1
 if(check3){
 matr3<-c(matr3,r)}}
 if (length(r)>4&project$HOUSE.NO[x]!=""){index<-c(index,r)
 }
 }
 x<-x+1
 if(x>392){
 m1<-matrix(matr, ncol=2, byrow = TRUE)
 m2<-matrix(matr2, ncol=3, byrow = TRUE)
 m3<-matrix(matr3, ncol=4, byrow=TRUE)
 break
 }
}

The extra condition when computing index avoids a false entry when HOUSE.NO is "", which is the case for 3 entries in my data frame. There are 393 entries, hence the final check before break.

The concerns are:

  1. I am an absolute beginner in R, and the functions used here are almost all I know.

  2. For a house number repeated more than twice, this code only checks whether the entire set reports the same number of family members. I couldn't find the row indices of only the mismatched cases. Currently, the output includes the entire set.

  3. Do let me know tips on how to make this simpler. As it stands, I found this code to be quite a bit complicated.

(let me know if more details specific to the data frame/variables I am working with are needed. Or if the question is not suited to the site)

ADDENDUM

   HOUSE.NO NO..OF.FAMILY.MEMBER
1    14/274                    6
2    14/259                    6
3    14/217                    5
4    14/258                    4
5    14/306                    5
6    14/300                    8
7     14/96                    4
8    14/166                    4
9     14/69                    5
10    14/68                    2

The expected output is just the row numbers (or house numbers) that fulfil the criteria above. Currently, the matrix outputs are as below. Each set appears repeatedly in its matrix (twice in m1, three times in m2, and so on).

> m1
 [,1] [,2]
 [1,] 20 380
 [2,] 36 68
 [3,] 37 340
 [4,] 64 191
 [5,] 36 68
 [6,] 72 329
 [7,] 88 218
 [8,] 103 199
 [9,] 111 278
[10,] 125 214
[11,] 135 387
[12,] 149 196
[13,] 64 191
[14,] 149 196
[15,] 103 199
[16,] 125 214
[17,] 215 320
[18,] 88 218
[19,] 248 317
[20,] 111 278
[21,] 310 350
[22,] 248 317
[23,] 319 324
[24,] 215 320
[25,] 319 324
[26,] 72 329
[27,] 37 340
[28,] 310 350
[29,] 20 380
[30,] 135 387
> m2
 [,1] [,2] [,3]
 [1,] 43 258 354
 [2,] 65 219 269
 [3,] 169 322 323
 [4,] 65 219 269
 [5,] 43 258 354
 [6,] 65 219 269
 [7,] 169 322 323
 [8,] 169 322 323
 [9,] 43 258 354
> m3
 [,1] [,2] [,3] [,4]
 [1,] 2 84 211 347
 [2,] 2 84 211 347
 [3,] 99 100 101 363
 [4,] 99 100 101 363
 [5,] 99 100 101 363
 [6,] 180 185 260 263
 [7,] 180 185 260 263
 [8,] 2 84 211 347
 [9,] 180 185 260 263
[10,] 180 185 260 263
[11,] 2 84 211 347
[12,] 99 100 101 363
asked Apr 14, 2016 at 8:13
\$\endgroup\$
  • \$\begingroup\$ can you give a sample data section together with the expected output? \$\endgroup\$ Commented Apr 24, 2016 at 8:54
  • \$\begingroup\$ @ZahiroMor Do you want a dput of the concerned variables? I don't think it is relevant. All I want the code to do is carry out the following objective: find the house.no values that are repeated but with a different no.of.family.members. \$\endgroup\$ Commented Apr 24, 2016 at 10:16
  • \$\begingroup\$ yes... please dput... it'll be faster than english communication :) \$\endgroup\$ Commented Apr 24, 2016 at 10:20
  • \$\begingroup\$ @ZahiroMor Added. \$\endgroup\$ Commented Apr 24, 2016 at 10:27

1 Answer

\$\begingroup\$

Your code uses several constructs that are a bit of a code smell in R.

Foremost is how you write the loop. An immediate improvement would be to replace the repeat with for (i in seq_len(nrow(project))); the hard-coded 392 in particular reeks.
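As a rough sketch of that first step, the loop could look something like the following. The toy data frame is invented for illustration and only stands in for project. The sketch also swaps grepl for an exact == comparison, on the assumption that house numbers should match exactly (grepl treats its first argument as a pattern, so "14/27" would also match "14/274"), and it collapses the separate length-2/3/4 branches into a single unique() check, which is a simplification rather than a line-for-line port.

```r
# Toy stand-in for `project`; the values are made up for illustration.
project <- data.frame(
  HOUSE.NO = c("14/274", "14/259", "14/274", "14/259", "14/217", "", ""),
  NO..OF.FAMILY.MEMBER = c(6L, 6L, 5L, 6L, 5L, 3L, 4L),
  stringsAsFactors = FALSE
)

flagged <- integer(0)  # row indices whose house number has conflicting counts

for (i in seq_len(nrow(project))) {   # no hard-coded row count
  house <- project$HOUSE.NO[i]
  if (house == "") next               # skip blank house numbers

  same <- which(project$HOUSE.NO == house)  # exact match, all rows for this house
  counts <- project$NO..OF.FAMILY.MEMBER[same]

  # More than one distinct count means the reports disagree.
  if (length(unique(counts)) > 1) {
    flagged <- c(flagged, i)
  }
}

flagged
```

Because each row is flagged at most once, this also avoids the repeated sets seen in m1, m2 and m3.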

Also, you don't need to initialize variables that are only used inside the loop, such as r; that just adds clutter.

A more R-like way would be to use higher-level verbs that operate on the whole table, such as those provided by dplyr. Supposing you have something like an id column in the rows, you would write something like

left_join(project, project, by = "HOUSE.NO") %>% filter(id.x < id.y)

(dplyr appends .x and .y to the column names that appear in both tables). Such commands are usually easier to read and usually much faster than looping.
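For the stated goal (house numbers that repeat with differing family-member counts), a grouped filter may be even more direct than a self-join. The following is a minimal sketch, not a drop-in solution: the toy data frame and the helper column name row are assumptions made for illustration, while mutate, row_number, group_by, filter, n_distinct and ungroup are standard dplyr functions.

```r
library(dplyr)

# Toy stand-in for `project`; the values are made up for illustration.
project <- data.frame(
  HOUSE.NO = c("14/274", "14/259", "14/274", "14/259", "14/217", "", ""),
  NO..OF.FAMILY.MEMBER = c(6L, 6L, 5L, 6L, 5L, 3L, 4L),
  stringsAsFactors = FALSE
)

mismatched <- project %>%
  mutate(row = row_number()) %>%    # remember the original row index
  filter(HOUSE.NO != "") %>%        # ignore blank house numbers
  group_by(HOUSE.NO) %>%
  # Keep only houses whose reported counts disagree; a house that
  # appears once has a single distinct count and is dropped as well.
  filter(n_distinct(NO..OF.FAMILY.MEMBER) > 1) %>%
  ungroup()

mismatched
```

The row column then gives the indices of every entry belonging to a conflicting house number, each listed once.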

answered Apr 24, 2016 at 11:12
\$\endgroup\$
