Here is a minimal example of my task...
I have four 2-column files.
profile1.data
zone luminosity
1 1 1359.019
2 2 1359.030
3 3 1359.009
4 4 1358.988
5 5 1358.969
6 6 1358.951
7 7 1358.934
8 8 1358.917
9 9 1358.899
10 10 1358.881
profile2.data
zone luminosity
1 1 1357.336
2 2 1357.352
3 3 1357.332
4 4 1357.310
5 5 1357.289
6 6 1357.270
7 7 1357.252
8 8 1357.233
9 9 1357.214
10 10 1357.194
profile3.data
zone luminosity
1 1 1355.667
2 2 1355.687
3 3 1355.667
4 4 1355.644
5 5 1355.622
6 6 1355.602
7 7 1355.582
8 8 1355.562
9 9 1355.541
10 10 1355.520
profile4.data
zone luminosity
1 1 1354.008
2 2 1354.032
3 3 1354.013
4 4 1353.990
5 5 1353.967
6 6 1353.945
7 7 1353.923
8 8 1353.902
9 9 1353.879
10 10 1353.857
I also have a vector named phases. There is one phase value for each profile file.
rsp_phase1 rsp_phase2 rsp_phase3 rsp_phase4
0.002935897 0.004602563 0.006269230 0.007935897
Finally, there are profile files for each of four sets, labeled A to D. The set directories are named LOGS_A1a, LOGS_B1a, etc. Each directory contains the profile files, a file named history.data that contains the phase values, and a profile.index file that states how many profiles there are in the directory. The sets do NOT have the same number of profiles.
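As a rough sketch of that layout, each set directory could be read once into a list of data frames. The helper name read_profiles, the file-name pattern, and the LOGS_<set>1a naming for sets C and D are assumptions for illustration; skip = 5 is taken from the code further down.

```r
# Hypothetical helper: read every profileN.data file in one LOGS directory once.
read_profiles <- function(log_dir, skip = 5) {
  files <- list.files(log_dir, pattern = "^profile[0-9]+\\.data$",
                      full.names = TRUE)
  # Sort numerically so that profile10 comes after profile9, not after profile1.
  nums <- as.integer(gsub("[^0-9]", "", basename(files)))
  files <- files[order(nums)]
  lapply(files, read.table, header = TRUE, skip = skip)
}

# Assumed directory names, following the LOGS_A1a / LOGS_B1a examples.
sets <- c("A", "B", "C", "D")
log_dirs <- paste0("LOGS_", sets, "1a")
log_dirs <- log_dirs[dir.exists(log_dirs)]  # silently skip missing directories

# One list element per set; each element is a list of data frames,
# so the sets may hold different numbers of profiles.
profiles_by_set <- lapply(log_dirs, read_profiles)
names(profiles_by_set) <- log_dirs
```

Because the files are discovered with list.files, this sketch does not need profile.index to know how many profiles each directory holds.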
What I am doing with this data is plotting luminosity vs. phase for each zone, and placing the four plots for a given zone (one per set) together on one canvas.
Example of plot with a subplot for each set
For example, to create a luminosity vs phase plot of the first zone, I grab the luminosity value from every profile in the directory at zone 1. This is my first plot. Then I do the same for the other zones. At the moment, I am accomplishing this through for loops in R.
for (zone_num in 1:10){
  png(file.path(paste("Light_Curve_","Zone_",zone_num,".png",sep="")),
      width = 1200, height = 960)
  par(mar=c(5,4,4,2) + 2)
  luminosities <- c()
  for (prof_num in 1:4) {
    prof.path <- file.path('LOGS_A1a', paste0('profile', prof_num, '.data'))
    if (!file.exists(prof.path)) next
    #print(prof.path)
    DF.profile <- read.table(prof.path, header=1, skip=5)
    luminosity <- DF.profile$luminosity[zone_num]
    luminosities <- c(luminosities, luminosity)
  }
  plot.table <- data.frame(phases, luminosities)
  o <- order(phases)
  with(plot.table, plot(x=phases[o], y=luminosities[o],
       main=paste("Zone",zone_num,"Light Curve",sep=" "),
       type="l", pch=3, lwd = 6, col="purple", xlab=expression("Phase"),
       ylab=expression("Luminosity " (L/L['\u0298'])), cex.main=1.60,
       cex.lab=1.80, cex.axis=1.60))
  dev.off()
}
As you can see, the biggest problem appears to be that I repeatedly read the same files into R. The reading should be done once, separately from the plotting. Is there a way to do this and speed the code up?
- I have rolled back the code changes in revision 2. After receiving an answer you are not allowed to update the code in your question. This is not a forum where you should keep the most updated version in your question. Please see "What should I do when someone answers my question?" as well as what you may and may not do after receiving answers. – Sᴀᴍ Onᴇᴌᴀ ♦, Jul 13, 2021 at 23:34
1 Answer
For minimal code changes, read all the files once before the loop:
# Read every profile once, up front, into a list of data frames.
prof_num <- 1:4
prof.path <- file.path('LOGS_A1a', paste0('profile', prof_num, '.data'))
DF.profile <- lapply(prof.path, function(x) read.table(x, header = TRUE, skip = 5))

for (zone_num in 1:10) {
  png(file.path(paste("Light_Curve_","Zone_",zone_num,".png",sep = "")),
      width = 1200, height = 960)
  par(mar = c(5,4,4,2) + 2)
  luminosities <- c()
  for (prof_num in 1:4) {
    # Look up the value in the data frame already in memory; no file I/O here.
    luminosity <- DF.profile[[prof_num]]$luminosity[zone_num]
    luminosities <- c(luminosities, luminosity)
  }
  plot.table <- data.frame(phases, luminosities)
  o <- order(phases)
  with(plot.table, plot(x=phases[o], y=luminosities[o],
       main=paste("Zone",zone_num,"Light Curve",sep=" "),
       type="l", pch=3, lwd = 6, col="purple", xlab=expression("Phase"),
       ylab=expression("Luminosity " (L/L['\u0298'])), cex.main=1.60,
       cex.lab=1.80, cex.axis=1.60))
  dev.off()
}
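Going one step further, the inner loop could probably be dropped by collecting all luminosities into a zones-by-profiles matrix, so that each zone's light curve is just one row. This is only a sketch: the list profiles below is a toy stand-in for DF.profile, built from the first three zones of the example data, and it assumes every profile has the same number of zones (otherwise sapply returns a list rather than a matrix).

```r
# Toy stand-ins for the data frames read from profile1.data ... profile4.data
# (values copied from the first three zones of the example in the question).
profiles <- list(
  data.frame(zone = 1:3, luminosity = c(1359.019, 1359.030, 1359.009)),
  data.frame(zone = 1:3, luminosity = c(1357.336, 1357.352, 1357.332)),
  data.frame(zone = 1:3, luminosity = c(1355.667, 1355.687, 1355.667)),
  data.frame(zone = 1:3, luminosity = c(1354.008, 1354.032, 1354.013))
)
phases <- c(0.002935897, 0.004602563, 0.006269230, 0.007935897)

# Rows are zones, columns are profiles.
lum <- sapply(profiles, function(df) df$luminosity)

# The light curve for zone 1 is row 1, with columns ordered by phase.
o <- order(phases)
zone1_curve <- lum[1, o]
```

Inside the plotting loop, lum[zone_num, o] would then take the place of luminosities[o].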
I already answered this question... where did it disappear?
- Why is prof_num <- 1:4 defined in the first line? – Woj, Jul 13, 2021 at 15:27
- @Woj Why not? It is needed because the third line reads the data. It is the same variable as in your code; it gives the numbers of the files that need to be read. – minem, Jul 13, 2021 at 16:16
- Understood. Also, the original post was deleted because, although your solution speeds up the creation of one plot, it did not speed up the creation of the 2x2 plots I am actually creating for my task; it actually slowed it down. I am reading these profiles across 4 sets, sets <- c('A','B','C','D'), with each set's directory NOT having the same number of profiles. I was not sure how to convey the issue in a minimal example like the one above, so the example assumes one set. Should I edit my post to show what I mean? – Woj, Jul 13, 2021 at 16:32
- @Woj If you change the directories and read different datasets each time, then yes, you should specify that in the question. Also, in that case, I think it will be hard to find improvements... – minem, Jul 13, 2021 at 16:35
- I've edited my question. I am not sure if I should include the entire compressed data (the compressed LOGS directories that I explain in the post) for clarity, or if I should subset all the files to keep a minimal example at the expense of clarity. Please let me know if something is not clear. – Woj, Jul 13, 2021 at 19:11
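For the multi-set case raised in the comments, a 2x2 canvas per zone might be sketched as below, assuming the data for every set has already been read once. The helper plot_zone, the lists lum_by_set and phases_by_set, and the random toy data are all hypothetical; with real data, each set would supply a zones-by-profiles luminosity matrix and its own phase vector, and the sets may have different numbers of profiles.

```r
# Hypothetical helper: draw one 2x2 canvas (one panel per set) for a given zone.
# lum_by_set:    named list of zones-by-profiles luminosity matrices
# phases_by_set: named list of phase vectors, one value per profile
plot_zone <- function(zone_num, lum_by_set, phases_by_set, out_dir = ".") {
  out_file <- file.path(out_dir, paste0("Light_Curve_Zone_", zone_num, ".png"))
  png(out_file, width = 1200, height = 960)
  on.exit(dev.off())  # close the device even if a plot call fails
  par(mfrow = c(2, 2), mar = c(5, 4, 4, 2) + 2)
  for (set in names(lum_by_set)) {
    ph <- phases_by_set[[set]]
    o <- order(ph)
    plot(ph[o], lum_by_set[[set]][zone_num, o],
         type = "l", lwd = 6, col = "purple",
         main = paste("Set", set, "Zone", zone_num),
         xlab = "Phase", ylab = "Luminosity")
  }
  invisible(out_file)
}

# Toy data only: four sets with different numbers of profiles, 10 zones each.
set.seed(1)
n_prof <- c(A = 4, B = 6, C = 5, D = 3)
lum_by_set <- lapply(n_prof, function(n) matrix(runif(10 * n, 1350, 1360), nrow = 10))
phases_by_set <- lapply(n_prof, function(n) seq(0, 1, length.out = n))

# Only attempt the plot if this R build has a PNG device.
if (capabilities("png")) {
  out <- plot_zone(1, lum_by_set, phases_by_set, out_dir = tempdir())
}
```

With this arrangement the file reading happens once per set, and the loop over zones only indexes matrices that are already in memory.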