Showing posts with label RStudio. Show all posts
Tuesday, December 2, 2014
RStudio in the cloud for dummies, 2014/2015 edition
In 2012, we presented a post showing how to run RStudio in the cloud on an Amazon server. There were 7 steps, including one with 7 sub-steps, one of which had 6 sub-sub-steps. It was still pretty easy, for what it was-- an effectively free computer in the cloud to run R on.
Today, we show the modern-- 3 years later!-- way to get the same result, only this approach is much easier, and the resulting installation includes all the best goodies of RStudio, including Markdown -> PDF and Hadley Wickham's packages pre-installed. Update, 2016: Digital Ocean has changed its set-up slightly. If you're just starting out, check out the first step or two of this post in place of the first two steps below.
The approach builds on Docker, an infrastructure that saves start-up time and overhead, as well as efforts led by Dirk Eddelbuettel and Carl Boettiger to develop a Docker application of R. This project is called Rocker, and interested readers are encouraged to read the details. But if you want to just get up and running, here are the simple steps to get going.
1. Go to Digital Ocean and sign up for an account. By using this link, you will get a $10 credit. (Full disclosure: Ken will also get a $25 credit once you spend $25 real dollars there.) The reason to use this provider is that they have a system ready to run with Docker already built in. In addition, their prices are quite reasonable. You will need to use a credit card or PayPal to activate your account, but you can play for a long time with your $10 credit: the cheapest machine is $0.007 per hour, up to a $5 per month maximum.
2. On your Digital Ocean page, click "Create droplet". Then choose an (arbitrary) name, a size (meaning cost/power) of machine, and the region closest to you. You can ignore the remaining settings. Under "Select Image", choose the "Applications" tab and select "Docker (1.3.2 on 14.04)". (The numbers in the parentheses are the Docker and Ubuntu version, and might change over time.) Then click "Create Droplet" at the bottom of the page.
3. It takes about a minute for the machine to start up. When it's ready, click the "Console Access" button. This opens a text terminal to your Ubuntu machine, inside your web page. Press enter to get a prompt, and log in (your username is root) using the password that was sent to your e-mail. You'll have to change the password.
4a. To start a terminal session of R, type
docker run --rm -ti rocker/r-base
You should see a bunch of messages about pulling and downloading, but eventually you will get the ">" prompt. You can do R in here, but who would want to?
4b. To get RStudio server running, type
docker run -d -p 8787:8787 rocker/rstudio
But this is really not where you want to be. Instead, run the following command to get a set-up that includes more useful packages installed in and with R.
docker run -d -p 8787:8787 rocker/hadleyverse
5. Use it! The IP address of your server is displayed below the terminal where you typed in your docker command. Open a new browser tab and go to the address http://(ip address):8787. For example: http://135.104.92.185:8787. You'll see the RStudio login screen, where you can enter "rstudio" (without the quotes) as both the username and the password. The system is well tuned enough that you can open a new file --> markdown --> PDF, immediately click "Knit PDF", and see the example document beautifully presented back to you in moments.
That's it. It's still way cooler than sliced bread. Let us know if you try it, and if you run into any trouble. Oh, and if you're feeling creeped out by the standard username and password in your RStudio, you can set your own from the docker command as follows.
docker run -d -p 8787:8787 -e USER=ken -e PASSWORD=ken rocker/hadleyverse
Other customization details and further information can be found on this Rocker page.
Update
I should perhaps have noted that what you are running here is in fact RStudio Server, and that you can allow additional users on your RStudio using instructions found here.
An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.
Monday, February 13, 2012
RStudio in the cloud, for dummies
You can have your own cloud computing version of R, complete with RStudio. Why should you? It's cool! Plus, there's a lot more power out there than you can easily get on your own hardware. And, it's R in a web page. Run it from your tablet. Run it from work, even if you're not supposed to install software. Run it from your boyfriend's laptop while he's on a beer run. This entry is largely made possible by the work of Louis Aslett, who's completing his doctoral work at Trinity College, University of Dublin. We had thought that running a cloud compute application was beyond our current technical abilities, but Louis' work makes it pretty easy to do. In this entry, we'll show you how. (Louis graciously vetted the text for this entry, but all errors are our responsibility).
Start-up
1. Get an account with Amazon Web Services (AWS). This is slightly more involved than your ordinary Amazon account, but not a big deal. There are no fees unless you use the services. There's also a "free tier," which gives you 750 hours of usage per month for a year.* Effectively, they're giving you a free computer for a year.
2. Go to this handy page maintained by Louis. Click on the 32-bit link for your region. This is a shortcut that gets you the right AMI. (An AMI is an "Amazon Machine Image". Each of these is effectively an operating system with a bunch of pre-loaded software. The ones that Louis maintains have R and RStudio built into them, and have an additional feature we'll encounter later.) You can also find Louis' AMIs without his page.**
3. The shortcut from Louis' page brings you through the first steps of setting up. You'll next see a page in the "Request Instances Wizard". An "instance" consists of some virtual hardware for your chosen OS and software. It's effectively a computer in the cloud. The defaults on the wizard are fine, with one key exception, but we'll add a little detail about what they are.
a. Click Continue on that first page, as if you had reviewed the data. (You might notice that this is an Ubuntu OS. But if you're not a Linux user, don't fret: you'll never need to know which OS is running.)
b. On the "Instance details" page, you can also click continue. The main option here is choosing the virtual hardware the AMI will run on. (The defaults are fine.)
c. The next page is also "Instance details" and you can click through.
d. The next "Instance details" page lets you assign a name to the instance. This can be useful if you end up running several instances at the same time, but you can click through for now.
e. Click through the "Create Key Pair" page; this is also convenient if you're a heavy user, but not necessary.
f. The next step is to "Configure Firewall". This is where you do have to pay attention. Since you'll want to access your virtual machine via a browser, you need to allow HTTP access. To do this,
1) click "Create a New Security Group".
2) Give a name (like "RStudio") and
3) a description ("RStudio")-- both are required. Then,
4) in the "Port range" window, type 80. (Leave the source at 0.0.0.0/0, which means that you can connect from any IP address.)
5) Click "Add Rule". You should get a little blue box describing the rule. Now
6) click continue. On the following page, click "Back" and check that your new security group is selected.
g. Click "Launch". Your virtual machine is being started! There's a page with some links, which you can "Close".
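For readers who would rather script sub-step f, the AWS command-line interface (a separate tool from the web wizard described here, and one whose behavior may have changed since we wrote this) can create the same security group. This sketch assumes the aws CLI is installed and configured with your credentials, and that the group lives in your default VPC, where it can be referred to by name; the helper names are hypothetical.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create a security group and open TCP port 80 (HTTP) to an address range.
# Defaults mirror the wizard steps: name "RStudio", source 0.0.0.0/0.
open_http_port() {
  name=${1:-RStudio}
  cidr=${2:-0.0.0.0/0}
  # Both a name and a description are required, as in the wizard.
  run aws ec2 create-security-group --group-name "$name" --description "$name"
  run aws ec2 authorize-security-group-ingress --group-name "$name" \
    --protocol tcp --port 80 --cidr "$cidr"
}
```

Passing a single address as the second argument (for example 203.0.113.7/32, a documentation placeholder, not a real machine) restricts access to that address instead of allowing the whole internet as 0.0.0.0/0 does.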
Use
4. To use the new computer, click "Instances" on the "Navigation" panel of the AWS Management Console Amazon EC2 page. You'll see a row with an "empty" Name, and a State that is either "Pending" or "Running". (You might need to click refresh to see when it starts running.) When it's running, click on it. You get a bunch of information in the box below.
5. Scroll down to "Public DNS". Copy the DNS and paste it into the address bar in your browser. If all went well, you should see an RStudio login window. This is the genius of Louis' approach-- you never need to see the operating system. Use the username rstudio and password rstudio. In a moment, that beautiful, familiar RStudio interface appears!
6. For security, it makes sense to change your password. But since Louis wants to spare you the OS, he's cleverly built in a way to change it from within RStudio. Just change the "Password" in the "Welcome.r" file, then source it. You should probably avoid saving the "Welcome.r" file-- maybe just close it-- because saving it will result in your password being saved as plain text. Probably not a big risk, but why tempt fate?
7. You can close your browser and open the window again any time you like, from any browser you like, using your new password.
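Steps 4 and 5 can also be done from a terminal if you have the AWS command-line interface installed and configured (an assumption; the approach in this post needs only the web console). The sketch below asks EC2 for the Public DNS names of your running instances and then checks whether RStudio answers on port 80. Function names are hypothetical.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List the Public DNS names of all running instances.
running_dns_names() {
  run aws ec2 describe-instances \
    --filters Name=instance-state-name,Values=running \
    --query 'Reservations[].Instances[].PublicDnsName' \
    --output text
}

# Print the HTTP status code returned by the server's front page.
rstudio_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://1ドル/"
}
```

Passing one of the returned names to rstudio_status prints a status code: 000 means nothing answered yet, while 200 or a 3xx redirect suggests the RStudio login page is up.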
There's your R in the cloud! Use RStudio's built-in package installation tools to easily build your working environment.
Management
Our understanding of the "Free Usage Tier" is that you can leave this on all the time for a year without incurring any charges. Amazing. But caveat emptor.
You should also know how billing works. According to the FAQ, for instances other than the "micro" version we used here (or for "micro" instances after your "Free Usage Tier" period is over), you're billed an hourly rate between when the instance starts running and when it's terminated. The "micro" Linux instance that we chose above will cost $0.02/hour after the free period is over. Still cheap to use for a few hours, but too costly to leave on all the time for fun.
However, there is also a "stopped" state. The stopped state is important for the other aspect of billing: data storage. Storage costs something like $0.10 per GB per month. When your instance is "stopped" you don't pay the hourly instance charge, but you still pay the monthly data storage charge. The "free usage tier" includes 30GB of storage for the first year. (Obviously, there are no charges when your instance is terminated, since you lose all stored data.) Louis' AMIs have only 2 GB of storage built in, so they will run cheap, once your free usage period is over.
As long as an instance is running, you retain all aspects of your session-- it's just as if you had a computer that you left on running RStudio all the time. An instance that is stopped will retain all the loaded packages and local objects, but you have to log into AWS to start it.
Amazon warns that instances will occasionally fail, and if that happens, you're supposed to be able to restart them, as if you had stopped them on purpose.*** But it might be a good idea to back things up.
Happy cloud computing!
* The free usage is limited to "micro" instances, such as we use here. For any other kind, the usual fees apply.
** To find the AMIs without Louis' handy web page, start the AWS management console, go to EC2, and click "Launch Instance". You'll get a page with some standard instances where you can click a radio button to "Launch Classic Wizard". Click that, then Community AMIs. Then search for rstudio, click the one with the RStudio and R version you want, and proceed as from step 2.
*** Louis says "I've not experienced it first-hand as they've been reliable for me, but apparently the instance will disappear and the hard drive will be left hanging round. When you are in the Amazon console on the EC2 tab, if you look further down the left "Navigation" you'll see "Volumes" under the "Elastic Block Store" section. You can look there when your instance is running and see its hard drive which will say "attached" -- this becomes "available" if an instance fails. So, you need to create a new instance and then attach the drive to it and reboot."
Labels: Amazon web services, Louis Aslett, RStudio
Friday, February 10, 2012
managing projects using RStudio
We're continually amazed with new developments within RStudio, the integrated development environment for R that we highlighted previously (Among others, Andrew Gelman agrees with us about its value). The most recent addition addresses one of our earlier concerns, by adding support for projects within RStudio. These allow work to be divided into multiple contexts, each with their own working directory, workspace, history, and source documents. For those multi-taskers amongst us, this is a big win. Projects can be created within a new or existing directory, as well as through use of a version control system (Git or Subversion).
When you create or move to a project within RStudio lots of useful things happen:
(1) A new R session (process) is started
(2) The .Rprofile file in the project's main directory (if any) is sourced by R
(3) The .RData file in the project's main directory is loaded (this can be controlled by an option).
(4) The .Rhistory file in the project's main directory is loaded into the RStudio History pane (and used for Console Up/Down arrow command history).
(5) The current working directory is set to the project directory.
(6) Previously edited source documents are restored into editor tabs, and
(7) Other RStudio settings (e.g. active tabs, splitter positions, etc.) are restored to where they were the last time the project was closed.
If you haven't updated your version of RStudio recently, or have never checked it out, this would be a great time to consider it. More information about these new features can be found here, along with an excellent screencast overview here. Happy coding!
Monday, February 28, 2011
Plug for RStudio: powerful, free, and easy to use interactive development environment for R
As a longtime SAS user, one obstacle for me in using R professionally has been figuring out a process for saving and testing code across several work sessions and integrating code composition and execution. There are a couple of integrated R environments available, including ESS, Tinn-R, and others. However, each of these seemed to require a serious investment of time, and I never did get around to using them (nor did Nick, despite several good-faith attempts). Instead I used a clunky system of editing code via a text editor, then copying and pasting or sourcing. This really inhibited my ability first to learn R and then to code in it efficiently.
Then Nick introduced me to the folks who have created RStudio. They are a small group of wicked smart programmers who know how to help other programmers be more efficient. They've now turned their attention to helping statisticians and other R users. RStudio, publicly available as of 2/28/2011, is open source and free. Its abilities are extremely broad, and I'm bound to miss something important in the brief description below, but suffice it to say that it's well worth your time to check it out. Neither Nick nor I have any vested interest in recommending it (though he's moved all of his teaching of introductory and intermediate statistics courses to it, along with his collaborative research projects).
RStudio is an integrated development environment for R that includes 1) text editing windows from which code can be submitted to the console and/or saved to the OS, 2) live lists of the objects in your workspace, 3) easily searchable infinite history with ability to insert from the history to the console or a text editing window, 4) tab completion in the console for objects, commands, and help, 5) interface with the OS for access to files, 6) help window with back and forward buttons, 7) package downloading, and 8) support for Sweave to facilitate reproducible analysis. Despite all these capabilities, RStudio is very easy to get started with.
There is also a server version, which you can access over the web if someone installs it and gives you access. If you're not familiar with this idea, it means you can work from most browsers--I was even able to use it on a Kindle. The cloud version saves your workspace from session to session, so you can work in exactly the same way, in exactly the same workspace (with a continuous history and all your objects), on whatever OS/CPU you have in front of you-- Windows, Mac OS, Chrome, Linux. You can switch OS, you can shut your computer down, and RStudio comes up just as you left it. Forgot your laptop? No problem.
The standalone version is an ordinary downloadable program. It uses the existing R binaries on your Mac (OSX 10.5+), Windows (XP/Vista/7), Ubuntu or Fedora Linux machine. The local and server applications have the same interface.
For me, the most useful aspect has been the integrated editor, but each one of the items I listed above has saved me a great deal of time over the past few months. The integrated help alone might be reason enough to adopt it. In my work as a consulting statistician, RStudio is a huge leap forward. It changes R from an important tool that I have to be able to use into a plausible system in which to do all of my work. I really can't overstate its value to me. Go to http://www.rstudio.org/ to learn more, see screenshots, and download!