Consistently Infrequent

November 24, 2011

source_https(): Sourcing an R Script from github over HTTPS

Filed under: R — BD @ 12:21 pm

The Objective

I wanted to source R scripts hosted on my github repository for use in my blog (i.e. a github version of ?source). This would make it easier for anyone wishing to test out my code snippets on their own computers without having to manually go to my github repo and retrieve a series of R scripts themselves to make it run.

The Problem

The base R function source() fails with HTTPS links on Windows 7. There may be a way around this by starting R with --internet2 from the command line (search for CMD in Windows), but that would be just another inconvenience, like having to download an R script through your browser in the first place.

An easier approach is to use RCurl::getURL(), setting either ssl.verifypeer = FALSE (which switches off certificate verification altogether) or cainfo to an SSL certificates file. That’s easy enough to achieve, but I wanted to wrap the code in a function for convenience, as follows:


source_github <- function(u) {
  # load package
  require(RCurl)

  # read script lines from website
  script <- getURL(u, ssl.verifypeer = FALSE)

  # parse lines and evaluate them
  eval(parse(text = script))
}

source_github("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")

The problem with the code above was that the functions sourced from the desired R script file only existed locally in source_github() and not globally to the rest of the R session. Sadface.
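
The scoping issue can be reproduced without any network access. The following is a minimal sketch, not the original code: the script text is an inline string standing in for the downloaded file, and the names eval_locally and hello are made up for illustration.

```r
# Stand-in for the text that getURL() would return (hypothetical script).
script <- 'hello <- function() "hi from the sourced script"'

# Same shape as source_github(): eval() is called without an envir argument,
# so the code is evaluated in the calling function's own environment.
eval_locally <- function(txt) {
  eval(parse(text = txt))
  # Check only this function's environment, not its parents.
  exists("hello", envir = environment(), inherits = FALSE)
}

defined_inside  <- eval_locally(script)
defined_outside <- exists("hello", envir = globalenv(), inherits = FALSE)

print(defined_inside)   # hello was created inside the function
print(defined_outside)  # but it is not visible in the global environment
```

Assuming nothing called hello already exists in the workspace, the first value is TRUE and the second is FALSE, which is the behaviour described above.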

The Solution

Asking on Stack Overflow produced an answer from the mighty Spacedman who added envir=.GlobalEnv as a parameter to eval. This means that the evaluation is done in the global environment and thus all the contents of the R script are available for the entire R session.
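
The effect of the envir argument can be sketched offline too. This is an illustration under assumptions: the script is an inline string rather than a downloaded file, and eval_into, greet, answer and sandbox are hypothetical names. Passing an environment created with new.env() instead of .GlobalEnv is shown as an alternative for anyone who would rather not fill their workspace; it is not part of the solution described here.

```r
# Stand-in for a downloaded script (hypothetical content).
script <- 'greet <- function() "hello"'

# Evaluate script text in a chosen environment; defaults to the global one.
eval_into <- function(txt, envir = .GlobalEnv) {
  eval(parse(text = txt), envir = envir)
  invisible(envir)
}

# Default: objects land in the global environment.
eval_into(script)
in_global <- exists("greet", envir = .GlobalEnv, inherits = FALSE)

# Alternative: keep the sourced objects in a separate environment.
sandbox <- new.env()
eval_into("answer <- 42", envir = sandbox)
in_sandbox <- exists("answer", envir = sandbox, inherits = FALSE)
leaked     <- exists("answer", envir = .GlobalEnv, inherits = FALSE)
```

With the sandbox variant, the sourced objects are reached as sandbox$answer rather than by bare name, which is less convenient for a blog snippet but avoids overwriting anything the reader already has.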

Furthermore, it occurred to me that I could make the function generic to work with any R script that is hosted over an HTTPS connection. To this end, I added a couple of lines of code to download a security certificates text file from the curl website.

source_https <- function(u, unlink.tmp.certs = FALSE) {
  # load package
  require(RCurl)

  # read script lines from website using a security certificate
  if(!file.exists("cacert.pem")) download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")
  script <- getURL(u, followlocation = TRUE, cainfo = "cacert.pem")
  if(unlink.tmp.certs) unlink("cacert.pem")

  # parse lines and evaluate in the global environment
  eval(parse(text = script), envir = .GlobalEnv)
}

source_https("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
source_https("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/htmlToText/htmlToText.R", unlink.tmp.certs = TRUE)

unlink.tmp.certs is an optional parameter; setting it to TRUE deletes the security certificates text file that source_https downloads. It is probably best to use it only on the final call of source_https, to avoid downloading the same certificates file multiple times.
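
An alternative to cleaning up the certificate file by hand is to cache it in the session's temporary directory, which R removes when the session ends. The sketch below is a variant under assumptions, not the function from this post: cached_cert_path is a hypothetical helper, and its fetch argument exists only so that the download step can be swapped out (for example, when testing offline). The default URL is the one used in the function above and may have moved since this was written.

```r
# Return the path to a cached certificate file, downloading it only if needed.
# cached_cert_path() is a hypothetical helper, not part of RCurl.
cached_cert_path <- function(url = "http://curl.haxx.se/ca/cacert.pem",
                             dest = file.path(tempdir(), "cacert.pem"),
                             fetch = utils::download.file) {
  # Download once; later calls in the same session reuse the file.
  if (!file.exists(dest)) {
    fetch(url, dest)
  }
  dest
}
```

The result could then be passed to getURL() as cainfo = cached_cert_path(), which would make a cleanup flag unnecessary.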

UPDATE

Based on Kay’s comments, here’s a vectorised version with cross-platform SSL certificates:

source_https <- function(url, ...) {
  # load package
  require(RCurl)

  # parse and evaluate each .R script
  sapply(c(url, ...), function(u) {
    eval(parse(text = getURL(u, followlocation = TRUE, cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))), envir = .GlobalEnv)
  })
}

# Example
source_https("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R",
             "https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/htmlToText/htmlToText.R")
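
One detail the vectorised version leaves implicit is what happens when the bundled certificate file is missing or unreadable: system.file() returns an empty string if it finds nothing, and the failure would only surface later as a curl error. A small guard can make that explicit. This is a sketch under assumptions: bundled_cert_path is a hypothetical helper, not part of RCurl, and the file name is the one used above.

```r
# Locate a certificate file shipped in a package's CurlSSL directory and
# fail early with a readable message if it cannot be used.
# bundled_cert_path() is a hypothetical helper, not an RCurl function.
bundled_cert_path <- function(pkg = "RCurl", file = "cacert.pem") {
  path <- system.file("CurlSSL", file, package = pkg)

  # system.file() returns "" when the package or file is not found.
  if (!nzchar(path)) {
    stop("no ", file, " found in the CurlSSL directory of package '", pkg, "'")
  }

  # file.access() returns 0 when the requested permission (4 = read) is granted.
  if (file.access(path, mode = 4) != 0) {
    stop("certificate file exists but is not readable: ", path)
  }

  path
}
```

Inside the sapply() call, cainfo = bundled_cert_path() would then replace the inline system.file() expression.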

 

12 Comments »

  1. Hi Tony,

    How is it going?
    ..check my little edit of your function: https://github.com/gimoya/theBioBucket-Archives/blob/master/R/source_https.R
    I hope it’s ok putting your function to my repository?

    Best, Kay

    Comment by kay — December 9, 2011 @ 10:14 pm

    • Hi Kay, good to hear from you, and great news about you getting a github repo (will make it easier for me to browse code from your blog in the future!).

      I have absolutely no problem with you putting my function in your repo and doing whatever you want with it as everything code-related on this blog is opensource and copyrighted under the “Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0) License” – reference: https://creativecommons.org/licenses/by-nc/3.0

      I like what you’re doing with the CAINFO parameter – I had thought about doing that too but ran into a problem when doing it under Ubuntu 11.10 (worked fine under Windows):

      
      source_https("https://raw.github.com/gimoya/theBioBucket-Archives/master/R/RegEx_Examples.R")
      #Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : 
      #  Problem with the SSL CA cert (path? access rights?)
      
      

      The error above occurs because on linux I don’t have access rights to “/usr/local/lib/R/site-library/RCurl/CurlSSL/ca-bundle.crt” – reference: http://www.omegahat.org/RCurl/FAQ.html

      I never found the time to solve the issue, so I resorted to downloading the certs directly in my version of the function in order to keep it as a solution that works across platforms. (I prefer your version when used under Windows, however.) 🙂

      Comment by Tony Breyal — December 10, 2011 @ 1:05 am

  2. Kay, thinking about it, it’s quite easy to make it work under any platform – I have submitted a patch for you on github (I’ve only ever emailed patches before so hope I did it correctly by making a pull request!) 🙂

    Comment by Tony Breyal — December 10, 2011 @ 1:19 am

  3. Hey guys, thank you for the solutions you have provided. It helps me a lot.

    I have tested your methodology of putting code onto github and sourcing it via https, and I am interested in using it.

    I am not sure I understand everything correctly, but so far the remaining challenge for me is to get the source_https function onto all the machines I use. Putting the source_https function on github won’t help, because you need it in order to source from github. Is there any way we can eliminate this inconvenience? Maybe publish source_https via an R package? Thanks!

    Comment by Alex — December 25, 2011 @ 8:49 pm

  4. […] goes to Tony Breyal for putting together a solution for sourcing r code from github.) […]

    Pingback by Printing nested tables in R – bridging between the {reshape} and {tables} packages | R-statistics blog — January 30, 2012 @ 7:44 am

  5. Thanks for sharing this function. I put it in a public Gist: https://gist.github.com/fernandomayer/6158625. It worked for me on Ubuntu 13.04

    Comment by fernandomayer — August 5, 2013 @ 8:35 pm

  6. Good article. I will be facing many of these issues as well..

    Comment by Librerias en guadalajara — October 17, 2013 @ 6:16 pm

  7. Hi, nice article.

    For gists, it’s much easier – source() works fine in my experience.

    If you click on “Raw” in the top right corner of the gist files, and copy the section of the URL until “raw/”, then you can source this straight into R without problems…

    source("https://gist.githubusercontent.com/aghaynes/80f37df49854dbd8013a/raw/")

    HTH

    Alan

    Comment by Alan — March 16, 2015 @ 4:21 pm

    • Working with the raw code is great… but what if it’s a big chunk of code?
      I have been trying to write an R script that will download the .tar, untar() it, and source() it in an elegant and robust way, but I’m a hacky newbie who can’t get it to work. Suggestions?
      If it’s useful, here’s the function: https://gist.github.com/John-R-Wallace/3eab07a93877e87ec968/
      It’s small, but I’m interested in a robust solution for larger code as well.
      Or do I have the wrong end of the stick, and for larger code you’d just work with a package or whatever?

      Comment by mixtrak — March 24, 2015 @ 2:38 am

    • Thanks for posting – I was wrestling with the same issue but the discrete comment above: ‘source() works fine in my experience’ made me realise I hadn’t actually attempted `source()` – and as it turns out it works fine in my usage cases too (Github and Gitlab) and interestingly does not throw the SSL certificate error issue. Sometimes the simplest solution is the best!

      Comment by Amy M — February 20, 2018 @ 3:38 pm

