Consistently Infrequent

November 10, 2011

Facebook Graph API Explorer with R (on Windows)

Filed under: R | Tony Breyal @ 2:16 pm

I wanted to play around with the Facebook Graph API using the Graph API Explorer page as a coding exercise. This facility allows one to use the API with a temporary authorisation token. Now, I don’t know how to make an R package for the proper API, where you have to register for an API key and do some OAuth stuff, because that is above my current skill set, but the Explorer page itself is a nice middle ground.

Therefore I’ve come up with a self-contained R function which allows me to do just that (full code at end of post):


# load packages
library(RCurl)
library(RJSONIO)

# get facebook data
df <- Facebook_Graph_API_Explorer()
t(df[7,])

# post.id                      "127031120644257_319044381442929"
# from.name                    "Doctor Who"
# from.id                      "127031120644257"
# to.name                      "Doctor Who"
# to.id                        "127031120644257"
# to.category                  "Tv show"
# created.time                 "2011-11-10 11:13:42"
# message                      "Has it ever been found out who blew up the TARDIS?"
# type                         "status"
# likes.count                  NA
# comments.count               "3"
# sample.comments              "Did the tardis blow up I haven't seen all of sesion 6&7 [next>>] \"7\" ??? [next>>] the pandorica was obsorbin earth so he blew it up with the tardis"
# sample.comments.from.name    "Alex Nomikos [next>>] Paul Morris [next>>] Vivienne Leigh Bruen"
# sample.comments.from.id      "100001033497348 [next>>] 595267764 [next>>] 100000679940192"
# sample.comments.created.time "2011-11-10 11:23:36 [next>>] 2011-11-10 11:29:56 [next>>] 2011-11-10 13:04:53"

In the above, I’m using “[next>>]” as a way of separating entities in the same cell in order to keep the data frame structure. The order is maintained across cells, i.e. the first entity in a row’s sample.comments.from.name cell corresponds to the first entity in that row’s sample.comments.from.id cell, and so on.
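To get the individual entities back out of a cell, the separator can be split on with base R’s strsplit(). This is a sketch of my own, not part of the function: split_cell() is a hypothetical helper name, and it assumes no comment text itself contains the string " [next>>] ".

```r
# Hypothetical helper: split a "[next>>]"-separated cell back into entities.
split_cell <- function(cell, sep = " [next>>] ") {
  cell <- as.character(cell)  # data.frame may have stored the cell as a factor
  if (is.na(cell) || !nzchar(cell)) return(character(0))
  # fixed = TRUE so "[" and ">" are matched literally, not as a regex
  strsplit(cell, sep, fixed = TRUE)[[1]]
}

names_cell <- "Alex Nomikos [next>>] Paul Morris [next>>] Vivienne Leigh Bruen"
ids_cell   <- "100001033497348 [next>>] 595267764 [next>>] 100000679940192"

# Because the order is preserved across columns, the pieces line up by position
comments <- data.frame(
  from.name = split_cell(names_cell),
  from.id   = split_cell(ids_cell),
  stringsAsFactors = FALSE
)
print(comments)
```

Empty cells, such as those of a post with no comments, simply give zero entities.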

The main problem I experienced, and have been experiencing for a long time with R, is dealing with a list which has a NULL as one of its elements and then unlisting it whilst still maintaining the same length. For example:

mylist <- list(a=1, b=NULL, c="hello")
unlist(mylist, use.names = FALSE)
# [1] "1"     "hello"

Whereas what I really want is for the NULL to be converted to NA and thus have the length of the resulting vector be the same as that of the original list, e.g.

mylist <- list(a=1, b=NULL, c="hello")
mylist[sapply(mylist, is.null)] <- NA
unlist(mylist, use.names = FALSE)
# [1] "1"     NA      "hello"

But I don’t know of any automatic way of doing that and so have to do it manually each time. I tell you, these NULL elements in lists are really causing me headaches when it comes to using unlist!
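One way to stop retyping that manual step is to wrap it in a small helper. This is a sketch rather than anything built into R: unlist_null_to_na() is a hypothetical name, and it only guarantees equal lengths when every non-NULL element has length one.

```r
# Hypothetical helper: turn NULL elements into NA before unlisting, so the
# result has one entry per element of the original list.
unlist_null_to_na <- function(x) {
  is_null <- vapply(x, is.null, logical(1))
  x[is_null] <- NA
  unlist(x, use.names = FALSE)
}

mylist <- list(a = 1, b = NULL, c = "hello")
unlist_null_to_na(mylist)
# [1] "1"     NA      "hello"
```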

Anyway, back to the Facebook_Graph_API_Explorer() function. There are a couple of points to bear in mind:

  1. This will only work on Windows because I don’t know of a cross-platform equivalent of winDialogString. I’m guessing the tcltk package has something, but I can’t see what it would be.
  2. You must already be signed into Facebook (i.e. you must have an account and be signed in) before you call my Facebook_Graph_API_Explorer() function.
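If a console prompt is acceptable in place of a dialog box, base R’s readline() is available on every platform and could stand in for winDialogString(). A possible sketch follows; ask_string() is a hypothetical name and is not used in the full code. Note that in a non-interactive session (e.g. Rscript) readline() does not wait and returns "", so the default is used.

```r
# Hypothetical cross-platform stand-in for winDialogString(): prompts on
# the R console instead of opening a dialog box.
ask_string <- function(message, default = "") {
  prompt <- if (nzchar(default)) {
    paste0(message, " [default: ", default, "] ")
  } else {
    paste0(message, " ")
  }
  answer <- readline(prompt)
  # Pressing Enter without typing anything falls back to the default
  if (nzchar(answer)) answer else default
}

# Example, mirroring STEP 3 of the function:
# ID <- ask_string("Please enter FB name id below:", "https://www.facebook.com/DoctorWho")
```

The winDialog(type = "ok", ...) information box could similarly become a message() call followed by readline("Press Enter to continue ").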

The function will guide you through the process with dialog boxes, so it should be easy for anyone to use. I think next time I’ll try a web scraping exercise on the HTML of a Facebook wall page using XPath, depending on how much time I get!

Tony Breyal

P.S. Full code is below:


library(RCurl)
library(RJSONIO)

Facebook_Graph_API_Explorer <- function() {
  get_json_df <- function(data) {
    l <- list(
        post.id = lapply(data, function(post) post$id),
        from.name = lapply(data, function(post) post$from$name),
        from.id = lapply(data, function(post) post$from$id),
        to.name = lapply(data, function(post) post$to$data[[1]]$name),
        to.id = lapply(data, function(post) post$to$data[[1]]$id),
        to.category = lapply(data, function(post) post$to$data[[1]]$category),
        created.time = lapply(data, function(post) as.character(as.POSIXct(post$created_time, origin="1970-01-01", tz="GMT"))),
        message = lapply(data, function(post) post$message),
        type = lapply(data, function(post) post$type),
        likes.count = lapply(data, function(post) post$likes$count),
        comments.count = lapply(data, function(post) post$comments$count),
        sample.comments = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$message), collapse = " [next>>] ")),
        sample.comments.from.name = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$from$name), collapse = " [next>>] ")),
        sample.comments.from.id = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$from$id), collapse = " [next>>] ")),
        sample.comments.created.time = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) as.character(as.POSIXct(comment$created_time, origin="1970-01-01", tz="GMT"))), collapse = " [next>>] "))
        )
    # replace all occurrences of NULL with NA so every column has one entry per post
    df <- data.frame(do.call("cbind", lapply(l, function(x) sapply(x, function(xx) ifelse(is.null(xx), NA, xx)))))
    return(df)
  }

  # STEP 1: Get certs so we can access https links (we'll delete it at the end of the script)
  if(!file.exists("cacert.perm")) download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.perm")

  # STEP 2: Get Facebook token to access data. I need a cross-platform version of winDialog and winDialogString, otherwise this only works on Windows
  winDialog(type = "ok", "Make sure you have already signed into Facebook.\n\nWhen the browser opens, please click 'Get Access Token' twice. You DO NOT need to select/check any boxes for a public feed.\n\nAfter pressing OK, switch over to your now open browser.")
  browseURL("http://developers.facebook.com/tools/explorer/?method=GET&path=100002667499585")
  token <- winDialogString("When the browser opens, please click 'Get Access Token' twice and copy/paste the token below", "")

  # STEP 3: Get facebook ID. This can be a fanpage or whatever e.g. https://www.facebook.com/DoctorWho
  ID <- winDialogString("Please enter FB name id below:", "https://www.facebook.com/DoctorWho")
  ID <- gsub(".*com/", "", ID)

  # STEP 4: Construct Facebook Graph API URL
  u <- paste("https://graph.facebook.com/", ID, "/feed", "?date_format=U", "&access_token=", token, sep = "")

  # STEP 5: How far back do you want to get data for? Format should be YYYY-MM-DD
  user.last.date <- try(as.Date(winDialogString("Please enter roughly how far back to gather data from, using this format: yyyy-mm-dd", "")), silent = TRUE)
  if (inherits(user.last.date, "try-error")) stop("Please enter the date in yyyy-mm-dd format")
  current.last.date <- user.last.date + 1

  # Get data
  df.list <- list()
  i <- 1
  while(current.last.date > user.last.date) {
    # Download the JSON feed
    json <- getURL(u, cainfo = "cacert.perm")
    json <- fromJSON(json, simplify = FALSE)
    data <- json$data
    stopifnot(!is.null(data))
    if (length(data) == 0) break  # an empty batch means there are no more posts

    # Get json Data Frame
    df.list[[i]] <- get_json_df(data)
    i <- i + 1

    # variables for while loop
    current.last.date <- as.Date(as.POSIXct(json$data[[length(json$data)]]$created_time, origin="1970-01-01", tz="GMT"))
    print(paste("Current batch of dates being processed is:", current.last.date, "(loading more...)"))
    u <- json$paging$`next`
    if (is.null(u)) break  # no further page to fetch
  }

  # delete security certificates we downloaded earlier for https sites
  file.remove("cacert.perm")
  # return data frame
  df <- do.call("rbind", df.list)
  return(df)
}

df <- Facebook_Graph_API_Explorer()
t(df[4,])
# post.id                      "127031120644257_319062954774405"
# from.name                    "Torchwood"
# from.id                      "119328091441982"
# to.name                      "Torchwood"
# to.id                        "119328091441982"
# to.category                  "Tv show"
# created.time                 "2011-11-10 12:05:21"
# message                      "If you're missing Torchwood & Doctor Who and are after some good, action-packed science fiction, why not check out FOX's awesome prehistoric romp, Terra Nova? It's carried in the UK on Sky TV and is well worth catching up with & following! The idea - The Earth is dying, it's in its final years. Life's intolerable & getting worse. Scientists take advantage of a rift in time & space to set up a 'fresh start' colony on Terra Nova - the earth, 60 million years ago. The adventure then begins..."
# type                         "link"
# likes.count                  NA
# comments.count               "0"
# sample.comments              ""
# sample.comments.from.name    ""
# sample.comments.from.id      ""
# sample.comments.created.time ""
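The while loop in the function follows the paging$next link returned with each batch until the oldest post in a batch is on or before the chosen date. The sketch below isolates that pattern so it can be run offline. fetch_page(), ts() and the three canned pages are hypothetical stand-ins for getURL() plus fromJSON(), and, like the loop above, it assumes posts arrive newest first.

```r
# Hypothetical helper: Unix timestamp (seconds) for a GMT date-time string,
# mimicking what the feed returns when date_format=U is requested
ts <- function(x) as.numeric(as.POSIXct(x, tz = "GMT"))

# Hypothetical canned pages standing in for the real JSON responses
pages <- list(
  p1 = list(data = list(list(created_time = ts("2011-11-10 11:13:42")),
                        list(created_time = ts("2011-11-09 20:00:00"))),
            paging = list(`next` = "p2")),
  p2 = list(data = list(list(created_time = ts("2011-11-08 12:00:00"))),
            paging = list(`next` = "p3")),
  p3 = list(data = list(list(created_time = ts("2011-11-05 08:30:00"))),
            paging = list())   # last page: no "next" link
)
fetch_page <- function(u) pages[[u]]

collect_posts <- function(first_url, cutoff) {
  batches <- list()
  u <- first_url
  repeat {
    json <- fetch_page(u)
    if (length(json$data) == 0) break
    batches[[length(batches) + 1]] <- json$data
    # The last post in a batch is assumed to be the oldest one
    last_post <- json$data[[length(json$data)]]
    oldest <- as.Date(as.POSIXct(last_post$created_time,
                                 origin = "1970-01-01", tz = "GMT"))
    u <- json$paging$`next`
    # Stop once we have gone back far enough or run out of pages
    if (oldest <= cutoff || is.null(u)) break
  }
  do.call(c, batches)   # flatten the batches into one list of posts
}

posts <- collect_posts("p1", cutoff = as.Date("2011-11-08"))
length(posts)
# [1] 3
```

As in the original loop, the whole batch that crosses the cut-off date is kept, so a few posts slightly older than the requested date can appear in the result.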

UPDATE: Based on a suggestion from @BrockTibert, I’ve now set up a GitHub account and the above code can be found here: https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/facebook_Graph_API_Explorer/facebook_Graph_API_Explorer.R

UPDATE 2: An alternative web-scraping method to bypass the API with R: http://tonybreyal.wordpress.com/2012/01/06/r-web-scraping-r-bloggers-facebook-page-to-gain-further-information-about-an-authors-r-blog-posts-e-g-number-of-likes-comments-shares-etc/


2 Comments »

  1. Thanks for posting this and including the code. I’ll have a play with it to see what I can come up with. :-)

    Comment by jedifran — November 10, 2011 @ 5:17 pm


