Consistently Infrequent

November 10, 2011

Facebook Graph API Explorer with R (on Windows)

Filed under: R, by Tony Breyal @ 2:16 pm

I wanted to play around with the Facebook Graph API using the Graph API Explorer page as a coding exercise. The Explorer lets you use the API with a temporary authorisation token. I don’t know how to make an R package for the proper API, where you have to register for an API key and do some OAuth stuff, because that is above my current skill set, but the Explorer page itself is a nice middle ground.

Therefore I’ve come up with a self-contained R function which allows me to do just that (full code at end of post):


# load packages
library(RCurl)
library(RJSONIO)

# get facebook data
df <- Facebook_Graph_API_Explorer()
t(df[7,])

# post.id                      "127031120644257_319044381442929"
# from.name                    "Doctor Who"
# from.id                      "127031120644257"
# to.name                      "Doctor Who"
# to.id                        "127031120644257"
# to.category                  "Tv show"
# created.time                 "2011-11-10 11:13:42"
# message                      "Has it ever been found out who blew up the TARDIS?"
# type                         "status"
# likes.count                  NA
# comments.count               "3"
# sample.comments              "Did the tardis blow up I haven't seen all of sesion 6&7 [next>>] \"7\" ??? [next>>] the pandorica was obsorbin earth so he blew it up with the tardis"
# sample.comments.from.name    "Alex Nomikos [next>>] Paul Morris [next>>] Vivienne Leigh Bruen"
# sample.comments.from.id      "100001033497348 [next>>] 595267764 [next>>] 100000679940192"
# sample.comments.created.time "2011-11-10 11:23:36 [next>>] 2011-11-10 11:29:56 [next>>] 2011-11-10 13:04:53"

In the above, I’m using “[next>>]” as a way of separating entities in the same cell in order to keep the data frame structure. The order is maintained across cells, i.e. the first entity in a cell of the sample.comments.from.name column corresponds to the first entity in the matching cell of the sample.comments.from.id column, and so on.
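To get the individual entities back out of a cell, splitting on the separator should be enough. This is only a small sketch using the sample values shown above; the variable names (sep, names_cell, ids_cell, comments) are made up for illustration. Note the fixed = TRUE, since the separator contains characters that would otherwise be treated as a regular expression.

```r
# Separator used when collapsing several comments into one cell
sep <- " [next>>] "

# Two cells taken from the sample output above
names_cell <- "Alex Nomikos [next>>] Paul Morris [next>>] Vivienne Leigh Bruen"
ids_cell   <- "100001033497348 [next>>] 595267764 [next>>] 100000679940192"

# fixed = TRUE treats the separator as a literal string, not a regex
comments <- data.frame(
  from.name = strsplit(names_cell, sep, fixed = TRUE)[[1]],
  from.id   = strsplit(ids_cell, sep, fixed = TRUE)[[1]],
  stringsAsFactors = FALSE
)

print(comments)
```

Because the order is the same in every cell, row i of this small data frame describes the i-th comment on the post.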

The main problem I experienced, and have been experiencing for a long time with R, is dealing with a list which has NULL as one of its elements and then unlisting it whilst still maintaining the same length. For example:

mylist <- list(a=1, b=NULL, c="hello")
unlist(mylist, use.names = FALSE)
# [1] "1"     "hello"

Whereas what I really want is for the NULL to be converted to NA and thus have the length of the resulting vector be the same as that of the original list, e.g.

mylist <- list(a=1, b=NULL, c="hello")
mylist[sapply(mylist, is.null)] <- NA
unlist(mylist, use.names = FALSE)
# [1] "1"     NA      "hello"

But I don’t know of any automatic way of doing that and so have to do it manually each time. I tell you, these NULL elements in lists are really causing me headaches when it comes to using unlist!
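One way to avoid repeating the manual step is to wrap it in a small helper. This is just a sketch: unlist_keep_length is a name I have made up, not a function from base R or any package, and it only handles top-level NULL elements (nested lists are not touched).

```r
# Hypothetical helper: replace top-level NULL elements with NA before
# unlisting, so the result has the same length as the input list.
unlist_keep_length <- function(x) {
  is_null <- vapply(x, is.null, logical(1))
  x[is_null] <- NA
  unlist(x, use.names = FALSE)
}

mylist <- list(a = 1, b = NULL, c = "hello")
result <- unlist_keep_length(mylist)
print(result)
print(length(result) == length(mylist))
```

As with plain unlist, the elements are coerced to a common type, so the number 1 becomes the string "1" here.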

Anyway, back to the Facebook_Graph_API_Explorer() function, there are a couple of points to bear in mind:

  1. This will only work on Windows because I don’t know what a cross-platform version of winDialogString is. I’m guessing the tcltk package has something but I can’t see what it would be.
  2. You must already have a Facebook account and be signed in before you call Facebook_Graph_API_Explorer().
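On the first point, a console prompt might serve as a rough cross-platform stand-in for winDialogString. The sketch below uses base R's readline(); ask_string and its reader argument are names invented here for illustration, and this is not a drop-in GUI replacement. It also only makes sense in an interactive session, because readline() returns an empty string straight away when R is run non-interactively.

```r
# Hypothetical console-based stand-in for winDialogString().
# `reader` defaults to readline() and is a parameter only so that the
# function can be exercised without typing anything.
ask_string <- function(message, default = "", reader = readline) {
  answer <- reader(paste0(message, " [", default, "]: "))
  # An empty answer (just pressing Enter) falls back to the default
  if (identical(answer, "")) default else answer
}

# Simulated answers instead of real keyboard input
id_default <- ask_string("Please enter FB name id below",
                         "https://www.facebook.com/DoctorWho",
                         reader = function(prompt) "")
id_typed <- ask_string("Please enter FB name id below",
                       "https://www.facebook.com/DoctorWho",
                       reader = function(prompt) "https://www.facebook.com/Torchwood")
print(id_default)
print(id_typed)
```

In real use you would call ask_string(message, default) and leave reader alone.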

The function will guide you through the process with dialogue boxes, so it should be easy for anyone to use. I think next time I’ll try a web scraping exercise on the HTML of a Facebook wall page using XPath, depending on how much time I get!

Tony Breyal

P.S. Full code is below:


library(RCurl)
library(RJSONIO)

Facebook_Graph_API_Explorer <- function() {
  get_json_df <- function(data) {
    l <- list(
        post.id = lapply(data, function(post) post$id),
        # NOTE: from.name and from.id are read from post$to, so they mirror to.name and to.id
        from.name = lapply(data, function(post) post$to$data[[1]]$name),
        from.id = lapply(data, function(post) post$to$data[[1]]$id),
        to.name = lapply(data, function(post) post$to$data[[1]]$name),
        to.id = lapply(data, function(post) post$to$data[[1]]$id),
        to.category = lapply(data, function(post) post$to$data[[1]]$category),
        created.time = lapply(data, function(post) as.character(as.POSIXct(post$created_time, origin="1970-01-01", tz="GMT"))),
        message = lapply(data, function(post) post$message),
        type = lapply(data, function(post) post$type),
        likes.count = lapply(data, function(post) post$likes$count),
        comments.count = lapply(data, function(post) post$comments$count),
        sample.comments = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$message), collapse = " [next>>] ")),
        sample.comments.from.name = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$from$name), collapse = " [next>>] ")),
        sample.comments.from.id = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) comment$from$id), collapse = " [next>>] ")),
        sample.comments.created.time = lapply(data, function(post) paste(sapply(post$comments$data, function(comment) as.character(as.POSIXct(comment$created_time, origin="1970-01-01", tz="GMT"))), collapse = " [next>>] "))
        )
    # replace all occurrences of NULL with NA
    df = data.frame(do.call("cbind", lapply(l, function(x) sapply(x, function(xx) ifelse(is.null(xx), NA, xx)))))
    return(df)
  }

  # STEP 1: Get certs so we can access https links (we'll delete it at the end of the script)
  if(!file.exists("cacert.perm")) download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.perm")

  # STEP 2: Get Facebook token to access data. I need a cross-platform version of winDialog and winDialogString, otherwise this only works on Windows
  winDialog(type = "ok", "Make sure you have already signed into Facebook.\n\nWhen the browser opens, please click 'Get Access Token' twice. You DO NOT need to select/check any boxes for a public feed.\n\nAfter pressing OK, switch over to your now open browser.")
  browseURL("http://developers.facebook.com/tools/explorer/?method=GET&path=100002667499585")
  token <- winDialogString("When the browser opens, please click 'Get Access Token' twice and copy/paste token below", "")

  # STEP 3: Get facebook ID. This can be a fanpage or whatever e.g. https://www.facebook.com/DoctorWho
  ID <- winDialogString("Please enter FB name id below:", "https://www.facebook.com/DoctorWho")
  ID <- gsub(".*com/", "", ID)

  # STEP 4: Construct Facebook Graph API URL
  u <- paste("https://graph.facebook.com/", ID, "/feed", "?date_format=U", "&access_token=", token, sep = "")

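  # STEP 5: How far back do you want to get data for? Format should be YYYY-MM-DD
  user.last.date <- try(as.Date(winDialogString("Please enter a date for roughly how far back to gather data from, using this format: yyyy-mm-dd", "")), silent = TRUE)
  # stop early with a clear message if the date could not be parsed (or the dialog was cancelled)
  if(inherits(user.last.date, "try-error") || length(user.last.date) != 1 || is.na(user.last.date)) stop("Please enter a valid date in yyyy-mm-dd format")
  current.last.date <- user.last.date + 1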

  # Get data
  df.list <- list()
  i <- 1
  while(current.last.date > user.last.date) {
    # Download the JSON feed
    json <- getURL(u, cainfo = "cacert.perm")
    json <- fromJSON(json, simplify = FALSE)
    data <- json$data
    stopifnot(!is.null(data))

    # Get json Data Frame
    df.list[[i]] <- get_json_df(data)
    i <- i + 1

    # variables for while loop
    current.last.date <- as.Date(as.POSIXct(json$data[[length(json$data)]]$created_time, origin="1970-01-01", tz="GMT"))
    print(paste("Current batch of dates being processed is:", current.last.date, "(loading more...)"))
    u <- json$paging$`next`
  }

  # delete security certificates we downloaded earlier for https sites.
  file.remove("cacert.perm")
  # return data frame
  df <- do.call("rbind", df.list)
  return(df)
}

df <- Facebook_Graph_API_Explorer()
t(df[4,])
# post.id                      "127031120644257_319062954774405"
# from.name                    "Torchwood"
# from.id                      "119328091441982"
# to.name                      "Torchwood"
# to.id                        "119328091441982"
# to.category                  "Tv show"
# created.time                 "2011-11-10 12:05:21"
# message                      "If you're missing Torchwood & Doctor Who and are after some good, action-packed science fiction, why not check out FOX's awesome prehistoric romp, Terra Nova? It's carried in the UK on Sky TV and is well worth catching up with & following! The idea - The Earth is dying, it's in its final years. Life's intolerable & getting worse. Scientists take advantage of a rift in time & space to set up a 'fresh start' colony on Terra Nova - the earth, 60 million years ago. The adventure then begins..."
# type                         "link"
# likes.count                  NA
# comments.count               "0"
# sample.comments              ""
# sample.comments.from.name    ""
# sample.comments.from.id      ""
# sample.comments.created.time ""
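The paging loop in the function can be tried out without touching Facebook at all. The sketch below is not the real API: fake_pages is a made-up stand-in for the JSON that comes back, with created_time as seconds since 1970 (which is what date_format=U asks for) and a `next` entry pointing at the following page. collect_until is also an invented name. Unlike my loop above, it additionally stops when there is no `next` page left, which is an extra check I am assuming would be sensible.

```r
# Hypothetical offline stand-in for paged Graph API responses
fake_pages <- list(
  p1 = list(data = list(list(id = "a", created_time = 1320923622),   # 2011-11-10
                        list(id = "b", created_time = 1320837222)),  # 2011-11-09
            paging = list(`next` = "p2")),
  p2 = list(data = list(list(id = "c", created_time = 1320750822)),  # 2011-11-08
            paging = list(`next` = NULL))
)

# Follow `next` links until the oldest post on a page is on or before
# the cutoff date, or until there are no more pages.
collect_until <- function(pages, first_key, cutoff) {
  key <- first_key
  ids <- character(0)
  repeat {
    page <- pages[[key]]
    ids <- c(ids, vapply(page$data, function(post) post$id, character(1)))
    last_post <- page$data[[length(page$data)]]
    last_date <- as.Date(as.POSIXct(last_post$created_time,
                                    origin = "1970-01-01", tz = "GMT"))
    key <- page$paging$`next`
    if (last_date <= cutoff || is.null(key)) break
  }
  ids
}

ids_all  <- collect_until(fake_pages, "p1", as.Date("2011-01-01"))
ids_some <- collect_until(fake_pages, "p1", as.Date("2011-11-09"))
print(ids_all)
print(ids_some)
```

With the early cutoff every page is read; with the later one the loop stops after the first page because its oldest post already reaches the cutoff date.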

UPDATE: Based on a suggestion from @BrockTibert, I’ve now set up a GitHub account and the above code can be found here: https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/facebook_Graph_API_Explorer/facebook_Graph_API_Explorer.R

UPDATE 2: An alternative web-scraping method to bypass the API with R: https://tonybreyal.wordpress.com/2012/01/06/r-web-scraping-r-bloggers-facebook-page-to-gain-further-information-about-an-authors-r-blog-posts-e-g-number-of-likes-comments-shares-etc/

Comments

  1. Thanks for posting this and including the code. I’ll have a play with it to see what I can come up with. :-)

    Comment by jedifran — November 10, 2011 @ 5:17 pm


