Mining Facebook – Mining the Social Web Using Python


Mining data from different sources has become a trend over the past few years. The need for structured, characteristic data has led data miners to set their machines to mining social data. Twitter, Google+ and Facebook are some of the social networks that now serve as a mountain of behavioral data. In this post, Facebook will be our source and Python will be the miner. The Python programming language is well known for its ability to retrieve web data effectively and efficiently. To cut a long story short, today we will use the Facebook Graph API through Python to mine the Facebook page likes of each person in our friend list.

To do this, we need an access token, which you can get from the Graph API Explorer (see the steps below). Make sure you choose version 1.0; otherwise the program won't return what you expect. You also need a machine with Python installed, plus the additional module named 'facebook'. If you're running a Linux machine, install the facebook module with the following command:

sudo pip install facebook-sdk

On other platforms, go to this link and install it from there.

Getting the top 10 most common likes amongst your friends and the global likes for those pages

Importing the modules should raise no questions. Besides facebook, we use two more: Counter (from the collections module) and itertools. Counter counts how often each element repeats. Here it finds the 10 most common likes amongst the user's connections (friends). The other module the program requires is itertools, which we use to iterate through two sequences in parallel. Python 2 provides this as itertools.izip; in Python 3 the built-in zip does the same.
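The following is a minimal illustration of both tools on made-up page names, not real Facebook data. Counter.most_common ranks elements by how often they repeat. zip, the Python 3 counterpart of Python 2's itertools.izip, walks two sequences side by side.

```python
from collections import Counter

# Made-up page names, one entry per like
liked_pages = ["Python", "Linux", "Python", "GitHub", "Python", "Linux"]

page_counts = Counter(liked_pages)
top_two = page_counts.most_common(2)
print(top_two)  # [('Python', 3), ('Linux', 2)]

# Made-up global like totals, aligned with top_two by position
global_likes = [1200, 800]

# zip pairs the i-th item of each sequence and stops at the shorter one
for (name, count), total in zip(top_two, global_likes):
    print(name, count, total)
```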

Getting the access token for the Facebook Graph API

Once logged in, go to http://developers.facebook.com.

Click Tools in the upper portion of the page. Under Tools, click Graph API Explorer, which takes you to a page similar to the one shown below. In the upper-right corner, a drop-down menu lets you choose the API version to use. For this program we stick to version 1.0 because it grants access tokens for a wider variety of permissions, many of which are restricted in newer versions.

[Figure: the Graph API Explorer, with version 1.0 selected]

Click the Get Access Token button, and you will be prompted with a list of available permissions. Under User Data Permissions, check user_likes and user_friends. Then switch to the Friends Data Permissions tab and check friends_likes. When done, press Get Access Token. The access token is a very long alphanumeric string.

Now that we have the access token, we can create a connection to the Graph API from Python using the facebook module, for example: g = facebook.GraphAPI(access_token)
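The script at the end of this post pastes the token straight into the source file. One alternative is to read the token from an environment variable instead. This is a hedged sketch: FB_ACCESS_TOKEN and load_access_token are hypothetical names chosen for this example and are not part of facebook-sdk.

```python
import os


def load_access_token(var_name="FB_ACCESS_TOKEN"):
    """Return the Graph API access token stored in an environment variable.

    FB_ACCESS_TOKEN is a hypothetical variable name used for this sketch.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to your access token")
    return token
```

To use it, export the token in your shell first (export FB_ACCESS_TOKEN='...'). Then create the connection with g = facebook.GraphAPI(load_access_token()). This keeps the long token out of the script and out of anything you share or commit.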

Once the connection is created, we can query the Graph API for the data we need. To get the names and ids of our friends, we call the get_connections method on the connection created above (g in the example), passing "me" and "friends" as parameters.
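The script indexes the result of get_connections with ['data'] and reads 'name' and 'id' from each entry. The sketch below mimics that shape with a hard-coded, made-up response, so you can try the extraction step without a token or network access. Real responses may carry additional keys.

```python
# A made-up response shaped like g.get_connections("me", "friends"):
# a dict whose "data" key holds a list of {"name": ..., "id": ...} objects.
sample_response = {
    "data": [
        {"name": "Alice Example", "id": "1001"},
        {"name": "Bob Example", "id": "1002"},
    ]
}

friends = sample_response["data"]  # the same indexing the script uses
for friend in friends:
    print(friend["name"], friend["id"])

friend_ids = [friend["id"] for friend in friends]
```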

Now that we have the names and ids of our friends, we query the Graph API with each id to get that friend's likes. We store them in a dictionary where the friend's name is the key and the value is the list of pages that friend has liked.
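The sketch below runs this step offline. fetch_likes is a hypothetical stand-in for g.get_connections(friend_id, "likes"). It returns canned, made-up data in the same {'data': [...]} shape. The dictionary comprehension matches the one in the script.

```python
# Hypothetical canned data standing in for the Graph API's likes connection
FAKE_LIKES = {
    "1001": [{"name": "Python", "id": "p1"}, {"name": "Linux", "id": "p2"}],
    "1002": [{"name": "Python", "id": "p1"}],
}


def fetch_likes(friend_id):
    """Hypothetical stand-in for g.get_connections(friend_id, "likes")."""
    return {"data": FAKE_LIKES.get(friend_id, [])}


friends = [
    {"name": "Alice Example", "id": "1001"},
    {"name": "Bob Example", "id": "1002"},
]

# Same shape as the script: friend name -> list of liked page objects
likes = {friend["name"]: fetch_likes(friend["id"])["data"] for friend in friends}
print(likes["Bob Example"])
```

Because the friend's name is the dictionary key, two friends with the same name would overwrite each other. Keying by friend["id"] instead avoids that.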

Now Counter comes into play. It counts the total number of likes each page received across our circle of Facebook friends. We also want the global likes for the top 10 pages amongst friends, so we need each page's id. The line right after the first Counter collects these with a second Counter over the page ids.
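The following runs the script's two Counters on a made-up likes dictionary. Carol's third like has an id but no name, so it is counted only by id. This shows how the name and id counters can end up with different lengths.

```python
from collections import Counter

# Made-up data in the same shape as the script's `likes` dictionary
likes = {
    "Alice Example": [{"name": "Python", "id": "p1"}, {"name": "Linux", "id": "p2"}],
    "Bob Example": [{"name": "Python", "id": "p1"}],
    "Carol Example": [{"name": "Python", "id": "p1"}, {"name": "Linux", "id": "p2"}, {"id": "p3"}],
}

# Count page names; likes without a "name" are skipped, as in the script
friends_likes = Counter(like["name"] for friend in likes for like in likes[friend] if like.get("name"))

# Count page ids the same way
friends_likes_id = Counter(like["id"] for friend in likes for like in likes[friend] if like.get("id"))

print(friends_likes)     # Counter({'Python': 3, 'Linux': 2})
print(friends_likes_id)  # Counter({'p1': 3, 'p2': 2, 'p3': 1})
```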

Now we have all the data, and all we need to do is print it. We only need the top 10 pages with their number of likes within one's Facebook circle, plus the global likes for those pages. First, let us print the top 10 pages and their number of likes within one's circle. For this, we loop over friends_likes.most_common(10).
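The following is a small sketch of the printing loop on made-up counts. It uses enumerate to add a rank number, which the original loop does not print. When n exceeds the number of distinct pages, most_common(n) simply returns all of them.

```python
from collections import Counter

# Made-up page-name counts, as friends_likes would hold them
friends_likes = Counter({"Python": 3, "Linux": 2, "GitHub": 2, "Vim": 1})

# most_common(n) returns up to n (page name, count) tuples, highest count first
top_pages = friends_likes.most_common(3)
for rank, (name, count) in enumerate(top_pages, start=1):
    print(f"{rank}. {name}: {count} likes among friends")
```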

Now itertools comes into action. Its purpose here is to loop through two sequences in parallel (itertools.izip in Python 2; in Python 3 the built-in zip does the same). We iterate through friends_likes.most_common(10) to get each page name. In parallel, we iterate through friends_likes_id.most_common(10) to get each page id, which we use to query that page's global likes. To get information about a single item (for example a page or a user), we use the get_object method, passing the item's id. The final output of the program will look like the image below, showing the likes within the friends circle and the global likes. What conclusions you draw from this valuable piece of information is up to you.
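The following sketches the parallel loop offline. get_object here is a hypothetical stand-in that looks up made-up totals in a dictionary. The real g.get_object(page_id) queries the Graph API, and the script reads the 'likes' field of the returned page. The counts are made up as well.

```python
from collections import Counter

# Made-up counters in the shape the script builds
friends_likes = Counter({"Python": 3, "Linux": 2})
friends_likes_id = Counter({"p1": 3, "p2": 2})

# Hypothetical page objects; only the "likes" field is used, as in the script
FAKE_PAGES = {"p1": {"likes": 1200}, "p2": {"likes": 800}}


def get_object(page_id):
    """Hypothetical stand-in for g.get_object(page_id)."""
    return FAKE_PAGES[page_id]


results = []
# zip walks both top-10 lists in parallel (itertools.izip in Python 2)
for (name, _), (page_id, _) in zip(friends_likes.most_common(10), friends_likes_id.most_common(10)):
    results.append((name, get_object(page_id)["likes"]))

for name, global_likes in results:
    print(name, global_likes)
```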

[Figure: output of querying the Facebook Graph API]

import facebook    # sudo pip install facebook-sdk
from collections import Counter

# The access token should be stored as a string.
# Paste your own token here and never publish a real one.
access_token = "YOUR_ACCESS_TOKEN"

# Create a connection to the Facebook Graph API through facebook-sdk
g = facebook.GraphAPI(access_token)

# Get the name and id of each friend
friends = g.get_connections("me", "friends")['data']

# A dictionary comprehension storing each friend's name as the key
# and the information about the pages they liked as the value
likes = {friend['name']: g.get_connections(friend['id'], "likes")['data'] for friend in friends}

# Count how often each page name repeats across all friends' likes
friends_likes = Counter([like['name'] for friend in likes for like in likes[friend] if like.get('name')])

# Count how often each page id repeats across all friends' likes
friends_likes_id = Counter([like['id'] for friend in likes for like in likes[friend] if like.get('id')])

# The 10 most repeated pages amongst friends; most_common returns
# (page name, number of likes) tuples
for name, count in friends_likes.most_common(10):
    print(name, count)

# Iterate through both top-10 lists in parallel and print each page name with its
# global likes. Python 2 used itertools.izip here; in Python 3 the built-in zip
# does the same. The pairing assumes both lists rank the pages in the same order.
for (name, _), (page_id, _) in zip(friends_likes.most_common(10), friends_likes_id.most_common(10)):
    print(name, g.get_object(page_id)['likes'])
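The second loop above relies on the name-ranked and id-ranked lists lining up position by position. That fails if two different pages share a name, or if a like lacks one of the two fields. One alternative counts likes by page id only and keeps an id-to-name lookup, so each name stays attached to its own id. This is a sketch, not the author's method. The data is made up; in the script, likes would come from the Graph API.

```python
from collections import Counter

# Made-up data shaped like the script's `likes` dictionary; two different
# pages share the name "Python" (p1 and p9) to show why counting by id is safer.
likes = {
    "Alice Example": [{"name": "Python", "id": "p1"}, {"name": "Linux", "id": "p2"}],
    "Bob Example": [{"name": "Python", "id": "p1"}, {"name": "Python", "id": "p9"}],
    "Carol Example": [{"name": "Linux", "id": "p2"}, {"name": "Python", "id": "p1"}],
}

page_counts = Counter()  # page id -> number of likes among friends
page_names = {}          # page id -> page name

for friend_pages in likes.values():
    for page in friend_pages:
        if page.get("id"):
            page_counts[page["id"]] += 1
            page_names[page["id"]] = page.get("name", "(unnamed page)")

# Each entry keeps the name, id, and count of one page together
top_pages = [(page_names[pid], pid, count) for pid, count in page_counts.most_common(10)]
for name, pid, count in top_pages:
    print(name, pid, count)
```

Here the two pages named Python stay separate. Counting by name would merge them into a single Python entry with 4 likes. In the script, each page's global likes would then come from g.get_object(pid)['likes'] as before.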

 

bhishan

I am Bhishan Bhandari, a CS student and life hacker. I specialize in automation and sell my services on Fiverr, where you can hire me for projects. Follow me on GitHub for code updates. You can always send your thoughts, wishes, or questions to me at bbhishan@gmail.com.
