NPR API Tutorial
NPR (National Public Radio) provides API keys to the public to encourage people to build applications that use NPR's application programming interface. This tutorial shows how to use the NPR API to fetch and parse a story on any topic from the npr.org website. The examples are written in Python.
How to get an NPR API key
To make queries to npr.org we need an API key. If you don't have one yet, here's how to get it. Let's begin 🙂
To get the NPR API key, register here. You'll receive a verification email. Once you have visited the verification link, log in to npr.org and check your profile. In the Open API tab you'll find an alphanumeric string. That's the API key.
How to parse a story using NPR API
In this tutorial we will parse the latest complete story on a topic of your choice. Every topic on NPR's website has an id. Click here for the complete index. Choose your favorite. I'll choose technology, which has the id 1019. Now we're ready to code 🙂
Importing the modules for the NPR API tutorial
from urllib2 import urlopen  # Python 2 only; in Python 3, urlopen lives in urllib.request
from json import load, dumps
Assigning the NPR API key to a variable
The key shown here is not a real one and does not resemble an actual NPR API key, which is a string of alphanumeric characters. Assign your own key to the variable key as a string.
key = 'mykey'
The structure of the initial query URL with the API key
The API key is required. It is placed at the end of the initial URL, after the "?", in the form apiKey=key. Here we use string concatenation to build the query URL. key is the variable holding your alphanumeric API key as a string.
url = 'http://api.npr.org/query?apiKey=' + key
Adding the query parameters and understanding them
Each query parameter is a key/value pair, and pairs are separated by '&'. One necessary parameter for this tutorial is numResults=value. I'm setting the value to 1 since we'll be parsing a single story; choose the value according to how many stories you want to extract.
url += '&numResults=1'
Another parameter is format. JSON, XML and feed formats are available as its value. Since we're using Python, we'll set the format to json: JSON maps closely onto Python dictionaries and lists, which makes it easy to extract the plain text. As mentioned before, every query parameter pair must be separated by '&'.
url += '&format=json'
The most important parameter is id=value, which defines which topic to query. For this tutorial I'll use id=1019, which stands for the technology section.
url += '&id=1019'
requiredAssets=value is another query parameter. It specifies the type of content the returned stories must contain. For this tutorial we'll use requiredAssets=text. If you want to parse images, replace text with image; for audio content, use audio.
url += '&requiredAssets=text'
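The same query URL can also be assembled from a list of key/value pairs instead of by hand. This is only a sketch of an alternative, not what the tutorial code does: it uses urlencode from the standard library, and the import fallback is there so the snippet runs on both Python 2 and Python 3. The key is still the placeholder 'mykey'.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

key = "mykey"  # placeholder, not a real NPR API key

# A list of pairs keeps the parameters in a predictable order.
params = [
    ("apiKey", key),
    ("numResults", 1),
    ("format", "json"),
    ("id", 1019),
    ("requiredAssets", "text"),
]

# urlencode joins the pairs with '&' and escapes any special characters.
url = "http://api.npr.org/query?" + urlencode(params)
print(url)
```

The result is the same URL as the one built with string concatenation above, so either approach can be passed to urlopen.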
Now we make a request with our URL and store the response in the variable response, as shown below. The url argument is the query URL we built by appending the apiKey pair and the other query parameters to the initial URL.
response = urlopen(url)
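If you'd like to try the fetch-and-decode step without a key or a network connection, it can be wrapped in a small function. The sketch below is one possible way to do that and is not part of the tutorial code: fetch_json and fake_opener are hypothetical names, the response body is assumed to be UTF-8 encoded JSON, and the import fallback only exists so the snippet runs on both Python 2 and Python 3.

```python
import io
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def fetch_json(url, opener=urlopen):
    """Open url with opener and decode the body as JSON (assumed UTF-8)."""
    response = opener(url)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


def fake_opener(url):
    """Hypothetical stand-in for urlopen that returns a canned, empty result."""
    return io.BytesIO(b'{"list": {"story": []}}')


# Offline demonstration: no request is actually sent here.
demo = fetch_json("http://api.npr.org/query?apiKey=mykey", opener=fake_opener)
print(demo)
```

With a real key, calling fetch_json(url) without the opener argument would send the request through urlopen, just like the line above.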
Parsing the story from the npr.org website
Now we load the raw response into the variable json_obj. Coding without understanding the data is a bad idea, so first let's write the content of json_obj to a file named output.json (see the code below). Run the code up to f.close() and view output.json before going further; we'll use the structure we see there to decide what to print. A good command of Python dictionaries and lists will help a lot here.
Remember the format=json parameter we set earlier: the content of json_obj is in fact a Python dictionary, with many child dictionaries and lists nested inside the parent dictionary. We loop with for story in json_obj['list']['story'] because all the useful content sits under the story key, which is itself inside the value of the list key, so the rest of the parent dictionary can be ignored. This layout is specific to JSON; parsing XML or feed data is completely different.
Walking through the code: every story has a title, so we print it without checking for it. In output.json, the title text is the value of the key '$text', which sits inside the value of the key 'title'. Therefore we print story['title']['$text'] for the title of the story.
We parse storyDate and teaser the same way. The teaser is the meta description of a story, and every story on NPR's website has one. Search output.json for storyDate and you will see that the date is the value of '$text' inside 'storyDate'; as with any dictionary, we print a value by looking it up with its key. For the teaser, find where it sits in the dictionary and print it using its key.
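To make the nesting described above concrete, here is a small sketch that works on a made-up dictionary instead of a real response. The sample data is hypothetical and heavily trimmed (a real response has many more keys), and the file is written to the system temp directory under an arbitrary name so the snippet doesn't overwrite your own output.json.

```python
import json
import os
import tempfile

# Hypothetical, heavily trimmed stand-in for an NPR API response.
sample = {
    "list": {
        "story": [
            {
                "title": {"$text": "Example headline"},
                "storyDate": {"$text": "Thu, 01 Jan 2015 10:00:00 -0500"},
                "teaser": {"$text": "Example teaser."},
            }
        ]
    }
}

# Write an indented copy to disk so the nesting is easy to read.
path = os.path.join(tempfile.gettempdir(), "output_sample.json")
with open(path, "w") as f:
    f.write(json.dumps(sample, indent=4))

# Each lookup walks one level deeper: list -> story -> title -> $text.
first_story = sample["list"]["story"][0]
print(first_story["title"]["$text"])
print(first_story["storyDate"]["$text"])
print(first_story["teaser"]["$text"])
```

Opening the written file in a text editor shows the same parent/child layout you should expect to find in your own output.json.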
The byline is the author of the story. A story may have more than one author, but in this tutorial we'll parse only the first name. The author names are stored in a list, as you can see in output.json, so we take only the value at the first index. Some stories have no byline, so we use an if statement to check whether byline is present in the story; in Python this check is very simple, see below.
Not all stories have 'show' metadata either, so we check for it with a simple if statement and print it if present. The 'show' resource is also stored as a list, which is why we again use the first index.
Similarly, we print the URL of the story: the valid HTML URL is the '$text' value of the entry at the first index of the link list.
Finally, to print the complete content of the story, we loop with for paragraph in story['textWithHtml']['paragraph'] and print each formatted paragraph. The story text is provided as an array of text objects. The NPR API supports two types of text, textWithHtml and text (plain text). Here we use textWithHtml since the content may contain hyperlinks within the story.
json_obj = load(response)
# Save an indented copy of the response so its structure can be inspected
f = open('output.json', 'w')
f.write(dumps(json_obj, indent=4))
f.close()
for story in json_obj['list']['story']:
    print "TITLE: " + story['title']['$text'] + "\n"
    print "DATE: " + story['storyDate']['$text'] + "\n"
    print "TEASER: " + story['teaser']['$text'] + "\n"
    # byline, show and link are lists; we use the first entry of each
    if 'byline' in story:
        print "BYLINE: " + story['byline'][0]['name']['$text'] + "\n"
    if 'show' in story:
        print "PROGRAM: " + story['show'][0]['program']['$text'] + "\n"
    print "NPR URL: " + story['link'][0]['$text'] + "\n"
    for paragraph in story['textWithHtml']['paragraph']:
        print paragraph['$text'] + "\n"
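The optional-field checks can also be collected into a helper that returns the lines instead of printing them, which makes them easy to try on test data. This is a sketch under a few assumptions, not the tutorial's own code: summarize_story and sample_story are hypothetical names, the sample story is made up, and the check for paragraphs without a '$text' key is a defensive assumption on my part.

```python
def summarize_story(story):
    """Collect the printable fields of one story dict into a list of lines."""
    lines = [
        "TITLE: " + story["title"]["$text"],
        "DATE: " + story["storyDate"]["$text"],
        "TEASER: " + story["teaser"]["$text"],
    ]
    # byline and show are optional lists; use the first entry when present.
    if story.get("byline"):
        lines.append("BYLINE: " + story["byline"][0]["name"]["$text"])
    if story.get("show"):
        lines.append("PROGRAM: " + story["show"][0]["program"]["$text"])
    lines.append("NPR URL: " + story["link"][0]["$text"])
    # Assumption: a paragraph entry may lack '$text'; skip those entries.
    for paragraph in story["textWithHtml"]["paragraph"]:
        if "$text" in paragraph:
            lines.append(paragraph["$text"])
    return lines


# Hypothetical story with a byline but no 'show' entry.
sample_story = {
    "title": {"$text": "Example headline"},
    "storyDate": {"$text": "Thu, 01 Jan 2015 10:00:00 -0500"},
    "teaser": {"$text": "Example teaser."},
    "byline": [{"name": {"$text": "Jane Doe"}}],
    "link": [{"type": "html", "$text": "http://www.npr.org/example"}],
    "textWithHtml": {
        "paragraph": [
            {"num": "1", "$text": "First paragraph."},
            {"num": "2"},
            {"num": "3", "$text": "Last paragraph."},
        ]
    },
}

for line in summarize_story(sample_story):
    print(line)
```

Because the sample has no 'show' key, no PROGRAM line is produced, which is the same behaviour as the if statement in the loop above.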
That's the end of the NPR API tutorial. This is how we parse a complete story using the NPR API. There's no better way to learn than exploring on your own, so explore the NPR API, and make sure you have a valid API key. Good luck 🙂