NPR API Tutorial II – How to use story API to parse a story having text, audio and image from npr.org



In this NPR API Tutorial, I’ll explain each line of code needed to parse a complete story containing text, images and audio from npr.org, querying the NPR API with the python programming language. Have you got your npr api key? If not, I covered how to get one in my previous post, which focuses on parsing a single text story of a specific category from npr.org. This post explains each line of code that we’ll use to parse a complete story containing text, audio and images from National Public Radio’s official website.

 

What will we parse in this NPR API Tutorial?

1. The title of the story

2. Published date

3. Teaser (Meta Description of the npr story)

4. Byline (Author)

5. Show (if present)

6. URL of the story

7. Image in the story

8. Caption of the image

9. Producer of the image (Image Credit)

10. Audio

11. The content of the story (the complete story in formatted paragraphs)

The contents listed above will be written to a text file in a formatted way as we walk through this NPR API Tutorial. We’ll parse the complete story from a json file, which makes it a lot easier to work with because a json file is a text file containing the complete content of an article in the form of a dictionary. By dictionary I mean a dictionary in python. If you’ve got command over dictionaries and lists in python, you will make it through this NPR API Tutorial. For those who skipped those lessons or didn’t quite understand them, I shall take the pain to explain in the best possible way I can 🙂

How to parse a complete story containing text, audio and image from npr.org using NPR API

 1. Importing the necessary modules

from urllib2 import urlopen

from json import load, dumps

2. Assigning a variable to the NPR API KEY

Assign the npr api key to a variable named key. Remember that the key variable should hold a string, meaning that your key should be inside quotes. Here I’ve used a placeholder value for the key.

key = 'key'

3. The initial url for the NPR API

The base url for a story query is given below. As we move further, we will concatenate more strings to the variable url, which holds the base url for the query.

url = 'http://api.npr.org/query?apiKey=' + key

4. Adding query string parameters

The query string parameters are key/value pairs, each pair separated by an ampersand ‘&’. The order of the pairs is not important, but each key must be given a valid value.

url += '&numResults=1'

numResults is a query string parameter which specifies the maximum number of stories returned for a query. In this NPR API Tutorial we’ll parse a single story, therefore we specify numResults=1. Remember that letter case matters in python.

url += '&format=json'

The Story API can output many formats, including podcast XML and RSS, but for ease we will work with the json output format, which python loads as a dictionary.

url += '&id=1026'

The id=value parameter specifies which category to look for. In this case 1026 stands for the space category on the npr.org website, while you may wish to choose a completely different category. Here’s the index of the ids for the different categories.

url += '&requiredAssets=text,image,audio'

The requiredAssets query parameter specifies which types of content a story must contain. Since this is a tutorial on how to parse a complete story containing text, audio and image using the NPR API, we specify requiredAssets=text,image,audio, all separated by commas. Don’t forget the leading ampersand ‘&’, which separates this parameter from the previous one. Look at the url: this is the complete query url which we’ll use to extract a story from npr.org.
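If you’d rather not glue the query string together by hand, here is a minimal sketch of the same url built from a dictionary. It is written for Python 3 (the tutorial’s own code targets Python 2), and the function name build_query_url is my own invention for illustration, not part of the NPR API.

```python
from urllib.parse import urlencode  # Python 3 standard library


def build_query_url(api_key, topic_id=1026, num_results=1):
    """Build the query url from the parameters used in this tutorial.

    build_query_url is a hypothetical helper name, not an NPR API call.
    """
    params = {
        "apiKey": api_key,
        "numResults": num_results,
        "format": "json",
        "id": topic_id,
        "requiredAssets": "text,image,audio",
    }
    # safe="," keeps the commas in requiredAssets instead of encoding them as %2C
    return "http://api.npr.org/query?" + urlencode(params, safe=",")


url = build_query_url("key")  # 'key' is a placeholder, as in the tutorial
print(url)
```

Because urlencode inserts the ‘&’ separators itself, a missing ampersand can’t sneak into the url.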

5. Making Story API call and storing the contents returned to a variable

response = urlopen(url)

json_obj = load(response)

In the above code we requested the Story API using urlopen and loaded the json response to the variable json_obj which is a cumbersome bunch of dictionaries.
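To see what load does without spending an API call, here is a small Python 3 sketch. The response body below is a made-up, heavily trimmed stand-in that only mirrors the ‘list’, ‘story’, ‘title’ and ‘$text’ keys used in this tutorial, and io.BytesIO imitates the file-like object that urlopen returns.

```python
import io
import json

# Hypothetical, trimmed stand-in for a response body (not real NPR data)
sample_body = b'{"list": {"story": [{"title": {"$text": "A made-up title"}}]}}'

# urlopen() hands back a file-like object; BytesIO plays that role here
# so the sketch runs without a network connection or an API key
response = io.BytesIO(sample_body)
json_obj = json.load(response)

print(type(json_obj).__name__)
print(json_obj["list"]["story"][0]["title"]["$text"])
```

The point is that load gives back ordinary python dictionaries and lists, so everything that follows is plain indexing.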

6. Writing the json response to a file

f = open('output.json', 'w')
f.write(dumps(json_obj, indent=4))
f.close()

fd = open('nprstory.txt', 'w')

We write the json response to a file to understand its structure and to find the key/value pairs that hold the information we need, so that we can access those values through the dictionary (the json response). The readable, formatted content will be written to nprstory.txt.
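As an alternative sketch (Python 3), the same dump can be written with a with block, which closes the file for you even if something goes wrong halfway. The json_obj here is a made-up sample, and the temporary directory is only an assumption so that the sketch doesn’t clutter your working folder.

```python
import json
import os
import tempfile

# Made-up sample standing in for the real response dictionary
json_obj = {"list": {"story": [{"title": {"$text": "A made-up title"}}]}}

# A temporary folder is used here purely for illustration
out_dir = tempfile.mkdtemp()
json_path = os.path.join(out_dir, "output.json")

# The with block closes the file automatically when it ends
with open(json_path, "w") as f:
    json.dump(json_obj, f, indent=4)

# Read the file back to confirm that nothing was lost on the way
with open(json_path) as f:
    reloaded = json.load(f)

print(reloaded == json_obj)
```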

7. Open output.json and write the necessary content to another file in a formatted way

Save the code as a .py file, run it and look for the output.json file. Open the file and get familiar with the huge bunch of dictionaries present in it.

for story in json_obj['list']['story']:

Now you may be confused about why we use a loop. Well, we loop because for now we’ve set numResults=1, but in future we may want to extract a larger number of stories. And why do we loop through json_obj[‘list’][‘story’]? The content we need to extract resides as a value of the key ‘story’, which is itself a value of the key ‘list’. The code that follows must be indented inside the for loop stated just above this paragraph.

    print "TITLE: " + story['title']['$text'] + "\n"
    fd.write("TITLE: " + story['title']['$text'] + "\n")

Look into the output.json file: the title of the story resides as a value of the key ‘$text’, which is in turn a value of the key ‘title’. This is how we access the values of dictionaries in python. Since we have extracted a valid title from the json response, we write it in a managed way to the file nprstory.txt.
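Here is a minimal Python 3 sketch of that loop on made-up data, to show that the same code handles one story or many. The sample dictionary is hypothetical and only mirrors the keys discussed so far.

```python
# Hypothetical sample mirroring only the keys used in this tutorial
json_obj = {
    "list": {
        "story": [
            {"title": {"$text": "First made-up title"}},
            {"title": {"$text": "Second made-up title"}},
        ]
    }
}

titles = []
for story in json_obj["list"]["story"]:
    # 'title' is itself a dictionary; the actual string sits under '$text'
    titles.append(story["title"]["$text"])

for title in titles:
    print("TITLE: " + title)
```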

    print "DATE: " + story['storyDate']['$text'] + "\n"
    fd.write("DATE: " + story['storyDate']['$text'] + "\n")

The published date of the story is present as a value of the key ‘$text’, which is in turn a value of the key ‘storyDate’. Writing it to the file works the same way as for the title.

    print "TEASER: " + story['teaser']['$text'] + "\n"
    fd.write("TEASER: " + story['teaser']['$text'] + "\n")

By this time I hope you understand the concept of extracting contents from a json file. It is a game of playing with dictionaries. Similar to the prior ones, the meta description of the story resides as a value of the key ‘$text’, which is a value of the key ‘teaser’. We write the extracted content to nprstory.txt exactly as before.

    if 'byline' in story:
        print "BYLINE: " + story['byline'][0]['name']['$text']
        fd.write("BYLINE: " + story['byline'][0]['name']['$text'] + "\n")

Let’s welcome our new friend, lists, into our NPR API Tutorial. Byline is the name of the author of the story, and not every NPR story contains it. Therefore we set a condition to check its presence. The name of the author is stored as a value of the key ‘$text’, which is a value of the key ‘name’, which in turn belongs to the dictionary at the zeroth index of the list under ‘byline’. Look at the output.json file for better understanding.
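The check-then-index pattern can be wrapped in a small function, sketched here in Python 3. The name byline_name and both sample stories are made up for illustration.

```python
def byline_name(story):
    """Return the first author's name, or None when the story has no byline.

    byline_name is a hypothetical helper, not something the API provides.
    """
    if "byline" not in story:
        return None
    # 'byline' holds a list, so [0] picks the first author dictionary
    return story["byline"][0]["name"]["$text"]


with_author = {"byline": [{"name": {"$text": "Jane Doe"}}]}  # made-up sample
without_author = {"title": {"$text": "No byline here"}}      # made-up sample

print(byline_name(with_author))
print(byline_name(without_author))
```

Returning None instead of raising a KeyError lets the calling code decide whether to skip the line or write a placeholder.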

    if 'show' in story:
        print "PROGRAM: " + story['show'][0]['program']['$text'] + "\n"
        fd.write("PROGRAM: " + story['show'][0]['program']['$text'] + "\n")

Similarly, show may not be present for every story on the npr.org website, therefore we check for it first. The information about the program is present as a value of the key ‘$text’, which is a value of the key ‘program’, which in turn belongs to the dictionary at the zeroth index of the list under ‘show’. We write this content to the nprstory.txt file as well.

    print "URL: " + story['link'][0]['$text'] + "\n"
    fd.write("URL: " + story['link'][0]['$text'] + "\n")

The above code writes the url to the nprstory.txt file. The valid url for the story resides as a value of the key ‘$text’ in the dictionary at the zeroth index of ‘link’.

    print "IMAGE: " + story['image'][0]['src'] + "\n"
    fd.write("IMAGE: " + story['image'][0]['src'] + "\n")

Since we specified requiredAssets=text,image,audio, we don’t need to check whether an image exists: only stories containing an image are returned. The link to the image is present as a value of the key ‘src’ in the dictionary at the zeroth index of ‘image’.

    if 'caption' in story['image'][0]:
        print "CAPTION: " + story['image'][0]['caption']['$text'] + "\n"
        fd.write("CAPTION: " + story['image'][0]['caption']['$text'] + "\n")

    if 'producer' in story['image'][0]:
        print "IMAGE CREDIT: " + story['image'][0]['producer']['$text'] + "\n"
        fd.write("IMAGE CREDIT: " + story['image'][0]['producer']['$text'] + "\n")

The image caption and the producer of the image may not be present for an image, therefore we use if statements to check their presence. Both keys live inside the image dictionary, so we check story[‘image’][0] rather than story itself. The image caption is a value of the key ‘$text’, which is a value of the key ‘caption’ in the dictionary at the zeroth index of ‘image’. The same goes for the image credit.
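A minimal Python 3 sketch of the same idea, with the optional fields looked up inside the image dictionary where they are actually read from. The function name image_details and the sample story are made up for illustration.

```python
def image_details(story):
    """Collect the image link plus any optional caption and producer.

    image_details is a hypothetical helper name used only in this sketch.
    """
    image = story["image"][0]  # the first image dictionary in the list
    details = {"src": image["src"]}
    for field in ("caption", "producer"):
        # Look inside the image dictionary, not the story dictionary
        if field in image:
            details[field] = image[field]["$text"]
    return details


# Made-up sample: has a caption but no producer
story = {
    "image": [
        {
            "src": "http://example.com/picture.jpg",
            "caption": {"$text": "A made-up caption"},
        }
    ]
}

print(image_details(story))
```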

 

    print "MP3 AUDIO: " + story['audio'][0]['format']['mp3'][0]['$text'] + "\n"
    fd.write("MP3 AUDIO: " + story['audio'][0]['format']['mp3'][0]['$text'] + "\n")

Since in this NPR API Tutorial we’re parsing a story with text, image and audio, we need not check whether audio is present. The audio link resides as a value of the key ‘$text’ in the dictionary at the zeroth index of ‘mp3’, which in turn is a value of the key ‘format’ in the dictionary at the zeroth index of ‘audio’. Please refer to the output.json file while writing the code.
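Long chains like this one are easy to mistype, so here is a sketch of a small Python 3 helper that walks a path of keys and indexes and gives back None when any step is missing. The name dig and the sample story are my own assumptions for illustration.

```python
def dig(data, *path):
    """Follow a path of dictionary keys and list indexes.

    Returns None as soon as a step is missing. dig is a hypothetical
    helper, not part of python or of the NPR API.
    """
    for step in path:
        try:
            data = data[step]
        except (KeyError, IndexError, TypeError):
            return None
    return data


# Made-up sample mirroring the audio keys described above
story = {
    "audio": [
        {"format": {"mp3": [{"$text": "http://example.com/audio.mp3"}]}}
    ]
}

print(dig(story, "audio", 0, "format", "mp3", 0, "$text"))
print(dig(story, "audio", 0, "format", "wav", 0, "$text"))
```

The first call follows the same path as the line of code above, while the second shows what happens when a key is absent.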

We’re almost done with the code for parsing the complete story containing text, image and audio using the NPR API, but the final section is yet to be dealt with: the text content of the story, meaning the story itself.

    for paragraph in story['text']['paragraph']:
        print paragraph['$text'] + "\n"
        fd.write(paragraph['$text'] + "\n" + "\n")

fd.close()

The text content resides as a value of the key ‘$text’ in each item of ‘paragraph’, and ‘paragraph’ is in turn the value of the key ‘text’. We iterate with a loop because the text content is stored paragraph by paragraph inside story[‘text’][‘paragraph’]. Note that fd.close() is not indented, so the file is closed only once, after all the loops have finished. Now run the code and check for the file nprstory.txt. The file should contain the structured contents. This is how we parse a complete story containing text, image and audio from the npr.org website using the Story API. Make sure you get the indentation right. Get used to the json response by analyzing and visualizing complex dictionaries. Good luck 🙂
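As a closing sketch (Python 3), the paragraph loop can also be written as a function that returns the whole story text as one string. The name story_text and the sample story are made up, and skipping paragraphs without a ‘$text’ key is a defensive assumption on my part, not something I have confirmed about the real response.

```python
def story_text(story):
    """Join all paragraphs of a story, separated by blank lines.

    story_text is a hypothetical helper name used only in this sketch.
    """
    paragraphs = [
        paragraph["$text"]
        for paragraph in story["text"]["paragraph"]
        if "$text" in paragraph  # defensive assumption: skip empty entries
    ]
    return "\n\n".join(paragraphs)


# Made-up sample mirroring the 'text' and 'paragraph' keys
story = {
    "text": {
        "paragraph": [
            {"$text": "First made-up paragraph."},
            {"num": "2"},
            {"$text": "Second made-up paragraph."},
        ]
    }
}

print(story_text(story))
```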

Bhishan Bhandari [22] A one man army producing contents and maintaining this blog. I am a hobbyist programmer and enjoy writing scripts for automation. If you'd like a process to be automated through programming, I also sell my services at Fiverr . Lately, I like to refresh my Quora feeds. Shoot me messages at bbhishan@gmail.com  
