Grab certain text from Gmail using Python
I tried this about a month ago. This article may help you extract parts of an email (Gmail). As we go along I’ll show you how to extract a single word (which sounds funny, I know). I’ll be using the Python programming language to grab the word programmatically.
The background to this program is a practice at my college. Every day we receive a formatted email containing a word, its meaning, its everyday use and an example sentence using the word. The idea of extracting the word from the email and saving it on the local machine came from one of my classmates.
First, I created a label named “Deerword” in the Gmail account I use for college. This label holds the emails I mentioned earlier (one word each day). Next, I googled for similar programs, though none was much help. Dictionaries and lists were what excited me most while learning Python, so working with a JSON view of the response was my choice. I used a standard library module called imaplib to access my Gmail account.
Below is the working code that I use to extract a word each day and save it to my local machine. The word I wanted sat between two ‘*’ characters. I therefore loop through the text to find the first ‘*’ and concatenate the following characters to a variable until another ‘*’ is found. The string held in that variable is then written to a ‘.txt’ file opened in append mode.
The code obtains the raw content of the body section. This content is generally a list with two elements. The first element is itself a list containing two strings, while the second element of the parent list is just an ignorable string. To understand the response, uncomment the three lines starting from line 7. All that remains is to automate the program so that it runs every 24 hours and saves a new word each day to my local machine. If you need to extract something else, studying the response by uncommenting lines 7, 8 and 9 will help.
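import imaplib
from json import loads, dumps
msrvr = imaplib.IMAP4_SSL('imap.gmail.com', 993)
msrvr.login('firstname.lastname@example.org', 'password')
stat, count = msrvr.select('Deerword')
stat, data = msrvr.fetch(count[0], '(UID BODY[TEXT])')  # BODY[TEXT] (body text only) is an assumption
#fd = open('rawtext.json', 'w')
#fd.write(dumps(data, indent=4, default=repr))  # default=repr copes with bytes on Python 3
#fd.close()
word = data[0]  # first element of the response: a pair of strings
word = word[1]  # second string of the pair: the raw body text
if isinstance(word, bytes):  # Python 3 returns bytes, so decode before comparing characters
    word = word.decode('utf-8', 'replace')
def getword(word, i, deer_word):  # the function name is a placeholder
    if word[i+1] != '*':  # not yet at the closing '*': keep collecting
        deer_word += word[i+1]
        getword(word, i+1, deer_word)
    else:  # closing '*' reached: save the collected word
        with open('deerwords.txt', 'a') as f:
            f.write(deer_word + "\n")
for i in range(0, len(word)):
    if word[i] == '*':  # opening '*' found
        getword(word, i, '')
        break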
To run this program you need a module that can access a Gmail account, namely imaplib. If you leave out lines 7, 8 and 9 you do not need the json module, so you may also omit line 2. Line 3 opens a connection to the Gmail mail server. Line 4 takes the email address and password as parameters and authenticates to your Gmail account, creating a session. Line 5 selects the label called “Deerword” and assigns the total number of emails in it to count. Line 6 fetches the latest email (referenced by the value at index zero of count) and assigns the result to the variable data. Note that this variable contains only the raw text of the body section, that is, the content below the greeting section. Lines 7, 8 and 9 open a new JSON file named “rawtext.json” in write mode, write the content of data to it, and indent that content by four spaces to make it readable.
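The shape of the response described above can be explored without logging in to anything. Here is a minimal sketch, assuming Python 3, where imaplib hands back bytes rather than strings. The helper name `body_text` and the sample response are made up for illustration and are not part of the original program.

```python
def body_text(data, encoding="utf-8"):
    """Return the decoded body from an imaplib fetch() result.

    fetch() returns a list such as [(header, body), b')'].
    The first element is a pair; its second item is the raw body.
    """
    for part in data:
        if isinstance(part, tuple) and len(part) == 2:
            body = part[1]
            if isinstance(body, bytes):  # Python 3 returns bytes
                body = body.decode(encoding, errors="replace")
            return body
    return ""  # no (header, body) pair found


# Hypothetical response, shaped like the one the article describes.
sample = [
    (b"1 (UID 7 BODY[TEXT] {32}", b"Word of the day: *serendipity*\r\n"),
    b")",
]
print(body_text(sample))
```

Decoding once, right after the fetch, means the later character comparisons against '*' work on ordinary strings.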
The first assignment to word takes the element at index zero of the list data. This element again holds two strings. The second assignment narrows word down to its second element (index 1), because the word I need sits somewhere in that string. A function definition follows the assignment statements. This function takes three parameters, namely word, i and deer_word. The code inside the function checks whether the character at index i+1 is “*”. If it is not, that character is concatenated to the variable deer_word. The beauty of a recursive function can be seen clearly here: the function calls itself with the value of i increased by 1. When the check finds the second “*”, the final word, that is, the concatenation of the characters between the two “*”, is written to a text file. Below the function definition is a loop which calls the function. The loop runs from 0 to the length of the variable word. When the current character is “*”, the function is called. As soon as the function returns, the break statement terminates the loop, thereby finishing the program.
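The recursion above works, but the same result can be had with plain string searching, which also copes with a body that has no closing ‘*’. This is only a minimal sketch, assuming the word is wrapped in the first pair of asterisks in the body. `extract_starred` and `save_word` are hypothetical helper names, not part of the original program.

```python
def extract_starred(text):
    """Return the text between the first two '*' characters, or None."""
    start = text.find("*")
    if start == -1:
        return None  # no opening marker
    end = text.find("*", start + 1)
    if end == -1:
        return None  # no closing marker
    return text[start + 1:end]


def save_word(word, path="deerwords.txt"):
    """Append the word to the file, one word per line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(word + "\n")


print(extract_starred("Word of the day: *serendipity* (noun)"))  # serendipity
print(extract_starred("no markers here"))                        # None
```

Returning None instead of raising lets the caller decide what to do on a day when the email does not follow the usual format.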
Any questions about extracting text from emails are welcome. Happy coding 🙂