
Web Scraping With Python to Beautify Email

Introduction

Following our earlier posts on reading SCN questions, automating the task, and sending email notifications using Python (here), let us now enhance the email content of the notification. The popular email notifications we receive are never just plain text. They are enhanced with images (even videos in some cases), and they make lavish use of CSS styling and intelligent layout.

For example, among the email trends of 2019, the custom illustrations that Salesforce uses in its emails stand out, like the one below. This would greatly help catch the attention of potential users. Full reference here.

Fig 1. Reference email template, from EmailMonks

We take this as the reference and come up with something better and more appealing than plain text. However, we omit a few items, such as the background and certain colors, for brevity.

Let us proceed in the following manner.

Step 1: Web Scraping the SCN Question & Answer site for details.

Step 2: Open the questions for a short preview of the content and clip the text.

Step 3: Include Images to enhance the look and feel, provide links etc.

Ok, let us get started.

Step 1: Web Scraping

Unlike our earlier series, where we extracted minimal information from a web site and handled the remaining parts manually, here we use some popular libraries available in Python 3.

One of the critical ones is BeautifulSoup; the current version is Beautiful Soup 4. After installing it, we open the SCN page, analyze the contents, and pick the items of interest.

When we inspect the SCN questions and answers page, we can observe a common pattern.

All the questions sit in an unordered list: a <ul> tag containing one <li> (list item) tag per question.

Fig 2. HTML Elements holding Q&A

Notice that the CSS class name of the list is ‘dm-contentList’.

We locate the element with this class name in the retrieved result, which gives us the list containing all the questions (15 per page).
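A minimal sketch of this lookup with Beautiful Soup 4 could look as follows. The sample markup is a made-up stand-in for the downloaded page, and `find_question_list` is a hypothetical helper name; only the class name ‘dm-contentList’ comes from the page inspection above.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the HTML downloaded from the Q&A page (assumption).
SAMPLE_PAGE = """
<html><body>
  <ul class="dm-contentList">
    <li><a href="/questions/1/first.html" title="First question">First question</a></li>
    <li><a href="/questions/2/second.html" title="Second question">Second question</a></li>
  </ul>
</body></html>
"""


def find_question_list(page_html):
    """Return the <ul> element carrying the 'dm-contentList' class, or None."""
    soup = BeautifulSoup(page_html, "html.parser")
    return soup.find("ul", class_="dm-contentList")


question_list = find_question_list(SAMPLE_PAGE)
items = question_list.find_all("li")  # one <li> per question
print(len(items))
```

On the real page the same call would be fed the downloaded HTML instead of `SAMPLE_PAGE`.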

Now we proceed to pick the individual items.

Fig 3. Individual item holding the questions and asked by, answered by etc.

These items come under a link <a> tag, and the title attribute, or the text under the same tag, shows the actual question. Time for a quick check of where we are: let us print the results retrieved so far.
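As a rough illustration of this check, the sketch below walks the list items and prints the text of each <a> tag. The sample markup and the "asked by" spans are invented for the example; the real page carries more information per item.

```python
from bs4 import BeautifulSoup

# Invented sample; the real list holds 15 items with more detail each.
SAMPLE_LIST = """
<ul class="dm-contentList">
  <li><a href="/questions/1/first.html" title="How to read a table?">How to read a table?</a>
      <span>asked by User A</span></li>
  <li><a href="/questions/2/second.html" title="Why does my job fail?">Why does my job fail?</a>
      <span>asked by User B</span></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_LIST, "html.parser")
content_list = soup.find("ul", class_="dm-contentList")

questions = []
for item in content_list.find_all("li"):
    link = item.find("a")  # the <a> tag holds the question text
    if link is None:
        continue  # skip list items without a link
    questions.append(link.get_text(strip=True))

for question in questions:
    print(question)
```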

This gives us the list of all 15 questions, along with a lot of other information, as below:

Fig 4. Item details containing question, title holds the information of our question

Now we retrieve each item in the list and pick up the title information. As before, we use the regular expression library to pick the string starting with title.
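One possible way to do this with the `re` module is sketched here. The pattern and the helper `pick_title` are assumptions made for illustration, and the sample strings stand in for the string form of each list item.

```python
import re

# Stand-ins for str(item) of each list item (invented sample data).
SAMPLE_ITEMS = [
    '<li><a href="/questions/1/first.html" title="How to read a table?">How to read a table?</a></li>',
    '<li><span>no link here</span></li>',
    '<li><a href="/questions/2/second.html" title="Why does my job fail?">Why does my job fail?</a></li>',
]

# Capture whatever sits between the quotes of a title="..." attribute.
TITLE_PATTERN = re.compile(r'title="([^"]*)"')


def pick_title(item_html):
    """Return the title attribute value, or an empty string if there is none."""
    match = TITLE_PATTERN.search(item_html)
    return match.group(1) if match else ""


titles = [pick_title(item) for item in SAMPLE_ITEMS]
print(titles)
```

Items without a title yield an empty string, which is why a length check is needed before using the result.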

This gives us the title list as below:

From these, we pick only the ones whose length is not zero, then tokenize each with “<” to avoid duplicate entries, and restrict the list to only 4 items. The reason is simple: the emails in the reference template are short and appealing, and it may not be worthwhile to bombard the user with all the questions available on the SCN page. We then put the question titles in the final variable “itemDetails”.
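The filtering described above might be sketched as follows. The helper name `select_titles` and the sample input are hypothetical, and the explicit duplicate check is our own reading of "avoid duplicate entries"; only the variable name `itemDetails` and the limit of 4 come from the text.

```python
def select_titles(raw_titles, limit=4):
    """Keep non-empty titles, cut each at the first '<', drop repeats, stop at limit."""
    selected = []
    for raw in raw_titles:
        if len(raw) == 0:
            continue  # nothing was matched for this item
        title = raw.split("<")[0].strip()  # keep only the text before any markup
        if title and title not in selected:
            selected.append(title)
        if len(selected) == limit:
            break
    return selected


# Invented sample input with empty entries, trailing markup, and a repeat.
raw = [
    "",
    "Question A<span>Question A</span>",
    "Question A",
    "Question B",
    "",
    "Question C",
    "Question D",
    "Question E",
]
itemDetails = select_titles(raw)
print(itemDetails)
```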

Now we pick the link to each question as well. This is important because we need it to show a short preview of each question in the second step.
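A small sketch of collecting the links is given below. `BASE_URL` is a placeholder, not the real site address, and `collect_links` is a hypothetical helper; relative links are resolved with `urljoin` on the assumption that the page uses relative `href` values.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://example.com/"  # placeholder for the Q&A site (assumption)

# Invented sample list markup.
SAMPLE_LIST = """
<ul class="dm-contentList">
  <li><a href="/questions/1/first.html" title="First">First</a></li>
  <li><a href="/questions/2/second.html" title="Second">Second</a></li>
  <li><span>no link in this item</span></li>
</ul>
"""


def collect_links(list_html, base_url=BASE_URL, limit=4):
    """Return up to `limit` absolute URLs taken from the question links."""
    soup = BeautifulSoup(list_html, "html.parser")
    links = []
    for anchor in soup.select("ul.dm-contentList li a[href]"):
        links.append(urljoin(base_url, anchor["href"]))
        if len(links) == limit:
            break
    return links


question_links = collect_links(SAMPLE_LIST)
print(question_links)
```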

Step 2: Open the questions

In this step we open each of the above four questions and pick up its text details. We already have the links to the questions from the earlier step.
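A hedged sketch of this step follows. The CSS selector `div.question-body` is purely hypothetical (the real page structure has to be inspected first), and `fetch_html` and `extract_preview` are illustrative helper names. The download helper is defined but not called here, so the example runs without network access.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and decode it; shown for completeness, not called below."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_preview(question_html, selector="div.question-body"):
    """Return the question text with whitespace collapsed, or '' if not found.

    The default selector is a hypothetical placeholder.
    """
    soup = BeautifulSoup(question_html, "html.parser")
    body = soup.select_one(selector)
    if body is None:
        return ""
    return " ".join(body.get_text(separator=" ").split())


# Invented sample question page.
SAMPLE_QUESTION = """
<html><body>
  <div class="question-body"><p>Hello,</p><p>my report   fails.</p></div>
</body></html>
"""
preview = extract_preview(SAMPLE_QUESTION)
print(preview)
```

In a full run, each link from the previous step would go through `fetch_html` and the result would be passed to `extract_preview`.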

This gives us the preview of the questions.

Fig 5. Opening the questions and reading the question details to form the email contents

Step 3: Form the Email Content

Now that we have all the information, we have to generate our email content. Let us focus on enhancing the look and feel. The email content is actually HTML, so we create the content accordingly.

We use “Flexbox”, a layout mode introduced in CSS3. It gives the freedom to design a responsive, flexible layout and eases the setup of header, footer and content on the screen (it may not work in IE). We provide the styling inline. Below is how we create the header for the HTML content.
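As an illustration only (not the original code), a header, content and footer arrangement with an inline-styled flex container could be generated like this. The function name, colors and sizes are all assumptions.

```python
from html import escape


def build_email_shell(header_text, content_html, footer_text):
    """Wrap the content in a column-direction flex container with inline styles."""
    container = "display:flex;flex-direction:column;max-width:600px;margin:0 auto;"
    header = "padding:16px;background:#0a6ed1;color:#ffffff;font-size:20px;"
    content = "flex:1;padding:16px;"
    footer = "padding:12px;background:#eeeeee;font-size:12px;"
    return (
        f'<div style="{container}">'
        f'<div style="{header}">{escape(header_text)}</div>'
        f'<div style="{content}">{content_html}</div>'
        f'<div style="{footer}">{escape(footer_text)}</div>'
        "</div>"
    )


email_html = build_email_shell(
    "Latest questions",
    "<p>Question previews go here.</p>",
    "Sent by a Python script",
)
print(email_html)
```

The header and footer text are escaped, while `content_html` is inserted as is because it is expected to be markup already.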

The rest of the code for retrieving the questions, previews etc. remains the same. Let us try to provide images on the screen. There are a few image services available, such as Unsplash and Lorem Picsum. We take Lorem Picsum here and generate a few random images by altering the image number in the image URL.

We use the random library to generate a random image number.
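A sketch of building such a URL is shown below. The `https://picsum.photos/id/<number>/<width>/<height>` pattern is an assumption about the service's URL scheme, which may have changed since the post was written. The upper bound of 100 is arbitrary, and not every image number is guaranteed to exist.

```python
import random


def random_image_url(width=200, height=150, max_id=100, rng=random):
    """Build a Lorem Picsum URL with a randomly chosen image number."""
    image_id = rng.randint(1, max_id)  # inclusive on both ends
    return f"https://picsum.photos/id/{image_id}/{width}/{height}"


image_url = random_image_url()
print(image_url)
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which helps when testing the email layout.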

With a little bit of beautifying, and clipping the text contents to 500 characters, we generate the actual question preview like below:
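The clipping can be sketched with the standard `textwrap.shorten` function, which collapses whitespace and cuts at a word boundary. This is one possible approach rather than necessarily the one used in the original code, and `clip_preview` is a hypothetical helper name.

```python
import textwrap


def clip_preview(text, limit=500):
    """Collapse whitespace and clip the text to at most `limit` characters."""
    return textwrap.shorten(text, width=limit, placeholder=" ...")


long_text = "word " * 300  # 1500 characters of sample text
clipped = clip_preview(long_text)
print(len(clipped))
```

Text that already fits within the limit is returned unchanged apart from the whitespace cleanup.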

We alternate the row backgrounds and add a ‘Read More’ link so that the user can open the question directly.
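A possible way to build such rows is sketched here. The two colors, the helper `build_rows` and the sample data are assumptions; only the alternating backgrounds and the ‘Read More’ link come from the text.

```python
from html import escape

ROW_COLORS = ("#ffffff", "#f2f2f2")  # assumed colors for even and odd rows


def build_rows(questions):
    """Return one HTML block per (title, preview, link), alternating backgrounds."""
    rows = []
    for index, (title, preview, link) in enumerate(questions):
        background = ROW_COLORS[index % 2]
        rows.append(
            f'<div style="background:{background};padding:12px;">'
            f"<h3>{escape(title)}</h3>"
            f"<p>{escape(preview)}</p>"
            f'<a href="{escape(link, quote=True)}">Read More</a>'
            "</div>"
        )
    return "\n".join(rows)


# Invented sample data.
sample_questions = [
    ("Question A", "Preview of A", "https://example.com/questions/1"),
    ("Question B", "Preview of B", "https://example.com/questions/2"),
    ("Question C", "Preview of C", "https://example.com/questions/3"),
]
rows_html = build_rows(sample_questions)
print(rows_html)
```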

As in other professional email notifications, where the subject mostly holds the first item, we set the subject to the first question.

Finally we send that email…
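To tie the last two steps together, here is a minimal sketch using the standard `email` and `smtplib` modules. The addresses and the helper names are placeholders, and the SMTP host, port and login are parameters that must be supplied for a real server. `send_message` is defined but not called, so nothing is actually sent.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, questions, html_body):
    """Create an email whose subject is the first question, with an HTML part."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = questions[0]  # the subject holds the first question
    msg.set_content("\n".join(questions))  # plain-text fallback
    msg.add_alternative(html_body, subtype="html")
    return msg


def send_message(msg, host, port, user, password):
    """Send the message over SMTP with STARTTLS; all arguments are placeholders."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)


message = build_message(
    "sender@example.com",
    "reader@example.com",
    ["Question A", "Question B"],
    "<div>Question previews go here.</div>",
)
print(message["Subject"])
```

Keeping a plain-text part next to the HTML part lets mail clients that do not render HTML still show the question list.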

Fig 6. Email arriving at the user inbox

The questions on the site at that time were as below:

Fig 7. Q&A page and the corresponding email contents

We open the email to read the contents.

Fig 8. Email notification with details

We scroll down to the bottom of the screen as well to verify the content and items.

Fig 9. Email notification adapted on a mobile device.

Result

We formed the HTML contents of the email to look like the popular notifications. We also provided links to read further.

Full code available on GitRepo here.

 

Thanks,

Jakes

 

 

 
