Automate the Boring Stuff with Python
We recently decided to provide In-App help using the xRay framework for one of the tools that my team develops. One of the requirements for this was that the IDs of the controls be stable. According to the xRay documentation, a stable ID is one that:
- Does not begin with “__”.
- Begins with “__”, but contains a stable part after “--”.
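In code, the two rules come down to a small predicate. This is only an illustrative sketch under two assumptions: the separator is the double hyphen "--" (the same one the counting script later in this post looks for), and at least one character must follow it to count as a "stable part". is_stable_id is a hypothetical helper, not part of xRay, and the sample IDs are made up.

```python
import re

# Assumption: the stable part follows "--" and must be at least one character long.
_STABLE_SUFFIX = re.compile(r"--.+")


def is_stable_id(control_id: str) -> bool:
    """Hypothetical helper: True if control_id satisfies the stable-ID rules."""
    control_id = control_id.strip()  # tolerate trailing newlines from files
    if not control_id.startswith("__"):
        # Rule 1: IDs that don't begin with "__" are stable.
        return True
    # Rule 2: IDs that begin with "__" are stable only if a part follows "--".
    return _STABLE_SUFFIX.search(control_id) is not None


if __name__ == "__main__":
    # Made-up sample IDs for illustration
    for sample in ["saveButton", "__button3", "__xmlview0--saveButton"]:
        print(sample, is_stable_id(sample))
```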
We needed to evaluate how many IDs in our tool were stable and how many we’d have to change. I cringed at the thought of reading through the entire source, hunting down the ID of each control, and checking whether it was stable. It was boring stuff. And what needs to be done with boring stuff? It needs to be automated!
I remembered that a friend had given me a book, “Automate the Boring Stuff with Python”. He worked primarily in Python, and he said the book was a really good way to learn the language. I hadn’t used Python for anything substantial; I had only solved a few competitive programming problems that were a little too long to write in C++ but had few-line solutions in Python. I browsed through the book and found it interesting, but I put it down, deciding to pick it up again when I had something tangible to work on in Python, since I learn best when I immediately apply what I read.
That ‘something tangible’ never came up until I had to check these stable IDs. I decided to save the web pages locally, write a script to extract all the IDs, check whether each one was stable (based on the rules given by the xRay team), and then display how many were stable and how many were not.
From the type of problem at hand, it was obvious I would have to use regular expressions, so I read up on Python’s re module and got going. The first script I wrote extracts all the IDs from an HTML page:
import re
import sys

source = sys.argv[1]  # saved HTML page
dest = sys.argv[2]    # file to write the list of IDs to

# Capture the value of every id="..." attribute
pattern = re.compile(r'id="([^"]*)"')

with open(source, "r") as page, open(dest, "w") as idlist:
    for line in page:
        for item in pattern.findall(line):
            # Exclude SAP-added IDs (starting with sap-ui-)
            if not item.startswith("sap-ui-"):
                idlist.write(item + "\n")
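The line-by-line regex works as long as every ID is written as id="..." on a single line. A saved page can also contain single-quoted values or tags that wrap across lines, which that pattern would miss. If that matters, one alternative (a different technique, not what I used here) is the standard library's html.parser.HTMLParser, which hands you each start tag's attributes already parsed. This is a rough sketch: IdCollector and extract_ids are illustrative names, and reading the page as UTF-8 is an assumption.

```python
import sys
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects the value of every id attribute found in start tags."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; quotes are already stripped.
        # Self-closing tags (<input .../>) are routed here by the default
        # handle_startendtag implementation.
        for name, value in attrs:
            # Skip valueless attributes and SAP-added IDs (sap-ui-...)
            if name == "id" and value and not value.startswith("sap-ui-"):
                self.ids.append(value)


def extract_ids(html_text):
    """Return the non-SAP id values in html_text, in document order."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.ids


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumption: the saved page is UTF-8 encoded
    with open(sys.argv[1], "r", encoding="utf-8") as page:
        ids = extract_ids(page.read())
    with open(sys.argv[2], "w", encoding="utf-8") as idlist:
        for item in ids:
            idlist.write(item + "\n")
```

It takes the same two command-line arguments as the regex script, so the output file can be fed to the counting step unchanged.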
I supply the page that I saved as the first command-line argument and the file in which I want to store the list of IDs as the second command-line argument:
$ python parseid.py mypage.html idlist.txt
I then wrote another script to read the file that contains the list of IDs and count the number of stable and unstable ones:
import re
import sys

source = sys.argv[1]  # file containing one ID per line

# Stable IDs:
# - Don't start with __
# - If they start with __, they contain a stable part after --
total = 0
stable = 0
unstable = 0

with open(source, "r") as idlist:
    for control_id in idlist:
        total += 1
        if re.match("__", control_id):
            if re.match(".*--.*", control_id):
                stable += 1
            else:
                unstable += 1
        else:
            stable += 1

print("Stable =", stable)
print("Unstable =", unstable)
print("----------------")
print("Total =", total)
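If you want to reuse the counting logic (say, from another script or a quick test), it can be pulled into functions. This sketch applies the same rules as the script above and tallies the results with collections.Counter; classify and count_stable are hypothetical names, and skipping blank lines is an extra assumption the original script doesn't make.

```python
import re
import sys
from collections import Counter


def classify(control_id):
    """Return "stable" or "unstable" using the same rules as the script above."""
    # Only IDs that start with "__" and have no "--" are unstable
    if re.match("__", control_id) and not re.search("--", control_id):
        return "unstable"
    return "stable"


def count_stable(ids):
    """Tally stable/unstable IDs from an iterable of lines, ignoring blank lines."""
    counts = Counter(classify(i.strip()) for i in ids if i.strip())
    # Counter returns 0 for missing keys, so this works even if one bucket is empty
    counts["total"] = counts["stable"] + counts["unstable"]
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "r") as idlist:
        counts = count_stable(idlist)
    print("Stable =", counts["stable"])
    print("Unstable =", counts["unstable"])
    print("----------------")
    print("Total =", counts["total"])
```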
I supply the file that contains the IDs as a command-line argument:
$ python countstable.py idlist.txt
I got a nice little output like this:
Stable = 535
Unstable = 142
----------------
Total = 677
I just re-ran the script for each of the pages I had to check, and voila! The job was done with 0% boredom, and I ended up learning a little bit of Python. :)
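Re-running the scripts per page is fine for a handful of pages. For more, one option is a single script that takes every saved page on the command line and prints the counts for each. This is a sketch that combines the regex and the rules from the two scripts earlier in this post; summarize_page and the file names in the usage comment are illustrative.

```python
import re
import sys

# Same id="..." pattern as the extraction script
ID_PATTERN = re.compile(r'id="([^"]*)"')


def summarize_page(html_text):
    """Return (stable, unstable) counts for the non-SAP IDs in html_text."""
    stable = unstable = 0
    for item in ID_PATTERN.findall(html_text):
        if item.startswith("sap-ui-"):
            continue  # skip SAP-added IDs, as the extraction script does
        if item.startswith("__") and "--" not in item:
            unstable += 1
        else:
            stable += 1
    return stable, unstable


if __name__ == "__main__":
    # Usage (illustrative file names): python countpages.py page1.html page2.html
    for path in sys.argv[1:]:
        with open(path, "r") as page:
            stable, unstable = summarize_page(page.read())
        print(f"{path}: Stable = {stable}, Unstable = {unstable}, "
              f"Total = {stable + unstable}")
```

Keeping the extraction and counting in one pass means no intermediate ID file, at the cost of not having a list of IDs to inspect afterwards.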
If you ever come across a task that seems repetitive and boring, consider whether you could write a script to automate it. There are many scripting languages you could use; I’d recommend trying Python, because it’s super cool (you don’t need semicolons, and it uses indentation instead of curly braces to delimit blocks).
The book that inspired the title of this post: “Automate the Boring Stuff with Python”.