The Legend of the Headless Chrome-man
Something has been bothering me for the last several weeks. I’ve learned that when I get this itchy feeling, it’s best to pursue it to its end so that I can remove it from my consciousness and move on to other things.
Such is the case of the headless chrome-man.
I’ve been working on a series of blog posts, with an accompanying project, that demonstrates what a successful partner-provided SaaS offering with multitenancy should look like. You can follow my story here:
If you look at the topic of multitenant application development, you quickly realize that it requires interaction with the supporting platform. Multitenant applications are built around the concept of subaccounts existing within a global account on a particular landscape. The multitenant application itself is deployed into a provider subaccount and registers itself with the system so that client subaccounts under the same global account can subscribe to it.
Currently, each subaccount must be created manually through the SAP Cloud Platform Cockpit user interface before it can be subscribed to the multitenant app running in the provider subaccount (again, within the same global account).
This poses a dilemma. What if I want my prospective clients to be able to sign up for my SaaS offering without any manual intervention? I don’t want a human in the workflow slowing things down. I want my clients to start benefiting from my offering immediately.
The manual process:
Currently there is no public API for programmatically managing the lifecycle of subaccounts. The manual process for creating a new subaccount looks like this:
Log into the SAP Cloud Platform Cockpit as a user with administrative rights to the global account.
This goes through an SSO authentication process that eventually lands you on the main page for the region in which the global account is hosted.
Select your global account.
Click the Subaccounts link in the Info section or in the left-side menu bar.
This is the current URL.
Notice that at this point you can create a new subaccount with the “New Subaccount” button.
We can effectively use this as our starting URL: if we aren’t authorized in our current browser session, we’ll be redirected to the SSO sign-in screen and then back to this screen.
What’s important to understand is that once you’re authorized, the browser holds cookies that allow further actions with the server, because it’s maintaining a session context with the server. If you lose (or have never established) a session, you’ll get an unauthorized response. This is why you can’t just use curl and basic auth to simulate these steps.
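To make the session point concrete, here is a small, self-contained sketch using only Python’s standard library. It is not the cockpit’s real login flow: a toy local server hands out a cookie on a hypothetical /login path and refuses a hypothetical /api path without it. A stateless client (think of a one-off curl call) is rejected, while a client that keeps a cookie jar, as a browser does, gets through.

```python
import http.cookiejar
import http.server
import threading
import urllib.error
import urllib.request


class ToySessionHandler(http.server.BaseHTTPRequestHandler):
    """Toy server: /login hands out a session cookie, /api requires it."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        elif self.path == "/api":
            has_session = "session=abc123" in self.headers.get("Cookie", "")
            self.send_response(200 if has_session else 401)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def status_of(opener, url):
    """Return the HTTP status code, including codes urllib raises HTTPError for."""
    try:
        with opener.open(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


server = http.server.HTTPServer(("127.0.0.1", 0), ToySessionHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# Stateless client (proxies disabled): no cookie jar, so no session cookie is sent.
stateless = urllib.request.build_opener(urllib.request.ProxyHandler({}))
without_session = status_of(stateless, base + "/api")

# Browser-like client: the cookie jar stores the cookie from /login and replays it.
jar = http.cookiejar.CookieJar()
browser_like = urllib.request.build_opener(
    urllib.request.ProxyHandler({}), urllib.request.HTTPCookieProcessor(jar)
)
status_of(browser_like, base + "/login")
with_session = status_of(browser_like, base + "/api")

server.shutdown()
server.server_close()
print(without_session, with_session)
```

The real SSO exchange involves redirects and several cookies, but the principle is the same: whoever holds the cookies holds the session.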
Just to show you what the next step looks like, we’ll click the button and are presented with the following popover dialog box. There are a few mandatory text fields to fill in and a few combobox fields that let you select the environment, provider, and region. Depending on your choices, the dialog changes to show a mandatory subdomain field. As long as the subdomain is unique to the landscape, clicking the “Create” button will result in a successfully created new subaccount.
Easy right? Now how would you go about automating that set of actions in a browser???
Intelligent Robotic Process Automation:
My first inclination is to bring out the SAP toolbox and see what I can wield to accomplish the task.
Intelligent Robotic Process Automation (iRPA) seems like it might fit the bill nicely.
iRPA does its work by controlling Windows-based applications (including browsers): it understands each application’s structure and uses mechanisms provided by Windows to replicate user actions. Once a set of tasks is defined, it’s uploaded to an SAP Cloud Platform component that manages and schedules the jobs (tasks).
In addition, it can work in a user-centric way, guiding a Windows user to automate portions of their work and assisting with hints and desktop integration with AI agents.
While I think iRPA could accomplish what I need, it seems like a lot to set up a whole Windows desktop environment and keep it available 24/7 just to do something as simple as creating a subaccount. Also, as of this writing there is no API for triggering an iRPA job programmatically, though I’m told this is a feature of an upcoming release. Be sure to check the iRPA roadmap to see if this has come to pass. Stay tuned, as I may revisit this in a future blog post.
Just a POST:
But wait: we’re talking about a process that happens in a browser, and don’t most modern web apps use a framework that relies on some sort of RESTful interface to the backend anyway? Why can’t we just inspect the POST and replicate it in our application?
Let’s go back to the point of creating a subaccount described above and open the Chrome developer tools window. Notice I’ve switched to the Network tab and cleared any existing connections. Now let’s watch what happens when we press the “Create” button.
A bunch of requests were made, but if we scroll back up to the top and select the first one we see that indeed it was a POST.
If we scroll the request details down to the Request Headers section, we can click the “View Source” link to see what was sent. We can already see that it’s sending some JSON.
Now we can inspect the request payload. Aren’t the debug tools handy?
Now we can indeed see the full JSON payload that was sent.
The request got a 202 (Accepted) response, so we know the server accepted the request and started creating the subaccount. We can also pick out the URL used in the AJAX request from the Request URL field above.
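For illustration, here is roughly what replaying that captured request from Python could look like, using only the standard library. Everything specific is a placeholder: the URL stands in for the Request URL you copy from the Network tab, the two JSON fields are hypothetical (copy the real payload instead), and the cookie value is whatever the browser sent. The sketch only builds the request; actually sending it runs into the session problem described next.

```python
import json
import urllib.request

# Placeholders: substitute the Request URL, JSON payload and Cookie header
# captured in the developer tools. These field names are NOT the cockpit's
# real schema, just stand-ins.
CAPTURED_URL = "https://example.invalid/captured-request-url"
captured_payload = {"displayName": "anewclient", "subdomain": "anewsubdomain"}
captured_cookies = "JSESSIONID=value-copied-from-devtools"


def build_replay_request(url, payload, cookie_header):
    """Build (without sending) a POST that mimics the browser's AJAX call."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Cookie": cookie_header},
        method="POST",
    )


request = build_replay_request(CAPTURED_URL, captured_payload, captured_cookies)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it, but only while the copied
# session cookies are still valid.
```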
So we have everything we need, right? Well, not exactly. Remember when I said you need a session context for this to work? That context is stored in the browser’s cookies, and while we can see them here, capturing them for use in your application won’t work because they will expire soon enough.
You could find a library in your application’s language that handles the SSO exchange to create the session context. However, unless your application is registered to participate in such exchanges, that may not work either. When we use a browser, it’s the browser that makes the requests and establishes the SSO context. But what if we could remote-control a real browser programmatically?
Enter the Headless Browser:
It turns out that folks who do automated testing of code, websites, and mobile applications have had this need for a long time. There are various tools that can do this; 10 years ago I built a site that scraped insurance application data from websites using a tool called PhantomJS.
Selenium is just such an automated testing tool, and it has a Python interface, so it seems like a good fit for my needs.
Since we are developing a SaaS offering that runs in the SAP Cloud Platform Cloud Foundry environment, I wanted to build this as a module of my larger multitenant application.
In order to include a copy of the Chrome browser, the driver that controls it, and the Python libraries that provide the module’s web server interface, I couldn’t just use the Python buildpack. I know there are ways to customize your own buildpack and/or apply multiple buildpacks to the same module to build up the required set of components, but I decided to take the Docker approach. You can push a Docker image as a module in a Cloud Foundry multi-target application as long as it conforms to the expected behavior. I’ve provided the Docker project here:
Upon build and again at startup, it clones/pulls a Git repo containing the Python code.
Should you use the Docker project, you will need to edit the Dockerfile and replace
git clone https://github.com/alundesap/module_headless.git
with the URL of your own copy of the module repo.
Currently this module runs stand-alone, but I will be incorporating it into a larger multi-target app and turning on security so that users must authenticate in order to call its functions. The full app repo is at:
Security Note: Currently the credentials of the user driving the headless browser are hard-coded in the server.py module. In order for this to work in your account, you’ll need to substitute credentials that have administrative privileges to create a subaccount in your global account. In a real situation you would want to store them securely. I will eventually use the HANA secure-store service for this.
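Until a secure store is wired in, a common interim step on Cloud Foundry is to keep the credentials out of the code entirely. The sketch below is an assumption, not what server.py does today: it reads them from a user-provided service bound to the app (Cloud Foundry exposes bound services as JSON in the VCAP_SERVICES environment variable) and falls back to plain environment variables. The service name headless-creds and the variable names HEADLESS_USER and HEADLESS_PASSWORD are hypothetical.

```python
import json
import os


def load_credentials(service_name="headless-creds"):
    """Return (user, password) for the headless browser without hard-coding them.

    Looks first for a bound user-provided service in VCAP_SERVICES, then falls
    back to the (hypothetical) HEADLESS_USER / HEADLESS_PASSWORD variables.
    """
    vcap = json.loads(os.environ.get("VCAP_SERVICES", "{}"))
    for service in vcap.get("user-provided", []):
        if service.get("name") == service_name:
            creds = service.get("credentials", {})
            return creds["user"], creds["password"]
    return os.environ["HEADLESS_USER"], os.environ["HEADLESS_PASSWORD"]
```

The service itself could be created with cf create-user-provided-service headless-creds -p '{"user":"...","password":"..."}' and then bound to the module.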
I’m not going to get into all the details of what it took to make this all work. Inspect the Dockerfile to see how I download and install Chrome and the chromedriver that Selenium requires.
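For orientation, starting headless Chrome from Selenium inside a container usually looks something like the sketch below. The switches are commonly used Chrome flags for containers, not necessarily the exact ones in this project, and make_driver assumes chrome and chromedriver are already installed and on the PATH. The Selenium import sits inside the function so the flag list can be checked without a browser.

```python
def chrome_arguments(width=1280, height=1024):
    """Command-line switches commonly used for Chrome inside a container."""
    return [
        "--headless",                        # no display server in the container
        "--no-sandbox",                      # Chrome refuses to run as root without it
        "--disable-dev-shm-usage",           # Docker's default /dev/shm is small
        f"--window-size={width},{height}",   # viewport size used for screenshots
    ]


def make_driver():
    """Start headless Chrome (assumes chrome and chromedriver are on the PATH)."""
    from selenium import webdriver  # lazy import: chrome_arguments works without Selenium

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments():
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```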
Turning our attention to the module, I’ll paste some snippets here with comments so you can see how the Selenium commands map to the manual process described above.
I’ve defined a route (path) called /headless/chrome that my Python module responds to. It triggers the whole process: loading the cockpit page, logging into the SSO endpoint, and performing the create-subaccount task. When finished, it shows this page with links to the captured pages.
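The full server.py isn’t shown here, so as a rough idea of the shape of such a route, here is a standard-library sketch (the real module may well use a web framework instead). run_create_subaccount is a hypothetical stand-in for the Selenium steps below; it returns the screenshot file names, and the handler answers /headless/chrome with a page linking to them. Serving the image files themselves is left out.

```python
import html
import http.server
import os


def run_create_subaccount():
    """Hypothetical stand-in for the Selenium steps; returns screenshot names."""
    return ["page01.png", "page02.png"]


def links_page(filenames):
    """Render a minimal HTML page linking to each captured screenshot."""
    items = "".join(
        f'<li><a href="/pages/{html.escape(name)}">{html.escape(name)}</a></li>'
        for name in filenames
    )
    return f"<html><body><h1>Captured pages</h1><ul>{items}</ul></body></html>"


class HeadlessHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/headless/chrome":
            self.send_error(404)
            return
        body = links_page(run_create_subaccount()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Listen on the port Cloud Foundry assigns through the PORT variable."""
    port = int(os.environ.get("PORT", "8080"))
    http.server.HTTPServer(("", port), HeadlessHandler).serve_forever()
```

Calling serve() from the module’s start command would make GET /headless/chrome run the job and return the links page.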
Since all of this will be happening inside some process running on Cloud Platform, how will we know if it’s going as we expect? In order to facilitate some inspection of the inner workings I’ve added a static page of links that point to screen grabs that are captured through the process. This way we can use our browser to “see” what’s happening.
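The snippets below repeat driver.get_screenshot_as_file with hand-numbered file names. A small helper, hypothetical and not part of the original module, could do the numbering and remember what it saved, which is the list a page of links needs. It works with anything that has Selenium’s get_screenshot_as_file(path) method.

```python
import os


class PageRecorder:
    """Save numbered screenshots (page01.png, page02.png, ...) of each step."""

    def __init__(self, driver, pages_dir="/root/app/pages"):
        self.driver = driver        # anything with get_screenshot_as_file(path)
        self.pages_dir = pages_dir
        self.count = 0
        self.saved = []             # file names in capture order

    def snap(self):
        """Capture the current page under the next number and return its name."""
        self.count += 1
        name = f"page{self.count:02d}.png"
        self.driver.get_screenshot_as_file(os.path.join(self.pages_dir, name))
        self.saved.append(name)
        return name
```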
Here is the Python Selenium code loading the first page.
driver.get('https://account.us1.hana.ondemand.com/cockpit/#/globalaccount/aTeam/subaccounts')
driver.get_screenshot_as_file('/root/app/pages/' + 'page01.png')
And the resulting page after we’ve waited for everything to load.
If you’re paying attention, you’re probably thinking that this isn’t the page we asked for. Indeed, the headless browser isn’t logged in yet, so the SSO exchange sends us to its login page. Now we need to find the DOM elements for the user and password fields and fill them in.
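Pages like this login screen and the cockpit can keep rendering after driver.get returns, so it helps to wait for the element you need rather than sleep for a fixed time. Selenium ships WebDriverWait for this; the plain-Python sketch below shows the same polling idea so it can be tried without a browser. find is any callable, for example one wrapping driver.find_elements.

```python
import time


def wait_for(find, timeout=30.0, interval=0.5):
    """Poll find() until it returns something truthy or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found after {timeout} seconds")
        time.sleep(interval)
```

For example, password = wait_for(lambda: driver.find_elements('id', 'j_password'))[0] waits for the password field, since find_elements returns an empty list until the element exists. With Selenium’s own helper the equivalent is WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.ID, 'j_password'))), where EC is selenium.webdriver.support.expected_conditions.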
# email is the user-name input on the SSO page, located with find_element_by_id like the fields below
email.send_keys('email@example.com')
password = driver.find_element_by_id('j_password')
password.send_keys('Xxxx###!')
login = driver.find_element_by_id('logOnFormSubmit')
driver.get_screenshot_as_file('/root/app/pages/' + 'page02.png')
Here’s the screenshot prior to clicking the “Log On” button. Notice the password is hidden. Remember we’re actually remote controlling a real browser and it behaves as we might expect.
Now we trigger the logon.
login.click()
driver.get_screenshot_as_file('/root/app/pages/' + 'page03.png')
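Wrapped up as a function, the logon step might look like this sketch. It calls driver.find_element('id', ...), which is equivalent to find_element_by_id and keeps working in Selenium 4, where the find_element_by_* helpers were deprecated and later removed. The IDs j_password and logOnFormSubmit come from the snippet above; the user-name field’s ID isn’t shown there, so it is a parameter you look up yourself. Credentials are passed in rather than hard-coded.

```python
def log_on(driver, user_field_id, user, password,
           password_id="j_password", submit_id="logOnFormSubmit"):
    """Fill in the SSO form and submit it.

    user_field_id: ID of the e-mail/user input, found with the developer tools.
    """
    driver.find_element("id", user_field_id).send_keys(user)
    driver.find_element("id", password_id).send_keys(password)
    driver.find_element("id", submit_id).click()
```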
And finally see the page we’re expecting.
Now we need to find the identity of the “New Subaccount” button and click it.
BTW, I’ve found the best way to identify the DOM element is to follow the same steps in a browser on my desktop and use the Chrome developer tools window.
Then go down into the Elements section, right-click the highlighted element, and select “Copy selector”.
Now we can use this as the target of the find_element_by_id call. For an element with an ID, the copied selector is just that ID prefixed with #, so drop the #.
addSubaccount = driver.find_element_by_id('__jsview1--addSubAccount')
addSubaccount.click()
driver.get_screenshot_as_file('/root/app/pages/' + 'page04.png')
Now we see the popover dialog prompting us to enter the details. The text input fields are pretty straightforward but the combobox presents a bit of a challenge. You need to first figure out what to click and then which item in the list needs to be clicked.
# Fill in the two mandatory text fields
displayName = driver.find_element_by_id('CreateNewSubAccountDialog--displayName-inner')
displayName.send_keys('anewclient')
description = driver.find_element_by_id('CreateNewSubAccountDialog--description-inner')
description.send_keys('This is a new client subaccount so that it can be subscribed to our provider app.')
# Open the Environment combobox, then click the entry in its drop-down list
environmentsComboInput = driver.find_element_by_id('CreateNewSubAccountDialog--environmentsCombo')
environmentsComboInput.click()
environmentsComboSelect = driver.find_element_by_id('__item7-CreateNewSubAccountDialog--environmentsCombo-1')
environmentsComboSelect.click()
driver.get_screenshot_as_file('/root/app/pages/' + 'page05.png')
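The pattern here (type into an input, or click a combobox and then the list item that appears) repeats for every field, so it could be factored into two small helpers. This is a sketch, not the module’s code; generated IDs such as __item7-... may differ in your own session, so look them up the same way.

```python
def fill_text_fields(driver, values):
    """Type each value into the input whose element ID is the dict key."""
    for element_id, text in values.items():
        driver.find_element("id", element_id).send_keys(text)


def pick_combo_item(driver, combo_id, item_id):
    """Open a combobox, then click the list item that appears in its drop-down."""
    driver.find_element("id", combo_id).click()
    driver.find_element("id", item_id).click()
```

For example, pick_combo_item(driver, 'CreateNewSubAccountDialog--environmentsCombo', '__item7-CreateNewSubAccountDialog--environmentsCombo-1') reproduces the environment selection above.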
This triggers some logic on the page that reveals an additional section of the form, including an additional text field called subdomain.
Similar logic is used for selecting the provider and region.
Now we can finish by filling in the subdomain field and triggering a click of the “Create” button.
from selenium.webdriver.common.keys import Keys  # needed for Keys.TAB below

subdomain = driver.find_element_by_id('CreateNewSubAccountDialog--subdomain-inner')
subdomain.send_keys('anewsubdomain')
subdomain.send_keys(Keys.TAB)  # move focus out of the field, as a user would
driver.get_screenshot_as_file('/root/app/pages/' + 'page08.png')
createButton = driver.find_element_by_id('__button11')
createButton.click()
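Because any of these steps can fail (a changed ID, a slow page), it’s worth making failures visible and cleaning up after them. The sketch below is a hypothetical wrapper, not the original code: it runs named steps in order, saves a screenshot named after the step that failed, and always calls driver.quit() so no Chrome process is left running in the container.

```python
import os


def run_steps(driver, steps, pages_dir="/root/app/pages"):
    """Run (name, function) steps in order; each function receives the driver.

    On the first failure, save failed-<name>.png and return (name, error).
    Return None if every step succeeds. The driver is always quit.
    """
    try:
        for name, step in steps:
            try:
                step(driver)
            except Exception as error:  # any Selenium or page error
                driver.get_screenshot_as_file(os.path.join(pages_dir, f"failed-{name}.png"))
                return name, error
        return None
    finally:
        driver.quit()
```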
And magically the request to create the subaccount is performed. We can see that the “waiting” screen is displayed…
Now if we go back to our browser and refresh the screen (after some delay) we will see our created subaccount.
Fragility vs. Availability:
One of the biggest concerns you will face with this approach is that it is admittedly fragile. If the producers of the SAP Cloud Platform Cockpit change how a subaccount is created, or the UI designers change the IDs of the elements we’re interacting with, this method will fail; so can simple network latency or site unavailability. I would only promote this approach as a last resort when no other viable API is available to perform the needed actions. An API is, by definition, a much more stable interface than the UI of a browser-based application. You can mitigate the risk by running nightly tests to check that the expected results occur, but even then you will need to actively maintain the code that drives this process. Once an appropriate API is available, I would recommend swapping out this logic for calls to that API.
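One way to run such a nightly check, sketched here under the assumption that the driver has already been walked to the open “New Subaccount” dialog, is to confirm that the element IDs the automation relies on still exist, so a cockpit change shows up as a failed check rather than a failed client sign-up. The ID list comes from the snippets above; DIALOG_IDS and missing_ids are hypothetical names.

```python
# Element IDs the dialog steps rely on (taken from the snippets above).
# Generated IDs such as __button11 are the most likely to change between releases.
DIALOG_IDS = [
    "CreateNewSubAccountDialog--displayName-inner",
    "CreateNewSubAccountDialog--description-inner",
    "CreateNewSubAccountDialog--environmentsCombo",
    "__button11",
]


def missing_ids(driver, element_ids):
    """Return the IDs that are no longer present on the current page."""
    # find_elements returns an empty list (rather than raising) when nothing matches.
    return [eid for eid in element_ids if not driver.find_elements("id", eid)]
```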
That being said, I also think that having more tools in your tool belt is better than not. Waiting for all the right tools and platforms to become available may delay your efforts such that you miss your market opportunity.
And if you miss your market opportunity you’re out of the game…