Building Resilient Apps on SAP Cloud Platform
Some time ago I came across a blog that pointed to some SAP help pages about Developing Resilient Apps on SAP Cloud Platform. When I saw the title I was immediately drawn to it. Building apps that are resilient is a really good objective, but it made me think: what does resilience actually mean? I started reading the blog and followed through to the PDF document that SAP supplied, and I found the information interesting. I like technical information as much as the next person, but this went to a level of detail that even I found a bit too much. I have bookmarked it though, as I do believe it is important to keep in mind, especially if you are part of a team that is tasked with developing applications. It talked about the Resilient Software Design concept of Availability and offered the following.
“Availability is defined in the following equation: A := MTTF / (MTTF + MTTR)”
Figure:1 Elements of the Availability Equation (source: SAP)
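To make the equation a little more tangible, here is a minimal sketch in TypeScript. MTTF and MTTR are commonly expanded as mean time to failure and mean time to repair (or recovery); the function name and the sample figures are my own assumptions, not taken from the SAP document.

```typescript
// Availability A = MTTF / (MTTF + MTTR), as quoted above.
// Both inputs must use the same unit (hours here, by assumption).
function availability(mttfHours: number, mttrHours: number): number {
  if (mttfHours < 0 || mttrHours < 0 || mttfHours + mttrHours === 0) {
    throw new RangeError("MTTF and MTTR must be non-negative and not both zero");
  }
  return mttfHours / (mttfHours + mttrHours);
}

// Hypothetical figures: one failure every 999 hours on average, 1 hour to recover.
const exampleAvailability = availability(999, 1);
console.log(`${(exampleAvailability * 100).toFixed(1)}%`); // 99.9%
```

The takeaway is that availability improves either by failing less often (a larger MTTF) or by recovering faster (a smaller MTTR).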
Not sure how many designers or developers out there are following these guidelines but just in case you are interested I have included the link to the full document here https://help.sap.com/doc/6a87ef93fe554dde919548e9c4c86299/Cloud/en-US/resilient_apps.pdf.
When I read the content, I agreed that the information in the links was good to know and probably good as a guide, but I wanted to provide my own idea of what resilience means when it comes to developing applications on SAP Cloud Platform. I wanted a more practical, everyday set of guidelines that teams could follow, and that was the inspiration for this blog post.
Most of the below is targeting the building of mobile applications however they are guidelines that fit development of most applications whether they be native apps, hybrid apps or web applications. My concept of resilience is based on my experiences over the last few years being part of a development team that has created over 50 Fiori applications in SAP Cloud Platform. Some of my experiences also relate to a recent big project I have been on to deliver SAP Warehouse Management mobility applications on multiple devices with varying levels of complexity.
But before I go into the detail – perhaps a definition of resilience. I googled software application development sites across the spectrum that related to development activities and found multiple versions of the same sort of message.
“Resiliency is the ability of a system to recover from failures and continue to function.”
Most talked about a “system” not necessarily an application although they could be considered the same thing.
Resilience – My Layman’s View
From my perspective the following dot points represent my thoughts, a layman’s view, on what resilient applications should look like. I will refer to this as DREMTAC as everything is an acronym these days :-).
- Dependable – operates in the same fashion on all devices at all times
- Reliable – does what it is expected to do over and over again without fail
- Efficient – performs the function as efficiently and as quickly as possible
- Maintainable – built in a way that is easy to support and maintain. Ideally it is also flexible enough to take on new functionality without rewriting the codebase.
- Transparent – provides all of the necessary information to the user in order to make appropriate decisions
- Accurate – represents information from objects exactly as it is kept in source systems
- Consistent – produces the same results based on the same inputs provided and within the same time frame
When I think of the native applications I use on a daily basis on my device of choice, most conform to the above resilience factors. When I send a friend a message using Messages or WhatsApp it always gets there, and a few back-and-forth messages prove the reliability. I use Slack on multiple devices (on a desktop, on my iPhone and occasionally on my tablet) and I get the same consistent experience, and a dependable one at that. Applications that do not have these attributes are, I would suggest, used only for a short time as they pretty quickly turn the user off. My point is that most of the time users expect resilience when they go about their business, and as software developers we are trusted with providing this as a bare minimum. So, how do we do this? I will now go through some of what I have come across that may assist with this. One of the major elements, I believe, is testing. Rigorous testing often leads to apps being extremely resilient as they have been pummeled with scenarios.
Testing – Include the Unhappy Path
When developing applications, people in general tend to focus on, design and develop for the happy path. There is often very little thought that goes into what takes place when the path is not happy. Most of the projects I have been involved in tend to cover unit test scenarios that work: going through the steps leading to successful outcomes, where the user enters everything as they should, attaches the correct file types, enters the right data, selects the right options, navigates correctly in one direction and fills out all of the mandatory fields. Happy Path! Unfortunately, this does not lead to building software applications that are resilient. The applications are not robust or scalable, and more often than not, as soon as a few more scenarios are tested, a mountain of defects is uncovered. At this stage you may think this is OK because it is early days, however this sort of development practice will catch up with you, and very quickly.
So, how do we build resilience when designing, building and testing applications? In my simple terms I ask a simple question – what if?
- What if the user tries to attach a Word document instead of the required PDF type?
- What if the user tries to attach multiple documents when they should only upload a single one?
- What if the user does not enter the fields they need to? Should the user be guided through the entry process?
- What if the user enters alphas in a number field?
- What if the user enters more characters in an entry field than should be entered?
- What if the user hits the Submit button twice – quickly?
- What if users try and back out of a function that has only performed half of what it needs to – should they be allowed to navigate out?
- What if the user rotates the device? Will information render in the same way or should the device be locked?
- What if the user submits and the function is performed, should the user be alerted that something has taken place?
All of these questions and more should be asked to gain knowledge on what areas need to be designed for, developed and tested and there is seriously no limit on this. The only limit would be time.
The more negative testing that is included when testing applications, the better and more resilient the application will be. Follow the unhappy path to more resilience!
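As a small illustration of turning "what if?" questions into checks, here is a sketch in TypeScript that covers the first two attachment questions above. The `Attachment` shape, the function name and the message texts are all hypothetical; a real application would wire the same idea into its upload control and test framework.

```typescript
// Hypothetical attachment descriptor; only the fields needed for the checks.
interface Attachment {
  fileName: string;
  mimeType: string;
}

// Returns a list of problems; an empty list means the upload is acceptable.
function validateAttachments(files: Attachment[]): string[] {
  const problems: string[] = [];
  if (files.length === 0) {
    problems.push("Please attach a PDF document.");
  }
  if (files.length > 1) {
    problems.push("Only one document can be attached.");
  }
  for (const file of files) {
    if (file.mimeType !== "application/pdf") {
      problems.push(`${file.fileName} is not a PDF.`);
    }
  }
  return problems;
}

// One happy-path case followed by three unhappy-path cases.
const pdf: Attachment = { fileName: "delivery.pdf", mimeType: "application/pdf" };
const word: Attachment = {
  fileName: "delivery.docx",
  mimeType: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
};

console.log(validateAttachments([pdf]));      // no problems
console.log(validateAttachments([word]));     // wrong file type
console.log(validateAttachments([pdf, pdf])); // too many documents
console.log(validateAttachments([]));         // nothing attached
```

Note that three of the four cases are unhappy ones, which is roughly the ratio I would expect in a test plan that takes the unhappy path seriously.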
Resilient Application Development
As well as rigorous testing there are some additional practical elements that, if dealt with appropriately, will lead to more resilient applications being developed. Most of these were experienced in a recent project I have been involved in. The idea behind this list is to think about them way up front, when the UX designers and Business Analysts are detailing the requirements, and to include them in any high-fidelity design prototypes. Thinking about all of these concepts up front is difficult, so some of them will still need to be dealt with in the development phase. The areas below are meant to provide some guidance. They are by no means exhaustive but hopefully they help.
Cursor control
- When a user clicks on a tile to run the application, where does the cursor go?
- When the user enters or scans information in that field, where does the cursor go then?
- Also, once scanned is there an automatic carriage return to confirm the entry?
- Does a validation occur at this time?
- When multiple entry fields are included then the cursor must jump from field to field after the relevant information has been entered or scanned.
This was easier said than done and required a lot of thought during the development of the applications. When developing applications, especially mobile ones, the placement and behaviour of the cursor is important. It saves time for users because they do not need to click into the field before entering data: the cursor is already in the field and ready for input.
Here is an example in an application when a Bin number has been scanned and the cursor is now waiting for the next user input.
Figure:2 Cursor Control – Crucial for Resilient Applications
Actively thinking about where cursors need to be placed through the application and the navigation aspects ultimately reduces development times and provides more resilient, robust applications.
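One way to keep the cursor rules in a single place is a small helper that decides which field should receive focus next. The sketch below is TypeScript with a hypothetical field model; it is not how our project implemented it, just an illustration of the "jump to the next empty field" rule.

```typescript
// Hypothetical field model: fields are listed in the order the user fills them.
interface EntryField {
  id: string;
  value: string;
}

// After a scan or entry in `currentId`, return the id of the next empty field,
// or null when every field is filled (the caller might then focus Submit).
// If `currentId` is not found, the first empty field in the list is returned.
function nextFieldToFocus(fields: EntryField[], currentId: string): string | null {
  const start = fields.findIndex((f) => f.id === currentId);
  // Check the fields after the current one first, then wrap around.
  const ordered = [...fields.slice(start + 1), ...fields.slice(0, start + 1)];
  const next = ordered.find((f) => f.value.trim() === "");
  return next ? next.id : null;
}

const fields: EntryField[] = [
  { id: "bin", value: "01-02-03" },
  { id: "material", value: "" },
  { id: "quantity", value: "" },
];
console.log(nextFieldToFocus(fields, "bin")); // material
```

The calling code would then move the cursor to the returned field, so the user never has to tap into it first.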
Button enablement
- At which point in time should buttons be active – thereby allowing the user to press them?
- If users remove contents from a field and change their mind, the button (that was active) may need to be changed to Not Active so that the user cannot press it and cause an error.
- Is there an ability for users to press the Submit button a number of times? If so, this more than likely will cause errors.
Determining when buttons are active should ideally be part of UX design activities and should definitely be included in high-fidelity prototypes. This gives users the look and feel of how the application will operate and shows under what conditions buttons can be actioned. Once that prototype is agreed, it leads to more resilient applications.
During the most recent project we needed to determine the conditions under which action buttons should be enabled. In one instance, in our Block Bin application, we display the Reason code and the user must select one. Only when a reason code is selected and populated should the Submit button be active so that the user can press it. Thought needs to be given in advance to when buttons are active. For example, if the reason code is then removed by the user, the Submit button should be set back to Inactive. If this is not carried out and the user actions the Submit button then errors will occur, eating away at the Dependable and Reliable aspects of what I called Resilience.
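For the question about pressing Submit a number of times, one common defensive technique is to ignore presses while a previous submit is still running. The following TypeScript sketch shows the idea; `submitOnce` and `postDocument` are hypothetical names, and this is an illustration rather than what our project shipped.

```typescript
// Wraps an async action so that presses arriving while a previous call is
// still running are ignored instead of sending a second request.
function submitOnce<T>(action: () => Promise<T>): () => Promise<T | undefined> {
  let inFlight = false;
  return async () => {
    if (inFlight) {
      return undefined; // second press while busy: do nothing
    }
    inFlight = true;
    try {
      return await action();
    } finally {
      inFlight = false; // allow a new submit once the first has finished
    }
  };
}

// Stand-in for the real backend call; it only counts how often it runs.
let postCount = 0;
const postDocument = async (): Promise<string> => {
  postCount += 1;
  return "posted";
};

const onSubmitPress = submitOnce(postDocument);
// Two quick presses: only the first one reaches the backend stand-in.
void Promise.all([onSubmitPress(), onSubmitPress()]).then(() => {
  console.log(postCount); // 1
});
```

Disabling the button while the request is in flight achieves the same thing visually; the guard above simply makes sure a stray second press cannot slip through.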
Here is an example where the button is Not active because the relevant input has not been entered. You can see that the Submit button cannot be selected because the Material number needs to be scanned or entered.
Figure:3 Button Enablement – Not Active due to missing fields
Once the necessary field contents are populated the button will then become active.
Figure:4 Button Enablement – Active as all expected fields populated
It is important to note that if the user navigates back up to the Material number or Pick Quantity fields and removes the contents the Submit button should then go back to being Inactive.
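The rule described above can be expressed as one small predicate that is re-evaluated on every field change. This is a minimal TypeScript sketch with hypothetical field names based on the screens shown; in a real Fiori app the result would typically drive the button's enabled state.

```typescript
// Hypothetical form state for the picking screen; field names are illustrative.
interface PickForm {
  materialNumber: string;
  pickQuantity: string;
}

// Submit is enabled only while every required field currently has content.
// Calling this after every change means clearing a field disables the button again.
function isSubmitEnabled(form: PickForm): boolean {
  return form.materialNumber.trim() !== "" && form.pickQuantity.trim() !== "";
}

const form: PickForm = { materialNumber: "", pickQuantity: "" };
console.log(isSubmitEnabled(form)); // false: nothing entered yet

form.materialNumber = "MAT-1001";
form.pickQuantity = "5";
console.log(isSubmitEnabled(form)); // true: all expected fields populated

form.materialNumber = "";
console.log(isSubmitEnabled(form)); // false: the user removed the material number
```

Keeping the rule in one function also makes it easy to cover with negative tests.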
Thinking about this as early on in the project as possible (especially in the development phase) will save time later on when dealing with defects that are raised because the application does not deal with these situations.
Entry field validations
- How many characters can be entered into a field?
- Is the field numeric, and if so does it need to be restricted so that only numbers can be entered?
- If the field relates to a backend object, what happens if the user enters more characters than the backend field allows?
- Can the field have decimal places – if so how many and how should this be catered for?
If these questions are not answered, then usually, and trust me from experience, the dreaded termination occurs where the entire OData service abends, which is a really bad user experience. If users receive this message during testing it can greatly impact confidence levels, so it is best to think about these things in advance. These are only some of the questions that need thought when handling validations on entry fields; there are many more.
While one part is actually determining the validations that need to be made, the other part is determining what is shown to the user when a validation result comes back. What is shown to the user can also be different depending on what the result is. Here are a few examples of what I am conveying here.
Figure:5 Entry Field Validations – Incorrect format
This example shows the response to the user if the input entered is in the wrong format. A simple toast message is applied to let the user know that what was entered does not conform to the format required. Note, the additional masking text included in the field to assist the user with this. When designing and developing applications this element provides resilience in the form of transparency. The user knows up front how the field contents need to be entered.
Figure:6 Entry Field Validations – Valid Input but not valid for processing
This example shows the response to the user if the input entered is in the correct format however cannot be processed as it does not meet the criteria for processing – in this case a Goods Receipt. When designing and developing applications this element provides resilience in the form of consistency. If the user was allowed to process documents that were invalid they would not get the same consistent results so in these cases the user should be stopped from progressing.
Figure:7 Entry Field Validations – Invalid Data Entered
This example shows the response to the user if the input entered is in the correct format but is invalid. Normally in these cases errors occur when a value is entered that is not contained in the master table. In the above case, the Storage Bin entered (while meeting the field format) did not actually exist. When designing and developing applications this element provides resilience in the form of reliability. If the user were allowed to carry on and process a Storage Bin that did not exist, the application could not reliably process the outcome. Additionally, the application would have wasted the user's time by letting them carry on through the process, when a more transparent approach would have produced a better result.
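The three examples above can be seen as three layers of one validation: format, existence and suitability for processing. Here is a TypeScript sketch of that layering for a Storage Bin. The bin format, the `BinChecks` lookups and the message texts are assumptions made up for illustration; real checks would call the backend.

```typescript
type ValidationResult =
  | { ok: true }
  | { ok: false; reason: "format" | "unknown" | "notProcessable"; message: string };

// Hypothetical lookups standing in for backend calls.
interface BinChecks {
  exists(bin: string): boolean;
  canProcess(bin: string): boolean;
}

// Assumed format for illustration only: three two-digit groups, e.g. "01-02-03".
const BIN_FORMAT = /^\d{2}-\d{2}-\d{2}$/;

function validateStorageBin(input: string, checks: BinChecks): ValidationResult {
  const bin = input.trim();
  if (!BIN_FORMAT.test(bin)) {
    return { ok: false, reason: "format", message: "Enter the bin as NN-NN-NN." };
  }
  if (!checks.exists(bin)) {
    return { ok: false, reason: "unknown", message: `Storage Bin ${bin} does not exist.` };
  }
  if (!checks.canProcess(bin)) {
    return { ok: false, reason: "notProcessable", message: `Storage Bin ${bin} cannot be processed.` };
  }
  return { ok: true };
}

// In-memory stand-ins for the master data.
const knownBins = new Set(["01-02-03", "01-02-04"]);
const blockedBins = new Set(["01-02-04"]);
const checks: BinChecks = {
  exists: (bin) => knownBins.has(bin),
  canProcess: (bin) => !blockedBins.has(bin),
};

console.log(validateStorageBin("1-2-3", checks));    // fails the format check
console.log(validateStorageBin("09-09-09", checks)); // valid format, unknown bin
console.log(validateStorageBin("01-02-04", checks)); // exists, not processable
console.log(validateStorageBin("01-02-03", checks)); // passes all three checks
```

Returning a reason alongside the message lets the UI decide how to present each case, for example a toast for a format problem and a blocking message for the others.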
Quantity validations
- Can negative quantities be entered?
- Can zero quantities be entered?
- Can a quantity greater than expected be entered?
- Does the quantity have to match expected?
- Can the field have decimal places – if so how many and how should this be catered for?
This is definitely one to work out in the UX design phase. Being clear on the quantities that can be entered as well as the validations that need to occur greatly assists with development activities around resilience and of course with the timeline :-). Transparency once again is a key in this category with the following examples covered.
Figure:8 Quantity Validations – Negative quantities
This example shows where a user has entered a negative number. Now you may say that surely a user would not enter a negative quantity, but if they did, you would sleep well at night knowing the applications are handling it. Even covering a fringe scenario will make a difference to the overall resilience of the application; reliability and dependability are front and centre here.
Figure:9 Quantity Validations – Mismatch
This example shows where a user has entered a number for a warehouse countback, which is part of a Stocktake scenario. In this instance, the user enters the number they have counted for the Material, however in this case it differs from what is expected. The user is given another chance to enter the Quantity that does reside in the system. In this instance validations are crucial and go toward a consistent, reliable and dependable application.
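The quantity questions above translate quite directly into a rule set. The TypeScript sketch below is one possible shape; the `QuantityRules` fields and the message texts are hypothetical, and the actual rules would come out of the UX design phase.

```typescript
// Hypothetical rule set agreed during design.
interface QuantityRules {
  allowZero: boolean;
  maxDecimals: number;
  max?: number;      // upper limit, if a quantity greater than expected is not allowed
  expected?: number; // when set, the entry must match (stocktake-style check)
}

// Returns an error message for the user, or null when the quantity is acceptable.
function validateQuantity(raw: string, rules: QuantityRules): string | null {
  const text = raw.trim();
  // Digits with an optional decimal part; a leading minus is captured so it can be reported.
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(text);
  if (!match) {
    return "Please enter a number.";
  }
  if (match[1] === "-") {
    return "Quantity cannot be negative.";
  }
  const decimals = match[3] ? match[3].length : 0;
  if (decimals > rules.maxDecimals) {
    return `At most ${rules.maxDecimals} decimal places are allowed.`;
  }
  const value = Number(text);
  if (value === 0 && !rules.allowZero) {
    return "Quantity must be greater than zero.";
  }
  if (rules.max !== undefined && value > rules.max) {
    return `Quantity cannot be greater than ${rules.max}.`;
  }
  if (rules.expected !== undefined && value !== rules.expected) {
    return "Quantity does not match the expected quantity. Please count again.";
  }
  return null;
}

const countRules: QuantityRules = { allowZero: false, maxDecimals: 0, expected: 12 };
console.log(validateQuantity("-3", countRules)); // Quantity cannot be negative.
console.log(validateQuantity("10", countRules)); // mismatch message
console.log(validateQuantity("12", countRules)); // null
```

Because every rule is explicit, each bullet in the list becomes a decision that the business has to make rather than something a developer guesses at.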
Keyboard versus Numberpad
- If the field requires numbers to be entered, are you showing a Number pad instead of a keyboard?
- Can the numberpad or keypad be hidden?
- Can the height of the numberpad or keyboard be reduced or adjusted to improve the user experience?
When building applications on mobile devices, available real estate is key, so users don’t want keyboards or number pads displaying all of the time and they also don’t want them to take up most of the screen. Taking this into account while building applications will mean fewer defects and happier business users because of a much better user experience. During the project we did find the keyboard was sometimes intrusive, so most of the fields that required input also allowed scanning to occur – of a barcode, QR code or equivalent. Additionally, on some devices we needed to reduce the size of the keyboard. We did this specifically for the Zebra TC8000 device. There was a keyboard height setting that could be adjusted, however we were not aware of this until after development was already underway.
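For web-based apps, one lever for the keyboard versus numberpad question is the standard HTML `inputmode` attribute, which hints to the device which on-screen keyboard to show. The TypeScript sketch below maps a hypothetical field description to one of those values. Whether a given device or UI framework honours the hint is something to verify on the actual hardware; I am not claiming this is what we used on the TC8000.

```typescript
// Hypothetical field descriptor used only for this illustration.
interface FieldSpec {
  kind: "text" | "integer" | "decimal";
  scanOnly?: boolean; // the value always comes from the scanner
}

// Maps a field to a value of the standard HTML `inputmode` attribute.
function inputModeFor(field: FieldSpec): "none" | "text" | "numeric" | "decimal" {
  if (field.scanOnly) {
    return "none"; // ask the device not to show an on-screen keyboard
  }
  if (field.kind === "integer") {
    return "numeric";
  }
  if (field.kind === "decimal") {
    return "decimal";
  }
  return "text";
}

console.log(inputModeFor({ kind: "integer" }));              // numeric
console.log(inputModeFor({ kind: "text", scanOnly: true })); // none
```

Deciding this per field during design means the keyboard question is answered once instead of being raised as a defect for every screen.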
Navigation
- Can the user always back out to the previous screen?
- Can the user go right back to the initial Fiori Launchpad screen (showing the tiles)?
- What will the user do if there are 2 navigation options and do they act in the same way?
- Once the user has processed a document where does the application navigate back to? If the user searched prior does it still show the previous search results so the user can select another entry?
Not having a consistent navigation pattern reduces the consistency of the applications being developed. Additionally, if the user can navigate out of applications without completing the required tasks the application starts to produce unreliable results which means the application is not resilient.
During this project we had a number of situations where we did not want the user to cancel out of the current task they were doing which was pretty difficult to do. We also set the Back arrow in the main Launchpad header screen to always go back to the Launchpad initial screen. We did this via a Shell Plugin. While these things seem quite small to change it does mean a stop and start approach and fine tuning does take time out of the normal development process so best to get ahead of this!
Figure:10 Navigation aspects
Device centric behaviours and settings
- What devices will be used?
- What OS versions apply? Android / iOS / Windows?
- What happens when the user scans a barcode? Is an automatic carriage return required?
- Are all apps valid for the device?
One of the devices on our project was the Zebra TC8000 handheld scanning device, which was pretty small, so at a high level, deciding which apps fit which devices is an absolute minimum requirement. Once this decision has been made, laying out which fields should be included is the next stage. Ideally, this should be worked out prior to developers coding. Additionally, there are settings within each device that may need to be tweaked once they start being used. As I covered in the Keyboard versus Numberpad section above, we had to reduce the keyboard height within this device to improve the user experience. Knowing the devices that will be used and knowing the features that are offered is yet another aspect of resilient application development.
One such example I remember early on in this project was the Memory option on entry fields was enabled so if you scanned or entered a material number, options would show that matched previous entries. This is like Autocorrect however once a large number of materials was entered the list would take up most of the screen – not ideal. This was seriously annoying so we had to disable this in the settings.
Figure:11 Device Centric behaviours – Memory (Autocorrect)
Connectivity
- Is permanent wi-fi always available?
- Are additional access points required for the new applications?
- Are devices mobile and do they move around various parts of the office/warehouse?
Knowing what type of connectivity options there will be may determine possible solution options within the applications being developed. Additionally, connectivity between the devices to be used also needs to be investigated. The options can be so varying that any development estimates or actual development time may be impacted by the decision. Additionally, if applications are expected to have permanent connectivity and connections are lost then the application will start becoming unreliable. Results would not be dependable so this directly impacts on the application’s resilience.
In a recent project there was talk of using Bluetooth printers as the business wanted mobile printers. We have typically found communicating with and printing on Bluetooth devices to be problematic (in terms of compatibility), so we set the mobile printers up with IP addresses instead, allowing them to connect over wi-fi so that labels could be printed around the warehouse. This approach improved the Maintainability of the application, and with it the application’s resilience.
Data refresh intervals
- If a worklist is presented should it be refreshed of its own accord?
- How often should the list be refreshed – every minute?
Determining the information needs of users is key and normally part of the UX Design activities. This of course includes whether information needs to be refreshed and, if so, how often. Knowing this in advance will change the way applications are developed. For example, if a new object is created in the backend source system yet does not show on the application’s worklist, the user would start questioning the accuracy of the application, and thereby its resilience. Ideally, if worklists are included for objects in the backend, setting an automatic refresh option would go a long way towards providing a more accurate view. Expecting users to hit the refresh button in the browser, or to navigate back and run the application again to retrieve the latest information, is not a great user experience.
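A periodic refresh can be sketched with a plain timer. In the TypeScript below, `loadWorklist` is a placeholder for whatever call re-reads the list from the backend, and the one-minute interval is just the figure from the question above, not a recommendation.

```typescript
// Starts a periodic refresh and returns a function that stops it.
// `loadWorklist` stands in for the call that re-reads the list from the backend.
function startAutoRefresh(
  loadWorklist: () => Promise<void>,
  intervalMs: number
): () => void {
  let busy = false;
  const timer = setInterval(async () => {
    if (busy) {
      return; // skip this tick if the previous refresh is still running
    }
    busy = true;
    try {
      await loadWorklist();
    } catch (err) {
      // A failed refresh should not break the screen; try again on the next tick.
      console.warn("Worklist refresh failed", err);
    } finally {
      busy = false;
    }
  }, intervalMs);
  return () => clearInterval(timer);
}

// Refresh every minute; call the returned function when the user leaves the screen.
const stopRefreshing = startAutoRefresh(async () => {
  // re-read the worklist here
}, 60000);
stopRefreshing(); // stopped immediately here so that the example does not keep running
```

Stopping the timer when the user navigates away matters just as much as starting it, otherwise the app keeps calling the backend for a list nobody is looking at.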
Device orientation
- Will the device be used in portrait or landscape mode?
- Should the device be locked for the best user experience?
Building responsive applications is the objective, so that when users rotate or resize their screen the applications adjust accordingly. Smaller screens obviously show less, so they should show only the key information, whereas on tablets there is only a slight difference between portrait and landscape. Also key here is how users will actually use the device they have. The real way devices are used can only be found by watching functions being performed at the coalface.
Regardless of the device or the mode in which it is presented, applications need to behave and operate in the same manner. Understanding the device and how it responds in various modes will lead to more resilient applications being built.
Error handling
- How are errors handled in the application?
- Is the error message simply forwarded and shown to the user?
- Should nice messages be built to enhance the user experience?
The handling of errors, and more generally of messages back to the user, is paramount to building resilient applications. As mentioned throughout this blog, transparency is key here. That means providing informative messages throughout the application’s functions, including handling error messages in a nice way.
Here is an example of a not so nice message.
Figure:12 Not so nice message example
During the most recent project we did receive this message a number of times. This would occur when the backend could not handle and process the request sent to it from the frontend. Stopping this message from appearing to users is the challenge. This point goes hand in hand with the field validations as normally messages would be returned to the user.
Our guide for the project was to simply forward the error message from the backend. An example of this was if the MM Period was closed and we performed a Goods Receipt via the app we would receive the same message as we would if we performed the MIGO transaction in the backend. This is important from a Maintenance perspective – the easier the application is to support the more resilient the application is when small changes are applied.
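To show what forwarding the backend message can look like while still protecting the user from a raw technical dump, here is a TypeScript sketch. It assumes the error body follows the usual OData JSON error shape (a message object with a `value` in V2, a plain message string in V4) and falls back to a generic text for anything else. The function name, the fallback wording and the sample message are my own.

```typescript
const FALLBACK_MESSAGE =
  "Something went wrong while processing your request. Please try again or contact support.";

// Pulls the human-readable message out of an OData-style JSON error body.
// Falls back to a generic text when the body is not JSON or has no message.
function backendMessage(responseText: string): string {
  try {
    const body = JSON.parse(responseText);
    const message = body?.error?.message;
    if (typeof message === "string" && message.trim() !== "") {
      return message; // OData V4 shape: { error: { message: "..." } }
    }
    if (typeof message?.value === "string" && message.value.trim() !== "") {
      return message.value; // OData V2 shape: { error: { message: { value: "..." } } }
    }
  } catch {
    // Not JSON (for example an HTML error page): fall through to the fallback.
  }
  return FALLBACK_MESSAGE;
}

// Illustrative body only; the message text is made up.
const sampleBody = JSON.stringify({
  error: { code: "HYPOTHETICAL/001", message: { lang: "en", value: "Posting period is closed." } },
});
console.log(backendMessage(sampleBody));                  // Posting period is closed.
console.log(backendMessage("<html>Server Error</html>")); // the fallback text
```

This keeps the support benefit of showing the same message the backend transaction would show, while the not so nice message only appears as a friendlier fallback.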
Overall there were some really great lessons learned in the recent project that I was part of, with the main developers being my colleagues Luke Phelan and Priyanka Patankar. The topics covered above are just some of the items we discussed and came across during our development activities, and overall I am very proud to say that we achieved the building of highly resilient applications. I am sure both Luke and Priyanka can add some insight to this topic and provide additional elements they believe add to the building of resilient applications.
Finally, I would love to hear about others’ experiences in this area. What do you think resilience is? Do you agree with my summary? What other areas do you think should be included to ensure resilience in the applications being built?
If you can add anything to the list please comment below.
As always, thanks for reading!
Interesting topic Phil. I'll be keeping DREMTAC in mind next time I build an app 😉 one area I also think is important is Decoupling - an error/failure in one function/service should not affect the rest of the app's functionality.
I also read somewhere that you can always reduce failure but you can never be 100% failure-free. "Failing gracefully" is a term that I really like. How are you going to give the best possible experience when something fails? I think that is a question we should always ask. Cheers
Nice one Greg and yes I am definitely interested in this topic and have been for some time. Challenging as well is another word that comes to mind. 🙂 Good point about decoupling, the only way this is done well is if more time is spent up front designing the interface rather than getting straight into coding. I still find that not enough is done up front to design the development so to speak, that is - the guts of the application and the logic within it.
Yeah, like the "failing gracefully" term. It implies that the user experience is kept intact even though failures could take place and this is definitely something we should strive for!
Thanks for reading and providing your feedback on this interesting topic!
Thanks for a wonderfully written and informative article on a topic that is near and dear to me--TESTING!
As you have covered various areas of testing, I suggest that the audience read about ISO-9126, which helps to define software quality attributes at such a level that test strategies and test plans can be iteratively developed for any product.
Resiliency is one attribute that can be covered in multiple areas of ISO-9126 for both functional testing (majority of the development work is in this area) and non-functional testing (security/pen testing, Performance/Volume/Stress testing, Business Continuity Planning--high availability, disaster recovery, fail over, etc...)