401 errors and AppRouter
Recently I have been working with a customer to track down random 401 errors appearing in their app. The errors had been occurring since the app went live, with no clear indication of the cause, and they were very frustrating for users, who kept getting logged out while working.
Initially, all our investigations pointed to a network problem: either something was being blocked, or a piece of hardware had stability issues. These investigations turned out to be inconclusive. A couple of issues were identified and fixed along the way, but the 401 errors persisted, so it was back to the drawing board…
Luckily, another team were working on a similar problem and had a more reproducible scenario: in a dashboard they had built, some tiles would randomly fail to load because of authorization errors.
After a lot of digging and logging, everything pointed to AppRouter. One of our colleagues then pointed out that cloud apps must not hold state in their running application instances. Cloud Foundry occasionally needs to manage its resources by moving application instances between cells, for example when there is a problem with the data center infrastructure or when Cloud Foundry components need maintenance.
Now, AppRouter is hosted in the SAP BTP Cloud Foundry Runtime, so each AppRouter instance runs on a Diego cell. This means it can be evacuated or rescheduled whenever Cloud Foundry decides to. Any design therefore needs multiple instances and must not hold state locally, otherwise that state is lost when an evacuation happens.
Luckily, the AppRouter team did think about this, and the documentation covers in detail how to set up AppRouter to handle this scenario using Redis or your own state management solution.
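To make the failure mode concrete, here is a minimal TypeScript sketch of why local session state produces 401s after an evacuation. It is not AppRouter code: `SessionStore`, `RouterInstance` and `simulateEvacuation` are hypothetical names used only to model the behaviour, and a plain `Map` stands in for Redis in the shared case. The user logs in on one instance, Cloud Foundry replaces that instance, and only a store that lives outside the instance still recognises the session cookie.

```typescript
// Hypothetical model of session handling; this is NOT the @sap/approuter API.
type SessionStore = Map<string, string>; // session id -> user

class RouterInstance {
  constructor(private readonly store: SessionStore) {}

  // Creates a session and returns its id (sent to the browser as a cookie).
  login(user: string): string {
    const sessionId = `s-${this.store.size + 1}-${user}`;
    this.store.set(sessionId, user);
    return sessionId;
  }

  // 200 if this instance can find the session, otherwise 401.
  handle(sessionId: string): number {
    return this.store.has(sessionId) ? 200 : 401;
  }
}

// Log in, then simulate an evacuation: the old container disappears and a
// fresh instance starts on another cell.
function simulateEvacuation(useSharedStore: boolean): number[] {
  const shared: SessionStore = new Map(); // stand-in for an external store such as Redis
  const newStore = (): SessionStore =>
    useSharedStore ? shared : new Map<string, string>();

  let instance = new RouterInstance(newStore());
  const cookie = instance.login("alice");
  const before = instance.handle(cookie);

  instance = new RouterInstance(newStore()); // replacement instance after evacuation
  const after = instance.handle(cookie);

  return [before, after];
}

console.log("in-memory:", simulateEvacuation(false)); // prints: in-memory: [ 200, 401 ]
console.log("shared:", simulateEvacuation(true)); // prints: shared: [ 200, 200 ]
```

The only difference between the two runs is where the session lives, which is the whole argument for an external store: the replacement instance is healthy, it simply has never seen the session.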
What is not currently clear in the documentation is:
- Most importantly, using a state persistence mechanism such as Redis is mandatory for production environments otherwise the sessions will be lost when the cell is evacuated.
- Developers should scale their AppRouter to at least 2, and ideally 3, instances in production. This spreads the application across multiple availability zones, and when a cell is evacuated for an update, the AppRouter instances on the other cells can pick up the load without losing state.
The second point is handled by setting the instances property in the manifest.yaml file. There is a blog post on scaling CF apps that explains how to do this manually or via an update to the manifest.yaml file.
The first point means that your manifest file needs to be updated to bind a Redis service instance and to include its configuration in the Cloud Foundry environment variables. This is pretty straightforward and well documented in the AppRouter documentation.
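As we read the AppRouter documentation at the time of writing, external session management is enabled through an `EXT_SESSION_MGT` environment variable holding JSON with `instanceName`, `storageType` and `sessionSecret`; please verify the field names against the documentation for your AppRouter version. The helper below is a hypothetical sketch (`buildExtSessionMgt` and the service name `approuter-redis` are our own illustrative names) that only shows the shape of the value. The minimum secret length it enforces is an arbitrary check of ours, not an AppRouter rule.

```typescript
// Hypothetical helper: builds the JSON value for AppRouter's EXT_SESSION_MGT
// environment variable. Field names follow the AppRouter docs as we read them;
// check them against the version you use.
interface ExtSessionMgt {
  instanceName: string; // name of the Redis service instance bound to the AppRouter app
  storageType: "redis";
  sessionSecret: string; // presumably must be the same for every AppRouter instance
}

function buildExtSessionMgt(instanceName: string, sessionSecret: string): string {
  if (instanceName.trim() === "") {
    throw new Error("instanceName must name a bound Redis service instance");
  }
  // Arbitrary minimum chosen for this sketch; not enforced by AppRouter itself.
  if (sessionSecret.length < 32) {
    throw new Error("use a long, random sessionSecret");
  }
  const config: ExtSessionMgt = { instanceName, storageType: "redis", sessionSecret };
  return JSON.stringify(config);
}

// Usage: the resulting string is what would go into the app's environment.
const value = buildExtSessionMgt("approuter-redis", "replace-with-a-long-random-secret-value");
console.log(value);
```

The value could then be placed under `env` in manifest.yaml or set with `cf set-env <app> EXT_SESSION_MGT '<json>'`, alongside binding the Redis service instance to the AppRouter app.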
I would recommend reading up on:
SAP Note 2835933 – Application was restarted
and the AppRouter documentation on state management.