“Real Men (and women!) Use C”
Sometimes setting up a reverse-proxy configuration takes 10 minutes. For those times you can live happily ever after with mod_proxy and remain blissfully ignorant of the monstrosity known as mod_rewrite. Although it is simply another Apache module, mod_rewrite is a complex and versatile tool: “the Swiss Army knife” of Apache modules, if you will. To borrow a quote from the mod_rewrite documentation: “Despite the tons of examples and docs, mod_rewrite is voodoo. Damned cool voodoo, but still voodoo.” Addicted C programmers say “real men use C” (and real women, too; no pun intended). Real reverse-proxies, the ones that “hide” complex scenarios, use mod_rewrite.
Why should I even care?
Here’s how I had the pleasure of learning about mod_rewrite (true story). A customer had a working portal that they only used internally. They decided to open some of the content to external users, meaning users outside the internal network. For a simple portal this would have been easy. The catch was that this portal exposed all kinds of web-applications: portal applications, SAP CRM, custom web-applications built on other SAP WASs, and third-party web-applications, all running on different hosts. mod_proxy, capable as it is, has a limitation here: it can only be configured with fixed URL prefixes. Some of the SAP web-applications passed session information (a cookie) inside the URL itself. As a result, every request to those web-applications starts with “/sap(”, followed by an arbitrary string, and then “)/”. Here’s how two of these requests looked on the internal network:
http://server1.company.com/sap(fjdksalfFHJDKS434)/bc/bsp/sap/webapp1
http://server2.company.com/sap(FDdnmdJ4hjKFDa)/bc/bsp/sap/webapp2
And this is how the same two requests look when the portal is accessed through a reverse-proxy:
http://revproxy.company.com/sap(fjdksalfFHJDKS434)/bc/bsp/sap/webapp1
http://revproxy.company.com/sap(FDdnmdJ4hjKFDa)/bc/bsp/sap/webapp2
Since all the requests go to the reverse-proxy host, mod_proxy has to tell them apart by some fixed string that follows the host-name. That’s impossible here because the cookie in the URL changes from session to session. mod_rewrite, luckily, uses regular expressions to parse requests.
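The following Python sketch makes the limitation concrete. It is not part of the Apache setup; Python's re module is used only because its syntax agrees with mod_rewrite's PCRE regexes for these simple patterns. It shows that the longest fixed prefix the two example paths share is just “/sap(”. A regular expression, by contrast, can skip over the cookie and route on the directory that follows it. The hostnames come from the example above. The routing table and the function name backend_for are illustrative assumptions.

```python
import os
import re

# The two example requests as the reverse proxy sees them (path only).
path1 = "/sap(fjdksalfFHJDKS434)/bc/bsp/sap/webapp1"
path2 = "/sap(FDdnmdJ4hjKFDa)/bc/bsp/sap/webapp2"

# A prefix-based rule can only key on text the paths share up front.
# Everything after "/sap(" is session-specific.
shared = os.path.commonprefix([path1, path2])
print(shared)  # /sap(

# Hypothetical routing table: a regex skips the cookie and looks at
# the directory that follows it.
routes = [
    (re.compile(r"^/sap(.*)/bc/bsp/sap/webapp1"), "server1.company.com"),
    (re.compile(r"^/sap(.*)/bc/bsp/sap/webapp2"), "server2.company.com"),
]


def backend_for(path):
    """Return the internal host for a request path, or None if unmatched."""
    for regex, host in routes:
        if regex.match(path):
            return host
    return None


# A brand-new session cookie still routes correctly.
for p in (path1, path2, "/sap(AnyNewSessionCookie)/bc/bsp/sap/webapp1"):
    print(p, "->", backend_for(p))
```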
(It seems Reg-Exs are having some sort of renaissance here on SDN. This is the technology that makes people want to rip their arms off when they can’t get it to work, and then gives them a moment of divine clarity when they find out how amazingly useful it is. Check out the search results for blogs with “regular expressions” in them.)
So what’s the trick?
The trick is to use mod_rewrite to parse those complex, dynamic URLs and prepend a fixed string to them, one that mod_proxy can easily manage. mod_proxy takes it from there and does its magic, including removing the extra string we added to the URL. I know, I didn’t understand what I just wrote either. I say learning by example is the best way, and learn by example you shall. In short, the flow works like this. The request arrives at the reverse-proxy, and mod_rewrite matches the dynamic URL and prepends a fixed prefix. mod_proxy then strips the prefix and forwards the request to the right internal server.
The sample scenario
This is the sample network scenario: one reverse-proxy server (http://www.company.com) proxying three internal servers: portal.company.com, was1.company.com, and was2.company.com. Proxying the portal is a piece of cake, so we’ll start with that. Load and activate mod_proxy, then proxy the portal (I’m leaving out the ProxyMapping configuration on the J2EE engine, which is covered in previous posts):
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
ProxyVia on
ProxyTimeout 600
ProxyRequests Off
ProxyPass /irj http://portal.company.com:50000/irj
ProxyPassReverse /irj http://portal.company.com:50000/irj
ProxyPass /logon http://portal.company.com:50000/logon
ProxyPassReverse /logon http://portal.company.com:50000/logon
Both WASs (was1.company.com and was2.company.com) generate URLs that start with /sap followed by a cookie. To find the URL patterns that distinguish the two, we need some kind of HTTP tracer. Any tracer will do: HTTPWatch, ieHeaders, the Live HTTP Headers extension for Firefox, or even general network tools like Ethereal. What we’re looking for is the part of the URL that follows the cookie, i.e. the directories on the server that each web application uses. Let’s say we traced the communication and found that the application on was1.company.com uses the following URLs:
http://was1.company.com/sap([long cookie string])/bc/bsp/sap/public/.....
http://was1.company.com/sap([long cookie string])/bc/bsp/sap/zzz_yyy/.....
and the application on was2.company.com retrieves files from the following URLs:
http://was2.company.com/sap([long cookie string])/bc/bsp/sap/crm_bsp/.....
http://was2.company.com/sap([long cookie string])/bc/bsp/sap/icons/.....
(In some cases you might find that more than one internal server uses a directory with the same name, like “/bc/bsp/sap/public”. I don’t know of a solution for this other than renaming the directories on some of the servers to differentiate the URLs.)
Now that we’ve identified the URLs to forward, we can start messing around with mod_rewrite. The following lines load the module and activate it:
LoadModule rewrite_module modules/mod_rewrite.so
RewriteEngine On
RewriteLog "/apache_rewrite.log"
RewriteLogLevel 2
I’ve set the log-level to 2 in this case, which is pretty detailed. 3 is even more detailed, and anything above that is too detailed in my opinion. Once you have it working, you’ll probably want to lower the log-level to 1 or disable the log altogether. (RewriteLog and RewriteLogLevel exist in Apache 2.2 and earlier. Apache 2.4 replaced them with per-module LogLevel settings such as “LogLevel alert rewrite:trace3”.) As I wrote earlier, the trick is to prepend a fixed string to the URLs and hand them over to mod_proxy for the actual redirection. Here are the rules:
RewriteRule ^/sap(.*)/bc/bsp/sap/public(.*) http://was1.company.com/WAS1$0 [P]
RewriteRule ^/sap(.*)/bc/bsp/sap/zzz_yyy(.*) http://was1.company.com/WAS1$0 [P]
RewriteRule ^/sap(.*)/bc/bsp/sap/crm_bsp(.*) http://was2.company.com/WAS2$0 [P]
RewriteRule ^/sap(.*)/bc/bsp/sap/icons(.*) http://was2.company.com/WAS2$0 [P]
Let’s break down the first line to understand it:
- ^: anchors the regex match to the beginning of the string. Note that the URL mod_rewrite matches against does not include the host-name.
- /sap(.*): “/sap” is literal text from the URL. The parentheses are regex grouping, not the literal parentheses around the cookie, and “.*” means a string of any length containing any characters. Put the two together and you get a small regex that matches all those URL cookies: /sap(FDHJfhdsjkfhs47ifh4fkj) matches, /sap(57834294hjr4kwbnejkfwhwuif) matches, and so on and so forth. Anything that has “/sap” at the beginning followed by any string will match; the cookie’s own parentheses are simply swallowed by “.*”.
- /bc/bsp/sap/public(.*): matches “/bc/bsp/sap/public” followed by anything, so the match works regardless of what follows the last fixed path segment of the URL (/public in this case).
- http://was1.company.com/WAS1$0: this is what the URL is rewritten to. The rule adds the host-name of the internal server and a fixed prefix, “/WAS1”. “$0” is a back-reference to the whole string matched by the pattern. Because the pattern starts with ^ and ends with (.*), $0 is the entire original URL-path, and it gets appended after the prefix. This means the URL mod_proxy gets will look something like this: http://was1.company.com/WAS1/sap(FDHjkfsdhffdshjkfh4f4)/bc/bsp…… and so on.
- [P]: this is an extremely important flag. It tells mod_rewrite to hand the new URL to mod_proxy as a proxy request. It also implies [L], so no further rules are applied. Without it, the new URL never reaches mod_proxy.
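You can check the breakdown above outside Apache with a rough Python model of the four rules. Each pattern is tried in order, and the first match wins (as with [P], which implies [L]). “$0” is replaced with the whole matched string. This is a sketch only: it handles no back-reference other than $0 and ignores everything else mod_rewrite does, such as conditions, query strings, and the actual proxying. The function name rewrite and the sample path are hypothetical.

```python
import re

# The four rules from above as (pattern, substitution) pairs.
# Only "$0" is handled, since that is the only back-reference they use.
RULES = [
    (r"^/sap(.*)/bc/bsp/sap/public(.*)", "http://was1.company.com/WAS1$0"),
    (r"^/sap(.*)/bc/bsp/sap/zzz_yyy(.*)", "http://was1.company.com/WAS1$0"),
    (r"^/sap(.*)/bc/bsp/sap/crm_bsp(.*)", "http://was2.company.com/WAS2$0"),
    (r"^/sap(.*)/bc/bsp/sap/icons(.*)", "http://was2.company.com/WAS2$0"),
]


def rewrite(path):
    """Apply the first matching rule; return the target URL or None."""
    for pattern, target in RULES:
        m = re.match(pattern, path)
        if m:
            # $0 is the whole string matched by the pattern. With ^ at the
            # start and (.*) at the end, that is the entire path.
            return target.replace("$0", m.group(0))
    return None


path = "/sap(FDHjkfsdhffdshjkfh4f4)/bc/bsp/sap/public/logo.gif"
print(rewrite(path))
# -> http://was1.company.com/WAS1/sap(FDHjkfsdhffdshjkfh4f4)/bc/bsp/sap/public/logo.gif

# The pattern's parentheses form capture group 1, and the cookie's own
# literal parentheses end up inside it.
print(re.match(RULES[0][0], path).group(1))  # -> (FDHjkfsdhffdshjkfh4f4)
```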
The rest of the lines work the same way. What’s missing now are the mod_proxy directives that handle the actual proxying:
ProxyPass /WAS1 http://was1.company.com
ProxyPass /WAS2 http://was2.company.com
Note the difference between these directives and the ones used for the portal. For the portal we put “/irj” and “/logon” in both the source and the target of the directive (ProxyPass /irj http://portal.company.com:50000/irj), but here we put “/WAS1” and “/WAS2” only in the source. We do this because we want mod_proxy to remove the extra prefix we added using mod_rewrite. Another difference is that there’s no ProxyPassReverse directive. In my experience it wasn’t needed, since the URLs are usually relative. What you should do instead is make sure the web-server on each internal server has the ProxyMapping function (or a similar function in a different environment) configured properly. And that’s it: it should be working now.
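The prefix removal described above comes from how ProxyPass maps paths. A request path that begins with the local prefix is sent to the backend URL, with the prefix replaced by that URL. The simplified Python model below applies that substitution to the ProxyPass lines used in this post. Like Apache, it checks entries in configuration order and takes the first match. It is only a sketch: it skips slash normalization, ProxyPassReverse, and the actual HTTP forwarding. The function name proxy_target is hypothetical. Accepting a prefix only when it ends at a “/” or at the end of the path is this sketch's own choice.

```python
# Simplified model of ProxyPass path mapping (not Apache's actual code).
# Entries are checked in configuration order; the first match wins.
PROXY_PASS = [
    ("/irj", "http://portal.company.com:50000/irj"),
    ("/logon", "http://portal.company.com:50000/logon"),
    ("/WAS1", "http://was1.company.com"),
    ("/WAS2", "http://was2.company.com"),
]


def proxy_target(path):
    """Return the backend URL for a request path, or None if unmapped."""
    for prefix, backend in PROXY_PASS:
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        # Sketch's own choice: the prefix must end at "/" or end of path,
        # so "/WAS1x" is not treated as "/WAS1".
        if rest == "" or rest.startswith("/"):
            # The local prefix is replaced by the backend URL, which is
            # how "/WAS1" disappears from the forwarded path.
            return backend + rest
    return None


print(proxy_target("/irj/portal"))
# -> http://portal.company.com:50000/irj/portal
print(proxy_target("/WAS1/sap(FDHjkfsdhffdshjkfh4f4)/bc/bsp/sap/public/logo.gif"))
# -> http://was1.company.com/sap(FDHjkfsdhffdshjkfh4f4)/bc/bsp/sap/public/logo.gif
```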
More fun with mod_rewrite
Before we get to the end of this post, here are two more examples of what mod_rewrite can be used for:
- RewriteRule ^/$ http://portal.company.com/irj/index.html [P]: proxies requests for the root of the web server to the portal. Since ^/$ matches only “/” itself, other paths are left alone.
- RewriteRule ^/(somestring)(.*) http://server.company.com/SomeSTRING$2 [NC,P]: sometimes you have a directory with mixed-case letters, and some web-pages try to reach it using the wrong letter-case (for example, requesting “/someSTRING/file.jpg” instead of “/SomeSTRING/file.jpg”). In general you should fix that web-page. If you want a do-it-once solution, though, you can use mod_rewrite to rewrite requests for that directory, regardless of case, to the desired spelling. The “NC” flag at the end of the line stands for “No Case”, as in case-insensitive, so mod_rewrite treats “somestring”, “SoMeString”, etc. as the same string. Also note that the target URL uses “$2” and not “$0”. $0 would bring back the whole original path, including the string as the client spelled it. $1 is the first parenthesized group, (somestring), again in the client’s spelling. $2 is the second group, (.*), i.e. everything after the string. Using only $2 drops the original spelling and replaces it with “SomeSTRING”.
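Here is a small Python model of the second example, showing what [NC] and $2 do. re.IGNORECASE stands in for [NC], and only the rewriting of the path is modeled, not the proxying. The function name fix_case is hypothetical.

```python
import re

# ^/(somestring)(.*) with [NC]: case-insensitive match.
PATTERN = re.compile(r"^/(somestring)(.*)", re.IGNORECASE)


def fix_case(path):
    """Rewrite a path to the canonical spelling, or return None."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # group(1) is the string as the client spelled it; group(2) is $2,
    # everything after it. Only $2 is reused, so the spelling is fixed.
    return "http://server.company.com/SomeSTRING" + m.group(2)


print(fix_case("/someSTRING/file.jpg"))
# -> http://server.company.com/SomeSTRING/file.jpg
```

Nothing in the pattern requires a “/” right after the string, so “/somestringextra/x” is caught as well and becomes “/SomeSTRINGextra/x” on the target server.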
I know it’s a lot to take in, but it’s easier than it seems — after some trial-and-error you get the hang of it and it becomes pretty simple. Keep the mod_rewrite documentation handy at all times 😉 Alon.