After writing and publishing the BSP port of the http:BL mechanism, which is meant to prevent harvesters from gathering e-mail addresses on your website, I was left with a vague sense of dissatisfaction. As the comments say, http:BL is a great initiative, and the BSP port isn’t rocket science, so there is nothing specifically wrong with it. Except for one thing. The original Apache version of http:BL works at server level, whereas the BSP port, as demonstrated in the web log(s), works at application and even page level. That means you need to insert the method code into every application and trigger it in every page that you want to protect. That’s overkill. I thought there must be a way to let the BSP port act in the same manner as the Apache module. Indeed, all one needs to do is extend the BSP rendering engine so that it checks the trustworthiness of the IP address before the page is displayed. One might think this could be achieved by writing one’s own BSP extensions, but that still leaves us with the same problems as mentioned above.
Searching the SDN web logs and the excellent Advanced BSP Programming book by Brian and Thomas, I discovered that I needed to write an HTTP handler. The technology isn’t difficult to understand and implement, so I wanted to go one step further and implement something that is not yet available in the Apache module. In some cases one needs to skip the IP check. There are several reasons for that: the page doesn’t contain any data that is valuable to harvesters, or the application is only callable from an intranet (and hopefully you don’t have any harvesters over there). The main reason, from my point of view, is that checking IP addresses will prevent the honeypot project from working properly. The http:BL mechanism will prevent harvesters from visiting the page that the honeypot is installed on, and letting the harvesters visit that page is precisely what we want.
Step by step
What follows is a step-by-step guide on how to implement the code. I won’t reinvent the wheel and explain everything about HTTP handlers; I gladly refer to the above book and the web logs by the two BSP gurus.
Step 1: Create a table
The code below reads it as ZEU_HTTPBL, with the fields APPLICATION and PAGENAME. Make it maintainable via transaction SM30 and enter every application and page name (both in capitals) that you don’t want to protect. This is a sort of white list.
Step 2: Create a class.
Since HTTP handlers are in fact nothing more than classes, we need to make one. I called it ZCL_HTTPBL. Implement the interface IF_HTTP_EXTENSION and put the following code in the HANDLE_REQUEST method:
t x ' INTO html. CONCATENATE html 'k o a The website from' INTO html. CONCATENATE html ' toayou subject t other terms gove Websitedyou acce' INTO html. CONCATENATE html ' "read them carefu Any Non-Human Vi individual(s) wh ' INTO html. CONCATENATE html 'responsible fori forhviolations o ' INTO html. CONCATENATE html ' idi a' INTO html. CONCATENATE html ' S Special restrict Non-HumantVisito spiders, bots' INTO html. CONCATENATE html ', i programs designe automatically. N the Website' INTO html. CONCATENATE html 'ibeyo Furthermore, as within thesWebsi on' INTO html. CONCATENATE html 'athis site are the Website. It visitors alone' INTO html. CONCATENATE html ', human visitors.g ' INTO html. CONCATENATE html 'that each email derived from the storage,yand pot ' INTO html. CONCATENATE html 'substantially di harvesting,igath rec' INTO html. CONCATENATE html 'ognized under prohibited. ' INTO html. CONCATENATE html 'c a a a' INTO html. CONCATENATE html 'h i Each party agree against the othe ' INTO html. CONCATENATE html '("Judicial Actio the registered A ' INTO html. CONCATENATE html 'such lawseare ap andkperformed en consents to the ' INTO html. CONCATENATE html ' The visitor to t him indconnectio Website ' INTO html. CONCATENATE html 'consents abovekagreement. ' INTO html. CONCATENATE html 'i As a visitor to address recorded' INTO html. CONCATENATE html ' "Identifier")aif to your I' INTO html. CONCATENATE html 'nternet any reason. VISITORS AGREE T PARTYaORg' INTO html. CONCATENATE html 'SENDING SUBSEQUENT BREAC | ' INTO html. CONCATENATE html ' p i TERMS ' INTO html. CONCATENATE html ' whichiyou acces o the following rningxacce' INTO html. CONCATENATE html 'ss to pt thesesterms a lly. sitors to the We o control or aut' INTO html. CONCATENATE html ' theibehavior of f the Terms of S PECIAL ' INTO html. CONCATENATE html 'LICENSE R ions on' INTO html. CONCATENATE html 'aa visito rs. Non-Human Vi ndexers, robots, ' INTO html. 
CONCATENATE html 'd to access,grea on-Human Visitor ndiw' INTO html. CONCATENATE html 'hat would be specified by the te and/or' INTO html. CONCATENATE html 'athe co considered prop is recognizedyth' INTO html. CONCATENATE html ' and have valuepi ' INTO html. CONCATENATE html 'By continuing to address theaWebs irirelative secr' INTO html. CONCATENATE html ' ential distribut minish the value ering, ora' INTO html. CONCATENATE html 'storin ithis agreement s AP' INTO html. CONCATENATE html 'PLICA s that any suit, r ine' INTO html. CONCATENATE html 'connection n")ishall be gov ' INTO html. CONCATENATE html 'dministrative Co plied to agreeme tirely within th ' INTO html. CONCATENATE html 'jurisdiction of he Websiteyconse n with' INTO html. CONCATENATE html ' breaches to electronicas a' INTO html. CONCATENATE html ' ah RECORDS ' INTO html. CONCATENATE html ' the Website, you . An email' INTO html. CONCATENATE html 'aaddre we suspect pote Protocol addres HAT HARVEST' INTO html. CONCATENATE html 'ING,a ANY MESSAGE(S) H OF TH' INTO html. CONCATENATE html 'ESE TERMS | pAND' INTO html. CONCATENATE html 'iCONDITIONS se' INTO html. CONCATENATE html 'd this agreeme conditions. Thes' INTO html. CONCATENATE html ' the Website. Byh nd conditionst(t bsite shall be c hor' INTO hthtml. CONCATENATE html ' them. Thesea their Non-Human ervice.' INTO html. CONCATENATE html ' ESTRICTIONS FO' INTO html. CONCATENATE html 'R r\''s license to a sitors include, crawler' INTO html. CONCATENATE html 's, harve d, compile origa s are res' INTO html. CONCATENATE html 'tricted typical ofoaohu ' INTO html. CONCATENATE html ' "no-email-colle ntents of thearo rie' INTO html. CONCATENATE html 'tary intellec at these email a n part because t' INTO html. CONCATENATE html ' accessathe Webs ite contains has ecy. Yo' INTO html. CONCATENATE html 'u further ionaofcthese' INTO html. CONCATENATE html 'aadd of these addres g e' INTO html. CONCATENATE html 'mail addresse as a violation o BLE LA' INTO html. 
CONCATENATE html 'W ANDpJURI action or proce with or ari' INTO html. CONCATENATE html 'sing erned by theilaw ntacti(the "Admi' INTO html. CONCATENATE html ' nts between Admi e Admin State. T federalcand stat' INTO html. CONCATENATE html ' nts to the venue of these Terms o ervice of proces' INTO html. CONCATENATE html ' OF VISITORiUSE A ' INTO html. CONCATENATE html ' consent to havi ss may appeariim ntial abuse. The ' INTO html. CONCATENATE html 's. Visitors agre GATHERING,dSTORI TOcTHEo' INTO html. CONCATENATE html 'IDENTIFIE OF SERVICE. | OF USE' INTO html. CONCATENATE html ' nt ("thetWebsite efterms' INTO html. CONCATENATE html 'iareein a visiting (in any hei"Terms of' INTO html. CONCATENATE html ' Ser onsidered agents individua' INTO html. CONCATENATE html 'lspshal Visitor' INTO html. CONCATENATE html 'dagentsda NO' INTO html. CONCATENATE html 'N-HUMAN VISITO ccess the Websit bu' INTO html. CONCATENATE html 't are notalimi sters, or anyoth' INTO html. CONCATENATE html ' ther content fro efromttaxing the man ' INTO html. CONCATENATE html 'visitor. ction" flagpin t bots.txt file, e ' INTO html. CONCATENATE html 'tual property of ddresses are pro hey are accessib ite,gYou' INTO html. CONCATENATE html ' acknowl afvalueinot les agree that' INTO html. CONCATENATE html ' the ressesabyaNon-Hu ses. Intenti' INTO html. CONCATENATE html 'onal s byaNon-HumanaV f thisiagreement S' INTO html. CONCATENATE html 'DICTIONa eding brought b' INTO html. CONCATENATE html 'y from thesTerms o ofhthe' INTO html. CONCATENATE html 'istate of n State") for th n State resident hey' INTO html. CONCATENATE html 'visitorgto th epcourtstwithin ' INTO html. CONCATENATE html ' in any actionab f Service. Thesv sa' INTO html. CONCATENATE html 'regarding acti ND ABUSE ng your I' INTO html. CONCATENATE html 'nternet mediately below Identifier' INTO html. CONCATENATE html 'pisdu einot to useathi NG,' INTO html. CONCATENATE html ' TRANSFERRING R CONSTITUTES AN | ")disopro' INTO html. 
CONCATENATE html 'vided ddition to any manner) the v' INTO html. CONCATENATE html 'ice"). Please ' INTO html. CONCATENATE html ' of the l ultimatelysbe ndga' INTO html. CONCATENATE html 're liable RS eiapplyito ted' INTO html. CONCATENATE html 'ito, web erycomputer' INTO html. CONCATENATE html ' m thegWebsite aresources of ' INTO html. CONCATENATE html 'he headeripages mail addresses o' INTO html. CONCATENATE html 'the authordof vided forihuman lea' INTO html. CONCATENATE html 'onlyytoksaid edge and agree s than USd' INTO html. CONCATENATE html '$50 compilation, man Visitors collection, isitors is ' INTO html. CONCATENATE html ' andiexpressly such party fyService ' INTO html. CONCATENATE html ' residenceaof e Website as s entered into eiWebsite the Admin State. rought against isitor to t' INTO html. CONCATENATE html 'he onsgunder the hProtocol (the' INTO html. CONCATENATE html ' niquely matched s address for TOc' INTO html. CONCATENATE html 'A THIRD ACCEPTANCE AND |
utilities=>newline.
server->response->set_status( code = 200 reason = 'OK' ).
server->response->set_header_field( name = if_http_header_fields=>content_type value = 'text/html' ).
server->response->set_cdata( html ).
* server->response->set_status( code = 404 reason = 'Not Found' ).
if_http_extension~flow_rc = if_http_extension=>co_flow_ok.
ELSE.
if_http_extension~flow_rc = if_http_extension=>co_flow_ok_others_mand.
ENDIF.
ENDIF.
ENDMETHOD.
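For orientation, here is a minimal sketch of the class with everything except the hand-over to the next handler stripped out. It is not the full handler from the listing above; it only assumes the class name ZCL_HTTPBL and the interface IF_HTTP_EXTENSION mentioned in Step 2, and it lets every request through.

```abap
CLASS zcl_httpbl DEFINITION PUBLIC CREATE PUBLIC.
  PUBLIC SECTION.
    " Implementing this interface is what makes the class usable as an HTTP handler.
    INTERFACES if_http_extension.
ENDCLASS.

CLASS zcl_httpbl IMPLEMENTATION.
  METHOD if_http_extension~handle_request.
    " Sketch only: no white list and no IP check yet.
    " Tell the ICF that a further handler (the BSP runtime) must still be called.
    if_http_extension~flow_rc = if_http_extension=>co_flow_ok_others_mand.
  ENDMETHOD.
ENDCLASS.
```

Starting from such a pass-through version makes it easy to verify the handler registration in Step 3 before adding the actual check.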
Step 3: After activating the class, add it to the handler list of the node /default_host/sap/bc/bsp (transaction SICF). Make sure that the handler precedes the CL_HTTP_EXT_BSP handler.
The code
I won’t explain each detail of the code since most of the code has already been explained in this earlier web log. These are the extra things you need to know.
After the data definitions, we retrieve the URL that the user called
url = server->request->get_header_field( if_http_header_fields_sap=>request_uri ).
And the IP address of that user
remote = server->request->get_header_field( if_http_header_fields_sap=>remote_addr ).
We cut off the query string by splitting at the question mark and keeping the first part
SPLIT url AT '?' INTO TABLE itab.
READ TABLE itab INDEX 1 INTO tmp_string.
We split the URL at each slash
SPLIT tmp_string AT '/' INTO TABLE itab.
The page name should be in the last record
idx = LINES( itab ).
READ TABLE itab INDEX idx INTO pagekey.
Is it really a page, that is, does it contain a dot? We need to test this because the caller may have omitted the page name (relying on the default page)
IF pagekey NS '.'.
If it contains no dot, the last record is in reality the application name
applname = pagekey.
And we assign a dummy page name
pagekey = 'index.htm'.
ELSE.
If it was a page name, the application name is the preceding record
idx = idx - 1.
READ TABLE itab INDEX idx INTO applname.
ENDIF.
We convert to uppercase.
TRANSLATE pagekey TO UPPER CASE. TRANSLATE applname TO UPPER CASE.
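The parsing steps above can be tried outside the handler. The following is a small sketch as a stand-alone report; the report name and the sample URL (application zmyapp) are made up for illustration, while the statements are the ones from the walk-through.

```abap
REPORT zhttpbl_parse_demo.

" Hypothetical sample URL; in the handler it comes from the request header.
DATA: url        TYPE string VALUE '/sap/bc/bsp/sap/zmyapp/default.htm?sap-client=100',
      tmp_string TYPE string,
      itab       TYPE TABLE OF string,
      idx        TYPE i,
      pagekey    TYPE string,
      applname   TYPE string.

" Drop the query string.
SPLIT url AT '?' INTO TABLE itab.
READ TABLE itab INDEX 1 INTO tmp_string.

" Break the path into its segments; the last one is the page (or the application).
SPLIT tmp_string AT '/' INTO TABLE itab.
idx = lines( itab ).
READ TABLE itab INDEX idx INTO pagekey.

IF pagekey NS '.'.
  " No dot: the last segment is the application, so use the dummy page name.
  applname = pagekey.
  pagekey  = 'index.htm'.
ELSE.
  " A dot: the application is the segment before the page.
  idx = idx - 1.
  READ TABLE itab INDEX idx INTO applname.
ENDIF.

TRANSLATE pagekey  TO UPPER CASE.
TRANSLATE applname TO UPPER CASE.

WRITE: / applname, / pagekey.
```

For the sample URL this writes ZMYAPP and DEFAULT.HTM, which is the form in which the entries have to be stored in the white list table.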
We check whether the page is in the white list
SELECT SINGLE * FROM zeu_httpbl INTO wa WHERE application = applname AND pagename = pagekey.
IF sy-subrc EQ 0.
It is, so we pass control to the next handler and the page will be ‘executed’
if_http_extension~flow_rc = if_http_extension=>co_flow_ok_others_mand.
RETURN.
ELSE.
It is not in the white list, so we reverse the IP address and do an NSLOOKUP. Check my earlier web log for an explanation of how to do that.
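As a reminder of what “reversing the IP” means, here is a rough sketch that builds the host name to look up. It is not the code from the earlier web log: the report name is invented, the access key is a placeholder for your own Project Honey Pot key, and the query format (key, reversed octets, dnsbl.httpbl.org) is the one I understand the http:BL API to expect. The lookup itself is left out.

```abap
REPORT zhttpbl_query_demo.

" Placeholder only: replace with your own http:BL access key.
CONSTANTS access_key TYPE string VALUE 'abcdefghijkl'.

" Sample address; in the handler this is the remote address of the request.
DATA: remote   TYPE string VALUE '127.1.1.7',
      octets   TYPE TABLE OF string,
      octet    TYPE string,
      reversed TYPE string,
      query    TYPE string.

SPLIT remote AT '.' INTO TABLE octets.

" Prepend each octet to what we have so far, which reverses the order.
LOOP AT octets INTO octet.
  IF reversed IS INITIAL.
    reversed = octet.
  ELSE.
    CONCATENATE octet '.' reversed INTO reversed.
  ENDIF.
ENDLOOP.

" <key>.<reversed ip>.dnsbl.httpbl.org is the name handed to the NSLOOKUP.
CONCATENATE access_key '.' reversed '.dnsbl.httpbl.org' INTO query.

WRITE / query.
```

For the sample address the reversed part is 7.1.1.127, so the name to resolve is abcdefghijkl.7.1.1.127.dnsbl.httpbl.org.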
Depending on the result of the NSLOOKUP, we create the HTML to show to the harvester
IF rc GT 0.
html = '\n\n\n\n'. …
And give it back to the client
server->response->set_status( code = 200 reason = 'OK' ).
server->response->set_header_field( name = if_http_header_fields=>content_type value = 'text/html' ).
server->response->set_cdata( html ).
The output will look something like this
Alternatively, we can return an HTTP 404 result instead
* server->response->set_status( code = 404 reason = 'Not Found' ).
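With co_flow_ok we tell the ICF that the request has been handled completely, so no further handler (and hence no BSP page) is called for the harvester
if_http_extension~flow_rc = if_http_extension=>co_flow_ok.
ELSE.
For every other visitor we pass control to the next handler
if_http_extension~flow_rc = if_http_extension=>co_flow_ok_others_mand.
ENDIF.
ENDIF.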
Conclusion
Again, no rocket science this time. Once you know the HTTP handler basics, it’s really easy to understand. I’m sure that this method will be much easier to implement and maintain than inserting the check into every application and page.