
h2. Introduction

Every blogger on SDN faces a common problem: formatting the weblog content before posting it. Formatting mostly means removing the unnecessary tags and keeping only the tags allowed in SDN weblogs, which is a painful task. To avoid it, I came up with a small solution that I thought I would share with my fellow bloggers through this blog.

You can also find an equivalent ABAP program by Brian McKellar in his weblog, The 1-2-3 Steps To Producing a Weblog.





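import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Vector;

import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/**
 * @author: Felix Jeyareuben, Cognizant Technology Solutions
 */
public class FormatSDNWeblog {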

     static Vector vTags = null;
     static Vector noEndTags = null;
     static boolean _flag = false;
     static FileOutputStream fos = null;
     static String outputFile;

     public static void main(String[] args) {
          try {
               String inputFile = "weblog.htm";
               if (args.length >= 1)
                    inputFile = args[0];
               else {
                    // No file given: print the usage and fall back to weblog.htm
                    System.out.println("Usage: java FormatSDNWeblog <html-file>");
               }
               outputFile = "sdn_" + inputFile;

               /*
                * Ref: SDN: Weblogs and Formatting! By Craig Cmehil
                */
               String validTags[] = { "p", "b", "i", "em", "strong", "code", "tt",
                         "br", "a", "sub", "sup", "ul", "ol", "li", "pre", "img",
                         "blockquote", "small", "div", "hr", "h2", "h3", "h4", "h5",
                         "table", "tr", "td", "th", "center", "textarea", "a" };

               // No End Tags
               // (The entries are missing from the listing; br, hr and img are
               // assumed here because they are the only empty elements among the valid tags.)
               noEndTags = new Vector();
               noEndTags.add("br");
               noEndTags.add("hr");
               noEndTags.add("img");

               vTags = new Vector();
               for (int i = 0; i < validTags.length; i++)
                    vTags.add(validTags[i]);

               DOMParser parser = new DOMParser();
               parser.parse(inputFile);
               fos = new FileOutputStream(new File(outputFile));

               // A recursive function which does the stripping of unnecessary tags
               SDNParser(parser.getDocument(), "");
               fos.close();

               System.out.println("Filtered html successfully converted into SDN Weblog Content as "
                         + outputFile + "!");
          } catch (FileNotFoundException e) {
               e.printStackTrace();
          } catch (SAXException e) {
               e.printStackTrace();
          } catch (IOException e) {
               e.printStackTrace();
          }
     }

     public static void SDNParser(Node node, String intend) throws IOException {
          String _node = "";
          Node ch = null;

          // To check if the current node is a TAG
          if (node.getNodeType() == Node.ELEMENT_NODE) {
               _node = node.getNodeName();

               // To remove unnecessary paragraphs that hold only a
               // non-breaking space (character code 160)
               if (_node.equalsIgnoreCase("P")) {
                    ch = node.getFirstChild();
                    if (ch != null && ch.getNodeType() == Node.TEXT_NODE
                              && ch.getNodeValue().length() > 0
                              && (int) ch.getNodeValue().charAt(0) == 160)
                         return;
               }

               // To check if the current TAG is a valid one
               if (vTags.contains(_node.toLowerCase())) {
                    _flag = true;
                    fos.write(("\n" + intend + "<" + _node).getBytes());

                    // Iterating through the attributes of the current node
                    NamedNodeMap a = node.getAttributes();
                    if (a != null) {
                         for (int i = 0; i < a.getLength(); i++) {
                              // Removing the 'class' attribute which might be found
                              // in the valid allowed TAGS
                              if (a.item(i).getNodeName().toLowerCase().startsWith(
                                        "class", 0))
                                   continue;

                              // Removing the 'style' attribute which might be found
                              // in the valid allowed TAGS
                              if (a.item(i).getNodeName().toLowerCase().startsWith(
                                        "style", 0))
                                   continue;

                              fos.write((" " + a.item(i)).getBytes());
                         }
                    }
                    // Closing the start tag
                    fos.write(">".getBytes());
               } else
                    _flag = false;

               for (Node child = node.getFirstChild(); child != null; child = child
                         .getNextSibling())
                    // Recursive call to its child node
                    SDNParser(child, intend + "     ");

               // Ending the tag
               if (vTags.contains(_node.toLowerCase())
                         && !noEndTags.contains(_node.toLowerCase()))
                    fos.write(("\n" + intend + "</" + _node + ">").getBytes());
          } else {
               // Else part is of text and isn't any TAG
               // To check if it is the root document
               if (node.getNodeType() != Node.DOCUMENT_NODE)
                    if (_flag)
                         fos.write(node.getNodeValue().getBytes());

               for (Node child = node.getFirstChild(); child != null; child = child
                         .getNextSibling())
                    SDNParser(child, intend + "     ");
          }
     }
}

The above code contains all the valid tags allowed in SDN weblogs.

Ref: SDN: Weblogs and Formatting! by Craig Cmehil
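The whitelist lookup at the heart of the program can be tried in isolation. What follows is only a minimal sketch and not part of the original listing: the class name TagWhitelist and the method isAllowed are hypothetical, and it uses a HashSet where the listing uses a raw Vector. The tag names are copied from the validTags array in the listing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of the original program.
public class TagWhitelist {

    // Tag names copied from the validTags array in the listing.
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            "p", "b", "i", "em", "strong", "code", "tt",
            "br", "a", "sub", "sup", "ul", "ol", "li", "pre", "img",
            "blockquote", "small", "div", "hr", "h2", "h3", "h4", "h5",
            "table", "tr", "td", "th", "center", "textarea"));

    // HTML parsers may report element names in upper case,
    // so the name is lower-cased before the lookup.
    public static boolean isAllowed(String tagName) {
        return ALLOWED.contains(tagName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("P"));    // prints true
        System.out.println(isAllowed("span")); // prints false
    }
}
```

A set gives a constant-time lookup and silently ignores the duplicate "a" entry, but for a list this short the Vector in the listing works just as well.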

h2. Step-by-Step Demo

After completing the blog, choose Save As in the Word document.


Select Web Page, Filtered as the file type.


Click Yes to save it as HTML.


When we open the saved HTML in Notepad, there are many unnecessary tags and attributes.
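To make the attribute clean-up concrete, here is a small sketch that drops the class and style attributes from one element, the same two attributes the program removes. It is an illustration under assumptions, not the program itself: the class AttributeFilterSketch and its methods are hypothetical, and it uses the JDK's built-in XML parser, which only accepts well-formed markup. Real Word output is usually not well-formed, so a tolerant HTML parser such as NekoHTML is still needed for actual files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

// Hypothetical helper for illustration; not part of the original program.
public class AttributeFilterSketch {

    // Parses a small, well-formed fragment and returns its root element.
    public static Element parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed", e);
        }
    }

    // Rebuilds the start tag, skipping attributes whose names start with
    // "class" or "style". Attribute values are copied as-is (no escaping).
    public static String startTag(Element element) {
        StringBuilder out = new StringBuilder("<").append(element.getNodeName());
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            String name = attr.getName().toLowerCase();
            if (name.startsWith("class") || name.startsWith("style")) {
                continue;
            }
            out.append(' ').append(attr.getName())
               .append("=\"").append(attr.getValue()).append('"');
        }
        return out.append('>').toString();
    }

    public static void main(String[] args) {
        Element p = parse("<p class=\"MsoNormal\" style=\"margin:0\" align=\"left\">Hello</p>");
        System.out.println(startTag(p)); // prints <p align="left">
    }
}
```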


Execute the command(s)

javac -classpath


java -classpath nekohtml.jar;xercesImpl.jar;xmlParserAPIs.jar;. FormatSDNWeblog


Make sure javac.exe and java.exe are on your PATH.


The generated output file contains only the tags allowed by SDN.



    1. Anonymous
      Hi Gregor,

      I certainly know about Brian McKellar's ABAP program. Any blogger on SDN must have read his blog, The 1-2-3 Steps To Producing a Weblog! But many people like me, who don't know ABAP and/or don't have 'developer access', may not be able to create the ABAP program.

      Most importantly, to execute an ABAP program you need R/3 access through SAP GUI, whereas to execute a Java program you simply need a JRE on your OS! Hence it is far easier for anyone to execute a Java program than an ABAP program. 😉

      But I got the idea for this blog from SDN: Weblogs and Formatting! by Craig Cmehil, because he clearly lists the allowed tags. Previously, many more tags were allowed! I don't know whether the ABAP program reflects the current changes in SDN weblogs (since I am not an ABAPer). Hence I decided to write a Java program for it.

      Best regards,

      1. Gregor Wolf
        Hello Felix,

        don’t get me wrong. I think it is great to have this Java program from you. But I think you should also provide links to the other weblogs.


        1. Anonymous
          Hi Gregor,
          I didn’t get you wrong 🙂
          I have also updated my blog to include links to the other weblogs.
          Best regards,
