Skip to Content
Author's profile photo Former Member

Use RegEx to cleanup XML Schemas exported from Access

The problem

SAP MDM uses XML Schemas for importing and exporting. Getting an XSD describing your repository can also be useful for XI. One of the simplest ways to get a schema from an existing repository is to export some records to an Access database (MDB), and use Export to XML in Access to get an XSD. The problem is that sometimes the resulting XSD has all kinds of weird strings in it instead of white spaces; something like this:


Instead of simply “Account Group A”. This can be quite annoying — you have to clean everything by yourself. I hate manual labor, so I found a simpler solution.

The solution – RegEx!

Regular Expressions (or RegEx) is voodoo to many people. I have to admit I’m hardly fluent at it myself, but I know enough to get by with what I need for this particular case. I used the Regular Expression HOWTO as guidance when I created these 3 simple RegEx snippets which can be used to cleanse the XSDs Acccess generates. These expressions are used to find instances of specific patterns in the file. You need to use some kind of search & replace mechanism which supports RegEx to actually clean the XSD. jEdit supports RegExs in its Find & Replace dialog. Another option is to use perl (see sample later on). The 3 patterns are:

  1. _x\w{1,4}_ :Find all occurances of those “_x0020_”, “_x0027_”, “_x0028_” etc etc. What the patterns says is: look for a “_x” followed by up to four alpha-numeric characters (“\w{1,4}”), followed by a “_”. The search resules of this pattern should be replaced by a space (” “)
  2. \s{2,} :Finds all occurances of two white-spaces. This is needed because I saw that if the original XSD had this kind of string: “_x0027__x0028_” it will be replaced by two spaces ”  “, when what you’re looking for is one space. The search results of this pattern should be replaced by a space (” “) too
  3. \s” :Finds all occurances of white-spaces before the trailing ‘”‘. This is useful to make something like this: “Account Group A ” into this: “Account Group A” (remove the trailing space.) The results of this search pattern should be replaced by a ‘”‘.

A string of Perls

Perl is considered to be one of the most powerful script languages for text manipulations. It’s also a great scripting language for anything else (though it’s syntax can be daunting at first). Perl has built-in support for Regular Expressions, so it’s a perfect candidate to create a script from those 3 little patterns I’m using here. If you’re using Linux or any kind of UNIX you most likely already have Perl, and if you don’t your package manager can install it quickly. If you’re a Windows user you can get download & install ActivePerl  from ActiveState for free. Perl is known for it’s ability to run “one-liners” — a single line of code which is executed from the command line. Put the following three one-liners into a shell script and give it the XSD file you got from Access as the parameter and it’ll cleanse the file for you:UNIX version (bash):


# Removing "_x..._"
perl -p -i.bak -e "s/_x\w{1,4}_/ /g" $1
# Removing extra spaces
perl -p -i.bak1 -e "s/\s{2,}/ /g" $1
# Removing spaces before "
perl -p -i.bak2 -e "s/\s\"/\"/g" $1

Windows version (put in a .bat file):

rem Removing "_x..._"
perl -p -i.bak -e "s/_x\w{1,4}_/ /g" %1
rem Removing extra spaces
perl -p -i.bak1 -e "s/\s{2,}/ /g" %1
rem Removing spaces before "
perl -p -i.bak2 -e "s/\s\"/\"/g" %1

Perl will create a backup file before every step, and they will be named like the original file name with a “.bak”, “.bak1” and “.bak2” extension.

Assigned Tags

      You must be Logged on to comment or reply to a post.
      Author's profile photo Former Member
      Former Member

      I hope you have seen XOM Lite which is available in SMP along with a How-To guide. This is a pretty good tool for generating XSD from your repository.



      Author's profile photo Former Member
      Former Member
      Hi Rajani --

      I know XOM. This is just another option of course. Anyway, in future versions of MDM both this method and XOM will not be needed to generate XSDs from repositories.


      Author's profile photo Former Member
      Former Member

      Does it mean that Console will have an option to generate XSD??



      Author's profile photo Former Member
      Former Member
      Actually yes, it looks like the console will be able to export a complete XML description of the repository. That is of course sufficient for generating XSDs... Or even Relax NG with the help of a simple xslt transformation.

      regards, jc