User Experience Insights
Big XML file flattening with Excel Power Query for SAF-T and other requirements
Starting with 2022, SAF-T reporting to Romanian fiscal authorities is mandatory, more information here. Reporting has to be generated at document level in XML files and that creates a problem of working with files with sizes of hundred of megabytes.
As you know XML files are very inflated because it store the structure and format for each value. Retrieving applications such as Notepad ++ or XML Notepad for XML format (nodes) are designed to upload the entire file in memory. In case of files over 100 MB you get a freeze or out of memory crash. XML format gives you all details, but without calculations as in tables. You have to move to flattening to solve the problem of size and get table format for calculations. However with flattening applications I experienced same problem – out of memory crash. I have tried to go to the root – the programming language and tested some Python libraries, same problem – out of memory crash. I think worse trying these solutions: one with Python from Jeff Heaton, Processing Large XML Wikipedia Dumps that won’t fit in RAM in Python, jupyter notebook code here and another one streaming solution from Microsoft here.
Since a lot of users work with Microsoft Office I explored the capabilities of Excel Power Query and I can confirm that it handles transformation of files up to 100 MB, a beginner tutorial here.
Flattening a file with a size above 100 MB becomes slower or is freezing in dialog steps Connecting, Navigator and Table Expansion (flattening). One solution is to split XML files into chunks of 100 MB and then merge results. This means more work, however solves the problem.
I found an workaround to flatten XML files up to 400 MB that outputs about of 400 thousand rows using Advanced Editor script Power Query M formula language.
This is what I want to share with you.
- Generate a small.XML file with small amount of data, but complete structure.
- Menu: Data/Get data/From File/From XML.
- Create transformation rules by expanding tables.
- Copy the transformation code from Advanced editor script and save it into plain text Notepad.
- You are ready to flatten the big.XML.
- Next time repeat .
- Open Advanced and paste the code from  and replace small.XML with big.XML.
- Press Close & Load.
- Running background query starts.
- Hundreds of thousands of rows are filled with flattened data.
Data / Get data / From File / From XML
Navigator / Transformation / Power Query Editor
Power Query Editor / Tables expanding
Advanced editor script / Copy small.XML to Notepad and save it for next use
Advanced editor script / Replace small.XML with big.XML
What is the biggest XML file you succeed to flatten with Excel Power Query?
What are the other tools and tips you used to flatten XML with MS Office?