Today we are going to look for vulnerabilities in code responsible for XML parsing.
Here is the example Java code.
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class Main {
    public static void main(String[] args) {
        try {
            File xmlFile = new File("c:\\od0dopentestera\\3\\3.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(xmlFile);
            System.out.println(doc.getDocumentElement().getTextContent());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
It reads the file 3.xml and then parses it using DocumentBuilder.
In the last statement of the try block we get the document's root element and print its text content.
What is inside the input file? It starts with the XML declaration, which must include the version attribute.

If the declaration does not provide a version, the file cannot be processed and we get an error message.
Next, the test root element is defined with the sample value demo.
So if our code example works properly, we should see the string demo on the screen.
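
The original 3.xml is not reproduced here, so the following is only a sketch of what it might contain, reconstructed from the description above: a declaration with a version attribute and a test root element holding demo. The class name ParseDemo and the helper rootText are made up for this illustration, and the document is parsed from a string rather than from a file so that the example is self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the program above, reading XML from a string.
class ParseDemo {
    // Assumed content of 3.xml, based only on the description in the text.
    static final String XML =
        "<?xml version=\"1.0\"?>\n" +
        "<test>demo</test>";

    // Parses the given XML text and returns the text content of the root element.
    static String rootText(String xml) throws Exception {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootText(XML)); // prints "demo"
    }
}
```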

The XML standard, however, offers much more than this.
One additional feature is the ability to define entities.
They are similar to macros, for example those in the C language.
In short, we can define a template with long content.

Instead of pasting the same content into each document, we just reference the entity.
The parser automatically finds all entity references and replaces them with the long string we defined.
Let's see an example.
Entities are created using the keywords DOCTYPE and ENTITY.
In my case, the newly created entity is called replace. Its value is: very long tekst.
Now in the root element I refer to the entity by writing its name between an ampersand and a semicolon.
When we run the parser, we can see that the entity has been replaced with the predefined text.
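
Since the XML file with the entity is not shown, here is a rough sketch of how such a document could look and how the parser expands it. The entity name replace and its value come from the text; the root element name test, the class name EntityDemo and the helper rootText are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example of an internal entity being expanded by the parser.
class EntityDemo {
    // Assumed document: the DOCTYPE block declares the entity "replace",
    // and the root element refers to it as &replace;
    static final String XML =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE test [\n" +
        "  <!ENTITY replace \"very long tekst\">\n" +
        "]>\n" +
        "<test>&replace;</test>";

    // Parses the given XML text and returns the text content of the root element.
    static String rootText(String xml) throws Exception {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootText(XML)); // prints "very long tekst"
    }
}
```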

So where is today's vulnerability?
The creators of XML had the idea that it would be nice to be able to define entities in external files.
Thanks to this, long strings of text do not hurt the readability of XML files.
The keyword SYSTEM enables this functionality.
It is followed by the name of the file that contains the content of the entity.

And here lies the core of the problem. As you might expect, an attacker can provide any path they like here.
In my case, there is a secret file whose content is the string secret. I create an XXE entity, which reads the contents of this file.

Now when we look at the result, we can see the content of the secret file, read from the server on which the application is running.
This attack is called XML External Entity (XXE).
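
To make the attack concrete, here is a minimal sketch that can be run locally. It does not use the original files: it writes the string secret to a temporary file, which stands in for the secret file, and builds a document whose XXE entity points at that file through the SYSTEM keyword. The class XxeDemo and its helpers payload and rootText are hypothetical names, and the result assumes a JDK whose default JAXP settings still allow external entities to be read from file: URLs.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical local reproduction of the XXE attack described above.
class XxeDemo {
    // Builds a document whose XXE entity is loaded from the given file URI.
    static String payload(String fileUri) {
        return "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE test [\n"
            + "  <!ENTITY XXE SYSTEM \"" + fileUri + "\">\n"
            + "]>\n"
            + "<test>&XXE;</test>";
    }

    // Parses with a default (unhardened) factory, as in the original program.
    static String rootText(String xml) throws Exception {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Temporary file standing in for the secret file on the server.
        Path secret = Files.createTempFile("secret", ".txt");
        Files.write(secret, "secret".getBytes(StandardCharsets.UTF_8));
        try {
            // With permissive defaults, the file content ends up in the output.
            System.out.println(rootText(payload(secret.toUri().toString())));
        } finally {
            Files.deleteIfExists(secret);
        }
    }
}
```

In a real attack the path would of course point at a file the attacker does not control, such as a configuration file on the server.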

So how do you protect yourself from it?
One solution is to explicitly turn on a special mode, secure processing, whose restrictions on external entities are not applied by default.
dbFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
Now we'll get an error message when we try to attack.
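
The effect of this setting can be sketched with a small self-contained program. It builds a document with an external entity pointing at a temporary file and parses it with secure processing explicitly enabled. The class SecureDemo and the helpers payload and rootText are hypothetical names, and the exact wording of the error message depends on the JDK, so the sketch only prints whatever the parser reports.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical check that secure processing rejects an external entity.
class SecureDemo {
    // Builds a document whose XXE entity is loaded from the given file URI.
    static String payload(String fileUri) {
        return "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE test [\n"
            + "  <!ENTITY XXE SYSTEM \"" + fileUri + "\">\n"
            + "]>\n"
            + "<test>&XXE;</test>";
    }

    // Same parsing code as before, plus the secure processing feature.
    static String rootText(String xml) throws Exception {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        dbFactory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Temporary file standing in for the secret file on the server.
        Path secret = Files.createTempFile("secret", ".txt");
        Files.write(secret, "secret".getBytes(StandardCharsets.UTF_8));
        try {
            System.out.println(rootText(payload(secret.toUri().toString())));
        } catch (SAXException e) {
            // Expected path: the parser refuses to load the external entity.
            System.out.println("Blocked: " + e.getMessage());
        } finally {
            Files.deleteIfExists(secret);
        }
    }
}
```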

And that's all I wanted to show in today's episode.
As you can see, parsing XML files can be complicated.
Perhaps this is why the JSON format is becoming more and more popular?