19-02-2019 / From 0 to pentesting hero

YAML

A popular opinion says not to use the pickle[1] module on data supplied by a user, because deserializing it may lead to an object injection attack and malicious code execution.

pickle

But what about other formats? Are they also dangerous?

In today's episode of "From 0 to pentesting hero" I'm going to talk about YAML files.

So why do programmers decide to use this format?

Some of them would raise the argument of readability - this format forces data to be stored on individual lines.

Yaml file format

It also makes merging changes easier, for example when we use a version control system such as Git.

It is easier to edit data broken into single lines than a JSON file, especially when the JSON is not formatted and appears to the user as one long string on a single line.

Let's look at today's example.

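import yaml

# Deliberately unsafe: yaml.Loader matches the old default of yaml.load,
# and PyYAML 6 requires the Loader argument to be passed explicitly.
with open("test.yaml", "r") as s:
    y = yaml.load(s, Loader=yaml.Loader)
    print(y['name'])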

We read the contents of the test.yaml file using the yaml.load function and then display the name.

Everything works properly.

But the YAML format is not as simple[2] as it might seem.

Yaml format specification

In the standard use case, the parser recognizes the type of each value on its own.
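As a rough illustration of this automatic type detection, the sketch below loads a small document with no tags and prints the Python type chosen for each value. The keys and values in `DOC` are made up for this example and are not taken from the episode.

```python
import yaml

# Hypothetical sample document; none of the values carries an explicit tag.
DOC = """
name: admin
age: 30
ratio: 0.5
active: true
"""

data = yaml.safe_load(DOC)

for key, value in data.items():
    # Show which Python type the parser inferred for each value.
    print(key, type(value).__name__)
```

The same text `0.5` would become a string only if it were quoted or tagged explicitly.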

Sometimes, however, we would like to have control over the type stored in the configuration.

That's why tags exist - that is, markers starting with an exclamation mark that are placed before a value. Thanks to them we can choose the type explicitly.

Example tags

Just like here - the value of 0.5 is going to be treated as a string and not as a float.
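A minimal sketch of that behaviour, assuming PyYAML and two illustrative keys (`implicit` and `explicit`) that are not part of the original example:

```python
import yaml

# The same literal twice: once untagged, once forced to a string with !!str.
DOC = """
implicit: 0.5
explicit: !!str 0.5
"""

data = yaml.safe_load(DOC)

print(repr(data["implicit"]))  # inferred as a float
print(repr(data["explicit"]))  # kept as a string because of the tag
```

Standard tags such as `!!str` are still accepted by `safe_load`, so forcing a type this way does not require the unsafe loader.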

In YAML we can also store binary data - it is saved using base64 encoding and marked with the !!binary tag.
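The binary tag could be exercised roughly like this. The key name `blob` and the payload `hello` are placeholders chosen for the sketch.

```python
import base64

import yaml

# Encode some bytes as base64 text, the form that !!binary expects.
encoded = base64.b64encode(b"hello").decode("ascii")

# Build a one-line document that stores the encoded bytes under a tag.
doc = "blob: !!binary " + encoded

data = yaml.safe_load(doc)

# In Python 3 the parser decodes the value back into a bytes object.
print(data["blob"])
```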

Some parsers implement language-specific tags - in Python, for example, we can define tuples with the !!python/tuple tag.
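A short sketch of the tuple tag, assuming a PyYAML version that ships `yaml.FullLoader` (5.1 or newer). The key `point` is invented for the example.

```python
import yaml

# A flow sequence tagged so that it is built as a Python tuple, not a list.
DOC = "point: !!python/tuple [1, 2]"

# FullLoader understands this Python-specific tag; safe_load would reject it.
data = yaml.load(DOC, Loader=yaml.FullLoader)

print(type(data["point"]).__name__, data["point"])
```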

So where is the vulnerability today?

In the standard parser used in Python, we can also use the apply[3] tag (written as !!python/object/apply), which allows you to call any function from any module and pass appropriate parameters to it.

So it is equivalent to the ability to execute arbitrary code on the server that we attack.

In our case, we will use the os.system function to list the contents of the current directory.

Directory content

Now, instead of the user's name, we can see the contents of the directory.
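The effect can be reproduced with a sketch like the one below. To keep it harmless and portable it calls `os.listdir` as a stand-in for the `os.system` call used in the episode, and it passes `yaml.UnsafeLoader` explicitly, since recent PyYAML releases no longer pick an unsafe loader on their own. Treat it as a demonstration to run only on input you wrote yourself.

```python
import yaml

# Stand-in payload: os.listdir(".") replaces the os.system call from the text.
PAYLOAD = 'name: !!python/object/apply:os.listdir ["."]'

# UnsafeLoader honours Python-specific tags, so the function is really called.
# Never use this loader on data that comes from a user.
data = yaml.load(PAYLOAD, Loader=yaml.UnsafeLoader)

# Instead of a name, the key now holds the function's return value.
print(data["name"])
```

The value under `name` is whatever the called function returned, which is why a directory listing shows up where a user name was expected.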

How to protect yourself against this vulnerability?

Replace load with safe_load.

safe_load

Then all dangerous tags will stop working.
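A small sketch of what "stop working" looks like in practice: with `safe_load`, a document carrying the apply tag is rejected with an error instead of being executed. The helper name `is_rejected` is hypothetical and exists only for this illustration.

```python
import yaml

# The same kind of payload as before, using a harmless function.
PAYLOAD = 'name: !!python/object/apply:os.listdir ["."]'


def is_rejected(text):
    """Return True if safe_load refuses to build the document."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        # safe_load has no constructor for Python-specific tags.
        return True
    return False


print(is_rejected(PAYLOAD))        # dangerous tag
print(is_rejected("name: admin"))  # ordinary document
```

Catching `yaml.YAMLError` also covers plain syntax errors, so an application can treat both cases as invalid input.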

Proper loading of yaml file

It is a pity that this is not the default behavior of this function, because I think that many people do not realize the danger.