Reading YAML & JSON: My Wholesome Guide to Learning Python (Part 4)

Nepal Brothers
5 min read · Sep 28, 2021

YAML is everywhere nowadays. Kubernetes, Docker, AWS CloudFormation: it's everywhere. And you can see why that is the case. Look at the YAML we have for Mystery Inc and you can clearly get a sense of what the file is all about.

So, in this chapter, we are going to learn how to read a YAML file, and also learn a few other things along the way.

Let's see what we need to do in Python:

  1. Read the file
  2. Load it into a concrete class like the one we built in Part 3.

Read the file

Suppose our file is named company.yaml. We can then use Python's built-in open() function to read it.
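Here is a minimal sketch of what such a read could look like. The helper name stream_lines is made up for illustration, and the usage part assumes company.yaml sits in the folder you run the script from.

```python
from pathlib import Path


def stream_lines(path):
    """Yield the file one line at a time instead of loading it all at once."""
    # open() returns a file object; looping over it reads line by line.
    with open(path, encoding="utf-8") as yaml_file:
        for line in yaml_file:
            yield line.rstrip("\n")


# Usage sketch: print every line, but only if company.yaml is actually there.
if Path("company.yaml").exists():
    for line in stream_lines("company.yaml"):
        print(line)
```

The with block closes the file for us once the loop is done, even if something goes wrong halfway.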

What does this do? It opens the file and lets us stream the data a piece at a time instead of keeping everything in memory.

How does streaming differ from reading everything at once?

Consider a movie on Netflix that you are keen on and want to watch in 4K. If you were to download it first (reading all at once), it would probably take a long time, and you would need enough storage for the whole movie. If you stream it instead, you watch it bit by bit, and by the time you reach the next minute, you are probably okay with discarding the earlier one.

When should I stream vs read at once?

Depending on your situation, you will want one or the other. Given that Python can't know up front how large the file is, it streams the file. What you can do is gather all the streamed pieces and put the whole content back together, and I have done that so many times. In fact, it is so common that there are standard functions you can call to make it happen.

Let's read the whole file at once
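To make "gathering the streamed bits" concrete, here is a small sketch. Both function names and the chunk size of 1024 characters are assumptions picked for illustration.

```python
from functools import partial


def read_in_chunks(path, chunk_size=1024):
    """Yield the file content in pieces of at most chunk_size characters."""
    with open(path, encoding="utf-8") as source:
        # iter(callable, sentinel) keeps calling read() until it returns "".
        for chunk in iter(partial(source.read, chunk_size), ""):
            yield chunk


def gather(path, chunk_size=1024):
    """Collect every streamed chunk back into one string."""
    return "".join(read_in_chunks(path, chunk_size))
```

With read_in_chunks you only ever hold one chunk in memory. gather gives that benefit up on purpose, which is fine for small files.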
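A minimal sketch of reading everything in one go could look like this. The helper name read_whole_file is made up for illustration.

```python
def read_whole_file(path):
    """Return the entire content of the file as a single string."""
    with open(path, encoding="utf-8") as yaml_file:
        # read() with no argument reads from the current position to the end.
        return yaml_file.read()
```

You would call it as yaml_string = read_whole_file("company.yaml"), assuming the file is in your working folder. pathlib.Path("company.yaml").read_text() from the standard library does the same job in one line.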

This reads the whole file, but our next step is to actually create a whole class. This is where you would probably want to create a project, if you don't have one yet, and work in there. The benefit of having a project is that you can have multiple files and pull in external libraries, which can make parsing YAML easy. If you aren't quite sure how, be sure to check out the blog post about how to create a virtual environment and why we should create one.

Use PyYAML to read the YAML file

Step 1: Be in the virtual environment

PyYAML makes your life easy. It is a library that you can install into your virtual environment and then use to turn the YAML file into Python data, which we then load into a concrete class.

Step 2: Use pip to add PyYAML to the project

You can use this website and copy the command there.

The other option is to type in the command yourself.

pip install PyYAML

Step 3: Write the code

If you look at the above file, it is really simple.

We have defined the dataclasses by decorating them with @dataclass.

Decorating a class with @dataclass is a way of saying that the class holds data. Dataclasses have other cool features too, but when a class is mostly about holding data, a dataclass is the way to go.
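As a rough sketch, the two dataclasses could be declared like this. The field names firstName, company_name, description and employee come from this post; lastName and the sample values are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Employee:
    firstName: str
    lastName: str  # assumed field, for illustration only


@dataclass
class Company:
    company_name: str
    description: str
    employee: List[Employee]


# The decorator writes __init__, __repr__ and __eq__ for us.
fred = Employee(firstName="Fred", lastName="Jones")  # made-up sample values
print(fred)
```

We never wrote a constructor ourselves, yet Employee(firstName=..., lastName=...) works. That is one of the cool features.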

Step 1: Read the whole file as a string

Reading a file as a string seems reasonable, considering the file is very small.

Step 2: Parse the string into a dictionary, which we will then load

We cannot create a dataclass from a string directly. It needs some structure, and yaml.safe_load(yaml_string) loads the YAML as a dictionary, which lays the foundation for creating a dataclass.
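Here is a small sketch of that parsing step, assuming PyYAML is installed. The YAML content below is made up for illustration, so the real company.yaml may well look different.

```python
import yaml  # provided by the PyYAML package (pip install PyYAML)

# Made-up sample content standing in for the real company.yaml.
yaml_string = """
company_name: Mystery Inc
description: We solve mysteries
employee:
  - firstName: Fred
    lastName: Jones
  - firstName: Daphne
    lastName: Blake
"""

parsed_yaml = yaml.safe_load(yaml_string)

print(type(parsed_yaml).__name__)   # dict
print(parsed_yaml["company_name"])  # Mystery Inc
print(parsed_yaml["employee"][0])   # {'firstName': 'Fred', 'lastName': 'Jones'}
```

Notice that the nested employees come back as plain dictionaries inside a list, not as Employee objects.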

Step 3: Change that dictionary into a dataclass

There are a few concepts here. If you look at line 53 of the code, we are passing parsed_yaml with 2 STARS in front of it (**parsed_yaml). So, this raises the question:

What do the 2 stars mean?

It means that, when passing a dictionary, we are telling Python to unpack it: each key is matched to a parameter of the same name, and its value is assigned to the respective property.

So, for the Employee dataclass, firstName of the dictionary will be mapped to firstName of the class, and so on for every other key.
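A tiny sketch of the two stars in action. The lastName field and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    firstName: str
    lastName: str  # assumed field, for illustration only


employee_dict = {"firstName": "Velma", "lastName": "Dinkley"}  # made-up values

# These two calls do exactly the same thing.
unpacked = Employee(**employee_dict)
explicit = Employee(firstName="Velma", lastName="Dinkley")

print(unpacked == explicit)  # True
```

One thing to watch out for: if the dictionary has a key that the class does not know about, Python raises a TypeError, so the keys and the field names have to line up.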

What does __post_init__() mean and why do we need it?

For a dataclass, __post_init__() is a method that runs right after the generated __init__() has filled in the fields, so you can do any extra operations you like there.

So, the two stars meant that the dictionary is mapped onto the properties. For the company, it means that its properties are filled in from the dictionary.

Before post init:

company_name of Company corresponds to company_name of dict
description of Company corresponds to description of dict
employee of Company corresponds to the list of employee dictionaries of dict

If you think about employee: List[Employee], each item in it is still a dictionary at first. But we want to change those into concrete classes too, don't we? So __post_init__() is the right place for us to do that.
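Putting that together, a sketch of the conversion could look like this. The lastName field and the sample dictionary are assumptions for illustration, and the isinstance check is just one way to leave already converted items alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Employee:
    firstName: str
    lastName: str  # assumed field, for illustration only


@dataclass
class Company:
    company_name: str
    description: str
    employee: List[Employee]

    def __post_init__(self):
        # Right after __init__, each item is still a plain dict; convert it.
        self.employee = [
            Employee(**item) if isinstance(item, dict) else item
            for item in self.employee
        ]


# Made-up dictionary, shaped like what yaml.safe_load would hand back.
parsed_yaml = {
    "company_name": "Mystery Inc",
    "description": "We solve mysteries",
    "employee": [
        {"firstName": "Fred", "lastName": "Jones"},
        {"firstName": "Daphne", "lastName": "Blake"},
    ],
}

company = Company(**parsed_yaml)
print(company.employee[0].firstName)  # Fred
```

Without __post_init__(), company.employee[0] would still be a dictionary and .firstName would fail.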

This is all you need to create a dataclass from the YAML file. Can you think of how to read a JSON file instead? The concept is the same: you change the JSON file into a dictionary and then load it into the dataclass using the 2 stars.

So, for our code, the JSON version of the same YAML is:

You would then read this JSON file and change it into a dictionary.

We can actually reuse the same code once we have changed the JSON into a dictionary.

They are pretty much the same. We just read the file and then convert it to a dictionary. Everything else is the same :)
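As a sketch of the JSON route, using only Python's built-in json module. The JSON content, the lastName field and the sample values are made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Employee:
    firstName: str
    lastName: str  # assumed field, for illustration only


@dataclass
class Company:
    company_name: str
    description: str
    employee: List[Employee]

    def __post_init__(self):
        # Same trick as with YAML: turn each plain dict into an Employee.
        self.employee = [
            Employee(**item) if isinstance(item, dict) else item
            for item in self.employee
        ]


# Made-up JSON standing in for the real file content.
json_string = """
{
  "company_name": "Mystery Inc",
  "description": "We solve mysteries",
  "employee": [
    {"firstName": "Fred", "lastName": "Jones"},
    {"firstName": "Daphne", "lastName": "Blake"}
  ]
}
"""

parsed_json = json.loads(json_string)  # the only line that differs from YAML
company = Company(**parsed_json)
print(company.company_name)  # Mystery Inc
```

If you are reading straight from a file, json.load(file_object) gives you the same dictionary without going through a string first.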
