All the ways to write a string in YAML

Brett Weir Mar 13, 2023 6 min read

YAML gets a lot of flak for being too complex, and with good reason. It's loaded with features that:

  • you are probably not aware of,

  • solve problems that you probably don't have, and

  • are documented in ways that you can't understand.

Did you know that there are at least six ways to represent a string in YAML? If you do know that, it's probably not because you read the YAML spec. Have a look at this gem:

Scalar: The content of a scalar node is an opaque datum that can be presented as a series of zero or more Unicode characters.

(YAML spec, section 3.2.1.1. Nodes)

What the heck does that mean? It sure sounds an awful lot like a string, but I can only assume that if it were a string, they would have said "string". 🤔

The YAML spec eventually does tell you how to write a string, but only after wading through tons of jargon and similar-sounding but unrelated features.

I don't want you to suffer through that, so today, we'll explore all the ways to write a string in YAML, and hopefully stay sane in the process.

Test setup

To see the results of the syntax we'll be trying, the easiest thing to do is convert it to JSON. Compared to YAML, JSON is simple: it has only half a dozen types, and the whole of its syntax can be expressed on a single, short page.

With that in mind, we'll write a short Python snippet to convert YAML to JSON. You'll need to install PyYAML first:

pip install --user pyyaml

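Once we have PyYAML, we'll create a test.py file and add the following:

# test.py
import json
import yaml

# safe_load only builds plain Python types (dicts, lists, strings, numbers),
# and the with block closes test.yaml once it has been parsed.
with open("test.yaml") as f:
    data = yaml.safe_load(f)
print(json.dumps(data, indent=2))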

Next, create a test.yaml file to paste the examples into. You can then run the tests yourself, like this:

python3 test.py

Implicit strings

You can add strings in YAML without any special notation. Just start writing:

title: A Walk to Forget

Oh, what convenience! But also, a trap! If you wanted to record someone's username, you might try the following, but it turns out @ is a "reserved indicator", so you'd be sad:

user: @chickenman
...
yaml.scanner.ScannerError: while scanning for the next token
found character '@' that cannot start any token
  in "yammy.yaml", line 1, column 7

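If you're an Ansible user and want to add some templating, double sad. Curly braces start a flow mapping, so YAML reads {{ bacon }} as a mapping whose key is another mapping: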

name: {{ bacon }}
...
yaml.constructor.ConstructorError: while constructing a mapping
  in "yammy.yaml", line 1, column 7
found unhashable key
  in "yammy.yaml", line 1, column 8

Have a book title with a subtitle? Triple sad, because a colon followed by a space starts a mapping value:

book: War: What is it for
...
yaml.scanner.ScannerError: mapping values are not allowed here
  in "yammy.yaml", line 1, column 10
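You can reproduce these failures without editing test.yaml each time. The sketch below feeds each snippet straight to yaml.safe_load and reports what happens. The try_load helper is hypothetical and exists only for this example. ScannerError and ConstructorError are the PyYAML exception classes from the tracebacks above, and both derive from yaml.YAMLError.

```python
import yaml

# Snippets from this section: the first parses fine, the rest trip up the parser.
snippets = {
    "title": "title: A Walk to Forget",
    "user": "user: @chickenman",
    "name": "name: {{ bacon }}",
    "book": "book: War: What is it for",
}


def try_load(text):
    """Hypothetical helper: return the parsed value, or the error class name."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as err:
        # ScannerError and ConstructorError are both subclasses of YAMLError.
        return type(err).__name__


results = {key: try_load(text) for key, text in snippets.items()}
for key, outcome in results.items():
    print(f"{key}: {outcome!r}")
```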

Want to pass an environment variable to answer a question with yes? Well, this one you can, but not in the way you expect:

variables:
  enabled: yes
{
  "variables": {
    "enabled": true
  }
}

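To be fair, that last one isn't part of the current YAML 1.2 spec, whose core schema only treats true and false (in a few capitalizations) as booleans. It's a YAML 1.1 holdover, and parsers that implement YAML 1.1, PyYAML included, still read yes as true. The spec before 1.2 even had this comment: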

For example, the boolean "true" might also be written as "yes".
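To see this for yourself, here's a sketch that loads a few bare words next to a quoted "yes". The keys are made up. The words themselves (yes, no, on, off, NO) are ones PyYAML's YAML 1.1 resolver treats as booleans. A parser that follows YAML 1.2 may well disagree.

```python
import yaml

# Unquoted words that PyYAML (a YAML 1.1 parser) resolves to booleans,
# plus a quoted "yes" that stays a string.
doc = """
enabled: yes
disabled: no
switch_on: on
switch_off: off
country: NO
quoted: "yes"
"""

data = yaml.safe_load(doc)
for key, value in data.items():
    print(f"{key}: {value!r} ({type(value).__name__})")
```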

YAML makes it pretty easy to do multi-line strings.

description: I like chicken
  chicken is good

The above syntax will eat newlines and your string will become one line, like this:

{
  "description": "I like chicken chicken is good"
}
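Folding isn't quite the same as deleting newlines, though. In a plain (unquoted) multi-line string, each single line break becomes a space, but an empty line becomes a real newline. Here's a quick check with PyYAML. The paragraphs key is an extra example added for illustration.

```python
import yaml

doc = """
description: I like chicken
  chicken is good
paragraphs: I like chicken

  chicken is good
"""

data = yaml.safe_load(doc)
# A single line break folds into a space; an empty line becomes "\\n".
print(repr(data["description"]))
print(repr(data["paragraphs"]))
```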

Quoted strings (' and ")

The only way to be sure that YAML does what you think it does is to use explicit strings (sorry, scalars!), which defeats one of the niceties of using YAML in the first place.

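You can use single- or double-quoted strings. The main practical difference is escaping. Double-quoted strings support backslash escapes like \n and \". Single-quoted strings treat backslashes literally, and the only way to escape a single quote inside one is to double it (''). Beyond that, choose whichever saves you the most escaping: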

  • Most of the time, it's easiest to stick to one. I choose double quotes:

    description: "What is this feature for?"
    
  • String contains double quotes? Wrap the string in single quotes:

    description: '{"food": "bacon"}'
    
  • String is an indecipherable mess of quotation? Give up and use escaping:

    config: '/bin/bash -c ''echo "$VALUE"'''
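Here's a quick way to check how PyYAML reads each of these, along with the escaping difference between the two styles: double quotes turn \t into a tab, while single quotes keep the backslash as-is. The payload and tab_* keys are examples added for illustration.

```python
import yaml

# A raw string, so the backslashes reach the YAML parser untouched.
doc = r"""
description: "What is this feature for?"
payload: '{"food": "bacon"}'
config: '/bin/bash -c ''echo "$VALUE"'''
tab_double: "a\tb"
tab_single: 'a\tb'
"""

data = yaml.safe_load(doc)
for key, value in data.items():
    print(f"{key}: {value!r}")
```

Note that the doubled '' in config comes out as a single ', and only the double-quoted tab_double contains an actual tab character.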
    

Here are all of the examples from the previous section, quoted so that they parse as strings:

title: "A Walk to Forget"

user: "@chickenman"

name: "{{ bacon }}"

book: "War: What is it for"

variables:
  enabled: "yes"
{
  "title": "A Walk to Forget",
  "user": "@chickenman",
  "name": "{{ bacon }}",
  "book": "War: What is it for",
  "variables": {
    "enabled": "yes"
  }
}
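If you generate YAML from Python instead of writing it by hand, PyYAML's dumper already knows about these traps. It adds quotes wherever a plain string would be misread. Here is a minimal check using yaml.safe_dump with its default settings:

```python
import yaml

# Strings that would be misread (or rejected) if written unquoted.
values = {
    "user": "@chickenman",
    "name": "{{ bacon }}",
    "book": "War: What is it for",
    "enabled": "yes",
}

# Dump each key on its own so the output is easy to compare.
dumped = {key: yaml.safe_dump({key: value}) for key, value in values.items()}
for text in dumped.values():
    print(text, end="")

# Loading the dumped text gives the original strings back.
assert all(
    yaml.safe_load(text) == {key: values[key]} for key, text in dumped.items()
)
```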

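Quoted strings can also span multiple lines. In a double-quoted string, you can end a line with a backslash to escape the line break and make the join explicit. Single quotes don't support this, because they treat backslashes literally: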

description: "I like chicken \
  chicken is good"

This renders to the following, again eating the newline in the process:

{
  "description": "I like chicken chicken is good"
}
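Here's a sketch comparing the quoted variants side by side. Without a backslash, both quote styles fold the line break into a space. With a trailing backslash, only double quotes treat it as an escape. The key names are made up for this example.

```python
import yaml

# Raw string, so the YAML parser sees the trailing backslashes.
doc = r"""
double_escaped: "I like chicken \
  chicken is good"
double_plain: "I like chicken
  chicken is good"
single_plain: 'I like chicken
  chicken is good'
single_backslash: 'I like chicken \
  chicken is good'
"""

data = yaml.safe_load(doc)
for key, value in data.items():
    print(f"{key}: {value!r}")
```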

Literal blocks (|)

Another, cleaner way to do multi-line strings is the "literal block scalar", which keeps line breaks exactly as written:

|
  I like chicken
  chicken is good
"I like chicken\nchicken is good\n"
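Under a key, a literal block keeps both line breaks and any indentation beyond the block's own, which makes it a good fit for embedded scripts. Here is a minimal sketch. The script key and its shell contents are made up for illustration.

```python
import yaml

doc = """
script: |
  if true; then
    echo "indentation is kept"
  fi
"""

data = yaml.safe_load(doc)
# The block's base indentation (two spaces) is removed; the extra two stay.
print(data["script"], end="")
```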

Folded blocks (>)

This one is called the "folded block scalar", and it adds this bit of flair to your YAML markup:

>
  I like chicken
  chicken is good
"I like chicken chicken is good\n"
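Folding works line by line. Single line breaks turn into spaces, but an empty line survives as a newline, so you can still write paragraphs. Here's a small check. The note key and the extra line of text are illustrative.

```python
import yaml

doc = """
note: >
  I like chicken
  chicken is good

  really good
"""

data = yaml.safe_load(doc)
print(repr(data["note"]))
```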

Look, ma, it ate my whitespace! No, but this is actually really great for CLI commands that don't fit on a single line. Here's an example I lifted from the Python pipeline:

>
  $PYTHON -m pytest
  --cov="${PYTHON_PACKAGE_DIR}"
  --cov-report html:public/coverage
  --cov-report xml:coverage.xml
  --cov-report term

The above will be unrolled to the following, with a very tasteful amount of whitespace:

"$PYTHON -m pytest --cov=\"${PYTHON_PACKAGE_DIR}\" --cov-report html:public/coverage --cov-report xml:coverage.xml --cov-report term\n"

And just look at how you can use quotes inside your string without having to worry about escaping at all! Okay, YAML, you're really helping out right now!
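You can confirm that the quotes survive untouched. The sketch below loads the command from the example above and splits it the way a POSIX shell would, using Python's shlex.split. The script key is a hypothetical stand-in for wherever your CI config puts commands. Note that shlex only removes quotes. It doesn't expand variables such as $PYTHON.

```python
import shlex

import yaml

doc = """
script: >
  $PYTHON -m pytest
  --cov="${PYTHON_PACKAGE_DIR}"
  --cov-report html:public/coverage
  --cov-report xml:coverage.xml
  --cov-report term
"""

command = yaml.safe_load(doc)["script"]
print(command, end="")

# Split like a shell would; the double quotes are consumed here, not by YAML.
args = shlex.split(command)
print(args)
```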

Block chomping (|- and >-)

This is a syntax that I use all the time, but I don't even know its name or how I first came across it. I don't know how to explain it either, so let's find out what our YAML parser does:

|-
  I like chicken
  chicken is good
"I like chicken\nchicken is good"

OMG, it just eats the trailing newline? Of course, that must be why I habitually use that form. Ditto for >-, compared to >:

>-
  I like chicken
  chicken is good
"I like chicken chicken is good"

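It turns out this behavior is called block chomping in the spec, and options other than - are supported! With no indicator, the default "clip" mode keeps a single trailing newline, - "strips" them all, and + "keeps" every one, trailing blank lines included. What!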
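Here's a side-by-side sketch of the three chomping modes. It uses a block that ends with two extra blank lines so the difference is visible. The same indicators also work after > (as >- and >+).

```python
import yaml

# The same block body, followed by two trailing blank lines.
body = "  I like chicken\n  chicken is good\n\n\n"

results = {}
for indicator in ("|", "|-", "|+"):
    # Each string is a complete YAML document holding one block scalar.
    results[indicator] = yaml.safe_load(indicator + "\n" + body)
    print(f"{indicator:3} {results[indicator]!r}")
```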

Conclusion

Did YAML need 6+ ways to write a string? Probably not. But it has them, and now that you know that they exist, you won't be caught off-guard when you encounter YAML string syntax in the wild.

