CurlyML


Introduction

CurlyML is a simple, concise format for storing and transmitting structured data and documents.

CurlyML lets you describe structured data in a way that is efficient and easy for both human beings and machines to read and write, with no extraneous fuss or redundant markup.

Essentially, CurlyML is designed to get out of your way so you can get on with describing your data. This is in direct contrast with XML, which is overkill for the vast majority of tasks.

Example 1: Simple car

# this line is a comment
# so is this one

car {
	make { Toyota }
	model { Corolla }	# this over here is a comment too
	doors { 4 }
	information {
		url { http://www.toyota.com/products/corolla#info } 
		# the '#' in the url above isn't the start of a comment
		# because it is inside a "word".
	}
	stereo {
		custom { true }
		brand { Aiwa }
		model { GX7-40 }
		comment { "I liked it" }
	}
}

As you can see, most of what is visible pertains to the data or its structure. You would have a hard time denying that this is simple, clear, concise and very readable. You've probably deduced most of CurlyML's rules just from looking at this, but just in case: this is a hierarchy of data, best thought of as a tree. Curly braces, as used in numerous popular programming languages, define the scope of nodes in the tree. Whitespace characters (spaces, tabs, newlines, etc.) are for the most part ignored: you can lay the document out any way you want.

The root node in this case is "car". The root node contains other nodes, which may in turn contain other nodes, and so on. Every node will generally have some text in it. For nodes that have children, this text can be thought of as the node's name; for nodes that don't, it is just plain text data. We make a point of not caring about the difference in the interests of simplicity.

If you are familiar with XML you will no doubt have noticed that CurlyML nodes don't have attributes. We think that the attribute vs. child-node argument amounts to semantic nitpickery when it comes to the real world, and XML attributes decrease readability. Forget about 'em. Anything you can put in an attribute you can put in a child node in CurlyML, at no cost.

The rules

  1. Nodes with children (parent nodes) are only allowed to have alphanumeric names, although in the interests of readability, underscores and hyphens are also allowed in parent node names.
  2. Multiple top level nodes are allowed in a document. The document itself is considered the real root node.
  3. Leaf nodes (i.e. those with no children or an empty child list "{}") can contain any text.
  4. Child node text can be contained in double quotes in order to display text with space characters preserved, or in order to display text which contains curly braces (or quotes).
  5. Quoted text in child nodes can use C/Java-style escape characters: \n (newline) \r (carriage return) \t (tab) \" (quote itself) \uXXXX (unicode character).
  6. Any unquoted text starting with the comment sequence (#) is considered to be a comment, until the end of the line (i.e. until \n, \r or EOF is encountered). If the # forms part of an existing word being parsed, however, it will not be treated as a comment. The reasoning for this is that this character will often form part of HTML links, and it's more intuitive to have a comment begin after a space or newline than to risk unexpected breakage from using # in otherwise valid strings. The parser doesn't care if you have spaces in a comment, but you do need to have a space or newline before a comment.
  7. A word followed by an opening curly declares the beginning of a parent node named by that word.
  8. Opening and closing curly counts must match (curlies inside quoted text not included).
  9. Default character set is UTF-8.
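
Rules 4 to 6 can be sketched as a small tokenizer. This is only an illustration, not the reference implementation: the name tokenize and the token kinds are made up for this example, and it assumes that curly braces always end an unquoted word, that a backslash can escape itself, and that an unknown escape simply yields the escaped character. None of those three points is stated in the rules above.

```python
# Hypothetical tokenizer sketch for rules 4-6 (quoting, escapes, comments).
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", '"': '"', "\\": "\\"}


def tokenize(text):
    """Yield (kind, value) pairs; kind is 'open', 'close', 'word' or 'quoted'."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch == "#":
            # A '#' at the start of a token begins a comment (rule 6);
            # skip everything up to the end of the line.
            while i < n and text[i] not in "\r\n":
                i += 1
        elif ch == "{":
            yield ("open", ch)
            i += 1
        elif ch == "}":
            yield ("close", ch)
            i += 1
        elif ch == '"':
            i += 1
            out = []
            while i < n and text[i] != '"':
                if text[i] == "\\" and i + 1 < n:
                    esc = text[i + 1]
                    if esc == "u":
                        # \uXXXX: four hex digits naming one character
                        out.append(chr(int(text[i + 2:i + 6], 16)))
                        i += 6
                    else:
                        out.append(ESCAPES.get(esc, esc))
                        i += 2
                else:
                    out.append(text[i])
                    i += 1
            if i >= n:
                raise ValueError("unterminated quoted text")
            i += 1  # skip the closing quote
            yield ("quoted", "".join(out))
        else:
            # An unquoted word runs to the next whitespace or brace, so a '#'
            # in the middle of it (as in a URL) stays part of the word.
            start = i
            while i < n and not text[i].isspace() and text[i] not in "{}":
                i += 1
            yield ("word", text[start:i])


for token in tokenize("url { http://www.toyota.com/products/corolla#info } # a comment"):
    print(token)
```

Because the comment check only happens at the start of a token, the '#' in the URL is never looked at as a comment marker, which is the behaviour rule 6 asks for.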

Example 2: node type smorgasbord

#example of flexible text nodes

journal-entry {
	date { 2004-12-15 }
	Today I went to the library, but I couldn't find any books on chickens.
	Oh well.
	mood { hungry }
	references { }		# an empty parent node
}

How would the parser parse this? The intuitive way. It would construct a root node named "/" by default and add a child node, "journal-entry", to it. Inside that, things get interesting. The first thing we add to the journal-entry node is a child node called "date", with a leaf node inside it for the date 2004-12-15.

Next, we start scooping up text for our next node. We gobble up all the words until we pass "mood" and hit an opening curly. The parser now knows two things: it has finished collecting the text for this leaf node, and, because it has hit an opening curly, the last word it collected ("mood") is the name of a subsequent node, which it knows to be a parent.
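
The step just described boils down to splitting the collected words in two when an opening curly turns up. A minimal sketch follows; the helper name split_pending is invented for this illustration, and joining the words with single spaces is an assumption based on whitespace being mostly ignored.

```python
def split_pending(words):
    """Split the words collected before a '{' into (leaf_text, parent_name).

    The last word names the parent node that is about to open; any words
    before it form the text of one leaf node (None if there are none).
    """
    if not words:
        raise ValueError("'{' without a preceding node name")
    *text_words, name = words
    leaf_text = " ".join(text_words) if text_words else None
    return leaf_text, name


leaf_text, name = split_pending("Oh well. mood".split())
print(repr(leaf_text), repr(name))
```

When only one word has been collected, as with "date" in the example, there is no leaf text and the word is simply the parent's name.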

It then parses the mood subtree in much the same way as the date one.

Then it runs into another node, references. It knows this is a parent node, because it's followed by an opening curly. But this parent node has no children. This is how we distinguish empty parent nodes from leaf nodes. A parent node has a list of children which may be empty. A leaf node doesn't have a list for children.
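
One way to model this distinction in code is to let the child list itself say what kind of node it is. The sketch below is an assumption about how a node might be represented, not the reference implementation; the names Node and is_parent are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    text: str
    # None means "leaf node: no child list at all";
    # a list (possibly empty) means "parent node".
    children: Optional[List["Node"]] = None

    @property
    def is_parent(self) -> bool:
        return self.children is not None


references = Node("references", [])  # empty parent node
hungry = Node("hungry")              # leaf node
print(references.is_parent, hungry.is_parent)
```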

If we were to draw boxes to show the resulting node hierarchy, it might look like this:

PARENT NODE
TEXT: /
CHILDREN:
	PARENT NODE
	TEXT: journal-entry
	CHILDREN:
		PARENT NODE
		TEXT: date
		CHILDREN:
			LEAF NODE
			TEXT: 2004-12-15
		LEAF NODE
		TEXT: Today I went to the library, but I couldn't find any books on chickens. Oh well.
		PARENT NODE
		TEXT: mood
		CHILDREN:
			LEAF NODE
			TEXT: hungry
		PARENT NODE
		TEXT: references
		CHILDREN: none

Our reference implementation illustrates everything that we care about when it comes to a node: what its name/text is, and who its children are. We don't care about anything else. Really.

A CurlyML parser could be written to load a document into memory. In this case the parser should return a single root node with a name that the user can never use: "/". All the top-level nodes in the document should be children of that node. A document can contain any number of top-level nodes; we see no reason to restrict a document to a single top-level node (i.e. root). The document itself is really the root, so let's not make the user mess around defining the root.
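
To make this concrete, here is a rough sketch of such an in-memory parser. It is not the reference implementation. The names Node, TOKEN_RE, unescape and parse are invented for the example, the regex-based tokenizer is a simplification (an unterminated quote is quietly treated as an ordinary word), and it assumes braces always end an unquoted word.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    text: str
    children: Optional[List["Node"]] = None  # None = leaf, list = parent


# Simplified tokens: comment, quoted text, brace, or bare word.
TOKEN_RE = re.compile(r'''
    (?P<comment>\#[^\r\n]*)
  | "(?P<quoted>(?:\\.|[^"\\])*)"
  | (?P<brace>[{}])
  | (?P<word>[^\s{}]+)
''', re.VERBOSE)

ESCAPES = {"n": "\n", "r": "\r", "t": "\t"}


def unescape(s: str) -> str:
    """Decode \\n, \\r, \\t, \\uXXXX; any other escaped character is kept as-is."""
    def repl(m):
        if m.group(1):
            return chr(int(m.group(1), 16))
        return ESCAPES.get(m.group(2), m.group(2))
    return re.sub(r'\\(?:u([0-9A-Fa-f]{4})|(.))', repl, s)


def parse(text: str) -> Node:
    root = Node("/", [])      # the document itself is the root
    stack = [root]            # parent nodes that are currently open
    pending: List[str] = []   # bare words collected since the last node ended

    def flush(keep_last: bool) -> Optional[str]:
        """Emit pending words as one leaf; optionally hold back the last word."""
        name = pending.pop() if keep_last and pending else None
        if pending:
            stack[-1].children.append(Node(" ".join(pending)))
            pending.clear()
        return name

    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "comment":
            continue
        if kind == "word":
            pending.append(m.group("word"))
        elif kind == "quoted":
            flush(keep_last=False)
            stack[-1].children.append(Node(unescape(m.group("quoted"))))
        elif m.group("brace") == "{":
            name = flush(keep_last=True)
            if name is None:
                raise ValueError("'{' without a preceding node name")
            node = Node(name, [])
            stack[-1].children.append(node)
            stack.append(node)
        else:  # closing curly
            flush(keep_last=False)
            if len(stack) == 1:
                raise ValueError("unmatched '}'")
            stack.pop()
    flush(keep_last=False)
    if len(stack) != 1:
        raise ValueError("unmatched '{'")
    return root


doc = "car { make { Toyota } }\nbike { make { Trek } }"
print([child.text for child in parse(doc).children])
```

Both "car" and "bike" end up as children of the "/" node, so the caller always gets exactly one root back no matter how many top-level nodes the document has.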

Frequently Asked Questions

