Sep 01 2008

Predictable Binary Representation (PBR)

Published by Kefah Issa at 1:30 pm under General

XML is nice, YAML and JSON are cool, proprietary formats for binary encapsulation are not good; those are all facts that many people agree on.

But all of these good solutions remain sub-optimal when it comes to network bandwidth overhead and the CPU usage needed to parse the blob every time.

This might have been, and should have been, considered a long time ago. However, I didn’t find any open format that covers it, hence I’m suggesting one…

Features

The following needs are not addressed by those data representation formats, but are addressed by PBR:

1. Support for binary content. With those formats you need to fall back to Base64 or the like; binary is not natively supported.
2. Completely predictable (so no guesswork in parsing). I don’t want to waste CPU cycles and logic parsing the content in an unpredictable manner. I want to be able to tell exactly what the next step ahead is and how much data I need to handle.
3. No escaping. I really don’t like escaping. I want to be able to put my content AS IS ™.
4. Reliable: checksum. That’s a cool addition that will make me confident in my data.
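Features 1 to 3 all follow from one idea: write the length of a value before the value itself. The sketch below shows that idea in isolation, assuming a 4-byte signed big-endian length (the post does not fix a byte order); `frame` and `unframe` are hypothetical helper names, not part of PBR.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix raw bytes with their length (4-byte signed int, big-endian)."""
    if len(payload) > 2**31 - 1:
        raise ValueError("payload exceeds the signed 32-bit length field")
    return struct.pack(">i", len(payload)) + payload

def unframe(buf: bytes, offset: int = 0) -> tuple:
    """Return (payload, next_offset) for the frame starting at `offset`."""
    (size,) = struct.unpack_from(">i", buf, offset)
    start = offset + 4
    end = start + size
    if size < 0 or end > len(buf):
        raise ValueError("corrupt or truncated frame")
    # The length says exactly where the payload ends, so the payload is
    # never scanned for delimiters and never needs escaping.
    return buf[start:end], end
```

A payload full of quotes, angle brackets, NUL bytes and newlines goes in and comes out untouched, and the reader always knows how many bytes to consume next.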

Here I outline the details of PBR (pronounced Pe.be.r) along with a reference implementation (TBD):

The data is represented in a message.

Each message is made of an envelope header and a body.

The data elements (entries) are organized as Entry-Name/Entry-Value pairs.

The body is simply a list of entries (aka the root node).

Supported Data Types

Just like JSON, PBR supports the basic, most used data types that cover most of today’s modern needs:

  1. Boolean
  2. String (UTF-8)
  3. Binary
  4. Integer
  5. Double
  6. Date
  7. Map (List is just a map with unnamed entries)

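- Nesting depth is unlimited; the number of entries is limited only by the 4-byte count fields.
- Content-size limit is 2^31 - 1 bytes (just under 2 GB), the largest value a signed 32-bit size field can hold.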
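The post lists the types but not how each value is laid out in bytes. The sketch below maps the type IDs 1..7 (from the Entry Type field in the Data Entry section) to a Python `IntEnum` and picks one possible byte encoding per type. The 8-byte integers, big-endian byte order and milliseconds-since-epoch dates are assumptions, as are the names `EntryType`, `encode_scalar` and `decode_scalar`.

```python
import struct
from datetime import datetime, timedelta, timezone
from enum import IntEnum

class EntryType(IntEnum):
    # Type IDs 1..7 in the order of the types list.
    BOOLEAN = 1
    STRING = 2
    BINARY = 3
    INTEGER = 4
    DOUBLE = 5
    DATE = 6
    MAP = 7

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_scalar(entry_type: int, value) -> bytes:
    """Hypothetical value encodings; widths and byte order are assumptions."""
    if entry_type == EntryType.BOOLEAN:
        return b"\x01" if value else b"\x00"
    if entry_type == EntryType.STRING:
        return value.encode("utf-8")
    if entry_type == EntryType.BINARY:
        return bytes(value)                    # stored as-is, no Base64
    if entry_type == EntryType.INTEGER:
        return struct.pack(">q", value)        # 8-byte signed, big-endian
    if entry_type == EntryType.DOUBLE:
        return struct.pack(">d", value)        # IEEE 754 binary64
    if entry_type == EntryType.DATE:
        # Milliseconds since the Unix epoch; `value` must be timezone-aware.
        return struct.pack(">q", (value - EPOCH) // timedelta(milliseconds=1))
    raise ValueError("MAP entries carry children, not a scalar value")

def decode_scalar(entry_type: int, data: bytes):
    """Inverse of encode_scalar."""
    if entry_type == EntryType.BOOLEAN:
        return data != b"\x00"
    if entry_type == EntryType.STRING:
        return data.decode("utf-8")
    if entry_type == EntryType.BINARY:
        return bytes(data)
    if entry_type == EntryType.INTEGER:
        return struct.unpack(">q", data)[0]
    if entry_type == EntryType.DOUBLE:
        return struct.unpack(">d", data)[0]
    if entry_type == EntryType.DATE:
        return EPOCH + timedelta(milliseconds=struct.unpack(">q", data)[0])
    raise ValueError("MAP entries carry children, not a scalar value")
```

Fixed-width encodings for Integer, Double and Date keep the Value Size of those entries constant, which fits the "completely predictable" goal.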

PBR is not human-readable, but a binary editor (or, in the future, a simple PBR reader) makes it easy to read and manually manipulate the data.

Envelope Header

  • PBR Version (2 bytes, signed short)
  • MD5 checksum of the body (16 bytes)
  • Total message size in bytes, excluding the envelope header (4 bytes, signed 32-bit integer)
  • Number of level-one entries (4 bytes, signed 32-bit integer)

The total envelope header size is 26 bytes (2 + 16 + 4 + 4).
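The 26-byte header maps directly onto a single `struct` format string. This is a minimal sketch assuming big-endian byte order (not specified in the post); `pack_header` and `unpack_header` are hypothetical names, while `struct` and `hashlib` are the Python standard library modules.

```python
import hashlib
import struct

# Version (short), MD5 digest (16 raw bytes), body size, level-one entry count.
# ">" means big-endian with no padding; the byte order is an assumption.
HEADER_FORMAT = ">h16sii"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 2 + 16 + 4 + 4 = 26

def pack_header(version: int, body: bytes, entry_count: int) -> bytes:
    """Build the envelope header for an already-encoded body."""
    digest = hashlib.md5(body).digest()
    return struct.pack(HEADER_FORMAT, version, digest, len(body), entry_count)

def unpack_header(message: bytes):
    """Return (version, entry_count, body) after checking size and MD5."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message shorter than the envelope header")
    version, digest, body_size, entry_count = struct.unpack_from(
        HEADER_FORMAT, message, 0)
    body = message[HEADER_SIZE:HEADER_SIZE + body_size]
    if body_size < 0 or len(body) != body_size:
        raise ValueError("truncated body")
    if hashlib.md5(body).digest() != digest:
        raise ValueError("MD5 checksum mismatch")
    return version, entry_count, body
```

Because the body size sits at a fixed offset, a reader on a stream can read exactly 26 bytes, then exactly `body_size` more, before touching any entry.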

The message body is a list of data entries; the number of first-level entries is given in the envelope header above.

Data Entry

  • Entry Type (1 byte, signed byte): the type ID of the entry, 1..7 as per the types list above.
  • Entry Name Size (1 byte, signed byte): the size of the entry name in bytes. Can be set to 0.
  • Entry Name Value (0..127 bytes): the entry name in UTF-8.
  • Number of Children (4 bytes, signed 32-bit integer)
  • Value Size (4 bytes, signed 32-bit integer): the size in bytes of the value of this entry.
  • Value (1 .. 2^31 - 1 bytes)
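The entry layout can be written out field by field. The sketch below assumes big-endian integers and reads "an aggregation of a list of other entries" as the child entries concatenated into the value field, with Value Size as their total length; both are interpretations, and `encode_entry` is a hypothetical name.

```python
import struct

MAX_NAME = 127           # largest length a signed-byte Name Size can hold
MAX_VALUE = 2**31 - 1    # largest length a signed 32-bit Value Size can hold

def encode_entry(entry_type: int, name: str, value: bytes = b"",
                 children=()) -> bytes:
    """Encode one Data Entry; `children` are already-encoded child entries."""
    name_bytes = name.encode("utf-8")
    if len(name_bytes) > MAX_NAME:
        raise ValueError("entry name longer than 127 bytes")
    children = list(children)
    if children and value:
        raise ValueError("an entry has either children or a value, not both")
    # With children, the value field carries the concatenated child entries.
    payload = b"".join(children) if children else value
    if len(payload) > MAX_VALUE:
        raise ValueError("value exceeds the signed 32-bit size limit")
    return (struct.pack(">bb", entry_type, len(name_bytes)) + name_bytes
            + struct.pack(">ii", len(children), len(payload)) + payload)
```

Since every size is written before the bytes it describes, a writer must encode the children before their parent; in exchange, a reader never has to guess where anything ends.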

Any entry can have children, just like an XML node.

However, an entry can have either children or a value, not both; an entry that carries a value has Number of Children == 0.

If Number of Children > 0, then the value contains the child entries, one after another, which need to be read in turn.
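Reading follows the same fixed order, which is where the predictability pays off: each field tells the parser exactly how many bytes come next. This is a sketch under the same assumptions as an encoder would use (big-endian integers, children concatenated inside the parent's value); `Entry`, `decode_entry` and `decode_body` are hypothetical names, and scalar values are returned as raw bytes.

```python
import struct
from dataclasses import dataclass, field

@dataclass
class Entry:
    entry_type: int
    name: str
    value: bytes = b""
    children: list = field(default_factory=list)

def _require(end: int, limit: int) -> None:
    if end > limit:
        raise ValueError("truncated entry")

def decode_entry(buf: bytes, offset: int = 0, limit=None):
    """Decode one entry at `offset`; return (Entry, offset just past it)."""
    limit = len(buf) if limit is None else limit
    _require(offset + 2, limit)
    entry_type, name_size = struct.unpack_from(">bb", buf, offset)
    offset += 2
    if name_size < 0:
        raise ValueError("negative name size")
    _require(offset + name_size, limit)
    name = buf[offset:offset + name_size].decode("utf-8")
    offset += name_size
    _require(offset + 8, limit)
    child_count, value_size = struct.unpack_from(">ii", buf, offset)
    offset += 8
    if child_count < 0 or value_size < 0:
        raise ValueError("negative child count or value size")
    end = offset + value_size
    _require(end, limit)
    entry = Entry(entry_type, name)
    if child_count == 0:
        entry.value = bytes(buf[offset:end])
    else:
        pos = offset
        for _ in range(child_count):
            # Children must stay inside the parent's declared value.
            child, pos = decode_entry(buf, pos, end)
            entry.children.append(child)
        if pos != end:
            raise ValueError("children do not fill the declared value size")
    return entry, end

def decode_body(body: bytes, entry_count: int) -> list:
    """Decode the level-one entries whose count comes from the header."""
    entries, pos = [], 0
    for _ in range(entry_count):
        entry, pos = decode_entry(body, pos)
        entries.append(entry)
    if pos != len(body):
        raise ValueError("trailing bytes after level-one entries")
    return entries
```

Because Value Size covers the whole subtree, a reader that only wants one top-level entry can also skip any other entry in constant time by jumping to the returned end offset instead of decoding its children.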

2 Responses to “Predictable Binary Representation (PBR)”

  1. Phiras on 01 Sep 2008 at 2:01 pm

    Nice, I Like it.

  2. noor - نور » A ride with hessian on 30 Nov 2008 at 11:54 pm

    [...] time ago I blogged about the same idea Predictable Binary Representation (PBR) ), naively thinking that no one has done this [...]
