Sep 01 2008
Predictable Binary Representation (PBR)
XML is nice, YAML and JSON are cool, and proprietary formats for binary encapsulation are not good; many people agree on all of that.
But even the good solutions remain sub-optimal when it comes to network bandwidth overhead and the CPU time needed to parse the blob every time.
This could have, and should have, been considered a long time ago. I didn’t find any open format that covers it, however, hence I’m suggesting one…
Features
The following needs are not addressed by those data representation formats, but are addressed by PBR:
1. Supporting binary content. With those formats you need to fall back to Base64 or the like, because binary is not natively supported.
2. Completely predictable (so no guesswork in parsing). I don’t want to waste CPU cycles and logic parsing the content in an unpredictable manner. I want to be able to tell exactly what the next step ahead is and how much data I need to manage.
3. No escaping. I really don’t like escaping. I want to be able to put my content AS IS ™.
4. Reliable: checksum. That’s a cool addition that will make me confident of my data.
Here I outline the details of PBR (pronounced Pe.be.r) along with a reference implementation (TBD):
The data is represented in a message.
Each message is made of an envelope header and a body.
The Data elements (Entries) are organized in Entry-Name/Entry-Value pairs.
The body is simply a list of Entries (aka the root node).
Supported Data Types
Just like JSON, PBR supports the basic and most used data types, which cover most of today’s needs:
- Boolean
- String (UTF-8)
- Binary
- Integer
- Double
- Date
- Map (List is just a map with unnamed entries)
The nesting depth and the number of entries are unlimited.
The content-size limit is 2^31 bytes (2GB).
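As a small illustration, the type list can be written down as an enumeration. This is only a sketch: the name `EntryType` is hypothetical, and the numeric ids are an assumption that takes the ids 1..7 in the order of the list above. The post does not say how each value is laid out in bytes, so no value encodings are shown.

```python
from enum import IntEnum


class EntryType(IntEnum):
    """PBR entry types; ids are assumed to follow the list order, starting at 1."""
    BOOLEAN = 1
    STRING = 2   # UTF-8 text
    BINARY = 3
    INTEGER = 4
    DOUBLE = 5
    DATE = 6
    MAP = 7      # a list is a map with unnamed entries


# The stated limit is 2^31 bytes, but a signed 4-byte size field
# can hold at most 2^31 - 1, so that is the practical ceiling.
MAX_CONTENT_SIZE = 2**31 - 1
```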
PBR is not human readable, but a binary editor, or in the future a simple PBR reader, would make it easy to read and manually manipulate the data.
Envelope Header
- PBR Version (2 Bytes, signed short)
- MD5 Checksum of the body (16 Bytes)
- Total Message Size, exclusive of the envelope header, in Bytes (4 Bytes, signed long)
- Number of Level-One Entries (4 Bytes, signed long)
Total Envelope Header size is 26 bytes.
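The header layout can be sketched with Python's `struct` module. This is a minimal sketch, not a reference implementation: the byte order is not specified in this post, so big-endian (network order) is assumed, and the function names `pack_header` and `unpack_header` are hypothetical.

```python
import hashlib
import struct

# Assumption: big-endian byte order; the post does not specify one.
# Fields: version (2 bytes), MD5 of body (16), body size (4), level-one entry count (4).
HEADER = struct.Struct(">h16sii")


def pack_header(version, body, entry_count):
    """Build the 26-byte envelope header for an already-encoded body."""
    checksum = hashlib.md5(body).digest()
    return HEADER.pack(version, checksum, len(body), entry_count)


def unpack_header(header, body):
    """Parse the header and check the body against its size and checksum."""
    version, checksum, body_size, entry_count = HEADER.unpack_from(header)
    if body_size != len(body):
        raise ValueError("body size does not match the header")
    if hashlib.md5(body).digest() != checksum:
        raise ValueError("MD5 checksum mismatch")
    return version, entry_count
```

Because every field has a fixed width, a reader always knows it must consume exactly `HEADER.size` bytes before touching the body.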
The Message Body is a list of data entries. The number of those data entries (1st-level) is as defined in the header above.
Data Entry
- Entry Type (1 Byte, signed byte) : The type id of the entry, 1..7 as per the types list above.
- Entry Name Size (1 Byte, signed byte) : The size in bytes of the name of the entry. Can be set to 0.
- Entry Name Value (0..127 Bytes) : The entry name in UTF-8.
- Number of Children (4 Bytes, signed long)
- Value Size (4 Bytes, signed long) : The size in bytes of the value of this entry.
- Value (1 .. 2^31 Bytes)
Any entry can have children, just like an XML node.
However, it can have either children or a value, not both (an entry that carries a plain value has Number of Children == 0).
If Number of Children > 0, then the value contains an aggregation of a list of other entries that need to be read.
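The entry layout and the children rule can be sketched as a pair of encode/decode functions. This is a hedged sketch under several assumptions not stated in the post: big-endian byte order, type ids 2 for String and 7 for Map (taking 1..7 in list order), and hypothetical function names. Values are treated as raw bytes, since the per-type value encoding is not defined here.

```python
import struct

# Assumed type ids (list order, starting at 1).
TYPE_STRING = 2
TYPE_MAP = 7


def encode_entry(type_id, name, value=b"", children=()):
    """Encode one entry; `children` is a sequence of already-encoded entries."""
    name_bytes = name.encode("utf-8")
    if len(name_bytes) > 127:
        raise ValueError("entry name is limited to 127 bytes")
    if children and value:
        raise ValueError("an entry has either children or a value, not both")
    # With children, the value field holds the concatenated child entries.
    payload = b"".join(children) if children else value
    return (struct.pack(">bb", type_id, len(name_bytes)) + name_bytes
            + struct.pack(">ii", len(children), len(payload)) + payload)


def decode_entry(buf, offset=0):
    """Decode one entry at `offset`; return ((type, name, content), next_offset)."""
    type_id, name_size = struct.unpack_from(">bb", buf, offset)
    offset += 2
    name = buf[offset:offset + name_size].decode("utf-8")
    offset += name_size
    child_count, value_size = struct.unpack_from(">ii", buf, offset)
    offset += 8
    end = offset + value_size
    if child_count == 0:
        return (type_id, name, buf[offset:end]), end
    children = []
    for _ in range(child_count):
        child, offset = decode_entry(buf, offset)
        children.append(child)
    if offset != end:
        raise ValueError("children do not fill the declared value size")
    return (type_id, name, children), end


child = encode_entry(TYPE_STRING, "city", b"Paris")
root = encode_entry(TYPE_MAP, "address", children=[child])
entry, end = decode_entry(root)
```

Note that the decoder never scans for delimiters or unescapes anything: each step reads a fixed-size field or a length that was declared just before it.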