Protocol Buffers, TCP, and Golang

A Recipe for "Fun"

Sean Kelly

@StabbyCutyou

Who am I?

I work for Tapjoy

I am a core maintainer of...

Chore : Tapjoy's Job System in Ruby
http://github.com/Tapjoy/chore

Dynamiq : Tapjoy's Queue / Topic System on top of Riak in Golang
http://github.com/Tapjoy/dynamiq

Buffstreams : Streaming Protocol Buffers over TCP made easy in Golang
http://github.com/StabbyCutyou/buffstreams

And a bunch of other libraries of dubious value!

What am I here to blather on about?

Literally the BEST stuff ever:

Binary packed data formats!

Network communication protocols!

Streaming bytes!

Edge cases!

Why I care about this stuff

I had an idea for a project

It involved using TCP to stream data between servers

Nothing in Go handled my needs "out of the box"

But there were enough pieces to build my own solution

... With no shortage of trial and error

Binary Packed Data Formats

Protocol Buffers*

Avro

Thrift

BSON

MessagePack

Probably some others

Protocol Buffers

Protobuffs for short

Uses an Interface Definition Language (IDL) to describe data

Very efficient to serialize / deserialize

Data is represented as a very tightly packed sequence of raw bytes

Protobuffs IDL

Most languages have libraries that generate classes based on the IDL

Supports a range of types:

  • String
  • Bool
  • Bytes
  • Float
  • Double
  • 10 Different Ints
  • Enumerations
  • Custom Types

Sample Protobuffs IDL


                package message;

                message Note    {
                  required string name = 1;
                  required int64 date = 2;
                  required string comment = 3;
                }
              

Sample Protobuffs Struct


package message

import proto "github.com/golang/protobuf/proto"
import math "math"

// Reference imports to suppress errors if they are not otherwise used.
var _ = proto.Marshal
var _ = math.Inf

type Note struct {
	Name             *string `protobuf:"bytes,1,req,name=name" json:"name,omitempty"`
	Date             *int64  `protobuf:"varint,2,req,name=date" json:"date,omitempty"`
	Comment          *string `protobuf:"bytes,3,req,name=comment" json:"comment,omitempty"`
	XXX_unrecognized []byte  `json:"-"`
}

func (m *Note) Reset()         { *m = Note{} }
func (m *Note) String() string { return proto.CompactTextString(m) }
func (*Note) ProtoMessage()    {}

// Various Getters go here

func init() {
}

              

Why use Protocol Buffers?

(or something like it)

JSON is too slow or too large to use at scale

You want some notion of type-enforcement

You want to maintain a backwards-compatible serialization schema

A service or technology you want to use also uses it

Protobuffs Gotchas

Can never remove fields, only deprecate old ones and add new ones

In Golang, the protobuffs code gen DOES NOT PASS LINT

The serialized message lacks a beginning / end demarcation*
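To make "tightly packed" and "no demarcation" concrete, here is a rough sketch that hand-encodes the Note message from the earlier sample. The helper names (appendVarint, appendString, appendInt64, encodeNote) are made up for illustration; in real code you'd call the generated struct and the protobuf library instead of doing this by hand.

```go
package main

import "encoding/binary"

// appendVarint appends v using the base-128 varint encoding protobuf uses.
func appendVarint(buf []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// appendString appends a length-delimited field (wire type 2):
// tag, then byte length, then the raw bytes.
func appendString(buf []byte, field uint64, s string) []byte {
	buf = appendVarint(buf, field<<3|2)
	buf = appendVarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// appendInt64 appends a varint field (wire type 0): tag, then value.
func appendInt64(buf []byte, field uint64, v int64) []byte {
	buf = appendVarint(buf, field<<3) // wire type 0 leaves the low bits clear
	return appendVarint(buf, uint64(v))
}

// encodeNote (hypothetical helper) writes the three Note fields in order.
func encodeNote(name string, date int64, comment string) []byte {
	var buf []byte
	buf = appendString(buf, 1, name)
	buf = appendInt64(buf, 2, date)
	buf = appendString(buf, 3, comment)
	return buf
}
```

encodeNote("hi", 150, "ok") comes out as 11 bytes: 0A 02 68 69 10 96 01 1A 02 6F 6B. Nothing in there says where the message starts or ends, which is the gotcha that matters once you start streaming.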

TCP

Stands for Transmission Control Protocol

Over-achieving brother to the User Datagram Protocol (UDP)

Relies on handshakes and repeated acknowledgement of packets

Trades performance for reliable, ordered delivery

TCP Connections

Opening connections is a (relatively) slow, expensive operation

Maintaining open, idle connections is almost free in comparison

"Closed" connections remain for short periods, awaiting any final packets

Have a lifecycle of different states

TCP "Simple"

Open a connection

Write data

Close the connection

TCP Streaming

Open a connection, cache it for re-use

Write data in a nearly-constant stream

Close the connection when your service exits

TCP in Golang

Super easy to get going with, thanks to the net package

TCPListener: Open a port, hand incoming connections to go-routines

TCPConn: Read incoming bytes

TCPConn: Write outgoing bytes

Even easier to switch between various protocols thanks to net.Conn

Sample TCP Server Code


// The for-loop in here will block the thread, so you want to run
// this in something go-routined, or have it use one internally
func startListening() error {
  socket, err := net.Listen("tcp", ":5432")
  if err != nil {
    return err
  }
  for {
    // Block and wait for connections
    conn, err := socket.Accept()
    if err != nil {
      // Connection failed before it even began - ignore and move on
      // It should be up to the client to re-try a failed connection, not the server
      continue
    }
    // Here is where you'd handle listening on this connection
    // Because you want to get back to accepting new connections ASAP,
    // you'd run your handling code in its own go-routine as well
    go handleConn(conn)
  }
}
              

TCP / Golang Gotchas

net package has a custom Error - which includes a "Temporary()" helper

Connections - What to do when the client / server abruptly closes the connection?

... And handling each place where it could happen

TCP Behavior - You can tune how the OS handles tcp, but there are tradeoffs
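One possible way to sort connection errors into "what do I do now?" buckets is sketched below. The classify function, its category strings, and the timeoutErr stand-in are all hypothetical. The sketch checks Timeout() rather than Temporary(), because newer Go releases mark Temporary() as deprecated.

```go
package main

import (
	"errors"
	"io"
	"net"
)

// classify (hypothetical) maps a connection error to a coarse action.
// The categories are illustrative, not exhaustive.
func classify(err error) string {
	var netErr net.Error
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, io.EOF):
		return "peer closed"
	case errors.Is(err, net.ErrClosed):
		return "closed locally"
	case errors.As(err, &netErr) && netErr.Timeout():
		return "timeout, retry"
	default:
		return "fatal"
	}
}

// timeoutErr is a stand-in net.Error, used only to exercise classify
// without opening a real socket.
type timeoutErr struct{}

func (timeoutErr) Error() string   { return "i/o timeout" }
func (timeoutErr) Timeout() bool   { return true }
func (timeoutErr) Temporary() bool { return true }
```

classify(timeoutErr{}) lands in the "timeout, retry" bucket. What each bucket should actually trigger (reconnect, back off, give up) depends on your application.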

EOF

How will EOFs manifest?

Will you get a read of 0 bytes + EOF?

Will you get a read of X final bytes + EOF?

The answer...

... Both! Both cases are normal in Golang!
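A small sketch of a read loop that copes with both cases: it always uses the n bytes it was given before it looks at the error. The drain function and the two toy readers (eofWithData, eofAfterData) are invented here to simulate each behavior; they are not part of the net package.

```go
package main

import "io"

// drain reads r to the end and returns everything it produced.
// It consumes buf[:n] before checking err, so it does not care whether
// the last bytes arrive together with io.EOF or one call before it.
func drain(r io.Reader) ([]byte, error) {
	var out []byte
	buf := make([]byte, 4)
	for {
		n, err := r.Read(buf)
		out = append(out, buf[:n]...)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
	}
}

// eofWithData returns its final bytes and io.EOF in the same call.
type eofWithData struct{ data []byte }

func (r *eofWithData) Read(p []byte) (int, error) {
	n := copy(p, r.data)
	r.data = r.data[n:]
	if len(r.data) == 0 {
		return n, io.EOF
	}
	return n, nil
}

// eofAfterData returns io.EOF only on a separate read of 0 bytes.
type eofAfterData struct{ data []byte }

func (r *eofAfterData) Read(p []byte) (int, error) {
	if len(r.data) == 0 {
		return 0, io.EOF
	}
	n := copy(p, r.data)
	r.data = r.data[n:]
	return n, nil
}
```

Both readers hand back the same bytes through drain. A loop that checks err first and bails out would silently lose the tail of the first one.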

Backpressure

How does TCP tell you to calm down?

Suddenly, your writes only partially complete

TCPConn returns # of bytes written for a reason

Solution: Check bytes written, continue to write until finished
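A defensive sketch of that solution, assuming nothing beyond io.Writer. One caveat: the io.Writer contract says a Write that accepts fewer bytes than it was given must also return an error, so with a plain net.Conn a short count usually shows up next to an error such as an expired write deadline. Whether to retry after that error is left to the caller here. The writeAll and chunkWriter names are hypothetical.

```go
package main

import "io"

// writeAll keeps writing until every byte of data has been accepted,
// or until the writer reports an error.
func writeAll(w io.Writer, data []byte) (int, error) {
	written := 0
	for written < len(data) {
		n, err := w.Write(data[written:])
		written += n
		if err != nil {
			return written, err
		}
		if n == 0 {
			// No progress and no error: bail out instead of spinning forever.
			return written, io.ErrShortWrite
		}
	}
	return written, nil
}

// chunkWriter is a toy writer that accepts at most max bytes per call,
// standing in for a connection under backpressure. It deliberately bends
// the io.Writer contract (short write, nil error) to exercise the loop.
type chunkWriter struct {
	max   int
	got   []byte
	calls int
}

func (c *chunkWriter) Write(p []byte) (int, error) {
	c.calls++
	if len(p) > c.max {
		p = p[:c.max]
	}
	c.got = append(c.got, p...)
	return len(p), nil
}
```

With max set to 3, writing ten bytes takes four calls and everything still arrives in order.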

"It's all fun and games until someone saturates the NIC"

Streaming and Protobuffs Gotchas

Remember how protobuff messages lack demarcation?

Solution 1: Use a custom delimiter to demarcate where things begin / end

... But you have to ensure no one puts that delimiter into a message body

Solution 2: Prepend a message header of X bytes to describe the payload

... But you are now limited to messages small enough for their size to fit in X bytes

Those are your tradeoffs - pick one and understand the implications
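Here is roughly what Solution 2 can look like with X = 2. Everything named below (headerSize, maxPayload, writeFrame, readFrame, roundTrip) is made up for this sketch, and the big-endian header is just one reasonable choice.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"io"
)

const (
	headerSize = 2         // X = 2 bytes of header
	maxPayload = 1<<16 - 1 // the largest size a 2-byte header can describe
)

var errTooLarge = errors.New("payload does not fit in the header")

// writeFrame prepends a big-endian length header and writes header + payload.
func writeFrame(w io.Writer, payload []byte) error {
	if len(payload) > maxPayload {
		return errTooLarge
	}
	frame := make([]byte, headerSize+len(payload))
	binary.BigEndian.PutUint16(frame, uint16(len(payload)))
	copy(frame[headerSize:], payload)
	_, err := w.Write(frame)
	return err
}

// readFrame reads one header, then exactly that many payload bytes.
func readFrame(r io.Reader) ([]byte, error) {
	header := make([]byte, headerSize)
	if _, err := io.ReadFull(r, header); err != nil {
		return nil, err
	}
	payload := make([]byte, binary.BigEndian.Uint16(header))
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// roundTrip frames every message into one byte stream, then reads them back.
func roundTrip(msgs ...string) ([]string, error) {
	var stream bytes.Buffer
	for _, m := range msgs {
		if err := writeFrame(&stream, []byte(m)); err != nil {
			return nil, err
		}
	}
	var out []string
	for range msgs {
		p, err := readFrame(&stream)
		if err != nil {
			return nil, err
		}
		out = append(out, string(p))
	}
	return out, nil
}
```

The tradeoff shows up right away: a 70000-byte payload is rejected, because its size cannot be written in two bytes. A bigger header raises the ceiling at the cost of a few more bytes on every message.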

Wrapping Up

Protocol Buffers are a fast, efficient way to serialize data

... So long as you're ok with an IDL, pre-compile steps, and un-lintable code

TCP in Golang is really easy to get going with

... But you need to know your edge cases

Using the two together is a great way to communicate between apps

... But it requires a good amount of boilerplate and forethought to work smoothly

BuffStreams - Streaming TCP abstraction, designed for Protobuffs

I kept re-writing the same code

But I wasn't happy with the design

Nor having to copy and paste changes between projects

So, I created BuffStreams

... I almost called it BeamDog

Design

There is nothing magical in BuffStreams

It is not a "better" TCP socket

It handles the boilerplate and edge cases

It does not internally rely on Protocol Buffers

How does it handle Streaming?

BuffStreams uses "Solution 2" : Prepended message sizes

Configure per-listener and per-writer max message size

It allocates the # of bytes needed to represent max message size

It automatically adds the size to outgoing messages

It strips the size from the message prior to invoking the Callback
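The slides don't spell out how the header width is derived from the max message size. One way to do it, assuming a fixed-width binary header, is sketched below. headerBytesFor is a hypothetical helper and not BuffStreams' actual code.

```go
package main

// headerBytesFor (hypothetical) returns how many bytes a fixed-width binary
// header needs in order to represent any size from 0 up to maxMessageSize.
func headerBytesFor(maxMessageSize uint64) int {
	n := 1
	for maxMessageSize > 0xFF {
		maxMessageSize >>= 8 // drop one byte's worth of magnitude
		n++
	}
	return n
}
```

Under that assumption a 4096-byte maximum needs a 2-byte header, and anything above 65535 bytes pushes it to 3.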

How it works in an application

You define a callback for handling incoming messages

You start a TCPListener

You pass in that callback

That callback is given a whole set of message bytes, minus the header, for you to deserialize as you see fit
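The callback flow could be sketched like this, reading from any io.Reader in place of a real connection. This is not the BuffStreams API: readMessages, collect, the 4-byte header, and the choice to stop on a callback error are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"io"
)

// readMessages (hypothetical) reads length-prefixed messages from r until
// EOF and hands each body, minus its header, to callback.
func readMessages(r io.Reader, maxSize uint32, callback func(msg []byte) error) error {
	header := make([]byte, 4)
	body := make([]byte, maxSize) // allocated once, reused for every message
	for {
		if _, err := io.ReadFull(r, header); err != nil {
			if err == io.EOF {
				return nil // clean end between two messages
			}
			return err
		}
		size := binary.BigEndian.Uint32(header)
		if size > maxSize {
			return errors.New("message exceeds configured maximum size")
		}
		if _, err := io.ReadFull(r, body[:size]); err != nil {
			return err
		}
		// body[:size] is only valid until the next read overwrites it,
		// so the callback must copy anything it wants to keep.
		if err := callback(body[:size]); err != nil {
			return err // this sketch stops on callback errors
		}
	}
}

// collect builds a framed stream from msgs and gathers what the callback sees.
func collect(msgs ...string) ([]string, error) {
	var stream bytes.Buffer
	for _, m := range msgs {
		var header [4]byte
		binary.BigEndian.PutUint32(header[:], uint32(len(m)))
		stream.Write(header[:])
		stream.WriteString(m)
	}
	var got []string
	err := readMessages(&stream, 1024, func(msg []byte) error {
		got = append(got, string(msg)) // string() copies out of the shared buffer
		return nil
	})
	return got, err
}
```

The callback is where you'd unmarshal the protobuf message. The reader itself never needs to know what is inside the bytes.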

Optimizing performance

Avoid all interfaces

Avoid any locking internally

Avoid unnecessary allocations

Avoid unnecessary state management

Avoid channels

Buffers, Slices, and avoiding allocations

Allocate a buffer for incoming messages once per connection

Re-use that buffer for each read

Use the number of bytes read to get a slice with the data

Next message starts from 0, overwrites the old buffer data

Never use the whole buffer, only [0:bytesRead]

Sample Code


headerBuffer := make([]byte, headerByteSize)
for {
  var headerReadError error
  var totalHeaderBytesRead = 0
  var bytesRead = 0
  // First, read the number of bytes required to determine the message length
  for totalHeaderBytesRead < headerByteSize && headerReadError == nil {
    // While we haven't read enough yet
    // pass in the slice that represents where we are in the buffer overall
    bytesRead, headerReadError = conn.Read(headerBuffer[totalHeaderBytesRead:])
    // Handle any errors that might happen - usually EOFs, which you can ignore until
    // after you account for the bytes read.
    ...
    // Account for the bytes we read
    totalHeaderBytesRead += bytesRead
  }
  // Handle the above error if possible
  ...
  // You've got the header bytes, use them to get the full message body, like we did the header
  ...
  // End of loop, read and processed a full message, get the next one from the connection
}
              

BuffStreams Gotchas

The most important "gotcha": It's fairly untested in production!

Because you must specify a max message size, it's not a good fit for messages whose maximum size can't be known in advance

It doesn't handle serialization by design, this is up to you

Currently, it drops failed message callbacks on the floor

... So your callback needs to handle error cases for you

... You can choose to log errors as well

It's up to you to manage the TCPListeners and Writers you use

... Although there is a simple Manager that will make this easier, with a performance hit

In Conclusion

Golang makes writing networking code fairly simple

Protobuffs are good if you want an efficient format for serialization

Slices are great, get creative with them

BuffStreams aims to do all of the "boring" parts of streaming protobuffs over TCP

Thanks!

Questions?
