A Recipe for "Fun"
Sean Kelly
@StabbyCutyou
I work for Tapjoy
I am a core maintainer of...
Chore : Tapjoy's Job System in Ruby
http://github.com/Tapjoy/chore
Dynamiq : Tapjoy's Queue / Topic System on top of Riak in Golang
http://github.com/Tapjoy/dynamiq
Buffstreams : Streaming Protocol Buffers over TCP made easy in Golang
http://github.com/StabbyCutyou/buffstreams
And a bunch of other libraries of dubious value!
Binary packed data formats!
Network communication protocols!
Streaming bytes!
Edge cases!
I had an idea for a project
It involved using TCP to stream data between servers
Nothing in Go handled my needs "out of the box"
But there were enough pieces to build my own solution
... With no shortage of trial and error
Protocol Buffers*
AVRO
Thrift
BSON
Message Pack
Probably some others
Protobuffs for short
Uses an Interface Definition Language (IDL) to describe data
Very efficient to serialize / deserialize
Data is represented as a very tightly packed sequence of raw bytes
Most languages have libraries that generate classes based on the IDL
Supports a range of types:
package message;

message Note {
  required string name = 1;
  required int64 date = 2;
  required string comment = 3;
}
package message

import proto "github.com/golang/protobuf/proto"
import math "math"

// Reference imports to suppress errors if they are not otherwise used.
var _ = proto.Marshal
var _ = math.Inf

type Note struct {
	Name             *string `protobuf:"bytes,1,req,name=name" json:"name,omitempty"`
	Date             *int64  `protobuf:"varint,2,req,name=date" json:"date,omitempty"`
	Comment          *string `protobuf:"bytes,3,req,name=comment" json:"comment,omitempty"`
	XXX_unrecognized []byte  `json:"-"`
}

func (m *Note) Reset()         { *m = Note{} }
func (m *Note) String() string { return proto.CompactTextString(m) }
func (*Note) ProtoMessage()    {}

// Various Getters go here

func init() {
}
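To make "a very tightly packed sequence of raw bytes" concrete, here is a minimal sketch that hand-encodes a Note using only the standard library. It is not how the generated code works internally; the helper names (appendVarint, appendString, appendInt64, encodeNote) are made up for this illustration. It relies on the documented wire format: each field starts with a tag of (field number << 3 | wire type), strings are length-delimited (wire type 2), and int64 is a plain varint (wire type 0).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendVarint appends v as a base-128 varint, the integer encoding used
// by the Protocol Buffers wire format.
func appendVarint(b []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(b, tmp[:n]...)
}

// appendString appends a length-delimited field (wire type 2).
func appendString(b []byte, field int, s string) []byte {
	b = appendVarint(b, uint64(field)<<3|2)
	b = appendVarint(b, uint64(len(s)))
	return append(b, s...)
}

// appendInt64 appends a varint field (wire type 0).
func appendInt64(b []byte, field int, v int64) []byte {
	b = appendVarint(b, uint64(field)<<3|0)
	return appendVarint(b, uint64(v))
}

// encodeNote hand-encodes the three fields of the Note message (hypothetical helper).
func encodeNote(name string, date int64, comment string) []byte {
	var b []byte
	b = appendString(b, 1, name)
	b = appendInt64(b, 2, date)
	b = appendString(b, 3, comment)
	return b
}

func main() {
	// prints: 0a 02 68 69 10 96 01 1a 02 6f 6b
	fmt.Printf("% x\n", encodeNote("hi", 150, "ok"))
}
```

Eleven bytes carry all three fields, and nothing in them marks where the message starts or ends.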
JSON is too slow or too large to use at scale
You want some notion of type-enforcement
You want to maintain a backwards-compatible serialization schema
A service or technology you want to use also uses it
Can never remove fields, only deprecate old ones and add new ones
In Golang, the protobuffs code gen DOES NOT PASS LINT
The serialized message lacks a beginning / end demarcation*
Stands for Transmission Control Protocol
Over-achieving brother to the User Datagram Protocol (UDP)
Relies on handshakes, repeated acknowledgement of packets
Trades performance for reliable, ordered delivery
Opening connections is a (relatively) slow, expensive operation
Maintaining open, idle connections is almost free in comparison
"Closed" connections remain for short periods, awaiting any final packets
Have a lifecycle of different states
Open a connection
Write data
Close the connection
Open a connection, cache it for re-use
Write data in a nearly-constant stream
Close the connection when your service exits
Super easy to get going with, thanks to the net package
TCPListener: Open a port, hand incoming connections to go-routines
TCPConn: Read incoming bytes
TCPConn: Write outgoing bytes
Even easier to switch between various protocols thanks to net.Conn
// The for-loop in here will block the thread, so you want to run
// this in something go-routined, or have it use one internally
func startListening() error {
	socket, err := net.Listen("tcp", ":5432")
	if err != nil {
		return err
	}
	for {
		// Block and wait for connections
		conn, err := socket.Accept()
		if err != nil {
			// Connection failed before it even began - ignore and move on
			// It should be up to the client to re-try a failed connection, not the server
		} else {
			// Here is where you'd handle listening on this connection
			// Because you want to get back to accepting new connections ASAP,
			// you'd run your handling code in its own go-routine as well
			go handleConn(conn)
		}
	}
}
net package has a custom Error - which includes a "Temporary()" helper
Connections - What to do when the client / server abruptly closes the connection?
... And handling each place where it could happen
TCP Behavior - You can tune how the OS handles tcp, but there are tradeoffs
How will EOFs manifest?
Will you get a read of 0 bytes + EOF?
Will you get a read of X final bytes + EOF?
The answer...
... Both! Both cases are normal in Golang!
How does TCP tell you to calm down?
Suddenly, your writes only partially complete
TCPConn returns # of bytes written for a reason
Solution: Check bytes written, continue to write until finished
"It's all fun and games until someone saturates the NIC"
Remember how protobuff messages lack demarcation?
Solution 1: Use a custom delimiter to demarcate where things begin / end
... But you have to ensure no one puts that delimiter into a message body
Solution 2: Prepend a message header of X bytes to describe the payload
... But you are now limited to messages small enough for their size to fit in X bytes
Those are your tradeoffs - pick one and understand the implications
Protocol Buffers are a fast, efficient way to serialize data
... So long as you're ok with an IDL, pre-compile steps, and un-lintable code
TCP in Golang is really easy to get going with
... But you need to know your edge cases
Using the two together is a great way to communicate between apps
... But it requires a good amount of boilerplate and forethought to work smoothly
I kept re-writing the same code
But I wasn't happy with the design
Nor having to copy and paste changes between projects
So, I created BuffStreams
... I almost called it BeamDog
There is nothing magical in BuffStreams
It is not a "better" TCP socket
It handles the boilerplate and edge cases
It does not internally rely on Protocol Buffers
BuffStreams uses "Solution 2" : Prepended message sizes
Configure per-listener and per-writer max message size
It allocates the # of bytes needed to represent max message size
It automatically adds the size to outgoing messages
It strips the size from the message prior to invoking the Callback
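How many header bytes does a given max message size need? A sketch of one way to compute it. This is an assumption about the general idea, not BuffStreams' actual code, and headerBytesFor is a made-up name.

```go
package main

import (
	"fmt"
	"math/bits"
)

// headerBytesFor returns how many whole bytes are needed to store any length
// from 0 up to maxMessageSize.
func headerBytesFor(maxMessageSize uint64) int {
	if maxMessageSize == 0 {
		return 1 // still need one byte to say "zero"
	}
	// bits.Len64 is the number of bits needed; round up to whole bytes.
	return (bits.Len64(maxMessageSize) + 7) / 8
}

func main() {
	for _, limit := range []uint64{255, 256, 4096, 65535, 65536} {
		fmt.Println(limit, headerBytesFor(limit))
	}
	// prints:
	// 255 1
	// 256 2
	// 4096 2
	// 65535 2
	// 65536 3
}
```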
You define a callback for handling incoming messages
You start a TCPListener
You pass in that callback
That callback is given a whole set of message bytes, minus the header, for you to deserialize as you see fit
Avoid all interfaces
Avoid any locking internally
Avoid unnecessary allocations
Avoid unnecessary state management
Avoid channels
Allocate a buffer for incoming messages once per connection
Re-use that buffer for each read
Use the number of bytes read to get a slice with the data
Next message starts from 0, overwrites the old buffer data
Never use the whole buffer, only [0:bytesRead]
headerBuffer := make([]byte, headerByteSize)
for {
	var headerReadError error
	var totalHeaderBytesRead = 0
	var bytesRead = 0
	// First, read the number of bytes required to determine the message length
	for totalHeaderBytesRead < headerByteSize && headerReadError == nil {
		// While we haven't read enough yet
		// pass in the slice that represents where we are in the buffer overall
		bytesRead, headerReadError = conn.Read(headerBuffer[totalHeaderBytesRead:])
		// Handle any errors that might happen - usually EOFs, which you can ignore until
		// after you account for the bytes read.
		...
		// Account for the bytes we read
		totalHeaderBytesRead += bytesRead
	}
	// Handle the above error if possible
	...
	// You've got the header bytes, use them to get the full message body, like we did the header
	...
	// End of loop, read and processed a full message, get the next one from the connection
}
The most important "gotcha": It's fairly untested in production!
Because you must specify a max message size, it's not a good fit for messages whose maximum size can't be known in advance
It doesn't handle serialization by design, this is up to you
Currently, it drops failed message callbacks on the floor
... So your callback needs to handle error cases for you
... You can choose to log errors as well
It's up to you to manage the TCPListeners and Writers you use
... Although there is a simple Manager that will make this easier, with a performance hit
Golang makes writing networking code fairly simple
Protobuffs are good if you want an efficient format for serialization
Slices are great, get creative with them
Buffstreams aims to do all of the "boring" parts of streaming protobuffs over TCP
Protocol Buffers, TCP, and Golang