A Long Commit
I recently made a commit that I'm a bit proud of. At work, most of my commit messages (or "check-ins", thanks to TFS) are short and abbreviated. But every so often I work on something more involved, going through several revisions before I even make a commit (sometimes because the code doesn't compile or work just yet). When reaching that kind of milestone is particularly exciting, I'm inclined to note down not only my thoughts about the end result, but also the personal journey that led me there.
Here is said commit: https://github.com/Jman012/jlsftp/commit/b8895d6d4d6272f4f66bc508b45c989ebb6df4bb
The following is the contents of the commit. I could have used more code examples or linked to more sources, but it was still fun to write and I thought it would make for a nice short blog post.
First working draft of the NIO SftpPacketDecoder
This is the first version that works, validated against a single unit test. It has already gone through several revisions.
The tricky part of all of this is dealing with the ByteBuffer. It's very useful, but can be used in various ways, especially in regard to either reading or getting its data (the read* versus get* method families), and working with slices.
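For context, here's the distinction in miniature, using SwiftNIO's ByteBuffer API (the value is arbitrary):

```swift
import NIO

var buffer = ByteBufferAllocator().buffer(capacity: 16)
buffer.writeInteger(UInt32(5)) // e.g. a packet length field

// get* peeks at an absolute index and leaves readerIndex alone.
let peeked: UInt32? = buffer.getInteger(at: buffer.readerIndex)

// read* consumes: it returns the bytes and advances readerIndex.
let consumed: UInt32? = buffer.readInteger()

// Both are Optional(5), but only readInteger() moved readerIndex,
// so the next read would see the bytes after the length field.
```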
The serializer works by reading off of the buffer, but if that fails, we don't want those bytes to have already been consumed when the decoder is next invoked, so we want the buffer reset. This is in contrast to a stateful decoder that might consume the bytes and store them across decoder invocations. While that might simplify the decoder, the deserializer could get really messy, and I wouldn't want to deal with that. It would be a lot more annoying to write the many different serializers than the single decoder.
So, the decoder needs to start fresh on the next invocation if the deserializer needs more data. This can be done by resetting the reader index whenever more data is needed, and leaving the bytes consumed (read) when continuing.
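A minimal sketch of that pattern, assuming a hypothetical deserializer that signals incomplete input with an error (the names here are illustrative, not jlsftp's actual API):

```swift
import NIO

enum DeserializationError: Error {
    case needMoreData
}

struct SketchDecoder: ByteToMessageDecoder {
    typealias InboundOut = String // stand-in for the real packet type

    func decode(context: ChannelHandlerContext, buffer: inout ByteBuffer) throws -> DecodingState {
        let savedReaderIndex = buffer.readerIndex
        do {
            let packet = try deserialize(from: &buffer)
            context.fireChannelRead(self.wrapInboundOut(packet))
            return .continue
        } catch DeserializationError.needMoreData {
            // Rewind so the next invocation sees the same bytes again.
            buffer.moveReaderIndex(to: savedReaderIndex)
            return .needMoreData
        }
    }

    // Hypothetical deserializer: reads a uint32 length then that many
    // bytes of payload, throwing if the buffer doesn't have enough yet.
    private func deserialize(from buffer: inout ByteBuffer) throws -> String {
        guard let length: UInt32 = buffer.readInteger(),
              let payload = buffer.readString(length: Int(length)) else {
            throw DeserializationError.needMoreData
        }
        return payload
    }
}
```

In a real pipeline this would be wrapped in a ByteToMessageHandler when added to the channel, and NIO calls decode repeatedly as bytes arrive.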
The complication arises when we have a lot of data in the buffer and deserialize an SSH_FXP_INIT. While this shouldn't happen, because the client shouldn't send two packets at a time, it should still be handled carefully. The init packet doesn't carry a count of the extension data before iterating over it; it relies on the packet length to determine how many extensions there are. So if we passed the entire decoding buffer to the deserializer, it might read into the next packet by accident.
Thus, we need to slice the buffer based on the packet length. Reading from a buffer slice doesn't read from the original buffer, so we need to move the reader index on the original buffer if reading the slice succeeded. That is the result of this commit. We slice twice, once to read the packet length and type, and again for the payload data for the deserializer, each time synthetically reading the original buffer as needed on success. This also helps because we can split the decoder into several functions instead of one very large decode(context:buffer:) method.
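The slice-and-advance dance looks roughly like this (the framing is simplified to a uint32 length followed by the type and payload, and the helper name is illustrative):

```swift
import NIO

// Sketch: carve one packet's bytes out of the accumulation buffer.
func sliceOnePacket(from buffer: inout ByteBuffer) -> ByteBuffer? {
    // Peek at the length field without consuming it.
    guard let packetLength: UInt32 = buffer.getInteger(at: buffer.readerIndex) else {
        return nil // need more data
    }
    let totalLength = 4 + Int(packetLength) // length field + type/payload

    // Slice out exactly one packet's worth of bytes. Reads against the
    // slice do not affect the original buffer's readerIndex.
    guard let slice = buffer.getSlice(at: buffer.readerIndex, length: totalLength) else {
        return nil // the full packet hasn't arrived yet
    }

    // The slice is complete, so "synthetically read" the original
    // buffer by advancing its readerIndex past the packet.
    buffer.moveReaderIndex(forwardBy: totalLength)
    return slice
}
```

Because getInteger(at:) and getSlice(at:length:) are non-consuming, a failure at any step leaves the original buffer untouched for the next decoder invocation.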
Part of this decoder implementation did away with the packet length and type deserialization in the original deserializer. I could perhaps move that back out of the decoder and into the serializer object, to simplify the decoder somewhat. The serializer would then need to report better errors to the decoder, though, so I'll have to be careful about that.
The next complication will be unit testing. I cannot create a ChannelHandlerContext mock. This will be tricky if the convenient ByteToMessageDecoderVerifier doesn't cover all scenarios (notably .body message parts, which may not be deterministic).
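If memory serves, the verifier from NIOTestUtils is driven roughly like this (the input/output pair below targets the hypothetical SketchDecoder from the earlier sketch, not jlsftp's real packets):

```swift
import NIOTestUtils

// Feeds each input through a fresh decoder, including re-chunked
// variants of the bytes, and asserts the expected decoded messages
// come out. The input here is four big-endian length bytes (5)
// followed by the five-byte payload "hello".
try ByteToMessageDecoderVerifier.verifyDecoder(
    stringInputOutputPairs: [
        ("\u{00}\u{00}\u{00}\u{05}hello", ["hello"]),
    ],
    decoderFactory: { SketchDecoder() }
)
```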
Overall I’m happy with the progress, and can refactor if needed. See the commented decode(context:buffer:)
for the first draft. It’ll be removed soon.