Reverse Engineering Proprietary Network Protocols

The most interesting reverse engineering work I've done wasn't on software — it was on a proprietary TCP protocol used by a legacy access control system at a self-storage facility. The vendor had gone out of business, the documentation had disappeared, and the gate keypads, door controllers, and the billing system that was supposed to talk to them all used a binary protocol with no public specification. The only option was to reconstruct the protocol from captured traffic. What I'm describing here is that process, generalized into a repeatable methodology.

This kind of work is legal when you own the system or have authorization from the system owner, and when your goal is interoperability or replacement — not circumventing access controls you're not authorized to bypass. The techniques I'm describing are standard practice in industrial control system integration, legacy system migration, and security research. Be clear on your authorization scope before touching any of this.

Setting Up Wireshark for Protocol Capture

The first step is getting clean, labeled packet captures. I tapped the link between the access control server and one of its clients (a keypad controller in this case) using a managed switch with port mirroring enabled, so that all traffic on the target port is copied to a capture laptop running Wireshark.

For a proprietary TCP protocol, start with a display filter to isolate traffic between the two endpoints:

ip.addr == 192.168.1.10 and ip.addr == 192.168.1.50 and tcp

Follow a TCP stream (right-click any packet, "Follow > TCP Stream") to see the full conversation as a hex dump or raw bytes. Wireshark's "Show data as" > "Hex Dump" gives you the byte values and their ASCII representation side by side, which is invaluable for spotting string fields embedded in otherwise binary messages.

Save your baseline capture with a meaningful filename: baseline-door-unlock-sequence.pcapng. You'll be building a library of labeled captures that correspond to specific system events. The labeling is more important than the capturing — I've lost hours re-analyzing captures because I didn't document what system action triggered each one.
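One lightweight way to keep that documentation next to the captures is a manifest file with one entry per capture. The following is a minimal sketch, not the tooling I used at the time: the function names, the JSON-lines layout, and the field names are all assumptions chosen for illustration.

```python
import json
import time
from pathlib import Path

def record_capture(manifest_path, pcap_name, event, notes=""):
    """Append one JSON line describing what a capture file contains."""
    entry = {
        "file": pcap_name,
        "event": event,  # the physical action that triggered the traffic
        "notes": notes,
        "recorded_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    with Path(manifest_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry

def load_manifest(manifest_path):
    """Read the manifest back as a list of dicts, one per capture."""
    with Path(manifest_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Calling record_capture() immediately after saving each pcap, while you still remember what you pressed on the keypad, is the habit that matters; the file format is incidental.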

Differential Analysis: Sending Known Inputs

Differential analysis is the core of the methodology. The principle is simple: make one known change to the system input, capture the resulting traffic, and compare it to your baseline. The difference is evidence about what that input controls in the protocol.

For the access control system, I started with the most observable events: a valid keypad code entry (door unlocks), an invalid keypad code entry (door stays locked), and a timeout with no entry. I captured each scenario multiple times to identify which bytes were constant and which varied. Bytes that stay the same across every capture of the same message are likely structural (magic bytes, message type, length fields for fixed-size messages). Bytes that vary predictably with the input are the payload.

The process looks like this in practice:

Capture Event A three times: door-unlock-001.pcap, door-unlock-002.pcap, door-unlock-003.pcap
Capture Event B three times: door-deny-001.pcap, door-deny-002.pcap, door-deny-003.pcap
Compare A captures to each other: bytes that differ within the same event type are likely timestamps or sequence numbers
Compare A captures to B captures: bytes that differ between event types are likely status codes or result fields

I use a Python script to automate this diff process across multiple pcap files, extracting just the payload bytes from each TCP stream:

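from scapy.all import rdpcap, TCP, Raw

def extract_payloads(pcap_file):
    """Return the raw TCP payload of every data-carrying packet, in capture order."""
    payloads = []
    for pkt in rdpcap(pcap_file):
        # Raw holds the application data; pure ACKs carry none and are skipped
        if TCP in pkt and Raw in pkt:
            payloads.append(bytes(pkt[Raw].load))
    return payloads

baseline = extract_payloads('door-unlock-001.pcap')
variant = extract_payloads('door-deny-001.pcap')

# Pair up packets by position, then compare each pair byte by byte.
# This assumes both captures contain the same sequence of messages.
for n, (b, v) in enumerate(zip(baseline, variant)):
    if len(b) != len(v):
        print(f"Packet {n}: length differs ({len(b)} vs {len(v)} bytes)")
    for offset, (x, y) in enumerate(zip(b, v)):
        if x != y:
            print(f"Packet {n}, byte offset {offset}: baseline={x:02x}, variant={y:02x}")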
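The two-way comparison described in the process list (within one event type, then between event types) can be combined into a single pass. This is a sketch under assumptions: classify_offsets is a hypothetical helper, and each input is a list of raw payload bytes holding the same logical message from repeated captures of one event.

```python
def classify_offsets(group_a, group_b):
    """Label each byte offset by how it behaves within and between two event types.

    group_a and group_b are lists of byte strings, one per capture, each holding
    the same logical message from a different run of the event. Offsets beyond
    the shortest message are ignored.
    """
    length = min(len(m) for m in group_a + group_b)
    labels = {}
    for i in range(length):
        a_values = {m[i] for m in group_a}
        b_values = {m[i] for m in group_b}
        if len(a_values) > 1 or len(b_values) > 1:
            labels[i] = "varies within event"     # timestamp or sequence candidates
        elif a_values != b_values:
            labels[i] = "differs between events"  # status or result candidates
        else:
            labels[i] = "constant"                # structural candidates
    return labels
```

The labels are hypotheses, not conclusions. A byte that happens to be constant across three captures may still turn out to vary once you collect thirty.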

Identifying Message Framing

Before you can parse individual messages, you need to understand how messages are delimited in the byte stream. Two common patterns:

Length-prefixed: The first 1, 2, or 4 bytes of each message encode the message length, which may or may not count the prefix itself. Common in binary protocols. To test for this, capture a sequence of messages and check whether the first few bytes correlate numerically with the total byte count of each message payload.

Delimiter-based: Messages are separated by a fixed byte sequence — often 0x0D 0x0A (CRLF), a null byte, or a protocol-specific magic sequence. Look for repeated byte patterns that appear between logical units in the stream.
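The length-prefix test can be automated by trying the common encodings against a set of messages and keeping only those where the decoded value tracks the real length. A sketch, assuming you already have candidate message boundaries (for example, one message per TCP segment, which will not always hold); find_length_prefix and the max_adjustment cutoff are illustrative choices, not part of any library.

```python
import struct

# (label, struct format) for the prefix encodings worth trying first
CANDIDATES = [
    ("1-byte", "B"),
    ("2-byte little-endian", "<H"),
    ("2-byte big-endian", ">H"),
    ("4-byte little-endian", "<I"),
    ("4-byte big-endian", ">I"),
]

def find_length_prefix(messages, max_adjustment=16):
    """Return (label, adjustment) pairs consistent with every message.

    adjustment is len(message) minus the decoded prefix value: 0 means the
    prefix counts the whole message, and a value equal to the prefix width
    means it counts only the bytes that follow it.
    """
    matches = []
    for label, fmt in CANDIDATES:
        width = struct.calcsize(fmt)
        if any(len(m) < width for m in messages):
            continue
        adjustments = {len(m) - struct.unpack_from(fmt, m)[0] for m in messages}
        if len(adjustments) == 1:
            adjustment = adjustments.pop()
            # A real prefix differs from the true length by a small header offset
            if 0 <= adjustment <= max_adjustment:
                matches.append((label, adjustment))
    return matches
```

Feed it messages of clearly different sizes. If every message is shorter than 256 bytes, a 1-byte prefix and a 2-byte little-endian prefix are indistinguishable, so include at least one long message before trusting the result.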

In the access control protocol I analyzed, messages used a 2-byte little-endian length prefix followed by a 1-byte message type identifier, a 4-byte session ID, and then the variable-length payload. Figuring out that length prefix format took about 20 minutes of staring at hex dumps and noticing that the first two bytes of each message, read as a little-endian integer, always equaled len(message) - 2: the prefix counts every byte that follows it but not itself. Once I saw that pattern I tested it against 15 different captures and it held consistently.
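With that layout, a reassembled TCP stream can be cut into individual messages. The sketch below assumes the framing exactly as described here (length value equal to len(message) - 2, so type, session ID, and payload are all counted); split_frames is a hypothetical name.

```python
import struct

HEADER = struct.Struct("<HB4s")  # 2-byte LE length, 1-byte type, 4-byte session ID

def split_frames(stream):
    """Yield (msg_type, session_id, payload) for each complete frame in stream."""
    offset = 0
    while offset + HEADER.size <= len(stream):
        length, msg_type, session_id = HEADER.unpack_from(stream, offset)
        end = offset + 2 + length  # length counts the bytes after the prefix
        if length < 5 or end > len(stream):
            break  # malformed or incomplete frame; stop rather than guess
        payload = stream[offset + HEADER.size:end]
        yield msg_type, session_id, payload
        offset = end
```

Working on the reassembled stream rather than on individual packets matters because TCP gives no guarantee that one packet carries exactly one message.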

Reverse Engineering an Authentication Handshake

The authentication handshake was the most complex part. The sequence was: client connects, server sends a challenge (looked like random bytes), client sends a response, server sends either an accept or reject message. Classic challenge-response authentication.

The challenge was always 16 bytes. The response was always 32 bytes. The length of the response suggested SHA-256 (32 bytes = 256 bits). To confirm, I needed to figure out what was being hashed. Common patterns for challenge-response: HMAC(key, challenge), hash(key + challenge), or hash(challenge + key).

I found the client software on a maintenance laptop at the facility. Running strings on the binary turned up a 32-character alphanumeric string that looked like a hardcoded key; it appeared in no other context. Computing SHA256(key + challenge) for a challenge from a known-good capture reproduced the recorded response exactly. That was the authentication scheme: SHA-256 of a hardcoded shared key concatenated with the server challenge.
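Checking the candidate constructions against a captured challenge and response pair takes only a few lines. This is a sketch: identify_scheme is a hypothetical helper, and it assumes the key is used as its raw ASCII bytes, which you would need to verify (a key that is stored as hex might be decoded before hashing).

```python
import hashlib
import hmac

def identify_scheme(key, challenge, observed_response):
    """Return the names of candidate constructions that reproduce the response."""
    candidates = {
        "SHA256(key + challenge)": hashlib.sha256(key + challenge).digest(),
        "SHA256(challenge + key)": hashlib.sha256(challenge + key).digest(),
        "HMAC-SHA256(key, challenge)": hmac.new(key, challenge, hashlib.sha256).digest(),
    }
    return [name for name, digest in candidates.items()
            if hmac.compare_digest(digest, observed_response)]
```

An empty result does not rule out SHA-256. It may only mean the key encoding, a salt, or an extra field in the hashed input is still missing from the guess.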

Not a great authentication scheme (hardcoded shared key, no forward secrecy, replay attack possible if the challenge has low entropy), but that's a discussion for a different article. The point is that differential analysis plus strings analysis of the client binary got me to the answer in about four hours.

Building a PHP Implementation from the Reverse-Engineered Spec

Once I had a partial protocol spec documented, I built a PHP client to implement it. The spec doc lived in a Markdown file alongside the code, updated as I discovered more about each message type. The implementation used PHP's socket extension for raw TCP:

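class AccessControlClient {
    private const MSG_AUTH_CHALLENGE = 0x01;
    private const MSG_AUTH_RESPONSE  = 0x02;
    private const MSG_AUTH_ACCEPT    = 0x03;
    private const MSG_DOOR_COMMAND   = 0x10;
    private const HEADER_SIZE        = 7; // 2 len + 1 type + 4 session

    private \Socket $socket;
    // Placeholder used until the server assigns a session ID in the accept message.
    // Assumption: adjust to whatever your captures show in pre-auth messages.
    private string $sessionId = "\x00\x00\x00\x00";

    public function connect(string $host, int $port, string $sharedKey): void {
        $socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
        if ($socket === false || !socket_connect($socket, $host, $port)) {
            throw new \RuntimeException('Connection failed: ' . socket_strerror(socket_last_error()));
        }
        $this->socket = $socket;

        // Read challenge (assert() can be disabled at runtime, so check explicitly)
        $challenge = $this->readMessage();
        if ($challenge['type'] !== self::MSG_AUTH_CHALLENGE) {
            throw new \RuntimeException('Expected auth challenge');
        }

        // Compute response: raw SHA-256 of the shared key followed by the challenge
        $response = hash('sha256', $sharedKey . $challenge['payload'], true);

        // Send response
        $this->sendMessage(self::MSG_AUTH_RESPONSE, $response);

        // Read accept
        $accept = $this->readMessage();
        if ($accept['type'] !== self::MSG_AUTH_ACCEPT) {
            throw new \RuntimeException('Authentication failed');
        }

        $this->sessionId = substr($accept['payload'], 0, 4);
    }

    // socket_read() may return fewer bytes than requested, so loop until done
    private function readExactly(int $length): string {
        $data = '';
        while (strlen($data) < $length) {
            $chunk = socket_read($this->socket, $length - strlen($data));
            if ($chunk === false || $chunk === '') {
                throw new \RuntimeException('Connection closed mid-message');
            }
            $data .= $chunk;
        }
        return $data;
    }

    private function readMessage(): array {
        $header = $this->readExactly(self::HEADER_SIZE);
        ['len' => $len, 'type' => $type] = unpack('vlen/Ctype', $header);
        $sessionId = substr($header, 3, 4);
        // The length field counts every byte after itself: type + session + payload
        $payload = $len > 5 ? $this->readExactly($len - 5) : '';
        return ['type' => $type, 'session' => $sessionId, 'payload' => $payload];
    }

    private function sendMessage(int $type, string $payload): void {
        $len = 5 + strlen($payload); // type + session + payload, excluding the length field
        $message = pack('vC', $len, $type) . $this->sessionId . $payload;
        // socket_write() may write only part of the buffer
        while ($message !== '') {
            $written = socket_write($this->socket, $message);
            if ($written === false) {
                throw new \RuntimeException('Write failed');
            }
            $message = substr($message, $written);
        }
    }
}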

I built the implementation incrementally, adding one message type at a time and validating each against live captures before moving to the next. The entire protocol for door unlock commands, status queries, and event log retrieval was implemented in about 400 lines of PHP after two weeks of analysis and testing.
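Validating against captures can be as simple as rebuilding a message from its fields and comparing it byte for byte with what was recorded on the wire. A sketch in Python, to match the analysis script earlier, assuming the framing described above with a length value of len(message) - 2; both function names are hypothetical, and any bytes you compare against should come from your own captures.

```python
import struct

def build_message(msg_type, session_id, payload=b""):
    """Serialize one frame: 2-byte LE length, 1-byte type, 4-byte session ID, payload."""
    length = 1 + len(session_id) + len(payload)  # bytes following the length field
    return struct.pack("<HB", length, msg_type) + session_id + payload

def first_mismatch(built, captured):
    """Return the first offset where the two byte strings differ, or None if equal."""
    for offset, (x, y) in enumerate(zip(built, captured)):
        if x != y:
            return offset
    if len(built) == len(captured):
        return None
    return min(len(built), len(captured))  # one is a prefix of the other
```

Reporting the first differing offset, rather than a plain pass or fail, points straight at the field whose interpretation is still wrong.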

The resulting client connected the access control system to the billing platform, enabling automatic overlocking of delinquent tenants and real-time unit status updates — functionality the facility had never had because the original vendor software didn't support it. That's the value proposition of reverse engineering done right: not breaking systems, but making them work together when the documentation no longer exists to tell you how.
