Ramit Mittal

How does cURL know not to print binary responses?

You can’t work on a CDN platform and not make HTTP requests all the time. And you’re often more interested in the request/response headers than the response body. Familiarity with CLI HTTP clients is a must. HTTPie was my staple and restclient.el was my favorite. But neither of them did quite what I wanted. HTTPie can’t repeat requests, and restclient.el has a hard dependency on Emacs. So I did what any developer with a lot of free time would: I wrote my own HTTP client so I couldn’t complain anymore. And I was pretty satisfied with it, until

that happened 😕.

I hit an image URL and the client attempted to print the binary image response on the terminal. I instantly remembered how HTTPie handles this issue. HTTPie detects that the response contains non-printable characters and prints a warning message instead of the actual content.

If you know anything about HTTP headers, your first instinct would be to check the Content-Type header for a clue about what the response body contains. But that’s more of a suggestion than a fact. An HTTP client should know better than to trust third-party servers to set the correct Content-Type header.
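To illustrate why the header alone is not enough, here is a minimal sketch that compares a declared Content-Type with a type guessed from the body using Go's standard http.DetectContentType. The declared value and the helper name sniffedType are made up for this example, and sniffing is only shown as a contrast, not as what my client ended up doing.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniffedType guesses a MIME type from the leading bytes of the body,
// ignoring whatever the server declared in its Content-Type header.
// (Hypothetical helper, not part of the client described in this post.)
func sniffedType(body []byte) string {
	return http.DetectContentType(body)
}

func main() {
	// Assumed scenario: the header claims text, but the body begins
	// with the 8-byte PNG signature.
	declared := "text/plain"
	body := []byte("\x89PNG\r\n\x1a\n")

	fmt.Println("declared:", declared)
	fmt.Println("sniffed: ", sniffedType(body))
}
```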

I looked at some documentation for the unicode and utf8 packages in Go and wrote some code to detect whether the response body contained non-printable characters.

// formatResponseBody scans the byte slice of an HTTP response body for
// non-printable characters and returns a placeholder message instead of the
// body so that binary data does not mess up the terminal.
func formatResponseBody(body []byte) string {
	idx := 0
	for idx < len(body) {
		r, size := utf8.DecodeRune(body[idx:])
		if unicode.IsPrint(r) {
			idx += size
		} else {
			return "\n\nRESPONSE CONTAINS NON-PRINTABLE CHARACTERS.\n"
		}
	}
	return string(body)
}

This function decodes the body one rune at a time (a rune may span several bytes) and checks whether each rune is printable. It correctly detects non-printable content when the input byte slice contains the bytes of a PNG or JPEG image. But after I integrated it, the HTTP client also started reporting HTML pages as binary. I found that unicode.IsPrint does not consider the line feed byte 0x0a printable. My second implementation accepts any rune that decodes without an error and adds an explicit check for the line feed.

func formatResponseBody(body []byte) string {
	idx := 0
	for idx < len(body) {
		r, size := utf8.DecodeRune(body[idx:])
		if r != utf8.RuneError {
			idx += size
		} else if body[idx] == byte(10) {
			idx += 1
		} else {
			return "\n\nRESPONSE CONTAINS NON-PRINTABLE CHARACTERS.\n"
		}
	}
	return string(body)
}
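To make the difference between the two checks concrete, here is a small sketch that runs a few single-byte inputs through utf8.DecodeRune and unicode.IsPrint. The helper name describe and the sample bytes are chosen only for illustration.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// describe decodes the first rune of b and reports how the two checks
// used in the implementations above classify it.
func describe(b []byte) (valid bool, printable bool) {
	r, size := utf8.DecodeRune(b)
	// DecodeRune signals invalid UTF-8 by returning (utf8.RuneError, 1).
	valid = !(r == utf8.RuneError && size == 1)
	printable = unicode.IsPrint(r)
	return valid, printable
}

func main() {
	samples := []struct {
		name string
		data []byte
	}{
		{"letter a", []byte{0x61}},
		{"space", []byte{0x20}},
		{"line feed", []byte{0x0a}},
		{"null byte", []byte{0x00}},
		{"stray 0xff", []byte{0xff}},
	}
	for _, s := range samples {
		valid, printable := describe(s.data)
		fmt.Printf("%-10s validUTF8=%t isPrint=%t\n", s.name, valid, printable)
	}
}
```

The line feed decodes as a perfectly valid rune but fails unicode.IsPrint, which is why ordinary HTML tripped the first implementation.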

Hoping to find a solution where I didn’t have to iterate over the entire response body, I looked at how HTTPie and cURL solve this problem. Both scan the response body but use much simpler logic to detect binary data. They simply check for the presence of the null byte 0x00.

    def iter_body(self) -> Iterable[bytes]:
        for line, lf in self.msg.iter_lines(self.CHUNK_SIZE):
            if b'\0' in line:
                raise BinarySuppressedError()
            line = self.decode_chunk(line)
            yield smart_encode(line, self.output_encoding) + lf

HTTPie: httpie/output/streams.py

  if(is_tty && (outs->bytes < 2000) && !config->terminal_binary_ok) {
    /* binary output to terminal? */
    if(memchr(buffer, 0, bytes)) {
      warnf(config->global, "Binary output can mess up your terminal. "
            "Use \"--output -\" to tell curl to output it to your terminal "
            "anyway, or consider \"--output <FILE>\" to save to a file.\n");
      config->synthetic_error = TRUE;
      return CURL_WRITEFUNC_ERROR;
    }
  }

cURL: src/tool_cb_wrt.c
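The same idea ported to Go could look roughly like the sketch below. It is an assumption about how one might write it, not the exact code from my client; the helper name containsNullByte and the sample bodies are invented for the example.

```go
package main

import (
	"bytes"
	"fmt"
)

const binaryNotice = "\n\nRESPONSE CONTAINS NON-PRINTABLE CHARACTERS.\n"

// containsNullByte reports whether body holds at least one 0x00 byte,
// the same signal the HTTPie and cURL snippets above look for.
func containsNullByte(body []byte) bool {
	return bytes.IndexByte(body, 0) >= 0
}

// formatResponseBody returns the body as a string, or a placeholder
// notice when the body looks binary.
func formatResponseBody(body []byte) string {
	if containsNullByte(body) {
		return binaryNotice
	}
	return string(body)
}

func main() {
	html := []byte("<html>\n<body>hi</body>\n</html>\n")
	fmt.Print(formatResponseBody(html))

	// A PNG starts with an 8-byte signature followed by the length of
	// the IHDR chunk, whose leading bytes are zero.
	png := []byte("\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR")
	fmt.Print(formatResponseBody(png))
}
```

Note that the cURL snippet only applies its check while fewer than 2000 bytes have been written, whereas this sketch scans whatever slice it is given.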

If this is good enough for cURL, then it is good enough for my HTTP client. To confirm that this is really all cURL checks, I tried tricking it into printing the unprintable. Here is an Express server that responds to incoming requests with a few non-zero bytes that do not form printable text.

const express = require('express');
const app = express();

app.get("/", (req, res) => {
    res.end(Buffer.from([255, 255, 255, 255, 11, 255]))
})
app.listen(8080)

And cURL couldn’t care less.
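A short Go sketch shows why those bytes slip through. It assumes the null-byte rule discussed above and compares it with a UTF-8 validity check; the names trickyBody and classify are made up for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// trickyBody mirrors the bytes sent by the Express server above.
var trickyBody = []byte{255, 255, 255, 255, 11, 255}

// classify reports whether body contains a null byte and whether it is
// valid UTF-8. Only the first result matters to a null-byte check.
func classify(body []byte) (hasNull bool, validUTF8 bool) {
	return bytes.IndexByte(body, 0) >= 0, utf8.Valid(body)
}

func main() {
	hasNull, validUTF8 := classify(trickyBody)
	fmt.Println("has null byte:", hasNull)
	fmt.Println("valid UTF-8:  ", validUTF8)
}
```

The body is not valid UTF-8, yet it contains no null byte, so a check that looks only for 0x00 lets it through to the terminal.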