Leaves on the line

A project log for PIP Arduino Web Browser

Surf the web using an Arduino and an ethernet shield. Because you can.

Gilchrist, 10/15/2014 at 07:59

From early March 2014

So where are we at with the PIP Arduino web browser? Let me explain. No, there is too much. Let me sum up.

It can download raw HTML from a fixed website to a cached file on an SD card. Most of the time. [1]

Yeah, that's about it. I've got a bad feeling about this. But after all my obsessing last time about the lack of space for program code, I forgot to mention that the Ethernet and SD card libraries only left about 864 bytes of RAM to play with. This really is like retro coding after all.

To make matters worse, HTML is an extremely verbose mark-up language. That's fine if you want to do a load of string parsing and regex matching. I don't. In fact, I didn't even want to crack open the Arduino String library, to save space. [2]

So here is my cunning plan: replace all the verbose text tags with a single-byte equivalent for the tags I'll be implementing, and dump the rest. [3] After I've done the hard work of parsing HTML once, handling one-byte codes should make rendering much easier.

Character values above 127 (beyond the 7-bit ASCII set proper) are little used most of the time, so I considered using characters there. But some of them are used for special characters. [4]

ASCII below 32 (space) are generally non-printing characters and a lot are there for "Historical reasons" now, so I'll commandeer some of them. It's open season on everything after ASCII 13 (carriage return) up to 31. All your ASCII are belong to us.
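
A minimal sketch of how such a tag table could look. The tag list, the code values and the names (TagCode, TAG_TABLE, tagToCode) are illustrative assumptions, not the codes PIP actually uses; only the 14 to 31 range comes from the text above. It is plain C++ so it can be tried on a desktop. On an Arduino the table would probably want to live in flash rather than RAM.

```cpp
#include <stdint.h>
#include <string.h>

// Hypothetical one-byte codes, all inside the 14..31 range.
// These values are illustrative assumptions, not PIP's real codes.
enum TagCode {
  TAG_NONE      = 0,   // Tag is not supported: dump it
  TAG_PARAGRAPH = 14,  // <p>
  TAG_BREAK     = 15,  // <br>
  TAG_HEADING   = 16,  // <h1>
  TAG_BOLD      = 17,  // <b>
  TAG_LINK      = 18,  // <a>
  TAG_LIST_ITEM = 19   // <li>
};

struct TagEntry {
  const char *name;  // Tag name without the angle brackets
  uint8_t code;      // Single byte written in place of the tag
};

static const TagEntry TAG_TABLE[] = {
  {"p",  TAG_PARAGRAPH},
  {"br", TAG_BREAK},
  {"h1", TAG_HEADING},
  {"b",  TAG_BOLD},
  {"a",  TAG_LINK},
  {"li", TAG_LIST_ITEM}
};

// Returns the one-byte code for a lower-case tag name,
// or TAG_NONE if the tag is not in the table
uint8_t tagToCode(const char *name) {
  for (uint8_t i = 0; i < sizeof(TAG_TABLE) / sizeof(TAG_TABLE[0]); i++) {
    if (strcmp(name, TAG_TABLE[i].name) == 0) return TAG_TABLE[i].code;
  }
  return TAG_NONE;
}
```

A renderer reading the cached file could then treat any byte from 14 to 31 as a tag code and anything from 32 upwards as text to print.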

Before I could really get into HTML decoding, I had to address the problem with download reliability. It seems that the Ethernet shield is fine for downloading little 1KB pages for IoT projects [5] but larger pages … Houston, we have a problem.

It seems the shield supplies data to the Arduino in 2KB chunks (even though the Wiz5100 has much more memory for buffering). My download program can often consume the Serial buffer down to nothing and then has to wait some time for it to refill.

Too long. More time than the procedure will wait for before timing out. The Serial.readBytesUntil() procedures aren't much better. And the example Ethernet code doesn't wait around. If ethernetClient.available() returns 0, it's immediately checking for ethernetClient.connected() - which often bails early. [6]

I adapted some found code for a custom readBytesUntil. I can give this a really big timeout period. [7] I've also got an increasing delay period when the buffer's empty, to really make sure the connection is dropped before giving up.
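
As a rough worked example of how an increasing delay plays against a fixed timeout: assuming the wait grows by 100 ms each time round and the limit is 5 seconds, the waits go 100, 200, 300 and so on. The function name and the simulated clock below are assumptions made so the arithmetic can be run on a desktop rather than on the board.

```cpp
#include <stdint.h>

// Simulates an ever-growing wait while no data arrives.
// Returns the total simulated time in ms when the loop gives up.
uint32_t totalBackoffMs(uint32_t timeoutMs, uint32_t stepMs) {
  uint32_t elapsed = 0;  // Simulated clock, standing in for millis()
  uint8_t wait = 0;      // Number of waits so far

  while (elapsed < timeoutMs) {
    elapsed += (uint32_t)(++wait) * stepMs;  // Each wait is one step longer
  }
  return elapsed;
}
```

With a 5000 ms limit and 100 ms steps this comes to ten waits totalling 5500 ms, because the last wait starts just inside the limit and runs past it.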

I'm using these routines to zip through the pre-HTML headers and other bits, so the pre-emptive terminator of '<' is handy.

Code time:

// Consumes data from the Ethernet client until the given
// string is found, or time runs out
// Returns 0 on timeout, 1 on a pre-empted match, 2 if found
byte findUntil (uint8_t *string, boolean terminate) {
  char currentChar = 0;                     // Index of the next character to match
  unsigned long timeOut = millis() + 5000;
  char c = 0;

  while (millis() < timeOut) {
    if (ethernetClient.available()) {
      timeOut = millis() + 5000;            // Data arrived, so restart the timeout
      c = ethernetClient.read();

      if (terminate && (c == '<')) return 1; // Pre-empted match

      if (c == string[currentChar]) {
        if (string[++currentChar] == 0) {
          return 2; // Found
        }
      }
      else currentChar = 0;                 // Mismatch, start the string again
    }
  }
  return 0; // Timeout
}

// Delays until Ethernet data is available
// Returns 0 if there is no data and the connection has dropped, otherwise 1
boolean inputWait() {
  byte wait = 0;
  unsigned long timeOut = millis() + 5000; // Allow 5 seconds of no data

  while ((millis() < timeOut) && (!ethernetClient.available())) {
    delay(++wait * 100);                   // Wait a little longer each time round
  }
  if ((!ethernetClient.available()) && (!ethernetClient.connected())) return 0;
  else return 1;
}
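
The matching logic in findUntil can be exercised without any network hardware. The sketch below is a desktop C++ re-statement that reads from an in-memory string instead of ethernetClient. The name findInBuffer and the pos cursor are assumptions for illustration, and running out of input stands in for the timeout. It keeps the same return codes: 0 for not found, 1 for a pre-empted match on '<', 2 for found.

```cpp
#include <stdint.h>
#include <stddef.h>

// Desktop stand-in for findUntil: scans 'input' from *pos onward.
// Returns 2 if 'target' is found, 1 if '<' pre-empts the search,
// 0 if the input runs out (standing in for the timeout).
// *pos is left just past the last character consumed.
uint8_t findInBuffer(const char *input, size_t *pos,
                     const char *target, bool terminate) {
  size_t matched = 0;  // Characters of target matched so far

  while (input[*pos] != 0) {
    char c = input[(*pos)++];

    if (terminate && (c == '<')) return 1;   // Pre-empted match

    if (c == target[matched]) {
      if (target[++matched] == 0) return 2;  // Found
    }
    else matched = 0;  // Mismatch: start the target again
  }
  return 0;  // Out of data
}
```

Searching an HTTP response for "\r\n\r\n" this way leaves the cursor on the first character of the body. One thing the exercise shows up: after a mismatch the search restarts without re-testing the current character, so a target of "ab" is missed in the input "aab". That probably matters little for header terminators, but it is worth knowing about.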

[0] The title of this post refers to a common reason given for UK trains being delayed or late. "Your train has been delayed due to leaves on the line."

[1] There are issues I'll get into later.

[2] And I've heard it's a bit buggy sometimes.

[3] Sounds easy when you say it fast. Like "Put a tail on it and call it a weasel." Have you ever tried doing that?

[4] That's where Johnny foreigner keeps all his funny letters. And there's no real standard for the characters up there anyway.

[5] Internet of Things.

[6] No-one has any patience nowadays, it seems. I blame computer games. Welcome to Global Thermonuclear War. Shall we play a game?

[7] Five whole seconds! Which is a lot in internet time.