Close

Using ftrace to debug device tree problems

A project log for Put a Raspberry Pi CM4 into an original iPad

Open up a 2010 iPad, remove its logic board, and fabricate a new board based around the Raspberry Pi CM4.

evanEvan 08/18/2021 at 07:280 Comments

For the past few days, I've been trying to get the sn65ds83 driver (recently added to linux) running and talking to my board. It has been a (re-)learning experience trying to get this device tree overlay to cooperate.

Following my own advice from the last time I touched device trees, I used udevadm monitor and dmesg -w to watch for kernel/udev messages while loading my overlay with sudo dtoverlay pipad_screen. This gave me this helpful error message:

[  178.309859] sn65dsi83: probe of 1-002d failed with error -2

From what I can tell from looking at the kernel source, this implies that the sn65dsi83_probe function was returning -2. Unfortunately, the sn65dsi83_probe function can return errors from several different places. Fortunately, there's a kernel feature called ftrace that can trace every function call that happens within the kernel. The easiest way to use it is to use the trace-cmd command, which you can install on Raspberry Pi OS with apt install trace-cmd. You record with trace-cmd record -p function_graph (with various options for filtering), then view the results with trace-cmd report.

Supposedly, you can trace everything by just running trace-cmd record -p function_graph with no filter options, but when I tried this, I had a lot of dropped events and didn't see the sn65dsi83_probe function in the results.

After learning some things about ftrace (by manually constructing a list of functions to filter on with the -l flag), I discovered that ftrace can only trace functions in kernel modules that are loaded before ftrace starts. Normally, the ti-sn65dsi83 module is loaded automatically when I run dtoverlay pipad_screen, but that prevents ftrace from seeing it.

If I manually modprobe ti-sn65dsi83 before running trace-cmd record -p function_graph -F dtoverlay pipad_screen, I get no dropped events, and the trace for the sn65dsi83_probe function was there in the output of trace-cmd report:

sn65dsi83_probe() {
  devm_kmalloc() {
    __kmalloc_track_caller() {
      kmalloc_slab();
      should_failslab();
    }
    devres_add() {
      _raw_spin_lock_irqsave() {
        preempt_count_add();
      }
      _raw_spin_unlock_irqrestore() {
        preempt_count_sub();
      }
    }
  }
  of_device_get_match_data() {
    of_match_node() {
      _raw_spin_lock_irqsave() {
        preempt_count_add();
      }
      __of_match_node.part.0() {
        __of_device_is_compatible() {
          __of_find_property();
          of_prop_next_string();
        }
        __of_device_is_compatible() {
          __of_find_property();
          of_prop_next_string();
          of_prop_next_string();
        }
      }
      _raw_spin_unlock_irqrestore() {
        preempt_count_sub();
      }
    }
  }
  devm_gpiod_get() {
    ...
  }
}

This implies that sn65dsi83_probe was returning immediately after the call to devm_gpiod_get finished. Oh, right, I never put the enable GPIO pin for the sn65dsi83 in my dts file. I figured the property would be called enable because that's the string being passed to devm_gpiod_get, but nope. From looking at documentation for other bridges, I inferred that it's enable-gpios.

Adding enable-gpios = <&gpio 5 0>; to my device tree and trying again, this time I get from dmesg -w:

[   95.093834] sn65dsi83: probe of 1-002d failed with error -22

 Looking at my report, this time I see:

sn65dsi83_probe() {
  devm_kmalloc() {
    ...
  }
  of_device_get_match_data() {
    ...
  }
  devm_gpiod_get() {
    ...
  }
  of_graph_get_endpoint_by_regs() {
    ...
  }
  of_property_count_elems_of_size() {
    ...
  }
  of_graph_get_remote_port_parent() {
    ...
  }
  of_node_put();
}

Looks like it's failing just after of_graph_get_remote_port_parent and of_node_put. These aren't called directly from sn65dsi83_probe, but by sn65dsi83_parse_dt. Because sn65dsi83_parse_dt is defined as a static function, it can't be traced by ftrace, and everything it calls appears directly beneath sn65dsi83_probe.

I think I'm being caught by this code:

    ctx->dsi_lanes = of_property_count_u32_elems(endpoint, "data-lanes");
    ctx->host_node = of_graph_get_remote_port_parent(endpoint);
    of_node_put(endpoint);

    if (ctx->dsi_lanes < 0 || ctx->dsi_lanes > 4)
        return -EINVAL;

Sure enough, EINVAL is defined as 22. This probably means I need to fix/add a data-lanes entry in my device tree. Looking at the sn65dsi86 driver (which is better-documented Turns out, the sn65dsi83 driver is perfectly well-documented, I just didn't cherry-pick that commit onto my own kernel branch. 🤦♂️), it looks like this needs to go on the endpoint node, not the bridge node. Once I fixed this, and moved my port to port@0 (instead of port@1 which I had for some reason), I no longer get the failed with error -22 message. Now I have no errors at all when loading the overlay.

Time to figure out what the next step is, I guess.

For posterity, heres a gist of my dts file.

Discussions