Close

Using ftrace to debug device tree problems

A project log for Put a Raspberry Pi CM4 into an original iPad

Open up a 2010 iPad, remove its logic board, and fabricate a new board based around the Raspberry Pi CM4.

evanEvan 08/18/2021 at 07:281 Comment

For the past few days, I've been trying to get the sn65ds83 driver (recently added to linux) running and talking to my board. It has been a (re-)learning experience trying to get this device tree overlay to cooperate.

Following my own advice from the last time I touched device trees, I used udevadm monitor and dmesg -w to watch for kernel/udev messages while loading my overlay with sudo dtoverlay pipad_screen. This gave me this helpful error message:

[  178.309859] sn65dsi83: probe of 1-002d failed with error -2

From what I can tell from looking at the kernel source, this implies that the sn65dsi83_probe function was returning -2. Unfortunately, the sn65dsi83_probe function can return errors from several different places. Fortunately, there's a kernel feature called ftrace that can trace every function call that happens within the kernel. The easiest way to use it is to use the trace-cmd command, which you can install on Raspberry Pi OS with apt install trace-cmd. You record with trace-cmd record -p function_graph (with various options for filtering), then view the results with trace-cmd report.

Supposedly, you can trace everything by just running trace-cmd record -p function_graph with no filter options, but when I tried this, I had a lot of dropped events and didn't see the sn65dsi83_probe function in the results.

After learning some things about ftrace (by manually constructing a list of functions to filter on with the -l flag), I discovered that ftrace can only trace functions in kernel modules that are loaded before ftrace starts. Normally, the ti-sn65dsi83 module is loaded automatically when I run dtoverlay pipad_screen, but that prevents ftrace from seeing it.

If I manually modprobe ti-sn65dsi83 before running trace-cmd record -p function_graph -F dtoverlay pipad_screen, I get no dropped events, and the trace for the sn65dsi83_probe function was there in the output of trace-cmd report:

sn65dsi83_probe() {
  devm_kmalloc() {
    __kmalloc_track_caller() {
      kmalloc_slab();
      should_failslab();
    }
    devres_add() {
      _raw_spin_lock_irqsave() {
        preempt_count_add();
      }
      _raw_spin_unlock_irqrestore() {
        preempt_count_sub();
      }
    }
  }
  of_device_get_match_data() {
    of_match_node() {
      _raw_spin_lock_irqsave() {
        preempt_count_add();
      }
      __of_match_node.part.0() {
        __of_device_is_compatible() {
          __of_find_property();
          of_prop_next_string();
        }
        __of_device_is_compatible() {
          __of_find_property();
          of_prop_next_string();
          of_prop_next_string();
        }
      }
      _raw_spin_unlock_irqrestore() {
        preempt_count_sub();
      }
    }
  }
  devm_gpiod_get() {
    ...
  }
}

This implies that sn65dsi83_probe was returning immediately after the call to devm_gpiod_get finished. Oh, right, I never put the enable GPIO pin for the sn65dsi83 in my dts file. I figured the property would be called enable because that's the string being passed to devm_gpiod_get, but nope. From looking at documentation for other bridges, I inferred that it's enable-gpios.

Adding enable-gpios = <&gpio 5 0>; to my device tree and trying again, this time I get from dmesg -w:

[   95.093834] sn65dsi83: probe of 1-002d failed with error -22

 Looking at my report, this time I see:

sn65dsi83_probe() {
  devm_kmalloc() {
    ...
  }
  of_device_get_match_data() {
    ...
  }
  devm_gpiod_get() {
    ...
  }
  of_graph_get_endpoint_by_regs() {
    ...
  }
  of_property_count_elems_of_size() {
    ...
  }
  of_graph_get_remote_port_parent() {
    ...
  }
  of_node_put();
}

Looks like it's failing just after of_graph_get_remote_port_parent and of_node_put. These aren't called directly from sn65dsi83_probe, but by sn65dsi83_parse_dt. Because sn65dsi83_parse_dt is defined as a static function, it can't be traced by ftrace, and everything it calls appears directly beneath sn65dsi83_probe.

I think I'm being caught by this code:

    ctx->dsi_lanes = of_property_count_u32_elems(endpoint, "data-lanes");
    ctx->host_node = of_graph_get_remote_port_parent(endpoint);
    of_node_put(endpoint);

    if (ctx->dsi_lanes < 0 || ctx->dsi_lanes > 4)
        return -EINVAL;

Sure enough, EINVAL is defined as 22. This probably means I need to fix/add a data-lanes entry in my device tree. Looking at the sn65dsi86 driver (which is better-documented Turns out, the sn65dsi83 driver is perfectly well-documented, I just didn't cherry-pick that commit onto my own kernel branch. 🤦♂️), it looks like this needs to go on the endpoint node, not the bridge node. Once I fixed this, and moved my port to port@0 (instead of port@1 which I had for some reason), I no longer get the failed with error -22 message. Now I have no errors at all when loading the overlay.

Time to figure out what the next step is, I guess.

For posterity, heres a gist of my dts file.

Discussions

Evan wrote 01/23/2024 at 06:04 point

Since I keep coming back to this post every time I need to remember how to use ftrace: The -F option to trace-cmd record causes it to only record calls that were directly caused by the process in question. If something isn't showing up that you expect to be there, try dropping the -F argument. This may generate a lot more data.

  Are you sure? yes | no