Tuesday, 12 February 2013

Optimus and Ubuntu 12.10 (Part 5)

This post is the fifth of a series of posts on tweaking Ubuntu 12.10 to exploit Optimus technology on my Lenovo W530 to the extent I need. Make sure you are familiar with the context and objectives.

As described in Part 1, plymouth issues when Optimus is enabled on the W530 in EFI mode. The issue around missing usplash/plymouth turned out to be connected to the random order in which the kernel initialized the graphics devices.

Part 3 captures the root cause of the symthom: Usplash is hardcoded to render to /dev/fb0 which is fine if the IGD is initialized first by the kernel, but not good if /dev/fb0 is associated with the DIS framebuffer.

One workaround already explained is blacklisting nouveau, and loading it later, after usplash is started on the IGD framebuffer. I seeked more elegant approaches and tinkered with initram to explore the alternatives:

  • Forcing proper module load order in initrd:/etc/modules - I found that this file is only processed after the kernel finished autoloading, and cannot load blacklisted modules.
  • Crafting custom initram scripts to take care of module loading early in the book process - this approach did not prove effective either.
  • Creating an initram script to disable DIS proved to be impossible as debugfs was not mounted at the time of initram script execution.
  • Packaging custom udev rules into the initial ramdisk to ensure the desired framebuffer numbering. I settled with this option.

Ensuring consistent framebuffer numbering via udev rules

Traditionally, Unix systems exposed a static set of device files under /dev but current versions of Linux ship with a device manager that can dynamically populate the /dev directory with nodes for the devices present. It allows persistent naming insensitive to the order of hardware initialization or hotplug. This is achieved by udev rules, which match the devices based on their attributes and perform actions such as creating device nodes, symlinks, changing permissions and ownership, or if fact, running arbitrary logic.

This section provides a very brief walk-through on how the rules for the current use-case can be created. First, information about the framebuffer devices had to be gathered, and the properties and attributes for the matching part had to be identified.


$ udevadm info -a -n /dev/fb0 | head -n 24

Udevadm info starts with the device specified by the devpath and then
walks up the chain of parent devices. It prints for every device
found, all possible attributes in the udev rules key format.
A rule to match, can be composed by the attributes of the device
and the attributes from one single parent device.

  looking at device '/devices/pci0000:00/0000:00:01.0/0000:01:00.0/graphics/fb0':
    KERNEL=="fb0"
    SUBSYSTEM=="graphics"
    DRIVER==""
    ATTR{pan}=="0,0"
    ATTR{name}=="nouveaufb"
    ATTR{mode}==""
    ATTR{console}==""
    ATTR{blank}==""
    ATTR{modes}=="U:1024x768p-0"
    ATTR{state}=="0"
    ATTR{bits_per_pixel}=="32"
    ATTR{cursor}==""
    ATTR{rotate}=="0"
    ATTR{stride}=="4096"
    ATTR{virtual_size}=="1024,768"

$ udevadm info -a -n /dev/fb1 | sed -n '8,24p'
  looking at device '/devices/pci0000:00/0000:00:02.0/graphics/fb1':
    KERNEL=="fb1"
    SUBSYSTEM=="graphics"
    DRIVER==""
    ATTR{pan}=="0,0"
    ATTR{name}=="inteldrmfb"
    ATTR{mode}==""
    ATTR{console}==""
    ATTR{blank}==""
    ATTR{modes}=="U:1920x1080p-0"
    ATTR{state}=="0"
    ATTR{bits_per_pixel}=="32"
    ATTR{cursor}==""
    ATTR{rotate}=="0"
    ATTR{stride}=="7680"
    ATTR{virtual_size}=="1920,1080"

As it can be seen from the listing above, is is sufficient to match devices of the graphics subsystem named by the kernel as "fd0" and "fb1". The attribute "name" can be used to discriminate between the two framebuffers. The goal is create device node /dev/fb0 for the intel framebuffer /dev/fb1 for the nouveau framebuffer respectively. This can be achieved by the following udev rule:


KERNEL=="fb?", SUBSYSTEM=="graphics", ATTR{name}=="nouveaufb", NAME="fb1"
KERNEL=="fb?", SUBSYSTEM=="graphics", ATTR{name}=="inteldrmfb", NAME="fb0"

Normally, one would save these lines to a file in the appropriate folder, say /etc/udev/rules.d/82-explicit-fb-assignment.rules. (The numbering is used to ensure the rule is not overridden, as rules are parsed in lexicographical order.)

In this very case, however, the rule will not yield the desired results. One would see usplash on shutdown, but not on boot. The rule file is located on disk, in the root partition to be precise. It gets mounted after usplash initializes during boot...

Packaging custom udev rules into initrd

As implied by the nature of the use case, it would not suffice to drop the custom udev rule into /etc/udev/rules.d/, it needs to be included in the initial ramdisk. Customization of initial ramdisk content is best done via custom hook scripts - they provide a clean, modular way of achieving the current goal while being minimally intrusive to other parts of the system. The hook scripts will run whenever the a new initrd is created and can contribute files to the ramdisk. They do not become part of the ramdisk themselves.


$ cat <<EOX | sudo tee /etc/initramfs-tools/hooks/explicit-fb-assignment
#!/bin/sh -e

PREREQ="udev"

# Output pre-requisites
prereqs()
{
   echo "$PREREQ"
}

case "$1" in
    prereqs)
   prereqs
   exit 0
   ;;
esac


. /usr/share/initramfs-tools/hook-functions

# Create udev rules to control fb1/fb0 assignment
cat > ${DESTDIR}/lib/udev/rules.d/82-explicit-fb-assignment.rules <<EOF
KERNEL=="fb?", SUBSYSTEM=="graphics", ATTR{name}=="nouveaufb", NAME="fb1"
KERNEL=="fb?", SUBSYSTEM=="graphics", ATTR{name}=="inteldrmfb", NAME="fb0"
EOF

EOX

$ sudo chmod +x /etc/initramfs-tools/hooks/fb_order
# create new initrd for the current kernel
$ sudo update-initramfs -c -k $(uname -r)
# verify on next reboot

The listing above provides the commands for creating the hook and generating a new initrd. Having applied this tweak, usplash/plymouth is fully functional and reliable, as /dev/fb0 always refers to the intel framebuffer device.

Troubleshooting

One has to remember that there are 2 udev daemons launched at different parts of the boot process. One runs off initrd, while the other is spawned after the root filesystem has been mounted. The former uses the rules included in ramdisk, while the later one reads the rules from /{lib|etc}/udev/ruled.d of the root partition, and does not use any rule from the initrd.

Friday, 1 February 2013

Tracking down TrackPoint issues

I occasionally experience strange, inconsistent behavior of the TrackPoint on my Lenovo w530. Sometimes, middle button scroll was not working at all, other times it kind of worked, however, after having scrolled and released the middle button, the clipboard's content was pasted into the current caret position. Also, scrolling pdf documents in evince with the TrackPoint simply resulting into up and down shaking and vibrating pages. I also noticed that in this cases, the cursor moved during middle button scroll which was very weird.

Locating the root cause

The symptoms above were not persistent across reboot - occasionally they were present. Further, restarting the X server typically improved the situation. After a bit of experimenting it turned out, that the strange behaviour occurs more frequently when running of a fast SSD, however, even with a rotating disk the symptoms appeared from time to time.

After a bit of googling I used the command xinput to enumerate X input devices and view their settings. I confirmed that wheel emulation was enabled, and configured for button 2. With xev I also confirmed that the middle TrackPoint button indeed fired button 2 events.


$ xinput --list
⎡ Virtual core pointer                        id=2    [master pointer  (3)]
⎜   ↳ Virtual core XTEST pointer              id=4    [slave  pointer  (2)]
⎜   ↳ SynPS/2 Synaptics TouchPad              id=13   [slave  pointer  (2)]
⎜   ↳ <default pointer>                       id=6    [slave  pointer  (2)]
⎣ Virtual core keyboard                       id=3    [master keyboard (2)]
    ↳ Virtual core XTEST keyboard             id=5    [slave  keyboard (3)]
    ↳ Power Button                            id=7    [slave  keyboard (3)]
    ↳ Video Bus                               id=8    [slave  keyboard (3)]
    ↳ Video Bus                               id=9    [slave  keyboard (3)]
    ↳ Sleep Button                            id=10   [slave  keyboard (3)]
    ↳ Integrated Camera                       id=11   [slave  keyboard (3)]
    ↳ AT Translated Set 2 keyboard            id=12   [slave  keyboard (3)]
    ↳ ThinkPad Extra Buttons                  id=14   [slave  keyboard (3)]
∼ TPPS/2 IBM TrackPoint                       id=15   [floating slave]
$ xinput --list-props "TPPS/2 IBM TrackPoint" | grep "Wheel Emulation"
 Evdev Wheel Emulation (425): 1
 Evdev Wheel Emulation Axes (426): 6, 7, 4, 5
 Evdev Wheel Emulation Inertia (427): 10
 Evdev Wheel Emulation Timeout (428): 200
 Evdev Wheel Emulation Button (429): 2
$ xev

First I thought I should just disable the paste action on middle button click, by remapping the buttons. The middle button click actions can be completely disabled by the following command:


$ # disable normal middle button action (like paste)
$ xinput set-button-map "TPPS/2 IBM TrackPoint" 1 0 3 4 5 6 7 

This command eliminated the annoying text-pasting behavior whenever the middle button was released, but it did not solve the occasional inability to use middle button close, neither the shaking/vibrating pages in evince, so I continued investigating the issue...

I drew the conclusion that the error is caused by race conditions on around X server start, resulting an improper start up sequence of input device initialization. The W530 has 8 logical CPU cores, and my thinkpad is also equipped with a fast SSD - I already encountered similar issues in other areas which I recorded in previous posts.

My theory

I revisited the symptoms and tried to find a plausible explanation for the behavior. It seemed like 2 xinput devices were concurrently handling the TrackPoint, to a degree that varied across X restarts.

  • Ability to move the cursor, but inability to use middle mouse scroll - in cases the 'other' devices actively handling the TrackPoint.
  • Button click events when finishing middle wheel scroll - in case both X input devices are actively interpreting raw button press and release events.
  • Moving cursor during middle button scroll - when both X input devices are interpreting pointer motion with the middle button pressed.
  • Shaking pdf pages - I believe evince has some software level 'page dragging' capability built in, where the drag directions are the opposite of the scroll direction, resulting the strange vibrating effect. To clarify this assumption, let us imagine what happened if both X input devices were indeed actively handling the raw TrackPoint events: the pointer is moving upwards while the middle button is in a pressed state.
    • The TrackPoint X input device, as wheel emulation is enabled, sends wheel events scrolling the page up.
    • The other X input device, without wheel emulation, simply forwards the cursor movements and the pressed state of the middle button to the application, which activated page dragging. Dragging the current page upwards is equivalent to scrolling down.
    It sound plausible that this situation could result a conflict between scrolling up and down at the same time, yielding vertically vibrating pages...

According to an old email thread the X server automatically ads the input device <default pointer> if there are no configured pointing devices, and it is well possible that in my case TrackPoint initialization is done after, or in parallel to the X server checking for configured devices.

Steps to fix it

Looking at the output of xinput --list again, as shown above, made me wonder what a floating slave could mean - according to the GDK3 reference manual this indicates that the device is not attached to any virtual device (master). In the case when middle button scroll was not working at all, I could simply enable the TrackPoint device from the command line, which restored my ability to use middle button scroll but also reproduced the phenomenon of pasting-when-scrolling and vibrating pdf pages. To me this seemed to prove the theory described above. I experimented with various xinput calls and eventually fixed the situation by disabling the default pointer pointer.

After having minimized the list of commands needed to resolve the middle button scroll issue, I decided to restore normal button mapping and re-enable middle button paste. Actually, it does not conflict with middle button scroll at all. Just tapping/clicking the middle button triggers the paste action (or normal middle button action as defined for the actual application) while keeping it pressed enters wheel emulation mode - and releasing it does not fire the normal click event. The timout for a normal middle button click is configured to 200 ms which seems to work fine for me.


$ xinput enable "TPPS/2 IBM TrackPoint"
$ xinput disable "<default pointer>"
$ # quickly click on middle button to paste, hold it down to start scrolling

These are the minimal commands I use to fix the TrackPoint behavior whenever it occurs.