Tuxicate - linux tweaks

Tuesday, 12 February 2013

Optimus and Ubuntu 12.10 (Part 5)

This post is the fifth of a series of posts on tweaking Ubuntu 12.10 to exploit Optimus technology on my Lenovo W530 to the extent I need. Make sure you are familiar with the context and objectives.

As described in Part 1, plymouth has issues when Optimus is enabled on the W530 in EFI mode. The issue around missing usplash/plymouth turned out to be connected to the random order in which the kernel initialized the graphics devices.

Part 3 captures the root cause of the symptom: Usplash is hardcoded to render to /dev/fb0 which is fine if the IGD is initialized first by the kernel, but not good if /dev/fb0 is associated with the DIS framebuffer.

One workaround already explained is blacklisting nouveau and loading it later, after usplash has started on the IGD framebuffer. I sought more elegant approaches and tinkered with the initramfs to explore the alternatives:

Forcing proper module load order in initrd:/etc/modules - I found that this file is only processed after the kernel finished autoloading, and cannot load blacklisted modules.
Crafting custom initramfs scripts to take care of module loading early in the boot process - this approach did not prove effective either.
Creating an initramfs script to disable DIS proved to be impossible as debugfs was not mounted at the time of initramfs script execution.
Packaging custom udev rules into the initial ramdisk to ensure the desired framebuffer numbering. I settled with this option.

Ensuring consistent framebuffer numbering via udev rules

Traditionally, Unix systems exposed a static set of device files under /dev, but current versions of Linux ship with a device manager that can dynamically populate the /dev directory with nodes for the devices present. It allows persistent naming that is insensitive to the order of hardware initialization or hotplug. This is achieved by udev rules, which match devices based on their attributes and perform actions such as creating device nodes and symlinks, changing permissions and ownership, or, in fact, running arbitrary logic.

This section provides a very brief walk-through on how the rules for the current use-case can be created. First, information about the framebuffer devices had to be gathered, and the properties and attributes for the matching part had to be identified.


$ udevadm info -a -n /dev/fb0 | head -n 24

Udevadm info starts with the device specified by the devpath and then
walks up the chain of parent devices. It prints for every device
found, all possible attributes in the udev rules key format.
A rule to match, can be composed by the attributes of the device
and the attributes from one single parent device.

  looking at device '/devices/pci0000:00/0000:00:01.0/0000:01:00.0/graphics/fb0':
    KERNEL=="fb0"
    SUBSYSTEM=="graphics"
    DRIVER==""
    ATTR{pan}=="0,0"
    ATTR{name}=="nouveaufb"
    ATTR{mode}==""
    ATTR{console}==""
    ATTR{blank}==""
    ATTR{modes}=="U:1024x768p-0"
    ATTR{state}=="0"
    ATTR{bits_per_pixel}=="32"
    ATTR{cursor}==""
    ATTR{rotate}=="0"
    ATTR{stride}=="4096"
    ATTR{virtual_size}=="1024,768"

$ udevadm info -a -n /dev/fb1 | sed -n '8,24p'
  looking at device '/devices/pci0000:00/0000:00:02.0/graphics/fb1':
    KERNEL=="fb1"
    SUBSYSTEM=="graphics"
    DRIVER==""
    ATTR{pan}=="0,0"
    ATTR{name}=="inteldrmfb"
    ATTR{mode}==""
    ATTR{console}==""
    ATTR{blank}==""
    ATTR{modes}=="U:1920x1080p-0"
    ATTR{state}=="0"
    ATTR{bits_per_pixel}=="32"
    ATTR{cursor}==""
    ATTR{rotate}=="0"
    ATTR{stride}=="7680"
    ATTR{virtual_size}=="1920,1080"

As can be seen from the listings above, it is sufficient to match devices of the graphics subsystem named by the kernel as "fb0" and "fb1". The attribute "name" can be used to discriminate between the two framebuffers. The goal is to create the device node /dev/fb0 for the intel framebuffer and /dev/fb1 for the nouveau framebuffer. This can be achieved by the following udev rules:


KERNEL=="fb?", SUBSYSTEM=="graphics", ATTR{name}=="nouveaufb", NAME="fb1"
KERNEL=="fb?", SUBSYSTEM=="graphics", ATTR{name}=="inteldrmfb", NAME="fb0"

Normally, one would save these lines to a file in the appropriate folder, say /etc/udev/rules.d/82-explicit-fb-assignment.rules. (The numbering is used to ensure the rule is not overridden, as rules are parsed in lexicographical order.)

In this very case, however, the rule will not yield the desired results. One would see usplash on shutdown, but not on boot. The rule file is located on disk, in the root partition to be precise. It gets mounted after usplash initializes during boot...
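As a side note, the "name" attribute that the rules match on can also be read directly from sysfs, which is handy for checking which kernel name currently belongs to which driver. The snippet below is only a sketch: the helper name list_framebuffers is made up for illustration, and it assumes the attributes printed by udevadm above are exposed as plain files under /sys/class/graphics.

```shell
#!/bin/sh
# Print "<kernel name> <framebuffer name>" for every framebuffer found
# under a sysfs class directory (default: /sys/class/graphics).
# The optional argument exists only to make the helper testable.
list_framebuffers() {
    class_dir="${1:-/sys/class/graphics}"
    for dev in "$class_dir"/fb[0-9]*; do
        # Skip entries without a readable name attribute
        [ -r "$dev/name" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/name")"
    done
}

# Usage: list_framebuffers
```

Note that this reports the kernel's own numbering, which is what the KERNEL=="fb?" part of the rule sees.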

Packaging custom udev rules into initrd

As implied by the nature of the use case, it would not suffice to drop the custom udev rule into /etc/udev/rules.d/; it needs to be included in the initial ramdisk. Customization of initial ramdisk content is best done via custom hook scripts - they provide a clean, modular way of achieving the current goal while being minimally intrusive to other parts of the system. The hook scripts run whenever a new initrd is created and can contribute files to the ramdisk. They do not become part of the ramdisk themselves.


$ cat <<'EOX' | sudo tee /etc/initramfs-tools/hooks/explicit-fb-assignment
#!/bin/sh -e

PREREQ="udev"

# Output pre-requisites
prereqs()
{
   echo "$PREREQ"
}

case "$1" in
    prereqs)
   prereqs
   exit 0
   ;;
esac


. /usr/share/initramfs-tools/hook-functions

# Create udev rules to control fb1/fb0 assignment
cat > ${DESTDIR}/lib/udev/rules.d/82-explicit-fb-assignment.rules <<EOF
KERNEL=="fb?", SUBSYSTEM=="graphics", ATTR{name}=="nouveaufb", NAME="fb1"
KERNEL=="fb?", SUBSYSTEM=="graphics", ATTR{name}=="inteldrmfb", NAME="fb0"
EOF

EOX

$ sudo chmod +x /etc/initramfs-tools/hooks/explicit-fb-assignment
# create new initrd for the current kernel
$ sudo update-initramfs -c -k $(uname -r)
# verify on next reboot

The listing above provides the commands for creating the hook and generating a new initrd. Having applied this tweak, usplash/plymouth is fully functional and reliable, as /dev/fb0 always refers to the intel framebuffer device.

Troubleshooting

One has to remember that there are two udev daemons launched at different stages of the boot process. One runs off the initrd, while the other is spawned after the root filesystem has been mounted. The former uses the rules included in the ramdisk, while the latter reads the rules from /{lib|etc}/udev/rules.d of the root partition, and does not use any rule from the initrd.
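Since the first daemon only sees what is inside the ramdisk, it is worth checking that the rule file really ended up there. A minimal sketch, assuming the lsinitramfs tool from initramfs-tools is available; the filter function name list_udev_rules is made up for illustration.

```shell
#!/bin/sh
# Filter a file listing (one path per line on stdin) down to udev rule
# files located in lib/udev/rules.d or etc/udev/rules.d.
list_udev_rules() {
    grep -E '(^|/)(lib|etc)/udev/rules\.d/[^/]+\.rules$'
}

# Usage:
#   lsinitramfs "/boot/initrd.img-$(uname -r)" | list_udev_rules
```

If 82-explicit-fb-assignment.rules is missing from the output, the hook most likely did not run or is not executable.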

Friday, 1 February 2013

Tracking down TrackPoint issues

I occasionally experience strange, inconsistent behavior of the TrackPoint on my Lenovo W530. Sometimes middle button scroll did not work at all; other times it kind of worked, but after scrolling and releasing the middle button, the clipboard's content was pasted at the current caret position. Also, scrolling PDF documents in evince with the TrackPoint simply resulted in pages shaking up and down. I also noticed that in these cases the cursor moved during middle button scroll, which was very weird.

Locating the root cause

The symptoms above were not persistent across reboots; they only appeared occasionally. Further, restarting the X server typically improved the situation. After a bit of experimenting it turned out that the strange behaviour occurs more frequently when running off a fast SSD; however, even with a rotating disk the symptoms appeared from time to time.

After a bit of googling I used the command xinput to enumerate X input devices and view their settings. I confirmed that wheel emulation was enabled, and configured for button 2. With xev I also confirmed that the middle TrackPoint button indeed fired button 2 events.


$ xinput --list
⎡ Virtual core pointer                        id=2    [master pointer  (3)]
⎜   ↳ Virtual core XTEST pointer              id=4    [slave  pointer  (2)]
⎜   ↳ SynPS/2 Synaptics TouchPad              id=13   [slave  pointer  (2)]
⎜   ↳ <default pointer>                       id=6    [slave  pointer  (2)]
⎣ Virtual core keyboard                       id=3    [master keyboard (2)]
    ↳ Virtual core XTEST keyboard             id=5    [slave  keyboard (3)]
    ↳ Power Button                            id=7    [slave  keyboard (3)]
    ↳ Video Bus                               id=8    [slave  keyboard (3)]
    ↳ Video Bus                               id=9    [slave  keyboard (3)]
    ↳ Sleep Button                            id=10   [slave  keyboard (3)]
    ↳ Integrated Camera                       id=11   [slave  keyboard (3)]
    ↳ AT Translated Set 2 keyboard            id=12   [slave  keyboard (3)]
    ↳ ThinkPad Extra Buttons                  id=14   [slave  keyboard (3)]
∼ TPPS/2 IBM TrackPoint                       id=15   [floating slave]
$ xinput --list-props "TPPS/2 IBM TrackPoint" | grep "Wheel Emulation"
 Evdev Wheel Emulation (425): 1
 Evdev Wheel Emulation Axes (426): 6, 7, 4, 5
 Evdev Wheel Emulation Inertia (427): 10
 Evdev Wheel Emulation Timeout (428): 200
 Evdev Wheel Emulation Button (429): 2
$ xev

First I thought I should just disable the paste action on middle button click, by remapping the buttons. The middle button click actions can be completely disabled by the following command:


$ # disable normal middle button action (like paste)
$ xinput set-button-map "TPPS/2 IBM TrackPoint" 1 0 3 4 5 6 7

This command eliminated the annoying text-pasting behavior whenever the middle button was released, but it solved neither the occasional inability to use middle button scroll nor the shaking/vibrating pages in evince, so I continued investigating the issue...

I drew the conclusion that the error is caused by race conditions around X server start, resulting in an improper sequence of input device initialization. The W530 has 8 logical CPU cores, and my ThinkPad is also equipped with a fast SSD - I already encountered similar issues in other areas, which I recorded in previous posts.

My theory

I revisited the symptoms and tried to find a plausible explanation for the behavior. It seemed like 2 xinput devices were concurrently handling the TrackPoint, to a degree that varied across X restarts.

Ability to move the cursor, but inability to use middle mouse scroll - in cases where the 'other' device is actively handling the TrackPoint.
Button click events when finishing middle wheel scroll - in cases where both X input devices are actively interpreting raw button press and release events.
Moving cursor during middle button scroll - when both X input devices are interpreting pointer motion with the middle button pressed.
Shaking PDF pages - I believe evince has some software level 'page dragging' capability built in, where the drag direction is the opposite of the scroll direction, resulting in the strange vibrating effect. To clarify this assumption, let us imagine what would happen if both X input devices were indeed actively handling the raw TrackPoint events while the pointer is moving upwards with the middle button pressed.
- The TrackPoint X input device, as wheel emulation is enabled, sends wheel events scrolling the page up.
- The other X input device, without wheel emulation, simply forwards the cursor movements and the pressed state of the middle button to the application, which activated page dragging. Dragging the current page upwards is equivalent to scrolling down.
It sounds plausible that this situation could result in a conflict between scrolling up and down at the same time, yielding vertically vibrating pages...

According to an old email thread, the X server automatically adds the input device <default pointer> if there are no configured pointing devices, and it is quite possible that in my case TrackPoint initialization is done after, or in parallel to, the X server checking for configured devices.

Steps to fix it

Looking at the output of xinput --list again, as shown above, made me wonder what a floating slave could mean - according to the GDK3 reference manual this indicates that the device is not attached to any virtual device (master). In the case when middle button scroll was not working at all, I could simply enable the TrackPoint device from the command line, which restored my ability to use middle button scroll but also reproduced the phenomenon of pasting-when-scrolling and vibrating PDF pages. To me this seemed to prove the theory described above. I experimented with various xinput calls and eventually fixed the situation by disabling the <default pointer> device.

After having minimized the list of commands needed to resolve the middle button scroll issue, I decided to restore normal button mapping and re-enable middle button paste. Actually, it does not conflict with middle button scroll at all. Just tapping/clicking the middle button triggers the paste action (or the normal middle button action as defined for the actual application), while keeping it pressed enters wheel emulation mode - and releasing it does not fire the normal click event. The timeout for a normal middle button click is configured to 200 ms, which seems to work fine for me.


$ xinput enable "TPPS/2 IBM TrackPoint"
$ xinput disable "<default pointer>"
$ # quickly click on middle button to paste, hold it down to start scrolling

These are the minimal commands I use to fix the TrackPoint behavior whenever it occurs.
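Because the problem only shows up occasionally, the two commands can be wrapped in a check so they only run when needed. This is a rough sketch, not something from my original setup: the helper names are made up, and it assumes the short xinput --list format shown earlier, where the problematic state is visible as a "floating slave" TrackPoint and a "<default pointer>" entry.

```shell
#!/bin/sh
# Reads `xinput --list` output on stdin; succeeds if the TrackPoint is
# listed as a floating slave.
trackpoint_is_floating() {
    grep -F 'TPPS/2 IBM TrackPoint' | grep -q 'floating slave'
}

# Reads `xinput --list` output on stdin; succeeds if the automatically
# added "<default pointer>" device is listed.
has_default_pointer() {
    grep -qF '<default pointer>'
}

# Usage (needs a running X session):
#   listing=$(xinput --list)
#   if printf '%s\n' "$listing" | trackpoint_is_floating; then
#       xinput enable "TPPS/2 IBM TrackPoint"
#   fi
#   if printf '%s\n' "$listing" | has_default_pointer; then
#       xinput disable "<default pointer>"
#   fi
```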

Friday, 18 January 2013

Optimus and Ubuntu 12.10 (Part 4)

This post is the fourth of a series of posts on tweaking Ubuntu 12.10 to exploit Optimus technology on my Lenovo W530 to the extent I need. Make sure you are familiar with the context, especially the objectives and constraints as described in Part 1, Part 2 and Part 3.

Moving windows from the primary screen to the external VGA screen

As has been explained before, windows cannot cross screen boundaries, and the GNOME desktop cannot span multiple X screens even when these screens belong to the same display (X instance). It should be obvious that this is also the case with screens that belong to different X servers.

The objectives defined in the previous parts require the ability to extend the GNOME desktop to the monitor attached to the external VGA port, to run presentations using two monitors and to clone the primary monitor's content to the external monitor.

In order to meet one of the objectives, one could mirror, or better said clone, the content of one screen to another screen. To be even more precise, only a given portion of the screen has to be cloned to another viewport: the area of the other screen that is displayed by the external monitor. There is a userspace tool called hybrid screenclone to perform exactly this task - find more on this tool below.

Extending the desktop to a screen of the other X server

Thinking further, in order to be able to extend the desktop and thereby meet other objectives, one could set up a mock monitor in the first X server, and then, clone the content to the screen of the other X server so it would show up on the external monitor.

My first approach was to examine the video outputs of the integrated graphics device, where I found VGA[12], which is wired to /dev/null (see Part 2). With various xrandr commands I could force this unused, thus always disconnected, output to be configured with a fixed resolution, right of the primary monitor. A screenshot confirmed I was half way through: it contained a black area next to the primary desktop. I could even drag windows onto this black area; however, the desktop would not extend to this portion of the X screen. Also, LibreOffice refused to start the slideshow in multi-monitor mode, as it only detected one connected monitor. While the idea was not completely useless, this approach turned out not to be usable in production.

A bit of googling revealed that the author of hybrid screenclone also maintains a patch against the intel video driver which adds a dynamically configurable virtual output - enabling exactly the scenario I was targeting.

Intel driver hack

The listing below takes the reader through the process of creating and installing a package containing the patched version of the intel video driver.


$ mkdir /tmp/foo && cd /tmp/foo # we are going to compile in tmpfs/RAM
$ sudo aptitude build-dep --schedule-only  xserver-xorg-video-intel
$ sudo aptitude # review interactively what is going to be installed
$ apt-get source xserver-xorg-video-intel
$ cd xserver-xorg-video-intel-2.20.9/
$ wget https://raw.github.com/liskin/patches/master/hacks/xserver-xorg-video-intel-2.20.2_virtual_crtc.patch
$ # the newer patch did not match.
$ patch -p1 < xserver-xorg-video-intel-2.20.2_virtual_crtc.patch
$ # now update the version to show this is a patched package
$ # NEVER alter packages without making it clear in the package version!
$ # I prepend this to debian/changelog:
$ mv debian/changelog debian/changelog.old && cat <<EOF > debian/changelog
xserver-xorg-video-intel (2:2.20.9-0ubuntu2+virtual-crtc) quantal; urgency=low

  [ Tibor Bősze ]
  * Add xserver-xorg-video-intel-2.20.2_virtual_crtc.patch

 -- Tibor Bősze <tibor.boesze@gmail.com>  Sun, 13 Jan 2013 03:15:00 +0200

EOF
$ cat debian/changelog.old >> debian/changelog && rm debian/changelog.old
$ # now build and install the package
$ dpkg-buildpackage -b
$ sudo dpkg -i ../xserver-xorg-video-intel_2.20.9-0ubuntu2+virtual-crtc_amd64.deb
$ # as you see, the package version will clearly show that this is a patched package
$ # prevent the package to be automatically updated
$ sudo aptitude hold xserver-xorg-video-intel
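To double-check that the patched build is the one actually installed, the version string reported by dpkg can be compared against the suffix chosen in the changelog. A small sketch; the helper name has_virtual_crtc_suffix is made up, and the "+virtual-crtc" suffix is simply the one I used above.

```shell
#!/bin/sh
# Succeeds if the given Debian version string ends with the custom
# "+virtual-crtc" suffix used for the patched package.
has_virtual_crtc_suffix() {
    case "$1" in
        *+virtual-crtc) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   version=$(dpkg-query -W -f='${Version}' xserver-xorg-video-intel)
#   has_virtual_crtc_suffix "$version" && echo "patched driver installed"
```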

After a reboot, the command below will activate the virtual monitor. The graphical display manager will not show any second display; however, taking a screenshot quickly confirms that the second virtual monitor is active and the desktop correctly extends to it. Also, LibreOffice Impress works fine with it for running the slideshow in dual monitor mode.


$ xrandr --output LVDS2 --auto --output VIRTUAL --mode 800x600 --right-of LVDS2
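Besides taking a screenshot, xrandr's own report can be used to confirm where the virtual output was placed. The helper below is an illustrative sketch (the name output_geometry is made up); it assumes the usual xrandr --query layout, where each output line starts with the output name and an active output carries a WIDTHxHEIGHT+X+Y field.

```shell
#!/bin/sh
# Reads `xrandr --query` output on stdin and prints the geometry
# (WIDTHxHEIGHT+X+Y) of the output named in $1, if it is active.
output_geometry() {
    awk -v out="$1" '$1 == out {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-9]+x[0-9]+\+[0-9]+\+[0-9]+$/) { print $i; exit }
    }'
}

# Usage:
#   xrandr --query | output_geometry VIRTUAL
```

An empty result means the output is not active, in which case the xrandr command above has not taken effect.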

Screenclone

For completeness, the commands to download and compile the tool are listed below.


$ # we are still doing stuff in tmpfs/RAM
$ sudo aptitude install git-core
$ git clone git://github.com/liskin/hybrid-screenclone.git
$ cd hybrid-screenclone && make
g++ -std=c++0x -g -Wall    screenclone.cc  -lpthread -lX11 -lXdamage -lXtst -lXinerama -lXcursor -o screenclone
screenclone.cc:18:33: fatal error: X11/Xcursor/Xcursor.h: No such file or directory
compilation terminated.
make: *** [screenclone] Error 1
$ apt-file search Xcursor.h
libxcursor-dev: /usr/include/X11/Xcursor/Xcursor.h
$ sudo aptitude install libxcursor-dev
$ make
g++ -std=c++0x -g -Wall    screenclone.cc  -lpthread -lX11 -lXdamage -lXtst -lXinerama -lXcursor -o screenclone
screenclone.cc:24:37: fatal error: X11/extensions/Xinerama.h: No such file or directory
compilation terminated.
make: *** [screenclone] Error 1
$ sudo aptitude install libxinerama-dev libxdamage-dev libxtst-dev
$ make
$ mv screenclone ~/optimus/

By default the tool will clone the first screen of the first display to the first screen of the second one, that is, :0.0 to :1.0. This almost completely fits my use case; I only added the parameter -x 1, which limits the content to be copied to the area of the screen that is displayed by the second monitor, the VIRTUAL output in my case.


$ # clone the viewport of the VIRTUAL output from :0.0 to the top left corner of :1.0
$ ~/optimus/screenclone -x 1

Part 5 will revisit the issue of changing framebuffer number assignment and provide a more elegant solution to fixing usplash.

Wednesday, 16 January 2013

Optimus and Ubuntu 12.10 (Part 3)

This post is the third of a series of posts on tweaking Ubuntu 12.10 to exploit Optimus technology on my Lenovo W530 to the extent I need. Make sure you are familiar with the context, especially the objectives and constraints as described in Part 1 and Part 2.

My previous post on the topic describes how to control the power state of the DIS with vgaswitcheroo, which is part of the stock Ubuntu 12.10 kernel. It explains key terms related to X and also shows alternative ways to use both the DIS and IGD within the same X server.

As the "single X server with 2 screens" approach is not an option until the related bugfix is available in the official repositories, I investigated how a second X server could be used to reach my goals. As a first step I analysed existing solutions in the area. Starting a second X server is the core concept of Bumblebee, which enables rendering on the DIS and then uses VirtualGL to copy the content of individual windows back to the screen handled by the primary X server that uses the IGD only. Unfortunately, this project does not support external monitors in the case when ports are wired to the DIS only. Also, it currently does not seem to be mature and stable enough for my production ThinkPad. Nevertheless, it gave me a starting point...

Two X servers with one screen each

Starting a separate X instance to handle the DIS enables better isolation/sandboxing but also introduces additional issues.

First of all, the primary X instance has to be configured in a way that it will not grab any resources of the DIS, else the secondary instance fails to start up with the message "No screens found".
Similarly, the second instance has to be configured explicitly to only use the DIS and related monitors.
I used configuration to override the actual connection status of the external VGA port and always enable VGA internally. Without an enabled monitor X would not start up.
The desktop cannot be extended to a monitor of the second X instance.
One could use a separate window manager on the second X instance - twm is a very lightweight one. I stayed with a naked X for reasons described below.
With two X instances, I get a cursor on both displays. Without a pointing device, X would not start either. There are solutions that use a mock input device, but anyway, having two cursors that move in tandem on the two monitors is not critical. With my current configuration, the touchpad only controls the cursor on my primary X, while the trackpoint controls both.
Finally, the second X server will run as root, essentially with access control disabled so mortals can open windows on the second X as well.


$ cd ~
$ mkdir optimus && cd optimus
$ cat >xorg.conf.nouveau<<EOF
Section "Modes"
 Identifier "FallbackModes" # Mode to use if External-VGA is disconnected
 Modeline "1024x768"   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync
EndSection

Section "ServerLayout"
 Identifier "Layout0"
 Screen "Screen0"
 Option "AutoAddDevices" "false"
 Option "AutoEnableDevices" "false"
 Option "AutoAddGPU" "false"
EndSection

Section "Monitor"
 Identifier "External-VGA"
 UseModes "FallbackModes"
 Option "Enable" "true" # always enabled
 Option "PreferredMode" "1024x768"
EndSection

Section "Monitor"
 Identifier "LCD"
 Option "Enable" "false" # always disabled
EndSection

Section "Device"
 Identifier "DIS"
 Driver "nouveau"
 BusID "PCI:1:0:0"
 Option "HWCursor" "true"
 # The numbers in output names change based on whether IGD or DIS is           
 # initialized first by the kernel. This tweak takes care of both cases.
 Option "Monitor-VGA-1" "External-VGA"
 Option "Monitor-VGA-2" "External-VGA"
 Option "Monitor-LVDS-1" "LCD"
 Option "Monitor-LVDS-2" "LCD"
EndSection

Section "Screen"
 Identifier "Screen0"
 Device "DIS"
 Monitor "External-VGA"
 DefaultDepth 24
 SubSection "Display"
  Depth 24
 EndSubSection
EndSection
EOF

$ cat >xorg.conf.intel<<EOF
Section "ServerLayout"
   Identifier "Layout0"
   Screen "Screen0"
   Option "AutoAddDevices" "true"
   Option "AutoEnableDevices" "true"
   Option "AutoAddGPU" "false"
EndSection

Section "Device"
   Identifier "IGD"
   Driver "intel"
   BusID "PCI:0:2:0"
EndSection

Section "Screen"
   Identifier "Screen0"
   Device "IGD"
   DefaultDepth 24
   SubSection "Display"
      Depth 24
   EndSubSection
EndSection
EOF
$ sudo cp xorg.conf.intel /etc/X11/
$ sudo rm /etc/X11/xorg.conf # this is the link to the 2 screen config
$ sudo ln -s /etc/X11/xorg.conf.intel /etc/X11/xorg.conf
$ sudo service lightdm restart

If there is no explicit default xorg configuration, then the X server will hold both /dev/dri/card[01] in which case the second X instance could not start up.

Starting, disabling and restoring external VGA output

I used the following small script to ensure the DIS is powered on, spawn the X server, and decorate it with a random background. After pressing Enter, X is terminated and the DIS is powered off.


#!/bin/bash

msg() {
 echo "******* $1"
}

msg "Ensuring DIS is powered on."
echo ON | sudo tee /sys/kernel/debug/vgaswitcheroo/switch
msg "Launching X server on display :1."
sudo /usr/bin/X -ac -audit 0 -config /home/tibi/optimus/xorg.conf.nouveau -sharevts -verbose 1 -logverbose 9 -logfile /tmp/Xorg.1.log -nolisten tcp -noreset :1 &

PID=$!
sleep 2
msg "PID is $PID, log goes to /tmp/Xorg.1.log."
# bonus: get a random background
BACKGROUND=$(find /home/tibi/Pictures -maxdepth 1 -name '*.jpg' | sort --random-sort | head -1)
msg "Setting background: $BACKGROUND"
gm display -window root -display :1.0 "$BACKGROUND"
msg "DONE."

msg "Press Enter to terminate and clean up."
read
msg "Terminating X server..."
sudo kill $PID
sleep 2
echo OFF | sudo tee /sys/kernel/debug/vgaswitcheroo/switch
msg "Discrete graphics device powered off."
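The fixed "sleep 2" after launching X is a guess at how long the server needs. One possible refinement, sketched below, is to poll for the server's Unix socket instead. The helper name wait_for_path is made up, and the socket path /tmp/.X11-unix/X1 for display :1 is an assumption about the usual X socket location; the socket appearing is only a rough indicator that the server is ready.

```shell
#!/bin/sh
# Wait until a path exists, polling once per second.
# $1: path to wait for, $2: timeout in seconds (default 10).
# Returns 0 once the path exists, 1 if the timeout expires first.
wait_for_path() {
    path="$1"
    remaining="${2:-10}"
    while [ ! -e "$path" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage in the script above, in place of the first "sleep 2":
#   wait_for_path /tmp/.X11-unix/X1 10 || msg "X server did not come up."
```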

I can safely disable and restore the external display without stopping the X server with the following commands:


### DISABLE
# numbers change across reboot, one of the two will work, other will print an error.
xrandr -d :1 --output VGA-1 --off
xrandr -d :1 --output VGA-2 --off
echo OFF | sudo tee /sys/kernel/debug/vgaswitcheroo/switch

### RESTORE
echo ON | sudo tee /sys/kernel/debug/vgaswitcheroo/switch
# numbers change across reboot, one of the two will work, other will print an error.
xrandr -d :1 --output VGA-1 --auto
xrandr -d :1 --output VGA-2 --auto
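Instead of trying both names and ignoring the error, the current output name can be looked up first. A minimal sketch, assuming the VGA output on display :1 is always called VGA-1 or VGA-2 as in the commands above; the helper name find_vga_output is made up.

```shell
#!/bin/sh
# Reads `xrandr` output on stdin and prints the first output whose name
# has the form VGA-<number>.
find_vga_output() {
    awk '$1 ~ /^VGA-[0-9]+$/ { print $1; exit }'
}

# Usage:
#   vga=$(xrandr -d :1 | find_vga_output)
#   [ -n "$vga" ] && xrandr -d :1 --output "$vga" --off
```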

Usplash and plymouth issues

As it has been stated earlier, the issue around missing usplash/plymouth turned out to be connected to the random order in which the kernel initialized the graphics devices.


$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [Quadro K1000M] (rev a1)
$ # framebuffers - the order (0 and 1) changes randomly across reboots
$ cat /proc/fb
0 inteldrmfb
1 nouveaufb

Usplash is hardcoded to render to /dev/fb0 which is fine if the IGD is initialized first by the kernel, but not good if /dev/fb0 is associated with the DIS framebuffer. The kernel boot parameter fbcon=map:1 is of no help in solving this issue. One workaround to mitigate this problem is blacklisting nouveau and loading it later, after usplash has started on the IGD framebuffer.


$ cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
# blacklist nouveau to force IGD framebuffer to take precedence
blacklist nouveau
EOF
$ sudo update-initramfs -c -k all

### and later during the boot process...

modprobe nouveau # we need modprobe as it was blacklisted during boot
udevadm settle # wait until all udev events are handled
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

This approach is rather a quick and dirty workaround. Technically, it would be enough - and much more elegant - to merely ensure the proper order of module loading by tinkering with initramfs scripts; this is subject to further investigation.

About Part 4

Part 4 shows how this second naked X server can be used to achieve the objectives outlined in the previous post.

Monday, 14 January 2013

Optimus and Ubuntu 12.10 (Part 2)

This post is a follow up on Optimus and Ubuntu 12.10. Please make sure you are familiar with the context.

Setting the scope and objectives

This series of posts takes the reader through Nvidia Optimus related tweaks I applied on my Lenovo W530. My goal is to get a very stable system with extended battery life, and the ability to connect an external projector to the VGA port and cover the following use cases:

Extend the desktop to the external monitor.
Get a cloned output of the primary monitor on the external monitor with panning support - this means that if the external monitor's resolution is smaller, a smaller viewport will show a cropped clone of the desktop's content and follow the mouse to show the area of interest.
Run LibreOffice presentations with the external monitor showing the current slide and the primary monitor showing the presentation overview, notes and time.
Never ever get X freezes or kernel lockups on suspend/resume with or without an external monitor connected.
Switching to virtual terminals should always work in a bulletproof manner. The box is a workhorse, cannot allow hiccups.
Might sound like a small detail, but a properly displayed usplash/plymouth is also important, not only for cosmetic purposes.

All these features I am able to use today on my T400 (which includes a single Intel graphics device), with the help of either the GNOME GUI or, in special cases, xrandr scripts. However, on the W530 with its two graphics devices, both the external VGA and the mini DisplayPort are wired to the nvidia chip, so I will not be able to hook up an external monitor with only the integrated graphics device enabled. On the one hand, the discrete device is a resource hog; on the other, the W530 is not able to boot properly with only discrete graphics enabled when hardware virtualization support is also enabled.

I took the decision to push myself and tweak Ubuntu 12.10 until my goals and criteria are met. I started looking for existing solutions to get Optimus working, and educating myself on the topic. On the road I took the decision not to use Bumblebee or other unstable/immature components that could cause stability issues or additional risks of X or kernel lockups.

Further, I also set a secondary objective: trying to reach my goals with open-source components only. So, I tried to avoid the proprietary Nvidia driver as much as possible.

All these goals are met by now, with the open source nouveau driver and a set of tricks and tweaks. This writing focuses more on the way of investigating alternatives and achieving the goals as opposed to merely providing the final solution.

Stock Ubuntu with Optimus enabled in firmware

Having applied the lightdm tweak from the earlier post, the blank screen issues disappeared completely even with Optimus enabled in firmware setup. The issue around missing usplash/plymouth turned out to be connected to the random order in which the kernel initialized the graphics devices.


$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [Quadro K1000M] (rev a1)
$ # framebuffers - the order (0 and 1) changes randomly across reboots
$ cat /proc/fb
0 inteldrmfb
1 nouveaufb
$ # vgaswitcheroo state - the order (0 and 1) changes randomly across reboots
$ sudo cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :Pwr:0000:01:00.0

With Optimus enabled, the X server starts up properly, but external monitors are not recognized out of the box. The power state of the discrete graphics device can be controlled through vgaswitcheroo, but switching between the discrete (DIS) and integrated graphics device (IGD) cannot be performed. Why is that, and what is vgaswitcheroo at all?

vgaswitcheroo

It is a mechanism built into the kernel that allows switching between multiple graphics devices if (and only if) the configuration is equipped with a hardware multiplexer. It is documented not to work with Nvidia Optimus, as Optimus does not incorporate a hardware multiplexer.

Nevertheless, it was loaded automatically (it is part of the stock Ubuntu kernel) and allowed me to change the power state of the discrete graphics device. There are other solutions for changing the power state either automatically or manually, such as acpi_call or bbswitch. Both of these other solutions have one thing in common - they are kernel modules not considered stable or mature. So my choice is to go with vgaswitcheroo for power state management, using the commands as shown below:


$ sudo cat /sys/kernel/debug/vgaswitcheroo/switch # display status
$ echo OFF | sudo tee /sys/kernel/debug/vgaswitcheroo/switch # power off DIS
$ echo ON | sudo tee /sys/kernel/debug/vgaswitcheroo/switch # power on DIS

I would like to stress again that it is used for power state management only, as actual switching cannot be done with Optimus. One can verify the power state of the discrete device with lspci, but also with the power LED of a monitor connected to the VGA port.
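The power state can also be read back from the switch file itself. Below is a minimal sketch that assumes the colon-separated format shown earlier, where the second field is the client name and the fourth field is the power state ("Pwr", or "Off" on the kernel I used; other kernel versions may report different values). The function name vga_power_state is made up for this illustration.

```shell
# Print the power state field for the client named $1 ("IGD" or "DIS").
# Reads vgaswitcheroo switch lines such as "1:DIS: :Pwr:0000:01:00.0"
# from standard input. vga_power_state is a hypothetical helper name.
vga_power_state() {
  awk -F: -v client="$1" '$2 == client { print $4; exit }'
}

# Usage on a live system:
#   sudo cat /sys/kernel/debug/vgaswitcheroo/switch | vga_power_state DIS
```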

About X servers, displays, screens and monitors

The core concept of Bumblebee is to start up a second X server to enable rendering on the DIS, and then use VirtualGL to copy the content of individual windows back to the screens handled by the primary X server, which uses the IGD only. This approach is not necessarily the best one.

Contrary to what many might believe, one single X server can handle many screens and many video cards. Let me take a bottom up approach to define the terms and structure used by X - or at least my understanding thereof.

A Monitor is a graphical output device, such as a projector, an LCD panel, CRT or other kind of monitor. They are physically connected (in a static/persistent or dynamic/transient manner) to one of the video cards.
A Screen is a virtual area where applications can render their windows. A screen can be composed of multiple monitors running on the same video card, and each monitor can show a portion/viewport of the screen - they can even overlap or show cloned content. (It should be noted that they can also be fully virtual as in the case of VNC.) Consequently, windows within the same screen can overlap or span multiple monitors. Windows cannot overlap or span multiple screens. Window managers can run independently on each screen.
One instance of X server is associated with a display. It can handle multiple video cards and multiple screens, these screens share input devices such as keyboard and mouse. One screen can be active at a time, and the active screen can be changed with the mouse (although there also existed a small command line utility for this, called switchscreen).
Xinerama is an extension that allows Monitors, even from multiple video cards, to be combined into one screen. It supports neither hardware acceleration nor dynamic reconfiguration.

Revisiting the core concept of Bumblebee in light of the information above, one intuitive alternative would be the use of a single X server, with one screen for the IGD, and another screen for the DIS. Monitors can be turned on and off dynamically, and based on that, the screen size can also be adjusted dynamically. One thing that needs to be investigated is the behaviour of the X server when one of the video devices is powered off.

How wiring affects what X identifies

DIS with nouveau driver:

LVDS-1 or LVDS-2 which is a dead end and always shows as disconnected
VGA-1 or VGA-2 which is the external VGA output wired to the nvidia card
DP-1
DP-2
DP-3

IGD with the intel driver - note the difference in naming, no hyphen is used:

LVDS2 or LVDS1 which is wired to the LCD panel
VGA2 or VGA1 which would be supported by the card but is a dead end, not wired to anything
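Because the numeric suffix of an output name depends on the initialization order, a script can look the name up by connector type rather than hardcode it. This is only a sketch under the assumption that `xrandr -q` prints one line per output with the name in the first column and "connected" or "disconnected" in the second. The helper name connected_output is hypothetical.

```shell
# Print the first connected output whose name starts with the prefix $1
# (for example "VGA" matches both VGA-1 and VGA-2).
# Reads `xrandr -q` output from standard input.
# connected_output is a hypothetical helper name.
connected_output() {
  awk -v prefix="$1" \
    '$2 == "connected" && index($1, prefix) == 1 { print $1; exit }'
}

# Usage on a live system, for the screen driven by nouveau:
#   xrandr --screen 1 -q | connected_output VGA
```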

One X server with two screens

The configuration below does not address the change of output names caused by the non deterministic order in which the kernel initialized the graphics devices.


cat <<EOF | sudo tee /etc/X11/xorg.conf.2screens
Section "ServerLayout"
  Identifier "Layout0"
  Screen 0 "Screen0"
  Screen 1 "Screen1" RightOf "Screen0" # two screens side by side
  Option "AutoAddDevices" "true"
  Option "AutoEnableDevices" "true"
  Option "AutoAddGPU" "true"
EndSection

Section "Device"
  Identifier "IGD"
  Driver "intel"
  BusID "PCI:0:2:0" # this can be read from the lspci output above
EndSection

Section "Screen" # IGD screen will not have explicit monitor configuration
  Identifier "Screen0"
  Device "IGD"
  DefaultDepth 24
  SubSection "Display"
    Depth 24
  EndSubSection
EndSection

Section "Monitor"
  Identifier "External-VGA"
  Option "Enable" "true" # will always show as connected, so we have at least one active output
  Option "PreferredMode" "1024x768"
EndSection

Section "Monitor" 
  Identifier "LCD"
  Option "Enable" "false" # this is a dead end - could have used "Ignore" "true"
EndSection

Section "Device"
  Identifier "DIS"
  Driver "nouveau"
  BusID "PCI:1:0:0" # this can be read from the lspci output above
  Option "Monitor-VGA-2" "External-VGA" # connecting output names with monitor config
  Option "Monitor-LVDS-2" "LCD"
  Option "HWCursor" "true"
EndSection

Section "Screen" # DIS screen will only have one active monitor: External-VGA
  Identifier "Screen1"
  Device "DIS"
  Monitor "External-VGA"
  DefaultDepth 24
  SubSection "Display"
      Depth 24
  EndSubSection
EndSection
EOF
$ # now create a link xorg.conf that will point to our actual configuration.
$ sudo ln -s /etc/X11/xorg.conf.2screens /etc/X11/xorg.conf 
$ sudo service lightdm restart # ... and log in
$ DISPLAY=:0.1 xclock # start xclock on screen 1 and smile

The configuration above would work fine. Unfortunately, the mouse could not leave screen 0 when I was testing. The pointer wrapped around: when I pulled it outside of screen 0 on the right side, it entered on the left side of screen 0 instead of screen 1. This is an upstream bug, marked as "fix released", however a working package is not available in the official Ubuntu 12.10 repositories yet.

Powering off the DIS without any preparation while the screen was actively used froze the kernel, however powering it off after turning off the output devices seemed stable.


echo OFF | sudo tee /sys/kernel/debug/vgaswitcheroo/switch
# freezes the kernel, requires hard reset

# if the output is disabled first like this:
xrandr --screen 1 -q # see which devices are connected
xrandr --screen 1 --output VGA-2 --off # switch off connected devices
echo OFF | sudo tee /sys/kernel/debug/vgaswitcheroo/switch
# works fine, and screen can be restored when the devices reactivated.
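To avoid the freeze by accident, the two steps can be chained so that the outputs are always switched off first. The sketch below only prints the commands instead of running them, so the plan can be reviewed before it is piped to a shell. It assumes the `xrandr -q` line format described above, and dis_poweroff_plan is a name I made up.

```shell
# Print the commands that switch off every connected output on X screen $1
# and then power off the DIS. Reads `xrandr -q` output from standard input.
# Nothing is executed here; dis_poweroff_plan is a hypothetical helper name.
dis_poweroff_plan() {
  awk -v s="$1" '$2 == "connected" {
    printf "xrandr --screen %s --output %s --off\n", s, $1
  }'
  echo 'echo OFF | sudo tee /sys/kernel/debug/vgaswitcheroo/switch'
}

# Usage on a live system:
#   xrandr --screen 1 -q | dis_poweroff_plan 1        # review the plan
#   xrandr --screen 1 -q | dis_poweroff_plan 1 | sh   # run it
```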

Unfortunately, the two-screen approach is not usable in production until the bug that prevents the pointer from moving between screens is fixed.

It should also be noted that with a two-screen configuration, windows cannot span or cross screens, and the desktop cannot simply be extended either - without further magic to be covered later. However, any application can be opened on either of the two screens.

Xinerama

This old extension allows one to extend the desktop to screen 1 with a single line of configuration change. The following line has to be added to the ServerLayout section:


Option "Xinerama" "on"

Two downsides made me search for other alternatives: the performance with Xinerama is very poor, but even more importantly, Xinerama does not support dynamic configuration changes, so resizing or rearranging the monitors, or switching outputs on or off, is not possible at runtime. With the current set of objectives this means Xinerama is not what I am looking for.

About Part 3

Part 3 explains setting up two X servers and various related tweaks to fulfill all objectives listed above.

Wednesday, 9 January 2013

Fixing the visual appearance of 32bit applications on Ubuntu 12.10 64bit

Both Lotus Notes and Lotus Sametime are available as 32 bit applications. Ubuntu 12.10 provides multiarch support out of the box, which refers to the ability of a system to install and run applications of multiple different binary targets on the same system, in this case i386-linux-gnu applications on an amd64-linux-gnu system. Nevertheless, the visual appearance of 32 bit applications is not in line with the look-and-feel of the amd64 ones. This is especially annoying as I am going to have Lotus Notes in front of me on a daily basis.

The following section describes how to find and install the set of 32 bit packages that will resolve the issue as much as possible. First some background information. Lotus Notes uses an Eclipse-based framework, and Eclipse uses SWT, its cross-platform Standard Widget Toolkit. SWT on Linux uses GTK+, the GIMP Toolkit, as the native backend for rendering widgets. Ubuntu 12.10 provides a desktop environment that mainly uses GTK3, a version of GTK+ available since 2011. GTK3 is not API compatible with GTK2, which has been available since 2002, so the two are installed side by side, and GTK2 and GTK3 themes and rendering engines are installed separately...

To find out more about why the 32bit applications do not render as we would expect, let us install a 32bit GTK2 and GTK3 demo application.


$ sudo aptitude install gtk2.0-examples:i386 gtk-3-examples:i386 --without-recommends
$ gtk3-demo 
Gtk-Message: Failed to load module "overlay-scrollbar"

(gtk3-demo:5164): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:62:17: Theming engine 'unico' not found
Gtk-Message: Failed to load module "canberra-gtk-module"
$ gtk-demo
Gtk-Message: Failed to load module "overlay-scrollbar"

(gtk-demo:4930): Gtk-WARNING **: Unable to locate theme engine in module_path: "murrine",
Gtk-Message: Failed to load module "canberra-gtk-module"

The output shows which modules and theming engines could not be loaded. Now let us incrementally try to resolve the issues.


$ # let us first take a try at installing 32bit overlay-scrollbar
$ aptitude install overlay-scrollbar-gtk2:i386 overlay-scrollbar-gtk3:i386 --without-recommends --simulate
The following NEW packages will be installed:
  overlay-scrollbar-gtk2:i386{b} overlay-scrollbar-gtk3:i386{b} 
0 packages upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 79.7 kB of archives. After unpacking 250 kB will be used.
The following packages have unmet dependencies:
 overlay-scrollbar-gtk2:i386 : Depends: overlay-scrollbar:i386 which is a virtual package.
 overlay-scrollbar-gtk3:i386 : Depends: overlay-scrollbar:i386 which is a virtual package.
The following actions will resolve these dependencies:

     Keep the following packages at their current version:
1)     overlay-scrollbar-gtk2:i386 [Not Installed]        
2)     overlay-scrollbar-gtk3:i386 [Not Installed]        

Accept this solution? [Y/n/q/?] q
Abandoning all efforts to resolve these dependencies.
Abort.
$ # as it can be seen, there is a missing dependency, so it cannot be installed.

$ # let us take a try at unico theming engine
$ aptitude install gtk3-engines-unico:i386 --without-recommends --simulate
The following NEW packages will be installed:
  gtk3-engines-unico:i386{b} 
0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 9,066 B of archives. After unpacking 57.3 kB will be used.
The following packages have unmet dependencies:
 gtk3-engines-unico : Conflicts: gtk3-engines-unico:i386 but 1.0.2+r139-0ubuntu2 is to be installed.
 gtk3-engines-unico:i386 : Conflicts: gtk3-engines-unico but 1.0.2+r139-0ubuntu2 is installed.
The following actions will resolve these dependencies:

     Remove the following packages:
1)     gtk3-engines-unico          
2)     light-themes                
3)     ubuntu-artwork              
4)     ubuntu-desktop              

Accept this solution? [Y/n/q/?] q
Abandoning all efforts to resolve these dependencies.
Abort.
$ # This time, there is a conflict: 
$ # the 32bit and 64bit versions of unico are mutually exclusive.

$ # to make a long story short, the following packages could be installed:
$ #   - libcanberra-gtk-module:i386
$ #   - libcanberra-gtk3-module:i386
$ #   - gtk2-engines-murrine:i386
$ sudo aptitude install libcanberra-gtk-module:i386 libcanberra-gtk3-module:i386 gtk2-engines-murrine:i386 --without-recommends

$ # both the output and the visual appearance confirm things got better...
$ gtk-demo 
Gtk-Message: Failed to load module "overlay-scrollbar"
$ gtk3-demo 
Gtk-Message: Failed to load module "overlay-scrollbar"

(gtk3-demo:6039): Gtk-WARNING **: Theme parsing error: gtk-widgets.css:62:17: Theming engine 'unico' not found

After installing the packages as shown above, both Lotus Notes and Lotus Sametime render the widgets much nicer. The theme is not perfectly in line with the out of the box Ubuntu 12.10 theme, but I can live without having the overlay scrollbar in Notes and Sametime.
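To re-check what is still missing after each install attempt, the diagnostics of the demo applications can be condensed into a short list. This is a minimal sketch that assumes the exact message wording shown in the listings above. The function name gtk_missing is hypothetical.

```shell
# Summarise which GTK modules and theme engines failed to load.
# Reads the diagnostics printed by gtk-demo or gtk3-demo from standard input.
# gtk_missing is a hypothetical helper name.
gtk_missing() {
  sed -n \
    -e 's/.*Failed to load module "\([^"]*\)".*/module \1/p' \
    -e "s/.*Theming engine '\([^']*\)' not found.*/engine \1/p" \
    -e 's/.*Unable to locate theme engine in module_path: "\([^"]*\)".*/engine \1/p' \
  | LC_ALL=C sort -u
}

# Usage (close the demo window to end the pipe):
#   gtk-demo 2>&1 | gtk_missing
#   gtk3-demo 2>&1 | gtk_missing
```

Each output line names either a module or an engine, which maps fairly directly to the package that has to be looked up for the i386 architecture.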

Tuesday, 8 January 2013

Lotus Notes migration to Ubuntu 12.10 64bit

Having made the first steps in securing the W530 (see previous posts) and having set up further security & compliance related components as dictated by corporate policy, I was ready to start migrating my data from the old T400 ThinkPad to the new box. The most critical part of the data to be migrated is Lotus Notes data. I am proficient with the operating system and all other software components I have on my old laptop, but I am not a Lotus Notes expert at all; it is mostly a black box for me and a potential source of trouble. The old laptop is running Ubuntu 10.04 32bit while I set up the new one with Ubuntu 12.10 64bit as mentioned in previous posts.

Backing up Lotus Notes Data

To clarify the scope, I was migrating my data from a Lotus Notes 8.5.2 environment on Ubuntu 32bit to an 8.5.3 one on 64bit. Additionally, I also decided to clean out the cruft accumulated over the last 7 years... I checked for defunct workspace icons and any obsolete local replicas and removed those. Then I sorted all email in my inbox into the appropriate folders (usually, I do this while waiting for connecting flights at airports) and archived all emails from before 2013. I also changed the archive settings to save any new items to a new archive file. After cleaning up, I was ready to migrate my data.

First and foremost, do know what you have to migrate. I wanted to migrate the minimum required amount of settings but not less. Lotus Notes stores data in the proprietary Notes Storage Facility format; these files have the nsf or NSF extension. They are self-contained document-oriented databases storing semi-structured data - the application logic and view definitions are also incorporated. There are some files like cache.nsf or log.nsf that do not carry valuable information, others will have to be backed up.

My mail file is just a normal nsf file, however it is encrypted with the private key stored in my ID file. It is good practice to keep a backup of your ID file anyway. I have seen people believe that remembering their Lotus Notes password would be enough to recover a corrupt or deleted ID file...

Workspace definitions and icons are stored in the file desktop8.ndk, which also needs to be backed up. I also made sure to back up my custom signatures.

There are some settings stored in the Eclipse workspace, but I did not migrate any of those. I was trying hard to start with as clean an environment as possible and ended up using the following command to create the backup on the old machine:


# back up all databases except log.nsf, plus the id file, desktop file and signatures
# paths are kept relative to the home directory so that the archive can be
# extracted from ~ on the new machine
cd ~ && find lotus/notes \( -iname '*.nsf' -o -iname '*.id' \
  -o -iname '*.htm' -o -iname 'desktop8.ndk' \) \
  -not -iname 'log.nsf' -print0 | tar cspzf \
/media/SAMSUNG/backup/t400-20130103/lotus_notes_data-sparse.tar.gz --null -T -
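Before wiping the old machine, it is worth confirming that the archive really contains the essentials named above. The following is a minimal sketch that checks a `tar tzf` listing for at least one nsf database, an ID file and desktop8.ndk. The function name check_notes_backup is hypothetical, and the check only looks at file names, not at file contents.

```shell
# Check that an archive listing read from standard input contains at least
# one .nsf database, one .id file and desktop8.ndk. Prints each missing
# category and returns non-zero if anything is missing.
# check_notes_backup is a hypothetical helper name.
check_notes_backup() {
  listing=$(cat)
  status=0
  for pattern in '\.nsf$' '\.id$' 'desktop8\.ndk$'; do
    if ! printf '%s\n' "$listing" | grep -qi -- "$pattern"; then
      echo "missing from backup: $pattern"
      status=1
    fi
  done
  return $status
}

# Usage:
#   tar tzf /media/SAMSUNG/backup/t400-20130103/lotus_notes_data-sparse.tar.gz \
#     | check_notes_backup && echo "backup looks complete"
```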

Installation and migration

On the W530, before restoring the files, I installed Lotus Notes and let it create and initialize the ~/lotus/notes/data folder with default files. To do so, I started it, waited until the splash screen disappeared and the first application window appeared, asking for initial setup input. At this very first screen, I pressed cancel to abort and exit. I made sure all Lotus Notes processes had terminated - one can use the "Lotus Notes Zap" utility for this.

Then I extracted the data and applied some more tweaks using the commands listed below. Other minor customizations like setting the theme and default font were done from within the graphical interface, but they are not worth documenting...


cd ~ # make sure we are in the right directory
tar xzf /media/tibi/SAMSUNG/backup/t400-20130103/lotus_notes_data-sparse.tar.gz
# start Lotus Notes... and choose "office network" as current location, look around, then exit.

chmod -x ~/lotus/notes/data/*.gif # just some cosmetics...
# I use the English language, but with my preferred date format
cat <<EOF >> ~/lotus/notes/data/notes.ini
DateOrder=YMD
ClockType=24_HOUR
DateSeparator=-
TimeSeparator=:
EOF
# restart notes and verify the date format used in your inbox.
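One caveat of appending with a here-document is that running it twice leaves duplicate keys in notes.ini. A more careful variant could replace an existing key and append only when it is absent. The sketch below assumes notes.ini consists of plain KEY=value lines, and set_ini_key is a helper name of my own. Values containing backslashes are not handled, because awk interprets escape sequences in -v assignments.

```shell
# Set KEY=value in an ini-style file: replace the line if the key already
# exists, append it otherwise. Arguments: file, key, value.
# set_ini_key is a hypothetical helper name.
set_ini_key() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "="; done = 0 }
    $1 == k { print k "=" v; done = 1; next }   # replace an existing key
    { print }                                    # keep every other line
    END { if (!done) print k "=" v }             # append if it was absent
  ' "$file" > "$tmp" && cat "$tmp" > "$file"     # keep the original file mode
  status=$?
  rm -f "$tmp"
  return $status
}

# Usage, equivalent to the settings above but safe to repeat:
#   set_ini_key ~/lotus/notes/data/notes.ini DateOrder YMD
#   set_ini_key ~/lotus/notes/data/notes.ini ClockType 24_HOUR
```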

The result is a fully functional install with all the data migrated, however it is lacking a few more tweaks to improve the visual appearance of native widgets. These tweaks are going to be covered in the next post.