Saturday, 22 June 2013

OpenCL, Python and Ubuntu 12.10 (Part 2)

Using multiple OpenCL devices for computation-intensive tasks can turn out to be more challenging than expected. Distributing load between heterogeneous GPUs is straightforward; ensuring that the combined performance of all the GPUs matches the sum of what each device delivers on its own is trickier, at least with the drivers that ship with Ubuntu 12.10.
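
To illustrate the "straightforward" half of that statement, here is a minimal sketch of splitting a batch of work items between devices in proportion to an estimated throughput. The function name `split_work` and the weights are hypothetical; in particular, using compute units times clock as a weight is only an assumption, and a measured benchmark figure per device would be a better choice.

```python
def split_work(total_items, weights):
    """Split total_items into one chunk per device, proportional to weights."""
    if total_items < 0:
        raise ValueError('total_items must be non-negative')
    if not weights or any(w <= 0 for w in weights):
        raise ValueError('weights must be a non-empty list of positive numbers')
    total_weight = float(sum(weights))
    # Ideal (fractional) share of each device
    ideal = [total_items * w / total_weight for w in weights]
    chunks = [int(x) for x in ideal]
    # Hand out the items lost to rounding, largest remainder first
    leftover = total_items - sum(chunks)
    order = sorted(range(len(weights)),
                   key=lambda i: ideal[i] - chunks[i], reverse=True)
    for i in order[:leftover]:
        chunks[i] += 1
    return chunks


# Hypothetical weights: compute units * max clock (MHz) for two cards
weights = [28 * 925, 20 * 850]
chunks = split_work(1000000, weights)
print(chunks)
```

The chunk sizes always add up to the total, so no work item is dropped or processed twice. Whether the devices then actually finish at the same time is exactly the tricky part described above.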

The case that motivated this post is tweaking a configuration with multiple AMD GPUs of different generations. There is a lot of conflicting information available on whether an X session is required, whether to use CrossFire cables, and whether dummy plugs are needed. It is also unclear whether, and how well, cards of different generations work together.

Having searched the internet for utilities that help in debugging OpenCL-related issues, I decided to create my own version, one that is not overly chatty but provides a quick overview of the recognised devices along with their most important performance-related parameters. The code is based on a script I found on a forum, extended and customised to match my needs.


#!/usr/bin/python

# 2013-04-03 03:35:03 

import sys
import os
import time
import platform
import imp

def getPyOpenCLPath():
    try:
        file, pathname, descr = imp.find_module('pyopencl')
    except ImportError:
        # Must match the 'Not Found' check further down
        pathname = 'Not Found'
    return str(pathname)

path = getPyOpenCLPath()

print 'opencl-info @ %s' % time.asctime()
print 'Operating System: %s %s' % (platform.system(), platform.dist())
print 'Python Version: %s (%s)' % (platform.python_version(), platform.architecture()[0])
print 'PyOpenCL Path: %s' % path

if path == 'Not Found':
    print 'Exiting' 
    sys.exit()

try:
    import pyopencl
    import pyopencl.version
except:
    print 'Unable to load PyOpenCL! OpenCL not supported?'
    sys.exit()
 
print 'PyOpenCL Version: %s' % pyopencl.VERSION_TEXT

try:
    platforms = pyopencl.get_platforms()
except Exception:
    # Without a platform list there is nothing left to report
    print 'Cannot get platform.'
    sys.exit()

if len(platforms) == 0:
    print 'No OpenCL platforms found!' 
    sys.exit()

count = 0

for i,p in enumerate(platforms):
    print ''
    print '[cl:%d] %s' % (i, p.name.replace('\x00','').strip())
    for k in ['vendor', 'profile', 'version']:
        print '    %s %s' % ((k + ':').ljust(16), getattr(p,k))
    print ''

    devices = platforms[i].get_devices()
    if len(devices) > 0:
        # Iterate through devices
        for j,d in enumerate(devices):
            count += 1
            print '    [cl:%d:%d] %s' % (i, j, d.name.replace('\x00','').strip())
            print '        type:                %s' % pyopencl.device_type.to_string(d.type)
            print '        memory:              %d MB' % (d.global_mem_size//1024//1024)
            print '        compute units:       %s' % d.max_compute_units
            print '        max clock:           %s MHz' % d.max_clock_frequency
            print '        max work group size: %s' % d.max_work_group_size
            print '        max work item size:  %s' % d.max_work_item_sizes
            # Iterate through device info
            #for name in filter( lambda x: not x.startswith('_'), dir(d)):
            #    try:
            #        print(name + ': '+ str(getattr(d, name)))
            #    except:
            #        print(name + ': (skipped)')
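
The commented-out loop at the end of the script hints at a more verbose mode that prints every attribute of a device. A minimal sketch of that idea as a reusable helper follows. The names `dump_public_attributes` and `FakeDevice` are hypothetical and not part of PyOpenCL; the stand-in class is only there so the snippet runs without any OpenCL driver installed.

```python
def dump_public_attributes(obj):
    """Return (name, value) pairs for every public attribute of obj."""
    result = []
    for name in dir(obj):
        if name.startswith('_'):
            continue
        try:
            value = getattr(obj, name)
        except Exception:
            # Some queries may not be supported by every driver
            value = '(skipped)'
        result.append((name, value))
    return result


class FakeDevice(object):
    """Stand-in for a device object; not part of PyOpenCL."""
    name = 'Example GPU'
    max_compute_units = 8

    @property
    def unsupported_query(self):
        raise RuntimeError('query not supported')


for name, value in dump_public_attributes(FakeDevice()):
    print('%s: %s' % (name, value))
```

With a real device, `dump_public_attributes(d)` could replace the commented-out loop inside the device iteration. Expect the output to be long, and to include the object's methods as well as its properties.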

After some trial and error and a bit of tinkering I was able to set up the environment to run fine without an X session, without CrossFire cables and without any dummy plugs, and to yield optimal performance. Documenting the configuration details is beyond the scope of this post; however, the script above is provided to help others with OpenCL-related debugging and optimisation.