codecs.org Part of GStreamer Family

What is SpeciaLib?

It's a set of tools and routines to make implementation of run-time specializable libraries easier. It has several major modes, which can all interact with each other to varying degrees:

  • Function specialization: construct the necessary infrastructure for run-time selection of various implementations of the same function
  • Object specialization: enable run-time specialization of whole objects, where an entire vtable of methods has to be switched simultaneously
  • Meta-specialization: take code that utilizes other specialized functions and build a separate version for each architecture, allowing deep inlining while retaining the ability to specialize at run-time

How would I use it?

Using SpeciaLib to build a library is not especially complicated, but significant care must be taken to make the build process work properly and to control the various versions of code produced. Actually making use of a specialized library ranges from trivial to ludicrously easy. On the other hand, stacking multiple specialized libraries on top of each other significantly magnifies the complexity and requires a detailed understanding of what code is generated and how it's expected to work.

To start off, we have to have a routine that needs to be specialized:

#include <stdint.h>

void average_U8_U8 (uint8_t *dst, uint8_t *src, int width) {
  int i;

  for (i=0; i<width; i++)
    dst[i] = (src[i] + dst[i] + 1) / 2;
}

As you can see, there's significant opportunity for optimization there, usually by converting the routine into assembly and using the SIMD extensions (MMX, Altivec, etc.) available on various processors. Shipping only the fastest version would restrict you to the latest processors, while shipping only the lowest common denominator would be excruciatingly slow on every processor.

Only a few modifications are necessary in order to turn this function into an implementation for a specific architecture:

/*****  average_c.c  *****/

SL_FUNC void _average_U8_U8__c (uint8_t *dst, uint8_t *src, int width) {
  int i;

  for (i=0; i<width; i++)
    dst[i] = (src[i] + dst[i] + 1) / 2;
}

The SL_FUNC tag is used to set various options such as static inline or extern inline depending on where the code is included. The _ prefix marks it as a "private" function, and the __c indicates that this is the "C" version, which is always present and is considered the baseline.

Of course, what good is that without another implementation? Let's do MMX: (yes, I know that's not optimal code, it's old test code)

/*****  average_mmx.c  *****/

SL_FUNC void _average_U8_U8__x86_mmx (uint8_t *dst, uint8_t *src, int width) {
  int i;

  /* handles 8 bytes per iteration, so width must be a multiple of 8 */
  for (i=0; i<width; i+=8)
    __asm__ __volatile__ (
        "movq %0, %%mm0\n\t"            // load 8 bytes of dst
        "movq %1, %%mm1\n\t"            // load 8 bytes of src
        "movq %%mm0, %%mm2\n\t"         // dupe first
        "por %%mm1, %%mm2\n\t"          // or them together
        "pand %2, %%mm2\n\t"            // keep only the low bit of each byte (rounding bit)
        "pand %3, %%mm0\n\t"            // clear the low bit of each byte
        "pand %3, %%mm1\n\t"            // clear the low bit of each byte
        "psrlq $1, %%mm0\n\t"           // shift right by one bit; no bit crosses a byte
        "psrlq $1, %%mm1\n\t"           // shift right by one bit
        "paddusb %%mm1, %%mm0\n\t"      // add second reg to first
        "paddusb %%mm2, %%mm0\n\t"      // add the rounding bits
        "movq %%mm0, %0\n\t"            // save results

        : "+m"(dst[i])                  // %0, read and written
        : "m"(src[i]),                  // %1
          "m"(sl_arch_x86_mmx_const_byte_0x01), // %2
          "m"(sl_arch_x86_mmx_const_byte_0xfe)  // %3
        : "mm0", "mm1", "mm2", "memory"
    );

  __asm__ __volatile__ ("emms");        // clear MMX state so x87 floating point works again
}
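The MMX routine relies on the identity (a + b + 1) / 2 == (a >> 1) + (b >> 1) + ((a | b) & 1). Clearing the low bit of every byte before the whole-register shift stops bits from leaking between neighbouring bytes, and each byte's sum is at most 127 + 127 + 1 = 255, so nothing overflows. The same trick can be sketched in portable C on 64-bit words. This is only an illustration, not part of SpeciaLib: avg8_swar and average_U8_U8_swar are hypothetical names, and like the MMX version the loop assumes width is a multiple of 8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Average eight packed bytes at once, rounding up, without any SIMD unit. */
static uint64_t avg8_swar (uint64_t a, uint64_t b) {
  const uint64_t low_bits  = 0x0101010101010101ULL; /* role of ..._const_byte_0x01 */
  const uint64_t high_bits = 0xfefefefefefefefeULL; /* role of ..._const_byte_0xfe */

  uint64_t round  = (a | b) & low_bits;      /* 1 wherever either low bit is set */
  uint64_t half_a = (a & high_bits) >> 1;    /* per-byte a >> 1, nothing crosses bytes */
  uint64_t half_b = (b & high_bits) >> 1;    /* per-byte b >> 1 */

  /* Each byte sums to at most 255, so no carry spills into the next byte. */
  return half_a + half_b + round;
}

/* Hypothetical portable counterpart of _average_U8_U8__x86_mmx. */
void average_U8_U8_swar (uint8_t *dst, const uint8_t *src, int width) {
  int i;

  assert (width % 8 == 0);
  for (i = 0; i < width; i += 8) {
    uint64_t a, b;
    memcpy (&a, dst + i, 8);   /* memcpy sidesteps alignment and aliasing issues */
    memcpy (&b, src + i, 8);
    a = avg8_swar (a, b);
    memcpy (dst + i, &a, 8);
  }
}
```

Byte order does not matter here: every operation is either bytewise or carry-free, which is also why the MMX code can get away with a plain quadword shift.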

Now we have to build the files necessary to utilize these routines in application code. To do this we need to use the sl-functions-stub script:

# sl-functions-stub -l average average_c.c average_mmx.c

This will create two files, average_sl.h and average_sl.c, which perform all the wrapping for the implementations. They assume the library is called libaverage.so, and that all the files and headers will be installed into an average/ subdirectory of some common include prefix. The header looks like:

/*****  average_sl.h  *****/
/***** This file is AUTOGENERATED by SpeciaLib, DO NOT EDIT! *****/

#ifndef __SL_AVERAGE_H__
#define __SL_AVERAGE_H__

#include <specialib/sl_preamble.h>

The header specialib/sl_preamble.h contains the definitions of things such as SL_FUNC, and has two sections. If __SL_IN_SPECIALIB is defined, SL_FUNC is blank, so the functions are defined verbatim and compiled into the library with normal symbols. If it is not defined, SL_FUNC is set to extern inline. Under GCC's traditional (GNU89) inline semantics, this allows the function to be inlined into code that uses it (assuming the appropriate optimization flags were given), but no out-of-line copy of the function shows up in the resulting object files. Any non-inlined calls to these functions are satisfied later by linking against the library that does contain them. (In C99 and later modes GCC gives extern inline its standard meaning instead, which emits an out-of-line copy in every file; the GNU89 behavior can be requested with -fgnu89-inline or the gnu_inline attribute.)
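A minimal sketch of how the two sections might be arranged, assuming GCC or Clang. This is a guess at the shape of sl_preamble.h, not its actual contents; the gnu_inline attribute pins down the GNU89 behavior regardless of the -std mode. Defining __SL_IN_SPECIALIB at the top plays the role of average_sl.c, so the function below becomes an ordinary exported symbol.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: pretend this file is the library's own average_sl.c. */
#define __SL_IN_SPECIALIB

#undef SL_FUNC
#ifdef __SL_IN_SPECIALIB
/* Library side: no prefix, so each body is compiled as a normal symbol. */
#define SL_FUNC
#else
/* Application side: body available for inlining only; no out-of-line copy
   is emitted, so un-inlined calls must be resolved by libaverage.so. */
#define SL_FUNC extern inline __attribute__ ((gnu_inline))
#endif

/* average_c.c, unchanged, compiled under whichever SL_FUNC is in effect. */
SL_FUNC void _average_U8_U8__c (uint8_t *dst, uint8_t *src, int width) {
  int i;

  for (i=0; i<width; i++)
    dst[i] = (src[i] + dst[i] + 1) / 2;
}
```

Removing the #define __SL_IN_SPECIALIB line switches to the application-side definition; a call then links only if the compiler actually inlines it or a library provides the symbol.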

/* function prototypes and section declarations */
#if defined(SL_COMPILE_ARCH_X86) | defined(SL_COMPILE_FEATURE_X86_MMX)
void _average_U8_U8__x86_mmx (uint8_t *dst, uint8_t *src, int width) __attribute__ ((section (".sl_arch_x86_mmx")));
#endif /* defined(SL_COMPILE_ARCH_X86) | defined(SL_COMPILE_FEATURE_X86_MMX) */
void _average_U8_U8__c (uint8_t *dst, uint8_t *src, int width) __attribute__ ((section (".sl_arch_c")));

The function prototypes at the top indicate that the various functions should be located in specific sections, such as .sl_arch_x86_mmx. The compiler and linker will keep all the functions in these sections contiguous. This means that if there are a large number of distinct architectures supported by a large library, the user doesn't generally have to worry about the loader bringing in every single page of the library, just those containing the functions selected for use.

#include <average/average_c.c>
#include <average/average_mmx.c>

Next, the C files containing the implementations are brought in. Presence or absence of __SL_IN_SPECIALIB determines, through the prior inclusion of sl_preamble.h, whether the functions are labelled as extern inline or not.

/* function pointer types (__fpt) */
typedef void (*__fpt_average_U8_U8) (uint8_t *dst, uint8_t *src, int width);

The meat of the file starts with the function pointer types. These are basic typedefs tailored to each function prototype that are used throughout the file.

#ifdef __SL_IN_AVERAGE
  /***** average_U8_U8 *****/
  static const _sl_funcptr_entry __fpd_average_U8_U8_entries[] = {
    { SL_ARCH_C, _average_U8_U8__c },
#if defined(SL_COMPILE_ARCH_X86) | defined(SL_COMPILE_FEATURE_X86_MMX)
    { SL_ARCH_X86|SL_FEATURE_X86_MMX, _average_U8_U8__x86_mmx },
#endif /* defined(SL_COMPILE_ARCH_X86) | defined(SL_COMPILE_FEATURE_X86_MMX) */
    { 0, 0 }
  };

Within __SL_IN_AVERAGE we find several structures. The first one is __fpd_average_U8_U8_entries, which contains a line for each implementation, along with a set of symbols defining the architecture for which it was written. Those functions which may not always be compiled (for instance a PPC/Altivec version) are culled by the preprocessor when their SL_COMPILE_... defines are missing. At the bottom we have a sentinel of zeros.

  _sl_funcptr_def __fpd_average_U8_U8 = {
    _average_U8_U8__c,
    (_sl_funcptr_entry *)__fpd_average_U8_U8_entries,
    0, 0,
    0, 0,
  };

The second structure, __fpd_average_U8_U8, is the complete function pointer definition. Its first field holds the currently selected implementation, which starts out as the default (always the "C" implementation); it is followed by a pointer to the entries list defined above and a number of zeros reserved for other purposes to be described later.

#else /* ! __SL_IN_AVERAGE */
  extern _sl_funcptr_def __fpd_average_U8_U8;

#endif /* __SL_IN_AVERAGE */

Otherwise, if __SL_IN_AVERAGE is not defined, an extern declaration of the function pointer definition is provided for the benefit of the compiler and linker.

    
#define average_U8_U8 (*((__fpt_average_U8_U8 *)(&__fpd_average_U8_U8)))

Lastly, we have a #define that provides the final symbol for users of the library. It casts the address of the definition structure to a pointer to the function pointer type and dereferences it, which reads the structure's first field, the currently selected function pointer, as the appropriate type. Any user code directly calling average_U8_U8 has this expression substituted, so the call goes through the function pointer to whichever implementation is currently selected.
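The dispatch mechanism can be sketched on its own. The structure, the sketch_* names and the avg_alt stand-in below are hypothetical, not SpeciaLib's real _sl_funcptr_def; only the idea that the first member holds the current implementation comes from the header above. Switching the stored pointer changes what every subsequent call to average_U8_U8 runs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; only the first member matters for dispatch. */
typedef void (*sketch_generic_fn) (void);

typedef struct {
  sketch_generic_fn current;   /* currently selected implementation */
  /* the entry table and reserved fields would follow */
} sketch_funcptr_def;

typedef void (*__fpt_average_U8_U8) (uint8_t *dst, uint8_t *src, int width);

static void avg_c (uint8_t *dst, uint8_t *src, int width) {
  int i;
  for (i = 0; i < width; i++)
    dst[i] = (src[i] + dst[i] + 1) / 2;
}

/* Second "implementation" that counts its calls so a switch is observable. */
static int alt_calls = 0;
static void avg_alt (uint8_t *dst, uint8_t *src, int width) {
  alt_calls++;
  avg_c (dst, src, width);
}

/* Starts out pointing at the "C" version, like __fpd_average_U8_U8. */
static sketch_funcptr_def fpd_average = { (sketch_generic_fn) avg_c };

/* User-facing name: convert the stored pointer back to its real type. */
#define average_U8_U8 ((__fpt_average_U8_U8) fpd_average.current)
```

The generated header reaches the same pointer by reinterpreting the structure's storage in place, which yields an lvalue but relies on the first member having the function pointer's representation; the explicit conversion in this sketch is the strictly portable alternative.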

#ifdef __SL_SUBORDINATE_LIBRARY
#undef __SL_SUBORDINATE_LIBRARY
#include <specialib/sl_preamble.h>
#endif

The trailing epilogue checks __SL_SUBORDINATE_LIBRARY, which is set in case this file is included from within another SpeciaLib project. Its purpose is to revert any changes that may have been made to the various defines like SL_FUNC. Note carefully that __SL_SUBORDINATE_LIBRARY is promptly undefined, thus requiring that a SpeciaLib-based library including multiple subordinate libraries must redefine it just prior to each new inclusion.

#endif /* __SL_AVERAGE_H__ */

The C file is much simpler, containing an include for the local config.h as well as average_sl.h. The key to this particular file is the definition of __SL_IN_AVERAGE, which forces the header to define the various __fpd_... structures so that they physically reside in the resulting library, and of __SL_IN_SPECIALIB, which makes sl_preamble.h select the library-side function prefixes, forcing the bodies of the functions into the library.

/*****  average_sl.c  *****/
/***** This file is AUTOGENERATED by SpeciaLib, DO NOT EDIT! *****/

#include "config.h"

#define __SL_IN_AVERAGE
#define __SL_IN_SPECIALIB
#include "average_sl.h"

Building the library is pretty trivial once these files are in place. Simply put average_sl.c in the list of files to compile for the library and you're done.

If you want a more conventional header filename to include, simply add a header average.h:

/*****  average.h  *****/
#include <average/average_sl.h>

Just be sure that both headers and the implementation files (average_c.c and average_mmx.c) get installed into the include path your application uses. If you don't, any application including the headers will fail to compile because those files can't be found.

Using the specialized library

In its simplest form, using the library consists of just calling the function in question. The application won't see any difference from using the "C" version directly, other than the extra indirection of the function pointer call. However, this doesn't allow your application to take advantage of the other implementations available.

#include <stdint.h>
#include <average/average.h>

int main () {
  uint8_t src[64],dst[64];

  /* . . . */

  average_U8_U8 (dst,src,64);

  /* . . . */
}

If you decide you want to select a specific implementation for subsequent use, you can call sl_select_arch() with the architecture ID:

#include <specialib/specialib.h>

sl_select_arch (average_U8_U8, SL_ARCH_X86 | SL_FEATURE_X86_MMX);

Note that you must bring in specialib/specialib.h in order to use these functions. It is not brought in by the average_sl.h header, and is unlikely to be brought in by any of the implementation files. Among other things this makes libraries built with SpeciaLib capable of being run in environments where libspecialib.so is entirely missing, so long as none of the applications utilizing such libraries ever use any of the SpeciaLib functions.
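Selection can be pictured as a walk over an entries table like the one in average_sl.h, stopping at the all-zero sentinel. The sketch below assumes an exact match on the architecture flags and uses hypothetical names (sketch_select_arch, SKETCH_ARCH_*, impl_c, impl_mmx) with trivial int functions standing in for real implementations; SpeciaLib's own sl_select_arch() may match flags differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical architecture flags; the real SL_ARCH_* values may differ. */
enum {
  SKETCH_ARCH_C          = 1 << 0,
  SKETCH_ARCH_X86        = 1 << 1,
  SKETCH_FEATURE_X86_MMX = 1 << 2
};

typedef int (*sketch_fn) (int);

typedef struct {
  unsigned  arch;   /* flags the implementation was written for */
  sketch_fn func;
} sketch_entry;

typedef struct {
  sketch_fn           current;   /* implementation in use */
  const sketch_entry *entries;   /* terminated by { 0, NULL } */
} sketch_def;

static int impl_c (int x)   { return x + 1; }
static int impl_mmx (int x) { return x + 100; }

static const sketch_entry entries[] = {
  { SKETCH_ARCH_C, impl_c },
  { SKETCH_ARCH_X86 | SKETCH_FEATURE_X86_MMX, impl_mmx },
  { 0, NULL }
};

static sketch_def def = { impl_c, entries };

/* Point d->current at the entry whose flags equal arch exactly.
   Returns 0 on success, or -1 (leaving current untouched) if none matches. */
static int sketch_select_arch (sketch_def *d, unsigned arch) {
  const sketch_entry *e;

  for (e = d->entries; e->func != NULL; e++) {
    if (e->arch == arch) {
      d->current = e->func;
      return 0;
    }
  }
  return -1;
}
```

Leaving the pointer alone on failure means a request for an implementation that was never compiled in degrades to whatever was already selected, normally the "C" version.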

Selecting the fastest routine

This has not yet been implemented; please check back later <g>

Meta-specialization

Some code, such as the motion compensation core of an MPEG decoder, or any file parser, makes extensive use of specialized routines in a way that highlights the cost of making an indirect function call. In general, the cost of making a function call at all is quite high on most architectures, so calls are something you want to minimize in high-performance applications.

Meta-specialization takes a template source file and generates versions of the code for each available architecture, tweaked to make use of the inlinable versions of specialized routines. This is why SL_FUNC expands to extern inline in the default case, and why the full source of each of the implementations is included into average_sl.h even when it is included into an application. Every implementation is available for inlining in code written to support it.

Let's take a trivial example, to keep things simple:

/*****  avgsize.c  *****/
#include <average/average.h>

void avgsize (void *dst, void *src, int width, int size) {
  if (size == 1)
    average_U8_U8 ((uint8_t *)dst, (uint8_t *)src, width);
  else if (size == 2)
    average_U16_U16 ((uint16_t *)dst, (uint16_t *)src, width);
}

Obviously, if such a routine were run in a tight loop, a large amount of time (quite possibly more than the actual useful execution time) would be wasted in function-call overhead. So we make it meta-specializable:

/*****  avgsize.c  *****/
. . .

void _avgsize__ARCH (void *dst, void *src, int width, int size) {

. . .

In order to construct the metaspecialized files, we have to call sl-metaspecialize:

# sl-metaspecialize avgsize.c
 avgsize_c.c avgsize_x86_mmx.c

The resulting two files (in this case) look remarkably similar:

/*****  avgsize_c.c  *****/
#include <average/average.h>

void _avgsize__c (void *dst, void *src, int width, int size) {
  if (size == 1)
    _average_U8_U8__c ((uint8_t *)dst, (uint8_t *)src, width);
  else if (size == 2)
    _average_U16_U16__c ((uint16_t *)dst, (uint16_t *)src, width);
}

/*****  avgsize_x86_mmx.c  *****/
#include <average/average.h>

void _avgsize__x86_mmx (void *dst, void *src, int width, int size) {
  if (size == 1)
    _average_U8_U8__x86_mmx ((uint8_t *)dst, (uint8_t *)src, width);
  else if (size == 2)
    _average_U16_U16__x86_mmx ((uint16_t *)dst, (uint16_t *)src, width);
}

The majority of the magic in sl-metaspecialize has to do with locating each and every specializable function used in the file, and determining which architectures are sufficiently represented to ensure that there are no missing functions. At some point in the future it will be smart enough to select the fastest available routine that will still run on the given processor, so if there is no MMX version of average_U16_U16, avgsize_x86_mmx.c will fall back to the "C" version.
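One simple reading of "sufficiently represented" is a set intersection: an architecture gets its own generated file only if every specializable function the template calls has an implementation for it. The sketch below is a guess at that rule, not sl-metaspecialize's actual code; the SKETCH_ARCH_* bits, used_function and buildable_archs are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical architecture bits. */
#define SKETCH_ARCH_C            (1u << 0)
#define SKETCH_ARCH_X86_MMX      (1u << 1)
#define SKETCH_ARCH_PPC_ALTIVEC  (1u << 2)

/* One specializable function called by the template, with the set of
   architectures it has implementations for. */
typedef struct {
  const char *name;
  unsigned    archs;
} used_function;

/* Architectures for which every used function is implemented.
   With no functions at all, every architecture qualifies. */
static unsigned buildable_archs (const used_function *used, size_t n) {
  unsigned archs = ~0u;
  size_t i;

  for (i = 0; i < n; i++)
    archs &= used[i].archs;
  return archs;
}
```

Under this rule a missing MMX average_U16_U16 would suppress avgsize_x86_mmx.c entirely, which is the gap the planned fall-back to "C" versions is meant to close.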

This gives us two implementations of the routine, but how do we use them? We can apply normal function-level specialization to these generated routines. First we have to put a barrier around the included average library so the two don't mangle each other's state:

/*****  avgsize.c  *****/
#define __SL_SUBORDINATE_LIBRARY
#include <average/average.h>

. . .

After re-running sl-metaspecialize, we can call sl-functions-stub to construct the necessary files:

# sl-functions-stub -l avgsize avgsize_c.c avgsize_x86_mmx.c

And finally, we can compile avgsize_sl.c into libavgsize.so, add the stub avgsize.h header, and call it a day.