The ATLAS Level-1 Calorimeter Trigger



Introduction

The OS9 buffer manager used in OPAL has been used as a foundation for this new buffer manager. It has been re-written in C, originally using the virtual operating system [VOS] as an interface. The object of the exercise was to get a new buffer manager that was portable, and yet keep it as similar as possible to the existing OS9 one so that users might be encouraged to make the change.

Version 010 of the portable buffer manager is available for general use. There is a small example skeleton suite that goes with it. It has run successfully on a large number of systems including OS9 [native or VOS], VMS [native, VOS or POSIX], Apollo DN10K, SUN IPX, Ultrix, LynxOS, Alpha/AXP running VMS, and Alpha/AXP running OSF1.2.

It should be a comparatively straightforward exercise to get it to run on any other UNIX system, given that most of them are drifting towards POSIX compliance. Version 010 has the following changes from version 009:

  • NATIVE_VMS support - so runs on AXP systems as well as VAXes.
  • Extra routines pbm_locase, pbm_upcase, pbm_nodename.
  • Proper standard C function prototypes are used if __STDC__ is defined. This makes writing of correct code a bit easier.

This document is divided up as follows:

  • Buffer manager concepts
  • The function calls [NOTE: only C callable at present]
  • Setting up on a new system
  • Porting considerations
  • Problems and restrictions
  • Future developments

PBM CONCEPTS

What is needed is a simple way of allowing events to be passed through a variable number of stages before being passed up to a higher level. A user program ought to be able to access the event fairly easily without worrying about where it might be, or where it came from.

The implementation is various modules prefixed PBM_ built into a linkable library. You have to write your own suite of programs to make use of these. An example SK.010 exists to show you the necessary calling sequences.

Before describing the calls, some definitions are needed to clarify what is meant:

  1. A simple STREAM is a set of sequential processes through which an event passes. It must start with a PRODUCER which gets empty memory and fills it up. There are then any number of 100% consumers, followed by any number of random consumers.
  2. Starting with version 006 it is possible to have more complex STREAMs containing FARMing stages - which farm events out to a choice of various FIELDS. The FIELDS pass events to the next normal consumer, which may also REORDER events to the original order.
  3. Each of the processes mentioned above is referred to as a STAGE. All STAGES have unique names of up to 10 characters.
  4. Each stage is described internally by a QUEUE HEADER which contains information about how many events are waiting for this stage, the current event etc.
  5. It is expected that the last 100% consumer in a stream will pass the event to another stream which may be within the same computer or in another one further up the event building chain.
  6. It is expected that random consumers may want to be enabled and disabled while data taking is active.
  7. Events are passed around by means of DESCRIPTORs. A descriptor contains useful items like the address and size of the data buffer, the length and serial number of the event etc. The buffer manager routines described below often pass the descriptor number as a low overhead way of passing the entire event.
  8. The number of stages and streams should be configurable. The time at which this happens is at INITIALISATION. The initialisation program runs to completion before any of the individual stages are created.
  9. DESCRIPTORs and QUEUE HEADERS are accessed by means of C struct declarations. The definitions are in the PBM_DEFS.H file. Users should include this into their programs.

FUNCTION CALL DESCRIPTIONS

Many of the functions return an integer value of zero if successful. Other integer codes indicate an error as defined in the include file pbm_defs.h. There are two sets of routines described below: those expected to be called by users, and the internal routines. The internal routines may be useful to hide operating system dependent functions.

Many of the routines require a pointer to a struct pbm_ptr as one of their arguments. This provides a tidy way of allowing the user to have several streams and keep the details himself. Normally you should not modify anything in this structure.

The descriptions below are given in alphabetical order. Unless otherwise mentioned these are all int functions.

User Routines

pbm_attach(char *name, struct pbm_ptr *uptr,void (*closedown)() )

Each STAGE has to use this call to announce its existence. The <name> should correspond to one of the names declared during the initialisation phase. If the call succeeds it will set up the contents of the <struct pbm_ptr *uptr> so as to be useful to all other callers and return zero. The <closedown> is a pointer to the user's closedown function - all stages MUST have such a routine.

If the requested STAGE does not exist pbm_attach returns a value of -1. If serious errors occur trying to attach it will give an indication as to the reason and exit.

If a process has successfully issued a pbm_attach call and then subsequently crashes while processing an event, unless you take action the stream will hang fairly soon. If the consumer that crashed is restarted then events will start flowing again, but the consumer will receive the SAME event as the one that caused the crash. The bit PBM_D_CRASH will be set in the pbm_d_evstat word within the descriptor, and it is up to the user to take appropriate action.

pbm_detach(char *name, struct pbm_ptr *uptr)

- Programs that have previously attached themselves should use this when closing down. In fact each buffer manager user has to have a closedown() function that calls pbm_detach(). If a program exits without calling this routine then the stream will hang up, whereas after detaching it can exit and the manager will not try to send events to this STAGE.

pbm_getbuf(int *size, int *index, struct pbm_ptr *uptr)

- This may only be called by a PRODUCER stage. <size> is the size of buffer required. If the call is accepted then on return <index> will contain the number of a descriptor containing details about the data buffer granted. <uptr> was set up by pbm_attach(). Returns zero when a buffer of suitable size has been created, otherwise returns an error PBM_ER_xxxx as defined in pbm_defs.h.

Note that this routine may have to wait for free memory. The <size> granted will be rounded up to the next allocation quantum boundary.
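The rounding mentioned above can be sketched as follows. This is a minimal illustration and not the PBM source; the helper name round_up_to_quantum is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, not part of PBM: round a requested size up to the
 * next whole multiple of the allocation quantum. */
int round_up_to_quantum(int size, int quantum)
{
    assert(size >= 0 && quantum > 0);
    return ((size + quantum - 1) / quantum) * quantum;
}
```

With a quantum of 500 bytes, a request for 1200 bytes would be granted 1500 bytes, while a request for exactly 1000 bytes would be left unchanged.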

pbm_getevt(int *index, struct pbm_ptr *uptr)

Other stages get their events with this call. On return <index> contains the descriptor number relating to this event. Within this descriptor is an int pbm_d_evstat which may have some interesting bits set in it, like PBM_D_FIRST if first in run, PBM_D_SWEEPER if sweeper event, and PBM_D_CRASH if this event has already caused a stage to crash while processing it.

The descriptor also contains a buffer address in pbm_d_bufadr which is relative to the start of the buffer [held in uptr.pdsa.memory in the example suite]. This MUST be a relative address since other consumers may see the main data buffer at a different process virtual address.
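A minimal sketch of how a consumer might turn the relative address into a usable pointer is given below. The helper name event_data is hypothetical; in a real program the base would be the start of the buffer as seen by this process and the offset would come from pbm_d_bufadr.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PBM: convert a buffer address stored
 * relative to the start of the shared data buffer into a pointer that is
 * valid in the calling process. */
char *event_data(char *buffer_base, int relative_address)
{
    assert(buffer_base != NULL && relative_address >= 0);
    return buffer_base + relative_address;
}
```

Because only the offset is stored in the descriptor, two processes that map the buffer at different virtual addresses each add their own base and still reach the same event data.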

void pbm_getssm(int *result)

- Sets <result> to the time of day in 'seconds since midnight'.
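On a system with a standard C library the same value could be obtained roughly as shown below. This is only a sketch: the real pbm_getssm() contains system dependent code, the name sketch_getssm is hypothetical, and the use of local time rather than UTC is an assumption.

```c
#include <assert.h>
#include <time.h>

/* Sketch only, not the PBM source: compute "seconds since midnight" in
 * local time using standard C library calls. */
void sketch_getssm(int *result)
{
    time_t now = time(NULL);
    struct tm *local = localtime(&now);

    assert(result != NULL && local != NULL);
    *result = local->tm_hour * 3600 + local->tm_min * 60 + local->tm_sec;
}
```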

void pbm_init( FILE *fp)

- This is called at initialisation, and sets up all the stages, empty descriptors, memory areas etc. It is given data in the FILE *fp passed to it. The information read from the file is described in the following example:

  
     20000      Size of memory buffer in bytes
     500        Size of allocation quantum
     4          Number of STAGES in this stream
     25         Number of DESCRIPTORS
     1  16      MODE and ITEM - see below.
     1          Number of  this stream
     1 cv_int   Type and name for each stage. Type 1 is a PRODUCER
     2 cv_fetch type 2 is 100% consumer, type 4 is random consumer.
     2 cv_cdxmit
     4 cv_hist

If there are more streams to declare - up to PBM_MAXSTR - you make another call to pbm_init(). There is no limit to the number of STAGES in a STREAM. The STAGE type is a bit-encoded combination of:

     1   PRODUCER - must be first in stream
     2   CONSUMER - 100% consumer
     4   RANDOM consumer - eg for monitoring jobs
     8   FARM.  Producer  or  consumer  that  passes  events  to  one  of  the
         following  stages  of  type  FIELD that are attached.  At present the
         choice of which FIELD gets the event is on basis of lowest last event
         number.
     16  REORDER.  A consumer with REORDER set will receive  events  in  order
         [as  defined in pbm_d_evsn ] irrespective of how events are passed to
         it.
     32  FIELD.  An event is guaranteed to go through one of  [possibly  many]
         field  stages, but you don't know which.  Generally the stage after a
         group of FIELDs will REORDER, but this is not required.
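As an illustration of the bit encoding, the sketch below turns a stage type into a readable string. The bit values are the ones listed above; the macro names and the function describe_stage_type are hypothetical and may not match what is in pbm_defs.h.

```c
#include <assert.h>
#include <string.h>

/* Bit values as listed in the text; the names are hypothetical. */
#define STAGE_PRODUCER  1
#define STAGE_CONSUMER  2
#define STAGE_RANDOM    4
#define STAGE_FARM      8
#define STAGE_REORDER  16
#define STAGE_FIELD    32

/* Write a readable description of a stage type into out, which must be
 * at least 64 bytes long. */
void describe_stage_type(int type, char *out)
{
    static const struct { int bit; const char *name; } table[] = {
        { STAGE_PRODUCER, "PRODUCER" }, { STAGE_CONSUMER, "CONSUMER" },
        { STAGE_RANDOM,   "RANDOM"   }, { STAGE_FARM,     "FARM"     },
        { STAGE_REORDER,  "REORDER"  }, { STAGE_FIELD,    "FIELD"    }
    };
    size_t i;

    assert(out != NULL);
    out[0] = '\0';
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (type & table[i].bit) {
            if (out[0] != '\0')
                strcat(out, "|");
            strcat(out, table[i].name);
        }
    }
}
```

For example a type of 18 would describe a 100% consumer that also reorders, and a type of 9 a producer that farms events out to the following FIELD stages.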

Users are cautioned that there is a potential deadlock if events are so far out of order that the entire stream fills up while waiting for a low numbered event to appear. It is up to the user to declare a sensible stream configuration; only minimal checks are done by pbm_init().

When this call returns it will have created several shared memory areas. In OS9 these will be data modules. In other environments they might be mapped global sections.

MODE and ITEM need explaining. If MODE is 1, then the main memory buffer will be inside the shared data module and the start address of the data part will be rounded up to a multiple of ITEM bytes. If MODE is 2 then the main memory buffer will be in external memory starting at address ITEM. In this case the shared area created will only contain pointers and the occupancy bitmap. It is up to the caller to make sure that the external memory is available. Users should take note of the following:

  1. When reserving memory buffers the requested amount of memory will be ROUNDED UP to the next whole allocation quantum. If this occurs and the memory is external this could cause embarrassment. To avoid this problem make sure that <allocation quantum> is an exact divisor of <requested memory>.
  2. Searching the bitmap for contiguous free bits can be very time consuming if the bitmap is large. This can have a significant effect on deadtime. Try to keep the bitmap less than 100 bits.
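Both points can be checked with simple arithmetic before writing the configuration file. The sketch below assumes that the occupancy bitmap holds one bit per allocation quantum, which is not stated explicitly above; the function names are hypothetical.

```c
#include <assert.h>

/* Assumption: the occupancy bitmap holds one bit per allocation quantum. */
int bitmap_bits(int memory_size, int quantum)
{
    assert(memory_size > 0 && quantum > 0);
    return (memory_size + quantum - 1) / quantum;
}

/* Returns 1 if the quantum divides the memory size exactly, so that no
 * rounding up past the end of external memory can occur, otherwise 0. */
int quantum_is_exact_divisor(int memory_size, int quantum)
{
    assert(memory_size > 0 && quantum > 0);
    return memory_size % quantum == 0;
}
```

Under that assumption the example configuration above [20000 bytes with a quantum of 500] gives a bitmap of 40 bits, which is within the suggested limit, and the quantum is an exact divisor of the memory size.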

pbm_relevt(int index, struct pbm_ptr *uptr)

- When a stage has finished with an event it uses this call. <index> should be the same value that was returned by a previous pbm_getbuf or pbm_getevt.

pbm_relmem(int index, struct pbm_ptr *uptr)

- This is used to release unused memory. Once a complete event is built - which may take one or more stages - unused memory can be returned to the buffer manager. This call does that by using the difference between pbm_d_bufsize - the size originally granted - and pbm_d_evlen - the actual event length. These items are both in the descriptor. If the user has been careless enough to overfill the buffer a fatal error will occur.
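The arithmetic involved might look like the sketch below. It assumes that memory can only be handed back in whole allocation quanta, which is a guess based on the allocation scheme and not something stated above; the function name releasable_bytes is hypothetical.

```c
#include <assert.h>

/* Sketch only: number of bytes that could be handed back once the event
 * length is known, or -1 if the event overfilled the buffer [the case in
 * which PBM itself raises a fatal error].
 * Assumption: memory is returned in whole quanta only. */
int releasable_bytes(int bufsize, int evlen, int quantum)
{
    int kept;

    assert(bufsize >= 0 && evlen >= 0 && quantum > 0);
    if (evlen > bufsize)
        return -1;
    /* Round the part still in use up to a whole number of quanta. */
    kept = ((evlen + quantum - 1) / quantum) * quantum;
    if (kept > bufsize)
        kept = bufsize;
    return bufsize - kept;
}
```

With a granted size of 1500 bytes, a quantum of 500 and an event of 700 bytes, this sketch would hand back 500 bytes.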

void pbm_sleep(int milliseconds)

- Use this to wait for a bit. This is operating system independent and arranges for your closedown routine to get called if an appropriate signal arrives during the sleep.

pbm_start(int lastevsn, struct pbm_ptr *uptr)

- Can only be called by a PRODUCER stage, and is required so as to set up the event re-ordering logic at start of run.

Internal Routines

void pbmq_add(int *header, int descriptor, struct pbm_ptr *uptr)

- Adds the <descriptor> to the end of the queue with header at *header.

void pbmq_addo(int *header, int descriptor, struct pbm_ptr *uptr)

Adds the <descriptor> to the queue with header at *header. Inserts in correct event number order.

pbmq_rem(int *header, struct pbm_ptr *uptr)

- Returns the descriptor number of the descriptor just removed from the head of the queue. If the queue had no contents it returns -1.

pbmq_remo(int *header, int evsn, struct pbm_ptr *uptr)

- Removes from queue in strict event number order. Requires that previous consumer(s) use the pbmq_addo() to insert properly. If the desired event is not there, or the queue is empty it returns -1. If the event number at the head of the queue is too small this signifies a logic error in the program - so an error exit will occur.
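The idea behind ordered insertion and removal from the head can be sketched with a queue whose links are descriptor numbers rather than pointers. Everything here is a simplified stand-in: struct desc, NDESC, NONE and the function names are hypothetical and do not reflect the real QUEUE HEADER or DESCRIPTOR layout.

```c
#include <assert.h>

#define NDESC 8       /* hypothetical number of descriptors */
#define NONE  (-1)    /* marks an empty link */

/* Hypothetical, much simplified descriptor: serial number and link only. */
struct desc { int evsn; int next; };

/* Insert descriptor d into the queue starting at *head so that event
 * serial numbers stay in ascending order. */
void queue_add_ordered(int *head, int d, struct desc *table)
{
    int *link = head;

    assert(d >= 0 && d < NDESC);
    while (*link != NONE && table[*link].evsn <= table[d].evsn)
        link = &table[*link].next;
    table[d].next = *link;
    *link = d;
}

/* Remove and return the descriptor at the head, or -1 if queue is empty. */
int queue_remove(int *head, struct desc *table)
{
    int d = *head;

    if (d != NONE)
        *head = table[d].next;
    return d;
}
```

Linking by descriptor number keeps the queue valid whatever address the shared area is mapped at in each process.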

void pbm_edecode(int error_code)

- This only exists in POSIX environments and is required to try to make sense of the silly error numbers that POSIX compliant calls return.

void pbm_locase(char *string)

- Converts the string to lower case.

void pbm_lock(struct pbm_ptr *uptr)

- Sets an interlock on the data structures of the buffer manager.

void *pbm_mapto(char *name, int size, int *error)

- Use this to map to a shared memory area that has previously been created with pbm_mkshm(). If the area does not exist <error> will be filled with an appropriate code. Returns a pointer to the start of the area.

void pbm_mksem(char *name, int *ids)

- Creates a semaphore of the given <name> and returns the id into <ids>. The <ids> must be in a shared memory module or else subsequent lock/unlock calls will not work.

void *pbm_mkshm(char *name, int size)

- Will create and map to a shared memory area called <name> of length <size>. If this fails the program will give error and exit, otherwise it returns the start address.

char *pbm_nodename()

- Returns a pointer to a string containing the nodename of the CPU on which the call is issued. Where this nodename comes from is very system dependent.

void pbm_notify(struct pbm_stage *sp)

- Use this to tell the STAGE pointed to by *sp that it now has an event to be processed. This will not be called unless the buffer manager KNOWS that an event has just been placed on the appropriate queue.

void pbm_semclose(int id)

- Use this to close the semaphore opened by pbm_semopen. The <id> should be the one in per-process space returned by pbm_semopen.

void pbm_semopen(char *name, int *id, int *ids)

- Use this to get access to a semaphore previously created by pbm_mksem(). <ids> should point to the value returned by pbm_mksem() and must be in shared memory. <id> should be in per-process space.

void pbm_setexit(struct pbm_ptr *uptr)

- Called from pbm_attach() to set up an appropriate exit handler. This may involve creating AST routines that get called at the relevant time. All user programs that use the buffer manager MUST have a closedown() routine. This might be called at AST level - so be careful what you do there.

void pbm_unlock(struct pbm_ptr *uptr)

- Releases the data interlock.

void pbm_unmap(void *address, int size)

- Call this with the address returned by pbm_mkshm() or pbm_mapto() to reverse the effect. In VOS environments this call does nothing since it does not seem to be required. The <size> should be equal to the size originally mapped in pbm_mapto().

void pbm_upcase(char *string)

- Converts the string to upper case.

void pbm_waitsig(struct pbm_ptr *uptr)

- The routines pbm_getbuf() and pbm_getevt() will call this when they have no events at present. When another routine calls pbm_notify() then this caller will return.

Dynamic Storage Allocation

These routines are needed for the buffer manager to be able to operate. They can be used independently. As in the pbm_ routines there is common use of a struct pdsa_ptr, and in buffer manager applications this is usually hidden inside the struct pbm_ptr.

pdsai_free(int position, int numbits, struct pdsa_ptr *uptr)

- Release bits from the bitmap. Returns -1 if they were not all marked as used.

pdsai_get(int numbits, struct pdsa_ptr *uptr)

- Searches the bitmap for <numbits> contiguous free bits. If successful marks them as used and returns the starting bit number. If there are not enough free bits it returns -1.
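A first-fit search of this kind can be sketched as below. This is not the PBM source: for clarity the map uses one byte per bit, the function names are hypothetical, and whether the real pdsai_free() still clears the bits when it returns -1 is an assumption.

```c
#include <assert.h>

/* Find numbits contiguous free entries in map [one byte per bit], mark
 * them used and return the starting position, or -1 if none is found. */
int bitmap_get(unsigned char *map, int mapbits, int numbits)
{
    int i, j, run = 0;

    assert(map != 0 && mapbits > 0 && numbits > 0);
    for (i = 0; i < mapbits; i++) {
        run = map[i] ? 0 : run + 1;
        if (run == numbits) {
            int start = i - numbits + 1;
            for (j = start; j <= i; j++)
                map[j] = 1;
            return start;
        }
    }
    return -1;
}

/* Clear numbits entries from position. Returns 0 if all were marked as
 * used beforehand, otherwise -1. */
int bitmap_free(unsigned char *map, int mapbits, int position, int numbits)
{
    int i, status = 0;

    assert(position >= 0 && numbits > 0 && position + numbits <= mapbits);
    for (i = position; i < position + numbits; i++) {
        if (!map[i])
            status = -1;
        map[i] = 0;
    }
    return status;
}
```

The search is a linear scan, so its cost grows with the size of the bitmap.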

void pdsa_attach(char *name, struct pdsa_ptr *uptr)

- Is called usually from pbm_attach() and simply sets up the linkage between the caller and the shared memory previously created by pdsa_init(). The <struct pdsa_ptr *uptr> returned to the caller will have been set up for use in subsequent pdsa_get() and pdsa_free() calls. This makes sure that a single program can manage several unrelated memory areas.

void pdsa_detach(struct pdsa_ptr *uptr)

- Reverses the effect of pdsa_attach. Usually called from pbm_detach() but used elsewhere in some cases.

pdsa_free(char **begadr, char **endadr, struct pdsa_ptr *uptr)

- Is used to free the memory between <begadr> and <endadr>. The calling sequence specifies two addresses rather than a start address and length because of potential problems with rounding.

pdsa_get(int *numbytes, char **address, struct pdsa_ptr *uptr)

- Is used to get memory. <numbytes> is the request, which will get rounded up to the next allocation quantum boundary. <address> is the start of the allocated memory. Returns zero if successful. If not enough free memory is available, returns -1. It is up to the caller to wait until a more appropriate time and try again.

pdsa_init(char *name, int size, int quantum, int mode, int item)

This is called at initialisation time to set up the data structures and create the shared memory area in which the main allocatable memory will reside. <quantum> is the number of bytes in the minimum allocation. <size> is the amount of memory desired, which may well be rounded up on output to be the actual amount allocated.

<name> gives the name of the shared memory area to be created.

<mode> is either PDSA_INTERNAL or PDSA_EXTERNAL.

<item> is the rounding for the base address of the data part of the buffer for _INTERNAL and is the address of the external memory to be used as data if _EXTERNAL.

[Read the notes describing the pbm_init() call to make sure you understand these arguments. ]

SETTING UP ON A NEW SYSTEM

The common source is maintained in a pair of text files by means of a simple suite of text library manipulation programs. [Also fully portable!] These are PBM.010 for the library, and PBM_PROGS.010 for the executable programs. To build on a fresh system you need these sources, plus a build job relevant to the current system. The build job makes sure that the correct C options are selected and that the correct set of operating system specific calls [where relevant] are included in the pbm library.

When the build job completes you will find that it has generated a linkable library pbm.olb or pbm.l etc depending on the system. It may also have made some of the following programs:

  1. pbm_delgbl is only for the native VMS case, and arranges to delete a shared memory area, which is a mapped global section in VMS terminology.
  2. pbm_dump gives a summary of the state of any existing streams, whether programs are attached to the stages, usage, queue lengths etc.
  3. pbm_kill is used to kill an attached program by name. Using this makes sure that the closedown() routine is called and the stage will detach properly from the stream. If you use whatever the operating system provides to kill such a program (or it crashes for some reason) then the stream will hang since it still appears to be attached.
  4. pbm_setnode is only required in OS9 systems, and is used to create a data module containing the nodename. This solves problems in getting a nodename which might be from a TP4, TCPIP, OS9NET etc data base depending on how the OS9 system is configured.
  5. pbm_xqdispb can be run to get a dynamic display of the usage of various parts of the buffer manager. This does not work yet on all supported systems.

There is a simple skeleton suite which is maintained in the same way. After building it you will have some dummy event generator and consumer programs and some rudimentary control programs. These are usually used to make sure that PBM is successfully built on the target system.

Building the skeleton suite on a particular system will require additional modules in the link step, particularly with regard to the error message utility. Rather than using direct EMU calls, the skeleton suite relies on a module msg_srs containing the following routines:

msg_init(char *suite_name, char *program_name)

- Use this to initialise the message system. Usually this will cause an emu_init() call to be done on your behalf. If NOEMU is defined when compiling module MSG_SRS then messages come to standard output instead.

msg_out(char *severity, char *text_string)

- Use this rather than direct print statements to send out messages. Normally this will cause the message to be injected into the EMU system, where the severity together with the suitename etc. can be used for routing and filtering messages to different destinations. 'severity' is a single character which causes properties to be attached to the message as follows:

    severity   property        example of use
     I          INF             For local display only
     W          WARN            eg occasional readout failure
     F          FATAL           eg bus-trap causing suite closedown
     D          DEBUG           Occasionally useful
     O          OPER            eg BOOT_OK message
     L          LOG             to go to logfile only.
     S          SUCCESS
     E          ERROR
     R          RC              for run control related messages

Using msg_srs as an interface between the skeleton and the EMU provides a very easy way to rebuild for use on a system without EMU by simply replacing msg_srs by something much simpler.
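A much simpler replacement of that kind might look like the sketch below, which formats messages and writes them to standard output. The severity letters and property names are those in the table above; the function names [prefixed simple_ to avoid any clash with the real msg_ routines], the buffer sizes and the output format are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char suite[32];      /* suite name given at initialisation */
static char program[32];    /* program name given at initialisation */

/* Map a severity letter to its property name, as in the table above. */
const char *msg_property(char severity)
{
    switch (severity) {
    case 'I': return "INF";
    case 'W': return "WARN";
    case 'F': return "FATAL";
    case 'D': return "DEBUG";
    case 'O': return "OPER";
    case 'L': return "LOG";
    case 'S': return "SUCCESS";
    case 'E': return "ERROR";
    case 'R': return "RC";
    default:  return "UNKNOWN";
    }
}

/* Remember the suite and program names for later messages. */
void simple_msg_init(const char *suite_name, const char *program_name)
{
    assert(suite_name != NULL && program_name != NULL);
    strncpy(suite, suite_name, sizeof suite - 1);
    suite[sizeof suite - 1] = '\0';
    strncpy(program, program_name, sizeof program - 1);
    program[sizeof program - 1] = '\0';
}

/* Build the message text into out; returns what snprintf returns. */
int simple_msg_format(char *out, size_t outsize,
                      const char *severity, const char *text)
{
    assert(out != NULL && severity != NULL && text != NULL);
    return snprintf(out, outsize, "%s/%s %s: %s",
                    suite, program, msg_property(severity[0]), text);
}

/* Send the message to standard output instead of EMU. */
void simple_msg_out(const char *severity, const char *text)
{
    char line[256];

    simple_msg_format(line, sizeof line, severity, text);
    puts(line);
}
```

Keeping the formatting separate from the output step makes it easy to redirect messages elsewhere later, for example to a logfile.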

Now that all supported systems are using the portable version of EMU known as EMUX, the necessary msg_ routines are to be found in the EMUX_SHARE, emulib.l or libemu.a depending on the system.

PORTING CONSIDERATIONS

The pieces of code that differ in the different operating environments have been removed into a small number of routines. At present these contain conditional code for _POSIX_SOURCE, VOS, OS9 and NATIVEVMS. If many different systems are to be supported it might be better to have independent versions of these routines rather than one full of conditional code. However it is hoped that the POSIX option will predominate.

User code should NOT contain conditional code. If you find this is necessary then it signifies that extra routines ought to be provided within PBM.

POSIX Environment

The version of POSIX supported is IEEE 1003.1 - the only agreed standard, and a few of the routines in the 'real time extensions' from IEEE 1003.4 draft 9. It is known that this is being actively revised so that changes are inevitable. I have tried to keep the 1003.4 compliant calls down to the minimum.

Starting with version 007, the routines which make POSIX real time calls have been extracted into a small independent library. One version of this library exists for 1003.4 draft 9, and is used in VMS POSIX and LynxOS environments. A similar library for the Apollo DN10000, ULTRIX and for SUN exists using appropriate system calls. Moving to later drafts of 1003.4 should only be a matter of changing this library file. This is the px4_ library, described below.

There are a few 'special files' related to shared memory and semaphores. You should define an environment variable PBM_DATA which equates to an appropriate directory before using the pbm_ routines.

For interprocess signalling I use SIGUSR1 and SIGUSR2. Until the arrival of better real-time extensions which allow many more user signals, this may cause problems if you link to other applications that also require these signals.
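A minimal sketch of catching the two user signals with the standard POSIX sigaction() call is given below. Which signal means 'event available' and which means 'closedown' is an assumption made for illustration, and the names event_pending, closedown_requested, on_signal and install_handlers are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Flags set from the handler; sig_atomic_t is safe to write there. */
static volatile sig_atomic_t event_pending = 0;
static volatile sig_atomic_t closedown_requested = 0;

/* Assumption: SIGUSR1 announces an event, SIGUSR2 requests closedown. */
static void on_signal(int signo)
{
    if (signo == SIGUSR1)
        event_pending = 1;
    else if (signo == SIGUSR2)
        closedown_requested = 1;
}

/* Install the handler for both user signals. Returns 0 or -1. */
int install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

Since a process has only one handler per signal, any other library linked into the same program that installs its own SIGUSR1 or SIGUSR2 handler would replace this one, which is the conflict described above.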

The px4_ Library

Those calls relating to POSIX real time [1003.4] have been moved into a separate small library, so as to make it easier to move to newer revisions etc. These are all very low level simple routines. Generally they return -1 on error, but there are exceptions.

px4_mksem( char *name)

- Creates a semaphore with the desired name.

px4_sem_open( char *name)

- Returns a semaphore ID that should be used by later uses of the semaphore.

px4_semwait( semid)

- Waits for a semaphore to become free, and takes it.

px4_sempost( semid)

- Releases the semaphore. If another process is waiting it can now proceed.

px4_mkshm( char *name, int size)

- Create a shared memory area.

px4_shm_open_and_map( char *name, int size)

- Returns a pointer to the start of the shared area. If the map fails it returns NULL, as required by the shmmap() call in 1003.4D9.

px4_shmunmap( char *address, int size)

- Unmap a shared area.

px4_msleep( int waittime)

- Wait the specified number of milliseconds.
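On a current POSIX system a millisecond wait of this kind could be written with nanosleep(), as sketched below. This is not the px4_ source, and the draft 9 library may well use a different call; the name sketch_msleep and the choice to resume the wait after an interrupting signal are assumptions.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sketch only: wait for waittime milliseconds. Returns 0 when the full
 * time has elapsed, or -1 on an error other than interruption. */
int sketch_msleep(int waittime)
{
    struct timespec req, rem;

    assert(waittime >= 0);
    req.tv_sec = waittime / 1000;
    req.tv_nsec = (long)(waittime % 1000) * 1000000L;
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;      /* interrupted by a signal: sleep the remainder */
    }
    return 0;
}
```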

VOS Environment

The VOS library hides all the details about the underlying operating system. If you have VOS on your system then it should be easy to get PBM to work. Given that VOS is available on VMS, OS9 and several UNIX systems this may provide a quick way to get started. However you are cautioned that the overheads in memory and/or performance may be significant.

There is now no real need for VOS, given the existence of NATIVE_VMS and POSIX versions. A future release of PBM will remove all traces of VOS from the source code.

OS9 Environment

Using OS9 specific system calls where relevant turned out to be fairly easy. Shareable memory is implemented as shared data modules, interlocks as OS9 events, and inter process signalling by using signals. At present I use signal 257 for closedown, 259 for event. This means that a suite of acquisition programs can be 'controlled' by the XX_DAQ program in the normal OPAL data acquisition environment.

NATIVE_VMS Environment

This is now supported in version 010 and later. It means that PBM can run on VMS systems - either VAX or Alpha/AXP ones - without the overhead of VOS or of VMS POSIX. Shared memory is in system global sections. Interlocks are system locks, with the name uniquely qualified by the nodename to avoid problems with interference between PBM running in different nodes on the same cluster. Inter process signalling uses the sys$hiber and sys$wake calls. Closedown is managed by declaring proper exit handlers, and using the sys$forcex call.

You should define [/SYS/EXEC] the logical PBM_DATA to a directory in which section files will be created. In a cluster this should be a different directory per node.

Routines That Have Conditional Code

Refer to this list when wondering what to do in a new environment. Conditional code is usually for _POSIX_SOURCE, VOS or OS9. With the separate px4_ library the POSIX options mentioned here should only have to change if there is a major alteration in the 1003.1 standard.

You may wish to add a new conditional assembly option for something different - eg native VMS calls.

  1. PBM_ATTACH semaphores and interprocess notification
  2. PBM_GETSSM getting system time
  3. PBM_LOCK interlock with semaphore or equivalent
  4. PBM_MAPTO get access to shared memory
  5. PBM_MKSEM create semaphore
  6. PBM_MKSHM create shared memory
  7. PBM_NODENAME to get the nodename
  8. PBM_NOTIFY interprocess notification with signals etc
  9. PBM_SEMCLOSE is VMS only at present
  10. PBM_SEMOPEN opens a semaphore
  11. PBM_SETEXIT prepares for orderly closedown
  12. PBM_SLEEP various different ways of waiting a bit
  13. PBM_UNLOCK releases lock in the relevant way
  14. PBM_UNMAP unmap shared memory. Not really required except in POSIX
  15. PBM_WAITSIG waits for notification from some other process

PROBLEMS AND RESTRICTIONS

On systems that support shared libraries it would be very nice to set up PBM in this way. The memory saving is quite small, but the advantage is that the shared library can be rebuilt to fix a problem without requiring all users to relink their programs. This has been tried on OS9, SUN and is 'easy' to setup for VMS. On OS9 there is a tricky relationship between the use of the 'book_sig' routine and the pbm and catstp4 shareable library. Until this is solved the shareable library option should not be used in OS9.

Remember that PBM is NOT callable from Fortran. It is possible to set up a suitable method of call conversion in VMS and OS9 systems, but a general portable way will have to wait until the relevant POSIX committee[s] have come up with a general solution.

When running the native VMS version, the user program will require privileges SYSLCK, SYSGBL, PRMGBL to work properly. It is intended to add some VMS magic to avoid this in a future release.

FUTURE OF PBM

It is highly likely that the VOS option will be removed from the next version. This is no longer necessary given the existence of the native VMS version.

I am gradually adding extra useful routines to the pbm library with a view to making it include ALL conditional code that users might require when writing their real-time programs. Extra routines will be added as the need arises.

This page was last updated on 16 December, 2013