## Specification of the PreProcessor ASIC (PPrASIC)

## for the

## **ATLAS Level-1 Calorimeter Trigger**

The IHEP Heidelberg Group

July 1999

#### Abstract

The PreProcessor of the ATLAS Level-1 Calorimeter Trigger prepares digital input for processors, which identify objects of interest for the decision making. The PreProcessor ASIC (PPrASIC) to be designed plays a key role in the Level-1 PreProcessor system. It receives digital data from Flash Analog-to-Digital Converters. The tasks of the ASIC comprise the alignment of pipe-lined data to the same LHC clock cycle, the dead-time free readout of input data to the trigger, the identification of the bunch-crossing where the physics event occurred, the fine calibration in a Look-Up-Table and, finally, multiplexing data for efficient use of a fast serial transmission to the Cluster Processor. A second data path is built by adding up trigger cells, decreasing the granularity for use in the Jet- and transverse Energy processor.

The ASIC shall process two trigger tower inputs from the ATLAS calorimetry. It will be mounted on an MCM together with other, mostly commercial "dies". One MCM processes four trigger channels. It is the smallest exchangeable building block in the ATLAS PreProcessor system. This document shall serve as a base for the ATLAS-internal "Preliminary Design Review" (PDR of PPrASIC).

Members of the panel for the Preliminary Design Review were: P. Bright-Thomas, N. Gee, V. Perera and U. Schaefer.

## Contents:

## 1. The functional position of the PPrASIC in the Trigger system.

- 1.1 Relevant Documents.
- 1.2 Requirements.
- 1.3 Block Diagram.

#### 2. The "real-time" data path in the PPrASIC.

- 2.1 Digital Input from the FADC; resynchronisation to the LHC clock.
- 2.2 The FIFO of programmable depth; data alignment to same LHC clock cycle.
- 2.3 The FIR-Filter BCID for non-saturated Calorimeter pulses.
- 2.4 The "leading edge" digital BCID for saturated Calorimeter pulses.
- 2.4.1 The "leading edge" analog BCID (discriminator) for saturated pulses.
- 2.5 The peak-finder for non-saturated pulses.
- 2.6 The Look-Up-Table in parallel to the peak-finder; reduction of data width.
- 2.6.1 The BCID Decision Logic.
- 2.7 The BC multiplexing scheme for the Cluster Processor; flag bit generation.
- 2.8 The two-stage adder for jet-cells; data width to Jet/Et Processor.
- 2.9 Error code generation for both output paths (Cluster and Jet/Et).

#### 3. Data Input for trigger system checking.

- 3.1 The PlayBack memory and its various purposes.
- 3.2 Loading the PlayBack memory and running PlayBack.
- 3.3 Monitoring; the PlayBack memory as hardware histogram storage during DAQ.

## 4. Data ReadOut for performance monitoring and physics verification.

- 4.1 The points of ReadOut.
- 4.2 The "scrolling" memory for the FADC data readout.
- 4.3 The second "scrolling" memory for simultaneous ReadOut.
- 4.4 "Write and Read Pointer" management; the derandomizing buffers.

## 5. Operation modes of the PPrASIC for Checkout and DAQ.

- 5.1 The serial I/O interface for Readout and Load/Control.
- 5.1.1 Architecture.
- 5.1.2 Data Format.
- 5.1.3 Bandwidth Considerations.
- 5.2 External Control Signal Input/Output for the PPrASIC.
- 5.3 The Software View of the PPrASIC.

## 6. The built-in test facilities.

#### 7. The manufacturing of the PPrASIC.

- 7.1 The realization in a description language.
- 7.2 The simulation of the design (timing, cross-talk, ...).
- 7.3 The synthesis into a chosen production process.
- 7.4 The production and post-production testing to obtain "known good dies".

## NOTE on technical implementation.

## Summary of the "Preliminary Design Review".

## Appendix A: Registers and Memories

## Appendix B: Input/Output Signals

## 1. The functional position of the PPrASIC in the Trigger system.

The specification describes an "Application Specific Integrated Circuit" (ASIC), which plays the key role in the Pre-Processor system of the ATLAS Level-1 Calorimeter Trigger. The ASIC, called PPrASIC, is placed on a Multi-Chip-Module (MCM) together with several other, commercially available chip-dies. The MCM itself is the essential building block of the Pre-Processor Modules (PPMs). These modules are Printed-Circuit-Boards (PCBs) with 64 signal channels. The Pre-Processor system comprises 128 such PPMs.

Each channel takes in one analog signal from an ATLAS calorimeter, performs the necessary functions, and sends digital output to the subsequent Level-1 Trigger Processors. This is shown in the simplified block diagram of the surrounding MCM (Figure 1). The larger scale items - the MCM and the PPM - are the subject of forthcoming specifications (and "Preliminary Design Review" procedures). This particular document is aimed at the smallest unit only, the PPrASIC.



Figure 1: Block Diagram of the PreProcessor MCM

The guide-line to the PPrASIC is the block-diagram (Figure 2) at the end of chapter 1. Chapter 2 describes the treatment of digital input to the "real-time" output needed for the decision making in the downstream processors: the "Cluster Processor (CP)" and the "Jet and transverse Energy Processor (Jet/Et)". This constitutes the part of the pipelined data path, which contributes to the total Level-1 latency. Possibilities for checking out the Level-1 system in "stand-alone" mode by injecting digital data at the "earliest possible" point are outlined in chapter 3. The memories for "PlayBack" can be used in normal ATLAS Data Acquisition (DAQ) to histogram input data in hardware - a feature considered to be useful for fast diagnostics of ATLAS running conditions.

Chapter 4 describes the read-out of trigger data into the ATLAS DAQ stream. Since the trigger constitutes a separate data path from the detectors downwards, this read-out is an essential means for cross-checking system performance as well as physics results.

Chapters 5 and 6 are concerned with electronic access to the chip and operation modes of the PPrASIC. The production technique and test procedures to obtain "known good dies" are outlined in chapter 7.

#### 1.1 Relevant Documents.

The documents of the ATLAS collaboration, on which this design is based, are:

- ATLAS Level-1 Calorimeter Trigger: User Requirements Document, LVL1-Calo-URD-1.1.0, 20Apr1998.
- ATLAS Level-1 Trigger: Technical Design Report, CERN-LHCC-98-14, ATLAS-TDR-12, 30Jun1998.

The following documents are related to this one:

• Design Guide for the PPrASIC, C.Schumacher, IHEP Heidelberg: http://wwwasic.ihep.uni-heidelberg.de/atlas/L1/DISCUSS/designguide.ps

• Note on BCID for saturated calorimeter pulses, U.Pfeiffer, IHEP Heidelberg: http://wwwasic.ihep.uni-heidelberg.de/atlas/L1/DISCUSS/bcid\_note.ps

• A description of monitoring rates in hardware as well as "raw" transverse energy spectra by K.Mahboubi: http://wwwasic.ihep.uni-heidelberg.de/atlas/L1/DISCUSS/ppa\_his\_rate.pdf

## 1.2 Requirements.

The following summarizes the requirements and the tasks, which need to be performed in the PPrASIC:

- Implement two calorimeter channels on one ASIC (see also the NOTE after chapter 7).
- Synchronize data from FADC @40 MHz into FIFO line of programmable depth for event coherence.
- Process the data downstream in pipelined mode @40 MHz (free of dead-time).
- Perform BCID on non-saturated as well as saturated calorimeter signals.
- Accept an "external bit" from a BCID source outside of the PPrASIC.
- Provide storage for "PlayBack" data and input to "real-time" path @40 MHz.
- Provide storage in hardware for monitoring data (histograms).
- Provide monitoring of rates for each trigger cell.
- Provide ReadOut capability of real-time data at different locations with buffering (derandomization).
- Implement a Look-Up Table for fine calibration and reduction of data-width to 8 bit.
- Provide 2-to-1 multiplexing (BC-mux) for effective use of serial transmitter to the Cluster Processor (CP).
- Generate error-bits on transmission path to CP.
- Perform two-stage summing to provide 9-bit data for Jet/Et Processor (JEP).
- Generate error-bits on transmission path to Jet/Et.
- Provide "read back" capability for set-up/control data loaded (as far as possible).
- Stay within the latency limits outlined in the TDR for the Level-1 trigger.
- Keep power consumption of the PPrASIC at the lowest possible value for reliability of the MCM assembly.

## 1.3 Block Diagram.



Figure 2: Block Diagram of the PreProcessor ASIC

## 2. The "real-time" data path in the PPrASIC.

In the design of the Level-1 trigger as a system and, therefore, also for its components, it is of prime importance to keep the latency account below a given value specified in the Level-1 TDR. This value is 2 µsec for the ATLAS Level-1 trigger. All components along the so-called "real-time" path, the PPrASIC included, must be designed for minimum consumption of propagation time in terms of LHC clock cycles. Since all LHC clock ticks (apart from accelerator-specific gaps) contain bunch crossings, pipeline processing imposes itself.

#### 2.1 Digital Input from the FADC; resynchronisation to the LHC clock.

Analog signals arriving from the ATLAS calorimetry are converted to digital values every 25 nsec (40 MHz) and directly propagated to the PPrASIC.

• Data from an FADC-die on the same MCM must be received as 10 bit parallel. The FADC digitization and the output encoding is driven by a FADC strobe signal.

• The FADC strobe can have ANY phase displacement with respect to the LHC clock. Hence, latching of data at the input to the PPrASIC must be performed with the negative or positive clock edge depending on the size of phase-displacement, i.e. an "edge"-selection is required. The setting for the individual channel must be software defined corresponding to the delay-setting in the Timer-chip for this channel.

• Data inputs from the FADC as well as all outputs are TTL.

• The supply voltage of the PPrASIC is +5.0 Volt, possibly 3.3 Volt.

#### 2.2 The FIFO of programmable depth; data alignment to same LHC clock cycle.

The ATLAS calorimetry is a very heterogeneous set-up of detectors and associated electronics. Hence, a FIFO buffer is required to align digital data in terms of LHC clock ticks for coherence.

• The number of FIFO locations needed is 12. This compensates up to (12 \* 25 ns =) 300 nsec of differences in cable length anywhere between detector and PreProcessor system. Assuming 5 nsec /meter propagation speed in cables, this would correspond to 60 meters. The design will implement the next binary default, which is 16 (sixteen) locations, since there are not only cables but different calorimeter types with very different analog electronics.

• Programmability of the FIFO depth used must be provided for each channel individually.

• In case the BCID for saturated pulses is implemented as a "leading edge" discriminator, the path width of the FIFO is extended to transport the additional bit for usage downstream. Hence, the FIFO width and all dependent path widths will be 11 bit.

• The FIFO depth for the 11th bit is programmable separately from the 10 bit FADC data stream. This is to adjust timing for an external BCID mechanism.
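The alignment described above can be pictured as a per-channel delay line. The following is a minimal sketch, assuming a simple shift-register model; the class name `AlignmentFifo`, the allowed depth range and the zero-filled start-up state are illustrative assumptions and not part of this specification.

```python
from collections import deque

MAX_DEPTH = 16  # locations implemented per channel (see text)


class AlignmentFifo:
    """Delay line that holds each sample for `depth` LHC clock ticks."""

    def __init__(self, depth):
        # Assumption: any depth from 0 to MAX_DEPTH can be programmed.
        if not 0 <= depth <= MAX_DEPTH:
            raise ValueError("depth must lie between 0 and 16")
        self.depth = depth
        # Assumption: the FIFO starts out filled with zeros.
        self.cells = deque([0] * depth)

    def clock(self, sample):
        """Push one sample in, return the sample leaving the FIFO."""
        if self.depth == 0:
            return sample
        self.cells.append(sample)
        return self.cells.popleft()


# Two channels see the same pulse, the second one three ticks later.
early = [0, 5, 9, 5, 0, 0, 0, 0]
late = [0, 0, 0, 0, 5, 9, 5, 0]
fifo_early, fifo_late = AlignmentFifo(3), AlignmentFifo(0)
aligned_early = [fifo_early.clock(s) for s in early]
aligned_late = [fifo_late.clock(s) for s in late]
```

Programming a depth of three ticks on the early channel brings both pulses into the same LHC clock cycle.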

#### 2.3 The FIR-Filter BCID for non-saturated Calorimeter pulses.

The FIR filter suppresses noise by application of weights to the digital samples. Furthermore, it enhances the accuracy of the energy estimate by integration over the samples, which are a histogram representation of the original analog signal.

• The number of samples in the FIR filter pipeline is defined to be 5 (five), sufficient to provide good BCID efficiency.

• Four-bit wide coefficients are accurate enough for good noise rejection (see TDR). These coefficients must be programmable for each channel individually to comply with different pulse-shapes. The FIRST and the LAST of the five coefficients shall be signed values, thus their accuracy is reduced to a sign plus 3 bits. The signed values allow negative weights to be applied, which should help in suppressing pulse tails caused by e.g. "pile-up".

• Data of 10-bit multiplied with 4-bit coefficients result in 14-bit wide values for each weighted sample. Adding-up of 5 samples yields a single, 17-bit wide result. This result will be truncated to 10 bits for the peakfinder and the LUT, which is accessed in parallel.
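The width bookkeeping and the filter operation can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names, the example pulse and the coefficient set `[-1, 2, 4, 2, -1]` are hypothetical, and clipping negative sums to zero is an assumption that the text does not specify.

```python
SAMPLE_MAX = (1 << 10) - 1   # largest 10-bit FADC value
COEFF_MAX = (1 << 4) - 1     # largest unsigned 4-bit coefficient

# Width bookkeeping quoted in the text.
product_bits = (SAMPLE_MAX * COEFF_MAX).bit_length()    # one weighted sample
sum_bits = (5 * SAMPLE_MAX * COEFF_MAX).bit_length()    # sum of five samples


def fir_sum(samples, coeffs):
    """Weighted sum over five consecutive FADC samples."""
    if len(samples) != 5 or len(coeffs) != 5:
        raise ValueError("the filter uses exactly five samples")
    total = sum(s * c for s, c in zip(samples, coeffs))
    return max(total, 0)  # assumption: negative sums are clipped to zero


def fir_stream(fadc, coeffs):
    """Slide the five-sample window along a stream of FADC values."""
    return [fir_sum(fadc[i:i + 5], coeffs) for i in range(len(fadc) - 4)]


pulse = [0, 0, 10, 40, 100, 60, 20, 5, 0, 0]          # hypothetical pulse
passthrough = fir_stream(pulse, [0, 0, 1, 0, 0])      # "trivial" coefficients
weighted = fir_stream(pulse, [-1, 2, 4, 2, -1])       # signed first/last weight
```

With the trivial coefficients the samples pass unchanged; with the weighted set the maximum stays in the same time-slice while the tail is pulled down by the negative edge weights.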

#### 2.4 The "leading edge" digital BCID for saturated pulses (using FADC samples).

The method is described in a separate note titled "Bunch Crossing Identification for saturated pulses". The simulation work done with "PSpice" and "Ptolemy" is outlined there in detail. Conclusions and the way of implementation on the PPrASIC are also given. It is available at:

http://wwwasic.ihep.uni-heidelberg.de/atlas/L1/DISCUSS/bcid\_note.ps

This BCID method makes use of slew-rate limitation given by active analog components in the signal transmission. The limitation guarantees at least TWO samples on any analog rising edge. It combines BCID efficiency for saturated pulses with enormous simplicity in design and implementation.

Caveat: Almost all analog components, the Tower-Builder included, are located in front-end crates on the detector perimeter. They are exposed to heavy neutron- and gamma radiation. It could turn out to be risky to rely on a fixed slew-rate per OpAmp (µvolt/µsec) during the entire operation of ATLAS. Unknown effects of irradiation on analog components may lead to pulse-distortions of unpredictable nature. There are indications that e.g. shaper circuits increase their output rise-time when irradiated.

## 2.4.1 The "leading edge" analog BCID for saturated pulses (using a discriminator).

This **option**, outlined in the TDR, is shown to work for all imaginable scenarios of input pulse distortions as long as a unique threshold crossing is present [see studies of W.Hoetzel presented at the Level-1 Heidelberg meeting on 1Dec98]. A clock-independent discriminator senses the start of the pulse and allocates the signal into a time-slice using the pulse peaking time defined by the shaper. It represents, however, a considerable amount of additional hardware on the MCM. Hence, additional space on the MCM and cost are not negligible.

• It is decided to keep the possibility of an "external" BCID bit on the PPrASIC. A decision on the real implementation of a "discriminator BCID" is left to the MCM review later this year.

## 2.5 The peak-finder for non-saturated pulses.

In case of non-saturation, the peak-finder must identify a "maximum" out of three consecutive energy-sums using the asymmetric condition $(n-1 < n \ge n+1)$, where $n-1$, $n$ and $n+1$ stand for the energy-sums of three consecutive time-slices. It works with the full data precision (17 bit) emanating from the "5 sample" adder. A maximum found in a certain time-slice is marked as an "identified bunch crossing". Automatically, the previous and the following energy-sums are "zero-ed".

• It is foreseen that the non-saturated BCID block (FIR filter and peak-finder) can be set to "dummy" function for certain test applications. This is achieved by loading "trivial coefficients" [0,0,1,0,0] into the FIR filter. The samples pass untouched to the peak-finder and the LUT in parallel (see below).
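A short sketch of the peak-finder follows. It reads the asymmetric condition as "strictly greater than the preceding energy-sum and greater than or equal to the following one", which is an interpretation; the function name and the example values are hypothetical.

```python
def peak_finder(sums):
    """Flag every time-slice whose energy-sum is a local maximum.

    Asymmetric condition (as read here): sums[n-1] < sums[n] >= sums[n+1].
    """
    flags = [False] * len(sums)
    for n in range(1, len(sums) - 1):
        flags[n] = sums[n - 1] < sums[n] >= sums[n + 1]
    return flags


# A pulse with a flat top: two equal energy-sums in neighbouring slices.
energy_sums = [0, 10, 40, 40, 12, 0]
bcid = peak_finder(energy_sums)

# Slices that are not identified are "zero-ed".
gated = [s if hit else 0 for s, hit in zip(energy_sums, bcid)]
```

The asymmetry matters for the flat-top case shown here: exactly one of the two equal energy-sums is marked, so a single pulse is never assigned to two bunch crossings.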

## 2.6 The Look-Up-Table in parallel to the peak-finder; reduction of data width.

In parallel to peak-finding, all energy-values are truncated and passed as 10-bit addresses to the LUT, which returns stored 8-bit data values. This must be done for unsaturated and saturated pulses, to have the 8-bit conversion from the LUT for all possible values of data.

• "Truncation", i.e. masking out the relevant field of 10 consecutive bits, is programmable. This is necessary, because the significant data in the 17 bit result depend on the size of filter-coefficients applied. Hence, the lower bound of the 10 bit field is selectable.

• The LUT content must be loadable for each channel individually.

• As "power-up" default, a simple linear conversion should be preloaded into the LUT to have a working start condition. The LUT content will then be modified for "running" with calibrated data.
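The field selection and the look-up can be sketched as follows. The names are hypothetical, the linear default table (address divided by four) is only one possible reading of "simple linear conversion", and bits above the selected field are simply ignored here, since their handling is not described in the text.

```python
def select_field(fir_result, low_bit):
    """Mask out 10 consecutive bits of the 17-bit FIR result."""
    if not 0 <= fir_result < (1 << 17):
        raise ValueError("FIR result must fit into 17 bits")
    if not 0 <= low_bit <= 7:
        raise ValueError("a 10-bit field must fit into the 17-bit result")
    return (fir_result >> low_bit) & 0x3FF


# Assumed power-up content: linear conversion from 10-bit address to 8-bit data.
default_lut = [address >> 2 for address in range(1024)]


def lut_output(fir_result, low_bit, lut=default_lut):
    """Return the 8-bit LUT value for a 17-bit FIR result."""
    return lut[select_field(fir_result, low_bit)]


example_address = select_field(1024, 2)   # bits 2..11 of the FIR result
example_value = lut_output(1024, 2)
```

Larger filter coefficients shift the significant bits upwards, which is compensated by choosing a larger `low_bit`.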

## 2.6.1 The BCID Decision Logic

The available BCID mechanisms are:

- A. FIR-filter BCID with peak-finder for unsaturated pulses.
- B. Digital BCID for saturated pulses.
- C. Analog "external" BCID for saturated pulses.

The BCID mechanisms A and B are synchronised in the design of the PPrASIC. BCID mechanism C is timed by the 11th bit in the FIFO. The BCID decision logic allocates the BCID mechanisms to energy ranges, which are defined on the scale of the FIR-filter result. The ranges cover three disjoint intervals.

• The two limits dividing the entire Et-range into three intervals are programmable.

• A 3-bit programmable Look-Up Table (LUT) controls the allocation of BCID mechanisms to each energy interval. Hence, three LUTs are provided, one per energy interval.

• A default ("power-up") definition shall attribute:

Et $\leq$ 255 GeV to the FIR filter (A); Et > 255 GeV to the digital BCID for saturated pulses (B).

The first interval limit is set to "0" and the "external" BCID (C) is disabled.

- If the underlying FIR filter result is $\leq$ 255 GeV, the output from the 10\*8 LUT is given out as result.
- If the underlying FIR filter result is > 255 GeV, the output is set to "overflow" (=0xFF).

• For bunch crossings not accepted by BCID the result is set to zero (=0x00). The BCID decision logic ensures that after each non-zero result a zero result follows to keep BC-multiplexing working.

• By-passing for technical check-out of serial data links.

A special configuration can be used to direct data from the FIFO to the subsequent processors. Several settings must be imposed by controlling software on the PPrASIC:

- •• The BCID decision logic enables the FIR-filter and peak-finder only.
- •• The FIR filter is loaded with a "trivial" set of coefficients (0,0,1,0,0), passing data one-to-one.
- •• The LUT is loaded such that a 10-bit input is truncated to an 8-bit output.
- •• The BC-muxing is disabled, i.e. only one channel will pass. The BC-flag bit will be set to indicate which channel is directed to the output, e.g. BC-flag =0 means Channel-A, BC-flag =1 means Channel-B.

#### 2.7 The BC multiplexing scheme for the Cluster Processor; flag bit generation.

• U. Pfeiffer's proposal is to be implemented. It uses the empty frame following an identified bunch-crossing by putting data of two bunch-crossings in sequence along with the necessary flagging. Parallel data are output to the serial transmitter chip on the same MCM.

#### 2.8 The two-stage adder for jet-cells; data width to Jet/Et Processor.

It is of great advantage to a pipelined system, like the Level-1 trigger, to avoid exchange of data in the direction orthogonal to the flow of the pipeline. This is achieved by doing the summing for larger elements already in the upstream PreProcessor.

• A first adder sums the two 8-bit energy values as they emanate from the LUTs into a 9-bit value. This adder is used on every PPrASIC. It delivers the partial sum covering (0.1 \* 0.2), which is the size of half a jet-cell.

• A second adder builds the sum of a full jet-cell (0.2\*0.2). It is also implemented on every PPrASIC, but only used on every second ASIC-chip. The pad configuration on the ASIC periphery shall be chosen such that shortest connections on the MCM-level are possible. No synchronisation between summands should be needed, because data transfer on the MCM from ASIC to ASIC should be possible within a 25 nsec clock tick.

•• In "by-pass mode", the partial sum data are sent off the chip as 9 bit parallel data to its neighbor located on the same MCM.

•• In "active mode", the second adder sums two 9-bit values to build a full jet-cell. It limits the added full value (10 bit) to 9 bit range ( $\leq 0x1FF$ ). This range is sufficient for the Jet/Et Processor.

•• "active / by-pass mode" must be controllable by set-up software.

• A detection of "overflow" on the single cell-level is implemented. If energy "overflow" on the cell-level is present, the value of the 0.2\*0.2 jet-cell is set to "overflow" (0x1FF) to avoid "fake" finite energies.

The jet-cell summing is also relevant for the energy-values used in the Et-miss and Et-sum processors. "Setting overflows" in 0.2\*0.2 cells, if small cells overflow, will affect thresholds. It is assumed that this has very little influence on the physics captured by these triggers.
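The two adder stages and the overflow handling can be sketched as below. The function names are hypothetical, and the way a cell overflow is carried through the 9-bit partial sum (here: by forcing it to 0x1FF) is an assumption, since the text only fixes the value of the final jet-cell.

```python
CELL_OVERFLOW = 0xFF    # 8-bit LUT output marked as "overflow"
JET_OVERFLOW = 0x1FF    # 9-bit jet-cell "overflow" code


def first_adder(cell_a, cell_b):
    """Sum two 8-bit LUT outputs into a 9-bit half jet-cell (0.1 * 0.2)."""
    if CELL_OVERFLOW in (cell_a, cell_b):
        return JET_OVERFLOW          # assumption: overflow is propagated
    return cell_a + cell_b           # at most 0xFE + 0xFE = 0x1FC


def second_adder(half_local, half_neighbour, active=True):
    """Build the full jet-cell (0.2 * 0.2) or pass the local half through."""
    if not active:                   # "by-pass mode"
        return half_local
    if JET_OVERFLOW in (half_local, half_neighbour):
        return JET_OVERFLOW
    # The 10-bit sum is limited to the 9-bit range.
    return min(half_local + half_neighbour, JET_OVERFLOW)


half = first_adder(100, 50)
jet_cell = second_adder(half, 20)
saturated = second_adder(300, 300)
flagged = second_adder(first_adder(CELL_OVERFLOW, 0), 10)
```

Because two non-overflowing cells can sum to at most 0x1FC, the code 0x1FF in the partial sum cannot be confused with a regular value in this sketch.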

#### 2.9 Error code generation for both output paths (Cluster and Jet/Et).

Error code generation is mandatory to minimize the chance of producing "fake" Level-1 "Accepts", which saturate the ATLAS DAQ.

- Cluster path: One parity bit per (8-bit value plus flag-bit), i.e. 10 bit fit into serial Link frame.
- Jet/Et path: One parity bit per 9-bit jet value, i.e. 10 bit fit into the serial Link frame.

• All parity generation uses the ODD parity convention. Hence, a 9-bit data word with 9 "zeros" is accompanied by its parity-bit set to "1"; a 9-bit data word with 9 "ones" is accompanied by "0" parity.
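The odd parity convention can be verified with a small sketch. The function names are hypothetical, and placing the parity bit in the least significant position of the 10-bit frame is an assumption made only for illustration.

```python
def odd_parity_bit(word, width=9):
    """Parity bit that makes the total number of ones (word + parity) odd."""
    if not 0 <= word < (1 << width):
        raise ValueError("word does not fit into the given width")
    ones = bin(word).count("1")
    return 0 if ones % 2 else 1


def make_frame(word, width=9):
    """Append the parity bit below the data word (bit position assumed)."""
    return (word << 1) | odd_parity_bit(word, width)


def parity_ok(frame):
    """A received frame is valid if it contains an odd number of ones."""
    return bin(frame).count("1") % 2 == 1


all_zeros = odd_parity_bit(0x000)
all_ones = odd_parity_bit(0x1FF)
good = make_frame(0x0A5)
corrupted = good ^ 0b100      # a single flipped bit on the link
```

A single flipped bit always changes the number of ones from odd to even, so `parity_ok` rejects the corrupted frame.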

#### 3. Data input for trigger system checking.

The feature of feeding digital data into the system is not only foreseen for the benefit of the PreProcessor. The data replace - at the earliest point - the output from the FADC digitizer running at 40 MHz. Data content can be defined from technical aspects (e.g. ramps) or real data can be used, recorded earlier in the experiment itself or in test-beams. Hence, the "PlayBack" allows checking out of the trigger system all the way to the decision made in the Central Trigger Processor (CTP) without having analog input from detectors.

#### 3.1 The PlayBack memory and its various purposes.

The PlayBack memory must have the same width as raw FADC data output (plus external BCID), i.e. 11 bit.

• The memory length should be as generous as possible. 256 words seems to be a good size covering 6.4 µsec at 40 MHz. Given the ATLAS dead-time of four bunch-crossings after each Level-1 "Accept" allowing read-out, one could load a "raw" calorimeter pulse (5 samples plus 5 empty slices) every 10 locations. This amounts to 25 signals for checkout before reloading the memory with new data. Checkout time will be dominated by loading data rather than running the system, unless the same data are cycled over. Nevertheless, the ratio of running time over loading time should be optimized for efficient checkout work.

• A preset counter is built in to limit the data rate in "PlayBack" mode allowing a Level-1 "Accept" rate of <100 kHz. A counter of 16 bit is sufficient to produce empty time-slices, before data from the memory are rolled into the real-time path again. This is motivated by the following fact:

If the "PlayBack" memory is loaded and clocked as described above, then it will produce a rate of 4 MHz for Level-1 "Accept". This constitutes a "continuous peak-rate", where "throttle mechanisms" will slow down Level-1 "Accept" to allow readout of data, which are wanted for system checking. The 16-bit preset counter shall serve as "hardware slow-down", so that the read-out can capture everything that is injected. Hence, after each roll-over of the "PlayBack" memory a preset number of "empty slices" is injected (up to 65000). Thus, the Level-1 "Accept" rate can be limited to 100 kHz (or down to approx. 15 kHz).

• Feeding data into the real-time path after the "latch" and before the alignment FIFO is a requirement for the checking out of the trigger signal chain. Other points are not really needed. The same purpose as feeding data into other locations can be achieved by "activating / by-passing" circuitry e.g. BCID and peak-finder (see block diagram).
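The rates quoted for the preset counter follow from simple arithmetic, sketched below. The function names are hypothetical, and the calculation assumes that every injected pulse produces a Level-1 "Accept" and that the rate is averaged over one pass through the memory plus the injected empty slices.

```python
import math

LHC_CLOCK_HZ = 40e6
MEMORY_WORDS = 256
SLOTS_PER_PULSE = 10                          # 5 samples plus 5 empty slices
PULSES = MEMORY_WORDS // SLOTS_PER_PULSE      # pulses per pass through the memory


def accept_rate_hz(empty_slices):
    """Mean Level-1 "Accept" rate for a given preset of empty slices."""
    ticks_per_pass = MEMORY_WORDS + empty_slices
    return PULSES * LHC_CLOCK_HZ / ticks_per_pass


def empty_slices_for(rate_hz):
    """Smallest preset that keeps the mean rate at or below rate_hz."""
    ticks = math.ceil(PULSES * LHC_CLOCK_HZ / rate_hz)
    return max(ticks - MEMORY_WORDS, 0)


peak_rate = accept_rate_hz(0)                 # no empty slices injected
preset_100khz = empty_slices_for(100e3)
lowest_rate = accept_rate_hz((1 << 16) - 1)   # 16-bit counter at its maximum
```

Under these assumptions the peak rate comes out at about 3.9 MHz, a preset of roughly 9700 empty slices gives 100 kHz, and the full 16-bit preset gives about 15 kHz, consistent with the figures above.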

#### 3.2 Loading the PlayBack memory and running PlayBack (see also chapter 5).

• Loading data into the PlayBack memory is done via the serial I/O interface being mastered by the on-board RemASIC, which in turn interfaces to the PipeLineBus and VME.

#### 3.3 Monitoring; the PlayBack memory as hardware histogram storage during DAQ.

Two monitoring features are built into the PPrASIC:

1. Monitoring of rates in each calorimeter cell.

A programmable threshold is applied to the FADC outputs of all cells from a certain calorimeter type (em. or had.). FADC values above threshold increment a counter. The time of accumulation is measured in parallel. The LHC clock is divided by a 10-bit counter to about 40 kHz, yielding a time-accuracy of about 25 µsec, which is precise enough for an on-line rate measurement. The divided clock is counted in parallel from a preset downwards to define the time-interval. After the elapsed time two values are given to read-out:

- The FADC integral above threshold.
- The time-interval with 25 µsec precision.

The threshold should be unique per calorimeter type (em. or had.) to obtain meaningful relative rates per cell. This information can be converted in an on-line computer to display two "rate maps" covering the entire calorimeter acceptance - electromagnetic and hadronic separately.

2. Monitoring of the "transverse energy spectrum" in each calorimeter cell.

For detailed investigation of analog signal input from the calorimetry, FADC spectra can be accumulated in the "PlayBack" memories, which are idle during Data Acquisition. The memory provides 256 locations for FADC values. The frequency of a value occurring can be counted up to 10 bits.

• Histograms can be enabled by setting a lower and upper bound on the "Bunch Crossing Number". This counter is running locally on the PPModule and resynchronized for every LHC turn.

- Filling of the histogram is controlled by an individually programmable threshold cut for each cell.
- Binning of FADC values is programmable in "powers of two", i.e. the 256 locations cover 1/4, 1/2 or the full FADC range with corresponding resolution.
- Histogram filling stops, when one location reaches "overflow".

The histogramming method allows investigation of spectra for particular LHC bunches. Setting the "bunch range" to full LHC (3600, out of which 2835 carry protons) gives FADC spectra for every trigger cell averaged over all bunches.

The implementation is described in detail in a separate note by K.Mahboubi, available at the following Web-URL:

http://wwwasic.ihep.uni-heidelberg.de/atlas/L1/DISCUSS/ppa\_his\_rate.pdf
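The histogram filling rules listed in this section can be illustrated by a short sketch. It is not taken from the note referenced above: the class name is hypothetical, the covered part of the FADC range is assumed to start at zero, "overflow" is taken to mean that a 10-bit bin counter has reached its maximum, and the bunch-crossing window is left out.

```python
N_BINS = 256
COUNT_MAX = (1 << 10) - 1     # 10-bit bin counters


class FadcHistogram:
    """Hardware-style histogram of 10-bit FADC values in 256 locations."""

    def __init__(self, range_fraction_log2=0, threshold=0):
        # 0 -> full FADC range, 1 -> lower half, 2 -> lower quarter
        if range_fraction_log2 not in (0, 1, 2):
            raise ValueError("binning is programmable in powers of two")
        self.shift = 2 - range_fraction_log2   # bin width = 2**shift counts
        self.threshold = threshold
        self.bins = [0] * N_BINS
        self.stopped = False

    def fill(self, fadc):
        if self.stopped or fadc < self.threshold:
            return
        index = fadc >> self.shift
        if index >= N_BINS:       # value lies outside the covered range
            return
        self.bins[index] += 1
        if self.bins[index] == COUNT_MAX:
            self.stopped = True   # one location is full: filling stops


full_range = FadcHistogram(range_fraction_log2=0)
full_range.fill(1023)

quarter = FadcHistogram(range_fraction_log2=2)
quarter.fill(255)
quarter.fill(256)                 # ignored, above the covered quarter

noisy = FadcHistogram(threshold=5)
for _ in range(2000):
    noisy.fill(8)
noisy.fill(3)                     # below the threshold cut
```

Covering the full range costs a factor of four in resolution (four FADC counts per location), while the quarter-range setting resolves single FADC counts.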

#### 4. Data ReadOut for performance monitoring and physics verification.

Readout of data, on which the trigger bases its decision making, is mandatory in a system of several thousand input channels. However, a division in categories must be made, to define which kind of data readout from the trigger system is needed for which purpose.

The purposes are:

- Technical function checks of the trigger system from PreProcessor to CTP.
- Monitoring of cell rates and general trigger performance.
- Decision verification inside the trigger system per event.
- Physics result cross check per event with calorimeters, e.g. cross calibration.

These requirements must be confronted with the following possibilities of action in:

- "beam-off" time by means of "PlayBack" or analog signal input from calibration pulser systems.
- "special" runs (possibly very short), but with big DAQ dead-time.
- "normal" Data Acquisition runs, where dead-time must be minimal.

A definition of strategies lies outside the scope of this document. It will, however, have to proceed together with the design of the controlling software.

## 4.1 The points of ReadOut.

Two separate scrolling memories will be implemented to provide diagnostic and/or monitoring readout (see also Fig.2 - Block diagram of the PPrASIC).

#### 4.2 The "scrolling" memory for the FADC data readout.

It is mandatory to capture "raw" data emanating from the FADC, since the trigger path is completely disjoint from detector read-out after the "pre-amplifier" level.

• The memory has a width of 11 bit and a size of 128 locations. The memory must keep up with the 40 MHz LHC clock.

#### 4.3 The second "scrolling" memory for simultaneous ReadOut.

A second memory of same size will be implemented to capture data after the LUT (see block diagram). It will have the same characteristics as the first scrolling memory.

• The memory will capture the 8-bit result from the LUT. In addition, the 3 bits from the three BCID mechanisms will be added for checking/debugging purposes.

#### 4.4 "Write and Read Pointer" management; the derandomizing buffers.

Both scrolling memories are written with data as they come along the pipeline, which is driven by the LHC clock. Due to fixed latency of the trigger decision, reading accesses memory locations, which are displaced "behind in time" by a fixed number of clock ticks.

Upon receipt of a Level-1 "Accept", data corresponding to a given bunch crossing (or to its vicinity in time) have to be copied out into a derandomizing buffer memory. They are stored until read-out moves them towards the DAQ system.

• The derandomizer buffer size is 11 bit wide and 64 locations deep.

• A preset counter is built in to downscale "raw" data read-out on the derandomizer level. This allows reading the FULL 128 locations in "pipeline memories" on a sampling basis. A preset counter of 8 bit is sufficient to keep the data volume within the allowed bandwidth.
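The pointer handling can be sketched as a circular buffer whose read position trails the write position by the trigger latency. The class name, the latency of 40 ticks and the choice of a read-out window centred on the triggering bunch crossing are assumptions for illustration; the derandomizer buffer and the downscaling counter are not modelled.

```python
class ScrollingMemory:
    """Circular pipeline memory, written once per LHC clock tick."""

    def __init__(self, latency, size=128):
        if not 0 < latency < size:
            raise ValueError("latency must be shorter than the memory")
        self.size = size
        self.latency = latency        # fixed Level-1 latency in clock ticks
        self.cells = [0] * size
        self.write_ptr = 0

    def clock(self, word):
        """Store one 11-bit word, overwriting the oldest location."""
        self.cells[self.write_ptr] = word & 0x7FF
        self.write_ptr = (self.write_ptr + 1) % self.size

    def on_accept(self, n_slices=5):
        """Copy the slices around the bunch crossing that caused the Accept."""
        centre = self.write_ptr - 1 - self.latency
        first = centre - n_slices // 2
        return [self.cells[(first + i) % self.size] for i in range(n_slices)]


# Write the tick number as data, so that the read-out is easy to check.
memory = ScrollingMemory(latency=40)
for tick in range(300):
    memory.clock(tick)
slices = memory.on_accept()           # Accept arrives right after tick 299
```

The words returned belong to ticks 257 to 261, i.e. the window is centred on tick 259, which lies 40 ticks before the most recent write.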

## 5. Operation modes of the PPrASIC for Checkout and DAQ.

All control- and set-up data reach the PPrASIC through its serial I/O interface. They will be stored in registers to allow read-back for high-level software. The same interface transports read-out data to the RemASIC in DAQ mode.

It should be noted again, that data access to the RemASIC is guaranteed via the PipeLineBus (faster access for high data volume) as well as through VME (slower access for technical check - also possible in a stand-alone crate with VMEBus only).

## 5.1 The serial I/O interface for Readout and Load/Control.

The tasks of the serial I/O Interface are:

- Loading of "PlayBack" data.
- Select clock edge for FIFO latching depending on Timer setting.
- Set FIFO depths.
- Control pointers for Readout of TWO scrolling memories into derandomizer buffers.
- Loading of coefficients for FIR filter BCID.
- Set thresholds for digital saturated BCID.
- Configure BCID Decision Logic.
- LUT loading.
- Set second Jet/Et adder: "active / by-pass".
- Set DAQ / Playback mode.
- Set PreScaler for PlayBack frequency.
- Set "FIFO / FIR filter / LUT" by-pass.
- Set PreScaler for sampling ReadOut into derandomizer buffers.
- Setting number of FADC- and /or LUT-samples for ReadOut.

A summary is given in the table "PPrASIC: Registers and Memories" in Appendix A.

The Pre-Processor ASIC contains a serial interface, which is used for configuration and readout purposes. It has to provide enough bandwidth to read out all selected data from the processing pipeline, while the trigger operates with the full rate of 100 kHz for Level-1 "Accept". Additional requirements concern the configuration of the PPrASIC, readback of configuration and memory data and a daisy chain capability.

## 5.1.1 Architecture.

The serial interface uses four data lines and two additional daisy-chain signals. A synchronous protocol is used to transmit data words having a width of 13 bits. **Figure 3** shows a block diagram of the interface. The clock signal "SerialClk" operates the input and output shift registers. "DataIn" and "DataOut" are the input and output of the respective shift register. The signal "Frame" is used to identify word boundaries by controlling the loading of the input register and output shift register. **Figure 4** shows the timing of the relevant signals.

Several serial interfaces can be connected together forming a daisy chain by feeding the "DaisyOut" signal to "DataIn" and "DataOut" to "DaisyIn" of the next chip. The frequency of the "Frame" signal then has to be divided by the number of chips in the daisy chain.

The serial clock can be operated with a different frequency than the core of the PPrASIC (which runs on the 40 MHz LHC clock). A synchronizer has to generate a signal from the "Frame" bit that indicates a stable value in the input register. Processing has to be fast enough to provide a stable output value before the next "Frame" signal occurs.

## 5.1.2 Data Format.

Data from three different origins have to be read out from the PPrASIC via the serial interface. These are the two processing channels of the PPrASIC providing raw trigger data and a channel providing readback capability for parameter and status registers as well as memories. Data from these three channels are multiplexed to the serial output using a fixed time scheme. Two control bits indicate the type of data. **Figure 5** shows the scheme.

The number of raw data words is controllable in a range from zero to five. Identification of the event header by one control bit ensures synchronization. The width of the user data word is 11 bits, corresponding to the maximum width of internal PPrASIC memories. Together with the two control bits this yields a serial word length of 13 bits, which has to be indicated by the "Frame" signal. The readback data stream can contain two different types of information: control words and data words. The flag bit, not required for channel identification, is used to distinguish the two types. This allows memory contents with a word width of 11 bits to be read from the PPrASIC in a compact way. For configuration a similar format can be used, which contains control and data words identified by control bits. With this format it is possible to send 11-bit data words to the PPrASIC in order to load memories efficiently.
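The word format described above can be sketched in Python as follows. The text fixes only the field sizes (two control bits plus 11 user-data bits, giving 13 bits per "Frame"). The bit order on the line, the placement of the control bits in the most significant positions, and all function names are assumptions made purely for illustration.

```python
CONTROL_BITS = 2                       # control bits per serial word (from the text)
DATA_BITS = 11                         # user data width (from the text)
WORD_BITS = CONTROL_BITS + DATA_BITS   # 13-bit serial word


def pack_word(control, data):
    """Combine control and data fields into one 13-bit word.

    ASSUMPTION: the control bits occupy the two most significant bits.
    """
    if not 0 <= control < (1 << CONTROL_BITS):
        raise ValueError("control field must fit in 2 bits")
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data field must fit in 11 bits")
    return (control << DATA_BITS) | data


def unpack_word(word):
    """Split a 13-bit word into its (control, data) fields."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 13 bits")
    return word >> DATA_BITS, word & ((1 << DATA_BITS) - 1)


def serialize(words):
    """Flatten words into a list of bits, MSB first (assumed bit order)."""
    bits = []
    for word in words:
        bits.extend((word >> i) & 1 for i in range(WORD_BITS - 1, -1, -1))
    return bits


def deserialize(bits):
    """Rebuild words from a bit stream; each group of 13 bits stands for
    one word delimited by the "Frame" signal."""
    if len(bits) % WORD_BITS:
        raise ValueError("bit stream is not a whole number of words")
    words = []
    for start in range(0, len(bits), WORD_BITS):
        word = 0
        for bit in bits[start:start + WORD_BITS]:
            word = (word << 1) | bit
        words.append(word)
    return words
```

A round trip such as `deserialize(serialize([pack_word(1, 0x155)]))` returns the original word, which is the property a receiver relying on "Frame" for word boundaries needs.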

## 5.1.3 Bandwidth considerations.

If the serial clock is run at 40 MHz and the maximum number of five raw data samples is read out, it takes 4.875 µsec to read out an event from one PPrASIC. Even if two PPrASICs are daisy-chained, the readout time (9.75 µsec) lies below 10 µsec, which is the maximum time available per event at a trigger rate of 100 kHz. By reading out fewer raw data the readout time is reduced. Another way to increase the bandwidth is to use a serial clock with higher frequency. This requires synchronisation of the serial clock and the system clock used for the rest of the PPrASIC. Circuitry for this synchronisation will be provided on the PPrASIC. The bandwidth of the readback channel with a 40 MHz serial clock and the maximum number of raw data samples is 2.2 MBit/sec without daisy-chaining. For example, the readback of one histogramming memory (256 \* 11 Bit) would require 1.25 ms.
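The figures above can be checked with a few lines of Python. This is a sketch under stated assumptions: the count of 15 serial words per event is not given in the text but is inferred from the quoted 4.875 µsec (15 words of 13 bits at 40 MHz), and the readback channel is assumed to carry one 11-bit word per event frame, which reproduces the quoted 2.2 MBit/sec. The function names are illustrative only.

```python
SERIAL_CLOCK_HZ = 40e6           # serial clock frequency quoted in the text
WORD_BITS = 13                   # serial word length (2 control + 11 data bits)
USER_DATA_BITS = 11              # user data bits per word
WORDS_PER_EVENT = 15             # ASSUMPTION: inferred from 4.875 usec per event
MAX_READOUT_TIME_S = 1 / 100e3   # 10 usec per event at 100 kHz Level-1 "Accept"


def event_readout_time(n_chips=1, words_per_event=WORDS_PER_EVENT,
                       clock_hz=SERIAL_CLOCK_HZ):
    """Time in seconds to shift out one event from a chain of n_chips."""
    return n_chips * words_per_event * WORD_BITS / clock_hz


def readback_bandwidth(n_chips=1):
    """User-data bits per second on the readback channel of one PPrASIC.

    ASSUMPTION: one readback word is carried per event frame.
    """
    return USER_DATA_BITS / event_readout_time(n_chips)


def memory_readback_time(n_words, n_chips=1):
    """Time in seconds to read back a memory of n_words."""
    return n_words * event_readout_time(n_chips)


if __name__ == "__main__":
    for chips in (1, 2, 3):
        t = event_readout_time(chips)
        verdict = "ok" if t < MAX_READOUT_TIME_S else "too slow"
        print(f"{chips} chip(s): {t * 1e6:.3f} usec ({verdict})")
    print(f"readback: {readback_bandwidth() / 1e6:.2f} Mbit/sec")
    print(f"256-word memory: {memory_readback_time(256) * 1e3:.3f} ms")
```

Under these assumptions one chip needs 4.875 µsec and two chained chips need 9.75 µsec, while a chain of three would exceed the 10 µsec budget.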







Figure 4: Timing diagram of signals relevant for serial interface.



Figure 5: Format of readout data.

## 5.2 External Control Signal Input/Output for the PPrASIC.

Signal I/O to the PPrASIC comprises [all TTL]:

- 40 MHz LHC clock.
- Level-1 "Accept" signal.
- Reset. A "soft" Reset of the PPrASIC, e.g. reset all memory pointers, clear derandomizing buffers.
- Bunch Crossing number from local counter, which is resynchronized to the LHC turn signal via TTC.
- Reset Level-1 number from TTC.
- Synchronous "start of PlayBack" via TTC.

A summary is given in the table "PPrASIC: Input/Output Signals" in Appendix B.

## 5.3 The Software View of the PPrASIC.

A "programming model" really only makes sense at the PPModule level.

Registers for parameters and memories in the PPrASIC have to be written by software. It is possible to read back all register values and memory contents. In addition status information can be read out.

The state of the PPrASIC is controlled by commands, which are sent to the PPrASIC and are not necessarily reflected by a readable register setting. The exact method to access the PPrASIC depends on the PPM implementation. Software should implement this access method in a transparent way, so that the PPrASIC's internals can be viewed as common objects independent of the concrete access method.
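One possible way to keep the access method transparent is sketched below in Python. Nothing here is prescribed by this specification: the class names, the address values and the in-memory stand-in transport are hypothetical, and a real transport would be defined by the PPM implementation. The only property taken from the text is that register and memory words are at most 11 bits wide and can be written and read back.

```python
from abc import ABC, abstractmethod


class AccessMethod(ABC):
    """Hypothetical transport to one PPrASIC (e.g. via the serial interface)."""

    @abstractmethod
    def write(self, address, value):
        """Write one word to the given address."""

    @abstractmethod
    def read(self, address):
        """Read one word back from the given address."""


class InMemoryAccess(AccessMethod):
    """Stand-in transport for software tests; no hardware involved."""

    def __init__(self):
        self._store = {}

    def write(self, address, value):
        self._store[address] = value

    def read(self, address):
        return self._store.get(address, 0)


class Register:
    """An 11-bit register seen as an object, independent of the transport."""

    WIDTH = 11  # maximum register width given in this specification

    def __init__(self, access, address, name):
        self._access = access
        self._address = address   # ASSUMPTION: addresses are illustrative
        self.name = name

    def set(self, value):
        if not 0 <= value < (1 << self.WIDTH):
            raise ValueError(f"{self.name}: value does not fit in {self.WIDTH} bits")
        self._access.write(self._address, value)

    def get(self):
        return self._access.read(self._address)
```

Code written against `Register` stays unchanged if `InMemoryAccess` is later replaced by a transport that talks to real hardware.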

## 6. The built-in test facilities.

This concerns JTAG for the PPrASIC only. Its purposes are:

- "boundary scan" on the chip periphery.
- "internal scan" on the PPrASIC chip to load and read internal registers.

It can be used via VME (in particular for a stand-alone PPM in a laboratory test-crate with VMEBus only). A more detailed description is given in the related document by C.Schumacher, which can be found at: http://wwwasic.ihep.uni-heidelberg.de/atlas/L1/DISCUSS/designguide.ps

## 7. The manufacturing of the PPrASIC.

Several processes from various manufacturers will have to be evaluated. Candidates are AMS 0.6 µm, MIETEC 0.35 µm, and others (see C.Schumacher@Heidelberg-meeting on 1Dec98). Some selection criteria are: process size versus chip area; availability of libraries, e.g. memory macros; reasonable availability of the process into the future.

## 7.1 The realization in a description language.

It is intended to segment the functional blocks of the PPrASIC such that different people can work in parallel on writing the Verilog code. Clear and well-defined interfaces will be established on the final block diagram. Good coordination and supervision towards the step of synthesis into hardware is mandatory.

## 7.2 The simulation of the design (timing, cross-talk, ...).

Tools for timing simulation, cross-talk simulation etc., all available in the Cadence program pack, will be used intensively to ensure, as far as possible, that the submitted design has a good chance to function. Experience with former prototypes supports this assumption.

## 7.3 The synthesis into a chosen production process.

A decision on the process will be taken when real-estate requirements are clear. A first synthesis of the completed Verilog code into a hardware design shall provide this information in May 1999. The currently preferred production process is the 0.6 µm process of Austria Microsystems (AMS). Availability of libraries for the implementation of memory blocks has already been checked. The simplest possibility is to outsource the design of the required memory blocks to AMS.

## 7.4 The production and post-production testing to obtain "known good dies".

The current prototype will be fabricated in moderate numbers within the EUROPRACTICE framework on Multi-Project-Wafers. Tests of the prototype are performed with packaged devices. Final production of bare "dies" will evidently require a so-called engineering run at AMS, which has a different costing base. The cost for several wafers is not known yet.

It is also not yet clear in which way testing on wafers will proceed. IHEP Heidelberg has a wafer-probe station as well as a HP82000 chip tester for test data input/output. Wafer cutting and bonding onto the MCM will have to be discussed in due time.

#### NOTE on technical implementation.

Synthesis and "place/route" show that the current two-channel design results in a size of 20 mm<sup>2</sup> for the AMS 0.6 µm process. This is a moderate die size. Hence, the idea came up to integrate 4 channels on a single die. The design might grow to about 50 mm<sup>2</sup>, which is still manageable and affordable.

Advantages:

- The PPrASIC is more compact. A single tested die already covers 4 channels. A factor of 2 fewer dies need to be handled for testing.
- There is no functional change on the channel basis.
- The pre-summing of jet-cells will be done on the ASIC level. No routing on the MCM is necessary. The implementation of an "un-used" second-stage adder on every second two-channel PPrASIC is superfluous.

Disadvantages:

- Serial read-out has to be reconsidered. Daisy-chaining on the die level may limit bandwidth irreversibly.
- Two serial ports per 4-channel PPrASIC will be implemented.
- There are no other disadvantages visible up to now.

#### Summary of the "Prelim. Design Review"

- The entire digital data path per channel is 11 bits wide, accommodating an "external BCID" bit. This includes "PlayBack" and all readout memories.
- There are TWO scrolling memories, one for FADC data and one for the LUT output.
- Each "scrolling memory" has its own derandomizer buffer of size (11 bit \* 64).
- Saturation shall be indicated by one unique value only: 0xFF for CP and 0x1FF for JEP.
- By-passing is organized such that FADC data (or "PlayBack" data) can be passed through directly to serial output. The FIR-BCID will be put to "pass-through" by a set of trivial coefficients (0,0,1,0,0). Other BCID mechanisms will be disabled in the BCID decision logic. BCmuxing will block the alternate channel and flag the open channel. This is important for functional testing of serial data links.
- The BCID decision logic shall attribute unique energy ranges to the different BCID mechanisms by means of programmable 3-bit LUTs. The range limits are programmable and non-overlapping.
- Monitoring of cell-rates is introduced.
- Histogram binning for transverse energy spectra is programmable in powers of two, i.e. 0-64, 0-128, 0-256 GeV with corresponding resolution. Histogram filling can be selective for certain bunches by definition of a bunch number range.
- Bandwidth estimates for data read-out via the serial interface suggest that no more than TWO channels should be daisy-chained.
- Open question: The impact of "artificial" saturation for 0.2\*0.2 jet cells on Et-miss and/or Et-sum? Remark: Currently, the whole 0.2\*0.2 cell is set to "saturate" if at least one of the summands (0.1\*0.1) is saturated.
- "ODD parity" convention is used as error-code generation for the CP as well as the JEP serial data stream.
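Two of the points above, the odd-parity convention and the saturation rule for jet cells, can be illustrated with a short Python sketch. The saturation codes 0xFF and 0x1FF and the rule "one saturated summand saturates the whole cell" are taken from the summary. The assumptions are that the summands are 8-bit values using 0xFF as their saturation code, that a sum exceeding 9 bits is also clamped to 0x1FF, and the function names themselves.

```python
CP_SATURATION = 0xFF     # unique saturation code on the CP path (from the summary)
JEP_SATURATION = 0x1FF   # unique saturation code on the JEP path (from the summary)


def odd_parity_bit(value):
    """Return the parity bit that makes the total number of 1-bits
    (payload plus parity bit) odd."""
    ones = bin(value).count("1")
    return 0 if ones % 2 else 1


def jet_cell_sum(cells):
    """Add 0.1*0.1 cell energies into one 0.2*0.2 jet cell.

    From the summary: if any summand is saturated, the whole jet cell
    is set to the saturation code.
    ASSUMPTION: a sum that overflows 9 bits is clamped to the same code.
    """
    if any(cell == CP_SATURATION for cell in cells):
        return JEP_SATURATION
    return min(sum(cells), JEP_SATURATION)
```

For example, `jet_cell_sum([10, 20, 30, 40])` gives 100, while a single summand equal to 0xFF forces the result to 0x1FF regardless of the other three.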

## APPENDIX A

**PPrASIC : Registers and Memories**

| Register Name      | Type | [#Regs] \* Width or Address \* Data | Description / Remark                                 |
|--------------------|------|-------------------------------------|------------------------------------------------------|
| PPrASIC channel-A  |      |                                     |                                                      |
| InputControl       | r/w  | [2] \* 11                           |                                                      |
| FIRFilterControl   | r/w  | [3] \* 11                           |                                                      |
| BCID_Dec.Control   | r/w  | [5] \* 11                           |                                                      |
| OutputControl      | r/w  | [1] \* 11                           |                                                      |
| LUTable            | w    | 1024 \* 8                           |                                                      |
| ReadOutControl     | r/w  | [4] \* 11                           |                                                      |
| FADC memory        | r    | 128 \* 11                           | only accessible via DerandBuf                        |
| LUT memory         | r    | 128 \* 11                           | only accessible via DerandBuf                        |
| Derand.Buffer      | r    | 64 \* 11                            | content per L1-Accept is defined by ReadOut Control  |
| PBack/HistoControl | r/w  | [1] \* 11                           |                                                      |
| PlayBack memory    | w    | 256 \* 11                           |                                                      |
| Histo memory       | r    | 256 \* 11                           |                                                      |
| RateMeterControl   | w    | [2] \* 11                           |                                                      |

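#### PPrASIC channel-B

Register and memory allocation is identical to PPrASIC channel-A.

#### PPrASIC common

| Register Name     | Description / Remark                                                                                                                   |
|-------------------|----------------------------------------------------------------------------------------------------------------------------------------|
| ByPassBCIDtoBCMux | Set "trivial" Coeffs. for Filter, ByPass LUT by dropping 2 LSBs, Disable BCMux, Add Channel Flag Bit in place of BCmux Flag            |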

| Register Name      | #Bits  | Interpretation                                                                                       |
|--------------------|--------|------------------------------------------------------------------------------------------------------|
| InputControl.1     | 0      | Latch input on NEGATIVE clock edge /<br>Latch input on POSITIVE clock edge                           |
|                    | 1      | ReadOut mode (Data from FADC; fill into HistoMem) / PlayBack mode (Data from PlayBack memory)        |
| InputControl.2     | 5<br>5 | Depth of 10bit (FADC) FIFO buffer [0x00 : 0x3F]<br>Depth of 1bit (extBCID) FIFO buffer [0x00 : 0x3F] |
| FIRFilterControl.1 | 4<br>4 | Coeff.1 [-0x7 : +0x7]<br>Coeff.2 [0x0 : 0xF]                                                         |
| FIRFilterControl.2 | 4<br>4 | Coeff.3 [0x0 : 0xF]<br>Coeff.4 [0x0 : 0xF]                                                           |
| FIRFilterControl.3 | 4<br>4 | Coeff.5 [-0x7 : +0x7]<br>Select LSB from 17bit PeakFinder result for LUT [0x0 : 0x7]                 |
| BCID_Dec.Control.1 | 3      | LUT for BCIDselect in lowest Et-interval                                                             |
| BCID_Dec.Control.2 | 10     | UpBound for low Et-interval                                                                          |
| BCID_Dec.Control.3 | 3      | LUT for BCIDselect in medium Et-interval                                                             |
| BCID_Dec.Control.4 | 10     | UpBound for medium Et-interval                                                                       |
| BCID_Dec.Control.5 | 3      | LUT for BCIDselect in highest Et-interval                                                            |
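The FIR coefficient ranges in the table, and the "pass-through" setting (0,0,1,0,0) mentioned in the design-review summary, can be illustrated with a small Python model. This is a behavioural sketch only, not the hardware implementation: the assignment of coefficients to sample delays and the zero-filled start-up are assumptions, and the function names are illustrative.

```python
# Allowed ranges for Coeff.1 .. Coeff.5, as listed in the table above.
COEFF_RANGES = [(-7, 7), (0, 15), (0, 15), (0, 15), (-7, 7)]


def check_coefficients(coeffs):
    """Raise ValueError if a coefficient lies outside its register range."""
    if len(coeffs) != len(COEFF_RANGES):
        raise ValueError("exactly five coefficients are required")
    for index, (coeff, (low, high)) in enumerate(zip(coeffs, COEFF_RANGES), 1):
        if not low <= coeff <= high:
            raise ValueError(f"Coeff.{index} = {coeff} outside [{low}, {high}]")


def fir_filter(samples, coeffs):
    """Five-tap FIR: out[n] = sum over k of coeffs[k] * samples[n - k].

    ASSUMPTION: samples before the start of the list count as zero.
    """
    check_coefficients(coeffs)
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(coeffs):
            if n - k >= 0:
                acc += coeff * samples[n - k]
        output.append(acc)
    return output
```

With the trivial coefficients the filter only delays its input: `fir_filter([1, 2, 3, 4, 5], [0, 0, 1, 0, 0])` returns `[0, 0, 1, 2, 3]`, i.e. the FADC samples shifted by two clock ticks and otherwise unchanged.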

| Register Name      | #Bits | Interpretation                                                                                  |
|--------------------|-------|-------------------------------------------------------------------------------------------------|
| OutputControl      | 1     | by-pass ADD4 for JP /<br>use ADD4 for JP                                                        |
| ReadOutControl.1   | 8     | FADCmem:: ReadPointer offset from WritePointer [0x00 : 0x7F]                                    |
| ReadOutControl.2   | 8     | #FADC samples to read from FADC memory;<br>sym. +-ReadPointer; [0x00 : 0x3F]                    |
| ReadOutControl.3   | 8     | LUTmem:: ReadPointer offset from WritePointer [0x00 : 0x7F]                                     |
| ReadOutControl.4   | 8     | #Samples to read from LUT memory;<br>sym. +-ReadPointer; [0x00 : 0x3F]                          |
|                    |       |                                                                                                 |
| PBack/HistoControl | 1     | If "PB": Run data off the PlayBack memory (cyclic) /<br>If "PB": Load data into Playback memory |
|                    | 6     | Histogram threshold on FADC value [0x00 : 0x7F]                                                 |
| RateMeterControl.1 | 10    | RateMeter Threshold                                                                             |
| RateMeterControl.2 | 10    | RateMeter Counting Time interval * 25 µsec                                                      |
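The two RateMeterControl settings can be related to a rate as in the following Python sketch. From the table only the 10-bit widths and the 25 µsec unit of the counting interval are taken; with the 40 MHz LHC clock one unit corresponds to 1000 bunch crossings. Which quantity is compared with the threshold, and whether the comparison is strict, is not specified here, so the sketch simply counts samples strictly above the threshold as an assumption.

```python
INTERVAL_UNIT_S = 25e-6     # unit of the counting interval (from the table)
CROSSINGS_PER_UNIT = 1000   # 25 usec / 25 ns bunch-crossing period at 40 MHz
MAX_SETTING = (1 << 10) - 1 # both registers are 10 bits wide


def window_crossings(interval_setting):
    """Number of bunch crossings covered by the counting interval."""
    if not 0 <= interval_setting <= MAX_SETTING:
        raise ValueError("interval setting must fit in 10 bits")
    return interval_setting * CROSSINGS_PER_UNIT


def measure_rate(samples, threshold, interval_setting):
    """Return (count, rate in Hz) for one counting interval.

    ASSUMPTION: a sample counts if it is strictly above the threshold.
    """
    if not 0 <= threshold <= MAX_SETTING:
        raise ValueError("threshold must fit in 10 bits")
    crossings = window_crossings(interval_setting)
    if crossings == 0:
        raise ValueError("counting interval must be non-zero")
    count = sum(1 for sample in samples[:crossings] if sample > threshold)
    return count, count / (interval_setting * INTERVAL_UNIT_S)
```

The largest interval setting, 1023, corresponds to 1 023 000 bunch crossings, i.e. about 25.6 ms of counting time.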

## APPENDIX B

**PPrASIC : Input/Output Signals**

| Signal Name[width]   | Dir. | Electr.Type | Description / Remark                                                            |
|----------------------|------|-------------|---------------------------------------------------------------------------------|
| FADC-A [0:9]         | In   | TTL         | FADC data @ 40 MHz into channel-A                                               |
| extBCID-A            | In   |             | "transition" from extBCID                                                       |
| FADC-B [0:9]         | In   | TTL         | FADC data @ 40 MHz into channel-B                                               |
| extBCID-B            | In   |             | "transition" from extBCID                                                       |
| Clk                  | In   | TTL         | LHC Clock from TTC                                                              |
| L1Accept             | In   | TTL         | Level-1 Accept from TTC                                                         |
| BcCntRes             | In   | TTL         | Resynchronizing Reset per LHC-turn from TTC                                     |
| EvCntRes             | In   | TTL         | Level-1 Event Number Reset from TTC                                             |
| Reset                | In   | TTL         | Soft Reset of PPrASIC, e.g. R/W pointers, clear DerandBufs                      |
| Sync                 | In   | TTL         | Synch. start for PlayBack from TTC                                              |
| Ser. Interface [0:5] | I/O  | TTL         | Read/Write DAQ:<br>SerClkIn, SerFrameIn, SerIn, SerDaisyIn, SerOut, SerDaisyOut |
| ToCP [0:9]           | Out  | TTL         | Data + BCMuxFlag + OddParity @ 40 MHz to ClusterProc                            |
| ToJP [0:9]           | Out  | TTL         | added 4cell data + OddParity @ 40 MHz to Jet/EtProc                             |
| CellSumOut [0:8]     | Out  | TTL         | 2 cell data OUT to MCM for JEP                                                  |
| CellSumIn [0:8]      | In   | TTL         | 2 cell data IN from MCM for JEP                                                 |
| JTAG [0:3]           | I/O  | TTL         | Test I/O                                                                        |