ATLAS-UK Level-1 Calorimeter Trigger Meeting
Wednesday 10th October 2001 - RAL
Present: Bruce Barnett, Ian Brawn, Adam Davis, James Edwards, Eric Eisenhandler (chair), John Garvey, Norman Gee, Tony Gillman, Stephen Hillier (minutes), Murrough Landon, Gilles Mahout, Ed Moyse, Viraj Perera, Richard Staley, Peter Watkins, Alan Watson.
Agenda
Minutes of previous meeting
Hardware status
Common Merger Module....................................Ian 10'
Cluster Processor Module............................Richard 10'
CP chip tests using GTM...............................James 10'
CP/JEP ROD prototype..................................Bruce 10'
Timing Control Module..................................Adam 10'
Short-term schedule....................................Tony 10'
Hardware-related items
CANbus status...............................Eric (for Dave) 10'
More on TTC timing jitter..............................Alan 10'
Software status
Online software status.............................Murrough 15'
Hardware Simulation and test vectors..................Steve 10'
How to read out the slice............................Norman 10'
Physics simulation
Off-line trigger simulation status.......................Ed 10'
Level-1 items from the Lund physics workshop..........Steve 15'
Out of the rut
Highlights of the LEB workshop.......................Gilles 15'
Malvern forum on microelectronics......................Tony 15'
Coming reviews
ATLAS System Status Overview introduction..............Eric 10'
ATLAS System Status Overview talk......................Tony 15'
PPARC review of ATLAS and CMS..........................Eric 10'
Coming meetings
ATLAS week.............................................Eric 5'
RAL joint meeting arrangements.......................Norman 15'
NIKHEF T/DAQ workshop..................................Eric 5'
Any other business
Dates of next meetings
Hardware status summaries
The board was being worked on in the Drawing Office by Dan Beckett. Progress
was generally good, with all components placed, and most tracks routed, although
the density of pins into the FPGAs was causing some problems. One major hiccup
had occurred with the placement of the backplane connectors, which had been
displaced due to a misunderstanding. The fault was only found by accident,
although it should have been picked up by the manufacturer at a later stage.
Ian was also in contact with Sam about another connector placement
problem which showed up in a disagreement between the Gerber
files and the backplane specifications. Time was critical in finishing the work
as Dan was only available for a little longer. It was estimated that the module
should be ready in early December, with Ian and Viraj working on it until Christmas.
The schematics were now final, and the net-list had been checked for hanging wires.
In the drawing office, the components were nearly all placed, with no chips placed
underneath the PCB. The board should be routed soon. Richard was working on
finishing the firmware and updating the documentation. A provisional schedule was
presented with a projected date for layout to finish in mid-December. This would
lead to a probable start of tests in Birmingham in February.
It was pointed out that we should not assemble all four modules immediately. Norman
commented that we should make sure all lessons from the ROD module have been learnt
as regards test points and clear labelling of chips. Also the front panel layout
and meaning should be checked for consistency with the CMM, and the backplane
connector issue should be addressed. The CMM design should also be put through the
net-list check before production. Investigations of BGA re-working should also
continue with the old boards.
James had tried the CP-FPGA code in the GTM with two different configurations.
Firstly a simple configuration with a small set of test-vectors was tried, and
secondly Ian's more complex setup. James had succeeded in getting all 108
channels to synch after some debugging, and some of the test-vectors worked, but
problems were observed in the right hand cells. With Ian's setup, lots of errors
were seen, but there was no obvious pattern to the problem. This needs more work
to understand the problems, but James will have to suspend CP chip work for a while
to work on debugging and extending the ROD code. James showed a comparison of the
resource usage in the FPGA devices used on the GTM and planned for the CPM. The
numbers are similar, but timing margins should be better for the CPM.
Peter asked how further progress could be made on debugging the GTM. Use of ChipScope
should clarify where the problems are located, and even if it takes a large number
of iterations to fix complex FPGA code, at least the situation is better than a
non-functional ASIC!
Bruce reported on recent work with the DSS/ROD setup. There are three known
problems with the DSS, one of which Bruce had recently re-confirmed in the lab.
There were nine known problems with the ROD, one of which has now been attributed
to software, but most of which still need solving. Bruce described the new test
setups made possible by the arrival of the SLIDAD, SLIMOD and ODIN cards. These
allow more detailed checking of the S-Link output and protocol than was
previously possible only in the RoIB setup at CERN. Using these setups, it was confirmed
that James' firmware changes had fixed one of the known problems with the Link Full
Flag (LFF). However, although the flow control now works, there was still some
suspicion of bit errors occurring. This needs more work as Bruce has only recently
had the time to put the system together. Input from James is also required now he
has time to work on RODs again.
Finally Bruce summarised the tracking and documentation status. It has been
suggested that error tracking for individual modules and module types could be
done through route cards and problem reports as used by TD at RAL. An example
of these for the ROD tests was shown. Bruce had also made the ROD status reports
available on the web with password protection. Murrough thought that it would be
good to make passwords the same across the level-1 system (as far as possible).
Adam described the TCM's functions and showed the layout of the front panel
indicators. He then detailed the problems that had been found on the board.
Most of these problems originated from mis-labelled connections in the schematics
which could have been found by the net-list checking program. Some of the bad
connections and timing problems could be fixed in firmware, but others required
wire modifications, nine in all. The status after all these fixes is that most
things have been checked and work. The major component still to be worked on is
the CANbus - Adam hasn't yet checked the TTC distribution, but Bob had checked
this before he left.
One option to consider in making up more boards is that seven of the wire
modifications could be avoided by programming the address decoding differently,
leaving just two wire mods. Alternatively the boards could be re-manufactured
with the fixes. Adam summarised the options and it was felt that it would be
better just to go ahead with the modifications rather than spend more money on
new boards. Adam showed the new CPLD code that had already been designed to
fix the problems.
Tony showed the current version of the schedule for the UK slice elements, but
warned that he felt it was still too optimistic. It is now unlikely that the
CPM would be debugged by the end of February, and the backplane was still being
delayed as Sam was having discussions with the manufacturers about track widths.
It would probably not arrive in the UK until Christmas at the earliest. To allow
three months of slice testing in the UK, the modules would not go for tests in
Heidelberg until summer 2002, but this will probably tie in well with the current
PPM schedule, where the item currently holding things up is the ASIC, which had
still not been submitted for production.
Hardware-related items
Dave had been trying to get the Fujitsu chips to talk together over CANbus by
adapting the existing ELMB software. He has succeeded in getting data across,
but has been having problems with nested interrupts, which he identified as a
problem with the ADC return from an interrupt. He has written new code
which he thinks has solved the problem. He should then be in a position to
try talking to the TCM over CANbus - Adam is very keen to try this soon.
Dave had recently heard about a new device from Analog Devices which should
have better software support than the Fujitsu chips, and is also half the
size. The new device does not have much flash memory, but might fit our
requirements if one were used on most modules, with an ELMB on the TCM.
The status of ELMB support in ATLAS was queried, and Eric said that although
the official line was still that there was no support for other devices,
he had heard that we are not the only group who feel the ELMB is not
well suited to our needs.
Alan had been trying to put some numbers on tolerance to a possible drift in
the LHC clock signal arriving at the ATLAS pit due to its propagation on a
long fibre from the LHC RF area. The problem cannot be corrected
by the TTC system as it is the input to the TTC from LHC that might
drift relative to the beam crossing time. The problems this might cause
are a reduction in the efficiency of BCID in both saturated and non-saturated
cases, and a reduction in Et resolution. Previous studies suggest a +/- 5ns
tolerance, with the saturated pulses being a very hard limit. Note these
studies were done for EM-like pulses, and the situation could be worse for
the faster hadronic signals. Some of this tolerance will already be needed
to cope with the foreseen sources of jitter and timing mis-alignment in
the detectors and tower summing.
Nick had specifically asked what our tolerances would be for jitter,
short-term drift and long-term drift. The worst of these is
short-term drift, for which we have no correction mechanism - long-term
drift can be corrected by a re-calibration. Assuming all other sources
are less than 3ns (TDR figure) then we can accept about 2ns jitter or
drift for a period less than a re-calibration. The actual numbers depend
on the assumed tolerable error rate, with saturated pulses being both
the most error-prone and probably the most important to get right. Alan
noted that the situation would be far worse in a high-luminosity LHC
scenario with a halved beam-crossing time.
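The budget described above can be made concrete with a small sketch. The 5 ns overall tolerance and the 3 ns TDR figure are taken from the discussion; the linear (worst-case) subtraction model is an assumption for illustration only.

```python
# Hypothetical sketch of the level-1 BCID timing budget discussed above.
# Assumption: contributions add linearly (worst case), so the allowance for
# uncorrected clock jitter / short-term drift is whatever remains of the
# total tolerance after the other foreseen sources are subtracted.

TOTAL_TOLERANCE_NS = 5.0  # +/- 5 ns BCID tolerance from previous studies
OTHER_SOURCES_NS = 3.0    # TDR figure for detector/tower-summing mis-alignment

def drift_allowance(total_ns: float, other_ns: float) -> float:
    """Remaining allowance for uncorrected jitter or short-term drift."""
    return total_ns - other_ns

print(drift_allowance(TOTAL_TOLERANCE_NS, OTHER_SOURCES_NS))  # 2.0 (ns)
```

This reproduces the "about 2 ns" figure quoted for acceptable jitter or drift between re-calibrations.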
Currently the status is that Nick is collecting requirements from many groups,
and there will be discussions about experimental needs and what LHC can do
for us. Nick also asked if we could monitor any drift, to which the answer is
probably yes, given enough statistics, although the detectors themselves are
in a better position to do this. However the general conclusion of the meeting
was that many detector groups must be very sensitive to this drift and there
ought to be some means of minimizing it, either from a better LHC signal, or
use of beam-pickup. For the moment the best way forward is to set a sufficiently
tough requirement on the drift.
It was noted that a member of the Heidelberg group is currently monitoring
calorimeter signals at the CERN test-beams in order to compile a library of
test pulses as a function of detector type and amplitude.
These will be useful both for off-line studies and in the slice test.
Software status
Murrough started by describing recently reviewed documents on software
requirements and organisation. There was some question as to whether these
should be reviewed externally. The software tasks were broken down into a
list with those responsible named, and there was another list of tasks with
no-one yet assigned.
Murrough then briefly described recent work in the software area, along with
planned developments. Bruce had done a lot of work on setting up the
Concurrent CPUs; this was well documented and the CPUs are now running nicely.
Mainz have ordered another CPU identical to the three Bruce had worked on,
and these four should be enough for the non-preprocessor part of the slice.
Murrough has been working on the run control and database system using the current
version of the standard online software. The skeleton is in place and running
both at QMW and RAL, but many issues have to be resolved before it can be used
to control the hardware. Note that we have a milestone to integrate, at some
level, with the online software before the end of 2001.
The remaining tasks were separated into short, medium and long term. In the
near future, much HDMC development is needed for stand-alone module tests. In the
slightly longer term, for serious module tests, integration of the online,
calibration, test-vectors and monitoring is needed. This must be extended in
the longer term to cover the whole system in a seamless way.
There have been lots of software meetings and a successful video conference with
Oliver which should be repeated. Informal meetings will be held at CERN during
the ATLAS week, and a series of pre-joint meeting discussions are planned for
November. The manpower situation is looking better - there is a new software
person at Mainz (Thomas Trefzger), a new diploma student at Heidelberg,
and a new PhD student at Mainz, who could however end up concentrating on
firmware or other areas, so there are now about 5 FTEs with a good spread
throughout the level-1 groups.
However things will still be tough, especially with time divided between
writing software and testing hardware. Tony asked how the ratio of effort
needed to effort assigned looked (probably similar to before) and whether there was
anything we could do to manage the effort required when hardware arrived or
just take things as they come.
Steve summarized the simulation and test-vector work done since the last
meeting. This included the development of a model for the GTM, addition
of several sets of CP-FPGA test vectors and some code administration matters.
Some time had been taken in actually helping to debug James' FPGA code using
the test-vectors. There was still much work to be done, the most urgent
being the need for documentation and ideas on how to integrate the framework
into the online and test software.
More details about the test vectors themselves were presented. Many of the
aspects of the CP algorithm are now fully exercised, but notably the BC-demux
treatment is not fully tested. Steve then described his experience with
debugging the FPGAs, commenting on how difficult it is to sort out anything
but the most gross errors without resorting to looking inside the 'black box'
with either the VHDL simulation, or tools such as ChipScope for the actual
hardware. Debugging will be much harder than in the past because of the
difficulty in accessing intermediate data and the turnaround time of new
FPGA code. Simulation at the board level rather than the chip level may help
to uncover some problems before they are seen in hardware, and Norman suggested
that a block-by-block analysis of the FPGA functionality may help to pinpoint
faults.
Norman presented the current plans for how to collect data from the RODs in
the slice system. He briefly showed the known inventory of parts then went
on to describe recent thoughts on readout of the CTPD and CMM outputs, the
problem being that a) the CTPD does not have enough inputs for all the CMM
outputs and b) the CTPD does not have a ROD equivalent. In order to read the
CTPD inputs, a CPU with an S-Link mezzanine is required to collect the data
from the FIFOs, and an even more messy system with DSS modules and special
firmware would be required for the CMM outputs. It is proposed to rely on
the slice readout for CMMs in general, with separate tests to ensure the
real-time outputs are correct.
With this information in mind, an estimate of the S-Link numbers and data
rates can be made using certain assumptions. Norman's numbers were eight
S-Links with a total data rate of about 400 Mb/s. In order to house
eight S-Links, PCI expansion crates would be needed but the data rates
claimed for the PCI bus are too small to cope with the full slice. It will
be possible to do debugging and some tests at low rates and with just a
few modules being read, but to test at high rate, we can only check a subset
of events, and we need a mechanism to select those events - note software
could only check a fraction of the events in real time anyway. If a random
selection of events is made, we only get a relatively low sensitivity to
errors. Use of the DSS to select events with errors would increase sensitivity,
but it is not easy to feed the DSS flag back into the selection.
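The rate and sampling arguments can be sketched numerically. The link count, total data rate and 100 kHz trigger rate come from the discussion; the per-event error rate and sampled fraction are purely illustrative assumptions.

```python
# Hypothetical back-of-envelope sketch of the slice-readout numbers above.
import math

N_LINKS = 8              # S-Links foreseen for the slice readout
TOTAL_RATE_MBPS = 400.0  # total data rate quoted, in Mb/s
L1_RATE_HZ = 100_000     # level-1 accept rate

# Implied average payload per event per link, in bits (derived, not measured).
bits_per_event_per_link = TOTAL_RATE_MBPS * 1e6 / N_LINKS / L1_RATE_HZ

def p_catch(n_events: int, error_rate: float, sampled_fraction: float) -> float:
    """Probability that a random sample of events contains at least one bad
    event, for n_events with per-event error rate error_rate, checking a
    fraction sampled_fraction at random (Poisson approximation)."""
    expected_checked_errors = n_events * error_rate * sampled_fraction
    return 1.0 - math.exp(-expected_checked_errors)

print(bits_per_event_per_link)                   # 500.0 bits/event/link
# e.g. a million events, one error per 10^5 events, checking 1% at random:
print(round(p_catch(10**6, 1e-5, 0.01), 3))      # 0.095
```

The second number illustrates the point in the minutes: purely random sampling at low fractions gives only modest sensitivity to rare errors, which is why selecting events flagged as bad (e.g. by the DSS) would be preferable.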
There are two possible ways to read events at 100kHz. One is via ROD crate DAQ,
ignoring the ROS; the alternative is to use the full ROS/ROBIN architecture to
selectively read out events of interest. RHUL is currently developing ROBINs
that could be used in this scheme, but they would have to be thoroughly tested
before we could be sure of using them. However, this setup should be far more
like the final system than any other solution. Norman then detailed how to
run a system in this configuration in terms of triggering bursts of events and a
mechanism for selecting data to be studied. It is potentially very worrying that
we will have to rely on such a large empire of new hardware and software to make
this solution work.
Physics simulation
Ed said that the code development was progressing with informal code
reviews with Murrough. Some re-writing of the old code was required, and the
jet trigger should be ready by the next meeting. Other areas that need work
are the Et trigger, outputs to CTP and the trigger menu.
Athena still appears to be
fairly poorly documented and difficult to install, making work outside CERN
very difficult. Ed had been complimented
that his web pages were some of the most useful available! The release cycle
seems to be poorly defined, with the current version not supporting StoreGate,
a mechanism Ed had recently changed the code to use.
Steve gave a quick summary of the Lund workshop, highlighting topics of
direct interest to the group. In the general introduction, the current plans
for startup of LHC and also the detector staging for 2006 were shown. There
was also a talk on prospects for Super-LHC, with the likely scenario being a
reduction in beam crossing time. This prompted some discussion of the ability
of both trigger and detectors to cope with this change. In the detector
performance section, an analysis of the potential for muon tagging at level-1
with the tile calorimeter was presented.
On the physics side,
one topic of note was the Higgs production via Weak Boson Fusion which has
prompted the recent flurry of activity in the forward jet tagging area.
Beniamino Di Girolamo presented his analysis of the additional level-1 rate
from forward jet tagging, which was very reasonable at 0.22Hz. The PESA
meeting concentrated on new analysis of HLT rate in previously ignored
areas, specifically thinking about luminosity and prescaled triggers.
Out of the rut
Gilles presented some of the material from the LEB most relevant to
our project. The news from LHC was that they can't say yet that they
can't make it on time! There were some questions raised on our
talks, one questioning whether we could put so many S-Links on a ROD,
and another the use of G-Links. There were many talks of interest
about various aspects of FPGAs, including experience from other
experiments.
There was an interesting talk about the
challenges and future of microprocessors. The scale is still increasing
rapidly, although yield, power-supply stability
and sensitivity to particles are all becoming more problematic as devices
become smaller. Another problem is that compilers are not
keeping up with the designs. On FPGAs, one point to note is that
the lifetime of devices is so short that ours are already considered
mature. There were useful talks about BGA reworking, JTAG strategies,
and some interesting ideas on building FPGA redundancy into a board, and
automated checking of the loaded code. On the final day a discussion
was held where many of the topics were relevant to us, including how to
keep up with rapidly developing technology and the
problems of manpower stability.
Tony showed some of the highlights from the Instrumentation Department
Forum, at which a few outside visitors like himself were able to see what
developments were being worked on at RAL. There was an impressively
diverse range of detector/device development going on, covering a wide
range of physics research. There were several interesting developments
in the area of fundamental research in space with several projects being
supported. There was also a nice presentation of particle physics at
CMS and many talks on precision detectors of various types.
Coming reviews
Eric previewed the ASSO to be held on the Tuesday of the ATLAS week.
Other than the main set of presentations, the process also consists
of some prior consultation, with the emphasis being on management and
integration issues. On the day of the meeting all parts of the system
will be presented, but the time assigned to level-1 is short, giving
both Paul Hanke and Tony only 10-12 minutes each to summarise the
situation for the
preprocessor and cluster processors. The review panel have split the
system into major components for each reviewer to concentrate on -
Bill Cleland is the reviewer for level-1. Much supplementary information
has been requested on the PBS and schedules as background. The aim
is to be constructive and produce a list of actions where problems are
identified.
Bill Cleland suggested an outline for the level-1 talks to concentrate
on deliverables, organisation, interfaces, milestones, QA and results.
Preliminary talks were presented at a video conference including Bill
and Nick, at which Bill already picked up on a few points. The areas
of concern include software manpower, TileCal receivers, PRR timings
for ASICs and MCMs, documents specifying signal organisation, QA,
calorimeter calibration liaison and lack of documents in EDMS.
Tony presented his version of the talk updated according to Bill's
wishes from the video conference. This version closely followed
Bill's outline, and can be found by clicking on the link above.
Comments were invited, and the final version, as delivered at the
ASSO on Tuesday 16th October can be found by clicking
here.
Imminent administrative work is also required for the PPARC
mid-project review of ATLAS and CMS.
All projects in ATLAS must submit a preliminary report to the
UK ATLAS group meeting on 26th October to serve as a basis for the
full UK ATLAS status. One of the most important inputs is the
manpower estimates, which are tied to grant reviews.
Another important area is to report on the progress on
deliverables. This is a little awkward for us since the deliverables
as defined to PPARC at the start of the project have been changed considerably
in our current architecture and sharing of responsibilities between
the UK and other institutions. This will have to be explained in
a relatively short status update. The slippage in schedule for
LHC, ATLAS and the individual projects means that in general, progress
and expenditure is behind the original estimates.
A careful
re-assessment and presentation of estimated costs is required since, this
early in production, it is difficult to be sure about module costs,
and it is clear we need more engineering effort on FPGA
firmware design. Costs will be examined very closely in view of the
recent news of a shortfall in funding for LHC. Eric had an almost
complete draft which would be circulated for comments before submission
to the ATLAS collaboration board.
Coming meetings
Many people were going to attend the next T/DAQ plus ATLAS week on
15th to 19th October and several
official and unofficial meetings were planned. Other than the ASSO
review, there would be the usual meetings plus lots of smaller gatherings
to discuss specific issues. These include a talk with Bill Cleland about
cabling issues, Yuri Ermoline about a patch panel to the CTPD, software
discussions and level-1 dataflow.
Norman commented that the registration was going slowly. Accommodation
would not be in Cosener's House, but instead in the Travel Inn in Abingdon,
which is actually cheaper. The agenda will take shape soon, and the
Thursday/Friday issue for the meal/management meeting would be decided.
Software discussions are to be held before the main meeting.
The T/DAQ group would be holding their annual `outside CERN' meeting in
NIKHEF, in the week after the RAL joint meeting. As usual the
emphasis is on HLT, but a few UK level-1 people would go (at least
Murrough and Norman). There is one level-1 specific meeting plus
DIG and the institutes board.
Eric was congratulated on reaching a major age landmark and suitable
celebrations were held.
8th - 10th November .............. Joint meeting at RAL
Wednesday 28th November .............. UK meeting at RAL