CERN

ORGANISATION EUROPÉENNE POUR LA RECHERCHE NUCLÉAIRE

EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH


Testing HIPPI Switch Configurations for Event Building Applications

Arie Van Praag, Ralf Spiwoks, Robert van der Vlugt

CERN / ECP 96-15
(18 September 1996)

Abstract

Current plans for Atlas event building are based on high-performance switching technology, composed of either one switch or a network of switches. The aggregate bandwidth at this level is calculated to be between 1 and 10 GBytes/s, with channel speeds to the processor farm ranging from 10 to 100 MBytes/s. The High Performance Parallel Interface (HIPPI), a well-established technology with inexpensive interfaces, fast switches and a data rate of 100 MBytes/s, is a serious candidate for the event builder. Extensive tests have been performed with data sources running at 40 MBytes/s; however, few tests have been done with HIPPI switches near the 100 MBytes/s HIPPI specification. The following series of tests gives a more detailed view of switch behavior in an event building function. Within the limits of the available material, measurements are done with single and double switch configurations.

Presented at the SOZOPOL-96 workshop on Relativistic Nuclear Physics, Sozopol, Bulgaria, 30 September - 6 October 1996

Introduction
Test Methods and Materials

Slate as Data Source
The Destinations
Switches
Test Configurations
Measurements and Definitions

Simulation
Test Results

3 X 3 and 6 X 6 Configurations
Comparison to the Simulations
Tests with 2 Switches

Conclusion

Event building with Single Switches
And with double switches

Future Evaluations
Acknowledgment
References


Introduction

The coming LHC (Large Hadron Collider) is planned to produce a bunch crossing in the detectors every 25 nsec, corresponding to a frequency of 40 MHz. The generated data rates reach values of the order of TBytes/s. As only a fraction of the obtained data will be of interest, filters with several levels of triggers are foreseen, so that only the selected data are stored for analysis. The Atlas architecture [1] uses three levels (LVL1, LVL2 and LVL3), as shown in Fig: 1.

At LVL1, special-purpose processors act on reduced-granularity data, obtained from a selected subset of the detectors, to define the so-called "Regions Of Interest" (ROI). The LVL2 trigger uses full-granularity and full-precision data from most of the detectors, but only examines the data identified as a ROI by LVL1. From LVL2 the data of a single event go via many parallel channels to LVL3, where the full data are examined, selected and recorded for further analysis. The Atlas data acquisition architecture uses large data switches to build the event in a single processor of the LVL3 processor farm. To fulfill the requirements of this part of the data processing, a number of promising technologies are now appearing on the market. One of those which is available and fulfills all the conditions, including those for the switches, is HIPPI [2,3]. These switches exist in sizes up to 32 X 32 channels, with 128 X 128 channels announced for the near future. Switch latency is generally under 1 µsec. The bandwidth is 100 MBytes/s for each channel. Tests with HIPPI for event building have been done with data generators of up to 40 MBytes/s using 3 channels and a switch [4]. Switch behavior at higher speeds will be verified in this series of tests, including the combination of two coupled switches. The obtained results will be compared with simulated data.
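As a rough illustration of the scale involved, the sketch below relates the 1 to 10 GBytes/s event-builder bandwidth quoted in the abstract to the number of 100 MBytes/s HIPPI channels a switch fabric would have to offer; the figures are round numbers for illustration, not an Atlas design calculation.

    # Back-of-the-envelope sketch: how many 100 MBytes/s HIPPI channels are
    # needed to carry an event-building bandwidth of 1 to 10 GBytes/s.
    # Round, illustrative numbers only.
    channel_speed = 100e6                # HIPPI channel bandwidth, bytes/s

    for aggregate in (1e9, 10e9):        # event-builder bandwidth, bytes/s
        ports = aggregate / channel_speed
        print(f"{aggregate / 1e9:.0f} GBytes/s -> at least {ports:.0f} HIPPI channels")
    # 1 GBytes/s  -> about 10 channels: a single commercial switch suffices.
    # 10 GBytes/s -> about 100 channels: a fabric of several coupled switches.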

Test Methods and Materials

The test set-up is a simplified application of the event builder (Fig: 2). The limits are determined by the size of the available switches and by the number of HIPPI data sources that are capable of generating data rates between 80 MBytes/s and 100 MBytes/s. The standard HIPPI flow control is respected throughout. The block size of the data that represents the sub-event must be selectable from 0.5 KBytes to 1 MByte, and the sub-event size is incremented in a 1-2-5 sequence. As Sources, Slate 2 modules are used, which are preloaded memories with an output sequencer [5][6]; as Destinations, NEDDIs (Never EnDing Destination Interface) and HIPPI test equipment are used.

Fig: 2. Test Configuration
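A minimal sketch of the 1-2-5 scan of sub-event sizes used throughout the tests; the helper below is purely illustrative and not part of the test software.

    # Generate the sub-event sizes used in the tests: 0.5 KBytes up to 1 MByte
    # in a 1-2-5 sequence (0.5, 1, 2, 5, 10, 20, 50, ... KBytes).
    def sub_event_sizes_kbytes(lo=0.5, hi=1000):
        size = lo
        ratios = (2, 2, 2.5)          # 0.5 -> 1 -> 2 -> 5 -> 10 ...
        i = 0
        while size <= hi:
            yield size
            size *= ratios[i % 3]
            i += 1

    print(list(sub_event_sizes_kbytes()))
    # [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0, 200.0, 500.0, 1000.0]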

Slate as Data Source

From a sub-event size of 1 MByte down to 5 KBytes, the Slate data rate is higher than 90 MBytes/s. Sub-events with sizes between 5 KBytes and 2 KBytes use Short Bursts; this introduces a start-up overhead for each Request cycle that slows down the data rate. The sub-events sized from 1 KByte down to 0.5 KBytes are fixed data blocks, for which the only limiting factor is the start-up overhead. A synchronous master-slave connection, where one master starts all other slaves in parallel through a hardware connection, is used to start the data stream.

Fig: 3 Slate Data Stream

The data set loaded into each Slate is a string emulating sub-events equal in number to the Destinations (Fig: 3). Each of the Destinations is addressed using the HIPPI "Logical Addressing mode"; the "camp-on" option is set to avoid time-outs.
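The sketch below illustrates how such a data string can be pictured: one sub-event per Destination, each tagged with an I-Field carrying the Destination's logical address and the camp-on option. The I-Field layout is deliberately simplified (a symbolic camp-on flag, logical address in the low bits) and does not claim to reproduce the exact HIPPI-SC bit assignment.

    # Illustrative model of the data string preloaded into one Slate: one
    # sub-event per Destination, each tagged with an I-Field.
    CAMP_ON = 1 << 24          # assumed, symbolic position of the camp-on flag

    def slate_sequence(source_id, destinations, sub_event_size):
        """Return the (i_field, payload) pairs one Slate sends in one Sequence."""
        sequence = []
        for dest in destinations:
            i_field = CAMP_ON | dest                      # logical address + camp-on
            payload = bytes([source_id]) * sub_event_size # dummy sub-event data
            sequence.append((i_field, payload))
        return sequence

    # Six Destinations with logical addresses 0..5, 10 KByte sub-events:
    seq = slate_sequence(source_id=1, destinations=range(6), sub_event_size=10_000)
    print([hex(i) for i, _ in seq])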

The Destinations

The Destinations that replace the inputs of a processor farm should be able to sustain a data rate close to 100 MBytes/s. At least one Destination must be able to check data integrity, and test points for measurements need to be present on a number of Destinations. For Destinations without data checking, NEDDIs can be used. These devices handle the HIPPI handshake, but the HIPPI data lines are not connected; the complete device is built into a standard HIPPI connector.

For the control of data integrity, HIPPI test equipment can be used (Table 1); these devices have test points and data verification. Both types are capable of supporting a 100 MBytes/s data rate. In a later stage a VMEbus processor or a PCI bus workstation [8] will be connected to at least one of the channels.

Switches

The HIPPI switches are standard, commercially available products made by two different manufacturers. The maximum configurations needed for these tests are 6 X 6 channels and two times 5 X 5 channels. Plug-in modules can be exchanged to adapt to these configurations. The data rate per channel is 100 MBytes/s and the switch latency is shorter than 1 µsec. An overview of the available equipment is given in Table 1. One type of switch offers the possibility to program up to 8 routes for the same Logical Address, which is very useful to couple two or more switches.

Source           Destination                   Switches
Type        No   Type                     No   Type        Size     Multichannel
Slate 2     6    NEDDI                    4    Avaika      8 X 8    Yes
                 CERN HIPPI Testbox       2    Avaika      3 X 3    Yes
                 Avaika Tester            2    Essential   4 X 4    No

Processors
Motorola VME Processor        max. throughput 40 MBytes/s
Digital Alpha 500 Maverick    90 MBytes/s (PCI - HIPPI)

Table 1: Overview of the available equipment

Test Configurations

From the many possibilities to test switches, some are of particular interest. A single switch in a 3 X 3 configuration (Fig: 4 A) is a good start because it corresponds to tests done previously. It should be followed by the largest configuration possible, being 6 X 6 (Fig: 4 B). As Atlas plans to use very large switch fabrics, the combination of two cascaded switches interconnected by one or by two full-duplex interconnections is also very interesting (Fig: 4 C, D). In the latter variant it is possible to let the internal priority selection of the switch choose the interconnection.

Fig: 4 Test Configurations

Measurements and Definitions

Measurements are done by means of an oscilloscope connected to the Destinations. In later extensions a processor can measure the period of time between the moment it starts the Slates and the reception of a predefined number of complete events. For measurements and results the following definitions are used (Fig: 5):

Sequence:
The data of sub-events with different labels (I-Field) generated by one Source module.

Event:
The sorted data with the same label (I-Field) of a number of parallel Sequences, as received at one Destination module.

Round:
All sorted Events with the same label (I-Field) of one Sequence from every Source involved, received at the Destinations.

Fig: 5 Test Definitions
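A minimal sketch of these definitions in code, assuming each received sub-event is tagged with its Source and its label (I-Field); the data structures are illustrative only and not part of the test or simulation software.

    from collections import defaultdict

    # Each Destination receives sub-events tagged with the Source they came from
    # and the label (I-Field) of the Sequence position.  Grouping by label gives
    # an Event; one complete Event at every Destination with the same label
    # forms a Round.
    def build_events(received, n_sources):
        """received: list of (source_id, label, payload) seen at one Destination."""
        events = defaultdict(dict)
        for source_id, label, payload in received:
            events[label][source_id] = payload
        # An Event is complete once every Source has contributed its sub-event.
        return {label: parts for label, parts in events.items()
                if len(parts) == n_sources}

    received = [(s, label, b"...") for label in range(2) for s in range(6)]
    print(sorted(build_events(received, n_sources=6)))   # complete Events: [0, 1]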

Simulation

The simulation program is based on MODSIM II [9], a commercial object-oriented language for discrete event simulation. The program [4] implements an event generator, the event building Sources and Destinations, and a simple model of the switch whose parameters represent the transfer time as a linear function of the sub-event size. The model can simulate different sub-event size distributions and different control schemes, and can be fully configured for the number of event building Sources and Destinations, including their corresponding parameters, in particular the times for processing an event (sub-event). As a first step the Source data rate had to be defined in such a way that it comes close to the values measured on the Slate. Good agreement is obtained with a link speed parameter set to 88.7 MBytes/s and an overhead of 2 µsec per sub-event.

Fig: 6. Simulated data rate versus measured data rate
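With these two parameters, the "Source Simulated" column of Table 2 can be reproduced directly from the linear model; the short check below is only a sanity check of that model, not of the MODSIM II program itself.

    # Linear source/switch model used by the simulation: the time to transfer a
    # sub-event is size/link_speed plus a fixed per-sub-event overhead.
    LINK = 88.7e6       # link speed parameter, bytes/s
    OVERHEAD = 2e-6     # per-sub-event overhead, s

    def simulated_rate(size_bytes):
        """Effective data rate in MBytes/s for one sub-event of the given size."""
        return size_bytes / (size_bytes / LINK + OVERHEAD) / 1e6

    for kb, expected in [(0.5, 65), (1, 75), (10, 87.1), (100, 88.5), (1000, 88.7)]:
        print(f"{kb:6.1f} KBytes: model {simulated_rate(kb * 1000):5.1f}"
              f"  (Table 2 Source Simulated: {expected})")
    # The model reproduces the 'Source Simulated' column of Table 2 to better
    # than 1 %.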

Test Results

For each test result a table and a chart (Annexe B) are produced. For every experiment using a different file size, a reference measurement is included, obtained with a single Slate sending data directly into a Destination. The recorded times are recalculated to data rates, taking into account the total amount of data present in the channel during an Event or a Round. To clarify some of the observed effects, a series of oscilloscope pictures is given in Appendix A (OP 1-8).

3 X 3 and 6 X 6 Configurations

For experiments using a single switch in 3 X 3 and 6 X 6 configurations no particular problems occurred. A single Round is shown in OP 1. The resulting data rate is relatively close to what the data generators produce. A first observation is that with sub-events smaller than 5 KBytes, both switches slow down the data rate as compared to the reference. The region between 0.5 KBytes and 5 KBytes is the one where Short Bursts appear in the data structure. It can thus be concluded that the switches have a slightly longer latency as soon as Short Bursts have to be handled; this effect is less pronounced in the Avaika switch than in the Essential one. The second observation is that the data rate to the Destinations seems to be higher for sub-events ranging between 5 KBytes and 100 KBytes than that produced by the data generators (Fig: 7). Is this real, is it a measurement error, or is it both? The latency of the Slate is 20 X 40 nsec for Packets and 30 X 40 nsec between Request cycles, whereas the switch latency agrees with the minimum HIPPI specification of 4 X 40 nsec. OP 2 shows that during start-up of the first channel, the different channels have to wait before delivering their first sub-event, thus filling the pipeline. It is the latency difference between the slower data generators and the faster switch, together with the time lost during start-up, that induces this effect; extending the measurement over a longer time span would compensate for it. With six data channels active and sub-events of 50 KBytes and bigger, gaps in the data stream become visible at the points where the switch has to wait for the generators (OP 3).

Fig: 7. Single Switch Configurations 3 X 3 and 6 X 6
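The apparent excess can be made plausible with a toy calculation. The link speed and per-sub-event overhead below are taken from the simulation parameters, and the set-up is idealized, so this only illustrates the measurement-window effect rather than reconstructing the actual measurement.

    # Toy illustration of why a rate measured over a single Round at one
    # Destination can exceed the sustained Source rate: once the pipeline is
    # filled, the Sources' per-sub-event overheads are hidden behind the other
    # Sources' transfers, so the sub-events arrive almost back-to-back.
    SIZE = 10_000        # sub-event size, bytes (10 KBytes)
    LINK = 88.7e6        # assumed link speed, bytes/s
    OVERHEAD = 2e-6      # assumed per-sub-event start-up overhead, s
    N_SOURCES = 6

    t_transfer = SIZE / LINK
    sustained = SIZE / (t_transfer + OVERHEAD)                 # one Source, long term
    one_round = N_SOURCES * SIZE / (N_SOURCES * t_transfer)    # back-to-back arrival

    print(f"sustained Source rate : {sustained / 1e6:.1f} MBytes/s")
    print(f"rate seen in one Round: {one_round / 1e6:.1f} MBytes/s")
    # Roughly 87 versus 89 MBytes/s: averaging over many Rounds, which
    # re-includes the waiting time, brings the two values together again.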

Comparison to the Simulations

Theoretical and experimental data have a correlation coefficient greater than 0.90 for the Slate and 0.95 for the switches (Tables 1 and 2, Fig: 6 and 9). The small differences are due to the fact that the Slate data rate is not constant but varies with the sub-event size. In addition, it appears that the overhead of the two switches is different; this value must therefore be adjusted in the switch model.

Sub-event Size   Source      Slate      Avaika      Avaika     Essential   Essential
(KBytes)         Simulated   Measured   Simulated   Measured   Simulated   Measured
     0.5            65         74.56      61.7        64.65       55          57.66
     1              75         85.33      74.8        78.47       70          73.35
    10              87.1       90.35      88          92.25       88.1        92.42
   100              88.5       90.82      87.5        91.76       87.1        91.35
  1000              88.7       90.89      86.9        91.1        86.8        91.02

Table 2: Simulation versus measurements in a 3 X 3 configuration (data rates in MBytes/s)

A more exact timing of the data generation can only be done when the parameters of the Slate are measured more accurately. In a next step the simulation program will be used to analyze set-ups with cascaded switches; the parameters from the first step will be used, and the agreement between measurements and simulations tested. Once the behavior of the model is equivalent to that of the switch itself, the program can be used to simulate a realistic set-up as needed for the Atlas trigger and data acquisition system.

Tests with 2 Switches

Using the same Slate set-up as for the 6 X 6 configuration, the inputs are now distributed symmetrically over two switches. With a single full-duplex connection between the switches a bottleneck is created that provokes data congestion. Network specialists generally believe that the performance expected from data switches in a shared situation should not exceed, on a sustained basis, 50 % of the media speed per port [9][10]. As the data structure used for this test is very regular, the results measured are only about 40 % (Fig: 8) of those observed for a single switch. It can be expected that the average data rates go up if the cross-connection between the switches consists of more than one duplex connection. The measurements with two interconnections show that the data rate does not differ significantly from the single switch case. The ratio between the demanded channels and those available is 3 to 2; in other words, the available capacity amply exceeds 50 % of the demand. Free channels are thus available without long waiting times, explaining the good results in this configuration. With different ratios between the demanded channels and the available cross-connections between the switches, the obtainable bandwidth will be different.

Fig: 8. 6 X 6 channels divided over two switches with one and two interconnections
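A rough way to see why a single interconnection congests while two do not is to compare the average cross-traffic demand of the symmetric 3 + 3 split with the number of cross links; the sketch below is an average-load estimate only and ignores the instantaneous contention of up to three Sources for the cross links.

    # Average demand on the switch-to-switch links for the symmetric set-up:
    # 3 Sources sit on each switch and address all 6 Destinations equally, so
    # half of each Source's traffic has to cross to the other switch.
    sources_per_switch = 3
    cross_fraction = 3 / 6            # 3 of the 6 Destinations are on the far switch
    demand = sources_per_switch * cross_fraction   # channel-equivalents per direction

    for links in (1, 2):
        load = demand / links
        verdict = "congested" if load > 1 else "fits"
        print(f"{links} interconnection(s): average load {load:.2f} -> {verdict}")
    # 1 interconnection : average load 1.50 -> congested, rates drop well below
    #                     the single-switch values
    # 2 interconnections: average load 0.75 -> fits, rates close to single switch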

It should also be mentioned that the feature of the Avaika switch to program a single Logical Address to up to 8 output routes greatly helps: it gives the switch the possibility to choose a free channel by itself for rapid communication.

Conclusion

During the tests with a single switch the HIPPI equipment has worked in agreement with the specifications. Some particularities emerge as soon as Short Bursts are used, because the switch latency then becomes more visible. These results were confirmed by measurements made outside CERN [11][12]. The noted effect is less pronounced with the Avaika switch than with the Essential switch; it can be concluded that the latter has a slightly longer latency.

Event building with Single Switches

It has been shown that with a single switch, event building can be done at speeds as high as 92 MBytes/s per channel. The throughput with 6 Slates as sub-event generators reaches data rates of over 540 MBytes/s. In both cases the limit is not HIPPI but the Slate data generators. As the correspondence with the computer models is satisfactory (Fig: 9), further, more complex simulations are foreseen to define more precisely the detector architecture at the LVL3 level.

Fig: 9. Simulation switch data rate versus measured switch data rate

And with double switches

In the case where two switches are connected in parallel, the HIPPI equipment behaves as expected: event building data rates of up to 89 MBytes/s per channel have been observed, which represents a total bandwidth of almost 540 MBytes/s. It has to be said that these values are flattered by the ratio of data generators available to the number of cross-connections used.

Future Evaluations

These tests show that HIPPI is successful for event building at very high data rates; however, more experiments need to be done to show that it can easily be adapted to large High Energy Physics detectors. Fast HIPPI interfaces for the PCI bus exist, and the same modules in a PMC form factor are becoming available. It is important to continue these tests in a VMEbus environment, where one or more channels drive sub-events into a switch using PMC HIPPI Sources and, after the switch, PMC HIPPI Destinations deliver the built event into a host memory. If the speed of the PCI bus in the VMEbus units is insufficient, an interim solution is possible with the same interfaces as PCI modules and a workstation as host.

Acknowledgment

The authors wish to thank Gigalabs (formerly Avaika Networks Corporation) and Essential Communications for the switches they made available for these tests. Thanks go to Robert McLaren of CERN for the many useful discussions during the preparation of these tests. Thanks also to all those who have made their Slate modules available to us for these tests.

References

[1] Atlas Technical Proposal, CERN/LHCC/94-43, LHCC/P2, 15 December 1994, ISBN 92-9083-067-0, WWW: http://atlasinfo.cern.ch/Atlas/GROUPS/TP/TP_ps.html

[2] High Performance Parallel Interface, Mechanical, Electrical, and Signalling Specification (HIPPI-PH), ANSI X3.183-1991, Rev 8.3.

[3] High Performance Parallel Interface, Mechanical, Electrical, and Signalling Specification (HIPPI-SC), ANSI X3.210-1992, Rev 4.4.

[4] Ralf Spiwoks, Evaluation and Simulation of Event Building Techniques at the LHC, Ph.D. Thesis, University of Dortmund, Germany, 1995 (CERN-THESIS-96-002).

[5] Introduction to the Slate Hardware, http://www1.cern.ch/HSI/hippi/datagen/slate/slatehrd.htm

[6] Slate2 Instruction Manual, ERT 2.28.02.0, Version 1.00, 30 September 1994, http://www1.cern.ch/HSI/hippi/datagen/slate/slatehrd.htm

[7] A. Van Praag et al., HIPPI Developments for CERN Experiments, CERN/ECP 91-28, 7 November 1991. Presented at IEEE NSS 1991. http://www.cern.ch/HSI/hippi/applic/otherapp/hppidef.htm

[8] Arie Van Praag et al., Overview of the Use of the PCI Bus in Present and Future High Energy Physics Data Acquisition Systems, CERN/ECP 95-4, CERN, Geneva, 3 January 1995. http://www.cern.ch/HSI/hippi/applic/pcihippi/pcihippi.htm

[9] James P. Hughes, Gigabit Lessons for ATM from HIPPI, Network Systems Corporation, Minneapolis MN, September 1993.

[10] M.F. Letheren, Switching Techniques in Data Acquisition Systems for Future Experiments, CERN, Geneva, Switzerland. Presented at the CERN Summer School of Computing, 1994.

[11] G. McAlpine, McAlpine Research and Development Inc., and J.R. Wilson, Avaika Networks Corporation, Low Cost, High Performance LANs Constructed with Serial HIPPI, January 1995. Presented at Interop, March 1995.

[12] Peter Haas, High-Speed Networking at RUS, Regional Computer Center, University of Stuttgart, 1995.
