A Computational and Data Challenge for Future INFN Experiments; a Grid approach
Abstract
This document describes the INFN project for the computing and data challenges of the future experiments. The project is strongly related to Grid concepts, to be developed in the near future either within a common European project proposal (in preparation) or in a standalone INFN programme.
Coordination with the HEP world in Europe and in the other countries worldwide will be pursued.
The document is dedicated to the specific Italian (INFN) activities and plans to test, implement and commit the resources needed for the data analysis and processing of the coming experiments.
INDEX
1. Foreword
Sites and Authors
BARI: Giorgio Maggi, Marcello Castellano, Paolo Cea, Leonardo Cosmai, Maria D’Amato, Domenico Di Bari, Rosanna Fini, Emanuele Magno, Maria Mennea, Vito Manzari, Sergio Natali, Giacomo Piscitelli, Lucia Silvestris, Giuseppe Zito.
BOLOGNA: Paolo Capiluppi, F. Anselmo, L. Bellagamba, Daniela Bortolotti, G. Cara Romeo, Alessandra Fanfani, Domenico Galli, Roberto Giacomelli, Claudio Grandi, Marisa Luvisetto, Umberto Marconi, Paolo Mazzanti, Tiziano Rovelli, Franco Semeria, Nicola Semprini Cesari, GianPiero Siroli, Vincenzo Vagnoni, Stefania Vecchi.
CAGLIARI: Alberto Masoni, Walter Bonivento, Alessandro DeFalco, Antonio Silvestri, Luisanna Tocco, Gianluca Usai.
CATANIA: Roberto Barbera, Franco Barbanera, Patrizia Belluomo, Ernesto Cangiano, Salvatore Cavalieri, Enrico Commis, Salvatore Costa, Aurelio La Corte, Lucia Lo Bello, Orazio Mirabella, Salvatore Monforte, Armando Palmeri, Carlo Rocca, Vladimiro Sassone, Giuseppe Sava, Orazio Tomarchio, Alessia Tricomi, Lorenzo Vita.
CNAF: Federico Ruggieri, Andrea Chierici, Paolo Ciancarini, Luca Dell’Agnello, Tiziana Ferrari, Luigi Fonti, Antonia Ghiselli, Francesco Giacomini, Alessandro Italiano, Pietro Matteuzzi, Cristina Vistoli, Giulia Vita Finzi, Stefano Zani.
COSENZA: Marcella Capua, Laura La Rotonda, Marco Schioppa.
FERRARA: Michele Gambetti, Alberto Gianoli, Eleonora Luppi.
FIRENZE: Raffaello D’Alessandro, Leonardo Bellucci, Elena Cuoco, Piero Dominici, Leonardo Fabroni, Giacomo Graziani, Michela Lenzi, Giovanni Passaleva, Flavio Vetrano.
GENOVA: Carlo Maria Becchi, Alessandro Brunengo, Giovanni Chiola, Giuseppe Ciaccio, Mauro Dameri, Bianca Osculati
LECCE: Giovanni Aloisio, Massimo Cafaro, Salvatore Campeggio, Gabriella Cataldi, Lucio Depaolis, Enrico M.V. Fasanelli, Edoardo Gorini, Martello Daniele, Roberto Perrino, Margherita Primavera, Surdo Antonio, Franco Tommasi.
L.N. LEGNARO: Gaetano Maron, Luciano Berti, Massimo Biasotto, Michele Gulmini, Nicola Toniolo, Luigi Vannucci.
MILANO: Laura Perini, Mario Alemi, Claudio Destri, Roberto Frezzotti, Giuseppe Lo Biondo, Marco Paganoni, Francesco Prelz, Francesco Ragusa, Silvia Resconi.
NAPOLI: Paolo Mastroserio, Fabrizio Barone, Enrico Calloni, Gian Paolo Carlino, Sergio Catalanotti, Rosario De Rosa, Domenico Della Volpe, Giuseppe Di Sciascio, Alessandra Doria, Antonio Eleuteri, Fabio Garufi, Paolo Lo Re, Leonardo Merola, Leopoldo Milano.
PADOVA: Mirco Mazzucato, Simonetta Balsamo, Marco Bellato, Fulvia Costa, Roberto Ferrari, Ugo Gasparini, Stefano Lacaprara, Ivano Lippi, Michele Michelotto, Maurizio Morando, Salvatore Orlando, Paolo Ronchese, Ivo Saccarola, Massimo Sgaravatto, Rosario Turrisi, Sara Vanini, Sandro Ventura.
PARMA: Enrico Onofri, Roberto Alfieri.
PAVIA: Claudio Conta, Carlo De Vecchi, Giacomo Polesello, Adele Rimoldi, Valerio Vercesi.
PERUGIA: Leonello Servoli, Maurizio Biasini, Ciro Cattuto, Luca Gammaitoni, Fabrizio Gentile, Paolo Lariccia, Michele Punturo, Roberto Santinelli.
PISA: Flavia Donno, Silvia Arezzini, Giuseppe Bagliesi, Giancarlo Cella, Vitaliano Ciulli, Andrea Controzzi, Davide Costanzo, Tarcisio Del Prete, Isidoro Ferrante, Alessandro Giassi, Fabrizio Palla, Fabio Schifano, Andrea Sciabà, Andrea Vicere’, Zhen Xie.
ROMA: Luciano Maria Barone, Daniela Anzellotti, Claudia Battista, Marco De Rossi, Alessandro De Salvo, Marcella Diemoz, Speranza Falciano, Sergio Frasca, Alessandro Lonardo, Egidio Longo, Lamberto Luminari, Ettore Majorana, Francesco Marzano, Andrea Michelotti, Giovanni Mirabelli, Aleandro Nisati, Giovanni Organtini, Cristiano Palomba, Enrico Pasqualucci, Fulvio Ricci, Davide Rossetti, Roberta Santacesaria, Alessandro Spanu, Enzo Valente.
ROMA II: Paolo Camarri, Anna Di Ciaccio, Marco Guagnelli.
ROMA III: Severino Bussino, Ada Farilla, Cristian Stanescu.
SALERNO: Luisa Cifarelli, Carmen D’Apolito, Mario Fusco Girard, Giuseppe Grella, Michele Guida, Joseph Quartieri, Alessio Seganti, Domenico Vicinanza, Tiziano Virgili.
TORINO: Luciano Gaido, Nicola Amapane, Cosimo Anglano, Piergiorgio Cerello, Susanna Donatelli, Antonio Forte, Mauro Gallio, Massimo Masera, Roberto Pittau, Luciano Ramello, Enrico Scomparin, Mario Sitta, Ada Solano, Annalina Vitelli, Albert Werbrouck.
TRIESTE: Franco Bradamante, Enrico Fragiacomo, Benigno Gobbo, Roberto Gomezel, Massimo Lamanna, Anna Martin, Stefano Piano, Rinaldo Rui, Claudio Strizzolo, Lucio Strizzolo, Alessandro Tirel.
UDINE: Giuseppe Cabras, Alessandro De Angelis
High Energy Physics experiments have always demanded state-of-the-art computing facilities to efficiently perform the analysis of large data samples, and some of the ideas born inside the community to enhance the user friendliness of all the steps in the computing chain have been successful also in other contexts: one striking example is the World Wide Web.
The previous generation of experiments at the electron-positron collider LEP proved the effectiveness of computing farms based on commodity components in providing low-cost solutions to the LEP experiments' needs; farms of this type are now very popular and deployed in most INFN sites and worldwide.
Recently, in the US, the concept of Grid computing has been proposed as a new advance to build on top of the Internet and the Web. The Grid takes advantage of the increasingly high bandwidth connecting the various computing resources of the scientific communities to make them transparently available to the users. This is achieved by a new software infrastructure, the Grid middleware, which, once developed, will provide standard tools to run applications and to manage and access very large data sets distributed over remote nodes.
The Grid idea is well described in an already famous book: I. Foster and C. Kesselman (eds.), "The Grid: Blueprint for a New Computing Infrastructure", Morgan Kaufmann, 1999 (ISBN 1-55860-475-8).
Several US Grid-related projects were started in the last two years; the most relevant ones for this proposal are:
Globus[1]
The Globus project is developing basic software services for computations that integrate geographically distributed computational and information resources. Globus concepts are being tested on a global scale on the GUSTO testbed.
Condor[2]
The goal of the Condor project is to develop, implement, deploy, and evaluate mechanisms and policies that support High Throughput Computing (HTC) on large collections of separately owned and distributed computing resources.
The Particle Physics Data Grid[3]
It "has two long-term objectives. Firstly the delivery of an infrastructure for very widely distributed analysis of particle physics data at multi-petabyte scales by hundreds to thousands of physicists and secondly the acceleration of the development of network and middleware infrastructure aimed broadly at data-intensive collaborative science".
The GriPhyn project[4]
The GriPhyN (Grid Physics Network) collaboration proposes an ambitious program of computer science and application science research designed to enable the creation of Petascale Data Grids, systems designed to support the collaborative analysis, by large communities, of Petabyte-scale scientific data collections.
NASA's Information Power Grid[5]
NASA's Information Power Grid (IPG) is a testbed that provides access to a Grid; a widely distributed network of high performance computers, stored data, instruments, and collaboration environments. The IPG is one component of an effort by the university, commercial, and government computer science communities to define, develop and build Grids that will enable routine use of widely distributed resources for the diverse purposes of research and education, engineering, design, and development.
The objectives of the INFN-Grid project are to develop and deploy for INFN a prototype computational and data Grid capable of efficiently managing and providing effective usage of the large commodity-component-based clusters and supercomputers distributed in the INFN nodes of the Italian research network GARR-B.
These geographically distributed resources, so far normally used by a single site, will be integrated using the Grid technology to form a coherent high throughput computing facility transparently accessible to all INFN users.
The INFN national Grid will be integrated with European and worldwide similar infrastructures being established by ongoing parallel activities in all major European countries, in US and Japan.
The scale of the computational, storage and networking capacity of the prototype INFN-Grid will be determined by the needs of the LHC experiments. These include the experimental activities for physics, trigger and detector studies, and the running of applications at a scale sufficient to test the scalability of the possible computing solutions to very large amounts of distributed data (PetaBytes), very large numbers of CPUs (thousands) and very large numbers of users.
The development of the new components of Grid technology will be carried out by the INFN-Grid project, whenever possible, in collaboration with international partners through specific European or international projects. A proposal for the first project of this kind, DATAGRID[6], was submitted on the 8th of May to the European Union IST programme (EU-RN2), asking for 10 M€ of funding, and has received a very positive evaluation from the referees.
On the 20th of July the European Union IST directorate formally invited the project to negotiate the final contract along the lines indicated in the proposal: a 3-year development plan and ~10 M€ of funding.
The current activities of the MONARC[7] Phase-3 project are another important component of the INFN-Grid project, as they will provide guidance for the development of the computing models of the LHC experiments. The INFN-Grid will investigate the current ideas for the computing models, based on the hierarchical architecture of Tier1-TierN Regional Centres proposed by MONARC. The investigation will be performed using real applications on real Centre prototypes, fulfilling at the same time the real computing needs of the experiments.
The project will encourage the diffusion of the Grid technologies in other Italian scientific research sectors (ESA/ESRIN, CNR, Universities), addressing the following points:
The INFN-Grid Project is the framework for efficiently connecting:
The current DATAGRID proposal, submitted on the 8th of May, requests EU funds only for the middleware development. The "testbed" and "HEP application" aspects are taken care of in two separate workpackages. These workpackages require no EU funds for hardware and are only intended to provide the human resources needed for the integration within the collaboration-wide activities. Further DATAGRID proposals are currently being considered to support the full Grid integration of the HEP applications and the full deployment of the Grid distributed LHC computing system. The INFN-Grid will be the natural framework for the Italian participation in any such project, and will provide the h/w needed and the extra manpower required for setting up and running the local prototypes and physics applications.
In the following chapters a detailed description of the work plan and of the needed resources will be given. Whenever necessary, a clear distinction will be made between the INFN and European Grid projects, in order to keep budgets and resources clearly separated.
The LHC collaborations have stated that they strongly support the Grid project. They recognize that the functionalities promised by the Grid middleware are needed for deploying the efficient system of distributed computing that they are planning for and which is assumed in their Computing Technical Proposals.
There is general consensus among the LHC experiments on the advantages of a common project in which the prototyping of each experiment's computing architecture is discussed within a common framework. This concept has been expressed at the Computing Model Panel of the LHC Computing Review. The Italian LHC groups share the same opinion.
Common projects are also addressed by the MONARC collaboration (Phase III), which discusses possible computing scenarios and aims at providing detailed simulations of Regional Centre architectures.
There is also general consensus that all the members of the collaborations will be allowed to use the Grid tools for running the applications of the experiments, irrespective of whether their country participates in DATAGRID or not. Of course all the Italian members of the LHC collaborations will be granted full access to the Grid tools, to the Italian RC prototypes and to the local Grid infrastructure, regardless of whether they subscribe to this INFN-Grid project or not.
The term "Grid" refers to distributed, high performance/throughput computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. A computing and data Grid is based on a set of services for obtaining information about Grid components, locating and scheduling resources, communicating, accessing code and data, measuring performance, authenticating users and resources, ensuring the privacy of communications, and so forth[1][6]. The Grid services define the Grid architecture and the way the applications will use the Grid. How to proceed in building the Grid? What is involved?
The Grid users can be classified in the following way:
| Class | Purpose | Makes use of | Concerns |
| End user | Solve problems | Applications | Transparency, performance |
| Application developers | Develop applications | Programming models and tools | Ease of use, performance |
| Tools developers | Develop tools, programming models | Grid services | Adaptivity, exposure of performance, security |
| Grid developers | Provide basic Grid services | Local system services | Local simplicity, connectivity, security |
| System administrators | Manage Grid resources | Management tools | Balancing local and global concerns |
To address the above questions we have to consider the computing requirements of the new experiments, the existing applications that will run on the Grid, the scalability in terms of resources, data and users, and the network connectivity and services. The focus of the HEP Grid is on the huge quantity of data and on the large number of geographically distributed users. Computational infrastructure, like other infrastructure, is fractal, or self-similar, at different scales, but what is good at the local scale is probably not appropriate at the wide-area level. Scale is therefore one of the major drivers for the techniques used to implement the HEP Grid services.
We also have to consider that application developers will choose programming models based on a wide spectrum of programming paradigms: Java, CORBA, Grid-enabled models and others possibly available in the future. All these applications will have to interface to the Grid services in different ways.
There is much discussion on which programming model is appropriate for a Grid environment, although it is clear that many models will be used. One approach to defining services and programming models is to start from HEP use cases, analyze them in terms of computing needs, data and network access, identify Grid services and tools, and design the appropriate testbed to test prototypes.
The models proposed so far for the Grid structure, integrating resources (computing, data, network, instruments, …), services and applications, are based on a network schema, aim to be as simple as possible, support broad deployment and focus on application performance.
Since the INFN-Grid must be strongly oriented towards the computing prototyping of the LHC and other new experiments (Virgo, …), we first make some general observations regarding goals and principles:
The components of the Grid architecture can be represented in Table 2.1.
Table 2.1 - Grid Architecture Components

| Applications | Batch traditional applications | Client/Server batch applications | Client/Server interactive applications | Grid-enabled applications |
| Application-oriented Grid Services / Grid application toolkits | HTB (High Throughput Broker), Data mover | Monitoring tools, Real-time workload management | HP Scheduler, HP data access | Remote data access, remote computing, remote instrumentation, Application monitoring |
| Grid Common Services (resource- and application-independent services) | Common Information Base (MDS) | Authentication, authorization, policy (GSI) | Global Resource Monitoring | GRAM (Globus Resource Allocation Manager) |
| Grid Fabric / local resources | Computing resources, local schedulers, site accounting, … | Network services | Data storage | Instrumentation |
Grid Common Services, or middleware, provide resource-independent and application-independent functionality. They might include the Information Service, authentication, authorization, resource location and allocation, etc. WP1 of this project, titled "Evaluation of Globus Toolkit", will cover this Grid component.
Application-oriented Grid Services, or toolkits, provide more specialized services and components for the various application classes. Almost all these services are included in the EU project:
Applications: as already said, HEP applications will be implemented following different programming models; some of them already exist, others have defined specific data models and will be based on specific databases (ROOT or Objectivity, …). Programming models could be based on Java, CORBA or other commercial technologies. Finally, specific Grid-aware applications will be implemented in terms of various application toolkit components, Grid services and Grid fabric mechanisms.
That said, from the analysis of the use cases different classes of applications could emerge: traditional applications, client/server batch applications, client/server interactive applications and finally Grid-enabled applications. For all these applications suitable Grid services have to be identified in order to run efficiently on the Grid. The present proposal puts special emphasis on this issue through the analysis of use cases, modelling the Grid services on the requirements of the different types of applications, in order to complement what is done within the DataGrid project.
Since one of the most important goals of the project is to develop prototypes for the applications and computing models (Regional Centres) of LHC and other new experiments, Grid services and Grid programming models will be investigated starting from the experiments' use cases. This approach also has the objective of exploiting the development of the project to offer efficient and transparent access to distributed data and computing facilities to the large and geographically distributed physics community.
The work should flow through the following steps:
Technical work must be organized between application-users, application-developers and Grid services developers.
Each use case (or small group of use cases) must have an application user who is responsible for the application prototypes, and the validation process has to be carried out by a specified group consisting of one application developer and one developer of each Grid service involved.
WP Testbed will take care of the coordination of the prototype and Grid services test phase.
The INFN-Grid project will be planned so as to fully integrate the services and tools developed in DATAGRID, in order to meet the requirements of the INFN experiments and to harmonize its activities with them. In addition, INFN-Grid will develop specific workplans concerning:
Testbed activities will include the workload management for individual analysis tasks, the data management in the distributed Italian scenario, the monitoring of the INFN resources, the standardization of the INFN farms distributed in the different INFN sites, and mass storage tests suited to the INFN distributed activities. The network layout is foreseen to be provided by GARR in collaboration with the present project.
The INFN-Grid project will provide the middleware and the testbeds for running the physics applications of the large experiments which foresee the distributed analysis of huge amounts of data: not only the LHC experiments, but also VIRGO and APE fall into this category and are fully participating in the project. Of course all the INFN experiments that are in a position to benefit from this INFN-Grid project are welcome and will be encouraged to join for the specific activities of their interest: ARGO and COMPASS have already expressed their interest.
The project will also be beneficial for the other INFN computing activities. In fact some of the most innovative techniques under evaluation for the Grid middleware were already considered by the INFN projects developed within the framework of the Computing Committee (CC). Large synergies will be possible between the INFN IT services and the Grid-related activities: the internal developments will profit from state-of-the-art technology evaluated and selected by large international collaborations that include top-level computer scientists, also in the US and Japan.
Example of activities where large synergies are possible include:
The needs of the LHC experiments for computing infrastructures (h/w and s/w) in the years 2001-2003 will be requested almost entirely in the framework of the INFN-Grid project. The successful development of the Grid tools will be verified step by step using the widest possible variety of physics applications of the experiments, which will be run in a production style, aiming both at producing the physics results the experiments want and at using as much as possible the available middleware. Thus, in principle, all the computing activities of the experiments will be connected to the Grid, which will finally provide the infrastructure for the distributed running of all of them.
Only the development of the detector-related s/w and of the algorithmic parts of the experiment s/w will stay separate from the Grid project. The development of the CORE s/w will instead have many points of contact with the Grid developments.
We therefore propose that all the computing h/w of the experiments, except that related to standard s/w development and to on-line on-site applications, be funded in the framework of the INFN-Grid project for the years 2001-2003. Some online activities will also be covered by INFN-Grid (typically remote monitoring).
The activities performed in the framework of the MONARC project for its Phase-3 also fall naturally within INFN-Grid, given their character of common project of the LHC experiments aimed at defining the common aspects of their computing models. The MONARC simulation program could provide a valuable tool for complementing the prototypes in planning the LHC Grid implementation.
The computing people of the LHC experiments involved in the INFN-Grid project are working on the Grid because the Grid is the instrument they are using for implementing the computing applications of their experiments. Therefore their contribution to the INFN-Grid project is to be counted as fully contained within the fraction of their time devoted to their LHC experiment.
Only a fraction of the end users and of the application developers (see 2.2) of the LHC experiments are members of this INFN-Grid project; this fraction works in INFN-Grid with the purpose of ensuring that the Grid infrastructure is developed according to the needs of their experiment, of assuring the integration into the Grid of the applications of the experiment, and of contributing to the design of the LHC computing system.
The Computing Model that the LHC experiments are developing is based on the MONARC studies, as well as on their own evaluations. The key elements of the architecture are a hierarchy of computing resources and functionalities (Tier0, Tier1, Tier2, etc.), a hierarchy of data samples (RAW, ESD, AOD, TAG, DPD) and a hierarchy of applications (reconstruction, re-processing, selection, analysis), both for real and for simulated data.
The hierarchy of applications requires a policy of authorization, authentication and restriction based, again, on a hierarchy of entities such as the "Collaboration", the "Analysis Groups" and the "final users".
The LHC experiments' computing design requires a distribution of the applications and of the data among the distributed resources. In this respect the correlation with the Grid initiatives is clearly very strong. Grid-like applications are applications that run over the wide area network, i.e. that involve Regional Centres (Tier-n) using several computing and data resources connected by wide area network links, where each single computing resource can itself be a network of computers.
Within this framework the Computing Project is seen as a detector component, whose goal is to provide the efficient means for the analysis and potential physics discovery to all the LHC Physicists.
In the discussions within the Computing Model Panel of the LHC Computing Review, all LHC collaborations agreed that it is mandatory to reach, by 2003, a prototyping scale of the order of 10% of the final size foreseen in operation by 2006. Reducing the prototype scale would be a serious risk for the system: experience has clearly shown that scaling by an order of magnitude can raise unexpected problems, leaving the system unable to fulfil its requirements. This would represent a disaster for the LHC physics programme.
All Italian LHC groups therefore have as their target a prototype size, for 2003, of about 10% of the 2006 one. At the same time each group has an activity programme, for the next three years, which will involve considerable resources for simulation, reconstruction and analysis of the simulated data. The prototype will provide these resources.
The motivation for the testbed-planned capacities is therefore twofold:
Moreover the simulation process, involving event generation, reconstruction and analysis, will provide a very suitable environment to reproduce the system in its full operational activity.
In the following, only a short description of the Experiments' activities and needs is given. Appendix A (Chap. 8) reports the detailed plans of the Experiments.
Throughout the document costs are evaluated in Euro and in ML (millions of Italian lire), assuming an exchange rate of 1 K€ = 2 ML.
Based on the latest figures from the ALICE simulation and reconstruction codes, it is estimated that the overall computing needs of ALICE are around 2100 kSI95 of CPU and 1600 TB of disk space. The global computing resources foreseen to be installed in Italy are 450 kSI95 of CPU and 400 TB of disk space. Although these numbers are affected by a very large error, because neither the code nor the algorithms are in their final form, we are confident that the figures presented are not far from the final ones. They are certainly adequate to estimate our computing needs for the next three years.
These numbers take into account that the contribution to the ALICE computing infrastructure will be shared mainly among France, Germany and Italy, and that the Italian contribution to ALICE in terms of people is close to the sum of France and Germany together.
The plan is to reach by 2003 a capacity of 45 kSI95 and 40 TB of disk (10% of the final size). These resources will allow performing the prototype tests at a realistic scale and, at the same time, will provide adequate resources for the simulation and analysis tasks planned by the Italian ALICE groups for the next three years.
Two different ALICE projects will be involved in the Grid activity. The first is connected with the Physics Data Challenges, whose first milestone is the Physics Performance Report. This is a complete report on the physics capabilities and objectives of the ALICE detector, which will be assessed through a virtual experiment involving the simulation, reconstruction and analysis of a large sample of events. The first milestone for the Physics Performance Report, for which the data will have to be simulated and reconstructed, is due by the end of 2001. The specific purpose in this case is to obtain the necessary information, from the detector physics performance, to establish priorities in the physics programme and in the experiment set-up. The Physics Data Challenges will be repeated regularly with a larger number of events to test the progress of the simulation and reconstruction software. The simulations will be devoted to the study of the detector physics performance. Careful studies of the dielectron and dimuon triggers are necessary, for instance, to fully understand and optimise their efficiency and rejection power. A large emphasis will be given to interactive distributed data analysis, for which special codes will be developed and which is expected to use a very large spectrum of the services offered by the Grid.
A second activity is linked with the ALICE Mass Storage project. In this context two data challenges have already been run, and a third one is foreseen in the first quarter of 2001, involving quasi-online transmission of raw data to remote centres and remote reconstruction. Other data challenges, aiming at higher data rates and more complicated data duplication schemas, are planned at a rate of one per year. The foreseen milestones are as follows.
The current estimate of the CPU time needed to simulate a central event is around 2250 kSI95×s, while 90 kSI95×s are needed to reconstruct it. The storage required for a simulated event before digitisation is 2.5 GB and the storage required for a reconstructed event is 4 MB.
In order to optimise the utilisation of the resources, signal and background events will be simulated separately and then combined, so that producing a sample of 10^6 events will require the full simulation of only 10^3 central events. The reconstruction will be performed on the full data sample. The table below reports the CPU and storage needed to simulate and reconstruct the required number of events. The table assumes an amount of CPU that, if used for a full year at 100% efficiency, provides the needed capacity. The corresponding amount of disk space is also indicated.
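As a cross-check of the figures quoted above, the 2001 column of the table below can be reproduced from the per-event numbers with a few lines of Python. The sketch is only a back-of-the-envelope verification; the variable names and the assumption of one calendar year (~3.15×10^7 s) of continuous running are illustrative.

    # Rough reproduction of the 2001 column of the ALICE table below,
    # starting from the per-event figures quoted above (illustrative only).
    SECONDS_PER_YEAR = 3.15e7          # ~12 months of continuous running

    sim_events = 1.0e3                 # central events fully simulated
    rec_events = 1.0e6                 # events reconstructed
    cpu_sim_ksi95_s = 2250.0           # kSI95 x s per simulated central event
    cpu_rec_ksi95_s = 90.0             # kSI95 x s per reconstructed event

    total_ksi95_s = sim_events * cpu_sim_ksi95_s + rec_events * cpu_rec_ksi95_s
    needed_si95 = total_ksi95_s * 1.0e3 / SECONDS_PER_YEAR
    print(round(needed_si95))          # ~2900 SI95, i.e. the ~3000 SI95 of the table

    # Storage: 2.5 GB per simulated event, 4 MB per reconstructed event (in TB).
    disk_tb = sim_events * 2.5e-3 + rec_events * 4.0e-6
    print(disk_tb)                     # 6.5 TB, as in the table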
Table .2 - Resources needed for simulation and reconstruction in 2001-2003
| ALICE Simulation | 2001 | 2002 | 2003 |
| Events | 1.0E+06 | 2.0E+06 | 4.0E+06 |
| Needed CPU (SI95) (assuming 12 months and 100% efficiency) | 3000 | 6000 | 12000 |
| Needed disk (TB) | 6.5 | 13 | 26 |
Adding a 50% factor for the analysis and taking into account a 30% inefficiency related to the availability of the computers, we believe that the prototype scale proposed in the table below is just adequate for the needs of the foreseen simulation activity.
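(As a rough consistency check under these assumptions, the ~3000 SI95 estimated above for the 2001 simulation and reconstruction becomes about 3000 × 1.5 ≈ 4500 SI95 including the analysis, and of the order of 6000-6500 SI95 once the 30% availability inefficiency is folded in, which is the scale of the 7,000 SI95 quoted for 2001 in the table below.)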
Table .3 - ALICE's Testbed capacity target
Capacity targets for the INFN Testbed ALICE

| | units | 2001 | 2002 | 2003 | Total |
| CPU capacity | SI95 | 7,000 | 15,000 | 23,000 | 45,000 |
| Disk capacity | TBytes | 6 | 12 | 22 | 40 |
| Tape storage capacity | TBytes | 12 | 24 | 44 | 80 |
| Total cost | ML | 1000 | 1020 | 1180 | 3200 |
| Total cost | K€ | 500 | 510 | 590 | 1600 |
The consumable and manpower costs have not been evaluated yet.
The last column represents the integral achieved in year 2003.
The evaluation of the costs of the needed resources has been done according to the following table:
| Year | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | TOT |
| COST CPU (KL/SI95) | 200 | 120 | 72 | 43.2 | 25.92 | 15.55 | 9.331 | |
| COST DISKS (KL/GB) | 118.4 | 76.96 | 50.02 | 32.51 | 21.13 | 13.73 | 8.929 | |
| COST TAPES (unit for 18 GB/h) | | | 30 | 30 | 30 | | | |
| TOT. CPU (KSI95) | | | 6.8 | 15 | 23 | | | 45 |
| TOT. DISKS (TB) | | | 6.4 | 12 | 22 | | | 40 |
| TOT. TAPE CAPACITY (TB) | | | 12 | 24 | 44 | | | |
| TOT. COST CPU (ML) | | | 490 | 648 | 596 | 0 | 0 | 1734 |
| TOT. COST DISKS (ML) | | | 320 | 390 | 465 | 0 | 0 | 1175 |
| TOT. COST TAPES (ML) | | | 180 | 30 | 90 | | | 300 |
| TOTAL COST (ML) | | | 990 | 1068 | 1151 | 0 | 0 | 3209 |
| TOTAL INTEGR. COST (ML) | | | 990 | 2058 | 3209 | | | |
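The yearly cost figures in the table follow directly from the unit costs and the installed capacity increments (for instance, for 2001: 6,800 SI95 × 72 KL/SI95 ≈ 490 ML and 6,400 GB × 50.02 KL/GB ≈ 320 ML). The short Python sketch below reproduces the CPU and disk cost rows; the variable names are illustrative and the unit costs are simply those quoted in the table.

    # Reproduction of the TOT. COST CPU and TOT. COST DISKS rows of the table above.
    unit_cost_cpu_kl_per_si95 = {2001: 72.0, 2002: 43.2, 2003: 25.92}
    unit_cost_disk_kl_per_gb = {2001: 50.02, 2002: 32.51, 2003: 21.13}
    cpu_increment_si95 = {2001: 6800, 2002: 15000, 2003: 23000}
    disk_increment_gb = {2001: 6400, 2002: 12000, 2003: 22000}

    for year in (2001, 2002, 2003):
        cpu_cost_ml = cpu_increment_si95[year] * unit_cost_cpu_kl_per_si95[year] / 1000.0
        disk_cost_ml = disk_increment_gb[year] * unit_cost_disk_kl_per_gb[year] / 1000.0
        print(year, round(cpu_cost_ml), round(disk_cost_ml))
    # Expected output: 490/320 ML (2001), 648/390 ML (2002), 596/465 ML (2003).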
The estimates currently being made in the ATLAS collaboration put the computing power needed outside the Tier0 in 2006 at ~1000 kSI95. These estimates are of course affected by a large error; however, they are the best ATLAS can provide at this time and are the ones presented at the LHC Computing Review taking place in these days. The secondary storage needs of a Tier1 Regional Centre are estimated at ~200 TB in 2006.
The Milestones for ATLAS computing include:
The activities of specific interest for the Italian ATLAS community from 2001 include:
The detailed ATLAS milestones 2001-2003 for physics and trigger studies still have to be agreed by the collaboration (see the ATLAS document in the Appendix): the activities of the Italian groups in 2002-2003 will also depend on these agreements; the participation in MDC2, however, has already been decided.
The table below provides a first preliminary evaluation of the target capacity to be reached each year by the ATLAS Regional Centre prototypes. It is assumed that 80-90% of the total capacity will be located in the two Tier1 sites, while the rest will be allocated to 3-7 Tier3 sites. The breakdown of the CPU acquisitions between the different years could be somewhat revised, subject to the final setting of the ATLAS computing milestones.
Table .4 - ATLAS Testbed capacity target
Capacity targets for the INFN Testbed ATLAS

| | units | 2001 | 2002 | 2003 | Total |
| CPU capacity | SI95 | 5,000 | 3,000 | 12,000 | 20,000 |
| Disk capacity | TBytes | 5 | 5 | 10 | 20 |
| Tape storage (backups) | TBytes | 10 | 10 | 20 | 40 |
| Material cost | ML | 650 | 270 | 510 | 1430 |
| Outsourcing | ML | 200 | 300 | 600 | 1100 |
| Total costs | ML | 850 | 570 | 1110 | 2530 |
| Total costs | K€ | 425 | 285 | 555 | 1265 |
The last column represents the integral achieved in year 2003.
ATLAS asks for funding of 123 ML for materials plus ~30 ML for outsourcing (~150 ML in total) to be granted in September 2000. With this anticipation in 2000, the h/w costs in 2001 will go down to 540 ML.
The ATLAS baseline model for implementing the Tier1 Regional Centre relies on the outsourcing of manpower, for system management and operation, from the available Consorzi di Calcolo: a first evaluation of the cost of this outsourcing is ~500 ML over the 3 years. The additional cost of housing is estimated at ~200 ML over the 3 years. These ~700 ML are estimated assuming that the Tier1 prototypes are implemented dividing the capacity between two different sites: a single-site option would allow a saving roughly estimated at ~300 ML over the three years. The consumable costs (electrical power etc.) for keeping this computing capacity up and running in the Tier1 for the full three years are estimated at ~400 ML.
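(The ~500 ML for outsourced manpower, the ~200 ML for housing and the ~400 ML for consumables quoted here sum to 1100 ML, consistent with the Outsourcing line of the table above.)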
The total estimated cost for ATLAS in the 4 years 2000-2003 is ~2.6 GL (billion lire).
CMS has also adopted the twofold motivation of the testbed capacities:
Milestones settled by CMS are related to both the above approaches:
Some of the global CMS Collaboration resources expected by year 2006 are as follows:
The realization by 2003 of 10% of the Tier-1 and Tier-2 Centres is part of the plans of the current INFN-Grid project for CMS Italy. To reach this goal CMS has adopted a gradually growing approach, which consists in procuring and putting into operation some 20% of the resources needed for the 10% final system prototype in 2001, 30% in 2002 and 50% in 2003.
On the other hand, the Italian CMS collaboration is strongly involved in the simulation and analysis of the data necessary to define the High Level Trigger algorithms and the physics studies.
CMS is already using now (2000) the simulation and reconstruction software for the off-line analysis of the trigger and physics channels. For the nearest-term goals, the in-depth study and optimization of the trigger algorithms is the driving effort.
The rejection factor that the CMS trigger has to provide against the background reactions is as large as 10^6-10^7. The High Level Trigger (levels 2 and 3) has to provide a rejection factor of the order of 10^3 in the shortest possible time and with the largest possible efficiency, without suppressing the searched-for signal.
The process of simulation and study has a planned schedule that will improve the statistical accuracy during the next three years. The number of events to be simulated increases by an order of magnitude during the three-year project programme, but the requested resources only increase by a factor of two per year, taking into account the better efficiency and the improved management of the resources themselves.
CMS Italy (INFN) is strongly involved (in many fields with a leading role) in this process of simulation and study, contributing at least 20% of the effort (both with computing resources and with physics analysis competences).
Both approaches to the prototyping phase lead to the same amount of needed resources. This scenario has the benefit of building the prototype Regional Centres (Tier-1 and Tier-2s) while using the resources for "real" activities and physics results.
A preliminary deliverable plan for the two approaches can be summarized as follows:
Table .5 - CMS Prototype deliverables
| 2001 | Tier-n first implementation; network connection; main centres in the Grid | HLT studies with 10^6-10^7 events/channel; coordination of OO databases; network tests on "ad hoc" links and "data mover" tests; all-disk storage option |
| 2002 | Tier1 dimension is doubled; Tier0 Centre is connected (network and first functionalities); Grid extension including the EU project | Physics studies in preparation of the TDR and 5% data challenge for the Computing TDR; OO distributed databases (remote access); performance tests; mass storage studies |
| 2003 | Infrastructure completion; functionality tests and CMS integration | Final studies for the Physics TDR and preparation for the 2004 data challenge; data challenge; full integration with CMS analysis and production |
A preliminary table with the most demanding resources required by the CMS Italy prototype and production activities follows:
Table .6 - CMS Testbed capacity target
Capacity target for the INFN Testbed CMS

| Resources \ year | 2001 | 2002 | 2003 | Total |
| Total CPU (SI95) | 8,000 | 8,000 | 16,000 | 32,000 |
| Total mass storage (TB) | 12 | 28 | 40 | 80 |
| LAN & WAN equipment (units) | 7 | 6 | 7 | 20 |
| Tape library (units) | 1 | 2 | 4 | 7 |
The last column represents the integral achieved in year 2003.
Preliminary cost estimates for the above resources are as follows (it should be stressed that personnel, consumables and WAN costs are NOT included in the table, nor are the local infrastructure support and the network integration):
Table .7 - CMS Italy PRELIMINARY cost evaluations
| Costs in Mlit (ML) | 2001 | 2002 | 2003 | Total |
| Processors | 573 | 344 | 414 | 1,331 |
| Disks | 896 | 1,392 | 1,318 | 3,606 |
| Tape libraries | 100 | 160 | 256 | 516 |
| LAN | 207 | 48 | 108 | 363 |
| Tape media | 144 | 129 | 153 | 426 |
| Total costs (ML) | 1,920 | 2,073 | 2,249 | 6,242 |
| Total costs (K€) | 960 | 1,036 | 1,125 | 3,121 |
The last column represents the integral cost for years 2001-2003.
The current baseline LHCb computing model is based on a distributed multi-tier Regional Centre model following the one described by MONARC. We assume in this scenario that the Regional Centres (Tier 1) have significant CPU resources and the capability to archive and serve large data sets for all production activities, both for the analysis of real data and for the production and analysis of simulated data. At present we also assume that the production centre will be responsible for all production processing phases, i.e. for the generation of data and for the reconstruction and production analysis of these data.
The Italian Regional Centre requirements for supporting user analysis and simulation have been estimated to be of the order of 100 kSI95 of CPU and 80 TB of disk. The plan is to reach, by 2003, the order of 20 kSI95 and 0.7 TB of disk. These resources will allow performing the prototype tests at a significant size, continuing the on-going detector and low-level trigger optimisation, and beginning the signal and background Monte Carlo event simulations for physics studies.
Table .8 - Capacity targets for the Testbed LHCb
LHCb Capacity Targets for the Testbed

| | units | 2001 | 2002 | 2003 | Total |
| CPU capacity | SI95 | 2,300 | 4,500 | 13,500 | 20,300 |
| Estd. number of CPUs | # | 50 | 61 | 114 | 225 |
| Disk capacity | TBytes | 0.25 | 0.09 | 0.38 | 0.72 |
| Tape storage capacity | TBytes | 0.10 | 0.06 | 0.12 | 0.28 |
| WAN link to CERN | Mbits/s | 155 | 622 | 622 | 1000 |
| WAN link between Tier 1 | Mbits/s | 34 | 155 | 155 | 1000 |
| WAN links between Tier 1 and Tier 2 | Mbits/s | 34 | 155 | 155 | 155 |
| Total costs | ML | 366 | 446 | 676 | 1488 |
| Total costs | K€ | 183 | 223 | 338 | 744 |
The last column represents the integral achieved in the year 2003.
Expressions of interest have been presented by several experiments and projects. All the experiments are obviously welcome to take part in the Grid activities, but in this proposal only two "non-LHC" projects, APE and VIRGO, are considered full main partners in the development of the Grid project.
ARGO and COMPASS have also shown significant interest in the testbed activity, offering the opportunity to add new requirements to the project. People from those experiments will take part in the project and inject their specific needs into the WP activities.
A test case for the Grid project is the analysis of Lattice QCD data.
At the end of the year 2000, several APEmille installations will be up and running in several sites in Italy, Germany, France and the UK, for an integrated peak performance of about 1.5 Tflops.
These machines will be used for numerical simulations of QCD on the lattice, with the physics goals of an accurate analysis of the low-lying hadrons in full QCD, and an investigation of the phenomenology of the weak interaction of heavy quarks systems.
In the following years, a new generation of machines will gradually become available. A major player in this field will presumably be the new-generation APE project, apeNEXT, which will give an increase in performance of a factor of ten, reaching the level of 5 to 10 Tflops. The coordinated use of several machines running on the same physical problem will help accumulate large statistics, made available to a wide community of users to perform independent investigations.
We shall then see a trend towards large computer installations running Lattice QCD code acting as the analogue of a particle accelerator. In this analogy, the lattice field configurations are the equivalent of the collision events collected and analyzed by the experiments. The relevant point for the Grid is that the amount of data to be stored and analyzed is of a size comparable to that of a large experiment. Typical figures for the database size are 25 TByte in the years 2000-2003 for APEmille, up to 1 PByte in the years 2003-2006 for apeNEXT.
The activity will then be that of operating a storage system supporting such a database, presumably distributed over a small number of sites. A farm of processors will be set up to retrieve configurations from the database and process them. This also means that a network infrastructure and the middleware allowing physicists to perform remote analysis will have to be developed. All these points may represent, within the Grid, a Lattice QCD specific testbed to be set up quite rapidly, based on a couple of APEmille sites with about 5 TByte of data, customizing the relevant middleware (presumably Globus based) to allow Lattice QCD analysis.
This preliminary programme should heavily leverage the investment made in Grid-like techniques by our experimental colleagues and concentrate on the specific features of Lattice QCD analysis. The experience gained in this effort will be precious for sustaining a smooth analysis of the apeNEXT configurations as they become available.
The scientific goal of the Virgo project is the detection of gravitational waves in a frequency band from a few hertz up to 10 kHz. Virgo is an observatory that has to operate and take data continuously in time. The data production rate of the instrument is ~4 MByte/s, including the control signals of the interferometers, the data from the environmental monitoring system and the main interferometer signals. The raw data will be available for off-line processing from the Cascina archive and from a mirror archive in the IN2P3 computing centre in Lyon (France). The reconstructed data h(t), plus a reduced set of the monitoring data, define the so-called reduced data set. The production rate of the reduced data is expected to be of the order of 200 kByte/s.
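(At these rates, and assuming continuous data taking for a full year of about 3.15×10^7 s, the raw data stream of ~4 MByte/s corresponds to roughly 120-130 TByte per year, and the reduced data set of ~200 kByte/s to roughly 6 TByte per year; these derived figures are indicative only.)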
The off line analysis cases that are most demanding from the point of view of computational cost and storage are the following:
We propose to approach these problems in the framework of the Grid project.
The search for the inspiraling-binary g.w. signal is based on the well established technique of the matched filter.
This method correlates the detector output with a bank of templates chosen to represent the possible signals with sufficient accuracy over the desired range of the parameter space.
The computing load is distributed over different processors dealing with different families of templates. The output is then collected in order to perform the statistical tests and extract the candidates.
The computational cost and the storage requirements are non-linear functions of the coordinates in the parameter space, so that the optimization of the sharing of the computing resources and the efficiency of the job fragmentation will be the crucial points of the test.
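The core of the scheme can be illustrated with a minimal sketch: each processor applies, in the frequency domain, the filters of its own sub-bank of templates to the same stretch of detector output and keeps the candidates above threshold. The Python/NumPy fragment below is purely illustrative (white noise, toy templates, no whitening by the detector spectral density) and is not the Virgo analysis code.

    import numpy as np

    def matched_filter_bank(data, templates, threshold):
        """Correlate one data stretch with a sub-bank of templates (frequency
        domain) and return (template index, time index, statistic) above threshold."""
        n = len(data)
        data_f = np.fft.rfft(data)
        candidates = []
        for i, tmpl in enumerate(templates):
            tmpl_f = np.fft.rfft(tmpl, n)
            corr = np.fft.irfft(data_f * np.conj(tmpl_f), n)   # circular correlation
            stat = corr / np.sqrt(np.sum(tmpl ** 2))           # crude normalisation
            j = int(np.argmax(np.abs(stat)))
            if abs(stat[j]) > threshold:
                candidates.append((i, j, float(abs(stat[j]))))
        return candidates

    # Toy usage: white noise plus one injected chirp-like template.
    rng = np.random.default_rng(0)
    fs = 4096
    t = np.arange(4 * fs) / fs
    templates = [np.sin(2 * np.pi * (50 + 10 * k) * t[:fs] * (1 + t[:fs])) for k in range(8)]
    data = rng.normal(0.0, 1.0, len(t))
    data[2 * fs:3 * fs] += 0.5 * templates[3]                  # injected signal
    print(matched_filter_bank(data, templates, threshold=8.0))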
The goal of the testbed is the application of the analysis procedure to the data generated by the Virgo central interferometer. The output of this test is essential in order to implement on-line triggers for the binary signal search.
The preliminary target is the search for signals generated by binary star systems with a minimum mass of 0.25 solar masses at 91% of the signal-to-noise ratio (SNR).
The case of the pulsar signal search is more demanding in terms of computational resources. The signal search is performed by analyzing the data stored in a relational database of short FFTs derived from the h(t) quantity. The search method is basically hierarchical, so that the definition and control of the job granularity are characterized by a higher level of complexity.
The goal of the Grid testbed is to optimize the full analysis procedure on the data generated by the Virgo central interferometer. We plan to limit the search to the frequency interval 10 Hz - 1.25 kHz, and the computation will cover 2 months of data taking.
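The kind of data preparation involved can be sketched as follows: the reconstructed h(t) stream is split into short, partially overlapping segments, each segment is windowed and Fourier transformed, and the short FFTs restricted to the 10 Hz - 1.25 kHz band are kept for the subsequent hierarchical search. The NumPy fragment below is only an illustration with toy parameters, not the Virgo database implementation.

    import numpy as np

    def build_short_fft_db(h_t, fs, segment_s=8.0, overlap=0.5, f_min=10.0, f_max=1250.0):
        """Split h(t) into overlapping segments and return a list of
        (start time, frequencies, short FFT) restricted to [f_min, f_max]."""
        seg_len = int(segment_s * fs)
        step = int(seg_len * (1.0 - overlap))
        window = np.hanning(seg_len)
        freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
        band = (freqs >= f_min) & (freqs <= f_max)
        db = []
        for start in range(0, len(h_t) - seg_len + 1, step):
            segment = h_t[start:start + seg_len] * window
            db.append((start / fs, freqs[band], np.fft.rfft(segment)[band]))
        return db

    # Toy usage: ten minutes of white noise sampled at 4 kHz, 8 s segments.
    fs = 4000.0
    h_t = np.random.default_rng(1).normal(size=int(600 * fs))
    db = build_short_fft_db(h_t, fs)
    print(len(db), "short FFTs,", len(db[0][1]), "frequency bins each")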
Let us now list the needs for the total computational power and storage, to be hierarchically distributed in the Virgo INFN sections and laboratories of the collaboration and structured in the following classes of computer centres:
Tier 0: Virgo - Cascina;
(Tier 1: Lyon, IN2P3)
Tier 2: Roma 1 and Napoli;
Tier 3: Perugia and Florence.
Concerning the network links, we notice that:
In conclusion, for test beds dedicated links connecting Virgo-Cascina, Roma1, Napoli, Perugia and Firenze must be foreseen.
Finally, we report in the Table the total capacity needed for the Virgo test beds at the end of 2001 and the final target for the off line analysis of Virgo.
Table .9 - VIRGO Testbed capacity target
Capacity targets for the INFN Testbed VIRGO

| | units | end 2001 | end 2003 |
| CPU capacity | SI95 | 8,000 | 80,000 |
| Disk capacity | TBytes | 10 | 100 |
| Disk I/O rate | GBytes/sec | 5 | 5 |
| Sustained data rate | MBytes/sec | 250 | 250 |
| WAN links to Cascina | Mbits/sec | 155 | 2,500 |
| WAN links to labs | Mbits/sec | 34 | 622 |
| WAN links to France | Mbits/sec | 622 | |
| Total costs | ML | 1,477 | |
| Total costs | K€ | 738 | |
The following chapters will describe the work packages details and the resources needed to accomplish the various tasks.
The general philosophy of the work plan is to take into account the following main activities:
The first activity, or work package, has already started; it will probably end after about 6 months of tests and will inject into the second work package (at the European level as well) the necessary information on how much of the Globus Toolkit can be used and how many changes and improvements have to be made.
The second activity is fundamentally centred on the adaptation of the European work package deliverables to the national infrastructure. This also includes any modifications and additions that may turn out to be necessary for the INFN-Grid scope.
Requirements and applications to be tested "on the Grid" will be delivered by the third activity, which will start from the experiments' wishes and needs and, after having filtered them, will inject the requirements into the second and fourth activities. Moreover, this work package will provide the applications that will be tested on the Grid infrastructure.
The fourth activity is devoted to the definition of the national testbed and also takes care of the physical implementation of such an infrastructure.
In order to create and deploy usable Grids, a framework providing basic services and functionalities is necessary.
The toolkit provided by the Globus project has been identified as a possible candidate for this "software infrastructure" for several reasons:
The goal of this work package is to evaluate Globus as a distributed computing framework: Globus mechanisms will be evaluated for robustness, ease of use, effectiveness and completeness, in order to find which services can be useful for the middleware work packages of the INFN-Grid project and of the DataGrid project (the European Grid project).
Modifications and extensions needed to correct any Globus shortcomings will be identified.
Work in this work package will be coordinated with the similar activities that are going to be performed by other members of the European Grid project (e.g. CERN), in order to share the acquired experience, to avoid duplication of the same activities, etc.
Activities in this work package will investigate the functionalities of the Globus toolkit, to evaluate which services can be used, what is necessary to integrate/modify, what is missing and therefore must be found somewhere else, or must be developed.
In particular Globus services for security, information, resource management, data migration, fault monitoring, execution environment management will be investigated.
Some activities have already started, and some of the members contributing to this work package already have experience with the Globus toolkit. It is therefore foreseen that the described activities can be performed in a short period (6 months).
The results of this evaluation will be used to plan and schedule the future work. For example, if the activities of this work package prove that Globus can be used as a framework providing basic Grid functionalities, it will be necessary to deploy Globus widely in all INFN sites in order to build the infrastructure for the INFN-Grid, it could be necessary to modify and extend the software to correct the identified shortcomings, etc.
In order to deploy Globus widely and successfully, it is very important to reduce the complexity of installation and maintenance, limiting the manpower required for these operations. Local administrators, responsible for installing Globus on the local resources, will be provided with specific tools, documentation, operational procedures, etc.
A group of "central" Globus administrators will implement and maintain a central software repository (for example using AFS): they will install Globus for all the considered platforms, test and install new software releases, install the required patches, develop tools and documentation to enable local administrators to quasi-automatically install the software on the local resources, etc…
Local administrators will only be responsible for deploying Globus on the local machines. These activities will require limited manpower, thanks to the tools and the documentation provided by the "central" team.
The Grid Security Infrastructure (GSI) service, focusing on authentication, will be evaluated.
The single sign-on mechanism for all Grid resources provided by GSI will be tested between different administrative domains, each one using different security solutions, mechanisms and policies (such as one-time passwords and Kerberos).
Activities for this task will also include the integration of some existing applications with the GSI authentication mechanisms, using the gss-api.
Since the Grid Security Infrastructure is based on public key technology, which relies upon certificates, a "local" Certification Authority responsible for issuing these certificates must be implemented. The relations (i.e. the CA hierarchy) between this Certification Authority and other Certification Authorities (for example those implemented by other partners of the DataGrid project) must be investigated, defined and implemented.
The Grid Information Service (GIS), formerly known as Metacomputing Directory Service (MDS) will be analyzed.
Using LDAP as its standard protocol, the GIS provides, through a common interface, information on the system components of the dynamic Grid environment (i.e. architecture type, amount of memory, load of a computer, network bandwidth and latency, …). This information can then be accessed and used by various components (for example by a resource broker, for resource discovery).
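As an example of the kind of client-side access these tests will exercise, the sketch below queries an LDAP-based information server and prints the returned entries. It uses the third-party ldap3 Python module purely for illustration; the host name, the port and the search base ("mds-vo-name=local, o=grid") are assumptions that depend on the actual GIS/MDS deployment and version.

    from ldap3 import Server, Connection, ALL

    # Hypothetical GIS/MDS endpoint and search base: adjust to the real deployment.
    GIS_HOST = "ldap://grid-gis.example.infn.it:2135"
    SEARCH_BASE = "mds-vo-name=local, o=grid"

    server = Server(GIS_HOST, get_info=ALL)
    conn = Connection(server, auto_bind=True)      # anonymous bind

    # Retrieve all attributes of every object below the search base.
    conn.search(SEARCH_BASE, "(objectclass=*)", attributes=["*"])

    for entry in conn.entries:
        print(entry.entry_dn)
        print(entry.entry_attributes_as_dict)

    conn.unbind()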
Activities for this task have already been started, in particular with the implementation of a national GIS server.
Tests on performance and scalability (increasing the number of agents that produce useful information and the number of components that access and use that information) will be performed. The related results will be used to decide and establish the most suitable architecture for reliable and redundant access to the GIS in the multi-site INFN environment (how many GIS servers have to be considered? where do these servers have to be installed? how can the information be replicated? …).
Tools and interfaces (in particular web user interfaces) addressed to Grid users and administrators must be developed, to support the various operations related to the Globus GIS service: browsing the information servers, adding, deleting, modifying information, etc…
Tests will also be performed on the possibility and suitability to generalize the GIS schema, integrating the default information with other required information (i.e. agents that automatically provide the GIS with specific information on resource characteristics and status, information on replicas of data, …).
The GRAM (Globus Resource Allocation Management), that is the Globus service for resource allocation and process management, will be analyzed and evaluated.
In the Globus resource manager architecture, each GRAM is responsible for a set of resources operating under the same site-specific allocation policy, often implemented by an existing resource management system.
Activities in this task will be done in an incremental fashion, starting with simple scenarios.
As a first step, the Globus functionalities for job submission on remote resources that do not rely upon resource management systems will be evaluated.
The functionalities of GRAM in conjunction with different (local) resource management systems (in particular LSF and Condor) will be investigated, to evaluate whether GRAM could be a possible uniform interface to different resource managers. Demonstrations will be performed showing jobs submitted via Globus and run on remote systems under different underlying resource managers, complete with file staging, forwarding of output, etc., without the need for direct human intervention.
For these tests the INFN Condor WAN pool will be interfaced to Globus as a "single" resource.
It will be necessary to evaluate how resource requests expressed in terms of the Globus RSL (Resource Specification Language) are translated into the resource description languages of the specific resource management systems considered.
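To make this mapping concrete, the fragment below builds a simple RSL request as a string. Attribute names such as executable, arguments, count, queue and environment are standard GRAM RSL attributes; the helper function itself and the exact quoting conventions are illustrative assumptions, and the way the string is then handed to a Globus client depends on the toolkit version in use.

    def make_rsl(executable, arguments=(), count=1, queue=None, environment=None):
        """Build a GRAM RSL request string from a few common attributes (illustrative)."""
        parts = ['(executable = "%s")' % executable, '(count = %d)' % count]
        if arguments:
            parts.append('(arguments = %s)' % " ".join('"%s"' % a for a in arguments))
        if queue:
            parts.append('(queue = "%s")' % queue)
        if environment:
            env = " ".join('(%s "%s")' % (k, v) for k, v in environment.items())
            parts.append('(environment = %s)' % env)
        return "& " + " ".join(parts)

    rsl = make_rsl("/bin/hostname", queue="short", environment={"STAGE_DIR": "/tmp/stage"})
    print(rsl)
    # A local GRAM would translate such a request into, e.g., an LSF or Condor
    # submit description; verifying that translation is the point of this task.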
It will also be necessary to evaluate if and how each local GRAM is able to provide the Grid Information Service with information on characteristics and status of the local resources.
The GASS (Global Access to Secondary Storage) APIs, which allow programs that use the C standard I/O library to open and subsequently read and write files located on remote computers, will be analyzed, considering the possible file access patterns. The capabilities, performance and ease of use of the GASS services will be evaluated and compared with other solutions, such as distributed file systems. Some existing applications will be adapted to use these APIs, to verify whether the GASS mechanisms can be a viable and suitable solution for accessing remote data.
Moreover, tests will be performed to evaluate the functionalities, performance and robustness of the Globus tools for data migration. In particular the GASS service and GSIFTP (an FTP implementation that works with the Globus authentication system and allows high-speed file transfers with fully adjustable TCP window sizes) will be evaluated in LAN and WAN environments, and compared with other tools and services, such as plain FTP.
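A timing harness of the kind sketched below could be used for such comparisons: each candidate tool transfers the same test file and the effective throughput is recorded. The command lines are placeholders to be replaced with the clients actually installed (FTP, the GSIFTP client, a GASS copy utility, ...), and the file size is an assumption.

# Sketch of a transfer benchmark: run each candidate tool on the same test
# file and report the effective throughput.
import subprocess
import time

FILE_SIZE_MB = 512          # size of the test file (assumption)

candidates = {
    "plain-ftp":  ["ftp-get-script.sh",    "server.lan.infn.it", "/data/test.bin"],
    "gsiftp-lan": ["gsiftp-get-script.sh", "server.lan.infn.it", "/data/test.bin"],
    "gsiftp-wan": ["gsiftp-get-script.sh", "server.wan.infn.it", "/data/test.bin"],
}

def benchmark(name, cmd):
    start = time.time()
    result = subprocess.run(cmd)
    elapsed = time.time() - start
    if result.returncode == 0:
        print("%-12s %6.1f s  %6.1f Mbit/s"
              % (name, elapsed, FILE_SIZE_MB * 8 / elapsed))
    else:
        print("%-12s failed" % name)

for name, cmd in candidates.items():
    benchmark(name, cmd)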
The Globus HBM (HeartBeat Monitor) service, providing simple mechanisms for monitoring the health and status of a distributed set of processes, will be analyzed, evaluating its scalability, accuracy, completeness, overhead and flexibility.
This fault-detection service will be evaluated not only for monitoring Globus system processes (for which it was originally designed); it will also be analyzed whether it is suitable for monitoring other processes implementing particular services (e.g. HTTP server daemons) and application processes associated with "user" computations.
Using the provided API, specific data collectors will be implemented: in the beginning these data collectors should simply detect and notify about failures of processes that have identified themselves to the HBM. Fault recovery mechanisms, such as automatic restart of crashed daemon processes, will be implemented later.
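As an illustration of the collector logic described above, the following generic sketch (which does not use the actual Globus HBM API) receives periodic UDP heartbeats from registered processes and reports the ones that stop beating; a recovery hook could later be attached at the point marked in the code. The endpoint, message format and timeout are assumptions.

# Illustrative heartbeat collector (generic UDP-based sketch, NOT the
# Globus HBM API): registered processes send periodic "name:timestamp"
# datagrams; the collector reports the ones whose heartbeats stop.
import socket
import time

LISTEN_ADDR = ("0.0.0.0", 9999)   # hypothetical collector endpoint
TIMEOUT = 30.0                    # seconds without a beat => suspected failure

last_seen = {}                    # process name -> last heartbeat time

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(LISTEN_ADDR)
sock.settimeout(5.0)

def notify_failure(name):
    # First phase: just detect and notify.  A later phase could plug in a
    # recovery action here (e.g. restarting the crashed daemon).
    print("WARNING: no heartbeat from %s for more than %.0f s" % (name, TIMEOUT))

while True:
    try:
        data, _ = sock.recvfrom(1024)
        name = data.decode().split(":")[0]
        last_seen[name] = time.time()
    except socket.timeout:
        pass
    now = time.time()
    for name, seen in list(last_seen.items()):
        if now - seen > TIMEOUT:
            notify_failure(name)
            del last_seen[name]   # avoid repeating the notification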
Tools are needed to set up the environment in which code must actually execute: binaries have to be staged, runtime libraries have to be located and/or instantiated, environment variables and other needed state must be established, etc.
The Globus Executable Management (GEM) service will be tested, in order to evaluate whether it is able to provide some of these functionalities, required when a code migration strategy (the application is moved where the computation will be performed) must be considered.
Table .1 – List of the Deliverables for WP1
Deliverable | Description | Delivery Date
D1.1 | Tools, documentation and operational procedures for Globus deployment | 6 Months
D1.2 | Final report on suitability of the Globus toolkit as basic Grid infrastructure | 6 Months
Table .2 – List of the Milestones for WP1
Milestone | Description | Date
M1.1 | Basic development Grid infrastructure for the INFN-Grid | 6 Months
Only the resources required for the described activities, which will last through the year 2000, are listed in the following tables. As described above, the resources necessary for the following years will be better identified after this first evaluation of the Globus toolkit.
Table .3 – Personnel Contribution to the WP1 Tasks (FTE)
Sede |
Cognome |
Nome |
DataGrid |
INFN-Grid |
Esperimento |
BARI |
D'Amato |
Maria |
|
0.2 |
CMS |
|
ZZZ |
Art. 15 |
|
0.5 |
IT |
BOLOGNA |
Semeria |
Franco |
|
0.2 |
IT |
CATANIA |
Barbera |
Roberto |
|
0.1 |
ALICE |
|
Cangiano |
Ernesto |
|
0.1 |
IT |
|
Rocca |
Carlo |
|
0.1 |
IT |
CNAF |
Fonti |
Luigi |
|
0.2 |
IT |
Giacomini |
Francesco |
0.1 |
IT |
||
Italiano |
Alessandro |
0.1 |
IT |
||
Vita Finzi |
Giulia |
0.4 |
IT |
||
LECCE |
Cafaro |
Massimo |
0.1 |
IT Università |
|
Depaolis |
Lucio |
0.1 |
IT Università |
||
LNL |
Biasotto |
Massimo |
0.1 |
IT |
|
MILANO |
Lo Biondo |
Giuseppe |
|
0.1 |
IT |
Prelz |
Francesco |
0.1 |
IT |
||
PADOVA |
Sgaravatto |
Massimo |
0.2 |
IT |
|
PISA |
Bagliesi |
Giuseppe |
|
0.3 |
CMS |
|
Controzzi |
Andrea |
|
0,5 |
CMS |
|
Donno |
Flavia |
|
0.3 |
IT |
|
Sciaba' |
Andrea |
|
0.3 |
CMS |
|
Xie |
Zhen |
|
0.3 |
CMS |
ROMA |
Anzellotti |
Daniela |
|
0.1 |
IT |
|
De Rossi |
Marco |
|
0.0 |
IT |
|
Majorana |
Ettore |
|
0.1 |
VIRGO |
|
Palomba |
Cristiano |
|
0.1 |
VIRGO |
|
Rossetti |
Davide |
|
0.1 |
APE |
|
Spanu |
Alessandro |
|
0.1 |
IT |
|
Valente |
Enzo |
|
0.1 |
CMS |
TORINO |
Anglano |
Cosimo |
0.1 |
IT Università |
|
Forte |
Antonio |
0.1 |
IT |
||
Gaido |
Luciano |
0.0 |
IT |
Table .4 – Table of Materials for WP1 in K€ and ML
Site | Description | Year 2000 K€ | Year 2000 ML
Bari | QUANTUM | 7.5 | 15
Bologna | QUANTUM | 7.5 | 15
Catania | QUANTUM | 7.5 | 15
CNAF | QUANTUM | 7.5 | 15
Ferrara | QUANTUM | 7.5 | 15
Lecce | QUANTUM | 7.5 | 15
LNL | QUANTUM | 7.5 | 15
Milano | QUANTUM | 7.5 | 15
Padova | QUANTUM | 7.5 | 15
Pisa | QUANTUM | 7.5 | 15
Roma1 | QUANTUM | 7.5 | 15
Torino | QUANTUM | 7.5 | 15
Total | | 90 K€ | 180 ML
Table .5 – Table of Consumables for WP1
Site | Description | Year 2000 K€ | Year 2000 ML
CNAF | License for Netscape LDAP Server – 1000 keys | 5 K€ | 10 ML
This work package deals with the integration of the DATAGrid project developments and with the modifications necessary to meet the Italian INFN needs and configuration. For this purpose the WP is split into sub-packages.
The goal of this task is to define and implement a suitable architecture able to assign computing tasks to the available Grid resources, taking into account many factors pertaining to the tasks themselves and to the available resources:
For this purpose the middleware produced within the DataGrid project will be adapted and deployed, in order to build a reliable workload management system, able to meet the requirements of the future experiments.
Moreover, the existing technology base developed under other Grid-related projects (in particular Globus and Condor) will be integrated and extended, also profiting from the considerable experience with the use, deployment and support of these software systems (in particular it is worth remarking the many years of experience with the Condor system, and the collaboration between INFN and the Condor Team of the University of Wisconsin, Madison). These activities are strictly related to the DataGrid project, and will provide useful inputs and solutions for the research and development performed in this European project.
The man-months and personnel participation include the INFN contribution to the DATAGrid project in workload management activities.
Activities in this task will start with the simplest scenarios that require limited and simple services, considering in the beginning simple implementation strategies. More complex environments will be considered in an incremental fashion.
Task 2.1.1 Workload management for Monte Carlo production
The production of simulated data is typically a scheduled task, and the characteristics and requirements of these jobs are known in advance: they are usually CPU-bound applications with a limited I/O rate. Monte Carlo production is typically performed by "traditional" applications that perform their I/O on the storage system of the executing machine. They usually just need as input small card files describing the characteristics of the events to be generated and geometry files describing the detector to be simulated, which can be staged on the executing machine at negligible cost.
Therefore, for Monte Carlo production the goal is to build a system able to optimize the usage of all the available CPUs, maximizing the overall throughput. Code migration (moving the application where the processing will be performed) will be considered as the implementation strategy.
For Monte Carlo production, dedicated PC farms distributed over different sites will be considered, but it will also be investigated whether it is suitable to exploit idle CPU cycles of general-purpose workstations as well.
Bookkeeping, accounting and logging services must be implemented in this framework. Moreover, it must be possible to define policies and priorities on resource usage.
A simple language for submitting jobs, removing/suspending jobs, and monitoring their state will be made available in this phase.
The workload management system for Monte Carlo production will be deployed considering elements of Condor and Globus Grid architectures (it is foreseen that this work will be coordinated with the research and development activities performed in the DataGrid project).
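As a concrete illustration of this kind of job submission, the sketch below generates a Condor submit description for a set of Monte Carlo jobs and hands it to condor_submit; the executable, input card file and number of jobs are placeholders.

# Sketch: generate a Condor submit description for Monte Carlo production
# jobs and submit it.  Names and parameters are placeholders.
import subprocess
import textwrap

def submit_mc_jobs(executable, card_file, n_jobs):
    submit_description = textwrap.dedent("""\
        universe   = vanilla
        executable = %s
        arguments  = %s
        input      = %s
        output     = mcprod_$(Process).out
        error      = mcprod_$(Process).err
        log        = mcprod.log
        should_transfer_files = YES
        when_to_transfer_output = ON_EXIT
        queue %d
        """ % (executable, card_file, card_file, n_jobs))
    with open("mcprod.submit", "w") as f:
        f.write(submit_description)
    # condor_submit, condor_rm and condor_q cover the submit / remove /
    # monitor operations mentioned above.
    subprocess.run(["condor_submit", "mcprod.submit"], check=True)

submit_mc_jobs("./simulate_events", "run2000.cards", n_jobs=50)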
Concerning Condor, some activities already at hand are:
Concerning Globus, important issues to be investigated as a WP1 follow-up are:
The interconnection of Globus and Condor must be investigated as well.
Tests on how a Globus client can submit jobs to a Condor pool and, vice versa, how a Condor machine can submit jobs to a Globus Grid are already ongoing.
It will also be investigated whether Personal Condor can be a suitable broker for Globus resources.
A comparison of different local resource management systems (LSF, PBS, etc…) will also be performed.
Task 2.1.2 Workload management for data reconstruction and production analysis
Reconstruction of real/simulated data is the process where raw data are processed, and ESD (Event Summary Data) are produced.
Production analysis includes Event Selection (samples of ESD of most interest to a specific physics group are selected) and Physics Object Creation (the selected ESD data are processed, and the AOD, the Analysis Object Data, are created).
These are again scheduled activities, driven by the experiment and/or by the physics groups, that will typically be performed in the Tier 1 Regional Center, possibly distributed over different sites.
For these kinds of applications the goal is again the maximization of throughput. Besides code migration, data migration (moving the data where the processing will be performed) must be considered as a possible implementation strategy: it could be profitable to move the data that must be processed to remote farms able to provide larger CPU resources, but it must be considered that the required data sets usually have a non-negligible size, and therefore the cost of moving them must be taken into account.
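The following toy cost model, with purely illustrative figures, captures this trade-off: data migration pays off only when the transfer time is amortized by the larger remote CPU capacity.

# Toy cost model: move the data to a remote farm with more CPU, or process
# locally?  All figures are illustrative and would in practice come from
# the information service.
def elapsed_time(dataset_gb, cpu_time_hours, local_cpus, remote_cpus,
                 wan_mbit_per_s):
    """Return (local_elapsed, remote_elapsed) in hours."""
    local = cpu_time_hours / local_cpus
    transfer = dataset_gb * 8 * 1024 / wan_mbit_per_s / 3600.0   # hours
    remote = transfer + cpu_time_hours / remote_cpus
    return local, remote

local, remote = elapsed_time(dataset_gb=200,        # size of the ESD sample
                             cpu_time_hours=500,    # total CPU time needed
                             local_cpus=20,         # CPUs usable locally
                             remote_cpus=100,       # CPUs usable remotely
                             wan_mbit_per_s=100)
print("local: %.1f h, remote (incl. transfer): %.1f h" % (local, remote))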
Again, the services that must be considered in the workload management system for these applications are bookkeeping, accounting, logging, and the possibility to define policies and priorities on resource usage.
Task 2.1.3 Workload management for individual physics analysis
These jobs process selected AOD data to prepare more refined private data samples together with analysis plots and histograms.
User analysis is typically a "chaotic", non-predictable activity, driven by the single physicist.
Moreover, the goal is typically the minimization of latency of the submitted jobs, instead of the maximization of the overall throughput.
The new HEP applications are based on Object Oriented programming paradigms and Object Oriented client/server architectures, such as CORBA/ORB, Java/RMI, Objectivity/DB, Espresso, Root, etc. These applications decouple data from CPU, and therefore it is not strictly necessary that the input/output data reside on the local storage system of the executing machine. For these client/server applications, remote data access (accessing the data where they are) must therefore be considered as another possible implementation strategy, besides data migration and code migration. The network is an important resource and must be taken into account by the workload management system in order to minimize the application elapsed time.
Advance reservation and co-allocation mechanisms and strategies will be studied and tested in collaboration with the DataGrid project.
Security mechanisms for authorization and authentication must also be considered as services to manage user analyses, besides bookkeeping, accounting, logging, and the possibility to define policies and priorities on resource usage.
Since the complexity and the details of the workload management system must be hidden from users, a "high level" graphical user interface (in particular a Web interface) for task management, allowing physicists to easily perform the various activities, is required.
Table .6 – List of the Deliverables for WP2.1
Deliverable | Description | Delivery Date
D2.1.1 | Technical assessment about Globus and Condor, interactions and usage | 5/2001
D2.1.2 | First resource Broker implementation for high throughput applications | 7/2001
D2.1.3 | Comparison of different local resource managers (flexibility, scalability, performance) | 10/2001
D2.1.4 | Study of the three workload systems and implementation of the workload system for Monte Carlo production | 12/2001
Table .7 – List of the Milestones for WP2.1
Milestone | Description | Date
M2.1.1 | Workload management system for Monte Carlo production | 12/2001
M2.1.2 | Workload management system for data reconstruction and production analysis | 12/2002
M2.1.3 | Workload management system for individual physics analysis | 12/2003
Table .8 – Personnel Contribution to the WP2.1 Tasks
Sede |
Cognome |
Nome |
DataGrid |
INFN-Grid |
Esperimento |
BOLOGNA |
Semeria |
Franco |
0.4 |
0.5 |
IT |
CATANIA |
Barbanera |
Franco |
|
0.2 |
IT |
|
Cavalieri |
Salvatore |
0.4 |
0.4 |
CMS |
|
Commis |
Enrico |
0.4 |
0.4 |
IT |
|
Lo Bello |
Lucia |
0.4 |
0.4 |
CMS |
|
Mirabella |
Orazio |
0.4 |
0.4 |
CMS |
|
Monforte |
Salvatore |
0.4 |
0.4 |
CMS |
|
Sassone |
Vladimiro |
|
0.2 |
IT |
|
Vita |
Lorenzo |
0.4 |
0.4 |
IT |
CNAF |
Ciancarini |
Paolo |
0.5 |
0.5 |
IT |
Ferrari |
Tiziana |
0,2 |
IT |
||
|
Fonti |
Luigi |
0.3 |
0.2 |
IT |
Giacomini |
Francesco |
1 |
0,7 |
IT |
|
|
Ghiselli |
Antonia |
|
0.3 |
IT |
|
Vistoli |
Cristina |
0.5 |
0.5 |
IT |
FERRARA |
Gambetti |
Michele |
|
0.2 |
IT |
|
Gianoli |
Alberto |
|
0.2 |
LHCb |
|
Luppi |
Eleonora |
|
0.1 |
IT |
LECCE |
Aloisio |
Giovanni |
|
0.2 |
IT Università |
|
Cafaro |
Massimo |
0.4 |
0.3 |
IT Università |
|
Campeggio |
Salvatore |
|
0.2 |
IT Università |
|
Depaolis |
Lucio |
|
0.1 |
IT Università |
|
Fasanelli |
Enrico M.V. |
|
0.2 |
IT |
|
ZZZ |
Dottorato |
|
1 |
INFN/ISUFI |
MILANO |
Prelz |
Francesco |
0.3 |
0.2 |
IT |
PADOVA |
Orlando |
Salvatore |
0.3 |
0.3 |
IT Università |
|
Sgaravatto |
Massimo |
0.4 |
0.4 |
IT |
ROMA |
Mirabelli |
Giovanni |
0.3 |
0.3 |
ATLAS |
TORINO |
Anglano |
Cosimo |
0.3 |
0.2 |
IT Università |
|
Donatelli |
Susanna |
0.3 |
0.3 |
IT Università |
|
Gaido |
Luciano |
0.3 |
0.3 |
IT |
|
Werbrouck |
Albert |
0.3 |
0.3 |
ALICE |
|
ZZZ |
Tecnologo |
0.6 |
0.9 |
IT |
|
ZZZ |
Ass. Ricer. |
0.5 |
1 |
IT |
Materials for WP 2.1
There will be at least 3 PCs connected via a Fast Ethernet switch for each site belonging to this WP. Since this equipment will be used for all the middleware WPs (WP1, WP2.1, WP2.2, WP2.3), it is included in the testbed WP.
Consumables for WP 2.1
It will be necessary to buy enough LSF licenses to evaluate the performance of the different local resource managers. It is assumed that a corresponding number of LSF licenses will be available for the machines included in the WP testbed, and that, for an extensive evaluation using farms with a high number of nodes, the necessary software licenses will be provided by the experiments or by the Computing Fabric WP.
The Data Management task in the INFN-Grid project is strongly correlated to the DATAGrid WP2 on one side, and to the High-Energy Physics Applications (DATAGrid WP8) on the other.
Let us recall the tasks of the European WP:
2.1 Data Access and Migration
2.2 Replication
2.3 MetaData Management
2.4 Security and Transparent Access
2.5 Query Optimization and Access Pattern Management
2.6 Coordination
and the deliverables:
D1: A system capable of managing replicas and metadata at the file level
D2: A system capable of securely and transparently managing dataset queries and replicas
D3: Demonstration of an efficient production system
In the context of INFN-Grid the work should concentrate on data access in a situation where several sites hold data samples which may or may not be replicated. The experiments' use cases will concern mostly simulated data, produced in Italy as well as elsewhere. Data will be stored on disk, either in Objectivity or with other systems.
We assume that Data Replication and Migration are mainly handled by the European project, and we consider that, for Regional Center prototyping, Data Access and MetaData definition and management are crucial issues.
We therefore propose to define the following sub-tasks in INFN-Grid:
Task 2.2.1: Wide Area Data Location: assuming that data are resident in different sites, build a service which allows any user to know automatically in which site/storage (and possibly in which file) the data are; such a service will relieve users of the burden of consulting data catalogs before executing their jobs.
Task 2.2.2: MetaData Definition and Mapping to Real Data: define the attributes which fully characterize a dataset in an abstract way and the methods for mapping the abstract description onto the actual one; it should be noted that such a task, although implicitly present in the European project, needs to be adapted to the Italian case, because the underlying storage layer might be completely different.
Task 2.2.3: Data Query Evaluation: build a service capable of evaluating, in the national distributed context, how and when a query can be satisfied, returning codes which can be captured by user programs to guide their completion.
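A minimal in-memory sketch of the Data Location Broker idea behind Tasks 2.2.1 and 2.2.2 is given below: datasets are described by abstract metadata attributes and mapped to the physical locations of their replicas. Attribute names, sites and file URLs are invented for illustration only.

# Minimal in-memory sketch of a Data Location Broker: abstract metadata
# attributes are mapped to the physical locations of the replicas.
replica_catalogue = [
    {"experiment": "CMS", "data_type": "simulated", "channel": "Hmumu",
     "replicas": ["gsiftp://srv.pd.infn.it/data/Hmumu_0001.db",
                  "gsiftp://srv.ba.infn.it/data/Hmumu_0001.db"]},
    {"experiment": "CMS", "data_type": "simulated", "channel": "ttbar",
     "replicas": ["gsiftp://srv.roma1.infn.it/data/ttbar_0001.db"]},
]

def locate(**query):
    """Return replica locations for datasets matching the metadata query."""
    matches = []
    for entry in replica_catalogue:
        if all(entry.get(key) == value for key, value in query.items()):
            matches.extend(entry["replicas"])
    return matches

# The user asks for data by abstract attributes, not by file name or site:
print(locate(experiment="CMS", data_type="simulated", channel="Hmumu"))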
Table .9 – List of the Deliverables for WP2.2
Deliverable | Description | Delivery Date
D2.2.1 | Definition of requirements for a Data Location Broker (DLB) | 5/2001
D2.2.2 | Definition of a metadata syntax | 7/2001
D2.2.3 | Prototype of a DLB using metadata | 12/2001
D2.2.4 | Replica Management at file level | 12/2001
Table .10 – List of the Milestones for WP2.2, year 2001
Milestone | Description | Date
M2.2.1 | Capability to access replicated data at file level using a | 12/2001
Table .11 – Personnel Contribution to the WP2.2 Task
Sede |
Cognome |
Nome |
DataGrid |
INFN-Grid |
Esperimento |
BARI |
Silvestris |
Lucia |
|
0.3 |
CMS |
|
Zito |
Giuseppe |
0.3 |
0.5 |
CMS |
PISA |
Arezzini |
Silvia |
|
0.3 |
IT |
|
Controzzi |
Andrea |
|
0.5 |
CMS |
|
Donno |
Flavia |
0.5 |
0.2 |
IT |
|
Schifano |
Fabio |
|
0.2 |
APE |
ROMA |
Barone |
Luciano M. |
0.3 |
0.3 |
CMS |
|
Lonardo |
Alessandro |
|
0.3 |
APE |
|
Michelotti |
Andrea |
|
0.3 |
APE |
|
Organtini |
Giovanni |
|
0.2 |
CMS |
|
Rossetti |
Davide |
0.3 |
0.2 |
APE |
Materials for WP 2.2
There will be at least 3 or 4 PCs connected via a Fast Ethernet switch for each site belonging to this WP. Since this equipment will be used for all the middleware WPs (WP1, WP2.1, WP2.2, WP2.3), it is included in the testbed WP.
Consumables for WP 2.2
No specific SW license is envisaged for WP 2.2 in year 2001.
The Application Monitoring task in the INFN-Grid project is strongly correlated to the DATAGrid WP3.
The purpose of this task is to deliver methodologies and mechanisms for monitoring applications in Grid environments. The possibility to monitor distributed computing components is a crucial issue for enabling high-performance distributed computing. Monitoring data are needed in such environments for several activities such as performance analysis, performance tuning, scheduling, fault detection, QoS application adaptation, etc….
Some key issues of a monitoring service in a Grid environment include:
Scalability is very important because of the high number of resources and users working at the same time in the system: the intrusiveness of the monitoring mechanisms should be limited and efficient coordination mechanisms developed, otherwise resources could be consumed by the monitoring facilities themselves. When many resources are involved in a single application, appropriate mechanisms are needed to discover and understand the characteristics of the available information. Finally, in many cases performance information is useful only for a short lifetime, during which it can be exploited.
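A sketch of a low-intrusiveness monitoring agent is shown below: it samples the machine load at a configurable interval and publishes it with a timestamp, so that consumers can discard stale samples (the short useful lifetime mentioned above). The collector endpoint and message format are assumptions.

# Sketch of a lightweight monitoring agent: sample the load average at a
# fixed interval and push it, with a timestamp, to a collector over UDP.
# The sampling period controls the intrusiveness of the agent.
import json
import os
import socket
import time

COLLECTOR = ("monitor.example.infn.it", 8125)   # hypothetical collector
PERIOD = 60.0                                    # seconds between samples

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
hostname = socket.gethostname()

while True:
    load1, load5, load15 = os.getloadavg()
    sample = {
        "host": hostname,
        "timestamp": time.time(),   # lets consumers drop stale samples
        "load1": load1,
    }
    sock.sendto(json.dumps(sample).encode(), COLLECTOR)
    time.sleep(PERIOD)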
The monitoring services developed within the European Grid project will be adapted to the specific needs of the future experiments. Moreover, existing technologies developed in other Grid-related projects will be examined as well, in order to verify their integrability and usefulness in the project. The exploitation of recent agent technologies will also be explored, since their capability to cope with systems' heterogeneity and to deploy user-customized procedures on remote sites seems well suited to Grid environments.
Task 2.3.1 Analysis requirements and evaluation of current technology
A requirements analysis should be performed as a first task to evaluate the peculiar needs of Grid environments, and in particular the INFN ones. This work should also monitor other Grid-related projects, in order to understand the different views of the problem. Existing tools and technologies for monitoring will be analyzed with respect to the aforementioned requirements.
Task 2.3.2 Monitoring architecture design
An appropriate monitoring architecture will be defined for INFN, taking into account localized monitoring mechanisms able to collect information on different resources. Directory services will be exploited to enable location and/or access to information. The task will strongly rely on the development of similar services by the DataGrid project and in particular of:
Task 2.3.3 Development of suitable tools for analysis of monitoring data and for presentation of results.
Starting from the raw data collected by the measurement service, appropriate algorithms, methodologies and mechanisms should be developed to build higher-level information. Effective tools for visualizing these data should also be developed, in order to quickly and effectively perform activities such as performance tuning, fault detection, etc.
Task 2.3.4 Integration with Grid environments
Integration with existing Grid environments will be performed, as well as extensive testing of usability and effectiveness. Information and data collected in this task could affect some implementation choices of the previous tasks.
Table .12 – List of the Deliverables for WP2.3
Deliverable | Description | Delivery Date
D2.3.1 | Requirements analysis report | 6 Months
D2.3.2 | Evaluation report on existing technology | 12 Months
D2.3.3 | Architectural design | 18 Months
D2.3.4 | Detailed design report of core modules | 24 Months
D2.3.5 | Final implementation of the system and integration | 36 Months
Table .13 – List of the Milestones for WP2.3 (for first year)
Milestone | Description | Date
M2.3.1 | Evaluation of suitability for INFN of baseline architecture and technology proposed by DATAGrid | 12 Months
M2.3.2 | Demonstrate prototype tools and infrastructure in an INFN testbed Grid environment | 24 Months
M2.3.3 | Demonstration of monitoring services on a full scale INFN testbed | 36 Months
Materials for WP 2.3
There will be at least 3 PCs connected via a Fast Ethernet switch for each site belonging to this WP. Since this equipment will be used for all the middleware WPs (WP1, WP2.1, WP2.2, WP2.3), it is included in the testbed WP.
Consumables for WP 2.3
No specific SW license is envisaged for WP 2.3 in year 2001.
Table .14 – Personnel Contribution to the WP2.3 Tasks (FTE)
Sede |
Cognome |
Nome |
DataGrid |
INFN-Grid |
Esperimento |
BOLOGNA |
Galli |
Domenico |
0.3 |
0.3 |
LHCb |
Marconi |
Umberto |
0.3 |
0.3 |
LHCb |
|
CATANIA |
La Corte |
Aurelio |
0.4 |
0.4 |
IT |
|
Tomarchio |
Orazio |
0.4 |
0.4 |
IT |
Not enough effort is available today inside INFN for all the work related to this task but actions have been taken to find the missing manpower.
Commodity components like PCs and switch-based LANs are mature enough to form inexpensive and powerful computing fabrics, and they are almost ready to be used in real production environments. Nevertheless, basic questions are still open concerning how to design a fabric of thousands of nodes that balances computing power and efficient storage access, or how to control and monitor the basic components of the system. Moreover, there is now the need to integrate the fabric control and monitoring systems into a Grid environment. This task (WP2, Task 4) of the INFN-Grid project addresses problems like these, adding a technology-tracking activity to follow and test, with real use cases, the evolution of the basic constituents of a fabric: microprocessor architectures, motherboards and I/O busses, LANs and SANs (System Area Networks), storage systems, etc.
The Computing Fabric Task has therefore been broken down into two subtasks:
Crucial fabric design aspects include:
Effective local site management and problem tracing should include items like:
Grid integration provides a means to publish all the information that the local management, monitoring and problem-tracing systems can collect.
INFN-Grid Task 4 will develop a software toolkit and a design guide to produce, in a very effective way, efficient computing fabrics. The fabric management functions and the integration into a Grid environment will be developed in the framework of DataGrid WP4, which is strongly related to this INFN-Grid task.
Our work plan can be broken down into a few subtasks.
Task 2.4.1: Requirements gathering from fabrics users
Task 2.4.2: Survey of existing fabric/cluster architectures and setup of small scale demonstrators to test different interconnection networks, communication technologies for high speed network fabrics, networked file systems
Task 2.4.3: Set up of a cluster of more than 100 nodes using an architecture as selected by the previous point with configuration and installation management support as provided by DataGrid.
Task 2.4.4: INFN toolkit to design and realize local Fabrics.
Task 2.4.5: Installation of the INFN fabrics.
Task 2.4.6: Integration of the full DataGrid fabric management software in all the INFN fabrics and in the INFN-Grid. Demonstration of the fabric command and monitoring capability at the Grid level.
Task 2.4.7: Integration with Tier 0.
Tasks 2.4.1 and 2.4.2 should start within the year 2000 in order to define the proper farm architecture before 2001. We plan to have the following small-scale demonstrators:
Table .15 – List of the Deliverables for WP2.4
Deliverable | Description | Delivery Date
D2.4.1 | Requirements gathering from fabric users | 3 Months
D2.4.2 | Survey of existing fabric/cluster architectures, architectural issues to design a fabric of commodity PCs | 6 + 6 Months
D2.4.3 | INFN toolkit to design and realize local fabrics | 12 Months
D2.4.4 | Integration of the DataGrid fabric management software tools | 24 Months
D2.4.5 | Integration of the Fabric Management into the Grid environment | 36 Months
Table .16 – List of the Milestones for WP2.4
Milestone | Description | Leader Site | Start Date | End Date
M2.4.1 | NFS connection topology: disk server based | LEGNARO | 9/2000 | 3/2001
M2.4.2 | NFS connection topology: distributed disks | CNAF | 9/2000 | 3/2001
M2.4.3 | Microprocessor Technology: dual slim processors | CNAF | 9/2000 | 3/2001
M2.4.4 | Storage Systems: Fibre Channel - SCSI Ultra 160/320 | PADOVA | 9/2000 | 3/2001
M2.4.5 | Storage Systems: Serial ATA | PADOVA | 5/2001 | 7/2001
M2.4.6 | Microprocessor Technology: IA64 | PADOVA | 11/2000 | 3/2001
M2.4.7 | Interconnection Networks: Myrinet based | LECCE | 9/2000 | 3/2001
M2.4.8 | Interconnection Networks: Gigabit Ethernet and efficient communication protocols | GENOVA | 9/2000 | 5/2001
M2.4.8 | Setup of a cluster of more than 100 nodes using an architecture as selected by the previous points with configuration and installation management support as provided by DataGrid. See also CMS Regional Center Prototype | LEGNARO | 1/2001 | 12/2001
M2.4.9 | Interconnection Networks: Infiniband | LEGNARO | 5/2001 | 12/2001
M2.4.10 | Installation of the INFN fabrics (according to the selected architecture) | All | 24 Months (2003) |
M2.4.11 | Test of integration of the INFN fabrics with Tier 0 | All | 30 Months (2003) |
Table .17 – Personnel Contribution to the WP 2.4 Task (%FTE)
Sede |
Cognome |
Nome |
DataGrid |
INFN-Grid |
Esperimento |
BOLOGNA |
Mazzanti |
Paolo |
|
0.2 |
CMS |
|
Siroli |
GianPiero |
|
0.3 |
CMS |
CATANIA |
Cangiano |
Ernesto |
|
0.25 |
IT |
|
Rocca |
Carlo |
|
0.25 |
IT |
|
ZZZ |
Tecnologo |
|
0.25 |
IT |
CNAF |
Chierici |
Andrea |
|
0.3 |
IT |
Dell’Agnello |
Luca |
0,3 |
0,5 |
IT |
|
Giacomini |
Francesco |
0,2 |
|||
|
Matteuzzi |
Pietro |
|
0.3 |
IT |
Vistoli |
Cristina |
0,1 |
IT |
||
|
Zani |
Stefano |
0.3 |
0.3 |
IT |
GENOVA |
Chiola |
Giovanni |
0.5 |
0.5 |
IT Università |
|
Ciaccio |
Giuseppe |
0.5 |
0.5 |
IT Università |
LECCE |
Aloisio |
Giovanni |
|
0.2 |
IT Università |
|
Cafaro |
Massimo |
|
0.2 |
IT Università |
|
Campeggio |
Salvatore |
|
0.1 |
IT Università |
|
Depaolis |
Lucio |
|
0.2 |
IT Università |
|
Fasanelli |
Enrico M.V. |
|
0.2 |
IT |
|
ZZZ |
Dottorato |
|
0.3 |
ISUFI |
LNL |
Biasotto |
Massimo |
0.5 |
0.5 |
IT |
|
Gulmini |
Michele |
0.5 |
0.5 |
IT |
|
Maron |
Gaetano |
0.5 |
0.5 |
IT |
PADOVA |
Balsamo |
Simonetta |
0.4 |
0.4 |
IT Università |
|
Bellato |
Marco |
|
0.25 |
CMS |
|
Costa |
Fulvia |
|
0.3 |
IT |
|
Ferrari |
Roberto |
|
0.3 |
IT |
|
Michelotto |
Michele |
0.3 |
0.3 |
IT |
|
Saccarola |
Ivo |
|
0.3 |
IT |
|
Ventura |
Sandro |
|
0.25 |
CMS |
ROMA |
Anzellotti |
Daniela |
|
0.2 |
IT |
|
Battista |
Claudia |
0.4 |
0.4 |
IT |
|
De Rossi |
Marco |
|
0.1 |
IT |
|
Falciano |
Speranza |
|
0.2 |
ATLAS |
|
Marzano |
Francesco |
|
0.15 |
ATLAS |
|
Spanu |
Alessandro |
|
0.2 |
IT |
|
Valente |
Enzo |
0.3 |
0.2 |
CMS |
TORINO |
Forte |
Antonio |
|
0.2 |
IT |
Table .18 – Table of Materials for WP2.4 in K€
Site |
Description |
2000 |
2001 |
2002 |
2003 |
Total |
Controllers RAID |
5 |
|||||
LNL |
4 PC IBA |
25 |
||||
1 switch IBA |
10 |
|||||
adapter IBA to FC/SCSI |
5 |
|||||
PD |
4 server |
15 |
||||
8 porte FC Switch |
10 |
|||||
4 FC Adapters |
7.5 |
|||||
1 Raid Array SCSI |
7.5 |
|||||
1 Raid Array FC |
7.5 |
|||||
4 dischi SCSI |
4 |
|||||
4 Controller Serial ATA |
2.5 |
|||||
4 pc IA64 Architecture |
20 |
|||||
CNAF |
10 dual slim processor |
45 |
||||
1 Fast Ethernet Switch |
5 |
|||||
10 SCSI Disk 35 GB |
10 |
|||||
LE |
10 Dual Pentium III-800 MHz, 512 MB RAM |
25 |
||||
1 Switch Myrinet SAN 16 Porte 2.5 Gbps |
4 |
|||||
10 Myrinet SAN/PCI NIC 2.5 Gbps |
17.5 |
|||||
10 Disks SCSI 72 GB |
17.5 |
|||||
10 SAN cable 10 Feet |
2 |
|||||
1 Rack |
0.5 |
|||||
1 Disk Server + Disks |
5 |
|||||
10 Dual IA-64-800 MHz, 512 MB RAM |
50 |
|||||
1 Rack |
0.5 |
|||||
1 Disk Server + Disks |
7.5 |
|||||
GE |
1 Switch Gigabit Ethernet a 12 porte 1000 Base T |
15 |
||||
8 PC with EIDE disks |
12 |
|||||
8 Gigabit Ethernet 1000 Base T NICs |
8 |
|||||
Totals |
223 |
120.5 |
Table .19 – Table of Consumables for WP2.4 in K€
Site |
Description |
2000 |
2001 |
2002 |
2003 |
Total |
LNL |
1 |
2.5 |
||||
PD |
0.5 |
2 |
||||
CNAF |
0.5 |
1 |
||||
LE |
1 |
2.5 |
||||
GE |
2 |
4 |
||||
Total |
5 |
12 |
This work package deals with the coordination between the Applications' needs and the middleware, together with the implications of the DATAGrid project use cases.
As already seen in the previous chapters (2.8 and 2.9), the LHC experiments together with APE and VIRGO can be considered independent guinea-pig users of the Grid.
Experiment/Project specific applications will be used on the Grid to validate the tools and to test the infrastructure and each experiment’s test can be considered as a separate task.
Task 3.1: ALICE
The Italian collaboration activities are based on the milestones already established for the entire collaboration and on the specific responsibilities of the Italian groups, described in par. 2.8.1.
Task 3.2: ATLAS
ATLAS activities and milestones have already been described in par. 2.8.2.
Task 3.3: CMS
CMS activities and milestones have already been described in par. 2.8.3.
Task 3.4: LHCb
LHCb activities and milestones have already been described in par. 2.8.4.
Task 3.5: APE
APE activities and milestones have already been described in par. 2.9.1.
Task 3.6: VIRGO
VIRGO activities and milestones have already been described in par. 2.9.2.
For more details see also Appendixes in chapter 8.
Table .20 – List of the Deliverables for WP3
Deliverable | Description | Delivery Date
D3.1 | Use cases programs. Report on the interfacing activity of use case software to minimal Grid services in INFN. | 12 Months
D3.2 | Report on the results of Run #1 and requirements for the other WP’s. | 24 Months
D3.3 | Report on the results of Run #2. Final project report | 36 Months
Table .21 – List of the Milestones for WP3
Milestone | Description | Date
M3.1 | Development of use cases programs. Interface with existing Grid services in INFN | 12 Months
M3.2 | Run #1 executed (distributed analysis) and corresponding feed-back to the other WP’s | 24 Months
M3.3 | Run #2 executed including additional Grid functionality and extended to a large INFN user community | 36 Months
Table .22 – Personnel Contribution to the WP3 Tasks (%FTE)
Sede |
Cognome |
Nome |
DataGrid |
INFN-Grid |
Esperimento |
BARI |
Cea |
Paolo |
|
0.2 |
APE |
|
Cosmai |
Leonardo |
|
0.2 |
APE |
|
D'Amato |
Maria |
|
0.3 |
CMS |
|
Fini |
Rosanna |
0.3 |
0.3 |
ALICE |
|
Silvestris |
Lucia |
0.3 |
0.4 |
CMS |
|
ZZZ |
Art. 15 |
0.5 |
0.5 |
IT |
BOLOGNA |
Anselmo |
F. |
|
0.25 |
ALICE |
|
Bellagamba |
L. |
|
0.25 |
ALICE |
|
Capiluppi |
Paolo |
|
0.4 |
CMS |
|
Cara Romeo |
G. |
|
0.2 |
ALICE |
|
Fanfani |
Alessandra |
|
0.5 |
CMS |
|
Grandi |
Claudio |
0.3 |
0.4 |
CMS |
|
Luvisetto |
Marisa |
|
0.3 |
ALICE |
|
Mazzanti |
Paolo |
|
0.3 |
CMS |
|
Rovelli |
Tiziano |
|
0.3 |
CMS |
|
Semprini Cesari |
Nicola |
|
0.3 |
LHCb |
|
Vagnoni |
Vincenzo |
|
0.2 |
LHCb |
|
Vecchi |
Stefania |
|
0.4 |
LHCb |
CAGLIARI |
DeFalco |
Alessandro |
|
0.3 |
ALICE |
|
Masoni |
Alberto |
|
0.2 |
ALICE |
|
Tocco |
Luisanna |
|
0.3 |
ALICE |
|
Usai |
Gianluca |
|
0.2 |
ALICE |
CATANIA |
Barbera |
Roberto |
|
0.4 |
ALICE |
|
Costa |
Salvatore |
|
0.3 |
CMS |
|
Palmeri |
Armando |
|
0.5 |
ALICE |
|
Tricomi |
Alessia |
0.3 |
0.4 |
CMS |
|
ZZZ |
|
0.25 |
ALICE |
|
COSENZA |
Capua |
Marcella |
|
0.2 |
ATLAS |
|
LaRotonda |
Laura |
|
0.2 |
ATLAS |
|
Schioppa |
Marco |
|
0.2 |
ATLAS |
FIRENZE |
Bellucci |
Leonardo |
|
1 |
CMS |
|
Cuoco |
Elena |
|
0.3 |
VIRGO |
|
Fabroni |
Leonardo |
|
0.2 |
VIRGO |
|
Graziani |
Giacomo |
|
0.2 |
LHCb |
|
Lenzi |
Michela |
|
0.5 |
CMS |
|
Vetrano |
Flavio |
|
0.1 |
VIRGO |
GENOVA |
Becchi |
Carlo Maria |
|
0.1 |
APE |
|
Dameri |
Mauro |
|
0.4 |
ATLAS |
|
Osculati |
Bianca |
|
0.3 |
ATLAS |
LECCE |
Cataldi |
Gabriella |
|
0.2 |
ATLAS |
|
Gorini |
Edoardo |
|
0.2 |
ATLAS |
|
Martello |
Daniele |
|
0.2 |
ARGO |
|
Primavera |
Margherita |
|
0.2 |
ATLAS |
|
Surdo |
Antonio |
|
0.2 |
ARGO |
LNL |
Vannucci |
Luigi |
|
0.3 |
ALICE |
MILANO |
Destri |
Claudio |
|
0.1 |
APE |
Frezzotti |
Roberto |
|
0.2 |
APE |
|
|
Perini |
Laura |
0.3 |
0.3 |
ATLAS |
|
Ragusa |
Francesco |
|
0.2 |
ATLAS |
|
Resconi |
Silvia |
0.3 |
0.3 |
ATLAS |
NAPOLI |
Carlino |
GianPaolo |
|
0.4 |
ATLAS |
|
Catalanotti |
Sergio |
|
0.3 |
ARGO |
|
Della Volpe |
Domenico |
|
0.4 |
ATLAS |
|
Doria |
Alessandra |
|
0.4 |
ATLAS |
|
Merola |
Leonardo |
|
0.3 |
ATLAS |
PADOVA |
Coppola |
Nicola |
1 |
CMS |
|
Gasparini |
Ugo |
0.6 |
0.6 |
CMS |
|
|
Lacaprara |
Stefano |
|
0.5 |
CMS |
|
Lippi |
Ivano |
0.6 |
0.6 |
CMS |
|
Morando |
Maurizio |
|
0.3 |
ALICE |
|
Ronchese |
Paolo |
|
0.5 |
CMS |
Turrisi |
Rosario |
0.4 |
ALICE |
||
|
Vanini |
Sara |
|
0.7 |
CMS |
PARMA |
Alfieri |
Roberto |
|
0.2 |
APE |
|
Onofri |
Enrico |
|
0.2 |
APE |
PAVIA |
Conta |
Claudio |
|
0.2 |
ATLAS |
De Vecchi |
Carlo |
0.25 |
ATLAS |
||
|
Polesello |
Giacomo |
|
0.2 |
ATLAS |
|
Rimoldi |
Adele |
|
0.2 |
ATLAS |
|
Vercesi |
Valerio |
|
0.2 |
ATLAS |
PERUGIA |
Biasini |
Maurizio |
|
0.2 |
CMS |
|
Cattuto |
Ciro |
|
0.5 |
VIRGO |
|
Gammaitoni |
Luca |
|
0.2 |
VIRGO |
|
Lariccia |
Paolo |
|
0.4 |
CMS |
|
Punturo |
Michele |
|
0.3 |
VIRGO |
|
Santinelli |
Roberto |
|
0.5 |
CMS |
|
Servoli |
Leonello |
|
0.3 |
CMS |
PISA |
Bagliesi |
Giuseppe |
0.3 |
0.3 |
CMS |
Cella |
Giancarlo |
0.1 |
VIRGO |
||
|
Ciulli |
Vitaliano |
0.3 |
0.6 |
CMS |
Costanzo |
Davide |
0.5 |
ATLAS |
||
Del Prete |
Tarcisio |
0.2 |
ATLAS |
||
Ferrante |
Isidoro |
0.1 |
VIRGO |
||
|
Giassi |
Alessandro |
0.3 |
0.3 |
CMS |
|
Palla |
Fabrizio |
0.3 |
0.5 |
CMS |
|
Sciaba' |
Andrea |
0.3 |
0.2 |
CMS |
Vicerè |
Andrea |
0.1 |
VIRGO |
||
|
Xie |
Zhen |
0.4 |
0.5 |
CMS |
ROMA |
Barone |
Luciano M. |
|
0.3 |
CMS |
|
Diemoz |
Marcella |
|
0.3 |
CMS |
|
Falciano |
Speranza |
|
0.2 |
ATLAS |
|
Frasca |
Sergio |
|
0.3 |
VIRGO |
|
Longo |
Egidio |
|
0.3 |
CMS |
|
Luminari |
Lamberto |
0.3 |
0.3 |
ATLAS |
|
Nisati |
Aleandro |
|
0.25 |
ATLAS |
|
Palomba |
Cristiano |
|
0.2 |
VIRGO |
|
Ricci |
Fulvio |
|
0.3 |
VIRGO |
|
Santacesaria |
Roberta |
|
0.2 |
LHCb |
ROMA2 |
Camarri |
Paolo |
|
0.3 |
ATLAS |
|
Di Ciaccio |
Anna |
|
0.3 |
ATLAS |
|
Guagnelli |
Marco |
|
0.3 |
APE |
ROMA3 |
Bussino |
Severino |
|
0.2 |
ARGO |
|
Farilla |
Ada |
|
0.2 |
ATLAS |
|
Stanescu |
Cristian |
|
0.4 |
ARGO-ATLAS |
SALERNO |
Cifarelli |
Luisa |
|
0.1 |
ALICE |
|
D'Apolito |
Carmen |
|
0.2 |
ALICE |
|
Fusco Girard |
Mario |
|
0.2 |
ALICE |
|
Grella |
Giuseppe |
|
0.2 |
ALICE |
|
Guida |
Michele |
|
0.2 |
ALICE |
|
Quartieri |
Joseph |
|
0.2 |
ALICE |
|
Seganti |
Alessio |
|
0.25 |
ALICE |
|
Vicinanza |
Domenco |
|
0.25 |
ALICE |
|
Virgili |
Tiziano |
|
0.2 |
ALICE |
TORINO |
Amapane |
Nicola |
|
0.3 |
CMS |
|
Gallio |
Mauro |
|
0.3 |
ALICE |
|
Ramello |
Luciano |
|
0.3 |
ALICE |
|
Sitta |
Mario |
|
0.3 |
ALICE |
|
Solano |
Ada |
|
0.3 |
CMS |
|
Vitelli |
Annalina |
|
0.3 |
CMS |
TRIESTE |
Bradamante |
Franco |
|
0.1 |
COMPASS |
|
Fragiacomo |
Enrico |
|
0.5 |
ALICE |
|
Gobbo |
Benigno |
|
0.3 |
COMPASS |
|
Lamanna |
Massimo |
|
0.4 |
COMPASS |
|
Martin |
Anna |
|
0.2 |
COMPASS |
|
Piano |
Stefano |
|
0.3 |
ALICE |
|
Rui |
Rinaldo |
|
0.3 |
ALICE |
UDINE |
Cabras |
Giuseppe |
|
0.15 |
ATLAS |
|
DeAngelis |
Alessandro |
|
0.1 |
ATLAS |
The INFN-Grid testbed will be based on the computing infrastructures set up by the INFN community in the years 2001-2003 for the needs of the experiments which will exploit the Grid. This chapter describes in detail the testbed infrastructures corresponding to the Tier1-Tier3 LHC experiment Regional Center prototypes. It is agreed, however, that other experiments like APE and VIRGO, but also ARGO and COMPASS, will contribute to the INFN testbed and will take part in the tests of the Grid middleware. The INFN-Grid testbed will be connected to the European testbed and, through it, to the other national Grids participating in the European project. The project activities will provide a layout of the necessary size to test in real life the middleware, the application development, the development of the LHC experiment computing models and the necessary support to LHC experiment computing activities during this prototyping phase. The national testbed deployment will be synchronised with the DataGrid project activities.
The baselines can be summarized in the following issues:
Testbed deployment will be strongly driven by experiment requirements and their resource capacity. A brief description of some experiments follows.
ALICE Italy is planning to have one Tier-1 distributed between Torino, Bari and Bologna, with a leading role for Torino, and four Tier-2, see appendix 8.1. ALICE Italy participates in the Grid project in a co-ordinated effort with the ALICE Collaboration, and its contribution within the INFN project has the goal of testing Regional Center prototypes with the Grid technology at a realistic scale and with real tasks. The test of PROOF (Parallel Root Facility) for remote, distributed analysis is one of the most important aims.
ATLAS Italy is planning for one Tier-1, possibly a Tier-2, plus several Tier-3. The Tier1-Tier2 prototyping phase will be based on two sites: Milano and Roma, with leading role for Roma, see appendix 8.2.
At the end of the prototyping phase the decision will be taken whether Rome and Milan together will constitute a distributed Tier1 or whether a more traditional model of one Tier1 and one Tier2 will be adopted. The study and implementation of an efficient model for Tier3 use and connectivity is an important goal of the prototyping phase. ATLAS Italy is participating in both the EU project (as part of the global ATLAS effort) and the INFN-Grid project. Its activity will be focused on real-life prototypes, providing feedback and ensuring that the development yields useful results.
CMS Italy is planning to have one distributed Tier-1 RC with a leading role for Legnaro, and a few Tier-2 and Tier-3 RCs. Each national site will have a Tier-n dimensioned according to its commitment and manpower, see appendix 8.3. The Tier-n architecture and topology will be decided after prototype studies and tests with a size of at least 10% of the final implementation. CMS Italy will focus its computing effort on real simulations and analysis. Initial studies will involve the evaluation of the High Level Trigger algorithms; further studies will involve the simulation and analysis of particular physics channels. Some of these activities will also be used for Grid testing.
LHCb Italy is planning to have one Tier-1 RC and several Tier-3 RCs. The Tier-1 will be based at the site where it will be most convenient to install the computing facility (likely chosen among CILEA/Milan, CINECA/Bologna and CASPUR/Rome), see appendix 8.5. LHCb Italy is interested in participating in both the DATAGrid and INFN-Grid projects. LHCb Italy will focus its computing effort on real simulations and analysis, starting with detector and L1-trigger optimization (2000-2001), then with high level trigger algorithm studies (2002-2003) and with the production of large samples of simulated events for physics studies (2004-2005). This on-going physics production work will be used, as far as is practicable, for testing the development of the Grid computing infrastructure.
The envisaged hierarchy of resources may be summarised in terms of tiers of RC with five decreasing levels of complexity and capability. A possible scheme for LHC is:
A joint activity among all the WPs and in particular between ‘Computing Fabric’, ‘Applications’ and ‘Testbed’ will define Tier-x prototypes to be experimented as building blocks of the Grid. Relationships with international RC activities are encouraged, in particular with those within DATAGrid.
Since INFN is spread over many Italian Universities and therefore has a "naturally" distributed architecture of resources (both human and computational), it matches the "Grid" idea well.
To better exploit this well-established synergy of efforts, the INFN-Grid project will prototype Tier-1 functions and duties distributed over a limited number of sites. At least in principle, this approach seems a quite natural Grid way of working.
INFN is aware that the success of such a trial is as demanding as the "full" Grid project itself; the final choices (particularly for CMS and ALICE) will be strongly related to the results (and successes) of the present INFN-Grid Project (and of the world-wide Grid effort). If WAN and LAN connections, via the Grid middleware, transparently expose the computing resources to the final user, then a distributed Tier-1 Centre will be affordable.
The MONARC Phase-3 activities are a coherent part of the INFN-Grid project. At the international level the MONARC project, in its current Phase 3 extending until next spring, is working on the definition of a plan complementary to and linked with the US, EU and Japanese Grid projects, contributing to keeping active contacts between them. The confluence of MONARC into Grid at the international level is also a possibility currently being studied.
Simulation is a field where the contribution of MONARC is going to be relevant: the MONARC simulation tool, and the expertise gained in developing and using it, will play a positive role in the design and optimization of the computing infrastructure, both local and distributed.
The configuration of prototypes built with about 100 CPUs (a typical Tier2 prototype in this phase) is a topic currently addressed in MONARC which is also of interest for building the INFN-Grid.
In the previous chapters 2.8 and 2.9 the capacity targets for the LHC experiments (CMS, ATLAS, ALICE, LHCb) and for other experiments and projects (APE, VIRGO, ARGO) are described. The ALICE capacity evaluations take into account that the computing will be done in France, Italy and Germany, whereas those of CMS and ATLAS take into account that the USA and the UK (plus Japan in the ATLAS case) will also contribute to the computing.
In this chapter we describe how the LHC experiment target resources are distributed hierarchically among the different sites during this prototyping phase.
ALICE
ALICE Italy resources are allocated according to the plan to have one Tier-1 distributed between Torino, Bari and Bologna, with leading role for Torino, and four Tier-2, see appendix 8.1.
Table.23 - ALICE Preliminary distribution of resources
Anno 2001 ALICE-Grid | FTE CALC | FTE Grid | CPU (KSI95) | DISK (TB) | TAPE UNIT | Inv. CPU+disk+tape | Inv. Switch | Inv. CPU+disk+tape+switch
BA | 2 | 1 | 1 | 1.2 | 1 | 162 | 6 | 168
BO | 2.5 | 1 | 1.4 | 1.2 | 1 | 191 | 12 | 203
CA | 4 | 1.4 | 0.7 | 0.6 | 1 | 110 | 6 | 116
CT | 4.8 | 1.0 | 1 | 0.6 | 1 | 132 | 6 | 138
LNL | 0.3 | 0 | | | | | |
PD | 1.5 | 0.3 | 0.4 | 0.6 | 0 | 59 | 6 | 65
RM | 0 | | | | | | |
SA | 2 | 1.8 | 0.7 | 0.6 | 1 | 110 | 6 | 116
TO | 6 | 2.1 | 1.4 | 1.2 | 1 | 191 | 12 | 203
TS | 2.2 | 1.1 | 0.2 | 0.4 | 0 | 43 | 0 | 43
TOTALE | 25 | 10 | 6.8 | 6.4 | | 999 | 54 | 1053
These costs are not additional with respect to the ones evaluated in par. 2.8.1.
ATLAS
In the following table a preliminary sharing of the allocation between the two Tier1 sites and some Tier3 candidates is sketched. The fraction of the yearly total allocated to the sum of all Tier3 sites will be between 10 and 20%: the value and the sites will be proposed each year by ATLAS Italy on the basis of the needs and possibilities of both production and experimentation.
The Tier3 proposal reported here for 2001 and the following years is still preliminary, as are all aspects of the proposed sharing for 2002. For 2003 the sharing will be decided after the first 2 years of experimentation, and the possibility is contemplated of a very asymmetric increment (almost all on the leading site).
Table .24 - ATLAS Preliminary distribution of resources
Year / Site | 2001 CPU (SI95) | 2001 Disk (TB) | 2002 CPU (SI95) | 2002 Disk (TB) | 2003 CPU (SI95) | 2003 Disk (TB) | Total CPU (SI95) | Total Disk (TB)
Rome 1 | 1,800 | 1.9* | 1,200 | 1.6 | | | |
Milano | 1,800 | 1.9 | 1,200 | 1.6 | - | - | |
Napoli | 400 | 0.4 | 0 | 0.4 | - | - | |
Pavia | 400 | 0.4 | 0 | 0.3 | - | - | |
Roma 2 | 400 | 0.4 | 0 | 0.3 | - | - | |
? (Tier3) | 200 ? | | 200 | 0.4 | - | - | |
? (Tier3) | | | 400 | 0.4 | - | - | |
Total | 5,000 | 5 | 3,000 | 5 | 12,000 | 10 | 20,000 | 20
* Out of these quantities, 1,000 SI95 and 1 TB are requested to be anticipated to the year 2000.
The cost of the hardware required for one Tier3 in 2001 is estimated at ~50 ML; these costs are not additional with respect to the ones evaluated in 2.8.2.
CMS
CMS Italy is planning to distribute prototyping and production activities over the INFN CMS Sections to exploit all the possible human resources and knowledge. In order to better test capabilities and efficiency, besides flexibility, the resources are spread in a hierarchical way, taking into account the CMS (and MONARC) architectural Computing Model and the local commitments to specific functional tests.
Table .25 - Preliminary proposed distribution of resources for CMS Italy.
Year / Site | 2001 CPU (SI95) | 2001 Disk (TB) | 2002 CPU (SI95) | 2002 Disk (TB) | 2003 CPU (SI95) | 2003 Disk (TB) | Total CPU (SI95) | Total Disk (TB)
LNL | 3,100 | 5 | 3,100 | 16 | 6,800 | 19 | 13,000 | 40
Bari | 700 | 1 | 700 | 2 | | | |
Bologna | 700 | 1 | 700 | 2 | - | | |
Padova | 700 | 1 | 700 | 2 | - | | |
Pisa | 700 | 1 | 700 | 2 | - | | |
Roma1 | 700 | 1 | 700 | 2 | - | | |
Catania | 350 | 0.5 | 350 | 0.5 | - | | |
Firenze | 350 | 0.5 | 350 | 0.5 | - | | |
Perugia | 350 | 0.5 | 350 | 0.5 | - | | |
Torino | 350 | 0.5 | 350 | 0.5 | - | | |
Total | 8,000 | 12 | 8,000 | 28 | 16,000 | 40 | 32,000 | 80
Total integral | 8,000 | 12 | 16,000 | 40 | 32,000 | 80 | 32,000 | 80
Year 2003 is today "unassigned"; it will be settled according to obtained results.
The evaluation of required resources for year 2001 is, of course, better defined.
These costs are not additional with respect to the ones evaluated in 2.8.3.
Table .26 - CMS Italy breakdown of resources and funding for year 2001 (Klit)
Site | CPU (SI95) | CPU cost | Disk # | Disk (TB) | Disk cost | Tape Library # | Tape Library cost | LAN Unit | LAN cost | Tapes Media (TB) | Tapes Media cost | TOTAL
Legnaro | 3,100 | 223,000 | 89 | 5.0 | 390,000 | 1 | 100,000 | 1 | 102,000 | 20 | 54,000 | 869,000
Bari | 700 | 50,000 | 20 | 1.0 | 78,000 | 0 | 0 | 1 | 21,000 | 5 | 14,000 | 163,000
Bologna | 700 | 50,000 | 20 | 1.0 | 78,000 | 0 | 0 | 1 | 21,000 | 5 | 14,000 | 163,000
Padova | 700 | 50,000 | 20 | 1.0 | 78,000 | 0 | 0 | 1 | 21,000 | 5 | 14,000 | 163,000
Pisa | 700 | 50,000 | 20 | 1.0 | 78,000 | 0 | 0 | 1 | 21,000 | 5 | 14,000 | 163,000
Roma1 | 700 | 50,000 | 20 | 1.0 | 78,000 | 0 | 0 | 1 | 21,000 | 5 | 14,000 | 163,000
Catania | 350 | 25,000 | 10 | 0.5 | 29,000 | 0 | 0 | 0 | 0 | 2 | 5,000 | 59,000
Firenze | 350 | 25,000 | 10 | 0.5 | 29,000 | 0 | 0 | 0 | 0 | 2 | 5,000 | 59,000
Perugia | 350 | 25,000 | 10 | 0.5 | 29,000 | 0 | 0 | 0 | 0 | 2 | 5,000 | 59,000
Torino | 350 | 25,000 | 10 | 0.5 | 29,000 | 0 | 0 | 0 | 0 | 2 | 5,000 | 59,000
Total | 8,000 | 573,000 | 229 | 12.0 | 896,000 | 1 | 100,000 | 6 | 207,000 | 53 | 144,000 | 1,920,000
LHCb
LHCb is planning to concentrate the final Tier-1 regional centre and the Tier-1 prototype in only one site, hosted in a "consorzio di calcolo", to be chosen on the basis of housing and manpower costs.
LHCb Italy believes this choice will optimize the exploitation of its human resources, by avoiding the multiplication of installations and the organization and synchronization overhead between sites, and because the remote control of concentrated computing resources is more convenient – in terms of bandwidth – than the geographical distribution of data and CPUs.
The LHCb Italian groups believe that the transparent remote control of concentrated resources by the institutes will be made possible by the Grid middleware.
Table .27- LHCb Preliminary distribution of resources
Year / Site | 2001 CPU (SI95) | 2001 Disk (TB) | 2002 CPU (SI95) | 2002 Disk (TB) | 2003 CPU (SI95) | 2003 Disk (TB) | Total CPU (SI95) | Total Disk (TB)
Unique Tier-1 | 2,300 | 0.25 | 4,500 | 0.09 | 13,500 | 0.38 | 20,300 | 0.72
Total | 2,300 | 0.25 | 4,500 | 0.09 | 13,500 | 0.38 | 20,300 | 0.72
For the testbed contribution of APE and Virgo and other experiments see appendixes 8.
Data servers and computing farms can be represented by a hierarchical client/server model. The logical layout of the multi client-server architecture for one LHC experiment is represented in the following figure.
In the configuration example below there are data servers and client computing farms (client/server model). Several client machines connect to data servers through LAN and WAN links (this will allow a direct comparison between LAN and WAN behaviour and an evaluation of the network impact on application behaviour and efficiency). Users access the Grid through desktop or client machines. The INFN WAN Condor pool will be connected to the Grid system. Data are distributed across all the data servers; users run jobs from their desktops, and the resource managers must locate CPU clients and data servers in order to process the data in the most efficient way.
Testbed deployment will go through different tasks, coordinated by the test activity of the Work Packages and by the time scale of the experiment applications. From the beginning the national testbed will be connected to the European testbed, as planned in the DATAGrid project.
The INFN testbed will also take into account the computing needs of the different INFN experiments. The activities are divided in the following tasks:
At the moment the global Grid production model is conceived as a set of loosely coupled parallel Grids. An intersection of all the experiment layouts will be devoted to testing. Each software release will move from the 'test Grid' to the different prototyping Grids of the experiments, in order to test real experiment applications which have been progressively enabled to run with the Grid tools. As it progressively grows to match the demonstrations' needs, the layout will change and move towards the production configurations of each experiment. The final testbed layout will provide the production computing Grid for all the involved experiments.
The network characteristics of the testbed will be defined in the Network WP, which will identify the network services required by the experiment applications running in a Grid environment.
Figure 2 - Test-Bed Evolution
Initially a network production environment will be used, and appropriate bandwidth for each site will be defined according to Grid activity.
Table .28 – List of the Deliverables for WP4
Deliverable | Description | Delivery Date
D4.1 | Widely deployed testbed of Grid basic services | 6 months
D4.2 | Application 1 (traditional applications) testbed | 12 months
D4.3 | Application 2 (scheduled client/server applications) testbed | 18 months
 | Application n (chaotic client/server applications) testbed | 30 months
D4.4 | Grid fabric for distributed Tier 1 | 24 months
D4.5 | Grid fabric for all Tier hierarchy | 36 months
Table .29 – List of the Milestones for WP4
Milestone | Description | Date
M4.1 | Testbeds topology | 6 Months
M4.2 | Tier1, Tier2&3 computing model experimentation | 12 Months
M4.3 | Grids prototypes | 18 Months
M4.4 | Scalability tests | 24 Months
M4.5 | Validation tests | 36 Months
Table .30 - Personnel Resources of WP4
Sede |
Cognome |
Nome |
DataGrid |
INFN-Grid |
Esperimento |
BARI |
Castellano |
Marcello |
|
0.6 |
ALICE |
|
Di Bari |
Domenico |
|
0.4 |
ALICE |
|
Maggi |
Giorgio |
|
0.2 |
CMS |
|
Magno |
Emanuele |
|
0.2 |
IT |
|
Manzari |
Vito |
|
0.3 |
ALICE |
|
Natali |
Sergio |
|
0.2 |
CMS |
|
Piscitelli |
Giacomo |
|
0.4 |
ALICE |
BOLOGNA |
Bortolotti |
Daniela |
|
0.3 |
IT |
|
Capiluppi |
Paolo |
0.3 |
0.5 |
CMS |
|
Galli |
Domenico |
0.3 |
0.4 |
LHCb |
|
Giacomelli |
Roberto |
|
0.4 |
CMS |
|
Grandi |
Claudio |
|
0.4 |
CMS |
|
Marconi |
Umberto |
0.3 |
0.4 |
LHCb |
|
Siroli |
GianPiero |
|
0.4 |
CMS |
|
Vagnoni |
Vincenzo |
|
0.3 |
LHCb |
CAGLIARI |
Masoni |
Alberto |
0.4 |
0.4 |
ALICE |
Bonivento |
Walter |
0.2 |
LHCb |
||
CATANIA |
Belluomo |
Patrizia |
0.3 |
0.5 |
IT |
|
Cangiano |
Ernesto |
|
0.15 |
IT |
|
Costa |
Salvatore |
|
0.5 |
CMS |
|
Rocca |
Carlo |
|
0.15 |
IT |
|
ZZZ |
Tecnologo |
|
0.25 |
IT |
CNAF |
Ghiselli |
Antonia |
0.3 |
0.3 |
IT |
FERRARA |
Gambetti |
Michele |
|
0.1 |
IT |
|
Gianoli |
Alberto |
|
0.3 |
LHCb |
|
Luppi |
Eleonora |
|
0.1 |
IT |
FIRENZE |
D'Alessandro |
Raffaello |
|
0.2 |
CMS |
|
Dominici |
Piero |
|
0.2 |
VIRGO |
|
Fabroni |
Leonardo |
|
0.1 |
VIRGO |
|
Graziani |
Giacomo |
|
0.2 |
LHCb |
|
Passaleva |
Giovanni |
|
0.2 |
LHCb |
GENOVA |
Brunengo |
Alessandro |
|
0.3 |
IT |
LECCE |
Fasanelli |
Enrico M.V. |
|
0.2 |
IT |
|
Perrino |
Roberto |
|
0.2 |
ATLAS |
LNL |
Berti |
Luciano |
0.3 |
0.3 |
IT |
|
Biasotto |
Massimo |
0.5 |
0.4 |
IT |
|
Toniolo |
Nicola |
0.3 |
0.3 |
IT |
MILANO |
Perini |
Laura |
0.3 |
0.3 |
ATLAS |
|
Resconi |
Silvia |
0.3 |
0.3 |
ATLAS |
Alemi |
Mario |
0.5 |
LHC-b |
||
Paganoni |
Marco |
|
0.2 |
LHCb |
|
NAPOLI |
Barone |
Fabrizio |
0.2 |
VIRGO |
|
Calloni |
Enrico |
0.3 |
VIRGO |
||
De Rosa |
Rosario |
0.5 |
VIRGO |
||
Eleuteri |
Antonio |
0.5 |
VIRGO |
||
Garufi |
Fabio |
0.3 |
VIRGO |
||
Milano |
Leopoldo |
0.2 |
VIRGO |
||
PADOVA |
Michelotto |
Michele |
|
0.3 |
IT |
|
Sgaravatto |
Massimo |
0.3 |
0.3 |
IT |
PAVIA |
Vercesi |
Valerio |
|
0.2 |
ATLAS |
PERUGIA |
Gentile |
Fabrizio |
|
0.1 |
IT |
|
Servoli |
Leonello |
|
0.2 |
CMS |
ROMA |
Barone |
Luciano M. |
|
0.3 |
CMS |
|
De Salvo |
Alessandro |
0.4 |
0.4 |
ATLAS |
|
Luminari |
Lamberto |
0.3 |
0.3 |
ATLAS |
|
Majorana |
Ettore |
|
0.1 |
VIRGO |
|
Marzano |
Francesco |
|
0.1 |
ATLAS |
|
Organtini |
Giovanni |
|
0.3 |
CMS |
|
Palomba |
Cristiano |
|
0.2 |
VIRGO |
|
Pasqualucci |
Enrico |
|
0.1 |
ATLAS |
|
Valente |
Enzo |
|
0.1 |
CMS |
TORINO |
Cerello |
Piergiorgio |
0.3 |
0.3 |
ALICE |
|
Gaido |
Luciano |
|
0.2 |
IT |
|
Masera |
Massimo |
0.3 |
0.3 |
ALICE |
|
Pittau |
Roberto |
|
0.2 |
IT Università |
|
Scomparin |
Enrico |
|
0.3 |
ALICE |
The initial testbed will consist of 3 PCs at each site and a LAN switch connected via the production network. The Globus tests will also use existing computing farms at CNAF, Padova and Pisa.
Table .31 – Table of Materials for WP4 in K€ and ML

Description | 2001 (K€) | 2001 (ML) | 2002 | 2003 | Total
3 PCs × 26 sites (5 ML/PC) | 185 * | 370 | | |
L2 switch × 10 | 250 | 500 | | |
Router interfaces | 60 * | 120 | | |
* It is requested that 90 K€ (180 ML), corresponding to the Quantum Grid of the 12 sites which take part in the Globus evaluation, be brought forward to year 2000 as mentioned at pg. 26, together with the 60 K€ (120 ML) of router interfaces to be assigned to CNAF.
This Work Package focuses on the network, a fundamental component in distributed systems. In conjunction with the efforts carried out in the framework of the EU project, work will be developed in two directions:
The goal of this WP is the definition of guidelines for the design of a suitable network to be adopted in a Grid environment.
The preliminary task consists of the characterization of the application, i.e. the definition of the application and middleware network requirements; input will be provided in collaboration with other work packages (the National Testbeds WP and the Computing Fabric task of WP 2).
Activities will be carried out on an experimental network layout applied to the Grid environment.
Activities are grouped into four areas: LAN technologies, protocols, quality of service and WAN transmission technologies.
For the implementation of local area networks several aspects will be evaluated.
These work items have to be carried out in collaboration with the Computing Fabric WP.
The middleware architecture indicated by WP 2 will be studied to verify the efficiency of the deployment of network resources. Feedback will be provided to WP 2 if middleware performance optimization is required. Data transport protocols will be tested in collaboration with WP 2 and WP 3.
The focus of this task is two-fold: the analysis of the suitability of the existing packet network transport protocols when applied to distributed computing and the identification of new protocols which can be deployed to optimize the network utilization by any system component requiring it. The identification of the networked architecture components to which a performance analysis has to be applied is part of the task.
The first step is the evaluation of the TCP protocol, in order to improve the efficiency of the transmission mechanism and of congestion avoidance. Over long-distance, high-speed connections the retransmission mechanism could be critical, since a large volume of traffic (proportional to the bandwidth-RTT product) is transmitted before a packet loss is notified to the source.
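As a rough illustration of this point, the short Python sketch below (not part of the project software; link speeds and round-trip time are example values) computes the bandwidth-RTT product for a few wide area links, i.e. the minimum TCP window needed to keep the pipe full and the amount of data that can be in flight when a loss occurs.

    # Illustrative estimate of the bandwidth-delay product of a long-distance
    # link and of the TCP window needed to keep it full. The link speeds and
    # the round-trip time are assumed example values, not project figures.
    def bandwidth_delay_product(bandwidth_mbps: float, rtt_ms: float) -> float:
        """Data 'in flight' on the path, in bytes."""
        return (bandwidth_mbps * 1e6 / 8) * (rtt_ms / 1e3)

    if __name__ == "__main__":
        for mbps in (34, 155, 622, 2500):
            bdp = bandwidth_delay_product(mbps, rtt_ms=30)
            print(f"{mbps:>5} Mb/s, RTT 30 ms -> window >= {bdp / 1024:,.0f} KB "
                  f"in flight before a loss can be notified to the source")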
As a second step, new data transport protocols will be analyzed. The identification of the data transport protocols of interest is a work item of this task.
Quality of Service (QoS) support allows the user of a given application to specify the network requirements and allows packets to be treated differently in the network so that the specified service is met. The deployment of QoS offers the possibility of running mission-critical applications on production networks: traffic is divided into classes and traffic isolation is achieved, i.e. interference between different kinds of traffic is avoided.
Application characterization
The application and middleware requirements are identified quantitatively in collaboration with the Applications and Integration WP (WP 5), for example in terms of latency, jitter, network reliability, sensitivity to packet loss, minimum throughput etc. If specific network requirements are identified, work will proceed according to the following tasks.
QoS implementation
Different approaches to Quality of Service in a data network can be adopted, depending on the application requirements.
Two different architectures have been developed so far by the research community: the Integrated Services Architecture (intserv) and the Differentiated Services Architecture (diffserv).
Intserv is particularly suitable for per-flow QoS support, since for each stream an end-to-end signaling protocol notifies the flow and reservation profile to the routers. Intserv provides QoS with fine-granularity, but may imply overhead on network devices when the number of signaling sessions increases. On the other hand, diffserv provides Quality of Service in an aggregated fashion to a set of flows grouped in one class and cannot offer guarantees per flow. Other technical solutions could be evaluated in case of new developments in the QoS research and standardization field.
Technical solutions will be compared in terms of effectiveness, scalability and suitability to the Grid requirements. This evaluation will require the experimentation of network mechanisms like policing, traffic marking, classification, packet scheduling and traffic conditioning in a networked Grid testbed. The feasibility of each solution (e.g. its applicability in a production environment like a National Research Network) will be one of the evaluation parameters.
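As an example of the kind of experimentation foreseen here, the hedged sketch below shows how a test application could mark its own traffic for a diffserv trial by setting the DSCP bits of the IP TOS byte on a socket; the DSCP value and the destination address are illustrative assumptions, and the actual classification, policing and scheduling remain a matter of router configuration.

    # Application-side packet marking for a diffserv test: set the IP TOS byte
    # (the DSCP occupies its upper six bits) on a UDP socket so that routers
    # configured for differentiated services can classify the traffic.
    # The DSCP value and the destination address are illustrative assumptions.
    import socket

    EF_DSCP = 46              # "Expedited Forwarding" per-hop behaviour
    TOS = EF_DSCP << 2        # DSCP sits in the upper 6 bits of the TOS byte

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS)

    # Send a small probe; the QoS treatment happens in the network devices,
    # not in this code.
    sock.sendto(b"qos-probe", ("192.0.2.10", 9000))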
Dynamic network configuration
Dynamic architectures will be considered, such as the automatic adaptation of applications to changing performance metrics and the automatic configuration of network devices to provision QoS for traffic aggregates characterized by dynamic performance requirements.
The first work item will focus on the suitability of the adaptation mechanism when applied in the Grid environment, while the second is a network-oriented analysis of the existing architectures offering support for bandwidth brokerage.
The Internet transport infrastructure is moving towards a model of high-speed routers interconnected by optical core networks.
WDM Baseline testing
Wavelength Division Multiplexing (WDM) allows for greater network capacity and low latency. WDM technology is still evolving rapidly, increasing both the number of channels and the single-channel throughput. The functionality, efficiency and interoperability of opto-electronic network systems are still open issues.
The possibility of testing WDM on WAN in collaboration with the Italian National Research Network will be investigated.
Advanced testing
This task focuses on the evolution of optical networks, in particular in the field of optical bandwidth management, real-time provisioning of optical channels in automatically switched optical networks and Dense Wavelength Division Multiplexing (DWDM) capabilities on IP routers.
The first issue concerns MPLS and DWDM: optical networks must be survivable, flexible, and controllable. Therefore, intelligence must be introduced in the control plane of optical transport systems to make devices more versatile. Multi-Protocol Lambda Switching is an approach to the design of control planes for optical cross-connects (OXCs) based on the MultiProtocol Label Switching (MPLS) traffic engineering control plane model. The main idea is to integrate recent results in control plane technology developed for MPLS traffic engineering into optical transport systems. This approach will assist in optical channel layer bandwidth management, dynamic provisioning of optical channels and network survivability through enhanced protection and restoration capabilities in the optical domain. The Multi-Protocol Lambda Switching approach is particularly advantageous for OXCs in data-centric optical internetworking systems because it simplifies network administration and contributes to the incorporation of DWDM multiplexing capabilities in IP routers. The work item will focus on new optical network services based on the integration of DWDM and MPLS, offering support for bandwidth brokerage.
Another issue is related to QoS: the applicability and the advantages of having QoS support on WDM or DWDM networks will be investigated.
The feasibility of these activities depends on the future evolution of the above-mentioned research fields and on the availability of the needed network infrastructure, both in terms of equipment and of connections.
LAN technologies:
Task 1: evaluation of topologies and data link protocols (in collaboration with task 2.4 of WP 3.2);
Task 2: evaluation of computing cluster transport protocols.
Data Transport Protocols:
Task 3: Data transport
Task 4: Transport protocols
Quality of Service:
Task 5: Application characterization
Task 6: QoS implementation
Transmission technologies
Task 7: Investigation of the feasibility of a WDM-based wide area testbed for baseline analysis of applicability, configuration and deployment issues, implications on network design
Task 8: Study of the latest developments in optical transmission technology (among others: MPLS and QoS support over DWDM/WDM, lambda-based switching and routing) and of their support for bandwidth brokerage.
Table .32 – List of the Deliverables for WP5

Deliverable | Description | Delivery Date
D5.1 | QoS: definition of QoS requirements, performance analysis on a QoS-capable networked Grid layout; Protocols: identification of the network-related issues | Month 12
D5.2 | Protocols: performance analysis; experiments on a WDM layout | Month 18
Table .33 – List of the Milestones for WP5

Milestone | Description | Date
M5.1 | Definition of the network architecture (production-based or VPN); definition of the QoS architecture (if relevant to Grid); identification of the network protocols needed | Month 12
M5.2 | Identification of the issues of the application and middleware protocols; development of the future activities on optical networks | Month 12
Table .34 – Personnel Contribution to the WP5 Tasks (% FTE)

Site | Surname | First Name | DataGrid | INFN-Grid | Experiment
CNAF | Chierici | Andrea | 0.3 | 0.2 | IT
 | Ferrari | Tiziana | 0.3 | 0.5 | IT
 | Ghiselli | Antonia | | 0.2 | IT
 | Vistoli | Cristina | | 0.2 | IT
ROMA | Mirabelli | Giovanni | | 0.2 | ATLAS
TORINO | Gaido | Luciano | | 0.1 | IT
Costs are evaluated using current price forecasts. Some of the following figures are based on the work of the CERN Technology Tracking Group.
The current personnel costs for INFN are shown in the following table. All the figures are full costs, excluding some kinds of overhead: travel, basic equipment (desk, chair, telephone, personal computer, etc.).
Table .1 - Personnel Cost Figures

Position | Level | Annual Cost in € | Overhead (20%) in € | Total in 3 Years (€) | Total without Overhead (€)
Technology Researcher | I | 87,469 | 17,494 | 314,889 | 262,408
 | II | 57,925 | 11,585 | 208,531 | 173,776
 | III | 50,722 | 10,144 | 182,598 | 152,165
Technician | VI | 26,472 | 5,294 | 95,297 | 79,415
Fellow | N.A. | 15,000 | 3,000 | 54,000 | 45,000
Scholar | N.A. | 14,000 | 2,800 | 50,400 | 42,000
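As a quick consistency check, the small sketch below reproduces the derived columns of the table from the annual cost alone (overhead is 20% of the annual cost, totals cover three years); the annual figures are those listed above.

    # Reproduce the derived columns of Table .1 from the annual cost:
    # overhead is 20% of the annual cost and the totals cover three years.
    positions = {
        "Technology Researcher I":   87_469,
        "Technology Researcher II":  57_925,
        "Technology Researcher III": 50_722,
        "Technician VI":             26_472,
        "Fellow":                    15_000,
        "Scholar":                   14_000,
    }

    for name, annual in positions.items():
        overhead = 0.20 * annual
        print(f"{name:28s} annual {annual:>7,.0f}  overhead {overhead:>7,.0f}  "
              f"3y {3 * (annual + overhead):>9,.0f}  "
              f"3y w/o overhead {3 * annual:>9,.0f}")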
The hardware costs are mainly related to three categories: processors (CPU), disk storage, and mass (tape) storage. Nevertheless the network is also a strategic item, both at local and wide area level, although LAN switches and network cards will not be a major part of the investment.
Network equipment will be a relevant part of the infrastructure, both in terms of cost and of number of ports. For the past four years the natural evolution of this technology has been in line with the rest of the hardware development.
Two technical evolution lines should be considered regarding the Local and the Wide Area Network infrastructures.
The figure below shows a step diagram of the evolution of the WAN links in Italy foreseen by GARR. These figures match quite closely the forecasts for European Research and Academic connectivity proposed by the Géant project[11].
No special cost forecast is presented here: we assume that Géant and GARR will provide the necessary bandwidth to all the interested INFN sites and that the related costs will be covered by the general INFN networking infrastructure fund.
The only official document is presently the "European Commission grant of 80 M€ to upgrade the European Internet for Research"[12].
Figure 3 - WAN Bandwidth Evolution
The LAN technology has rapidly increased the interconnection speed and changed the topology from a shared bus to a switched network with greater bandwidth availability.
Fast Ethernet and Gigabit Ethernet are already solid technologies, widely deployed in the INFN sites. None of the present applications seems to be limited by the bandwidth available with these technologies, and prices are falling rapidly enough that the cost of LAN switches is far from being a limiting factor.
Figure 4 - LAN Bandwidth & Cost Evolution
The figure used for networking equipment is based on the present cost of a switch with 48 Fast Ethernet ports and 2 Gigabit Ethernet uplinks, about 6,000 €. The cost evolution can be considered essentially flat but, due to the increasing access speed of the ports, the price/performance will be significantly better in a couple of years.
In the following we assume that the so-called Moore's law[10] holds, i.e. that processors double their performance every 18 months, and we also assume that the price stays the same. These assumptions lead to the following table and to the graphs in the figures below.
Table .2 - CPU Evolution and Prices

Time | MHz | SI2000 | €/SI2000
1-Jan-2000 | 600 | 266 | 7.52
1-Jun-2000 | 759 | 336 | 5.94
1-Jan-2001 | 960 | 426 | 4.70
1-Jun-2001 | 1215 | 538 | 3.71
1-Jan-2002 | 1536 | 681 | 2.94
1-Jun-2002 | 1944 | 862 | 2.32
1-Jan-2003 | 2459 | 1090 | 1.83
1-Jun-2003 | 3110 | 1379 | 1.45
1-Jan-2004 | 3934 | 1744 | 1.15
1-Jun-2004 | 4977 | 2206 | 0.91
1-Jan-2005 | 6296 | 2791 | 0.72
1-Jun-2005 | 7964 | 3531 | 0.57
1-Jan-2006 | 10075 | 4467 | 0.45
Figure 5 - CPU and Price/Performance Evolution
All the figures are given in SPECint2000, the next-generation industry-standardized CPU-intensive benchmark suite and the successor of SPECint95[9]. Although the results of the two benchmarks cannot easily be translated into one another, for simplicity we assume that 10 SPECint2000 are roughly equivalent to 1 SPECint95 (9.2 is a better approximation, but much less convenient).
Assuming commodity personal computer hardware, with no monitor and a single processor, a basic price of 2,000 €/box is assumed for today's purchases (800 MHz, 355 SPECint2000).
This cost is assumed to be constant in time, so that the price/performance drop is due to technological evolution only. Better figures can obviously be obtained if one assumes a price drop with time. The following table shows the cost of a CPU farm built around these assumptions. No significant disk capacity is taken into account.
Table .3 - CPU FARM Cost Evolution (Single Processor)

Farm (SI95) | Farm (SI2000) | Date | # CPU | CPU Cost | # Switches | Switch Cost | # Racks | Rack Cost | Total Cost
2000 | 20000 | Jun-00 | 60 | € 118,763 | 2 | € 12,000 | 2 | € 5,000 | € 135,763
4000 | 40000 | Jun-01 | 75 | € 148,453 | 2 | € 12,000 | 2 | € 5,000 | € 165,453
16000 | 160000 | Jun-02 | 186 | € 371,133 | 4 | € 24,000 | 5 | € 12,500 | € 407,633
20000 | 200000 | Jun-03 | 145 | € 289,948 | 4 | € 24,000 | 4 | € 10,000 | € 323,948
30000 | 300000 | Jun-04 | 136 | € 271,826 | 3 | € 18,000 | 4 | € 10,000 | € 299,826
80000 | 800000 | Jun-05 | 227 | € 453,044 | 5 | € 30,000 | 6 | € 15,000 | € 498,044
340000 | 3400000 | Jun-06 | 602 | € 1,203,397 | 13 | € 78,000 | 16 | € 40,000 | € 1,321,397
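The sketch below reproduces, to within a few percent, the logic behind Table .2 and Table .3 from the assumptions stated in the text: performance doubling every 18 months, a constant 2,000 € per box delivering 355 SPECint2000 in mid-2000, 48-port Fast Ethernet switches at about 6,000 € each and racks at about 2,500 € each. The packing density of 40 boxes per rack is an assumption introduced here only to match the published figures, not a number taken from this document.

    # Approximate reconstruction of the CPU price/performance evolution
    # (Table .2) and of the single-processor farm costs (Table .3).
    # Assumptions: performance doubles every 18 months; a box costs a constant
    # 2,000 EUR and delivers 355 SPECint2000 in mid-2000; 48 boxes per Fast
    # Ethernet switch (6,000 EUR each) and an assumed 40 boxes per rack
    # (2,500 EUR each).
    import math

    BOX_PRICE = 2000.0          # EUR, assumed constant in time
    SI2000_MID_2000 = 355.0     # per-box SPECint2000 in mid-2000
    DOUBLING_MONTHS = 18.0

    def si2000_per_box(months_after_mid_2000: float) -> float:
        """Per-box performance under the 18-month doubling assumption."""
        return SI2000_MID_2000 * 2 ** (months_after_mid_2000 / DOUBLING_MONTHS)

    def farm_estimate(target_si2000: float, months_after_mid_2000: float) -> dict:
        """Boxes, switches, racks and cost of a farm of the given capacity."""
        per_box = si2000_per_box(months_after_mid_2000)
        boxes = math.ceil(target_si2000 / per_box)
        switches = math.ceil(boxes / 48)
        racks = math.ceil(boxes / 40)
        cost = target_si2000 / per_box * BOX_PRICE + 6000 * switches + 2500 * racks
        return {"boxes": boxes, "switches": switches,
                "racks": racks, "cost_eur": round(cost)}

    # Example: the Jun-2002 farm of 160,000 SI2000 (24 months after mid-2000).
    print(farm_estimate(160_000, 24))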
In the case of dual-processor boxes, a significant capacity/space improvement is gained; on the other hand, the price premium of the dual-CPU box has been evaluated at a 30% increase (2,600 €/box), which takes into account only the doubling of CPU and memory.
Table .4 - CPU FARM Cost Evolution (Dual Processor)

Farm (SI95) | Farm (SI2000) | Date | # CPU | CPU Cost | # Switches | Switch Cost | # Racks | Rack Cost | Total Cost
2000 | 20000 | Jun-00 | 60 | € 77,196 | 1 | € 6,000 | 1 | € 2,500 | € 85,696
4000 | 40000 | Jun-01 | 75 | € 96,495 | 1 | € 6,000 | 1 | € 2,500 | € 104,995
16000 | 160000 | Jun-02 | 186 | € 241,237 | 2 | € 12,000 | 3 | € 7,500 | € 260,737
20000 | 200000 | Jun-03 | 145 | € 188,466 | 2 | € 12,000 | 2 | € 5,000 | € 205,466
30000 | 300000 | Jun-04 | 136 | € 176,687 | 2 | € 12,000 | 2 | € 5,000 | € 193,687
80000 | 800000 | Jun-05 | 227 | € 294,478 | 3 | € 18,000 | 3 | € 7,500 | € 319,978
340000 | 3400000 | Jun-06 | 602 | € 782,208 | 7 | € 42,000 | 8 | € 20,000 | € 844,208
As can be seen from Table 4.4, the cost improvement is essentially due to the CPU boxes and, to a lesser extent, to the reduced number of network switches and racks needed.
In both of the above cases we did not take into account some specific items which can increase the overall price: a central Gigabit Ethernet switch, cables, the hardware to control the system, etc.
We can then conclude that in general dual-processor boxes should be preferred, not only for their better space optimization, but also for a roughly 30% improvement in the total farm cost.
The configuration and architectural study of the farm is part of Work Package 2, already described.
Disk storage is an essential part of the computing system, and this project is partly dedicated to data access, data replication and data retrieval. Almost all of these data operations will be done with disks and, since the cost per GB of this magnetic medium is falling, there will also be good economic reasons for using as much disk as possible to optimize database access.
Figure 6 - Evolution of Disks Technology and Price
The figure above shows the technology and price/GB evolution extrapolated from the last 4 years. The costs refer to raw disk capacity (i.e. EIDE) without options such as controllers, cables, etc.
Using the data shown in Figure 6, we can estimate the costs of different technological and capacity configurations; the resulting estimates are given in the following table.
Table .5 - Disk Storage Cost Evolution

Capacity (TB) | Year | SCSI (JBOD): # Disks | Racks | Cost | RAID 5: # Disks | Racks | Cost | RAID + Fibre Channel SAN: # Disks | Racks | Cost
1 | 2000 | 25 | 1 | € 34,250 | 35 | 1 | € 42,063 | 35 | 1 | € 81,125
10 | 2001 | 157 | 2 | € 201,313 | 200 | 3 | € 253,141 | 200 | 3 | € 497,281
50 | 2002 | 489 | 7 | € 631,352 | 615 | 8 | € 786,939 | 615 | 8 | € 1,549,879
100 | 2003 | 611 | 8 | € 786,939 | 765 | 10 | € 983,674 | 765 | 10 | € 1,937,349
500 | 2004 | 1908 | 24 | € 2,456,186 | 2385 | 30 | € 3,070,232 | 2385 | 30 | € 6,050,464
1000 | 2005 | 2385 | 30 | € 3,070,232 | 2985 | 38 | € 3,839,290 | 2985 | 38 | € 7,564,581
1000 | 2006 | 1491 | 19 | € 1,919,645 | 1865 | 24 | € 2,400,306 | 1865 | 24 | € 4,728,613
Mass storage is intended for two purposes: data backup and the archiving of rarely accessed data. Both kinds of mass storage systems require low-cost media (€/GB), high density (GB per cartridge or medium), performance scalable to PB archives, robotics with automated mounting/dismounting/shelving, and low space occupancy and weight.
The present technology leads to two possible solutions, which are by no means equivalent: magnetic tapes and DVD-ROM. In both cases a medium-size (1000 slots) robotic library has been added to the pure media costs, with an overhead of 25 K€/library, stable across the years.
Figure 7 - Tapes & DVD Capacity Evolution
The cost evolution is quite different, as can be seen from the table below. The cost per GB is assumed to decrease by roughly 40% every year in the case of tapes, due mainly to the increasing capacity of the cartridges.
In the DVD case, instead, the smooth decrease of the cost/GB is only due to the dropping media cost, and the assumption is that the cost of a DVD-RAM will eventually be similar to the present cost of a CD-RW. This assumption is driven by the market need to freeze the technological evolution of the DVD for the number of years (around 10) necessary to recoup the investments in the production plants.
Figure 8 - Tapes & DVD Costs Evolution
Capacity (TB) | Year | DLT: # Tapes | Robot | Cost | 9840: # Tapes | Robot | Cost | DVD: # Disks | Robot | Cost
1 | 2000 | 25 | 1 | € 26,250 | 40 | 1 | € 27,000 | 193 | 1 | € 34,615
10 | 2001 | 157 | 1 | € 32,813 | 250 | 1 | € 37,500 | 962 | 1 | € 53,281
50 | 2002 | 489 | 1 | € 49,414 | 782 | 1 | € 64,063 | 2404 | 3 | € 116,589
100 | 2003 | 611 | 1 | € 55,518 | 977 | 1 | € 73,828 | 4808 | 5 | € 173,928
500 | 2004 | 1908 | 2 | € 145,367 | 3052 | 4 | € 252,588 | 24039 | 25 | € 768,907
1000 | 2005 | 2385 | 3 | € 194,209 | 3815 | 4 | € 290,735 | 48077 | 49 | € 1,394,302
1000 | 2006 | 1491 | 2 | € 124,506 | 2385 | 3 | € 194,209 | 48077 | 49 | € 1,324,589
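The cost model just described can be sketched as follows; the 40% yearly decrease of the tape cost per GB and the 25 K€ robot per 1000 slots come from the text, while the year-2000 cost per GB, the cartridge capacity and its growth rate are illustrative assumptions only.

    # Hedged sketch of the tape-archive cost model described above. Only the
    # 40%/year decrease of the cost per GB and the 25 kEUR robot per 1000
    # slots come from the text; the remaining parameters are assumptions.
    import math

    LIBRARY_COST = 25_000          # EUR per 1000-slot robotic library
    SLOTS_PER_LIBRARY = 1000

    def tape_archive_cost(capacity_tb: float, year: int,
                          eur_per_gb_2000: float = 1.0,     # assumption
                          cartridge_gb_2000: float = 40.0,  # assumption
                          capacity_growth: float = 1.6) -> float:  # assumption
        """Total cost in EUR of a tape archive of the given capacity and year."""
        years = year - 2000
        eur_per_gb = eur_per_gb_2000 * 0.60 ** years        # -40% per year
        cartridge_gb = cartridge_gb_2000 * capacity_growth ** years
        n_tapes = math.ceil(capacity_tb * 1000 / cartridge_gb)
        n_robots = math.ceil(n_tapes / SLOTS_PER_LIBRARY)
        return capacity_tb * 1000 * eur_per_gb + n_robots * LIBRARY_COST

    print(round(tape_archive_cost(100, 2003)))   # e.g. a 100 TB archive in 2003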
For the organizational and managerial aspects of the project the general idea is to have something similar to the well consolidated managerial organization of the HEP experiments.
The following organization can be proposed.
The role of this board is to endorse political, administrative, organizational and strategic decisions proposed by the Executive Board. The Collaboration Board meets about twice per year, depending on the activities, and usually after each Collaboration Meeting (see below).
The Executive Board has political and administrative responsibility for the normal management of the project. In case of need the Board can also take urgent strategic decisions, subject to later approval by the Collaboration Board.
This Board meets whenever necessary, with a minimum of one meeting every 2 months.
Figure 9 – Managerial Structure of the Project
We now describe the role and duties of the Key Persons.
Experiment | 2000 | 2001 | 2002 | 2003 | Total
ALICE | | 990 | 1068 | 1151 | 3209
ATLAS | 150 | 740 | 570 | 1110 | 2570
CMS | | 1920 | 2073 | 2249 | 6242
LHC-b | | 366 | 446 | 676 | 1488
Total | 150 | 4016 | 4157 | 5186 | 13509

Experiment | 2000 | 2001 | 2002 | 2003 | Total
VIRGO | 1477 | | | |
Total | 1477 | | | |
Table .1 - Summary of all the Deliverables

Deliverable | Description | Delivery Date
D1.1 | Tools, documentation and operational procedures for Globus deployment | 6 Months
D1.2 | Final report on suitability of the Globus toolkit as basic Grid infrastructure | 6 Months
D2.1.1 | Technical assessment about Globus and Condor, interactions and usage | 5/2001
D2.1.2 | First resource Broker implementation for high throughput applications | 7/2001
D2.1.3 | Comparison of different local resource managers (flexibility, scalability, performance) | 10/2001
D2.1.4 | Study of the three workload systems and implementation of the workload system for Monte Carlo production | 12/2001
D2.2.1 | Definition of requirements for a Data Location Broker (DLB) | 5/2001
D2.2.2 | Definition of a metadata syntax | 7/2001
D2.2.3 | Prototype of a DLB using metadata | 12/2001
D2.2.4 | Replica Management at file level | 12/2001
D2.3.1 | Requirements analysis report | 6 Months
D2.3.2 | Evaluation report on existing technology | 12 Months
D2.3.3 | Architectural design | 18 Months
D2.3.4 | Detailed design report of core modules | 24 Months
D2.3.5 | Final implementation of the system and integration | 36 Months
D2.4.1 | Requirements gathering from fabric users | 3 Months
D2.4.2 | Survey of existing fabric/cluster architectures, architectural issues to design a fabric of commodity PCs | 6 + 6 Months
D2.4.3 | INFN toolkit to design and realize local fabrics | 12 Months
D2.4.4 | Integration of the DataGrid fabric management software tools | 24 Months
D2.4.5 | Integration of the Fabric Management into the Grid environment | 36 Months
D3.1 | Use case programs. Report on the interfacing activity of use case software to minimal Grid services in INFN | 12 Months
D3.2 | Report on the results of Run #1 and requirements for the other WPs | 24 Months
D3.3 | Report on the results of Run #2. Final project report | 36 Months
D4.1 | Widely deployed testbed of Grid basic services | 6 Months
D4.2 | Application 1 (traditional applications) testbed | 12 Months
D4.3 | Application 2 (scheduled client/server applications) testbed | 18 Months
 | Application n (chaotic client/server applications) testbed | 30 Months
D4.4 | Grid fabric for distributed Tier 1 | 24 Months
D4.5 | Grid fabric for all Tier hierarchy | 36 Months
D5.1 | QoS: definition of QoS requirements, performance analysis on a QoS-capable networked Grid layout; Protocols: identification of the network-related issues | Month 12
D5.2 | Protocols: performance analysis; experiments on a WDM layout | Month 18
Table .2 - Summary of all the Milestones

Milestone | Description | Date
M1.1 | Basic development Grid infrastructure for the INFN-Grid | 6 Months
M2.1.1 | Workload management system for Monte Carlo production | 12/2001
M2.1.2 | Workload management system for data reconstruction and production analysis | 12/2002
M2.1.3 | Workload management system for individual physics analysis | 12/2003
M2.2.1 | Capability to access replicated data at file level using a simple abstract description | 12/2001
M2.3.1 | Evaluation of suitability for INFN of baseline architecture and technology proposed by DATAGrid | 12/2001
M2.3.2 | Demonstrate prototype tools and infrastructure in a testbed Grid environment | 12/2002
M2.3.3 | Demonstration of monitoring services on a full scale INFN testbed | 12/2003
M2.4.1 | NFS connection topology: disk server based | 3/2001
M2.4.2 | NFS connection topology: distributed disks | 3/2001
M2.4.3 | Microprocessor Technology: dual slim processors | 3/2001
M2.4.4 | Storage Systems: Fibre Channel - SCSI Ultra 160/320 | 3/2001
M2.4.5 | Storage Systems: Serial ATA | 7/2001
M2.4.6 | Microprocessor Technology: IA64 | 3/2001
M2.4.7 | Interconnection Networks: Myrinet based | 3/2001
M2.4.8 | Interconnection Networks: Gigabit Ethernet and efficient communication protocols | 5/2001
M2.4.8 | Setup of a cluster of more than 100 nodes using an architecture as selected by the previous points, with configuration and installation management support as provided by DataGrid. See also CMS Regional Center Prototype | 12/2001
M2.4.9 | Interconnection Networks: Infiniband | 12/2001
M2.4.10 | Installation of the INFN fabrics (according to the selected architecture) |
M2.4.11 | Test of integration of the INFN fabrics with Tier 0 |
M3.1 | Development of use case programs. Interface with existing Grid services in INFN | 12/2001
M3.2 | Run #1 executed (distributed analysis) and corresponding feedback to the other WPs | 12/2002
M3.3 | Run #2 executed, including additional Grid functionality and extended to a large INFN user community | 12/2003
M4.1 | Testbeds topology | 6 Months
M4.2 | Tier1, Tier2&3 computing model experimentation | 12 Months
M4.3 | Grids prototypes | 18 Months
M4.4 | Scalability tests | 24 Months
M4.5 | Validation tests | 36 Months
M5.1 | Definition of the network architecture (production-based or VPN); definition of the QoS architecture (if relevant to Grid); identification of the network protocols needed | Month 12
M5.2 | Identification of the issues of the application and middleware protocols; development of the future activities on optical networks | Month 12
A fundamental requirement defining the working mode and computing model for ALICE will be how the end-user gets access to the reconstructed data. The amount of data to be processed by the LHC experiments will exceed that of previous experiments by one order of magnitude. This large volume of data will require a correspondingly large amount of CPU resources. These requirements, along with our already worldwide collaboration, lead naturally to a distributed architecture, based on CERN plus a number of Regional Centres, where efficient access to distributed data and CPU resources will be a key issue.
The ALICE collaboration has already defined the architecture of its Data Acquisition and Computing systems and different scenarios for the testing and prototyping of the different components of this architecture. "ALICE Data Challenges" are carried out on a regular basis to test this architecture with existing components. The present proposal aims at looking at the problem of data storage and access from the analysis point of view, rather than from that of raw data reduction and storage, and it provides a realistic and demanding environment for the execution of distributed interactive data analysis tasks. In a second phase ALICE may exploit the Grid technology also in the overall context of our Data Challenges.
A specific requirement for the ALICE experiment comes from the intrinsic nature of heavy-ion collisions (dN/dy up to 8000 particles at mid-rapidity). Events are expected to be rather large even when working with very compressed data, such as micro-DSTs. The local storage capacity is expected to increase substantially, and certainly much more data will be stored locally. However, the foreseeable evolution of the processing power of desktop machines will be outrun by the amount of data to be traversed. For the previous generation of LEP (Large Electron Positron collider) experiments, a typical figure for the amount of data was 100 GB, and the capacity of a local disk was around 1 GB. At LHC the data will be of the order of a PB, while the local disk capacity will not exceed a TB. Improvements in disk access speed are not expected to exceed a factor of 5 over present figures. This means that the capability of a desktop workstation to meet the demands of the physicists will be reduced by more than one order of magnitude.
Besides the data access problem, ALICE computing will have to face the challenge of tasks that will require large amounts of computer time. These include, but may not be limited to: event reconstruction, particle identification of the large number of tracks expected, sophisticated statistical analysis of the reconstructed events, software based detector alignment between different detector elements, up to a precision of one part per million (10 microns over 10 meters), and other calibration and detailed detector performance studies.
To recover the necessary resources for these tasks and to ensure an acceptable ergonomy of the current computing paradigms based on interactivity, particularly for the analysis phase, the only solution is to exploit parallel processing, locally but also remotely on geographically distributed resources. It seems therefore mandatory to develop an analysis framework that provides easy access to distributed storage and computing resources. The Grid technology may offer the members of the ALICE collaboration better and more efficient access to distributed storage (the Data Regional Centres) and CPU (Data and Offline Regional Centres) resources, independently of their geographical location.
To put in place such a complex system, experimentation should start now, since experience has shown that extrapolations by an order of magnitude are not reliable. It is therefore advisable to start a prototyping programme with the goal of deploying and testing, within three years, a computing infrastructure of about 10% of the size required at LHC startup. The simulations devoted to the detector physics performance studies, together with the reconstruction and analysis of the simulated data, will represent the case study for these prototypes. This work, with all its phases from event generation to final analysis, reproduces well the real environment in which the physicists will operate. The prototype size is therefore motivated not only by a top-down procedure (the motivation for a reasonable scaling) but also by a bottom-up approach, where the resources are justified on the basis of the CPU and storage needs of the simulation and analysis tasks.
A detailed description of the system is provided in the ALICE Grid proposal. Here only a short description is given.
The data to be analysed will be produced by the ALICE ROOT-based simulation and reconstruction framework and will consist of reconstructed tracks ready to be analysed. This also represents a major simulation and reconstruction effort, by which we will obtain an accurate and complete description of the detector response to the relevant physics signals. The data will then be geographically distributed in a few candidate regional centres and will be accessed remotely to perform a pre-defined set of analysis tasks. These tasks will cover a broad range of use cases, from very simple ones to data- and CPU-intensive ones. The tests must cover a large number of parameters, principally having to do with the locality of data and algorithms and with the access patterns to the data.
The core of the system will be a load-balanced parallel query mechanism that will fan out requests to where the data are stored and retrieve a range of possible entities, from data to be processed locally to the final result of the remote analysis to be assembled locally. We expect to exploit the Grid middleware to reach a high level of access uniformity across our geographically diverse community, via wide area networks and on heterogeneous systems. In a first stage data at the sites will be stored on disk, while in a second phase data may be stored on tertiary storage.
All communications will go via the ROOT remote daemon (rootd), an advanced multithreaded application managing remote file access. Load balancing will be performed by the Parallel ROOT Facility (PROOF) in collaboration with the Grid services.
The interest of this scheme is its complete flexibility. If heavy computational tasks are sent to the remote nodes, the system behaves as a distributed meta-computer. If the data are retrieved for local processing, the functionality of an OODBMS is achieved. Also, each slave server has access, via the rootd daemon, to both local and remote data. This means that a remote server with an excess of CPU power could consume all its data and then be assigned to process remote data residing on less performing or more loaded servers. Self-adaptation of the application topology to the load conditions of the system is mandatory to even out the load and avoid pathological cases.
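A minimal sketch of what such a distributed interactive session could look like from the user side is given below, assuming the PROOF and rootd interfaces roughly as exposed by later ROOT releases through PyROOT; the master host, the file URLs and the selector name are placeholders, not existing resources.

    # Minimal sketch (not project code) of a distributed interactive analysis
    # with the Parallel ROOT Facility (PROOF), assuming the PyROOT/PROOF
    # interface of later ROOT releases. Hosts, URLs and selector are placeholders.
    import ROOT

    # Connect to a PROOF master, which fans the query out to its slave servers.
    proof = ROOT.TProof.Open("proof://master.example.infn.it")

    # Build a chain of remote files served by rootd; the data stay where they are.
    chain = ROOT.TChain("esdTree")
    chain.Add("root://tier1.example.infn.it//data/run001.root")
    chain.Add("root://tier2.example.infn.it//data/run002.root")

    # Process the chain in parallel with a user-supplied TSelector;
    # PROOF performs the load balancing and merges the partial results.
    chain.SetProof()
    chain.Process("MyAnalysisSelector.C+")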
Emphasis will be put on load balancing, and we expect this work to be complementary to, and possibly profit from, the MONARC Phase III developments on resource scheduling and management.
Based on the latest figures from the ALICE simulation and reconstruction codes, it is estimated that the overall computing needs of ALICE are around 2100 kSI95 of CPU and 1600 TB of disk space. The global computing resources foreseen to be installed in Italy are 450 kSI95 of CPU and 400 TB of disk space. Although these numbers are affected by a very large uncertainty, because neither the code nor the algorithms are in their final form, we are confident that the figures presented are not far from the final ones. They are certainly adequate to estimate our computing needs for the next three years.
These numbers take into account that the contribution to the ALICE computing infrastructure will be shared mainly among France, Germany and Italy, and that the Italian participation in ALICE in terms of people is close to the sum of France and Germany together.
The plan is to reach by 2003 a capacity of 45 kSI95 and 40 TB of disk (10% of the final size). These resources will allow the prototype tests to be performed at a realistic scale and, at the same time, will provide adequate resources for the simulation and analysis tasks planned by the Italian ALICE groups in the next three years.
Two different ALICE projects will be involved in the Grid activity. The first is connected with the Physics Performance Report. This is a complete report on the physics capabilities and objectives of the ALICE detector, which will be assessed through a virtual experiment involving the simulation, reconstruction and analysis of a large sample of events. The first milestone for the Physics Performance Report, when the data will have to be simulated and reconstructed, is due by the end of 2001. This exercise will be repeated regularly with a larger number of events to test the progress of the simulation and reconstruction software. The simulations will be devoted to the study of the detector physics performance. Careful studies of the dielectron and dimuon triggers are necessary, for instance, to fully understand and optimise their efficiency and rejection power. A large emphasis will be given to interactive distributed data analysis, for which special code will be developed and which is expected to use a very large spectrum of the services offered by the Grid.
A second activity is linked with the ALICE Mass Storage project. In this context two data challenges have already been run, and a third one is foreseen in the first quarter of 2001, involving quasi-online transmission of raw data to remote centres and remote reconstruction. Other data challenges, aiming at higher data rates and more complicated data duplication schemes, are planned at the rhythm of one per year.
Current estimates of the CPU time needed to simulate a central event are around 2250 kSI95·s, while about 90 kSI95·s are needed to reconstruct it. The storage required for a simulated event before digitisation is 2.5 GB and the storage required for a reconstructed event is 4 MB.
In order to optimise the utilisation of the resources, signal and background events will be simulated separately and then combined, so that producing a sample of 10^6 events will require the full simulation of only 10^3 central events. The reconstruction will be performed on the full data sample. The table below reports the CPU and storage needed to simulate and reconstruct the required number of events; it assumes an amount of CPU that, if used for a full year at 100% efficiency, provides the needed capacity. The corresponding amount of disk space is also indicated.
Table .1 - Resources needed for simulation and reconstruction in 2001-2003

ALICE Simulation | 2001 | 2002 | 2003
Events | 1.0E+06 | 3.0E+06 | 7.0E+06
Needed CPU (SI95) (assuming 12 months time and 100% efficiency) | 3000 | 9000 | 21000
Needed disk (TB) | 6.5 | 19.5 | 45.5
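The figures in the table can be reproduced with the short sketch below, taking 2250 kSI95·s and 90 kSI95·s as the per-event CPU costs for simulation and reconstruction respectively, together with the per-event storage figures quoted above; this reading of the quoted numbers is ours and is stated here only to make the arithmetic explicit.

    # Check of Table .1: per 1e6 produced events only 1e3 central events are
    # fully simulated (2250 kSI95*s and 2.5 GB each before digitisation),
    # while all 1e6 events are reconstructed (90 kSI95*s and 4 MB each).
    # CPU is quoted as the constant power needed over 12 months at 100% efficiency.
    SECONDS_PER_YEAR = 12 * 30 * 24 * 3600      # ~3.1e7 s

    def alice_year(events: float):
        central = events / 1000                              # fully simulated
        cpu_seconds = central * 2250e3 + events * 90e3       # SI95 * s
        disk_tb = central * 2.5 / 1000 + events * 4e-6       # TB
        return round(cpu_seconds / SECONDS_PER_YEAR), round(disk_tb, 1)

    for year, events in [(2001, 1e6), (2002, 3e6), (2003, 7e6)]:
        cpu, disk = alice_year(events)
        print(f"{year}: ~{cpu} SI95 for 12 months, ~{disk} TB of disk")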
Adding a 50% factor for the analysis and taking into account a 30% inefficiency related to the availability of the computers, we believe that the prototype scale proposed in the table below is just adequate for the needs of the foreseen simulation activity.
Table .2 - ALICE's Testbed capacity target

Capacity targets for the INFN Testbed ALICE | units | 2001 | 2002 | 2003
CPU capacity | SI95 | 7,000 | 22,000 | 45,000
Disk capacity | TBytes | 6 | 18 | 40
Tape storage capacity | TBytes | 12 | 36 | 80
Total cost | ML | 1000 | 2020 | 3200
The consumable and manpower costs have not been evaluated yet.
Each column represents the integrated capacity reached in that year.
A more detailed breakdown of the activities is being prepared for the Physics Performance Report document. A short description of the activities and specific tasks of the Italian groups is reported below. The statistics quoted are based on the studies performed for the detector TDRs (1999-2000).
HMPID
Study of the inclusive branching ratios (π, K, p, anti-p): study of efficiency and rejection power as a function of momentum and track angle: 4 x 10^5 events.
Study of φ → K+K-: 2 x 10^5 events.
Background events: 100 at 8000 dN/dY, 100 at 4000 dN/dY
ITS
Study of detector behavior (500 events full simulation)
Study of neutron and beam loss background
Study of selected physics channels: K0, Λ, Σ, Ω, Ξ (10,000 events per signal); ψ, ψ' (10^6 events per signal, with TPC and TRD)
HBT correlation studies (1000 signal events per set of correlation parameters)
Study of di-baryons, di-Λ and strangelets (10^5 events per signal)
Background events: 3000
The total estimated CPU needs amount to 4000 SI95.
TOF
Open charm and φ production: studies of inclusive branching ratios and multiplicities. The following are foreseen:
simulation of 200 central background events: 4000 h @ 25 SI95
simulation of 2x10^5 signal events (charm and phi): 200 h @ 25 SI95
reconstruction of 2x10^5 events: 10^5 h @ SI95
Taking into account a 70% system efficiency and further analysis stages, under the hypothesis of performing these tasks within a 5-month period, the total required CPU power is 2100 SI95.
MUON ARM
Study of the detector response for J/ψ, ψ' and Υ:
Simulation of 10^7 pp → BB̄ and DD̄ events with PYTHIA: 365 days @ 25 SI95
Simulation of 600 PbPb central events: 165 days @ 25 SI95
Simulation of 60 J/ψ, ψ' and Υ samples in different pT bins: 32 days @ 25 SI95
Reconstruction of 6 x 10^5 events: 2500 days @ 25 SI95
Taking into account a 70% system inefficiency, further analysis stages, a 5-month period to perform these tasks, and considering that the Italian groups will contribute 50% of the total, the final estimated CPU power is 700 SI95.
ZDC
Study of the centrality trigger: 10^4 minimum bias events: 208 days @ 25 SI95.
Taking into account reconstruction and analysis, the estimated CPU is 200 SI95.
At the end of the year 2000, several APEmille installations will be up and running at several sites in Italy, Germany, France and the UK, for an integrated peak performance of about 1.5 Tflops.
These machines will be used for numerical simulations of QCD on the lattice, with two main physics goals in mind:
In the following years, a new generation of machines will gradually become available. A major player in this field will presumably be the new generation APE project, apeNEXT.
Trends for the next five years can be summarized by the following points:
The last two points clearly illustrate the trend to regard a large computer installation running Lattice QCD code as the analogue of a particle accelerator. In this analogy, the field configurations are equivalent to the collision events that are collected and analyzed by the experiments.
The relevant point for the Grid is that the amount of data to be stored and analyzed is comparable in size to that of a large experiment.
One of the most time-consuming steps in a Lattice QCD simulation is the evaluation of the propagators of a given field configuration. Once a set of propagators is available, the evaluation of the expectation values of the operators needed to measure a physical quantity is relatively straightforward. It therefore makes sense to store permanently a large collection of propagators for subsequent reuse by a larger community of physicists.
This is not a trivial operation. For example, Table 8.3 lists the storage size of the set of (heavy and light) quark propagators corresponding to typical lattices simulated by APEmille and apeNEXT.
In the past this huge storage problem was bypassed altogether, since propagators were computed, used on the fly and discarded.
Table .3 - Typical data structures to be analyzed in Lattice QCD

 | APEmille | apeNEXT
when | 2000 - 2003 | 2003 - 2006
lattices | 50^3 x 100 | 100^3 x 200
propagators size | 500 GByte | 10 TByte
Configurations | 50 | 100
Global D-base size | 25 TByte | 1 PByte
In the near future, the whole database must be made available to a rather large community of users (say, 50 physicists) working at several sites.
The challenges are:
In the near future, in the framework of the Grid initiative a Lattice QCD specific test-bed could be envisaged, with the following objectives:
This preliminary programme should heavily leverage the investment made in Grid-like techniques by our experimental colleagues and concentrate on the specific features of Lattice QCD analysis.
The experience gained in this effort would be precious to sustain a smooth analysis of apeNEXT configurations as they become available.
The APE group is able to commit the equivalent of one person per year to the project. Their main task will be that of working out detailed specifications and requirements. Smooth operation of the project requires two additional full-time persons for a period of about two years.
The project requires a non-negligible investment in hardware (the storage system and the analysis farm). Work is in progress to make an accurate estimate.
Besides Lattice QCD research, it is likely that other groups will need large computing facilities in the near future. For example, studies in gravitation focused on the massive generation of GW of interest for the VIRGO facility will require very large lattices to study Einstein's equations for coalescing objects. A large farm could conveniently be used to run codes such as NCSA-Potsdam Cactus, which is ready to run efficiently on Linux-based clusters.
ATLAS completed the Physics and Performance TDR in June 1999; the simulation, reconstruction and analysis work for the Physics TDR was done with largely traditional software, albeit with sizeable insertions of C++ code.
A new OO framework is now being developed (ATHENA, which has borrowed many aspects from GAUDI, the LHCb framework). The development of GEANT4 code for all the detectors is progressing well. By the beginning of 2001 the basic reconstruction functionality should be available in the new code, and a working GEANT4 simulation will allow detailed physics and performance checks to be performed for GEANT4 validation in ATLAS. Objectivity is the current ATLAS baseline for implementing the event store: in the new software a sufficiently large use of Objectivity will be made, so that significant performance tests with Objectivity will already be feasible by the end of 2000.
The World Wide Computing Model and the related Data Model are being developed in the framework of the new ATLAS Computing organization, where a newly formed National Computing Board (with one representative for each "country") has been added to a reorganized Computing Steering Group. The ATLAS Computing Model will be based on the results of MONARC and will make use as much as possible of the tools and infrastructure developed by the Grid projects (DATAGrid in Europe, GriPhyN in the US, etc.). This work is steered by the NCB through three groups: MONARC-Simulation, Grid, and Computing Model.
Tier1 RCs are planned in France, Italy, Japan, US/Canada and the UK; in other countries the discussion is still at a more preliminary stage.
The Computing Milestones are defined by two kinds of requirements: a) the need to obtain physics and performance results (e.g. on High Level Triggers), and b) the need to develop and test the prototypes of the computing infrastructure (both hardware and software) by 2003, before starting the real build-up of the final production system. These two sets of requirements have to match in order to allow an efficient use of the resources.
The most relevant milestones of the two kinds are listed below; the milestones of the second kind are usually given the misleading name of Mock Data Challenges (actually there is nothing "mock" about them and they will be carried out with the highest possible degree of realism); the name is however maintained here and they will be called MDC0, MDC1, MDC2, ...:
Test the full chain of GEANT4 and the new software with at least 10^5 events.
Full simulation and processing chain (using also calibration and alignment as far as possible) for at least 10^6 events coming out of the Level3 trigger; this MDC milestone is matched with the milestone of the study of final algorithms for Level3 (and Level2 as far as possible). The events to be fully simulated will be at least 10^7, in order to allow for the Level3 study. Some, but not all, Regional Centres will be involved in MDC1, and the Grid will be exploited as far as possible.
The detailed ATLAS Computing Model will be described.
Validate the Computing Model. Test the Grid middleware and infrastructure for distributed production and analysis. Involve all existing RCs, which are typically expected to have reached a size of 10% of their 2006 capacity.
Aim at 10^8 events (after Level3).
The Italian ATLAS community has agreed on the following basic points:
The detailed sharing of the work within the ATLAS collaboration will be set up in the next 6 months, in connection with the preparation of the Interim Computing MoU and with the setting up of a final framework of milestones; however, we have agreed among ourselves and within the collaboration on a set of activities covering the period extending to the first half of 2002, for which a reference community of interested people also exists in Italy.
In the following, the two main activities of this set will be described in some detail and the computing resources needed will be quantified.
The INFN groups participating in the ATLAS experiment play an essential role in the implementation of the full chain of trigger system, in particular:
With these activities, the Italian groups contributed significantly to the "DAQ, HLT and DCS Technical Proposal", published in April 2000, and are committed to giving an analogous contribution to the "DAQ, HLT and DCS Technical Design Report" (TDR), due in the second half of 2001.
To complete the studies needed for the TDR, the most relevant tasks and working areas are:
Taking into account all the different detector regions that have to be studied, the number of parameters (like muon pT, sign, etc.) and their possible values, ~10^8 events are needed in order to optimize the filter logic and the reconstruction algorithms, requiring ~3 x 10^9 SpecInt95·sec and ~1 TB of storage.
These studies will require ~10^6 events (~10^4 SI95·sec/event, 1.5 MB/event), for a total of ~10^10 SpecInt95·sec and 1.5 TB of storage.
The total number of simulated events will be 10^5; using a conservative safety factor, the computing power required to simulate an event reaches 10^5 SpecInt95·sec and the event size 10 MB. Therefore, the total amount of resources needed will be ~10^10 SpecInt95·sec of CPU and 1 TB of storage.
The global size of the background dataset to access (see point 3) and the typical size of a complete event may easily raise storage problems (or network problems if the background dataset is remotely located).
The number of the aforesaid activities and the geographical distribution of the participating groups naturally raise the question of an effective distribution of tasks and resources. A way of facing the many aspects of distributed computing and shared resources may be a MONARC-like architecture, with a Tier-1 centre (in Roma1) and several Tier-3 centres with a typical size of a few percent of the Tier-1. This distribution of resources may properly map the sharing of activities among the various groups and at the same time allow the centralization of common data (like the repositories of background events to be merged with the physics events).
Fulfilling the trigger TDR milestone requires 2.5 x 10^10 SI95·sec: this corresponds to 5000 SI95 available for 4 full months at 50% efficiency.
Some of these activities may start immediately, since the software tools are ready. To meet the TDR milestone in the second half of 2001, we estimate that at least 1000 SI95 and 1 TB of disk should be available as soon as possible, and in any case during year 2000.
These studies will continue in the following years with a better integration with the other detectors in the new ATLAS software framework and in the GEANT4 simulation environment.
The ATLAS experiment will perform studies on many different signal channels with small cross-sections (of the order of a pb or smaller), such as the S.M. Higgs search via the WH and ttH (H→bb) channels, the supersymmetric Higgs boson search A/H→ττ, the precise measurement of the τ lifetime via the W→τν decay, etc. All these channels have a common, potentially huge background source in QCD jets, whose cross-section is many orders of magnitude larger. In order to understand the ATLAS potential for these measurements, the background rejection must be well known. Considering the enormous background-to-signal cross-section ratio, significant results require the simulation of 10^8 QCD jets. The Level1 2-jet trigger, which can be emulated at particle level before Geant tracking, accepts ~2% of the QCD jet events: as a consequence, the Geant tracking of 10^6 QCD jet events accepted by the Level1 2-jet trigger must be foreseen.
We consider this production a very useful Use Case for the first (2001) prototype of the ATLAS Regional Centre in the Grid environment, fitting well in time with the ATLAS Geant4 release: the collaboration has scheduled the first Geant4 release of its complete apparatus for the end of 2000. After the validation studies, the real production will start in the second half of 2001, after the short MDC0, and will have to be completed in no more than 6 months (possibly less) as the contribution to the simulation part of MDC1.
We do not know at the moment the Geant4 time performance with respect to Geant3. In the following, the CPU time needed for the simulation is estimated using the well-known Geant3 value; a factor of 2 (in either direction) must be considered possible. The inputs and the resulting estimate are summarized here and checked in the sketch that follows:
Geant3 CPU time per event: 10,000 SpecInt95·s
10^6 events: 10^10 SpecInt95·s in total
100 PCs (20 SPECint95 each): 5 x 10^6 s of work per PC
duty cycle: 50%
⇒ needed time for the Geant production: ~120 days
⇒ needed storage: 1 TB (~1 MB/event)
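The resulting estimate can be checked with the following short sketch, which uses only the inputs listed above.

    # Check of the QCD 2-jet production estimate: 1e6 Geant3-tracked events at
    # 10,000 SI95*s each, on 100 PCs of 20 SI95 with a 50% duty cycle.
    EVENTS = 1e6
    CPU_PER_EVENT = 1e4          # SI95 * s (Geant3 figure; Geant4 may differ by ~2x)
    N_PC, SI95_PER_PC = 100, 20
    DUTY_CYCLE = 0.5

    total_cpu = EVENTS * CPU_PER_EVENT                       # 1e10 SI95 * s
    wall_seconds = total_cpu / (N_PC * SI95_PER_PC * DUTY_CYCLE)
    print(f"~{wall_seconds / 86400:.0f} days of running")    # ~116 days
    print(f"~{EVENTS * 1e-6:.0f} TB of storage at ~1 MB/event")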
Signal events:
A few tens of thousands of events in the above listed physics channels must be produced:
⇒ needed time for the Geant production: small with respect to the QCD 2-jet production (~10%)
⇒ needed storage: 50 GB
Minimum-bias events:
These events, needed for the high-luminosity pile-up on top of the signal events, will not be Geant-tracked in the CILEA Regional Centre prototype, but mass storage must be foreseen for fast access.
⇒ needed storage: 50 GB
Event Reconstruction:
Both the CPU time and the storage needed for the reconstruction (generally "seed"-driven) are small with respect to the simulation requirements, of the order of a few percent.
Some significant extra CPU time can come from the complete (not "seed"-driven) reconstruction of the simulated Higgs events at high luminosity, which anyway will represent a negligible fraction of the production.
The present document is subject to approval by the INFN Scientific Committees as well as by the INFN Management. The final version of this document may differ substantially from the present one; nevertheless, it describes the plans currently proposed by CMS Italy.
CMS is currently planning the Computing System that will allow the Collaboration to manage and analyze the real and simulated data. The plan has to describe three different (but strongly related) phases:
Milestones for these phases have already been set as follows:
The above milestones require the availability of computational and data storage resources where the CMS software is run and used by the physicists. Developing the software has to face both the complexity of the algorithms imposed by the CMS apparatus (a challenge in itself) and the distribution of the resources.
Software and Computing has been a "subdetector" of the CMS Collaboration for many years, and it is managed as such. However, it should be stressed that software and computing by their nature embrace all the CMS participating physicists, giving the project a specific "over-detector" character.
The Computing Model that CMS is developing is based on MONARC studies, as well as on CMS own evaluations. The key elements of the architecture are based on a hierarchy of computing resources and functionalities (Tier0, Tier1, Tier2, etc.), on a hierarchy of data samples (RAW, ESD, AOD, TAG, DPD) and on a hierarchy of applications (reconstruction, re-processing, selection, analysis) both for real and simulated data.
The hierarchy of applications requires a policy of authorization, authentication and restriction based again on a hierarchy of entities like the "Collaboration", the "Analysis Groups" and the "final users".
The CMS Computing Design requires a distribution of the applications and of the data among the distributed resources. In this respect the correlation with the Grid initiatives is clearly very strong. Grid-like applications are applications that run over the wide area network, i.e. they involve Regional Centres (Tier-n) using several computing and data resources linked over the wide area network, where each single computing resource can itself be a network of computers.
An example of the use of the distributed hierarchies follows.
First reconstruction of RAW data is performed at Tier0 (CERN) under the control of the Collaboration and produces the ESD, AOD and TAGS data stored at CERN.
Reconstructed data (ESD, AOD and TAG) are transferred to the Tier1 Centres.
Production of simulated data is executed at the Tier-n Centres under the control of the Analysis Groups or the Collaboration.
Re-processing can be done two times a year (in the beginning phases) at CERN and once at Tier-1 Centres in a coordinated way.
Selection can be done initially at CERN for all the Analysis Groups and can then easily migrate to Tier-1 Centres and, eventually, to Tier-2 Centers for specific channels.
Analysis is performed initially at Tier-1 Centres (including CERN) and can progressively migrate to Tier-2 and lower hierarchy Centers as the Analysis Process becomes more stable.
The current estimate of the total resources required by the CMS Collaboration around 2006 is shown in the following table (assuming 100% efficiency and no AMS overhead).
Activity | Performed by | Frequency | Response time/pass | Total CPU Power (SI95) | Total Disk Storage (TB) | Disk I/O (MB/sec)
Reconstruction | Experiment | Once/Year | 100 Days | 116k | 200 | 500
Simulation + Reconstruction | Experiment/Group | ~10^6 events/Day | ~300 Days | 100k | 150 | 10
Re-processing | Experiment | 3 times/Year | 2 Months | 190k | 200 | 300
Re-definition (AOD & TAG) | Experiment | Once/Month | 10 Days | 1k | 11 | 12000
Selection | Groups (20) | Once/Month | 1 Day | 3k | 20 | 1200
Analysis (AOD, TAG & DPD) | Individuals (500) | 4 Times/Day | 4 Hours | 1050k | ? | 7500
Analysis (ESD 1%) | Individuals (500) | 4 Times/Day | 4 Hours | 52k | |
Total utilized | | | | ~1400k | ~580 + x |
Total installed | | | | ~1700k | ~700 + y |
In order to meet the milestones shown at the beginning of this Section, CMS has a two-fold approach.
The HLT (High Level Trigger) studies and the simulation of the events required to support the DAQ and Physics TDRs have to be run during the first (and possibly the second) phase of the CMS Computing plan. Indeed, the real applications of simulation, reconstruction and analysis for the Trigger studies have already been started by CMS during this year (2000). CMS already has the software (though still under development) and a measure (though preliminary) of the resource load needed to execute these tasks in the coming years.
The implementation of the Tier-n architecture has to go through a controlled growth during the coming years (first two phases) to test and prove the functionality (and flexibility) of the Design. This process will lead CMS to a running system from the very beginning of LHC data taking.
These two approaches can well be carried out in parallel and indeed have to proceed in a coordinated way. Building the Tier-n Centres provides the resources needed for the simulation and analysis.
This scenario can certainly benefit from a Grid development; moreover it requires Grid-like tools to be implemented.
The Italian CMS community is actively involved in all the software and computing initiatives. The role of INFN in this "subdetector" development and commitment is as high as it is for the "signal-sensitive" detectors of CMS.
The final goal of the "Computing" detector is to provide the efficient means for the analysis and potential physics discovery to all the CMS Physicists.
As said in the Introduction, there are three main phases of the project. Details of those phases are given below, with particular attention to the INFN contribution and involvement. Requirements for the nearest-term phase are of course better defined, while the later phases carry increasing uncertainties (nevertheless within reasonable bounds), being projected into a technologically distant future.
Note: The CMS software project and development is only partially accounted in this document.
The (at least) two approaches to this phase can be identified as a "bottom-up" and "top-down" view.
"Bottom-up" means that the immediate (or few years term) needs of the Collaboration's Physicists are taken into account as the driving issue.
"Top-down" means that the longer (or starting year of LHC) term needs of the Collaboration as a whole are taken into account as the driving issue.
CMS is already using the simulation and reconstruction software for the off-line analysis of the trigger and physics channels. For the nearest-term goals, the in-depth study and optimization of the trigger algorithms is the driving effort. The simulation of the behaviour of the Level-1 trigger, and the simulation and study of the algorithms to be implemented on the on-line farm of commercial processors, have to be carried out according to the agreed milestones. The High Level Trigger algorithms (Level-2/3 Trigger) in CMS are implemented purely in software, their evaluation and definition being driven by the physics channels to be analyzed (and therefore recorded for subsequent trigger-tagged stream analyses).
The strong relation between the physics signature seen by the detector and the background suppression achievable by the HLT algorithms is considered one of the major capabilities of the CMS detector to be committed. The Trigger will define what can be observed and what the LHC Experiments will lose forever.
The rejection factor that the CMS Trigger has to provide against the background reactions is as large as 10^6. The High Level Trigger (Level-2/3) has to provide a rejection factor of the order of 10^3 in the shortest possible time and with the largest possible efficiency, without suppressing the signals being searched for.
These processes have to be carefully simulated and studied both at the physics level (generation processes) and at the detector-performance level (simulation and tracking through the apparatus being built).
Events from many different physics processes expected in proton-proton collisions at 14 TeV are already being produced for CMS, and a data sample of the order of 10^7 events per studied channel has to be simulated, reconstructed and trigger-algorithm tested against the many possible background processes.
This process of simulation and study has a planned schedule that will improve the statistical accuracy during the next three years.
CMS Italy (INFN) is strongly involved in this process of simulation and study (in many areas with a leading role), contributing at least 20% of the effort (both with computing resources and with physics analysis competences).
The percentage above matches well the INFN total effort for the CMS Program.
The Level-1 Trigger is implemented in hardware (custom ASICs) in CMS and processes the full 40 MHz bunch-crossing rate, with the responsibility of reducing the output rate to the level of 75 kHz. The High Level (2/3) Trigger has the responsibility of further reducing this 75 kHz input down to the level of 100 Hz, which is the maximum allowed for data recording. The raw data output of the CMS detector will be of the order of 1 MByte per event, thus giving a recording rate of about 100 MBytes per second.
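As an illustration of these rates, the small sketch below (plain Python; all numbers taken from the paragraph above) computes the rate-reduction factors and the resulting recording bandwidth.

    # Rate reduction along the CMS trigger chain, using the figures above.
    L1_INPUT_HZ = 40e6       # bunch-crossing rate at the Level-1 input
    L1_OUTPUT_HZ = 75e3      # Level-1 output rate
    HLT_OUTPUT_HZ = 100.0    # High Level (2/3) Trigger output rate
    EVENT_SIZE_MB = 1.0      # raw event size written to storage

    l1_reduction = L1_INPUT_HZ / L1_OUTPUT_HZ
    hlt_reduction = L1_OUTPUT_HZ / HLT_OUTPUT_HZ
    recording_mb_per_s = HLT_OUTPUT_HZ * EVENT_SIZE_MB

    print(f"Level-1 rate reduction : ~{l1_reduction:.0f}")    # ~530
    print(f"HLT rate reduction     : ~{hlt_reduction:.0f}")   # ~750
    print(f"recording rate         : ~{recording_mb_per_s:.0f} MB/s")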
A physics example of such an L1+HLT study is given here to better explain the project.
Let us consider the case of the muon trigger. The most demanding physics case, in terms of CPU cost, for studying at the required level of statistical accuracy (10%) the needed background rejection power is the single-muon topology, possibly accompanied by other triggering signatures (missing ET, high-pT jets, energetic electrons or photons). Physics signal channels relevant to this topology are, e.g., "associated" Standard Model Higgs production such as WH → µν bb and ttbarH → µνX bb, µνX γγ, and SUSY Higgs channels such as A, H, h → ττ → µν π X, all of them requiring single muons with pT as low as 15-20 GeV. Although the final trigger "cocktail" (the composition of the different CMS triggers) covering the overall 100 Hz rate foreseen for writing events to mass storage is of course not yet defined at this stage, an allowed rate of about 10-20 Hz for this topology appears to be a reasonable guess. The Higgs boson search through this single-muon signature in the final state requires the generation, the simulation through the detector, the reconstruction of the digitized signals and the analysis of the resulting data for many of the possible obscuring processes (with much larger cross-sections) besides the signal itself.
The background rate expected from muons (mainly originating in QCD jets, either from b, c decays, "prompt" muons, or from π/K decays in flight, "non-prompt" muons) surviving the "physical cuts" due to the magnetic field and the material in the inner part of the apparatus, and thus delivering signals in the CMS muon trigger system, is about 2 MHz. The bulk of this rate comes from low-pT muons (3-5 GeV/c); the rate for muon pT > 15 GeV/c is much lower, of the order of a few kHz. However, the effect of the detector resolution at the first and higher trigger levels must be carefully studied: in particular, the effect of non-Gaussian tails due to momentum mis-measurement for non-prompt muons (about 60% of the total muon yield), together with ghost-creation mechanisms. The whole L1+HLT trigger chain must then provide a rejection factor of more than 10^5, requiring the full simulation of 10^7 events in this topology. Moreover, the important role of physical correlations between muonic and calorimetric signatures in the background QCD jets prevents any further reduction at generator level of the amount of data to be fully simulated, if reliable estimates are to be given of the rates in single-muon topologies when energy-isolation criteria are applied to the muon, and/or of backgrounds to "mixed" trigger topologies (e.g. muon-electron, muon-jet). It must be stressed that in this study the events are anyway preselected and properly weighted at generator level by requiring the presence of (at least) one "potentially" triggering muon; this saves a factor of ~400 in CPU with respect to the straightforward generation of minimum-bias events, which occur at a rate of 800 MHz at full LHC luminosity.
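The size of the required simulated sample follows from a simple statistical argument; the sketch below (a rough illustration, with the 10% accuracy, the 10^5 rejection factor and the rates taken from the text above) makes it explicit.

    # Sample size needed to measure a background rate after rejection R
    # with ~10% statistical accuracy, and the gain of the generator-level
    # muon preselection with respect to plain minimum-bias generation.
    TARGET_ACCURACY = 0.10          # relative statistical accuracy
    REJECTION = 1e5                 # required L1+HLT rejection factor
    MIN_BIAS_RATE_HZ = 800e6        # minimum-bias rate at full luminosity
    TRIGGERING_MUON_RATE_HZ = 2e6   # rate of crossings with a triggering muon

    surviving_needed = 1.0 / TARGET_ACCURACY**2        # ~100 events
    to_simulate = surviving_needed * REJECTION         # ~10^7 events
    preselection_gain = MIN_BIAS_RATE_HZ / TRIGGERING_MUON_RATE_HZ

    print(f"events to simulate fully : ~{to_simulate:.0e}")
    print(f"preselection CPU gain    : factor ~{preselection_gain:.0f}")  # ~400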
In a second phase, "dedicated" samples, preselected by additional requirements at generation, of muon-electron/gamma, muon-high-pT-jet and muon-high-missing-ET topologies, each of the order of 10^6 events, can be envisaged.
Di-muon topologies (e.g. relevant for the H → WW → 2µ2ν and H → ZZ → µ+µ- jj signals) have background rates typically smaller by two orders of magnitude (jets → 2µ X, pile-up of two single-muon events, associated production of W/Z boson pairs, ttbar production), so background samples of 10^5 events will suffice.
Similar considerations apply to the other trigger topologies driven by electron/photon and jet/missing-ET signatures; although partial overlaps of data samples can be exploited at some stage of the analysis, it is clear that, given the similar reduction factors required in these channels too, initial "unbiased" samples of 10^7 events in the electron and jet topologies must also be considered. This brings the total data sample needed for the CMS trigger studies into the range of 3-5×10^7 events.
The CPU power currently required to simulate one event through the CMS detector is of the order of 2000 SpecInt95·s (about 100 s/event on a present-day PIII CPU), with a mean output event size of the order of 1 MByte. The digitization in the muon and calorimeter sub-detectors, the simulation of the trigger algorithms and the off-line reconstruction require (today) a similar amount of CPU power.
Tracker digitization and reconstruction (in the presence of pile-up) takes much longer, but it is supposed to be done either locally ("regional trigger") or in any case on sub-samples of data (about 10%) pre-filtered by the L1+L2 triggers. A conservative factor of 2 in the CPU requirement for reconstruction is attached to this phase of the event processing, leading, on average, to a total of 6000 SpecInt95·s per event.
These values have to be compared with the current MONARC evaluation for a "Full Monte Carlo" event production that is about 5000 SpecInt95*sec, and a reconstruction power of the order of 1000 SpecInt95*sec/event.
Since the physics channels to be "triggered" are numerous, as said above the CMS Collaboration has to simulate and study, by early 2001, of the order of 5×10^7 events. This estimate leads CMS to a necessary CPU power of the order of 3×10^4 SpecInt95 and about 50 TBytes of mass (disk) storage.
Thus, assuming INFN will contribute 20% of these resources, about 6000 SI95 and 20 TB of disk space should be installed in the Italian sites by year 2001.
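The 3×10^4 SI95 and 50 TB figures can be recovered with the back-of-the-envelope calculation below; the ~120-day production window is an assumption made here only to reproduce the quoted CPU power, and the 20 TB INFN disk figure quoted in the text includes roughly a factor of two of extra disk discussed later in this document.

    # Back-of-the-envelope version of the CPU and storage estimate above.
    N_EVENTS = 5e7                    # events to simulate and reconstruct
    SI95_SEC_PER_EVENT = 6000.0       # full chain, as estimated above
    EVENT_SIZE_MB = 1.0
    WINDOW_SEC = 120 * 86400.0        # assumed ~4-month production window
    INFN_SHARE = 0.20

    total_work = N_EVENTS * SI95_SEC_PER_EVENT        # ~3e11 SI95*s
    cpu_power = total_work / WINDOW_SEC               # ~3e4 SI95
    storage_tb = N_EVENTS * EVENT_SIZE_MB / 1e6       # 50 TB

    print(f"CPU power needed : ~{cpu_power:.0f} SI95")
    print(f"disk storage     : ~{storage_tb:.0f} TB")
    print(f"INFN 20% share   : ~{INFN_SHARE * cpu_power:.0f} SI95, "
          f"~{INFN_SHARE * storage_tb:.0f} TB of data")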
During year 2002 a factor of two increase in the complexity of the study is foreseen; the main emphasis will be on prototyping the off-line analysis, in preparation of the Physics TDR foreseen for the following year. A further background rejection of 4-5 orders of magnitude will be required to pass from the ~100 Hz output rate of the on-line system to rare signal signatures such as H → γγ or H → 4 leptons. Here the main backgrounds to be copiously generated are the production of heavy objects (top quark, W and Z bosons) delivering signatures expected to pass the on-line HLT selections.
Another factor of two can be expected for the following year, 2003, at the end of which the Physics TDR is due; it will establish the quasi-final physics discovery potential of CMS.
Here the full spectrum of possible new-physics signatures (SUSY Higgs bosons, superpartners of ordinary matter) must be studied against all possible sources of background. The analyses will differentiate considerably, increasing their degree of sophistication (e.g. implementing τ/b tags in the final environment) and involving an increasing number of physicists. Studies dedicated to heavy-flavour physics (e.g. CP violation in the B meson sector) in the low-luminosity scenario (which will deal with considerably lower thresholds, e.g. for J/ψ → µ+µ- detection) will also require non-negligible resources in terms of CPU power and mass storage.
The factor of "only" two per year is the result of the CMS confidence that we will learn how to increase the usage and efficiency of the resources, and that the understanding of the integration of the different sub-detectors will improve. Indeed the statistical requirements and the complexity of the studies increase by at least an order of magnitude over the three years; nevertheless we are confident that a factor of about 4 increase in resources will allow CMS to reach its goals (accounting for better knowledge and efficiency).
CMS Italy is of the order of 20% of the whole Collaboration, if we take into account the major contributors. Moreover the Italian Researchers are involved in the main CMS detectors (Tracker, muons, e/gamma calorimeter) with a "leading" role. CMS INFN is also well involved in the HLT studies and in the Physics studies; these elements have to be taken into account while planning the resources allocation and availability.
The CMS "hardware" milestones are shown in the following table.
CENTRAL SYSTEMS | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005
Functional prototype | ▲
Turn on initial systems | ▲
Fully operational system | ▲
REGIONAL CENTRES | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005
Identify initial candidates | ▲
Turn on functional centres | ▲
Fully operational centres | ▲
The planned schedule takes into account the final goal of having the "full" CMS Computing System running at the start of LHC data taking. In order to be "ready", the different Tier-n Centres have to be prototyped well in advance. The prototyping has to go through a progressive implementation of the hardware and also of the "functionalities" required to manage the resources in a coordinated way.
To guarantee the managing and the functionalities, the distributed hierarchical Model has to be tested and developed synchronously in all the involved sites.
The dimensions of the total hardware requirements for CMS Computing have already been presented in the Introduction Section.
The final design of the Computing System foreseen for the year 2005 is based on a distributed hierarchical Model of Tier-n Centres "a la MONARC". Namely, the CMS Model requires one Tier-0 Centre (at CERN) and some 5-6 Tier-1 Centres (candidates already identified are FNAL/US, Rutherford/UK, INFN/IT, IN2P3/FR, Moscow/RU, etc.). Many of the Collaboration Institutes already have plans for the funding and commitment of the Tier-1 Centres, and in addition there is a quite large number of Tier-2 candidates connected to each Tier-1 (from 2 to 6 Tier-2 Centres per Tier-1 Centre).
Apart from the first reconstruction and selection to be performed at CERN, it's clear from the table in the Introduction that the most demanding task is the Analysis. In the CMS Model this task is mainly performed at Tier-1 and Tier-2 Centres.
It is worthwhile to recall the Tier-1 and Tier-2 roles as defined by MONARC, even if some of the details may differ for CMS.
Tier-1 Centres are defined so as to provide all the technical services, all the data services needed for the analysis and, preferably, another class of data services (Monte Carlo production or data reprocessing, not necessarily from raw data).
The aggregated resources of all Tier1s should be comparable to CERN resources, thus providing the first 1/3 of outside resources of the "1/3-2/3 rule" (being the second 1/3 provided by the Tier2-3 Centres); we expect that there will be between 5 and 10 Tier1s supporting each experiment. As a consequence a Tier1 should provide resources to the experiment in the range 10 to 20% of CERN (although the functionality provided by automatic mass storage systems might be provided differently).
Furthermore, a Tier1 must be capable of coping with the experiments' needs as they increase with time, evolving in resource availability and use of technology.
A Tier-2 Centre should provide most of the technical services, but not at the same level as a Tier-1. Data services for analysis will be certainly the major activity in such a centre, while other data services might not be available.
A Tier-2 is similar to a Tier-1, but on a smaller scale; its services will be more focused on data analysis and it can be seen as "satellite" of a Tier-1 with which it exchanges data. A Tier-2 should have resources for LHC computing in the range 5 to 25 % of a Tier-1, or about 2% of the CERN Tier-0 Centre.
Services are defined, following MONARC, as follows:
Data Services
Technical Services
The dimension of the Tier hierarchy scales by a factor of the order of 20% per level (e.g. a Tier-1 Centre is about 20% of the Tier-0 Centre, and so on). These rules, together with the planned numbers of Tier-n Centres, make it possible to fulfil the "tacitly accepted" rule of 1/3 of the Computing resources at CERN and 2/3 outside. It should be noticed that, in addition to the majority of the Analysis tasks, the bulk of the Monte Carlo simulation would be performed outside CERN (at Tier-n, n>0).
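A toy rendering of this scaling, assuming 5 Tier-1s and about 4 Tier-2s per Tier-1 (both within the ranges quoted in this Section) and using the >600 kSI95 CERN figure given just below, shows how the "1/3 at CERN" rule comes about.

    # Toy illustration of the MONARC-style 20%-per-level scaling.
    TIER0_KSI95 = 600.0        # CERN Tier-0 CPU, year ~2006 estimate
    SCALE = 0.20               # each Tier ~20% of the level above
    N_TIER1 = 5                # 5-6 Tier-1s foreseen
    TIER2_PER_TIER1 = 4        # 2-6 Tier-2s per Tier-1

    tier1 = SCALE * TIER0_KSI95               # ~120 kSI95 each
    tier2 = SCALE * tier1                     # ~24 kSI95 each
    all_tier1 = N_TIER1 * tier1               # roughly comparable to CERN
    all_tier2 = N_TIER1 * TIER2_PER_TIER1 * tier2

    cern_fraction = TIER0_KSI95 / (TIER0_KSI95 + all_tier1 + all_tier2)
    print(f"all Tier-1s : ~{all_tier1:.0f} kSI95, all Tier-2s : ~{all_tier2:.0f} kSI95")
    print(f"CERN share  : ~{cern_fraction:.0%}")   # ~1/3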
CMS estimates (Computing Review currently in progress at CERN, World Wide Computing Panel) show that the CMS Tier-0 needs more than 600 kSI95 of CPU power, more than 550 TBytes of disk storage by year 2006 and more than 2 PBytes of sequential storage per year of running.
The data challenges foreseen in the Milestones and the need to grow the CMS distributed resources synchronously will require the implementation of about 10% of the Tier-n resources by year 2003 (turn-on of functional Centres and preparation for the phase of full functionality).
A Tier-1 Centre by the year 2005-6 (full run of LHC) for CMS has a dimension shown in the following table (only some of the key elements are presented):
CPU power | > 150 kSI95
Disk space | > 200 TBytes
"Tape" capacity | ~ 600 TBytes
Link speed to each Tier-2 | > 100 MBytes/sec
This equipment has to be doubled, except for the "tape" component, when considering a National commitment, including at least the Tier-2 Centres.
The CMS INFN Collaboration is committed to have a Tier-1 Centre (with an architecture described later in this document) and some 4-6 Tier-2 Centres (plus a smaller number of Tier-3 Centres). A reasonable plan to implement, by the end of year 2003, 10% of the resources and functionality of the final Tiers is proposed as follows: 20% of the 10% prototype acquired during year 2001, a further 30% added during year 2002, and the remaining 50% added during year 2003.
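A minimal sketch of this phasing follows, starting from the Tier-1 figures in the table above, doubled for the national (Tier-1 plus Tier-2) commitment and with the disk roughly doubled again as discussed below; the result approximates the hardware table given later in this Section, which also includes a start-up contingency.

    # 20% / 30% / 50% phasing of the ~10% prototype of the national system.
    FINAL_TIER1_CPU_KSI95 = 150.0
    FINAL_TIER1_DISK_TB = 200.0
    NATIONAL_FACTOR = 2.0        # Tier-1 plus Tier-2 centres
    PROTOTYPE_FRACTION = 0.10    # ~10% prototype by end of 2003
    EXTRA_DISK = 2.0             # extra disk assumed for the prototyping phase
    PHASING = {2001: 0.20, 2002: 0.30, 2003: 0.50}

    proto_cpu = FINAL_TIER1_CPU_KSI95 * NATIONAL_FACTOR * PROTOTYPE_FRACTION
    proto_disk = FINAL_TIER1_DISK_TB * NATIONAL_FACTOR * PROTOTYPE_FRACTION * EXTRA_DISK

    cpu = disk = 0.0
    for year, share in PHASING.items():
        cpu += share * proto_cpu
        disk += share * proto_disk
        print(f"{year}: ~{cpu * 1000:.0f} SI95 installed, ~{disk:.0f} TB of disk")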
It is the intention of the Italian researchers, in this phase, to exploit and study the possibility of a "disk-only" option for the mass storage of the Tier-1/2 systems; in this scheme, the disk and tape requirements of the table above must be considered together.
The proposed growth rate is somewhat more demanding than in other Countries, but it has to be noticed that ALL the other Countries committed to Tier-1 Centres already have a host site of large size and experience that traditionally acts as a reference Computing Centre (e.g. the Lyon Computing Centre for IN2P3, the Fermilab Computing Centre and the Rutherford Appleton Laboratory).
CMS Italy has to compete with those implementations and programs with its own program and specific tradition of distributed resources (the INFN Sections are spread over Italy and have a long experience in coordination and distribution of duties/resources).
During the first phase (2001-2003) CMS Italy will therefore implement and commit prototypes of Tier-1 and Tier-2 Centres, in strong collaboration with the other CMS Institutions.
The resources that CMS Italy needs in the coming years to prototype the Computing "detector" are extremely well matched to the physics needs described in the previous sub-section. None of the resources made available for prototyping the Tier-n architecture will be wasted, as they will be fully loaded by the simulation and analysis for the physics studies.
A specific issue of the Tier-1 Centres is the sequential storage, currently under wide discussion in the HEP community. CMS Italy is well involved in the discussion, and the final choices will have to be tested, also in view of the final decision on the Tier-1 architecture (see below).
The two approaches described above are part of the same overall design and can lead CMS Italy to fully participate to the CMS Computing challenge.
The prototyping of Tier-n Centres and the simulation/analysis of the Triggered physical/background events are in the Designed Scenario for CMS Computing, and therefore both benefit from each other.
The two approaches are indeed the same synchronous approach to the development of resources and coordinated functionalities. These activities have to face the large and difficult distribution of resources over WAN/LAN connections and the flexibility/robustness of the gluing applications.
Most of these concepts are part of the Grid (and Globus) projects, and the actual implementation has to go through the study, development and testing of the available tools.
CMS Italy is willing to test the Tools of the Grid with REAL Applications (from HLT studies, to ORCA running, to analysis of Trigger algorithms, to coordinated development of Farms and Tier-n) and to use the Grid products in a REAL WORLD.
Timing and size of the tests will be agreed between the CMS Researchers and the Grid developers, setting the priorities and keeping the CMS milestone deliverables as the leading project.
Whatever the status of the Grid development will be, the CMS production and deliverables will have to be honored.
The scheduled Data challenge and the HLT/Physics studies are fundamental objectives for CMS and therefore these results will have all the possible support from the Italian CMS Collaboration.
All the available resources will be used for this purpose, whatever the status of the tools development.
Grid projects development (INFN-Grid and EU-Grid "DATAGrid") will be pursued in strong relation with CMS activities.
CMS people are actively involved in such initiatives and the relation between CMS timely needs and Grid-like tools development can be defined in two possible classes of use cases.
A possible scheme of the common deliverables is as follows:
2001 | Tier-n first implementation; network connections; main centres in the Grid | HLT studies of 10^6-10^7 events/channel; coordination of OO databases; network tests on "ad hoc" links and "data mover" tests; all-disk storage option
2002 | Tier-1 dimension doubled; Tier-0 Centre connected (network and first functionalities); Grid extension including the EU project | Physics studies in preparation of the TDR and 5% data challenge for the Computing TDR; OO distributed databases (remote access); performance tests; mass storage studies
2003 | Infrastructure completion; functionality tests and CMS integration | Final studies for the Physics TDR and preparation for the 2004 data challenge; data challenge; full integration with CMS Analysis and Production
CMS Italy is already committed to the EU-Grid project with a qualified contribution to many of the Work Packages. The commitment to the project is part of a more general plan of development and prototyping for Regional Centres in Italy.
Indeed, the Italian CMS Researchers have already started to build a distributed architecture of resources via the HLT production and studies during the current year (2000). INFN CSN1 has approved and funded for 2000 a few small computing and storage systems in some Sites. Those systems are about to be delivered, and a plan to use them in a nationally coordinated way is already in progress. The February/March simulation production for the muon HLT was performed in a distributed, but standalone, way using the locally available resources and the more general national INFN Condor Pool; more specific tools for management and job scheduling are needed for the second round of HLT production foreseen for Fall 2000.
All the major Italian CMS sites are in the process of creating, managing and using a local Objectivity Database. These local databases have to be widely coordinated to guarantee the interchange of data between the Italian Sites and the rest of the CMS Collaboration.
The HLT studies and the data challenges are clear examples of HEP applications where distributed computing and distributed data access are of fundamental importance. The Grid concept matches these requirements well and, even if it might not be the only solution, can seriously help the CMS Computing design and realization.
CMS Italy already decided (at the end of 1999) to go for prototypes of Regional Centres, prototypes that will allow the INFN Researchers to test the feasibility of the infrastructure and the coordination with the whole CMS Collaboration. As said in the previous Section of this document, the progressive growth of the prototype infrastructure will also provide the means to perform the HLT studies and the mock data challenges.
The planned activity foresees building the prototypes from the beginning of 2001 and completing them by 2003, when a final decision to go for "Fully operational Centres" has to be made.
Flexibility of the prototyping design and of its implementation is the major issue of the plan, which is conceived to drive the growth in strict accordance with the CMS requirements and to take promptly into account the outcome of the trials.
Initial CMS Italy plan foresees high hierarchy prototyping, distributed among the Padova area (Laboratori Nazionali di Legnaro, leading site, and Padova INFN Section), the Bari area (Bari INFN Section), the Bologna area (with a contribution also of CNAF as a common INFN experiments' support, beside the Bologna INFN Section), the Pisa area (Pisa INFN Section) and the Rome area (Roma1 INFN Section).
The distributed infrastructure also has to test and prototype the lower-level architecture and functions, within the same global design of the CMS distributed hierarchical Computing. This approach is necessary to prove the feasibility of the CMS Computing Model, but also to give enough flexibility to the process of evaluating the results: it will allow a prompt rethinking of the strategy in case of resource unavailability or of new possibilities.
Initial Sites are already identified also for the smaller Centres in the INFN Sections of: Catania, Firenze, Perugia and Torino.
It has to be stressed that all the plans will undergo a periodical revision (milestones) taking into account the result obtained (globally and per site) as well as the possible availability of other resources also in common with other INFN Experiments.
Again, the present document describes an R&D project that will allow CMS Italy (and INFN) to scientifically take decisions about the CMS Computing and to significantly contribute to the preparation of Physics studies.
In the following the details of the infrastructure components are given, with a preliminary distribution of planned resources.
WAN and LAN infrastructure is the glue of all the proposed Computing Models. Network "is the System" according to real distributed computing and data access: the base concept of the Grid computing approach.
All the proposed LHC Computing architecture requires a fast, robust, efficient and guaranteed Network (both locally and remotely).
The hierarchical MONARC Model is strongly based on the WAN connection between Centres and the LAN connection between nodes (CPU and storage) of the Farms.
LAN architecture is an issue by itself, as can be argued from the CMS requirements shown in the table of Introduction. However the technology is there or is about to be there provided that FARM R&D is performed seriously. What is meant here is that a hierarchical architecture of Farm nodes has to be studied and tested, together with a strong development of the managing tools. Flows of data and of applications have to be carefully planned and controlled, taking into account internal (to the Farm) and external (Fabric) constraints.
A very simple example: a single-disk I/O capacity of 10 MBytes/sec (indeed, today's performance) is enough for a 2005 farm, but only if no more than a single job (or at most a few sub-jobs) is accessing it. This granularity of the farm design is hard to guarantee against failures while keeping high efficiency and availability. The top-level switch of the farm hierarchy may require a switching capacity (hundreds of Gbps per port) well above the optimistic technology projections.
CMS Computing Plans foresee inter-Country connections of the order of the Gbps (e.g. transatlantic connections) and better performance intra-Country connections.
The reasons for such bandwidth (not the only technical requirement for these connections, since RTT, latency, BER, mean and peak packet loss and guaranteed QoS are some of the other "variables" to be considered) rest on two major requirements: the first is the CMS Analysis Design Model, which requires the transfer of data access and the movement of applications between Regional Centres; the second is that the Network will be essential for all the activities of the Experiment, from the Information System to the mailing system, the videoconferencing system, the distributed software development and maintenance, etc.
The planned WAN capacity between the Regional Centre prototypes is as follows:
(Mbps) | 2001 | 2002 | 2003
WAN link to Tier-0 (CERN) | 155 | 622 | 1000
WAN link between Tier-1 sites | 34 | 155 | 1000
WAN link between Tier-1 and Tier-2 sites | 34 | 155 | 155?
The Tier-1 prototype should be a single "logical" unit, independently of its own internal structure.
The proposed increase over the years of the project will be verified against the real bandwidth requirements of the Tier-n prototyping and of the real activity performed at the different sites. The development of responsibilities and functionalities will again be the driving criterion.
Tier-1 to Tier-2 connections will allow for transfer of analysis datasets in a reasonable time and also the transfer of specific samples of quite large dimension for special purposes (e.g. calibrations, background tests, test beam data).
Tier-1 to Tier-0 connection will allow CMS Italy to participate to the whole CMS Computing program and to the EU-Grid project tests. Specific example of the use of this connection is the replication of parts of the Objectivity Database in both directions.
All the proposed links will contribute to the development of the INFN-Grid project.
An example of network load is that a 10 GBytes sample of AOD objects can be transferred for specific Analysis to a Tier-2 Centre in about one hour in 2001.
Another example is the transfer during one night (2001) of a subset of the order of 1% of the ESD objects for specific reconstruction tests.
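These two examples are consistent with the 2001 link speeds in the table above, as the small sketch below shows; the 70% link efficiency and the 10-hour night window are assumptions used only for illustration.

    # Transfer-time cross-check for the two network-load examples above.
    def transfer_hours(size_gb, link_mbps, efficiency=0.7):
        """Naive transfer time; `efficiency` accounts for protocol overheads."""
        return size_gb * 1000.0 * 8.0 / (link_mbps * efficiency) / 3600.0

    def overnight_volume_gb(link_mbps, hours=10.0, efficiency=0.7):
        """Data volume that fits in an overnight transfer window."""
        return link_mbps * efficiency * hours * 3600.0 / 8.0 / 1000.0

    print(f"10 GB of AOD over a 34 Mbps Tier-1/Tier-2 link : ~{transfer_hours(10, 34):.1f} h")
    print(f"volume movable overnight (10 h) at 34 Mbps     : ~{overnight_volume_gb(34):.0f} GB")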
Local commodity farm nodes are today typically interconnected via 100 Mbps Ethernet switches with Gbps uplinks to the commodity data servers. General-purpose networked file systems (e.g. NFS) or specialized client/server applications, as in the case of DBMS (e.g. AMS for Objectivity), are used to share data archives between the farm nodes and the data servers. As the typical I/O bandwidth in these cases is of the order of 5 MByte/s per farm node and 40 MByte/s per data server, we can argue that the main bandwidth limitations are not in the link speed itself, but in the not-yet-optimized network protocols that waste PC memory bandwidth (too many memory copies). Ethernet link speed will easily grow to 10 Gbps within the next years, but an increased PC memory bandwidth and a serious step forward in the communication software are also needed to fully exploit, at the application level, the available network bandwidth. Storage Area Networks (SAN) are also used to share disks between farm nodes (e.g. Fibre Channel disk arrays). In this case too the hardware has proved to run according to the requirements, but the software functionality provided is still quite far from what is required.
Hardware is traditionally considered the most demanding issue of a Computing plan. However, the new architecture design of modern computing (distributed or even "a la Grid") has many other demanding components: hardware components are already today just a matter of the best performance/price choice. This does not mean that hardware choices will be negligible; it just establishes that hardware is ONE of the components of the "puzzle".
The main items related to hardware are the CPUs, the "on-line" random access storage (disks), the "archive" sequential storage (tapes or tape-like supports/devices), the elements connecting the CPU and storage boxes (switches, etc.), the boxes connecting to the outside world (routers, etc.), the users' access means (PCs or "terminals") and, finally, the Operating System to be used. The last item may not look like "hardware" in itself, but the hardware choice is already today strongly tied to the OS to be used, and the market will stress this aspect even more in the future.
During this prototyping phase the hardware installation has to progressively increase in order to allow both the increasing demand for simulation and analysis and the test of Tier-1 and Tier-2 prototypes.
As previously said, Italian CMS is about 20% of the whole CMS Collaboration. Having in mind an installation by year 2003 of the order of 10% of final one, it's necessary to acquire about 32000 SI95 of CPU power and about 80 TBytes of disk space.
These two figures come from the evaluations discussed in the Objectives Section. It should be made clear that both evaluations lead to the same hardware needs. The bottom-up approach foresees a doubling of the resources per year, even though the simulation and analysis load increases by an order of magnitude over the three years. Learning of the system and better efficiency are included in the evaluation, thus matching the requirements.
The top-down approach has a growing plan of 20%, 30% and 50% installation in the three years, with similar results.
The disk space required is about a factor two larger than what can be argued from the numbers shown before. At least three considerations support the disk increase:
The requested hardware is therefore 8000 SI95 of CPU power and 20 TBytes of disk space by year 2001, a further 8000 SI95 of CPU power and 20 TBytes of disk added during year 2002, and 16000 SI95 of CPU power and 40 TBytes of disk during year 2003. Setting up the initial system naturally implies low efficiency and late delivery, thus leading to resource demands a few percent higher than what was optimistically evaluated. The table that follows in this section takes this sort of contingency (or rather start-up overhead) into account.
LAN switches to connect the CPU boxes and the disk boxes, as well as the WAN routers and switches, have to be planned for the three years. The WAN equipment is foreseen only for the last year, since it is possible to use the existing equipment of most of the INFN Sections during the prototyping phase of the first two years. One LAN unit per participating site per year is foreseen, with a reduced unit for the smaller installations.
Tape libraries are needed for "backups". The CMS INFN Model does not foresee the "active" use of sequential storage in the prototypes, but some permanent storage media should nevertheless be provided. The tape libraries will be dimensioned to the amount of disk storage locally installed, starting in the first year with only one installation in a single site and doubling the number of sites per year. Because of the evolution of the sequential storage market, the INFN prototyping will have to look at new technologies (e.g. DVD disks) and compare them with the traditional tape libraries.
The following table summarizes the hardware needs of CMS Italy for the first Phase (the integrated requested resources over the three years are given).
Resources \ Year | 2001 | 2002 | 2003
Total CPU (SI95) | 8000 | 16000 | 32000
Total mass storage (TB) | 12 | 40 | 80
LAN & WAN equipment (units) | 7 | 13 | 20
Tape libraries (units) | 1 | 3 | 7
Software is of course another element of the CMS Computing project. The middleware "software" is described in the following sub-section; here is described the "infrastructure" part of what can be called "traditional" software.
There are two main components of this software: the HEP (and CMS) specific software and the "commercial" needed (support) software.
The CMS- and HEP-specific software has to be developed and maintained in all the INFN Sections, since it is the basis for the analysis tasks.
Some parts of the specific sub-detector software will be developed and maintained by the INFN Sections involved in the corresponding sub-detectors. However, the role of the Regional Centres also includes software development and maintenance, and therefore the Tier-1 and Tier-2 prototypes will have to take care of the overall (and also the specific) CMS software issues.
It should be remembered that final implementation of the Regional Centres (Tier1 in particular) has to provide, according to MONARC, support for the technical services and the data services for analysis. In particular two items can be recalled: basic and experiment-specific software maintenance, and support for experiment-specific software development.
Commercial (support) software is a wide container for many needed items. Some of them are: Operating Systems (with the timely needed patches and version), compilers at the required release (and patches), AFS, CVS, Objectivity, software tools for development and maintenance, browsers, Web servers, videoconference tools and allocation system, backup products, scientific libraries, office tools, etc.
Apart from the Personnel and the Resources needed to provide those services, some of the software packages have to be acquired under licence. A Collaboration agreement may cover part of these fees (as for Objectivity), but many of the others have to be accounted for in the CMS Computing budget.
Middleware is a word used today in strong relation with the Grid concept. However, middleware software has existed for a long time and is well known in the HEP world.
All the tools used so far to manage and run the applications (jobs) that physicists need when doing analysis are middleware. Moreover, all the procedures and tools used to make available and to coordinate the resources used for "running" an Experiment are also related to middleware.
Since CMS is an Experiment with a large need for distributed computing resources, the role of the middleware software is all the more important.
The allocation of resources, finding the best available resources, migration of data and/or executables, bookkeeping, monitoring, automatic control of farms, procedures for the configuration of resources, batch systems, priority and access controls, etc., are all part of middleware development.
The need of such tools will increase with the increasing complexity and dimension of the CMS Computing System. It's clear that a strong coordination between all the participating Institutions is needed.
Most of the middleware development and testing has to start with the prototyping phase (First Phase), in order to understand the different possibilities and to define a strategy for the future deployment (Second Phase) and running (Third Phase) of the full CMS Computing System.
The following items should be addressed during this phase (a toy illustration follows this list):
Job submission
Bookkeeping
Monitoring
Resource coordination
Information publishing
Data management
Authentication
Resource integration
Job retrieval and control
Resource optimization
Farm managing
User information and access
Scheduling
Etc.
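As a toy illustration (not a description of any specific Grid product) of how several of the items above fit together, the sketch below publishes the state of a few resources, matches a job's requirements against them, picks the least loaded suitable site and records the decision for bookkeeping; the site names and the scheduling policy are purely illustrative.

    from dataclasses import dataclass, field

    @dataclass
    class Resource:                 # "information publishing": site state
        site: str
        free_cpus: int
        free_disk_tb: float

    @dataclass
    class Job:                      # a user job with its requirements
        name: str
        cpus: int
        disk_tb: float
        history: list = field(default_factory=list)   # bookkeeping / monitoring

    def schedule(job, resources):
        """Toy scheduling policy: most free CPUs among matching resources."""
        candidates = [r for r in resources
                      if r.free_cpus >= job.cpus and r.free_disk_tb >= job.disk_tb]
        if not candidates:
            job.history.append("no matching resource, job queued")
            return None
        best = max(candidates, key=lambda r: r.free_cpus)
        best.free_cpus -= job.cpus
        best.free_disk_tb -= job.disk_tb
        job.history.append(f"submitted to {best.site}")
        return best

    farm = [Resource("LNL", 120, 5.0), Resource("Bari", 30, 1.0)]
    job = Job("hlt-muon-simulation", cpus=20, disk_tb=0.5)
    schedule(job, farm)
    print(job.history)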
Installation and Personnel spaces are required at each Site involved. In particular Personnel space is only required when additional Personnel is identified.
Installation space and infrastructure (electric power, air conditioning, etc.) are clearly related to the size of the local system, but they should not be forgotten for the small sites either.
The criterion for resource distribution is basically built on the responsibility (and expertise) of testing identified functions of the Tier-n implementation, and also on the involvement of the local Researchers in the Physics studies.
The following preliminary table shows a possible distribution during the three years of the CPU and disk storage resources. It should be stressed that all the resources will be used for simulation and production of data. As already said all the sites will be involved in the prototyping of Tier-n Centres.
The proposed scheme intends to assess the work done and the completion of engagements at the end of the second year; for this reason the third column does not share hardware resources among the sites.
As shown in the table a main site will concentrate a large amount of resources, in terms of CPU and storage. This site may be considered as a tentative prototype for a CMS tier-1, and its prototyping activity will be mainly addressed towards PC-farm developing.
This site is currently mapped to Legnaro.
According to the hierarchical approach designed by MONARC, a layer of sites with important although smaller resources follows; these sites are tentatively identified on the basis of availability of manpower, physical space, network connectivity, software competence (not necessarily in that order).
They all have equal hardware resources; they all share the charge to carry on the tasks needed by the Collaboration on the physics side (HLT production and analysis, calibration simulation and other) as well as to develop tools for automatizing software tasks, managing the data, making data analysis easier to the average user.
All developments will be integrated in the Grid project; the sites will concentrate on developing functionalities as
A tentative identification of these sites includes: Bari, Bologna, Padova, Pisa and Roma.
Finally a layer of sites with initially less hardware resources, again evenly assigned, is foreseen; the goal is twofold: on one side extend the competence in the computing activities to as many people as possible, on another to join the data analysis bringing in the detector knowledge.
Site \ Year | 2001 | 2002 | 2003
LNL | 3100 SI95, 5 TB | 6200 SI95, 21 TB | 13000 SI95, 40 TB
Bari | 700 SI95, 1 TB | 1400 SI95, 3 TB | -
Bologna | 700 SI95, 1 TB | 1400 SI95, 3 TB | -
Padova | 700 SI95, 1 TB | 1400 SI95, 3 TB | -
Pisa | 700 SI95, 1 TB | 1400 SI95, 3 TB | -
Roma1 | 700 SI95, 1 TB | 1400 SI95, 3 TB | -
Catania | 350 SI95, 0.5 TB | 700 SI95, 1 TB | -
Firenze | 350 SI95, 0.5 TB | 700 SI95, 1 TB | -
Perugia | 350 SI95, 0.5 TB | 700 SI95, 1 TB | -
Torino | 350 SI95, 0.5 TB | 700 SI95, 1 TB | -
Total | 8000 SI95, 12 TB | 16000 SI95, 40 TB | 32000 SI95, 80 TB
The baseline for cost is as follows:
1 "Farm PC" is rated at ~2.5 Mlit (beginning 2000) with a decrease of ~40% per year. A mean power of a "beginning 2000" CPU is rated at the "sweet point" to be about 20 SI95.
1 TByte of disk space is rated at ~120 Mlit (beginning 2000) with a decrease of ~35% per year. The accounted price is for a "SAN" server with all the necessary devices to be accessed (including access stations and software).
For installations smaller than a TByte, disk space is rated at ~90 Mlit/TByte (beginning 2000) using more traditional SCSI packages; the decrease rate is expected to be similar.
1 Tape library is rated at ~125 Mlit (beginning 2000) with a decrease of ~20% per year.
Network (LAN and WAN) unit costs are just estimates, ranging from 120 Mlit for a Top class switched system to 5 Mlit for a few port FE switch (beginning 2000).
Wan network links are not included (rapid evolution of the market).
PRELIMINARY ESTIMATES!
Costs in Mlit | 2001 | 2002 | 2003 | Total
Processors | 573 | 344 | 414 | 1331
Disks | 896 | 1392 | 1318 | 3606
Tape libraries | 100 | 160 | 256 | 516
LAN | 207 | 48 | 108 | 363
Tape media | 144 | 129 | 153 | 426
Total | 1920 | 2073 | 2249 | 6242
It should be noted that ALL the software costs (tools, licences, OS, compilers, etc.) still have to be evaluated and added. Moreover, the Personnel cost is not included, as is usual practice in Italy (and in the EU).
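The cost table can be approximated from the unit-cost baseline given above; the sketch below applies the quoted price-decrease rates to the yearly CPU and disk additions of the hardware table (the official figures also include contingency and different deployment details, so the match is only approximate).

    # Approximate reconstruction of the CPU and disk rows of the cost table.
    PC_PRICE_MLIT_2000 = 2.5            # "Farm PC" of ~20 SI95
    PC_DECREASE = 0.40                  # price decrease per year
    DISK_PRICE_MLIT_PER_TB_2000 = 120.0
    DISK_DECREASE = 0.35

    # yearly additions (SI95, TB) from the hardware table of this Section
    additions = {2001: (8000, 12), 2002: (8000, 28), 2003: (16000, 40)}

    for year, (si95, tb) in additions.items():
        n = year - 2000
        pc_price = PC_PRICE_MLIT_2000 * (1 - PC_DECREASE) ** n
        disk_price = DISK_PRICE_MLIT_PER_TB_2000 * (1 - DISK_DECREASE) ** n
        print(f"{year}: CPU ~{(si95 / 20.0) * pc_price:.0f} Mlit, "
              f"disk ~{tb * disk_price:.0f} Mlit")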
Background information on current thinking about the LHCb Computing Model
The current baseline LHCb computing model is based on a distributed multi-tier regional centre model following that described by MONARC. The facility at which data are produced we call the production centre. In the case of real data CERN would be the production centre. In the case of simulation, an external facility would typically be the production centre and CERN would fulfil the role of a regional centre giving those physicists resident at CERN access to the simulated data.
We assume that regional centres (Tier 1) have significant CPU resources and a capability to archive and serve large data sets for all production activities, both for analysis of real data and for production and analysis of simulated data. Local institutes are assumed to have sufficient CPU and storage resources for serving the physics analysis needs of their physicists. A local institute may also have significant CPU resources for running compute intensive applications and this resource must be fully exploited by the computing model.
At present we also assume that the production centre will be responsible for all production processing phases i.e. for generation of data, and for the reconstruction and production analysis of these data. The production centre will store all data generated by the entire production, namely RAW, ESD, AOD and TAG data. Furthermore we assume that physicists will do the bulk of their analysis using the AOD and TAG data only and therefore only the AOD and TAG data will be shipped to other regional centres on an automatic basis. Some data generated during reconstruction may be needed on a regular basis in physics analysis jobs. If this is the case then we expect the amount of data to be small and would consider including these data in the AOD data that is shipped to the regional centres. This would remove the need for shipping the ESD data as well.
The consequences of this approach are as follows:
Although the basic architecture of the LHCb computing model corresponds well with the MONARC model, there are a number of details that distinguish LHCb from the larger LHC experiments:
In the LHCb computing model the main role of the Grid middleware is the management of data transfers between geographically distributed computing systems. The way in which the baseline model would be applied in practice can be understood through specific examples. Here we describe three scenarios for the analysis of different data samples by a physicist situated remotely from the data production centre.
The selection algorithm must first scan the TAG data corresponding to all 10^9 events. The selection criteria will identify 10^7 candidates with physics features of interest during the production analysis phase. The physicist's analysis job will be run on the AOD data corresponding to these events to select the particular B0 → J/ψ K0s candidates (~10^6). The number of real, fully reconstructed B0 → J/ψ K0s events is expected to be ~10^5. The analysis job outputs ntuples and statistical information, and these data are interrogated interactively many times.
In some cases the decay selection algorithm may need to be changed, in which case the whole procedure will have to be repeated on the full 10^9 events. Otherwise the user physics analysis algorithm may change, but this will be run only on the 10^7 selected events.
In addition, systematic studies will be performed (to look at the influence of particle identification, tracking, etc. on the algorithms) that require access to the ESD information (a sample of 10^5 ESD events can be enough). Physicists may look in detail, for example using the event display, at the complete event information, including the raw data, for very small samples of ~100 events.
This analysis would be realised as follows:
The copying of data sets between INFN Tier-1 and Ferrara Tier-3 only has to be done once and should not be repeated for each Ferrara physicist doing analysis. A tag database is needed to keep track of which data is available locally. The data caching and replication software (a part of Grid software) may help to ensure this happens transparently for the user.
This scenario is very similar to scenario 1. In this case, however, the AOD data size (2 TB) is greater by one order of magnitude than in the preceding case and cannot be transferred over the WAN from the INFN Tier-1 to the Ferrara Tier-3 in a reasonable amount of time. This analysis, therefore, would presumably be performed at the INFN Tier-1 instead of at the Ferrara Tier-3 (moving jobs to the data instead of moving data to the computing facility).
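A minimal sketch of the decision rule behind these two scenarios follows; the 24-hour transfer window and the 70% link efficiency are illustrative assumptions, and the 34 Mbps figure is the 2001 Tier-1/Tier-3 link speed from the test-bed table given later in this Section.

    # "Move the data to the job" versus "move the job to the data".
    def plan_analysis(dataset_tb, link_mbps, max_hours=24.0, efficiency=0.7):
        hours = dataset_tb * 1e6 * 8.0 / (link_mbps * efficiency) / 3600.0
        if hours <= max_hours:
            return f"replicate to the local Tier-3 (~{hours:.0f} h transfer)"
        return f"submit the job to the Tier-1 holding the data (~{hours:.0f} h to transfer)"

    print("0.2 TB AOD sample :", plan_analysis(0.2, link_mbps=34))
    print("2 TB AOD sample   :", plan_analysis(2.0, link_mbps=34))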
The sample of events to be analysed corresponds to the total real data sample, i.e. 10^9 events. The total amount of data generated is 12 TB of GEN, 200 TB of RAW, 100 TB of ESD, 20 TB of AOD and 1 TB of TAG data.
As can be seen, to perform the physics analyses large amounts of data have to be transferred over the WAN, in a rational way (so as to avoid overloads and bottlenecks) and, possibly, transparently for the users (so that physicists do not need to worry about data location and transfer). The tasks of the Grid middleware can be summarized as follows:
In the following we describe a possible scenario for carrying out the study on a distributed computing infrastructure, which could serve as a useful test-bed for the Grid activity.
The total sample of simulated B → J/ψ K* events needed for this analysis should be ~10 times the number produced in the real data. In one year of data taking we expect to collect and reconstruct 10^5 events, and therefore the number of simulated B → J/ψ Ks π0 events to be generated, stored and reconstructed is 10^6 (the same considerations apply to the second reaction channel mentioned above).
The simulation involves a number of steps:
The CPU power required to simulate the 10^6 physics processes is shown in the table below.
Table 4 - CPU power to simulate 1 million physics processes

Step | Number of events | CPU time/event [SI95·s] | Total CPU time [SI95·s]
Physics generator | 10^9 | 200 | 2×10^11
GEANT tracking | 10^8 | 1000 | 10^11
Digitisation | 10^8 | 100 | 10^10
Trigger | 10^8 | 100 | 10^10
Reconstruction | 10^7 | 250 | 2.5×10^9
Final state reconstruction | 10^6 | 20 | 2×10^7
The total CPU work required is about 3×10^11 SI95·s, corresponding to an installed CPU capacity of 10000 SI95 to obtain the required data sample in one year.
Moreover, a factor of 100 more would be required to generate a reasonably sized sample of background events. By making cuts as early as possible in the simulation sequence, the expected CPU requirement for background generation can be estimated to be a factor of about 4 greater than for the simulated signal sample.
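The installed-capacity figures quoted above follow directly from Table 4, as the short sketch below shows.

    # Installed CPU capacity implied by the simulation work of Table 4.
    SECONDS_PER_YEAR = 3.15e7
    steps_si95_sec = {
        "physics generator": 2e11,
        "GEANT tracking": 1e11,
        "digitisation": 1e10,
        "trigger": 1e10,
        "reconstruction": 2.5e9,
        "final state reconstruction": 2e7,
    }

    total_work = sum(steps_si95_sec.values())          # ~3.2e11 SI95*s
    signal_capacity = total_work / SECONDS_PER_YEAR    # ~1e4 SI95
    background_capacity = 4 * signal_capacity          # factor ~4 for background

    print(f"total work          : {total_work:.1e} SI95*s")
    print(f"signal capacity     : ~{signal_capacity:.0f} SI95")
    print(f"background capacity : ~{background_capacity:.0f} SI95")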
Estimates of data volumes and CPU requirements for processing and storage of simulated data are given in the table below.
Table 5 - CPU requirements and data volumes

CPU power for signal events | 10000 SI95
CPU power for background events | 40000 SI95
Raw data size per event | 200 kB
Total raw data per production | 10^8 × 200 kB = 20 TB
Generator data size per event | 12 kB
Total generator data | 1.2 TB
ESD data size per event | 100 kB
Total ESD data per production | 10^8 × 100 kB = 10 TB
AOD data size per event | 20 kB
Total AOD data per production | 10^8 × 20 kB = 2 TB
TAG data size per event | 1 kB
Total TAG data per production | 10^8 × 1 kB = 0.1 TB
LHCb Italy is planning for one Tier-1 Regional Centre (concentrated in one site, in a "Consorzio di calcolo", to be chosen on the basis of housing and manpower costs) and eight Tier-3 centres (Bologna, Cagliari, Ferrara, Firenze, Genova, Milano, Roma-1 and Roma-2).
The LHCb collaboration is planning to start the Monte Carlo production immediately. The computing power demand in the short term is considerable and therefore drives the plan of computing acquisitions. The development plan of the LHCb computing system therefore follows a bottom-up view.
This on-going physics production work will be used, as far as is practicable, for testing development of the computing infrastructure.
From 2002 to 2004 we will make tests to validate our computing model. This will include deploying software for operating compute facilities and for supporting distributed processing.
The validation of the Grid software should intersect with the scalability tests for the off-line farms of the regional centres:
To make these tests, because of the non-linear scaling of complexity with size, a test-bed of considerable size is required. A top-down plan of the computing system is therefore also needed, one that starts from the final size and schedules the short-term milestones so as to allow the validation on a reasonably sized facility.
The LHCb collaboration sets the prototype size at about 10% of the final dimension.
To test the Grid software with LHCb use cases, we estimate the following requirements for test-bed computing resources.
Table 6 - Capacity targets for the LHCb test-bed

 | Units | 2001 | 2002 | 2003
CPU capacity | SI95 | 2300 | 6800 | 20300
Estimated number of CPUs | # | 50 | 111 | 225
Disk capacity | TBytes | 0.25 | 0.34 | 0.72
Tape storage capacity | TBytes | 0.10 | 0.16 | 0.28
WAN link to CERN | Mbits/s | 155 | 622 | 1000
WAN link between Tier-1 sites | Mbits/s | 34 | 155 | 1000
WAN links between Tier-1 and Tier-3 sites | Mbits/s | 34 | 155 | 155
The cost of the hardware acquisitions in 2001-2003 is estimated at about 525 k€, the electric power supply at 83 k€, housing at 53 k€ and outsourced system manpower at 143 k€.
The total LHCb costing for 2001-2003 amounts to about 0.8 M€.
The scientific goal of the Virgo project is the detection of gravitational waves in a frequency band from a few hertz up to 10 kHz. Virgo is an observatory that has to operate and take data continuously in time. The data production rate of the instrument is ~4 MByte/s, including the control signals of the interferometer, the data from the environmental monitoring system and the main interferometer signals. The ensemble of these data is called the raw data set. It is stored on the on-line data distribution disks and then recorded on tape.
All these operations are performed in the VIRGO laboratory located in Cascina, where the reconstruction of the physics calibrated signal of the interferometer h(t), i.e. the strain deformation of the interferometer, is computed on line.
Raw data will be available for off-line processing from the Cascina archive and from a mirror archive in the computer center of the IN2P3 of Lyon (France).
The reconstructed h(t) data, plus a reduced set of data from the monitoring system, define the so-called reduced data set. The production rate of the reduced data is expected to be of the order of 200 kByte/s.
The off-line data analysis will mainly concern this second set of data, which has to be transmitted and replicated continuously to the various laboratories (or computer centres) of the collaboration.
In conclusion, Cascina hosts the data production and storage, the on-line processing and a part of the computing power for the off-line data processing: Cascina plays a role similar to that of CERN with respect to the LHC experiments.
The off line analysis cases that are most demanding from the point of view of computational cost and storage are the following:
In the following we propose to approach these problems in the framework of the Grid project.
The search for the inspiralling binary gravitational-wave signal is based on the well-established technique of matched filtering.
This method correlates the detector output with a bank of templates chosen to represent the possible signals with sufficient accuracy over the desired range of parameter space. The balance between computational cost and storage requirements depends on the computational strategy adopted. Assuming that the templates are stored and not generated on the fly, the main computational cost of the general procedure is due to the Fast Fourier Transform (FFT), since all the main steps of the procedure are carried out in the frequency domain.
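As an illustration of the frequency-domain matched filter described above, the sketch below correlates a strain time series with one template using FFTs and returns the signal-to-noise ratio time series. It is a minimal example with toy data and a flat noise spectrum; the template bank, the real Virgo noise curve and the candidate selection are placeholders, not Virgo code.

    # Minimal sketch of a frequency-domain matched filter (illustrative only).
    import numpy as np

    def matched_filter_snr(strain, template, psd, fs):
        """Return the signal-to-noise ratio time series of one template.
        strain   : h(t) samples (1-D array)
        template : template samples, same length as strain
        psd      : one-sided noise power spectral density, len(strain)//2 + 1 bins
        fs       : sampling frequency in Hz
        """
        n = len(strain)
        df = fs / n
        hf = np.fft.rfft(strain)
        tf = np.fft.rfft(template)
        # Noise-weighted correlation in the frequency domain (one FFT per template).
        corr = 4.0 * df * np.fft.irfft(hf * np.conj(tf) / psd, n)
        # Template normalisation <t|t>.
        sigma2 = 4.0 * df * np.sum(np.abs(tf) ** 2 / psd)
        return np.abs(corr) / np.sqrt(sigma2)

    # Toy usage: white noise plus an injected chirp-like template.
    fs, n = 4096, 4096 * 4
    t = np.arange(n) / fs
    template = np.sin(2 * np.pi * (50 + 20 * t) * t) * np.exp(-(t - 2) ** 2)
    strain = np.random.normal(0, 1, n) + 0.5 * template
    psd = np.full(n // 2 + 1, 2.0 / fs)          # flat PSD for unit-variance white noise
    snr = matched_filter_snr(strain, template, psd, fs)
    print("peak SNR:", snr.max())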
Given the raw data production rate and the present status of the network bandwidth, we propose to analyze the data starting from the reconstructed interferometer strain h(t).
The computing load is distributed over different processors dealing with different families of templates; the outputs are then collected in order to perform the statistical tests and extract the candidates.
The computational cost and the storage requirements are non-linear functions of the coordinates of the parameter space, so that the optimization of the sharing of the computing resources and the efficiency of the job fragmentation will be the crucial points of the test.
The goal of the test bed is the application of the analysis procedure to the data generated by the Virgo central interferometer. The output of this test is essential in order to implement on-line triggers for the binary signal search.
The requirement for the search for signals generated by binary star systems with a minimum mass of 0.25 solar masses, at 91% of the signal-to-noise ratio (SNR), is
Dedicated links connecting Napoli, Perugia, Firenze and Cascina must be foreseen.
The case of the pulsar signal search is more demanding in terms of computational resources. The search is performed by analyzing data stored in a relational database of short FFTs derived from h(t). The method is basically hierarchical: in the first step the frequency band is divided into intervals, and the fragmentation of the computational job is directly connected to the selected frequency intervals; in the following steps the subsequent fragmentation depends on the result of the previous step. The computational procedure is therefore characterized by a higher level of complexity in defining and controlling the job granularity.
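The hierarchical fragmentation described above can be illustrated with the following sketch: the frequency band is split into intervals that become independent jobs, and only the intervals returning candidates are re-fragmented more finely at the next level. The analyse() callback and the numbers of jobs and levels are placeholders, not the actual Virgo pipeline.

    # Illustrative sketch (not Virgo code) of hierarchical job fragmentation.
    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    @dataclass
    class Job:
        f_low: float   # Hz
        f_high: float  # Hz
        level: int     # hierarchy level

    def fragment(band: Tuple[float, float], n_jobs: int, level: int) -> List[Job]:
        f_low, f_high = band
        step = (f_high - f_low) / n_jobs
        return [Job(f_low + i * step, f_low + (i + 1) * step, level) for i in range(n_jobs)]

    def hierarchical_search(band, analyse: Callable[[Job], bool],
                            n_jobs=64, refine=4, max_level=3) -> List[Job]:
        """analyse(job) is the (distributed) analysis of one frequency interval;
        it returns True if the interval contains candidates worth refining."""
        surviving = []
        jobs = fragment(band, n_jobs, level=0)
        for level in range(max_level):
            candidates = [j for j in jobs if analyse(j)]
            if level == max_level - 1:
                surviving = candidates
                break
            # Re-fragment only the intervals with candidates, with finer granularity.
            jobs = [sub for j in candidates
                    for sub in fragment((j.f_low, j.f_high), refine, level + 1)]
        return surviving

    # Toy usage: pretend candidates only appear around 100 Hz.
    found = hierarchical_search((10.0, 1250.0),
                                analyse=lambda j: j.f_low <= 100.0 < j.f_high)
    print([(round(j.f_low, 2), round(j.f_high, 2)) for j in found])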
We notice that the definition of the optimum procedure depends on the available computational architecture and represents one of the main tasks of the present R&D activity of the Virgo off-line group.
The goal of the Grid test bed is to optimize the full analysis procedure on the data generated by the Virgo central interferometer.
We plan to limit the search to the frequency interval 10 Hz - 1.25 kHz, and the computation will cover 2 months of data taking.
This implies roughly the same computational power as the coalescing-binary case
and a minimum storage of
Dedicated links connecting Roma 1, Napoli and Cascina must be foreseen.
Let us now list the needs for the total computational power and storage, to be hierarchically distributed among the Virgo INFN sections and laboratories of the collaboration and structured in the following classes of computing centres:
Tier 0: Virgo - Cascina;
(Tier 1: Lyon - IN2P3)
Tier 2: Roma 1 and Napoli;
Tier 3: Perugia and Firenze.
Concerning the network links we notice that
a) the laboratory in Cascina has to become a node of the network backbone
b) we need dedicated bandwidths for the transmission of the reduced data set to the Tiers and to perform intensive distributed analysis
c) the international link should permit the reception and transmission of the reduced data set to and from Europe, the USA and Japan, and, at least in the future, the transfer of a relevant fraction of the raw data to France.
Finally, in the Table we report the total capacity needed for the Virgo test beds at the end of 2001 and the final target for the off line analysis of Virgo.
Table .7 - VIRGO Test Bed capacity

| | units | end 2001 | end 2003 |
| CPU capacity | SI95 | 8,000 | 80,000 |
| Estd. number of CPUs | | 400 | 1,000 (?) |
| Disk capacity | TBytes | 10 | 100 |
| Disk I/O rate | GBytes/sec | 5 | 5 |
| Sustained data rate | MBytes/sec | 250 | 250 |
| WAN links to Cascina | Mbits/sec | 155 | 2,500 |
| WAN links to labs | Mbits/sec | 34 | 622 |
| WAN links to France | Mbits/sec | 622 | |
ARGO is a cosmic ray telescope optimized for the detection of small-size air showers. It consists of a single layer of RPCs, placed at 4300 m elevation in Tibet. The detector will provide a detailed space-time picture of the shower front initiated by primaries with energies in the range 10 GeV - 500 TeV. The data gathered with ARGO will make it possible to address gamma-ray astronomy at a 100 GeV threshold, monitoring more than 300 sources, gamma-ray burst physics, the diffuse gamma-ray emission from the galactic plane, the anti-p/p ratio at energies from 300 GeV to the TeV region, the primary proton spectrum in the 10-200 TeV region, and so on. The experimental apparatus will start taking data in the second half of 2001 with a fraction of the detector installed and will be fully operational in 2003.
When discussing the computing task of the ARGO experiment, the peculiar conditions of the experimental site have to be considered: the remote location, the uncertainty about the availability of a network link, and the lack of local computing infrastructure.
When fully operational, the raw data archiving system will collect data at a rate of 6.8 MB/s, corresponding to an event rate of 25 kHz according to the current estimate. The raw data will be written to DLT tapes at a rate of 12 tapes a day; in one year about 200 TB of raw data are gathered. After reconstruction the total data volume is reduced by a factor of 10, equivalent to 20 TB a year of DST data, from 2003 on. A certain number of reprocessings should also be taken into account, at least during the initial phase of the experiment's life, due to the normal process of optimization and refinement of the reconstruction software.
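As a consistency check of the quoted figures, the following few lines recompute the yearly raw data volume, the number of events and the DST volume from the 6.8 MB/s and 25 kHz rates, assuming roughly one year (~3.15 x 10^7 s) of continuous data taking; the duty cycle is an assumption.

    # Quick check of the ARGO rates quoted above (one year of continuous
    # data taking is assumed; the input figures are the ones in the text).
    rate_MB_s  = 6.8        # raw data rate
    event_rate = 25e3       # events per second
    seconds_yr = 3.15e7     # ~ one year of continuous running

    raw_TB_yr = rate_MB_s * seconds_yr / 1e6    # ~214 TB  -> "200 TB" in the text
    events_yr = event_rate * seconds_yr         # ~7.9e11  -> "7.8 x 10^11" in the text
    dst_TB_yr = raw_TB_yr / 10                  # factor-10 reduction -> ~20 TB
    print(raw_TB_yr, events_yr, dst_TB_yr)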
The DLT tapes will be processed at a Tier-0 equivalent site, located in Napoli, where the raw data will be reconstructed and organized into streams for the different physics analyses. At the moment we do not foresee Tier-1 centres, but only Tier-2 (Lecce, Roma3) and Tier-3 (Roma2, Torino, Palermo) ones.
For the simulation of the atmospheric showers and of the hit pattern in the detector (a notoriously heavy task, common to experiments of this kind) we foresee an amount of Monte Carlo data equivalent to 12% of the raw data (using sampling techniques over the different energy intervals of the primaries). The programs we are using, Corsika for the shower simulation and ARGO-G (based on Geant3) for the hit pattern, require from 5 to 10 s/event on a 12 SI95 CPU. Special techniques are currently under study to reduce the required computing power by a factor of 20 to 40 (with respect to the 320000 SI95 theoretically required for this computation).
The task of raw data reconstruction needs around 10 ms/event on a 12 SI95 CPU. For one fully operational year of data taking, the reconstruction of the 7.8 x 10^11 gathered events will require around 6000 SI95, if reprocessing is taken into account.
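The 6000 SI95 figure can be reproduced with the rough sizing below, which assumes one full reprocessing in addition to the first pass; the duty cycle and the number of reprocessings are assumptions, not ARGO numbers.

    # Rough sizing of the reconstruction CPU power quoted above (illustrative).
    events        = 7.8e11     # events per fully operational year
    t_per_event   = 10e-3      # s/event on a 12 SI95 CPU
    cpu_si95      = 12
    seconds_yr    = 3.15e7
    reprocessings = 2          # assumed: first pass plus one reprocessing

    cpu_seconds = events * t_per_event * reprocessings
    n_cpus      = cpu_seconds / seconds_yr      # ~495 CPUs of 12 SI95 each
    total_si95  = n_cpus * cpu_si95             # ~5900 SI95 -> "around 6000 SI95"
    print(round(n_cpus), round(total_si95))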
For a fully operational year of data taking we can summarize the computing resources requested for ARGO as follows:
| CPU power for event reconstruction | 6000 SI95 |
| CPU power for simulation events | 8000 SI95 |
| Raw data rate | 6.8 MB/s |
| Event data rate | 25 kHz |
| Total raw data | 200 TB |
| Total event number | 7.8 x 10^11 |
| Total reconstructed data | 20 TB |
| Total MC events | 24 TB |
| Total reconstructed MC events | 2.4 TB |
During the period 2000-2001 we shall continue the detector and trigger optimization using a sample of Monte Carlo events. We estimate this sample to be about 1/5 of the total number of simulated events needed in one fully operational year; the computing power requested for this task is of the order of 1600 SI95. A useful use case for the Grid development and for ARGO computing is the use of a number of distributed resources (CPU and storage) located in Napoli, Lecce and Roma3 and dedicated to the Monte Carlo production: in its first phase the ARGO test-bed will in fact be mainly devoted to Monte Carlo studies.
Since the Extensive Air Shower development is dominated by large fluctuations, which become more important near the energy threshold, a huge amount of simulated data is needed. To face this problem, several different techniques are currently exploited to reduce the computing time and to compare the small amount of simulated data with the much larger quantity of data collected by the experiment. As an example, a library of shower events, simulated at fixed angles and energies, can be produced; each simulated shower is then used many times in different geometrical relations with the detector, and the results obtained at different angles and energies are integrated over the spectrum of photons or primary cosmic rays, after interpolation (a sketch of this approach is given below). This mixed approach (analytical calculation folding the Monte Carlo results) cannot provide an exact reproduction of the shower fluctuations and of the detector sampling effects, and predictions obtained in this way could be affected by large systematic errors if the amount of simulated showers is reduced too much.
These problems will be addressed by the ARGO test-bed, distributing the resources (CPU and storage) specified in the table below between the INFN Sections of Lecce, Napoli and Roma3. The sharing of these computing resources in the framework of INFN-Grid will allow the tools developed by the different WP groups (Workload Management, Data Management, Validation, Monitoring) to be used and tested.
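The shower-library reuse mentioned above can be sketched as follows: each stored shower, simulated at a fixed energy and angle, is thrown many times at random core positions with respect to the detector, and the per-shower results are weighted by the primary spectrum. The detector_response() function, the spectral index and the library contents are placeholders for illustration only, not ARGO code.

    # Illustrative sketch of shower-library reuse (assumptions, not ARGO code).
    import random

    def detector_response(shower, core_x, core_y):
        """Placeholder for the detector simulation of one library shower
        thrown at a given core position (hypothetical function)."""
        # Toy model: the number of hit pads falls off with the core distance.
        r2 = core_x ** 2 + core_y ** 2
        return shower["size"] / (1.0 + r2 / 1.0e4)

    def reuse_library(library, n_throws=100, area=300.0):
        """Re-use each library shower n_throws times over an (area x area) m^2
        region and return the spectrum-weighted mean detector response."""
        weighted_sum, weight_sum = 0.0, 0.0
        for shower in library:
            # Power-law weight for the primary spectrum (index is an assumption).
            w = shower["energy_TeV"] ** -2.7
            for _ in range(n_throws):
                x = random.uniform(-area / 2, area / 2)
                y = random.uniform(-area / 2, area / 2)
                weighted_sum += w * detector_response(shower, x, y)
                weight_sum += w
        return weighted_sum / weight_sum

    # Toy library: a few showers at fixed energies and a fixed zenith angle.
    library = [{"energy_TeV": e, "zenith_deg": 20.0, "size": 1.0e4 * e}
               for e in (0.1, 1.0, 10.0)]
    print(reuse_library(library))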
From 2002 onward the use case will be extended to the sharing of the reconstructed experimental data and of the simulated data (at full scale). We shall follow the scheme: Tier-0 in Napoli, Tier-2 in Lecce and Roma3, Tier-3 in Palermo, Roma2 and Torino. For this kind of physics the reconstructed events should be available on line, at least for a certain number of months, to allow efficient access to the data in correspondence with interesting events registered by other experimental apparatus around the world (typically gamma-ray bursts, or events registered by the gravitational interferometers) or for the re-examination of the data gathered during specific periods of time.
The resources requested for the ARGO test-bed are:

ARGO test-bed capacity

| | units | 2001 | 2002 | 2003 |
| CPU capacity | SI95 | 1600 | 4400 | 14000 |
| Number of CPUs (60/80/100 SI95 each) | # | 25 | 25+35 | 25+35+96 |
| Disk capacity | TBytes | 1.5 | 3 | 12 |
| Tape storage capacity | TBytes | 5.5 | 45 | 224 |
| WAN links between Tier 0 and Tier 2 | Mbits/s | 34 | 155 | 622 |
COMPASS is a new high-energy-physics fixed-target experiment at the CERN SPS. The first commissioning run, devoted to debugging the different parts of the apparatus, including the off-line computing system, will start in May 2000, and data taking will go on for at least 4 years.
The COMPASS physics programme is very diversified, ranging from the search for exotic states to the measurement of the gluon contribution to the nucleon spin, and will be carried out using slightly different apparatus geometries. All the measurements require high statistics, corresponding to about 0.5 PB of raw event data accumulated over the 100-150 data-taking days per year.
The archiving of all data and the raw event reconstruction will be done at CERN, where a dedicated facility (the COMPASS Computing Farm, CCF), consisting of about 200 CPUs in the initial layout and based on an extensive deployment of commodity hardware, is being set up.
The final analysis for specific measurements, as well as the Monte Carlo event production, will be done mainly in the home institutes.
At CERN, all the data (starting from the raw events) are stored in an object database (based on Objectivity/DB), whose major feature is a terse hierarchical structure to organise the events, allowing easy parallel access with practically no locking contention. The connections between the reconstructed information and the original events are kept inside the federated database.
The data are organised in the databases in such a way that simple access patterns can be fulfilled in a "predictable" time. Since the layout of the data is simple, the implementation of "pre-staging" or "stage-forward" policies is possible (a generic sketch is given below).
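A generic sketch of such a pre-staging policy follows: because the access order is predictable, the next data chunks are requested from mass storage while the current one is being processed. The stage_in() and process() functions are hypothetical placeholders; this is not the COMPASS/Objectivity implementation.

    # Generic pre-staging sketch (hypothetical helpers, not COMPASS code).
    from concurrent.futures import ThreadPoolExecutor

    def stage_in(chunk_id: int) -> bytes:
        """Hypothetical: copy one data chunk from tape/remote store to local disk."""
        return b"data-for-chunk-%d" % chunk_id

    def process(data: bytes) -> None:
        """Hypothetical: run the analysis/reconstruction on one staged chunk."""
        pass

    def run_with_prestaging(chunk_ids, lookahead=2):
        """Process chunks in order, keeping up to `lookahead` stage-in requests
        in flight ahead of the chunk currently being processed."""
        with ThreadPoolExecutor(max_workers=lookahead) as pool:
            pending = {}
            for i, cid in enumerate(chunk_ids):
                # Issue stage-in requests for the next chunks in the predicted order.
                for future_cid in chunk_ids[i:i + lookahead]:
                    if future_cid not in pending:
                        pending[future_cid] = pool.submit(stage_in, future_cid)
                process(pending.pop(cid).result())   # wait only for the current chunk

    run_with_prestaging(list(range(10)))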
Since the very beginning some COMPASS institutions, and in particular Trieste, have investigated the possibility of allowing, in their local farms, some access to the raw information connected with special sub-samples of events. Access to the raw and experimental information is of fundamental importance in order to make specific tests (such as detector studies, code development and performance tests on standard samples) and partial reprocessing of subsets of events for specific analyses. Without it, the off-line activity in the home institutes is limited to the "usual" physics analysis, based on self-contained data, and to Monte Carlo production. To facilitate this task, in Trieste we are building a satellite farm with essentially the same architecture (scaled to 20%) and the same environment as the CCF, possibly connected to the other local computing systems.
Similar satellite farms are under development in other COMPASS Institutions (Munich, Mainz, ...).
With Objectivity/DB, the access to the centrally stored (CERN) data can be obtained with different kinds of data replication and/or with direct access to the remote databases.
A real investigation of this use case, which asymptotically requires mature wide-area data management, should start with the use of the existing structure (database and control software) to study the real behaviour of the system (for example, the efficiency of the pre-staging policies and the error recovery scheme). To do that, a generic wide-area monitoring infrastructure is needed, coupled with the monitoring system already developed for the CCF.
We suggest a very concrete approach to data management, starting from a generalised monitoring system. We can provide growing experience in the monitoring of distributed (LAN) systems, as well as experience in software development and in the operation of a complex farm (the pilot run starts before summer 2000). We have (or will have) access to facilities distributed over the WAN to test the relevant middleware. We could immediately benefit from Grid services such as security and authentication, which may be critical when the database servers are exposed to WAN traffic. We could also benefit from any service that allows all aspects of the resources to be managed in a uniform way over the WAN (for example database administration, software releases, etc.).
The second phase would be the use of the Grid wide-area data management services to access the data, comparing this approach with the previous one in terms of performance.
Up to this point, the COMPASS use case deals basically with a "Tier0-Tier2" architecture: all data sit in the Tier0, and each Tier2 satellite holds only a largely exclusive fraction. It is not excluded, however, that the data exchange among satellites becomes relevant. In this case, more mature Grid services to be used for the COMPASS data analysis will definitely be needed; for example, all the tools to evaluate the effective bandwidth to a sample of data (which may be replicated at more than one site) will be needed, coupled with some workload management. This can be the topic of the third phase of the project.
To conclude, it is important to stress that the COMPASS infrastructure may also be an ideal existing test bed: the development of the Grid project also requires the development of simulation tools to validate the test-bed results and to guide the design of future systems. As in the MONARC project, where valuable simulation tools have been developed, the Grid will need test-bed and benchmark data to serve as input for the different modelling packages, and part of them can come from COMPASS.
| Name | Role | INFN | EU |
| Franco Bradamante | PO | 10% | 10% |
| Benigno Gobbo | Ricercatore INFN | 30% | 20% |
| Massimo Lamanna | Ricercatore INFN | 40% | 20% |
| Anna Martin | PA | 20% | 20% |
Short and medium term: In the framework of the COMPASS "Tier0-Tier2" system, in which the export and remote access of Objy DBs is already foreseen (starting in the second half of 2000), we could:
Long term: The system CCF+satellites is suited to test more advanced Grid services. In this context we could:
Table of Consumables for WP1

| Site | Description | Year 2000 | Total |
| CNAF | License for Netscape LDAP Server - 1000 keys | 10 ML | |
| Total | | | 10 ML |
Table of Materials for WP1 in ML

| Site | Description | Year 2000 | Total |
| Bari | QUANTUM | 15 | |
| Bologna | QUANTUM | 15 | |
| Catania | QUANTUM | 15 | |
| CNAF | QUANTUM | 15 | |
| Ferrara | QUANTUM | 15 | |
| Lecce | QUANTUM | 15 | |
| LNL | QUANTUM | 15 | |
| Milano | QUANTUM | 15 | |
| Padova | QUANTUM | 15 | |
| Pisa | QUANTUM | 15 | |
| Roma1 | QUANTUM | 15 | |
| Torino | QUANTUM | 15 | |
| Total | | | 180 ML |
Table of Materials for WP2.4 in ML

| Site | Description | ML |
| LNL | RAID controllers | 10 |
| | 4 IBA PCs | 50 |
| | 1 IBA switch | 20 |
| | IBA-to-FC/SCSI adapter | 10 |
| PD | 4 servers | 30 |
| | 8-port FC switch | 20 |
| | 4 FC adapters | 15 |
| | 1 SCSI RAID array | 15 |
| | 1 FC RAID array | 15 |
| | 4 SCSI disks | 8 |
| | 4 Serial ATA controllers | 5 |
| | 4 IA-64 architecture PCs | 40 |
| CNAF | 10 dual slim processors | 90 |
| | 1 Fast Ethernet switch | 10 |
| | 10 SCSI disks, 35 GB | 20 |
| LE | 10 Dual Pentium III-800 MHz, 512 MB RAM | 50 |
| | 1 Myrinet SAN switch, 16 ports, 2.5 Gbps | 8 |
| | 10 Myrinet SAN/PCI NICs, 2.5 Gbps | 35 |
| | 10 SCSI disks, 72 GB | 35 |
| | 10 SAN cables, 10 feet | 4 |
| | 1 rack | 1 |
| | 1 disk server + disks | 10 |
| | 10 Dual IA-64-800 MHz, 512 MB RAM | 100 |
| | 1 rack | 1 |
| | 1 disk server + disks | 15 |
| GE | 1 Gigabit Ethernet switch, 12 ports, 1000Base-T | 30 |
| | 8 PCs with EIDE disks | 24 |
| | 8 Gigabit Ethernet 1000Base-T NICs | 16 |
| Totals | | 446 + 241 |
Table of Consumables for WP2.4 in ML

| Site | | |
| LNL | 2 | 5 |
| PD | 1 | 4 |
| CNAF | 1 | 2 |
| LE | 2 | 5 |
| GE | 4 | 8 |
| Total | 10 | 24 |
Table of Materials for WP3 in ML

| Site | Description | Year 1 | Year 2 | Year 3 | Total |
| | 3 PC x 26 sites (5 ML each) | 180 ML * | 210 ML | | |
| | L2 switch x 10 | 500 ML | | | |
| CNAF | Router interfaces | 120 ML * | | | |
* These are the same as in WP1. It is requested that the 180 ML, corresponding to the Quantum Grid of the 12 sites taking part in the Globus evaluation (WP1), be anticipated to year 2000 (see table, page 27), together with the 120 ML of router interfaces assigned to CNAF.
| Site | Description | Consumables | Materials |
| Bari | Quantum Grid | | 15 |
| Bologna | Quantum Grid | | 15 |
| Catania | Quantum Grid | | 15 |
| CNAF | Quantum Grid | | 15 |
| | WP 2.4: | | 120 |
| | 10 dual slim processors | | |
| | 1 Fast Ethernet switch | | |
| | 10 SCSI disks, 35 GB | | |
| | Consumable | 1 | |
| | License for Netscape LDAP Server - 1000 keys | 10 | |
| | Router interface | | 120 |
| Ferrara | Quantum Grid | | 15 |
| Genova | WP 2.4: | | 70 |
| | 1 Gigabit Ethernet switch, 12 ports, 1000Base-T | | |
| | 8 PCs with EIDE disks | | |
| | 8 Gigabit Ethernet 1000Base-T NICs | | |
| | Consumable | 4 | |
| Lecce | Quantum Grid | | 15 |
| | WP 2.4: | | 143 |
| | 10 Dual Pentium III-800 MHz, 512 MB RAM | | |
| | 1 Myrinet SAN switch, 16 ports, 2.5 Gbps | | |
| | 10 Myrinet SAN/PCI NICs, 2.5 Gbps | | |
| | 10 SCSI disks, 72 GB | | |
| | 10 SAN cables, 10 feet | | |
| | 1 rack | | |
| | 1 disk server + disks | | |
| | Consumable | 2 | |
| LNL | Quantum Grid | | 15 |
| | RAID controllers | | 10 |
| | Consumable | 2 | |
| Milano | Quantum Grid | | 15 |
| Padova | Quantum Grid | | 15 |
| | WP 2.4: | | 103 |
| | 4 servers | | |
| | 8-port FC switch | | |
| | 4 FC adapters | | |
| | 1 SCSI RAID array | | |
| | 1 FC RAID array | | |
| | 4 SCSI disks | | |
| | Consumable | 1 | |
| Pisa | Quantum Grid | | 15 |
| Roma1 | Quantum Grid | | 15 |
| Torino | Quantum Grid | | 15 |
| Total | | 20 | 746 |
Roma1 ATLAS requires ~150 ML to start a prototype Tier-1 at Caspur.
Site |
Description |
Services |
ALICE |
ATLAS |
CMS |
LHC-b |
VIRGO |
Bari |
1 software license for switch cabletron |
6 |
|||||
Bologna |
magnetic supports for 5 TB |
14 |
|||||
outsourcing |
60 |
||||||
magnetic supports, electric energy, etc. |
43 |
||||||
Cagliari |
LSF licenses |
9 |
|||||
Catania |
LSF licenses |
2 |
|||||
n. 40 LSF licenses, n. 20 Venus licenses, DLT tapes for backup |
20 |
||||||
n. 10 LSF licenses and n. 5 Venus licenses |
8 |
||||||
Firenze* |
connection and installation costs |
5 |
|||||
connection and installation costs |
20 |
||||||
Lecce |
metabolism, licenses |
20 |
|||||
LNL |
tapes |
54 |
|||||
Milano |
metabolism, cables, etc. |
5 |
|||||
contract with CILEA, 5 Tbyte DLT (83 tapes) |
108 |
||||||
Napoli |
consumable for Tier3 |
10 |
|||||
connection and installation costs, etc. |
25 |
||||||
Padova |
software license |
4 |
|||||
5 TB tapes |
14 |
||||||
Pavia |
farm metabolism |
10 |
|||||
Perugia* |
2 Tbytes |
5 |
|||||
cables, etc. |
2 |
||||||
Pisa |
magnetic supports for 5 TB |
14 |
|||||
Roma1 |
cables, rack |
5 |
|||||
outsourcing |
100 |
||||||
backup tapes |
14 |
||||||
connection and installation |
25 |
||||||
Roma2 |
consumable for Tier3 |
10 |
|||||
Torino |
magnetic tapes |
5 |
|||||
DLT tapes, etc. |
9 |
||||||
Total |
31 |
29 |
258 |
133 |
103 |
72 |
Site |
Description |
Services |
ALICE |
ATLAS |
CMS |
LHC-b |
VIRGO |
Bari |
1 Server switch cabletron |
10 |
|||||
CPU + Disk + tape, Switch |
168 |
||||||
700 SI95, 20 CPU, 1 TERABYTE, LAN UNIT, TAPE |
163 |
||||||
Bologna |
1 Switch GB |
25 |
|||||
CPU for 1.4 KSI95, disks for 1.2 TB, Autochanger DLT, 1 switch |
203 |
||||||
700 SI95 CPU power, 1 TB disk space, 1 switch for Farm |
149 |
||||||
computer (CPU n. 50), switch (n.2), rack (n.2), disk (70 GB), tape reader |
264 |
||||||
Cagliari |
1 switch |
67 |
|||||
Quantum Grid |
15 |
||||||
CPU: 700 SI95, disk 400 GB, tape unit with autoloader, switch |
116 |
||||||
Catania |
n. 2 smart switch Gigabit Ethernet, n. 3 smart switch Fast Ethernet |
30 |
|||||
CPU for 1000 SI95, disk space for 0.6 TB, 1 tape unit with autoloader, 1 switch |
138 |
||||||
10 CPU - 35 SpecInt95, 0.5 Tbytes |
54 |
||||||
Cosenza |
Quantum Grid |
15 |
|||||
personal computer |
10 |
||||||
Firenze* |
10 PC-0.5 Tbyte disks |
54 |
|||||
20 Gflops, 1 Tbyte disks, 1switch fast ethernet |
180 |
||||||
Genova |
3 PC |
15 |
|||||
LNL |
89 CPU, 5TB disks, 1 Tape Lib, switch for farm |
815 |
|||||
Milano |
switch |
12 |
|||||
PC for 180 SpecInt 95, SCSI 1.8 Tbyte disk, robot DLT for 1000 slots |
242 |
||||||
Napoli |
Quantum Grid |
15 |
|||||
400 SI95, 0,4 TB, DLT |
50 |
||||||
100 Gflops, 1 Tbyte distributed disk capacity, 4 switch fast ethernet + 1 Gbit/links, 1 server |
600 |
||||||
Padova |
4 PC IA64 architecture (Fabric setup), 4 controller serival ATA |
45 |
|||||
20 CPU-700 SI95, 1 TB disk, 1 LAN switch |
149 |
||||||
10 CPU, 400 SI95 + 0.6 TB disk |
58 |
||||||
Pavia |
400 SI95, 0,4 TB |
50 |
|||||
Perugia* |
Quantum Grid |
15 |
|||||
10 PC (500SPI95), 1 TB disk (25 disks), 2 switch fast ethernet (1 with upload GbitEthernet) |
115 |
||||||
10 PC (350 SI95), 500 Gbytes |
54 |
||||||
Pisa |
700 SI95 CPU power, 1 TB, 1 switch for Farm |
149 |
20 |
||||
Roma1 |
Fast/Giga Ethernet switch |
30 |
|||||
PC 700 SI95, 1 TB disks, LAN, 3 racks |
152 |
||||||
cluster itanium 50 Gflops *16*2 CPU, storage 360 GB, uplink 1 Gbit |
490 |
||||||
Roma2 |
Quantum Grid |
15 |
|||||
400 SI95+0.40 disk+1 DLT |
50 |
||||||
Roma3 |
Quantum Grid |
15 |
|||||
Salerno |
CPU (0.7 KSI95), disks (0.6 TB), tape unit, switch |
116 |
|||||
Torino |
switch |
50 |
|||||
PC farm (350 SI95), 0,5 TB disk |
54 |
||||||
PC farm (1400 SI95), switch, DLT unit |
203 |
||||||
Trieste |
Quantum Grid |
15 |
|||||
Fast cluster, 2 PC Pentium III |
43 |
||||||
Total |
389 |
1045 |
402 |
1793 |
264 |
1425 |
* The VIRGO request will be finalized in the second part of the Year.
INFN-Grid |
Total fte |
2001 |
2000 |
IT fte |
2001 |
2000 |
total fte DataGrid |
2001 |
2000 |
IT fte |
2001 |
2000 |
Bari |
6.7 |
52 |
17 |
1.2 |
9 |
3 |
2 |
55 |
18 |
0.5 |
11 |
4 |
Bologna |
8.7 |
58 |
19 |
1 |
6 |
2 |
1.6 |
46 |
15 |
0.4 |
8 |
3 |
Cagliari |
2.4 |
21 |
7 |
0.3 |
2 |
1 |
0.4 |
19 |
6 |
|||
Catania |
8.3 |
60 |
20 |
6.1 |
37 |
12 |
3.8 |
87 |
29 |
3.5 |
80 |
27 |
CNAF |
6.2 |
37 |
12 |
6.2 |
45 |
15 |
4.9 |
111 |
37 |
4.9 |
111 |
37 |
Cosenza |
0.6 |
4 |
1 |
|||||||||
Ferrara |
1.0 |
6 |
2 |
0.5 |
3 |
1 |
||||||
Firenze |
3.3 |
20 |
7 |
0.1 |
1 |
|||||||
Genova |
2.1 |
13 |
4 |
1.3 |
8 |
3 |
1 |
19 |
6 |
1 |
19 |
6 |
Lecce |
5.0 |
36 |
12 |
3.8 |
27 |
9 |
0.4 |
9 |
3 |
0.4 |
9 |
3 |
LNL |
2.9 |
20 |
7 |
2.6 |
19 |
6 |
2.6 |
57 |
19 |
2.6 |
57 |
19 |
Milano |
2.8 |
23 |
8 |
0.4 |
5 |
2 |
1.5 |
45 |
15 |
0.3 |
14 |
5 |
Napoli |
4.4 |
26 |
9 |
0.4 |
2 |
1 |
||||||
Padova |
8.9 |
69 |
23 |
3.1 |
25 |
8 |
3.4 |
101 |
34 |
1.7 |
58 |
19 |
Parma |
0.4 |
5 |
2 |
|||||||||
Pavia |
1.3 |
8 |
3 |
|||||||||
Perugia |
2.7 |
16 |
5 |
0.1 |
1 |
|||||||
Pisa |
6.3 |
38 |
13 |
0.8 |
5 |
2 |
2.4 |
46 |
15 |
0.5 |
10 |
3 |
Roma1 |
7.8 |
53 |
18 |
1.1 |
7 |
2 |
2.6 |
57 |
19 |
0.4 |
8 |
3 |
Roma2 |
0.9 |
5 |
2 |
|||||||||
Roma3 |
0.8 |
5 |
2 |
|||||||||
Salerno |
1.8 |
11 |
4 |
|||||||||
Torino |
6.5 |
42 |
14 |
3.5 |
24 |
8 |
2.9 |
63 |
21 |
2 |
46 |
15 |
Trieste |
4.2 |
25 |
8 |
1.9 |
11 |
4 |
1 |
19 |
6 |
0.8 |
15 |
5 |
Total |
96 |
653 |
218 |
34.3 |
235 |
78 |
30.5 |
734 |
245 |
19 |
446 |
149 |
INFN-Grid |
fte ALICE |
2001 |
2000 |
fte ALICE data-Grid |
2001 |
2000 |
Bari |
2 |
14 |
5 |
0.9 |
21 |
7 |
Bologna |
1 |
6 |
2 |
|||
Cagliari |
1.9 |
17 |
6 |
0.4 |
19 |
6 |
Catania |
1 |
7 |
2 |
|||
LNL |
0.3 |
2 |
1 |
|||
Padova |
0.7 |
4 |
1 |
|||
Salerno |
1.8 |
11 |
4 |
|||
Torino |
2.1 |
13 |
4 |
0.9 |
17 |
6 |
Trieste |
1.1 |
7 |
2 |
0.2 |
4 |
1 |
Total |
11.9 |
81 |
27 |
2.4 |
60 |
20 |
INFN-Grid |
fte ATLAS |
2001 |
2000 |
fte ATLAS data-Grid |
2001 |
2000 |
Cosenza |
0.6 |
4 |
1 |
|||
Genova |
0.7 |
4 |
1 |
|||
Lecce |
0.8 |
6 |
2 |
|||
Milano |
1.4 |
11 |
4 |
1.2 |
31 |
10 |
Napoli |
1.5 |
9 |
3 |
|||
Pavia |
1.3 |
8 |
3 |
|||
Pisa |
0.7 |
4 |
1 |
|||
Roma1 |
2.5 |
15 |
5 |
1.3 |
25 |
8 |
Roma2 |
0.6 |
4 |
1 |
|||
Roma3 |
0.4 |
2 |
1 |
|||
Trieste |
0.2 |
1 |
0 |
|||
Total |
10.7 |
68 |
23 |
2.5 |
56 |
19 |
INFN-Grid |
fte CMS |
2001 |
2000 |
fte CMS data-Grid |
2001 |
2000 |
Bari |
3.1 |
26 |
9 |
0.6 |
23 |
8 |
Bologna |
4.1 |
28 |
9 |
0.6 |
19 |
6 |
Catania |
1.2 |
9 |
3 |
0.3 |
7 |
2 |
Firenze |
1.7 |
10 |
3 |
|||
Padova |
5.1 |
41 |
14 |
1.7 |
42 |
14 |
Perugia |
1.6 |
10 |
3 |
|||
Pisa |
4.3 |
26 |
9 |
1.9 |
36 |
12 |
Roma1 |
1.8 |
14 |
5 |
0.6 |
19 |
6 |
Torino |
0.9 |
5 |
2 |
|||
Total |
23.8 |
168 |
56 |
5.7 |
147 |
49 |
INFN-Grid |
fte LHC-b |
2001 |
2000 |
fte LHC-b data-Grid |
2001 |
2000 |
Bologna |
2.6 |
19 |
6 |
0.6 |
19 |
6 |
Cagliari |
0.2 |
1 |
0 |
|||
Ferrara |
0.5 |
3 |
1 |
|||
Firenze |
0.6 |
4 |
1 |
|||
Milano |
0.7 |
4 |
1 |
|||
Roma1 |
0.2 |
1 |
0 |
|||
Total |
4.8 |
32 |
11 |
0.6 |
19 |
6 |
INFN-Grid |
fte VIRGO |
2001 |
2000 |
Firenze |
0.9 |
5 |
2 |
Napoli |
2 |
12 |
4 |
Perugia |
1 |
6 |
2 |
Pisa |
0.3 |
2 |
1 |
Roma1 |
1.3 |
11 |
4 |
Total |
5.5 |
36 |
12 |
INFN-Grid |
fte APE |
2001 |
2000 |
fte APE data-Grid |
2001 |
2000 |
Bari |
0.4 |
3 |
1 |
|||
Genova |
0.1 |
1 |
0 |
|||
Milano |
0.3 |
2 |
1 |
|||
Parma |
0.4 |
5 |
2 |
|||
Pisa |
0.2 |
1 |
0 |
|||
Roma1 |
0.9 |
5 |
2 |
0.3 |
6 |
2 |
Roma2 |
0.3 |
2 |
1 |
|||
Total |
2.6 |
19 |
6 |
0.3 |
6 |
2 |
INFN-Grid |
fte ARGO |
2001 |
2000 |
Lecce |
0.4 |
3 |
1 |
Napoli |
0.5 |
3 |
1 |
Roma3 |
0.4 |
2 |
1 |
Total |
1.3 |
8 |
3 |
INFN-Grid |
fte COMPASS |
2001 |
2000 |
Trieste |
1 |
6 |
2 |
Total |
1.0 |
6.0 |
2.0 |
Note 1 – INFN-Grid percentages include the European DataGrid ones.
Note 2 - 1 FTE = 12 Person-Months/year.
Note 3 – IT = Information Technology Group = Computing Services staff + computing technologists + professors and researchers of Computer Science departments
Table .1 – Summary of Personnel Resources by Site
BARI |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
Esperimento |
||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
o Servizio |
||
1 |
Castellano |
Marcello |
P.Ass. |
Politecnico |
|
|
|
|
0 |
4 |
0,6 |
|
|
|
|
0,6 |
ALICE |
||
2 |
Cea |
Paolo |
P. Ass. |
Gr. IV |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
APE |
|
|
3 |
Cosmai |
Leonardo |
Ric. III |
Gr. IV |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
APE |
||
4 |
D'Amato |
Maria |
Bors.INFN |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
1 |
0,2 |
|
|
0,5 |
CMS |
||
5 |
Di Bari |
Domenico |
R.U. |
Gr. III |
6 |
0,3 |
|
|
0.3 |
4 |
0,4 |
|
|
|
|
0,4 |
ALICE |
||
6 |
Fini |
Rosanna |
Ric. III |
Gr. III |
8 |
0,3 |
|
|
0.3 |
3 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
7 |
Maggi |
Giorgio |
P.Ord. |
Gr. I |
|
|
|
|
0 |
4 |
0,2 |
|
|
|
|
0,2 |
CMS |
Resp. Loc. |
|
8 |
Magno |
Emanuele |
Tecnol. Un. |
Cen. Calc. |
|
|
|
|
0 |
4 |
0,2 |
|
|
|
|
0,2 |
IT |
||
9 |
Mennea |
Maria |
Dott. |
0 |
? |
1 |
1 |
CMS |
|||||||||||
11 |
Manzari |
Vito |
Ric. III |
Gr. III |
6 |
0,3 |
|
|
0,3 |
4 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
12 |
Natali |
Sergio |
P.Ord. |
Gr. I |
|
|
|
|
0 |
4 |
0,2 |
|
|
|
|
0,2 |
CMS |
||
13 |
Piscitelli |
Giacomo |
P.Ass. |
Politecnico |
|
|
|
|
0 |
4 |
0,4 |
|
|
|
|
0,4 |
ALICE |
||
14 |
Silvestris |
Lucia |
Ric. III |
Gr. I |
8 |
0,3 |
|
|
0,3 |
2,2 |
0,3 |
3 |
0,4 |
|
|
0,7 |
CMS |
||
15 |
Zito |
Giuseppe |
Ric. III |
Gr. I |
2 |
0,3 |
|
|
0,3 |
2,2 |
0,5 |
|
|
|
|
0,5 |
CMS |
||
16 |
ZZZ |
Art. 15 |
CTER |
Sezione |
8 |
0,5 |
|
|
0,5 |
3 |
0,5 |
1 |
0,5 |
|
|
1 |
IT |
||
|
|
|
|
Totali |
|
|
|
|
2,0 |
|
|
|
|
|
|
6,7 |
|
||
BOLOGNA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Anselmo |
F. |
Ric. III |
Gr. II |
|
|
|
|
0 |
3 |
0,25 |
|
|
|
|
0,25 |
ALICE |
||
2 |
Bellagamba |
L. |
Ric. III |
Gr. I |
|
|
|
|
0 |
3 |
0,25 |
|
|
|
|
0,25 |
ALICE |
||
3 |
Bortolotti |
Daniela |
CTER |
Sezione |
|
|
|
|
0 |
4 |
0,3 |
|
|
|
|
0,3 |
IT |
||
4 |
Capiluppi |
Paolo |
P. Ass. |
Gr. I |
6 |
0,3 |
|
|
0,3 |
4 |
0,5 |
3 |
0,4 |
|
|
0,9 |
CMS |
||
5 |
Cara Romeo |
G. |
Ric. III |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ALICE |
||
6 |
Fanfani |
Alessandra |
Dott. |
Gr. I |
|
|
|
|
0 |
3 |
0,5 |
|
|
|
|
0,5 |
CMS |
||
7 |
Galli |
Domenico |
Ric. Un. |
Gr. I |
3 |
0,3 |
|
|
0,3 |
4 |
0,4 |
2.3 |
0.3 |
|
|
0,7 |
LHCb |
||
8 |
Giacomelli |
Roberto |
CTER |
Gr. I |
|
|
|
|
0 |
4 |
0,4 |
|
|
|
|
0,4 |
CMS |
||
9 |
Grandi |
Claudio |
Ass. Ric. |
Gr. I |
8 |
0,3 |
|
|
0,3 |
3 |
0,4 |
4 |
0,4 |
|
|
0,8 |
CMS |
||
10 |
Luvisetto |
Marisa |
Ric. III |
Gr. II |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
11 |
Marconi |
Umberto |
Ric. III |
Gr. I |
3 |
0,3 |
|
|
0,3 |
4 |
0,4 |
2.3 |
0.3 |
|
|
0,7 |
LHCb |
||
12 |
Mazzanti |
Paolo |
Ric. II |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
2,4 |
0,2 |
|
|
0,5 |
CMS |
Resp. Loc. |
|
13 |
Rovelli |
Tiziano |
Ric. Un. |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
CMS |
||
14 |
Semeria |
Franco |
Tecno. III |
Sezione |
1 |
0,4 |
|
|
0,4 |
2,1 |
0,5 |
1 |
0,2 |
|
|
0,7 |
IT |
||
15 |
Semprini Cesari |
Nicola |
P. Ass. |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
LHCb |
||
16 |
Siroli |
GianPiero |
Ric. Un. |
Gr. I |
|
|
|
|
0 |
2,4 |
0,3 |
4 |
0,4 |
|
|
0,7 |
CMS |
||
17 |
Vagnoni |
Vincenzo |
Dott. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
4 |
0,3 |
|
|
0,5 |
LHCb |
||
18 |
Vecchi |
Stefania |
Ric. III |
Gr. I |
|
|
|
|
0 |
3 |
0,4 |
|
|
|
|
0,4 |
LHCb |
||
|
|
|
|
Totali |
|
|
|
|
1,6 |
|
|
|
|
|
|
8,7 |
|
||
CAGLIARI |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Bonivento |
Walter |
Ric. |
Gr. I |
|
|
|
|
0 |
4 |
0,2 |
|
|
|
|
0,2 |
LHCb |
||
2 |
DeFalco |
Alessandro |
Ass. Ric. |
Gr. III |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
3 |
Masoni |
Alberto |
I Ric. |
Gr. III |
6 |
0,4 |
|
|
0,4 |
4 |
0,4 |
3 |
0,2 |
|
|
0,6 |
ALICE |
Resp. Loc. |
|
4 |
Silvestri |
Antonio |
Tecno. III |
Sezione |
|
|
|
|
0 |
? |
0,3 |
|
|
|
|
0,3 |
IT |
||
5 |
Tocco |
Luisanna |
Dott. |
Gr. III |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
6 |
Usai |
Gianluca |
Ric. U. |
Gr. III |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ALICE |
||
7 |
ZZZ |
Ass. Ric. |
Ass. Ric. |
Gr. III |
|
|
|
|
0 |
? |
0,5 |
|
|
|
|
0,5 |
ALICE |
||
|
|
|
|
Totali |
|
|
|
|
0,4 |
|
|
|
|
|
|
2,4 |
|
||
CATANIA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Barbanera |
Franco |
P. Ass. |
Inform. |
|
|
|
|
0 |
2,1 |
0,2 |
|
|
|
|
0,2 |
IT |
||
2 |
Barbera |
Roberto |
Ric. Un. |
Gr. III |
|
|
|
|
0 |
1 |
0,1 |
3 |
0,4 |
|
|
0,5 |
ALICE |
||
3 |
Belluomo |
Patrizia |
CTER |
Sezione |
6 |
0,3 |
|
|
0,3 |
|
|
4 |
0,5 |
|
|
0,5 |
IT |
||
4 |
Cangiano |
Ernesto |
Tecno. III |
Sezione |
|
|
|
|
0 |
2,4 |
0,25 |
4 |
0,15 |
1 |
0,1 |
0,5 |
IT |
||
5 |
Cavalieri |
Salvatore |
P. Ass. |
DIT-Ing. |
1 |
0,4 |
|
|
0,4 |
2,1 |
0,4 |
|
|
|
|
0,4 |
CMS |
||
6 |
Commis |
Enrico |
Tecno. U. |
Ing. |
1 |
0,4 |
|
|
0,4 |
2,1 |
0,4 |
|
|
|
|
0,4 |
IT |
||
7 |
Costa |
Salvatore |
Ric. U. |
Gr. III->Gr. I |
|
|
|
|
0 |
3 |
0,3 |
4 |
0,5 |
|
|
0,8 |
CMS |
||
8 |
La Corte |
Aurelio |
Ric. U. |
Ing. |
3 |
0,4 |
|
|
0,4 |
2,3 |
0,4 |
|
|
|
|
0,4 |
IT |
||
9 |
Lo Bello |
Lucia |
Ass. Ric. |
DIT-Ing. |
1 |
0,4 |
|
|
0,4 |
2,1 |
0,4 |
|
|
|
|
0,4 |
CMS |
||
10 |
Mirabella |
Orazio |
P. Ord. |
DIT-Ing. |
1 |
0,4 |
|
|
0,4 |
2,1 |
0,4 |
|
|
|
|
0,4 |
CMS |
||
11 |
Monforte |
Salvatore |
Dott. |
DIT-Ing. |
1 |
0,4 |
|
|
0,4 |
2,1 |
0,4 |
|
|
|
|
0,4 |
CMS |
||
12 |
Palmeri |
Armando |
Ric. I |
Gr. III |
|
|
|
|
0 |
|
|
3 |
0,5 |
|
|
0,5 |
ALICE |
||
13 |
Rocca |
Carlo |
CTER |
Sezione |
|
|
|
|
0 |
2,4 |
0,25 |
4 |
0,15 |
1 |
0,1 |
0,5 |
IT |
||
14 |
Sassone |
Vladimiro |
P. Ord. |
Inform. |
|
|
|
|
0 |
2,1 |
0,2 |
|
|
|
|
0,2 |
IT |
||
15 |
Sava |
Giuseppe |
CTER |
0 |
? |
0,5 |
0,5 |
IT |
|||||||||||
16 |
Tomarchio |
Orazio |
B. Univ. |
Ing. |
3 |
0,4 |
|
|
0,4 |
2,3 |
0,4 |
|
|
|
|
0,4 |
IT |
||
17 |
Tricomi |
Alessia |
Ass. Ric. |
Gr. I |
8 |
0,3 |
|
|
0,3 |
|
|
3 |
0,4 |
|
|
0,4 |
CMS |
||
18 |
Vita |
Lorenzo |
P. Ord. |
Ing. |
1 |
0,4 |
|
|
0,4 |
2,1 |
0,4 |
|
|
|
|
0,4 |
IT |
||
19 |
ZZZ |
Tecnologo |
Tecno. III |
Sezione |
|
|
|
|
0 |
2,4 |
0,25 |
4 |
0,25 |
|
|
0,5 |
IT |
||
|
|
|
|
Totali |
|
|
|
|
3,8 |
|
|
|
|
|
|
8,3 |
|
||
CNAF |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Chierici |
Andrea |
B. INFN |
Gr. V |
7 |
0,30 |
|
|
0,3 |
2,4 |
0,3 |
5 |
0,2 |
|
|
0,5 |
IT |
||
2 |
Ciancarini |
Paolo |
P. Ord. |
Inform. |
1 |
0,5 |
|
|
0,5 |
2,1 |
0,5 |
|
|
|
|
0,5 |
IT |
||
3 |
Dell’Agnello |
Luca |
Tecno. III |
Gr. V |
4 |
0,3 |
|
|
0,5 |
2,4 |
0,5 |
|
|
|
|
0,5 |
IT |
||
4 |
Ferrari |
Tiziana |
Tecno. III |
Gr. V |
7 |
0,30 |
|
|
0,3 |
2,1 |
0,2 |
5 |
0,5 |
|
|
0,7 |
IT |
||
5 |
Fonti |
Luigi |
Tecno. III |
Gr. V |
1 |
0,3 |
|
|
0,3 |
2,1 |
0,3 |
1 |
0,2 |
|
|
0,5 |
IT |
||
6 |
Ghiselli |
Antonia |
Tecno. I |
Gr. V |
6 |
0,3 |
|
|
0,3 |
2,1 |
0,3 |
5 |
0,2 |
4 |
0,3 |
0,8 |
IT |
||
7 |
Giacomini |
Francesco |
Ass. Ric. |
Gr. V |
1 |
0,7 |
|
|
1 |
2,1 |
0,7 |
2,4 |
0,2 |
1 |
0,1 |
1 |
IT |
13 |
|
8 |
Italiano |
Alessandro |
CTER |
0 |
1 |
0,1 |
0,1 |
IT |
|||||||||||
9 |
Matteuzzi |
Pietro |
Tecno. II |
Gr. V |
|
|
|
|
0 |
2,4 |
0,3 |
|
|
|
|
0,3 |
IT |
||
10 |
Ruggieri |
Federico |
Ricer. I |
Gr. I |
? |
0,5 |
|
|
0,5 |
? |
0,5 |
|
|
|
|
0,5 |
CMS |
Resp. Loc. |
|
11 |
Vistoli |
Cristina |
Tecno. II |
Gr. V |
1 |
0,5 |
|
|
0,5 |
2,1 |
0,5 |
5 |
0,2 |
2,4 |
0,1 |
0,8 |
IT |
||
12 |
Vita Finzi |
Giulia |
C. Am. |
Gr. V |
|
|
|
|
0 |
1 |
0,4 |
|
|
|
|
0,4 |
IT |
||
13 |
Zani |
Stefano |
CTER |
Gr. V |
4 |
0,3 |
|
|
0,3 |
2,4 |
0,3 |
|
|
|
|
0,3 |
IT |
||
|
|
|
|
Totali |
|
|
|
|
4,5 |
|
|
|
|
|
|
6,3 |
|
||
COSENZA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Capua |
Marcella |
Contr. |
Gr. I |
|
|
|
|
0 |
|
|
3 |
0,2 |
|
|
0,2 |
ATLAS |
||
2 |
LaRotonda |
Laura |
P. Ass. |
Gr. I |
|
|
|
|
0 |
|
|
3 |
0,2 |
|
|
0,2 |
ATLAS |
||
3 |
Schioppa |
Marco |
Ric. U. |
Gr. I |
|
|
|
|
0 |
|
|
3 |
0,2 |
|
|
0,2 |
ATLAS |
||
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
0,6 |
|
||
FERRARA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Gambetti |
Michele |
Art. 15 |
Sezione |
|
|
|
|
0 |
2,1 |
0,2 |
4 |
0,1 |
|
|
0,3 |
IT |
||
2 |
Gianoli |
Alberto |
Tecno. III |
Gr. I |
|
|
|
|
0 |
2,1 |
0,2 |
4 |
0,3 |
|
|
0,5 |
LHCb |
||
3 |
Luppi |
Eleonora |
Tecno. U. II |
Gr. I |
|
|
|
|
0 |
2,1 |
0,1 |
4 |
0,1 |
|
|
0,2 |
IT |
Resp. Loc. |
|
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
1 |
|
||
FIRENZE |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Bellucci |
Leonardo |
Dott. |
Gr. I |
|
|
|
|
0 |
3 |
1 |
|
|
|
|
1 |
CMS |
||
2 |
Cuoco |
Elena |
Ass. Ric. |
Gr. II |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
VIRGO |
||
3 |
D'Alessandro |
Raffaello |
Ric. Un. |
Gr. I |
|
|
|
|
0 |
4 |
0,2 |
|
|
|
|
0,2 |
CMS |
Res. Loc. |
|
4 |
Dominici |
Piero |
Dott. |
Gr. II |
|
|
|
|
0 |
4 |
0,2 |
|
|
|
|
0,2 |
VIRGO |
||
5 |
Fabroni |
Leonardo |
Dott. |
Gr. II |
|
|
|
|
0 |
4 |
0,1 |
3 |
0,2 |
|
|
0,3 |
VIRGO |
||
6 |
Graziani |
Giacomo |
Ass. Ric. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
4 |
0,2 |
|
|
0,4 |
LHCb |
||
7 |
Lenzi |
Michela |
Ass. Ric. |
Gr. I |
|
|
|
|
0 |
3 |
0,5 |
|
|
|
|
0,5 |
CMS |
||
8 |
Passaleva |
Giovanni |
Ric. |
Gr. I |
|
|
|
|
0 |
4 |
0,2 |
|
|
|
|
0,2 |
LHCb |
||
9 |
Vetrano |
Flavio |
P. Ass. |
Gr. II |
|
|
|
|
0 |
3 |
0,1 |
|
|
|
|
0,1 |
VIRGO |
||
10 |
ZZZ |
Art. 15 |
Art. 15 |
Sezione |
|
|
|
|
0 |
? |
0,1 |
|
|
|
|
0,1 |
IT |
||
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
3,3 |
|
||
GENOVA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Becchi |
Carlo Maria |
P. Ord. |
Gr. IV |
|
|
|
|
0 |
3 |
0,1 |
|
|
|
|
0,1 |
APE |
||
2 |
Brunengo |
Alessandro |
Tecn. III |
Sezione |
|
|
|
|
0 |
4 |
0,3 |
|
|
|
|
0,3 |
IT |
||
3 |
Chiola |
Giovanni |
P. Ord. |
Inform. |
4 |
0,5 |
|
|
0,5 |
2,4 |
0,5 |
|
|
|
|
0,5 |
IT Università |
||
4 |
Ciaccio |
Giuseppe |
Ric. Un. |
Inform. |
4 |
0,5 |
|
|
0,5 |
2,4 |
0,5 |
|
|
|
|
0,5 |
IT Università |
||
5 |
Dameri |
Mauro |
Ric. II |
Gr. I |
|
|
|
|
0 |
3 |
0,4 |
|
|
|
|
0,4 |
ATLAS |
||
6 |
Osculati |
Bianca |
P. Ass. |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ATLAS |
||
|
|
|
|
Totali |
|
|
|
|
1 |
|
|
|
|
|
|
2,1 |
|
||
LECCE |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Aloisio |
Giovanni |
P.Ass. |
|
|
|
|
|
0 |
2,1 |
0,2 |
2,4 |
0,2 |
|
|
0,4 |
IT Università |
||
2 |
Cafaro |
Massimo |
Ric. U. |
|
1 |
0,4 |
|
|
0,4 |
2,1 |
0,3 |
2,4 |
0,2 |
1 |
0,1 |
0,6 |
IT Università |
||
3 |
Campeggio |
Salvatore |
Tec. U. |
|
|
|
|
|
0 |
2,1 |
0,2 |
2,4 |
0,1 |
|
|
0,3 |
IT Università |
||
4 |
Cataldi |
Gabriella |
Ric. III |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ATLAS |
||
5 |
Depaolis |
Lucio |
Tec. U. |
|
|
|
|
|
0 |
2,1 |
0,1 |
2,4 |
0,2 |
1 |
0,1 |
0,4 |
IT Università |
||
6 |
Fasanelli |
Enrico M.V. |
Tecn. III |
Sezione |
|
|
|
|
0 |
2,1 |
0,2 |
2,4 |
0,2 |
4 |
0,2 |
0,6 |
IT |
||
7 |
Gorini |
Edoardo |
P.Ass. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ATLAS |
||
8 |
Martello |
Daniele |
Ric. U. |
|
|
|
|
|
0 |
|
|
3 |
0,2 |
|
|
0,2 |
ARGO |
||
9 |
Perrino |
Roberto |
Ric. III |
Gr. I |
|
|
|
|
0 |
4 |
0,2 |
|
|
|
|
0,2 |
ATLAS |
||
10 |
Primavera |
Margherita |
Ric. III |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ATLAS |
||
11 |
Surdo |
Antonio |
Ric. |
|
|
|
|
|
0 |
|
|
3 |
0,2 |
|
|
0,2 |
ARGO |
||
12 |
Tommasi |
Franco |
Ric. U. |
|
|
|
|
|
0 |
? |
0,2 |
|
|
|
|
0,2 |
IT Università |
||
13 |
ZZZ |
Dottorato |
Dott. |
|
|
|
|
|
0 |
|
|
2,4 |
0,3 |
|
|
0,3 |
ISUFI |
||
14 |
ZZZ |
Dottorato |
Dott. |
|
|
|
|
|
0 |
2,1 |
1 |
|
|
|
|
1 |
INFN/ISUFI |
||
|
|
|
|
Totali |
|
|
|
|
0,4 |
|
|
|
|
|
|
5,0 |
|
||
LNL |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Berti |
Luciano |
|
Sezione |
6 |
0,3 |
|
|
0,3 |
|
|
4 |
0,3 |
|
|
0,3 |
IT |
||
2 |
Biasotto |
Massimo |
|
Sezione |
4 |
0,5 |
6 |
0,5 |
1 |
2,4 |
0,5 |
4 |
0,4 |
1 |
0,1 |
1 |
IT |
||
3 |
Gulmini |
Michele |
|
Sezione |
4 |
0,5 |
|
|
0,5 |
2,4 |
0,5 |
|
|
|
|
0,5 |
IT |
||
4 |
Maron |
Gaetano |
Tecno. I |
Sezione |
4 |
0,5 |
|
|
0,5 |
2,4 |
0,5 |
|
|
|
|
0,5 |
IT |
||
5 |
Toniolo |
Nicola |
|
Sezione |
6 |
0,3 |
|
|
0,3 |
|
|
4 |
0,3 |
|
|
0,3 |
IT |
||
6 |
Vannucci |
Luigi |
|
|
|
|
|
|
0 |
|
|
3 |
0,3 |
|
|
0,3 |
ALICE |
||
|
|
|
|
Totali |
|
|
|
|
2,6 |
|
|
|
|
|
|
2,9 |
|
||
MILANO |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Alemi |
Mario |
Perf. |
|
|
|
|
0 |
4 |
0,5 |
|
|
|
|
0,5 |
LHCb |
|||
2 |
Destri |
Claudio |
P. Ass. |
Gr. IV |
|
|
|
|
0 |
3 |
0,1 |
|
|
|
|
0,1 |
APE |
||
3 |
Frezzotti |
Roberto |
Ass. Ric. |
Gr. IV |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
APE |
||
4 |
Lo Biondo |
Giuseppe |
CTER |
Sezione |
|
|
|
|
0 |
1 |
0,1 |
|
|
|
|
0,1 |
IT |
||
5 |
Paganoni |
Marco |
Ric. Un. |
Gr. I |
|
|
|
|
0 |
4 |
0,2 |
|
|
|
|
0,2 |
LHCb |
||
6 |
Perini |
Laura |
P. Ass. |
Gr. I |
6 |
0,3 |
8 |
0,3 |
0,6 |
3 |
0,3 |
4 |
0,3 |
|
|
0,6 |
ATLAS |
||
7 |
Prelz |
Francesco |
Tecn. III |
Sezione |
1 |
0,3 |
|
|
0,3 |
2,1 |
0,2 |
1 |
0,1 |
|
|
0,3 |
IT |
||
8 |
Ragusa |
Francesco |
P. Ass. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ATLAS |
||
9 |
Resconi |
Silvia |
Art. 23 |
Gr. I |
6 |
0,3 |
8 |
0,3 |
0,6 |
3 |
0,3 |
4 |
0,3 |
|
|
0,6 |
ATLAS |
||
|
|
|
|
Totali |
|
|
|
|
1,5 |
|
|
|
|
|
|
2,8 |
|
||
NAPOLI |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Barone |
Fabrizio |
P. Ass. |
Gr. II |
|
|
|
|
0 |
|
|
|
|
4 |
0,2 |
0,2 |
VIRGO |
||
2 |
Calloni |
Enrico |
Ric. Univ. |
Gr. II |
|
|
|
|
0 |
|
|
|
|
4 |
0,3 |
0,3 |
VIRGO |
||
3 |
Carlino |
GianPaolo |
Ric. III |
Gr. I |
|
|
|
|
0 |
3 |
0,4 |
|
|
|
|
0,4 |
ATLAS |
||
4 |
Catalanotti |
Sergio |
P. Ass. |
Gr. II |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ARGO |
||
5 |
De Rosa |
Rosario |
Ric. Univ. |
Gr. II |
|
|
|
|
0 |
|
|
|
|
4 |
0,5 |
0,5 |
VIRGO |
||
6 |
Della Volpe |
Domenico |
B. INFN |
Gr. I |
|
|
|
|
0 |
3 |
0,4 |
|
|
|
|
0,4 |
ATLAS |
||
7 |
Di Sciascio |
Giuseppe |
Ric. III |
Gr. II |
|
|
|
|
0 |
|
|
2 |
0,2 |
|
|
0,2 |
ARGO |
||
8 |
Doria |
Alessandra |
Tecno. Art. 23 |
Gr. I |
|
|
|
|
0 |
3 |
0,4 |
|
|
|
|
0,4 |
ATLAS |
||
9 |
Eleuteri |
Antonio |
Dott. |
Gr. II |
|
|
|
|
0 |
|
|
|
|
4 |
0,5 |
0,5 |
VIRGO |
||
10 |
Garufi |
Fabio |
Tecno. Art. 23 |
Gr. II |
|
|
|
|
0 |
|
|
|
|
4 |
0,3 |
0,3 |
VIRGO |
||
11 |
Lo Re |
Paolo |
Tecno. III |
Sezione |
|
|
|
|
0 |
|
|
|
|
? |
0,2 |
0,2 |
IT |
||
12 |
Mastroserio |
Paolo |
Tecno. III |
Sezione |
|
|
|
|
0 |
|
|
|
|
? |
0,2 |
0,2 |
IT |
||
13 |
Merola |
Leonardo |
P. Ass. |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ATLAS |
||
14 |
Milano |
Leopoldo |
P. Ord. |
Gr. II |
|
|
|
|
0 |
|
|
|
|
4 |
0,2 |
0,2 |
VIRGO |
||
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
4,4 |
|
||
PADOVA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Balsamo |
Simonetta |
Ric. U. |
|
4 |
0,4 |
|
|
0,4 |
|
|
2,4 |
0,4 |
|
|
0,4 |
IT Università |
||
2 |
Bellato |
Marco |
Tecn. |
Gr. I |
|
|
|
|
0 |
|
|
2,4 |
0,25 |
|
|
0,25 |
CMS |
||
3 |
Costa |
Fulvia |
CTER |
Sezione |
|
|
|
|
0 |
|
|
2,4 |
0,3 |
|
|
0,3 |
IT |
||
4 |
Ferrari |
Roberto |
CTER |
Sezione |
|
|
|
|
0 |
|
|
2,4 |
0,3 |
|
|
0,3 |
IT |
||
5 |
Gasparini |
Ugo |
P. Ass. |
Gr. I |
8 |
0,6 |
|
|
0,6 |
3 |
0,6 |
|
|
|
|
0,6 |
CMS |
||
6 |
Lacaprara |
Stefano |
Dott. |
Gr. I |
|
|
|
|
0 |
3 |
0,5 |
|
|
|
|
0,5 |
CMS |
||
7 |
Lippi |
Ivano |
Ric. III |
Gr. I |
8 |
0,6 |
|
|
0,6 |
3 |
0,6 |
|
|
|
|
0,6 |
CMS |
||
8 |
Mazzucato |
Mirco |
Ric. I |
Gr. I |
? |
0,5 |
|
|
0,5 |
? |
0,7 |
|
|
|
|
0,7 |
CMS |
||
9 |
Michelotto |
Michele |
Tecn. III |
Sezione |
4 |
0,3 |
|
|
0,3 |
4 |
0,3 |
2,4 |
0,3 |
|
|
0,6 |
IT |
||
10 |
Morando |
Maurizio |
P. Ass. |
Gr. III |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
11 |
Orlando |
Salvatore |
P. Ass. |
|
1 |
0,3 |
|
|
0,3 |
2,1 |
0,3 |
|
|
|
|
0,3 |
IT Università |
||
12 |
Ronchese |
Paolo |
Ric. Un. |
Gr. I |
|
|
|
|
0 |
3 |
0,5 |
|
|
|
|
0,5 |
CMS |
||
13 |
Saccarola |
Ivo |
CTER |
Sezione |
|
|
|
|
0 |
|
|
2,4 |
0,3 |
|
|
0,3 |
IT |
||
14 |
Sgaravatto |
Massimo |
Tecn. III |
Sezione |
1 |
0,4 |
6 |
0,3 |
0,7 |
2,1 |
0,4 |
4 |
0,3 |
1 |
0,2 |
0,9 |
IT |
||
15 |
Turrisi |
Rosario |
Bors. |
Gr. III |
3 |
0,4 |
0,4 |
ALICE |
|||||||||||
16 |
Vanini |
Sara |
Bors.INFN |
Gr. I |
|
|
|
|
0 |
3 |
0,7 |
|
|
|
|
0,7 |
CMS |
||
17 |
Ventura |
Sandro |
Tecn. |
Gr. I |
|
|
|
|
0 |
|
|
2,4 |
0,25 |
|
|
0,25 |
CMS |
||
18 |
ZZZ |
Borsa Un. |
B. Univ. |
Gr. I |
|
|
|
|
0 |
3 |
1 |
|
|
|
|
1 |
CMS |
||
|
|
|
|
Totali |
|
|
|
|
3,4 |
|
|
|
|
|
|
8,9 |
|
||
PARMA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Alfieri |
Roberto |
Tec. U. |
Gr. IV |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
APE |
||
2 |
Onofri |
Enrico |
P. Ord. |
Gr. IV |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
APE |
||
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
0,4 |
|
||
PAVIA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Conta |
Claudio |
P. Ord. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ATLAS |
||
2 |
De Vecchi |
Carlo |
Tecn. |
0 |
3 |
0,25 |
0,25 |
ATLAS |
|||||||||||
3 |
Polesello |
Giacomo |
Ric. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ATLAS |
||
4 |
Rimoldi |
Adele |
Ric. U. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ATLAS |
||
5 |
Vercesi |
Valerio |
I Ric. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
4 |
0,2 |
|
|
0,4 |
ATLAS |
||
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
1,25 |
|
||
PERUGIA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Biasini |
Maurizio |
Ric. Un. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
CMS |
||
2 |
Cattuto |
Ciro |
Dott. |
Gr. II |
|
|
|
|
0 |
3 |
0,5 |
|
|
|
|
0,5 |
VIRGO |
||
3 |
Gammaitoni |
Luca |
Ric. Un. |
Gr. II |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
VIRGO |
||
4 |
Gentile |
Fabrizio |
CTER |
Sezione |
|
|
|
|
0 |
4 |
0,1 |
|
|
|
|
0,1 |
IT |
||
5 |
Lariccia |
Paolo |
P. Ord. |
Gr. I |
|
|
|
|
0 |
3 |
0,4 |
|
|
|
|
0,4 |
CMS |
||
6 |
Punturo |
Michele |
Tecno. III |
Gr. II |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
VIRGO |
Resp. |
|
7 |
Santinelli |
Roberto |
Dott. |
Gr. I |
|
|
|
|
0 |
3 |
0,5 |
|
|
|
|
0,5 |
CMS |
||
8 |
Servoli |
Leonello |
Ric. III |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
4 |
0,2 |
|
|
0,5 |
CMS |
Resp. |
|
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
2,7 |
|
||
PISA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Arezzini |
Silvia |
CTER |
Sezione |
|
|
|
|
0 |
2,2 |
0,3 |
|
|
|
|
0,3 |
IT |
||
2 |
Bagliesi |
Giuseppe |
Ric. III |
Gr. I |
8 |
0,3 |
|
|
0,3 |
3 |
0,3 |
1 |
0,3 |
|
|
0,6 |
CMS |
||
3 |
Cella |
Giancarlo |
Bors. Dip. Fis. |
Gr. II |
|
|
|
|
0 |
3 |
0,1 |
|
|
|
|
0,1 |
VIRGO |
||
4 |
Ciulli |
Vitaliano |
Ass. Ric. SNS |
Gr. I |
8 |
0,3 |
|
|
0,3 |
3 |
0,6 |
|
|
|
|
0,6 |
CMS |
||
5 |
Controzzi |
Andrea |
Bors. SNS |
Gr. I |
|
|
|
|
0 |
2,2 |
0,5 |
1 |
0,5 |
|
|
1 |
CMS |
||
6 |
Costanzo |
Davide |
Ass. Ric. INFN |
Gr. I |
|
|
|
|
0 |
3 |
0,5 |
|
|
|
|
0,5 |
ATLAS |
||
7 |
Del Prete |
Tarcisio |
Dir. Ric. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ATLAS |
||
8 |
Donno |
Flavia |
Tec. III |
Sezione |
2 |
0,5 |
|
|
0,5 |
2,2 |
0,2 |
1 |
0,3 |
|
|
0,5 |
IT |
||
9 |
Ferrante |
Isidoro |
Ric. Univers. |
Gr. II |
|
|
|
|
0 |
3 |
0,1 |
|
|
|
|
0,1 |
VIRGO |
||
10 |
Giassi |
Alessandro |
Ric. III |
Gr. I |
8 |
0,3 |
|
|
0,3 |
3 |
0,3 |
|
|
|
|
0,3 |
CMS |
||
11 |
Palla |
Fabrizio |
Ric. III |
Gr. I |
8 |
0,3 |
|
|
0,3 |
3 |
0,5 |
|
|
|
|
0,5 |
CMS |
||
12 |
Schifano |
Fabio |
Tec. III Art. 23 |
Gr. IV |
|
|
|
|
0 |
2,2 |
0,2 |
|
|
|
|
0,2 |
APE |
||
13 |
Sciaba' |
Andrea |
Bors. INFN |
Gr. I |
8 |
0,3 |
|
|
0,3 |
3 |
0,2 |
1 |
0,3 |
|
|
0,5 |
CMS |
||
14 |
Vicere' |
Andrea |
Ric. Art. 36 |
Gr. II |
|
|
|
|
0 |
3 |
0,1 |
|
|
|
|
0,1 |
VIRGO |
||
15 |
Xie |
Zhen |
Bors. INFN |
Gr. I |
8 |
0,4 |
|
|
0,4 |
3 |
0,5 |
1 |
0,3 |
|
|
0,8 |
CMS |
||
|
|
|
|
Totali |
|
|
|
|
2,4 |
|
|
|
|
|
|
6,3 |
|
||
ROMA |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Anzellotti |
Daniela |
CTER |
Sezione |
|
|
|
|
0 |
2,4 |
0,2 |
1 |
0,1 |
|
|
0,3 |
IT |
||
2 |
Barone |
Luciano M. |
Ric. Un. |
Gr. I |
|
|
2 |
0,3 |
0,3 |
2,2 |
0,3 |
3 |
0,3 |
4 |
0,3 |
0,9 |
CMS |
Resp. |
|
3 |
Battista |
Claudia |
Tecno. III |
Sezione |
|
|
4 |
0,4 |
0,4 |
2,4 |
0,4 |
|
|
|
|
0,4 |
IT |
||
4 |
De Rossi |
Marco |
CTER |
Sezione |
|
|
|
|
0 |
2,4 |
0,1 |
1 |
0,0 |
|
|
0,1 |
IT |
||
5 |
De Salvo |
Alessandro |
Art. 23 |
Gr. I |
|
|
6 |
0,4 |
0,4 |
|
|
4 |
0,4 |
|
|
0,4 |
ATLAS |
||
6 |
Diemoz |
Marcella |
Ric. II |
Gr. I |
|
|
|
|
0 |
|
|
3 |
0,3 |
|
|
0,3 |
CMS |
||
7 |
Falciano |
Speranza |
Ric. II |
Gr. V |
|
|
|
|
0 |
2,4 |
0,2 |
3 |
0,2 |
|
|
0,4 |
ATLAS |
||
8 |
Frasca |
Sergio |
Ric. III |
Gr. II |
|
|
|
|
0 |
|
|
3 |
0,3 |
|
|
0,3 |
VIRGO |
Resp. |
|
9 |
Lonardo |
Alessandro |
Ric. III |
Gr. IV |
|
|
|
|
0 |
2,2 |
0,3 |
|
|
|
|
0,3 |
APE |
||
10 |
Longo |
Egidio |
P. Ord. |
Gr. I |
|
|
|
|
0 |
|
|
3 |
0,3 |
|
|
0,3 |
CMS |
||
11 |
Luminari |
Lamberto |
Ric. II |
Gr. I |
8 |
0,3 |
6 |
0,3 |
0,6 |
3 |
0,3 |
4 |
0,3 |
|
|
0,6 |
ATLAS |
Resp. |
|
12 |
Majorana |
Ettore |
Art. 23 |
Gr. II |
|
|
|
|
0 |
4 |
0,1 |
1 |
0,1 |
|
|
0,2 |
VIRGO |
||
13 |
Marzano |
Francesco |
Ric. I |
Gr. V |
|
|
|
|
0 |
2,4 |
0,15 |
4 |
0,1 |
|
|
0,25 |
ATLAS |
||
14 |
Michelotti |
Andrea |
Ric. III |
Gr. IV |
|
|
|
|
0 |
2,2 |
0,3 |
|
|
|
|
0,3 |
APE |
||
15 |
Mirabelli |
Giovanni |
Ric. II |
Gr. I |
1 |
0,3 |
|
|
0,3 |
2,1 |
0,3 |
5 |
0,2 |
|
|
0,5 |
ATLAS |
||
16 |
Nisati |
Aleandro |
Ric. III |
Gr. I |
|
|
|
|
0 |
|
|
3 |
0,25 |
|
|
0,25 |
ATLAS |
||
17 |
Organtini |
Giovanni |
Ric. Un. |
Gr. I |
|
|
|
|
0 |
2,2 |
0,2 |
4 |
0,3 |
|
|
0,5 |
CMS |
||
18 |
Palomba |
Cristiano |
Ass. Ric. |
Gr. II |
|
|
|
|
0 |
3 |
0,2 |
4 |
0,2 |
1 |
0,1 |
0,5 |
VIRGO |
||
19 |
Pasqualucci |
Enrico |
Tecno. III |
|
|
|
|
|
0 |
4 |
0,1 |
|
|
|
|
0,1 |
ATLAS |
||
20 |
Ricci |
Fulvio |
P. Ass. |
Gr. II |
|
|
|
|
0 |
|
|
3 |
0,3 |
|
|
0,3 |
VIRGO |
||
21 |
Rossetti |
Davide |
Art. 23 |
Gr. IV |
|
|
2 |
0,3 |
0,3 |
2,2 |
0,2 |
1 |
0,1 |
|
|
0,3 |
APE |
Resp. |
|
22 |
Santacesaria |
Roberta |
Ric. II |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
LHCb |
Resp. |
|
23 |
Spanu |
Alessandro |
CTER |
Sezione |
|
|
|
|
0 |
2,4 |
0,2 |
1 |
0,1 |
|
|
0,3 |
IT |
Resp. |
|
24 |
Valente |
Enzo |
Ric. I |
Gr. I |
|
|
4 |
0,3 |
0,3 |
2,4 |
0,2 |
4 |
0,1 |
1 |
0,1 |
0,4 |
CMS |
Resp. Loc. |
|
|
|
|
|
Totali |
|
|
|
|
2,6 |
|
|
|
|
|
|
8,4 |
|
||
ROMA2 |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Camarri |
Paolo |
Contr. |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ATLAS |
||
2 |
Di Ciaccio |
Anna |
P. Ass. |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ATLAS |
||
3 |
Guagnelli |
Marco |
Tecno. III |
|
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
APE |
||
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
0,9 |
|
||
ROMA3 |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Bussino |
Severino |
Ass. Ric. |
Gr. II |
|
|
|
|
0 |
|
|
3 |
0,2 |
|
|
0,2 |
ARGO |
||
2 |
Farilla |
Ada |
Ric. II |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ATLAS |
||
3 |
Stanescu |
Cristian |
Tecno. II |
Gr. I/II |
|
|
|
|
0 |
3 |
0,4 |
|
|
0,4 |
ARGO-ATLAS |
||||
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
0,8 |
|
||
SALERNO |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Cifarelli |
Luisa |
P. Ord. |
Gr. III |
|
|
|
|
0 |
3 |
0,1 |
|
|
|
|
0,1 |
ALICE |
||
2 |
D'Apolito |
Carmen |
C.T. |
Gr. III |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ALICE |
||
3 |
Fusco Girard |
Mario |
P. Ass. |
Gr. III |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ALICE |
||
4 |
Grella |
Giuseppe |
P. Ass. |
Gr. III |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ALICE |
||
5 |
Guida |
Michele |
Ric. U. |
Gr. III |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ALICE |
||
6 |
Quartieri |
Joseph |
P. Ass. |
Gr. III |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ALICE |
||
7 |
Seganti |
Alessio |
Dott. |
Gr. III |
|
|
|
|
0 |
3 |
0,25 |
|
|
|
|
0,25 |
ALICE |
||
8 |
Vicinanza |
Domenico |
Dott. |
Gr. III |
|
|
|
|
0 |
3 |
0,25 |
|
|
|
|
0,25 |
ALICE |
||
9 |
Virgili |
Tiziano |
Ric. U. |
Gr. III |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
ALICE |
||
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
1,8 |
|
||
TORINO |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Amapane |
Nicola |
Dott. |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
CMS |
||
2 |
Anglano |
Cosimo |
Ric. Un. |
Inform. |
1 |
0,3 |
|
|
0,3 |
2,1 |
0,2 |
1 |
0,1 |
|
|
0,3 |
IT Università |
||
3 |
Cerello |
Piergiorgio |
Ric. III |
Gr. III |
6 |
0,3 |
|
|
0,3 |
4 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
4 |
Donatelli |
Susanna |
P. Ass. |
Inform. |
1 |
0,3 |
|
|
0,3 |
2,1 |
0,3 |
|
|
|
|
0,3 |
IT Università |
||
5 |
Forte |
Antonio |
CTER |
Sezione |
|
|
|
|
0 |
2,4 |
0,1 |
1 |
0,1 |
|
|
0,2 |
IT |
||
6 |
Gaido |
Luciano |
Tecno. III |
Sezione |
1 |
0,3 |
|
|
0,3 |
2,1 |
0,3 |
4 |
0,2 |
5 |
0,1 |
0,6 |
IT |
||
7 |
Gallio |
Mauro |
P. Ass. |
Gr. III |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
8 |
Masera |
Massimo |
Ric. Un. |
Gr. III |
6 |
0,3 |
|
|
0,3 |
4 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
9 |
Pittau |
Roberto |
Ric. Un. |
Inform. |
|
|
|
|
0 |
4 |
0,2 |
|
|
|
|
0,2 |
IT Università |
||
10 |
Ramello |
Luciano |
P. Ass. |
Gr. III |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
11 |
Scomparin |
Enrico |
Ric. III |
Gr. III |
|
|
|
|
0 |
4 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
12 |
Sitta |
Mario |
Ric. Un. |
Gr. III |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
13 |
Solano |
Ada |
Ric. Un. |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
CMS |
||
14 |
Vitelli |
Annalina |
Dott. |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
CMS |
||
15 |
Werbrouck |
Albert |
P. Ord. |
Gr. III |
1 |
0,3 |
|
|
0,3 |
2,1 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
16 |
ZZZ |
Tecnologo |
Tecno. III |
Sezione |
1 |
0,6 |
|
|
0,6 |
2,1 |
0,9 |
|
|
|
|
0,9 |
IT |
||
17 |
ZZZ |
Ass. Ricer. |
Ass. Ric. |
|
1 |
0,5 |
|
|
0,5 |
2,1 |
1,0 |
|
|
|
|
1 |
IT |
||
|
|
|
|
Totali |
|
|
|
|
2,9 |
|
|
|
|
|
|
6,5 |
|
||
TRIESTE |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Bradamante |
Franco |
P. Ord. |
Gr. I |
|
|
|
|
0 |
3 |
0,1 |
|
|
|
|
0,1 |
COMPASS |
||
2 |
Fragiacomo |
Enrico |
|
Gr. III |
|
|
|
|
0 |
3 |
0,5 |
|
|
|
|
0,5 |
ALICE |
||
3 |
Gobbo |
Benigno |
Ric. III |
Gr. I |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
COMPASS |
||
4 |
Gomezel |
Roberto |
Tecno. III |
Sezione |
|
|
|
|
0 |
? |
0,3 |
|
|
|
|
0,3 |
IT |
||
5 |
Lamanna |
Massimo |
Ric. III |
Gr. I |
|
|
|
|
0 |
3 |
0,4 |
|
|
|
|
0,4 |
COMPASS |
||
6 |
Martin |
Anna |
P. Ass. |
Gr. I |
|
|
|
|
0 |
3 |
0,2 |
|
|
|
|
0,2 |
COMPASS |
||
7 |
Piano |
Stefano |
|
Gr. III |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
8 |
Rui |
Rinaldo |
|
Gr. III |
|
|
|
|
0 |
3 |
0,3 |
|
|
|
|
0,3 |
ALICE |
||
9 |
Strizzolo |
Claudio |
CTER |
Sezione |
|
|
|
|
0 |
? |
0,2 |
|
|
|
|
0,2 |
IT |
||
10 |
Strizzolo |
Lucio |
|
Sezione |
|
|
|
|
0 |
? |
0,2 |
|
|
|
|
0,2 |
IT |
||
11 |
Tirel |
Alessandro |
|
Sezione |
|
|
|
|
0 |
? |
0,2 |
|
|
|
|
0,2 |
IT |
||
12 |
ZZZ |
Tecnico |
CTER |
Sezione |
|
|
|
|
0 |
? |
1 |
|
|
|
|
1 |
IT |
||
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
4 |
|
||
UDINE |
Ruolo o |
Afferenza |
DataGrid |
INFN-Grid |
|||||||||||||||
N. |
Cognome |
Nome |
Qualifica |
o Gruppo |
WP# |
% |
WP# |
% |
Tot |
WP# |
% |
WP# |
% |
WP# |
% |
Tot |
Esperimento |
||
1 |
Cabras |
Giuseppe |
|
|
|
|
|
|
0 |
3 |
0,15 |
|
|
|
|
0,15 |
ATLAS |
||
2 |
DeAngelis |
Alessandro |
|
|
|
|
|
|
0 |
3 |
0,1 |
|
|
|
|
0,1 |
ATLAS |
||
|
|
|
|
Totali |
|
|
|
|
0 |
|
|
|
|
|
|
0,25 |
|