CERN les.robertson@cern.ch juin-00 La Politique du tout PC au CERN CUIC – Arcachon – 21-22 juin...


CERN

les.robertson@cern.ch juin-00

The All-PC Policy at CERN

CUIC – Arcachon – 21-22 June 2000

Les Robertson

CERN/IT – Geneva


Outline

The problem
The strategy
The difficulties


The problem


Architectures & operating systems supported at end 1999

The legacy of ten years of RISC computing

[Diagram. Operating systems: AIX, Windows NT, Irix, Solaris, Digital Unix, HP-UX, MAC-OS, Linux, Windows 95, Windows 2000. Architectures: SPARC, MIPS, Intel IA-32, PA-RISC, PowerPC, Alpha.]


How many architectures and operating systems are really necessary?

How much does support cost? How much is diversity worth?

How can limits on choice be imposed in a scientific research environment?

CERN - The European Organisation for Nuclear Research

The European Laboratory for Particle Physics

Fundamental research in particle physics
Financed by 20 European countries
6,000 users (researchers) from all over the world

LHC accelerator under construction
  Proton-proton collider
  27 km of super-conducting magnets
  Target date for first beams - 2005
  Four experiments

2000 physicists, 150 universities


CMS detector: as big as a 6-storey office block, costing ~FF 2,000M, 1 PetaByte of filtered data per year

Trigger and data-reduction chain:
  40 MHz (40 TB/sec) into level 1 - special hardware
  75 kHz (75 GB/sec) into level 2 - embedded processors
  5 kHz (5 GB/sec) into level 3 - PCs
  100 Hz (100 MB/sec) into data recording & offline analysis
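The rates and bandwidths quoted for the CMS trigger chain can be cross-checked with a little arithmetic. The sketch below is only an illustration: the stage figures come from the slide, but the function names are hypothetical and the figure of about 1e7 seconds of data taking per year is an assumption that the slide does not state.

```python
# Trigger stages as (name, output event rate in Hz, output data rate in bytes/s).
# The figures are taken from the slide; the first entry is the raw detector output.
STAGES = [
    ("detector output", 40e6, 40e12),
    ("level 1 - special hardware", 75e3, 75e9),
    ("level 2 - embedded processors", 5e3, 5e9),
    ("level 3 - PCs", 100.0, 100e6),
]

# ASSUMPTION (not from the slide): roughly 1e7 seconds of data taking per year.
SECONDS_OF_RUNNING_PER_YEAR = 1e7


def event_size_bytes(rate_hz, bytes_per_sec):
    """Average event size implied by an event rate and a data rate."""
    return bytes_per_sec / rate_hz


def rejection_factors(stages):
    """Return (stage name, input rate / output rate) for each trigger level."""
    return [
        (stages[i][0], stages[i - 1][1] / stages[i][1])
        for i in range(1, len(stages))
    ]


def recorded_bytes_per_year(stages, seconds=SECONDS_OF_RUNNING_PER_YEAR):
    """Data volume written per year at the final stage's output data rate."""
    return stages[-1][2] * seconds


if __name__ == "__main__":
    for name, rate, bandwidth in STAGES:
        print(f"{name}: {event_size_bytes(rate, bandwidth) / 1e6:.1f} MB/event")
    for name, factor in rejection_factors(STAGES):
        print(f"{name}: rate reduced by a factor of about {factor:.0f}")
    print(f"recorded per year: {recorded_bytes_per_year(STAGES) / 1e15:.1f} PB")
```

Under that running-time assumption, 100 MB/sec corresponds to about 1 PetaByte per year, which is consistent with the figure quoted for filtered CMS data.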

The LHC Detectors: CMS, ATLAS, LHCb

3.5 PetaBytes / year

~10^8 events/year


Performance or Throughput?

High Throughput Computing
  mass of modest problems
  throughput rather than performance
  resilience rather than ultimate reliability

Ten years of experience in exploiting inexpensive mass market components

But we need to marry these with inexpensive highly scalable management tools

Much in common with data mining, Internet computing facilities, ……

Estimated CPU Capacity at CERN

[Chart. Vertical axis: K SI95 (0 to 2,500). Horizontal axis: year (1998 to 2006). Series: LHC, Non-LHC, technology-price curve (40% annual price improvement). Annotations: "~10K SI95, 1200 processors" and "10-20K cpus?".]

Estimated DISK Capacity at CERN

[Chart. Vertical axis: TeraBytes (0 to 1800). Horizontal axis: year (1998 to 2006). Series: LHC, Non-LHC, technology-price curve (40% annual price improvement).]
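The capacity charts refer to a technology-price curve with a 40% annual price improvement. A minimal sketch of what such a curve implies for a fixed budget follows. It assumes that "40% annual price improvement" means the price per unit of capacity falls by 40% each year; the slides do not define the term, so this reading and the function names are assumptions.

```python
import math


def capacity_for_fixed_budget(base_capacity, years, annual_price_improvement=0.40):
    """Capacity the same budget buys after `years` years.

    ASSUMPTION: the price per unit of capacity falls by
    `annual_price_improvement` (a fraction) every year.
    """
    price_factor = (1.0 - annual_price_improvement) ** years
    return base_capacity / price_factor


def years_to_gain(factor, annual_price_improvement=0.40):
    """Years needed for a fixed budget to buy `factor` times more capacity."""
    return math.log(factor) / -math.log(1.0 - annual_price_improvement)


if __name__ == "__main__":
    for year in range(2000, 2007):
        gain = capacity_for_fixed_budget(1.0, year - 2000)
        print(f"{year}: {gain:.1f}x the capacity per unit of budget of 2000")
```

Read this way, a budget fixed in 2000 would buy roughly 21 times more capacity in 2006. If the phrase instead meant 40% more capacity per unit of cost each year, the six-year gain would be closer to a factor of 7.5.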

CMS Offline Farm at CERN circa 2006 (lmr for Monarc study, April 1999)

[Diagram. Components: processors (3000 processors, 1500 boxes, 160 clusters, 40 sub-farms); disks (5400 disks, 340 arrays); tapes (100 drives); LAN-WAN routers; farm network; storage network. Link bandwidths labelled on the diagram: 0.8 Gbps (daq), 0.8 Gbps, 1.5 Gbps, 5 Gbps, 6 Gbps*, 8 Gbps, 12 Gbps, 24 Gbps*, 250 Gbps, 960 Gbps*.]


LHC physics facility – 4 experiments

2 M SPECint95, 10-20K processors

2 PByte disk, >20 K disks
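The totals quoted for the LHC physics facility imply average per-unit capacities, which can be worked out as follows. This is a rough sketch: the totals and counts are from the slide, while the helper name and the choice of decimal units (1 PByte = 1e6 GB) are assumptions.

```python
def per_unit_range(total, count_low, count_high):
    """Return (min, max) average capacity per unit for a range of unit counts."""
    return total / count_high, total / count_low


# Figures from the slide.
TOTAL_CPU_SI95 = 2_000_000          # "2 M SPECint95"
CPU_COUNT_RANGE = (10_000, 20_000)  # "10-20K processors"
MIN_DISK_COUNT = 20_000             # ">20 K disks"

# ASSUMPTION: decimal units, so 2 PByte = 2e6 GB.
TOTAL_DISK_GB = 2_000_000

cpu_low, cpu_high = per_unit_range(TOTAL_CPU_SI95, *CPU_COUNT_RANGE)
# With more than 20K disks, the average disk can be at most this large.
max_avg_disk_gb = TOTAL_DISK_GB / MIN_DISK_COUNT

if __name__ == "__main__":
    print(f"average CPU: {cpu_low:.0f} to {cpu_high:.0f} SI95 per processor")
    print(f"average disk: at most {max_avg_disk_gb:.0f} GB per disk")
```

For comparison, the annotation on the CPU capacity chart (~10K SI95 on 1200 processors) works out to roughly 8 SI95 per processor.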


Summary of the problem

HEP is using far too many operating systems, in many cases with only slightly different functionality or hardware cost benefits, and at a high cost for users and support teams

The scale of LHC computing
  massive numbers of processors/boxes
  integration of regional computing centres and CERN
  the problem is how to manage on this scale while limiting costs of equipment, management & support

We must reduce the diversity while retaining flexibility to use low-cost, mass market components and adapt rapidly to changing physics needs


The strategy


Opportunity

PCs + { Linux ¦ Windows } offer an historic opportunity to reduce the solution set

Costs and performance
  PCs will consistently be among the very best price/performers for HEP codes
  They may not be the fastest, but they are fast enough

Linux: a non-proprietary operating system compatible with the recent Unix history

Windows: a mass market alternative, widely used on the desktop


Policy

Restrict ourselves to PC hardware with Linux or Windows 2000

Develop a migration plan
  progressively freeze support for other Unixes, announcing end-dates which are reasonable for old experiments
  strongly discourage further investments in RISC systems by current and future experiments
  install large Linux public facility, testbed for future experiments

Concentrate investment in Linux and Windows
  bring support up to the standards of proprietary Unixes
  tackle the problems of scaling the management and performance of physics farms and desktops
  seek HEP-wide consensus


But do not be unrealistic ----

This is a convergence policy which looks realistic now and will provide a single starting point for LHC computing, but we can be sure that the industry will not stand still, and we shall sooner or later have to expand the systems and architectures supported

[Diagram. Operating systems: AIX, WNT, Irix, Solaris, Digital Unix, HP-UX, MAC-OS, Linux, Windows 95. Architectures: SPARC, MIPS, Intel IA-32, PA-RISC, PowerPC, Alpha. Additional labels: Linux, Windows 2000, "Intel IA-64 - - - ?".]


The difficulties and the state of the migration


Difficulties - I

Physics: (almost) entirely Unix based

Linux is not quite ready
  (Too) wide a choice of kernels, compilers
  Poor debugging
  Different versions supported by different applications
  Complex packages (Oracle, AFS): better go with the standard platform
  Stability problems under load
  Who provides in-depth, on-site Linux systems support?

Solution:
  Standard Linux Package certified for all CERN applications
  Solaris/SPARC for special purposes
  Open posts for Linux experts


Difficulties - II

In a research environment
  Easy to estimate the costs of systems support
  Hard to estimate the cost of application migration
    The application experts have already moved on
    The developers have other (more interesting) problems to solve
  The problem is not only to port the code but (more important) to acquire confidence in the physics results
    Compiler, architecture, old bugs
    But there are signs that Linux+Intel are as good as any!
  In the past, the production use of multiple architectures was an important factor in finding bugs
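One way to build confidence in physics results after a port is to run the same job on the old and the new platform and compare the outputs bin by bin. The sketch below is purely illustrative: the slides do not describe any validation procedure, and the function name, the list-of-floats histogram format, and the default tolerance are all hypothetical.

```python
import math


def compare_histograms(reference, candidate, rel_tol=1e-9, abs_tol=0.0):
    """Return (bin index, reference value, candidate value) for every bin
    whose contents differ by more than the given tolerance.

    `reference` would come from the established platform and `candidate`
    from the newly ported one (a hypothetical workflow).
    """
    if len(reference) != len(candidate):
        raise ValueError("histograms must have the same number of bins")
    return [
        (i, r, c)
        for i, (r, c) in enumerate(zip(reference, candidate))
        if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


if __name__ == "__main__":
    old_platform = [12.0, 48.5, 103.25, 47.0]   # made-up bin contents
    new_platform = [12.0, 48.5, 103.25, 47.5]   # made-up bin contents
    for index, ref, cand in compare_histograms(old_platform, new_platform):
        print(f"bin {index}: reference {ref}, candidate {cand}")
```

A small tolerance is used rather than exact equality because different compilers and architectures can legitimately produce slightly different floating-point results. Choosing that tolerance is a physics judgement that this sketch does not attempt to make.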


Current Status – Physics

For older experiments
  Strong resistance to aggressive migration proposal
  Now aiming at complete freeze on all proprietary Unixes during 2003

For future experiments (not yet collecting data)
  General agreement, but reserve position on alternative platform for validation

For new experiments collecting data now
  Easy to calculate the benefits
  Have already completed migration


Current Status – other applications

Engineering applications
  Aggressive migration plan to Windows NT/2000 & Linux, with some residual SUN
  Major exception is mechanical CAE (Euclid + Digital Unix)

Administration
  Database (Oracle) on SUN
  Clients Web-based (Netscape)
  Strong pockets of MAC resistance, led by the Directorate


Conclusions

The enormous needs of the LHC demand standardisation and the use of inexpensive components

Opportunity: Linux + Windows 2000 with Intel IA32/64 & Ethernet

Great inertia (resistance?) on the part of the "old" experiments: it will take 4 years to complete the migration

But already more than half of the installed systems and 75% of the capacity are Linux/Intel