HathiTrust Report on a Plan for Safeguarding Computer Data in the Event of a Disaster

  • 7/27/2019 HathiTrust report on a plan for safeguarding computer data in the event of a disaster.

    HathiTrust is a Solution: The Foundations of a Disaster Recovery Plan for the Shared Digital Repository

    This report serves as recommendations made by Michael J. Shallcross, 2009 Digital Preservation Intern, University of Michigan School of Information

    Executive Summary

    This report seeks to establish the framework of a Disaster Recovery Plan for the HathiTrust Digital Library. While professional best practices and institutional needs have provided a clear mandate for HathiTrust's Disaster Recovery Program, common parlance has often obscured two prominent features of such initiatives. First, a Disaster Recovery Plan is actually composed of a suite of documents which detail a range of issues, from crisis communications and the continuity of administrative activities to the restoration of hardware and data. Second, there is no conclusion to the planning process; it is instead a continuous cycle of observation, analysis, solution design, implementation, training, testing, and maintenance.

    The primary goal of the present document is to provide a foundation on which future planning efforts may build. To that end, it examines the strategies by which HathiTrust has anticipated and mitigated the risks posed by ten common scenarios which could precipitate a disaster:

    o Hardware failure and data loss
    o Network configuration errors
    o External attacks
    o Format obsolescence
    o Core utility or building failure
    o Software failure
    o Operator error
    o Physical security breach
    o Media degradation
    o Manmade as well as natural disasters

    As this list reveals, a disaster within the digital repository refers not merely to data loss, the destruction of equipment, or damage to its environment, but to any event which has the potential to cause an extended service outage. For each scenario, the report discusses possible threats, summarizes the potential severity of related events, and then details solutions HathiTrust has enacted through direct quotations from the HathiTrust Web site and TRAC self-assessment, Service Level Agreements, and literature from service providers and vendors. Attached appendices provide relevant information and include contacts for important HathiTrust resources, an annotated guide to Disaster Recovery Planning references, and an overview of key steps in the Disaster Recovery Planning process.

    The concluding section of the report provides recommendations and action items for HathiTrust as it proceeds with its Disaster Recovery Initiative. These are divided into Short (0-6 mos.), Intermediate (6-12 mos.), and Long-Term (12+ mos.) objectives and are arranged in a suggested order of accomplishment.

    o Short-term goals include:
        - Describing the nature and extent of HathiTrust's insurance coverage
        - Testing and validation of current tape backup procedures
        - Improved physical and intellectual control over system hardware
        - Establishment, distribution, and maintenance of phone trees
        - Increased documentation of institutional knowledge
        - Identification of Disaster Recovery measures in place at the Indianapolis site

    o Intermediate-term objectives focus on:
        - Creation of a Disaster Recovery Planning Committee
        - Initiation of the data collection and analysis essential to the creation of recovery strategies (This section provides a high-level breakdown of various tasks and includes the coordination of activities between the Ann Arbor and Indianapolis sites as well as with service providers and vendors.)

    o Long-term action items deal with:
        - Completion and implementation of the suite of Disaster Recovery documents
        - Initiation of staff training and tests of organizational compliance
        - Storage of an additional copy of backup tapes at a remote third location
        - Investigation of an alternate hot site in Ann Arbor in the event a disaster renders the MACC unusable
        - Consideration of a third instance of the repository
        - Avoidance of vendor lock-in if a key supplier should go out of business

    This report demonstrates that various risk management strategies, design elements, operating procedures, and support contracts have endowed HathiTrust with the ability to preserve its digital content and continue essential repository functions in the event of a disaster. The establishment of the Indianapolis mirror site, the performance of nightly tape backups to a remote location, and the redundant power and environmental systems of the MACC reflect professional best practices and will enable HathiTrust to weather a wide range of foreseeable events. Unfortunately, disasters often result from the unknown and the unexpected; while the aforementioned strategies are crucial components of a Disaster Recovery Plan, they must be supplemented with additional policies and procedures to ensure that, come what may, HathiTrust will be able to carry on as both an organization and a dedicated service provider.

    Acknowledgements

    The author would like to thank Shannon Zachary for her encouragement and guidance; Cory Snavely and Jeremy York for their generous expenditure of time, energy, and knowledge; and Nancy McGovern and Lance Stuchell for access to their outstanding Disaster Recovery Planning resources. The following individuals have also been invaluable sources of advice, support, and information: John Wilkin, Bob Campe, Cyndi Mesa, Ann Thomas, John Weise, Larry Wentzel, Lara Unger Syrigos, Bill Hall, Emily Campbell, Sebastien Korner, Jessica Feeman, Phil Farber, Chris Powell, Cameron Hanover, Stephen Hipkiss, Tim Prettyman, Rene Gobeyn, and Krystal Hall. Thanks also to Dr. Elizabeth Yakel, Magia Krause, and Veronica and Cora Fambrough. The work in this report was made possible by an IMLS Grant.

    Table of Contents

    Executive Summary
    Acknowledgements
    Introduction
        o Goals for HathiTrust's Disaster Recovery Program
        o The Mandate for Disaster Recovery Planning in Digital Preservation
        o Disaster Preparedness in the Design and Operation of HathiTrust
        o Essential HathiTrust Business Functions

    HathiTrust's Disaster Recovery Strategies
        o Basic Requirements for Disaster Recovery
        o Disaster Recovery Strategy #1: Redundancy between the Ann Arbor and Indianapolis Sites
        o Disaster Recovery Strategy #2: Nightly Automated Tape Backups

    Scenario 1: Hardware Failure or Obsolescence and Data Loss
        o Review: Risks Involving Hardware Failure or Obsolescence and Data Loss
        o HathiTrust's Solutions for Hardware Failure and Data Loss
        o Redundant Components and Single Points of Failure in the HathiTrust Infrastructure
        o Key Features of HathiTrust's Isilon IQ Clustered Storage
        o Hardware Support and Service
        o Equipment Tracking
        o Hardware Replacement Schedule
        o Timeline for Emergency Replacement of HathiTrust Infrastructure
        o HathiTrust and Insurance Coverage at the University of Michigan

    Scenario 2: Network Configuration Errors
        o Review: Risks Involving Network Configuration Errors
        o HathiTrust's Solutions for Network Configuration Errors
        o Extent of ITCom Support
        o ITCom Responsibilities
        o ITCom Services in Response to Outages or Degradation Impacting the Network
        o HathiTrust Responsibilities

    Scenario 3: Network Security and External Attacks
        o Review: Risks Involving Network Security and External Attacks
        o HathiTrust's Solutions for Network Security

    Scenario 4: Format Obsolescence
        o Review: Risks Involving Format Obsolescence
        o HathiTrust's Solutions for Format Obsolescence
        o Selection of File Formats
        o Format Migration Policies and Activities

    Scenario 5: Core Utility and/or Building Failure
        o Review: Risks Involving Core Utility or Building Failure
        o HathiTrust's Solutions for Utility or Building Failure
        o General Maintenance and Repairs in University of Michigan Facilities
        o The Michigan Academic Computing Center (MACC)
        o Arbor Lakes Data Facility (ALDF)

    Scenario 6: Software Failure or Obsolescence
        o Review: Risks Involving Software Failure or Obsolescence
        o HathiTrust's Solutions for Software Issues

    Scenario 7: Operator Error
        o Review: Risks Involving Operator Error
        o HathiTrust's Solutions for Operator Error
        o Ingest
        o Archival Storage
        o Dissemination
        o Data Management

    Scenario 8: Physical Security Breach
        o Review: Risks Involving a Physical Security Breach
        o HathiTrust's Solutions for Physical Security
        o Security at the MACC
        o Security at the ALDF

    Scenario 9: Natural or Manmade Disaster
        o Review: Risks Involving a Natural or Manmade Disaster
        o HathiTrust's Solutions for Natural or Manmade Catastrophic Events
        o Basic Disaster Recovery Strategies

    Scenario 10: Media Failure or Obsolescence
        o Review: Risks Involving Media Failure or Obsolescence
        o HathiTrust's Solutions for Media Failure
        o Remaining Vulnerabilities

    Conclusions and Action Items
        o Conclusions
        o Short-Term Action Items
        o Intermediate-Term Action Items
        o Long-Term Action Items

    APPENDIX A: Contact Information for Important HathiTrust Resources
    APPENDIX B: HathiTrust Outages from March 2008 through April 2009
    APPENDIX C: Washtenaw County Hazard Ranking List
    APPENDIX D: Annotated Guide to Disaster Recovery Planning References
    APPENDIX E: Overview of the Disaster Recovery Planning Process
    APPENDIX F: TSM Backup Service Standard Service Level Agreement (2008)
    APPENDIX G: ITCS/ITCom Customer Network Infrastructure Maintenance Standard Service Agreement (2006)
    APPENDIX H: MACC Server Hosting Service Level Agreement (Draft, 2009)
    APPENDIX I: Michigan Academic Computing Center Operating Agreement (2006)

    **Appendices F-I are embedded PDF files.**

    20090824 1

    Introduction

    In the realm of print libraries, a disaster is a fairly unambiguous event: it is a fire, a broken pipe, an infestation of pests; in short, anything which threatens the continued use and existence of texts or the environment in which they are stored. This basic definition may also be applied to the digital library, in which a disaster refers not merely to the loss of content or corruption of data, the destruction of equipment or damage to its environment, but to any event which has the potential to cause an extended service outage. This last part proves to be the greatest difference between the print and digital worlds because there are a great many threats which can leave data intact but incapacitate the primary functions of a digital library. The daily operation of an institution such as HathiTrust involves the anticipation and resolution of a variety of problems (crashed servers, software bugs, networking errors, etc.) which only rise to the level of a disaster when they exceed the capacity of normal operating procedures and/or the maximum allowable outage periods. Disaster Recovery Planning thus prompts us to develop robust strategies to mitigate and limit the effects of common problems and at the same time forces us to think the unthinkable. Nevertheless, confronting worst-case scenarios is a vital activity; the belief that an event will never happen simply because it has never happened is an invitation to the very disaster we seek to avoid. Herein lies a conundrum, in that the creation of detailed plans for every eventuality is nearly impossible and also impractical, since the results of such an endeavor would be needlessly complex as well as expensive. At its basis, then, Disaster Recovery Planning demands an astute assessment of risk so that we may weigh the costs of preparations and solutions against the costs of a potential event.

    So where to begin? When the subject of Disaster Recovery Planning arises, common parlance often obscures two prominent features of such initiatives. First, a Disaster Recovery Plan is actually composed of a suite of documents which detail a variety of related issues, from crisis communications and the continuity of administrative activities to the recovery of hardware and data and the restoration of core functions. Second, there is no conclusion to the planning process or a point at which a plan is "done"; there is instead a continuous cycle of observation, analysis, solution design, implementation, training, testing, and maintenance. The essential first step is therefore a thorough knowledge of the organization, its goals, and its mandate for a Disaster Recovery Program so that later efforts can focus on the articulation of policies and the development of solutions. As a preliminary step in this effort, this report looks to establish a basic foundation from which future planning efforts may grow.

    Goals for HathiTrust's Disaster Recovery Program

    While a more formal statement of HathiTrust's goals and requirements for its Disaster Recovery Program must be elucidated, the repository's mission statement provides a good indication of its main objective in the formation of a Disaster Recovery Plan. As part of its aim "to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge," HathiTrust seeks "to help preserve these important human records by creating reliable and accessible electronic representations."1 This statement clearly joins the twin imperatives of preservation and access with an additional requirement: reliability. The development and implementation of a Disaster Recovery Plan will ensure that digital objects will retain their authenticity and integrity over the long term and that partner libraries and designated users may rely on HathiTrust services (or their timely resumption) and content in the face of catastrophic events.

    1 HathiTrust. "Mission & Goals" (2009), retrieved from http://www.hathitrust.org/mission_goals on 8 July 2009.

    The Mandate for Disaster Recovery Planning in Digital Preservation

    HathiTrust's mandate for a comprehensive and proactive Disaster Recovery Plan stems from a number of significant sources, among which we may include its mission and goals. The "Institutional Data Resource Management Policy" (2008) of the University of Michigan's Standard Practice Guide also provides an impetus for the creation of a Disaster Recovery Program. While not necessarily inclusive of the Michigan Digitization Project materials stored in HathiTrust, this document underscores how important it is that "data resources be safeguarded [and] protected" and "contingency plans [...] be developed and implemented."2 In its discussion of the latter point, the policy specifies that:

        Disaster Recovery/Business Continuity plans and other methods of responding to an emergency or other occurrences of damage to systems containing institutional data [...] will be developed, implemented, and maintained. These contingency plans shall include, but are not limited to, data backup, Disaster Recovery, and emergency mode operations procedures. These plans will also address testing of and revision to disaster recovery/business continuity procedures and a criticality analysis.3

    While data backup procedures and a host of risk management practices are already an integral part of HathiTrust's operation, the repository now looks to formalize the other strategies suggested by the Institutional Data Resource Management Policy. Beyond the example laid out by this document, HathiTrust's mandate for Disaster Recovery derives from the professional literature detailing best practices in the field of digital preservation. The Reference Model for an Open Archival Information System (OAIS) identifies Disaster Recovery as an essential component of its Archival Storage function and highlights the importance of such plans in achieving the goal of long-term preservation of a digital archive's holdings. As outlined in the OAIS document, the Disaster Recovery function "provides a mechanism for duplicating the digital contents of the archive collection and storing the duplicate in a physically separate facility."4 HathiTrust has successfully met this requirement by performing nightly tape backups and establishing a mirror site at Indiana University in Indianapolis. The Trustworthy Repositories Audit & Certification: Criteria and Checklist (2007) is even more explicit in its requirement that repositories document their policies and procedures with "suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an off-site copy of the recovery plan(s)."5 Professional best practices as well as internal needs and goals thus provide the mandate which underlies HathiTrust's development of a formal Disaster Recovery Plan.

    Disaster Preparedness in the Design and Operation of HathiTrust

    One of the primary goals of HathiTrust is "to provide transparency in all of its operations, including its work to comply with digital preservation standards and review processes."6 Nowhere is this commitment more clear than in its efforts to anticipate and mitigate risks which could threaten the contents and functions of the Shared Digital Repository. As a first step in addressing the disaster preparedness requirement in section C3.4 of the TRAC Criteria and Checklist,7 this document serves two purposes. First, it provides an overview of the policies, procedures, resources and contracts that enable HathiTrust to address the challenges and threats endemic to the field of digital preservation. Material is therefore cited directly from the HathiTrust Web site (http://www.hathitrust.org), the most recent version of HathiTrust's review of its compliance with the minimum required elements of the TRAC Criteria and Checklist,8 and relevant literature provided by key vendors and service providers.9 Second, this report examines HathiTrust's current level of disaster preparedness and defines current and forthcoming efforts in its development of a dynamic and proactive Disaster Recovery Program. Per the recommendations of the TRAC Criteria and Checklist, this document records the measures and precautions already in place with regard to specific types of disasters that could befall HathiTrust. These events include hardware failure, data loss, network configuration errors, external attacks, core utility failure, format obsolescence, software failure, physical security breach, and manmade as well as natural disasters. While a formal, written plan detailing individual roles and responsibilities in the repository's response to each of these scenarios is still forthcoming, the evidence gathered in this report reveals that crucial elements of a Disaster Recovery Plan are already in place within HathiTrust.10

    2 University of Michigan. "Institutional Data Resource Management Policy" (2008), Standard Practice Guide, retrieved from http://spg.umich.edu/ on 8 July 2009.
    3 Ibid.
    4 Consultative Committee for Space Data Systems. Reference Model for an Open Archival Information System (2002), p. 48.
    5 OCLC and CRL. Section C3.4, Trustworthy Repositories Audit & Certification: Criteria and Checklist (2007), p. 49.
    6 HathiTrust. "Accountability" (2009), retrieved from http://www.hathitrust.org/accountability on 25 June 2009.

    Essential HathiTrust Business Functions

    As the development of the Disaster Recovery Plan proceeds, it is important to bear in mind that its goal is not merely the restoration of hardware and data but also the recovery and continuity of essential repository functions. The following list represents core functions that need to be addressed by HathiTrust's Disaster Recovery Plan and as such should not be considered a comprehensive representation of the repository's functions. By directing planning efforts toward specific functions (rather than the organization's activities as a whole), HathiTrust may prioritize and focus its recovery responses and resources to ensure that the most essential functions go back online first. Subsequent discussion of Disaster Recovery strategies and risk management solutions in this report is presented under the assumption that the continuity of these functions is a primary objective. The prioritization of these functions remains to be determined by an appropriate authority.11

    7 "Repository has suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an off-site copy of the recovery plan(s). The repository must have a written plan with some approval process for what happens in specific types of disaster (fire, flood, system compromise, etc.) and for who has responsibility for actions. The level of detail in a disaster plan and the specific risks addressed need to be appropriate to the repository's location and service expectations. Fire is an almost universal concern, but earthquakes may not require specific planning at all locations. The disaster plan must, however, deal with unspecified situations that would have specific consequences, such as lack of access to a building." OCLC and CRL. Trustworthy Repositories Audit & Certification: Criteria and Checklist (2007), p. 49.
    8 HathiTrust Digital Library: Review of Compliance with Trustworthy Repositories Audit & Certification: Criteria and Checklist Minimum Required Elements, revised May 20, 2009. Available at http://hathitrust.org/documents/trac.pdf
    9 Contact information for relevant University of Michigan departments and service providers as well as for external vendors may be found in Appendix A.
    10 A list of resources related to disaster recovery and the planning process may be found in Appendix D (Annotated List of Disaster Recovery Planning Resources).
    11 This list of essential HathiTrust business functions was developed in conjunction with Jeremy York.

    o Ingest
        - Ingest digital objects (SIPs) via GRIN, the Google Return Interface (or a modified ingest portal for local content)
        - Validate ingested content with GROOVE, the Google Return Object-Oriented Validation Environment (or a modified version for localized ingest)

    o Archival Storage
        - Preserve indefinitely digital objects and metadata (AIPs) in the Shared Digital Repository (includes ensuring the integrity and authenticity of materials). This function addresses the needs of partner libraries as well as individual users.
        - Record changes to and actions on items while they are in the repository
        - Maintain a persistent object address for items within the repository

    o Dissemination
        - Provide access to digital objects for users
        - Allow for text searches through a variety of fields
        - Enable large-scale full-text searches
        - Permit the creation of public and private content collections
        - Disseminate digital objects (DIPs) to users (via the page-turner access system and data API)
        - Distribute data sets and HathiTrust APIs to developers
        - Research and develop additional applications and resources for HathiTrust

    o Administration
        - Provide transparent and up-to-date information to users and the general public via http://www.hathitrust.org/
        - Communicate information and coordinate activities amongst partner libraries and HathiTrust boards and committees

    o Data Management
        - Update and manage the Rights and GeoIP databases
        - Build and maintain Collection Builder and Large-Scale Search Solr indexes
        - Determine appropriate user access to texts via database queries
        - Sync content with the Indianapolis site and back up content to tape

    HathiTrust's Disaster Recovery Strategies

    Basic Requirements for Disaster Recovery

    Roy Tennant has identified three requisite components of a digital Disaster Recovery Plan: (1) the use of an effective data protection system (e.g., RAID), (2) redundant power and environmental systems, and (3) regular backup of information to tape and, ideally, to a remote mirrored site.12 HathiTrust has incorporated all these elements into its design and operation. Its Isilon IQ storage cluster provides a high degree of data redundancy with its N+3 parity protection; the Michigan Academic Computing Center provides fully redundant power and environmental systems for HathiTrust infrastructure; and nightly tape backups and the replication of data to a fully operational mirror site located at Indiana University in Indianapolis with the same levels of power and environmental conditioning provide multiple copies as well as geographic distribution of content.

    o "HathiTrust is intended to provide persistent and high-availability storage for deposited files. In order to facilitate this, the initiative's technology concentrates on creating a minimum of two synchronized versions of high-availability clustered storage with wide geographic separation (the first two instances of storage will be located in Ann Arbor, MI and Indianapolis, IN), as well as an encrypted tape backup (written to and stored in a separate Ann Arbor facility).

      Each of these storage or tape instances is physically secure (e.g., in a locked cage in a machine room) and only accessible to specified personnel. Each separate storage system is also equipped with mechanisms to provide mirrored management and access functionality, and employ 100% data redundancy in an effort to prevent data loss."13

    Details on parity protection and the HathiTrust server environment are available below (see Scenario 1 and Scenario 5, respectively).
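
    To make the interaction between parity protection and site mirroring concrete, the following is a minimal sketch rather than a description of any Isilon or HathiTrust tool. It assumes, as the Scenario 1 risk table in this report states, that a cluster survives up to three simultaneous drive or node failures and goes offline at the fourth. The constant and function names are hypothetical.

```python
PARITY_TOLERANCE = 3  # N+3: assumed maximum simultaneous failures a cluster survives


def cluster_online(failed_units: int) -> bool:
    """Return True while failures stay within the N+3 tolerance."""
    return failed_units <= PARITY_TOLERANCE


def repository_status(failed_ann_arbor: int, failed_indianapolis: int) -> str:
    """Summarize service status from the failure count at each site."""
    online = [
        site
        for site, failed in (
            ("Ann Arbor", failed_ann_arbor),
            ("Indianapolis", failed_indianapolis),
        )
        if cluster_online(failed)
    ]
    if len(online) == 2:
        return "both sites online"
    if len(online) == 1:
        return f"{online[0]} only: no site redundancy"
    return "service unavailable"


if __name__ == "__main__":
    print(repository_status(3, 0))  # failures within tolerance at both sites
    print(repository_status(4, 0))  # Ann Arbor cluster lost; Indianapolis carries traffic
    print(repository_status(4, 4))  # both clusters lost
```

    The sketch shows why a fourth failure at one site is only a moderate event while the other site is healthy, and why the same failure becomes severe once one site is already down.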

    Disaster Recovery Strategy #1: Redundancy between the Ann Arbor and Indianapolis Sites

    HathiTrust's first line of defense in the event of a disaster is its "hot" mirror site in Indianapolis. While ingest of material is restricted to the Ann Arbor location, both sites possess two web servers, a MySQL database server, and an Isilon IQ storage cluster (currently composed of 21 nodes, each a server combining central processing units with storage). During normal operations, this arrangement allows HathiTrust to balance a high volume of web traffic across both sites such that individual user requests may be handled by either site in a transparent manner. Should the tolerances for failure be exceeded at a site (as in a disaster situation), the failover capability built into the HathiTrust architecture enables the remaining site to provide access to the designated community without noticeable service disruptions. As noted in the May 2009 HathiTrust Update, with the full operation of both locations, "We are now ensuring that users do not feel the effects of single-site outages, such as routine maintenance, by taking advantage of site redundancy."14 However, because ingest takes place only in Ann Arbor, the loss of key components there would inhibit the repository's ability to acquire new content.

    12 Tennant, Roy. "Digital Libraries: Coping with Disasters." Library Journal, 15 November 2001. Retrieved from http://www.libraryjournal.com/article/CA180529.html on 13 July 2009.
    13 HathiTrust. "Technology," retrieved from http://www.hathitrust.org/technology on 15 June 2009.
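
    The division of labor between the two sites can be illustrated with a short routing sketch. This is an assumption-laden illustration, not HathiTrust's actual load-balancing configuration: the site labels, the `route` function, and the round-robin policy are all hypothetical. Only the two facts it encodes come from the report: access requests may be served by either site, and ingest happens only in Ann Arbor.

```python
from typing import Optional

SITES = ("ann_arbor", "indianapolis")  # hypothetical site labels
INGEST_SITE = "ann_arbor"  # per the report, ingest runs only in Ann Arbor


def route(kind: str, healthy: set, request_number: int) -> Optional[str]:
    """Pick a site for a request, or None if no suitable site is healthy.

    kind is "access" or "ingest"; request_number drives round-robin balancing.
    """
    if kind == "ingest":
        # Ingest has no failover target: it is a single point of failure.
        return INGEST_SITE if INGEST_SITE in healthy else None
    candidates = [site for site in SITES if site in healthy]
    if not candidates:
        return None
    return candidates[request_number % len(candidates)]


if __name__ == "__main__":
    both = {"ann_arbor", "indianapolis"}
    print([route("access", both, n) for n in range(4)])  # alternates between sites
    print(route("access", {"indianapolis"}, 0))  # failover to the surviving site
    print(route("ingest", {"indianapolis"}, 0))  # None: ingest is unavailable
```

    With Ann Arbor down, access requests still resolve to Indianapolis, while ingest requests have nowhere to go, which mirrors the limitation noted at the end of the paragraph above.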

    HathiTrust utilizes Isilon Systems' SyncIQ Application Software to synchronize data at the Indianapolis site with newly ingested or updated material from the Ann Arbor site. The sync to Indianapolis runs on 24 separate subsets of the data, staggered so that a new subset starts every two hours, with the exception of Sundays. In other words, subset 1 runs at midnight on Monday, subset 2 runs at 2 a.m., and so on. A full pass through all 24 subsets thus takes two days, or three when a Sunday intervenes. The maximum time for data to be replicated from Ann Arbor to Indianapolis would therefore be three days plus the run time of the sync process (which tends to take less than three hours).15
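
    The three-day figure can be checked with a small simulation. The sketch below is one reading of the schedule described above and rests on stated assumptions: subsets are started round-robin at two-hour intervals, Sunday slots are skipped outright rather than deferred, and the function names are hypothetical. It does not model the run time of each sync job.

```python
from datetime import datetime, timedelta

SUBSETS = 24  # number of data subsets named in the report
INTERVAL = timedelta(hours=2)  # a new subset starts every two hours
SUNDAY = 6  # datetime.weekday() value for Sunday


def schedule(start: datetime, days: int) -> list:
    """Return (start_time, subset_number) pairs, skipping all Sunday slots."""
    runs = []
    slot = 0
    moment = start
    end = start + timedelta(days=days)
    while moment < end:
        if moment.weekday() != SUNDAY:
            runs.append((moment, slot % SUBSETS + 1))
            slot += 1
        moment += INTERVAL
    return runs


def max_gap(runs: list) -> timedelta:
    """Longest interval between two consecutive starts of the same subset."""
    last_seen = {}
    worst = timedelta(0)
    for moment, subset in runs:
        if subset in last_seen:
            worst = max(worst, moment - last_seen[subset])
        last_seen[subset] = moment
    return worst


if __name__ == "__main__":
    monday = datetime(2009, 7, 13)  # a Monday, chosen only for illustration
    print(max_gap(schedule(monday, 14)))
```

    Under these assumptions each subset normally restarts every 48 hours, and the gap stretches to 72 hours whenever a Sunday falls between two runs, which agrees with the three-day maximum cited above.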

    o "SyncIQ is an asynchronous replication application that fully leverages the unique architecture of Isilon IQ storage to efficiently copy data from a primary cluster to one located at a secondary location."16

    o "All nodes [in both the source and target Isilon IQ clusters] concurrently send and receive data during replication jobs in real time, without impacting users reading and writing to the system."17

    o "A robust wizard-driven web-based interface is fully integrated into [Isilon's proprietary] OneFS management tool to control all the functionality, including scheduling, policy settings, monitoring and logging of data transferred and bandwidth utilization."18

    o "Only files that have changed will be replicated to the target clusters. This will optimize transfer times and minimize bandwidth used."19

    o "In the event the secondary system is not available due to a system or network interruption, the replication job will be able to roll back and restart at the last successful copy operation."20

    o "Upon a critical failure or loss of network connection, an alert will be sent to all recipients configured to receive critical alerts."21
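
    Two of the behaviors quoted above, copying only changed files and restarting from the last successful copy, can be sketched generically. This is not SyncIQ's implementation or API, which the source does not describe; it is a minimal illustration that assumes each file is identified by a path and a content digest, and that an interrupted transfer raises `ConnectionError`. All names are hypothetical.

```python
def changed_paths(source: dict, target: dict) -> list:
    """Paths whose digest on the source differs from, or is absent on, the target."""
    return sorted(path for path, digest in source.items() if target.get(path) != digest)


def replicate(paths: list, copy, checkpoint: int = 0) -> int:
    """Copy paths[checkpoint:] in order and return the index to resume from.

    A return value equal to len(paths) means the job completed.
    """
    for index in range(checkpoint, len(paths)):
        try:
            copy(paths[index])
        except ConnectionError:
            return index  # everything before this index was copied successfully
    return len(paths)


if __name__ == "__main__":
    source = {"a.xml": "d1", "b.jp2": "d2", "c.txt": "d3"}
    target = {"a.xml": "d1", "b.jp2": "stale"}
    pending = changed_paths(source, target)
    print(pending)  # only the changed and missing files are queued
    resume_at = replicate(pending, copy=lambda path: None)
    print(resume_at == len(pending))  # True when nothing interrupts the job
```

    Keeping the resume index outside the copy loop means an interrupted job never recopies files that already arrived, which is the bandwidth saving the quoted material describes.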

    Disaster Recovery Strategy #2: Nightly Automated Tape Backups

    HathiTrust's ability to recover from a disaster is also ensured by the nightly automated tape backups performed by the Tivoli Storage Manager (TSM) client application installed on the ingest servers connected to the HathiTrust storage cluster and managed by Michigan's ITCS TSM Group. The TSM Backup Service Standard Service Level Agreement22 outlines the obligations and responsibilities of both the service provider and HathiTrust:

    o "The progressive incremental methodology used by Tivoli Storage Manager only backs up new or changed versions of files, thereby greatly reducing data redundancy, network bandwidth and storage pool consumption as compared to traditional methodologies based on periodic full backups."23

    o "ITCS is responsible for all of the central server hardware, tape hardware, networking hardware, and related components. ITCS is also responsible for hardware maintenance as well as software maintenance, administration, and security audits on the central (non-client) TSM servers." (TSM Backup Service SLA, sec. 4.1)

    o "ITCS provides 7x24 on-call monitoring and support, and strives to keep the servers up in production at all times. The target uptime is 99.9% of the time. The TSM hardware design is modular and should allow us to take pieces out of service without affecting customers. Whenever possible, system maintenance will be performed during standard weekend maintenance windows as defined by ITCS." (sec. 4.2)

    o "In an emergency, [email protected] (this will go to the on-call staff's pager in real time)." (sec. 4.6)

    o "ITCS is responsible for physical security. Machine access audits, OS security, and network security on the TSM server end are also the responsibility of ITCS." (sec. 4.9)

    o "The service [...] includes data compression, data encryptions, and data replication." (sec. 1.0)

    o "ITCS will maintain at least two TSM sites and will mirror data between the sites to provide redundancy in the event of a disaster. Currently those sites are the Arbor Lakes Data Facility (ALDF) at 4251 Plymouth Rd. and the Michigan Academic Computing Center (MACC) located at 1000 Oakbrook Dr." (sec. 4.10)

    o "Both facilities are secure, climate-controlled sites designed and built for high available production services."24

    o "In the event of a customer disaster with large-scale (a full server or more) data loss, ITCS will work with the customer to optimize the restore time to best of our ability. We will only be able to devote resources to the extent that other customers are not affected. Restoring large file servers (multiple Terabytes) can take several days. If customers want to minimize this amount of time to restore, we can purchase additional resources for this purpose. Contact us directly, and we'll work out a scenario with costing information. In the event of a MAJOR campus outage affecting a large number of customers, ITCS management will work with customers to determine how to prioritize customer restores." (sec. 4.11)

    o "Disaster Recovery planning is the responsibility of the customer unit." (sec. 5.8)

    Having established the main Disaster Recovery strategies employed by HathiTrust, we may now proceed to investigate the means by which it anticipates and mitigates the most common threats facing digital repositories.

    14 HathiTrust. "Update on May 2009 Activities" (2009), retrieved from http://www.hathitrust.org/updates_may2009 on 2 July 2009.
    15 Snavely, Cory (Head, UM Library IT Core Services). Personal email on 13 July 2009.
    16 "Backup and Recovery With Isilon IQ Clustered Storage," 2007, p. 11.
    17 Ibid.
    18 Ibid.
    19 Ibid.
    20 Ibid.
    21 Ibid.
    22 Please refer to Appendix F (TSM Backup Service Standard Service Level Agreement).
    23 IBM. "IBM Tivoli Storage Manager: Features and Benefits" (2009), retrieved from http://www-01.ibm.com/software/tivoli/products/storagemgr/features.html?S_CMP=rnav on 16 June 2009.
    24 Information Technology Central Services at the University of Michigan. "Frequently Asked Questions about the TSM Backup Service" (2009), retrieved from http://www.itcs.umich.edu/tsm/questions.php on 16 June 2009.
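
    The 99.9% uptime target quoted from the TSM SLA is easier to plan around when converted into permitted downtime. The helper below is a simple arithmetic sketch; the function name is hypothetical, and the 365-day year and 30-day month are assumptions chosen for illustration, since the SLA excerpt does not say over what period the target is measured.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year, assumed for illustration
HOURS_PER_MONTH = 30 * 24  # 30-day month, assumed for illustration


def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of outage permitted in a period at a given uptime target."""
    return period_hours * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    yearly = allowed_downtime_hours(99.9, HOURS_PER_YEAR)
    monthly = allowed_downtime_hours(99.9, HOURS_PER_MONTH)
    print(f"per year:  {yearly:.2f} hours")
    print(f"per month: {monthly * 60:.1f} minutes")
```

    At 99.9%, the backup service may be unavailable for roughly 8.76 hours in a year, or about 43 minutes in a 30-day month, without missing its target. That is a useful bound when deciding how long HathiTrust can tolerate a missed nightly backup.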

    Scenario 1: Hardware Failure or Obsolescence and Data Loss

    Review: Risks Involving Hardware Failure or Obsolescence and Data Loss

    The following table highlights the various events which pose a risk to the hardware and data of HathiTrust. These threats may stem from flaws or malfunctions in the equipment itself or from external events that include physical security breaches and natural or manmade disasters. The arrangement of these potential risks reflects the relative severity of their respective consequences.

    HathiTrust's Solutions for Hardware Failure and Data Loss

    The threats faced by HathiTrust's hardware (and associated applications as well as the data stored therein) comprise the failure of redundant features, failure that exceeds a component's tolerance for redundancy, and single points of failure. While the failure of redundant components may happen more frequently (e.g., the loss of an individual drive within the Isilon IQ cluster), such losses do not have a large impact on the repository; events which compromise single points of failure will have much greater consequences for the continuity of HathiTrust operations. At the same time, while a component may have redundancy on one level (for example, there are five servers dedicated to ingest), that component simultaneously may be considered at a higher level to be a single point of failure (i.e., because the ingest servers are housed in a single chassis, the entire unit is vulnerable to an event such as a fire). This duality highlights the need for vigilance and foresight in managing the repository's infrastructure.

    Because HathiTrust relies heavily upon hardware to fulfill its mission and deliver services to its designated community of users, the selection of equipment and development of system architecture

    Severity Event

    Highimpact Lossatasinglepointoffailure

    Anadditionalfailurepasttoleranceswhenonlyonesiteisoperational Serviceisunavailableandcannotberestoreduntilcomponentisrepaired/restored

    ModerateImpact Failureofacomponentpastredundancytolerance

    Systemnolongerhasredundancy:additionallossorfailureofcomponentswillresultinlossofsystem.Thisisaparticularproblemifonesiteisalreadydown.

    Lossofdbserver(homeofRightsdb)orofbothWebserversatasitewillrenderthatlocationinaccessible LossoffourdrivesornodesineitherIsilonstorageclusterwillresultinthelossof

    thatinstance.Theclusterwillbeofflineandunabletohandlereadorwrite

    requests;alltrafficwouldhavetobehandledbytheremainingsite.

    LossofUMArborLakessitewouldpreventperformanceoftapebackups. LossofUMMACCsitewoulddepriveIUsiteofdataredundancy Lossofingestserverswouldpreventnewcontentfromenteringrepository

    LowImpact Failureofredundantsystemcomponents

    IncludesredundantcomponentswithineachsiteaswellasgeneralredundancybetweentheIUandUMsites

    o HTinfrastructurehasbeendesignedtoavoidsinglepointsoffailureandtoensuredataandequipmentredundancy

    o Servicecontinuesinanuninterruptedandtransparentmanner

  • 7/27/2019 Rapport d'HathiTrust sur un plan de sauvegarde des donnes informatiques en cas de sinistre.

    15/61

    20090824 9

    hasaimedatminimizingthedangersposedbysinglepointsoffailurethroughtheintroductionof

    strategicredundancies.Thebasicmeansforavoidingthedisastrouseffectsofhardwarefailureordata

    losshavebeentheestablishmentoftheIndianapolismirrorsiteandthenightlybackupofcontentto

    tape.(Formoredetail,pleaserefertotheprecedingsection).Whilethesestrategiesaccountfor

    extraordinaryevents,HathiTrustsserverreplacementscheduleallowstherepositorytoanticipatethe

    resultsofnormalequipmentuseanddepreciation.Stepstosafeguardthelongtermfunctionalityof

    HathiTrusthavethereforebeencomplementedbyaconsiderationofbestpracticesfordisaster

    preparedness.

    Redundant Components and Single Points of Failure in the HathiTrust Infrastructure

    The following sections provide a general outline of HathiTrust's redundant components and single points of failure. Given the complexity of the repository's infrastructure, unknown or unanticipated scenarios may exist; future Disaster Recovery Planning will thus involve a periodic review of key features and vulnerabilities.

    o Site Redundancy: The establishment of the mirror site in Indiana provides HathiTrust with a fully redundant operation. Because both instances provide full access to content in addition to other repository functions, users will not experience a loss or degradation of service in the event that service is lost from one site. Key exceptions to HathiTrust's site redundancy are noted below.

    o Redundant Components at Each Site: The following components provide each site with a tolerance under which limited failures will not disrupt major HathiTrust functions and user services.

        - Web servers: each site has two servers so that if one fails, the other may continue to handle traffic. These also host the GeoIP database.

        - Isilon IQ clusters: the current configuration of 21 nodes features N+3 parity protection; this data redundancy permits the simultaneous failure of 3 drives on separate nodes or the loss of three entire nodes without service degradation.

        - Ingest servers: the Ann Arbor site possesses five servers so that ingest may continue (albeit at a slower rate) in the event of any failures.

        - Large Scale Search (LSS) Solr index: currently housed on the web servers, but will soon be maintained on five new servers in Ann Arbor.

    o Single Points of Failure:²⁵ These are components of a system which, if lost, will prevent the entire system from functioning. Even those components with wholly redundant peer devices (such as the web or ingest servers) may be considered single points of failure if they have exceeded their capacity to sustain losses (i.e., if one web server at a site has already been lost).

        - Single Points of Failure at the Component Level: Because only one of these components exists at each HathiTrust site, a loss will result in system failure.

            - MySQL database server: houses the rights database, ingest tracking database, and the Collection Builder Solr index
            - Server network switches
            - Outbound network switches

        - Single Points of Failure at the System Level: While any given component may have various degrees of internal redundancy (such as multiple power supplies or multiple drives) it might still fail as a whole and thus result in the loss of a particular instance of HathiTrust. The following are components located at each site which, while possessed of internal redundancies, are still subject to complete loss (as in the event of a fire) and may thus render a site inoperable.

            - Isilon IQ storage cluster: the entire cluster could be lost in a large-scale event. Additionally, the loss of a fourth drive or node will exceed the cluster's failure tolerance and result in a service disruption.
            - Web servers: should one fail, the remaining server will be a single point of failure.
            - Blade server chassis: since web, ingest, and database servers are housed in one chassis, the entire unit could potentially fail.
            - LSS index: in the near future, the servers in Ann Arbor will be the sole instance of the Large Scale Search index.
            - Mirlyn database and Mirlyn2 Solr index²⁶: these are currently key components of the UM Library infrastructure; should these be unavailable, access to and use of HathiTrust will be compromised.

    ²⁵ Content in this section is courtesy of Cory Snavely (personal email from 13 July 2009).
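    The redundancy rules outlined in this section can be sketched in a few lines of Python. This is a minimal illustration rather than HathiTrust tooling: the `Component` class, its field names, and the function `single_points_of_failure` are hypothetical, and the counts and tolerances are the per-site figures quoted in this section (two web servers, one MySQL database server, 21 Isilon nodes with N+3 protection).

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One class of equipment at a single site (hypothetical model)."""
    name: str
    count: int       # units installed at the site
    tolerance: int   # failures the site can absorb without losing service
    failed: int = 0  # units currently down

    def is_down(self) -> bool:
        # Service is lost once failures exceed the tolerance.
        return self.failed > self.tolerance

    def is_single_point_of_failure(self) -> bool:
        # No margin left: one more failure takes the service down.
        return not self.is_down() and self.failed == self.tolerance


def single_points_of_failure(components):
    """Return the names of components that cannot absorb another failure."""
    return [c.name for c in components if c.is_single_point_of_failure()]


# Per-site figures taken from the section above.
site = [
    Component("web servers", count=2, tolerance=1),
    Component("MySQL database server", count=1, tolerance=0),
    Component("Isilon IQ nodes", count=21, tolerance=3),
]

print(single_points_of_failure(site))  # only the database server
site[0].failed = 1                     # lose one web server
print(single_points_of_failure(site))  # the web servers now qualify too
```

    The second call shows the duality described above: a component that is redundant when healthy becomes a single point of failure as soon as its tolerance has been used up.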

    Key Features of HathiTrust's Isilon IQ Clustered Storage

    The Isilon IQ storage cluster stores and provides digital objects for HathiTrust's partner libraries and members of its designated community. The cluster provides a high degree of inherent redundancy, which gives both HathiTrust sites a considerable degree of tolerance in regards to the failure of various aspects of the storage units. As one example, Isilon's proprietary OneFS operating system permits the individual storage nodes, the individual servers that are the building blocks of the cluster, to function as coherent peers so that any one node knows everything contained on the other units in the cluster.

    o "Isilon's OneFS operating system [...] intelligently stripes data across all nodes in a cluster to create a single, shared pool of storage."²⁷

    o "Because all files are striped across multiple nodes within a cluster, no single node stores 100% of a file; if a node fails, all other nodes in the cluster can deliver 100% of the files within that cluster."²⁸

    o "A distributed clustered architecture by definition is highly available since each node is a coherent peer to the other. If any node or component fails, the data is still accessible through any other node, and there is no single point of failure as the file system state is maintained across the entire cluster."²⁹
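    To make the striping idea concrete, the following sketch splits a byte string across several hypothetical nodes and adds a single XOR parity stripe, so that any one lost stripe can be rebuilt from the survivors. It is a deliberately simplified stand-in: OneFS is proprietary, and protecting against three simultaneous failures requires a stronger erasure code than the single-parity scheme shown here. All function names are illustrative.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes, nodes: int):
    """Split data into `nodes` equal stripes plus one XOR parity stripe."""
    size = -(-len(data) // nodes)             # ceiling division
    padded = data.ljust(size * nodes, b"\0")  # pad the final stripe
    stripes = [padded[i * size:(i + 1) * size] for i in range(nodes)]
    parity = reduce(xor_bytes, stripes)
    return stripes, parity


def rebuild(stripes, parity, lost: int) -> bytes:
    """Recover the stripe at index `lost` from the survivors and parity."""
    survivors = [s for i, s in enumerate(stripes) if i != lost]
    return reduce(xor_bytes, survivors, parity)


data = b"HathiTrust digital object"
stripes, parity = stripe(data, nodes=4)
recovered = rebuild(stripes, parity, lost=2)  # pretend node 2 failed
print(recovered == stripes[2])
```

    Because XOR is its own inverse, combining the parity stripe with the surviving stripes yields exactly the missing one; no single node ever holds the whole file.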

    ²⁶ Mirlyn is the name of the University of Michigan's current Online Public Access Catalog, which is supported by the Aleph integrated library system. Mirlyn2 is a beta version of UM's recently implemented next-generation catalog, based on the VuFind platform, which will become the main library catalog on August 3, 2009.

    ²⁷ Isilon Systems, Inc. Isilon IQ OneFS Operating System (2009) retrieved from http://www.isilon.com/products/OneFS.php on 17 June 2009.

    ²⁸ Isilon Systems. Uncompromising Reliability through Clustered Storage: Delivering Highly Available Clustered Storage Systems (2008) p. 7. "In computer data storage, data striping is the technique of segmenting logically sequential data, such as a single file, so that segments can be assigned to multiple physical devices. [...] if one drive fails and the system crashes, the data can be restored by using the other drives in the array." (http://en.wikipedia.org/wiki/Data_striping, retrieved on 16 August, 2009).

    ²⁹ Isilon Systems. Breaking the Bottleneck: Solving the Storage Challenges of Next Generation Data Centers (2008) p. 8


    HathiTrust's Isilon IQ clusters ensure a high degree of data redundancy with their N+3 parity protection. N+3 provides triple simultaneous failure protection so that up to three drives on separate Isilon IQ nodes, or three entire nodes, can fail at the same time and all data will still be fully available.

    o "Traditional RAID 5 parity protection results in data loss if multiple components fail prior to the completion of a rebuild. FlexProtect, in contrast, automatically distributes all data and error correction information across the entire Isilon cluster and with its robust error correction techniques efficiently and reliably ensures that all data remains intact and fully accessible even in the unlikely event of simultaneous component failures."³⁰

    o "Each file is striped across multiple nodes within a cluster, with [three] parity stripes for each data block."³¹

    The file system may also perform a Dynamic Sector Repair (DSR) at the time of any file writing. If it encounters a bad disk sector, the file system will use parity information elsewhere in the system to rebuild the necessary information and rewrite a new block elsewhere on the drive. The bad sector will be remapped by the drive so that it is never used again and the write operation will be completed.

    The Isilon restriper is a meta-process/infrastructure that has four primary phases to help manage and protect data in the event that components of the cluster sustain a partial failure or malfunction. The processes run as background operations and do not require system downtime.³² ³³

    o FlexProtect repairs data (i.e., in the event of a drive loss) using parity.

        - "Isilon OneFS with FlexProtect can boast the industry-leading Mean Time to Data Loss (MTTDL) for petabyte clusters."³⁴

        - "FlexProtect introduces state-of-the-art functionality, which rebuilds failed disks in a fraction of the time, harnesses free storage space across the entire cluster to further insure against data loss, and proactively monitors and preemptively migrates data off of at-risk components."³⁵

    o AutoBalance "rebalances the data in a cluster according to business rules, in real time, non-disruptively."³⁶

        - "As soon as the [new or repaired] node is turned on and network cables are connected, AutoBalance immediately begins to migrate content from the existing storage nodes to the newly added node across the cluster interconnect back-end switch, rebalancing all of the content across all nodes in the cluster and maximizing utilization."³⁷

    ³⁰ Isilon Systems, Inc. Isilon IQ OneFS Operating System (2009) retrieved from http://www.isilon.com/products/OneFS.php on 30 June 2009.

    ³¹ Isilon Systems. Uncompromising Reliability through Clustered Storage: Delivering Highly Available Clustered Storage Systems (2008) p. 7

    ³² Isilon X-Series Specifications (product brochure)

    ³³ Information on the Isilon restriper comes from a personal email sent by Kip Cranford of Isilon Systems, Inc. on 1 June 2009.

    ³⁴ Isilon Systems. Data Protection for Isilon Scale-Out NAS (2009) p. 4

    ³⁵ Isilon Systems, Inc. Isilon IQ OneFS Operating System (2009) retrieved from http://www.isilon.com/products/OneFS.php on 15 June 2009.

    ³⁶ McFarland, Anne. Isilon Accelerates Delivery of Digital Content. The Clipper Group Navigator (2003).

    ³⁷ Isilon Systems. The Clustered Storage Revolution (2008) p. 13


    o Collect cleans up orphaned nodes and data blocks to prevent fragmentation of data.

    o MediaScan verifies disk sectors.

        - The function of MediaScan is to scan every block in the file system looking for bad disk sectors. If it encounters a bad sector, it will perform a Dynamic Sector Repair (DSR) and use parity information elsewhere in the system to rebuild the necessary information and rewrite a new block somewhere else on the drive.

        - MediaScan periodically reviews data blocks and disk sectors that may not have been accessed, from a file level, in months or years and thereby helps to keep the drives as healthy as possible.

    o As of the OneFS 5.0 release, all file system metadata can be checked by the IntegrityScan restriper phase. This process will allow HathiTrust to completely check file data and metadata via associated checksums.

    Other instances of inherent redundancy include non-volatile RAM, a fully journaled file system, and software applications that manage client connections in the event of a node's failure.

    o "OneFS is a fully journaled file system with large amounts of battery-backed non-volatile random access memory (NVRAM) within each node, which ensures the integrity of the file system in the event of unexpected failures during any write operation."³⁸

    o "The Isilon SmartConnect module [ensures] that when a node failure occurs, all in-flight reads and writes are handed off to another node in the cluster to finish its operation without any user or application interruption. [...] If a node is brought down for any reason, including a failure, the virtual IP addresses on the clients will seamlessly failover across all other nodes in the cluster. When the offline node is brought back online, SmartConnect automatically fails back and rebalances the NFS clients across the entire cluster to ensure maximum storage and performance utilization."³⁹

    Hardware Support and Service

    HathiTrust equipment is covered by support and service agreements with its various vendors (Sun Microsystems, Dell, CDW-G, etc.). A good example of one such agreement is found in the Platinum support provided by Isilon Systems, which includes:

    o Extended 24x7x365 Telephone & Online Hardware and Software Support
    o 24x7 Proactive Monitoring & Alerts Email Home (for Hardware and Software)
    o Return Parts to Factory for Repair and 4-hour Replacement Parts Delivery
    o SupportIQ (Enhanced Serviceability Diagnostics) and System Event Tracking
    o Onsite Troubleshooting
    o Isilon Hardware Installation
    o Software Product Documentation, Release Notes, and access to Product Technical Notes
    o Remote Diagnosis (Provided User Grants Access)
    o Maintenance & Patch Releases
    o Minor and Major Upgrade Releases (Includes Performance Improvements, New Features, Serviceability Improvements).⁴⁰

    ³⁸ Isilon Systems. Uncompromising Reliability through Clustered Storage: Delivering Highly Available Clustered Storage Systems (2008) p. 9

    ³⁹ Isilon Systems. Data Protection for Isilon Scale-Out NAS (2009) p. 6

    Equipment Tracking

    LIT Core Services (CS) maintains an inventory of servers on a wiki page accessible to its staff. Details include each server's name, location, online and retire dates, upgrades, notes on storage, and its primary service. Additional information is provided related to specifications, support contracts, and key contact information. The CS server inventory is currently out of date.

    Hardware Replacement Schedule

    o "HathiTrust replaces storage regularly, approximately every 3-4 years or as the usable life of storage equipment dictates" (HT TRAC C1.7)

    o "HathiTrust staff upgrade hardware on a regular basis (i.e., every three or four years), and to help detect more rapid growth in demands, the web server and storage infrastructures have their own performance monitoring that indicate overload conditions." (HT TRAC C1.10)

    Timeline for Emergency Replacement of HathiTrust Infrastructure

    Should a serious event require the replacement of part (or all) of the HathiTrust technical infrastructure, the following timeline provides a general estimate of the time required to order, ship, and install new equipment. A cursory review of the time necessary for HathiTrust to recover from a major disaster at the main Ann Arbor or Indianapolis data centers suggests that a large event could idle an instance of the repository for at least a month and a half. In addition to the servers and switches mentioned above, critical components include four 30A power distribution units (PDUs) per rack and four racks per data center as of this writing.

    o Submission of Purchase Orders:

        - For orders under $5,000, the M-Pathways application allows the University Library's business manager to send purchase orders directly to vendors.

        - For orders over $5,000, Procurement Services normally takes one to two business days to approve the purchase, but the process may take up to a week if questions arise or additional purchase information is needed.

    o Delivery of Equipment:

        - Products the vendor has in stock and available for immediate shipment take 1-3 days to be delivered.

        - Items that need to be configured (such as servers) usually take 1-2 weeks.

        - Isilon storage will take 3 weeks to be delivered in a worst-case scenario.

    o Installation:

        - 3 days FTE for Isilon IQ cluster in addition to the time required for other servers, switches, PDUs and rack units.

    ⁴⁰ Isilon Systems. Support Advantage Offerings (2009) retrieved from http://www.isilon.com/support/?page=plans on 30 June 2009.


    o Data Restoration: about 0.5 TB/hour (15 days, as of June 2009)⁴¹

        - While HT has about 110 TB of data in its storage, the backup tapes maintained by the TSM Group contain roughly 176 TB of information due to the data encryption used to protect the intellectual rights of the material (as of 06/2009).

        - The length of time required for a bare-metal restoration will be influenced by tape mounts, network speed, restoring to the NFS shares, decryption, et cetera.

        - If the library/HT were to purchase an additional tape drive (at roughly $20,000), the process could be sped up, perhaps to about 1 TB/hour.

        - In the event of a large-scale disaster in which multiple campus units require extensive data restoration, the TSM Backup Service SLA states that "ITCS management will work with customers to determine how to prioritize customer restores." (sec. 4.11) This determination will reflect the University of Michigan's organizational priorities⁴²:

            - Priority 1: Health and safety of faculty, staff, students, hospital patients, contractors, renters, and any other people on University premises.
            - Priority 2: Delivery of health care and hospital patient services.
            - Priority 3: Continuation and maintenance of research specimens, animals, biomedical specimens, research archives.
            - Priority 4: Delivery of teaching/learning processes and services.
            - Priority 5: Security and preservation of University facilities/equipment.
            - Priority 6: Maintenance of community/University partnerships.

    o Fractional restores would, for the most part, run at comparable speeds unless there was a need to restore a large number of random files, in which case there would be a decrease in speed due to tape seek and mount times.

    o Delays in recovery could be increased dramatically if the MACC data center or its infrastructure has sustained damage and needs repair.
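    The figures in this timeline can be checked with a little arithmetic. The sketch below is an illustration only: it assumes the stages run one after another with no overlap, treats every figure as calendar days, and takes the worst case quoted for each stage (one week for purchase approval, three weeks for Isilon delivery, three days for installation). The function and variable names are hypothetical.

```python
def restore_days(tape_tb: float, rate_tb_per_hour: float) -> float:
    """Days needed to read `tape_tb` terabytes back from tape."""
    return tape_tb / rate_tb_per_hour / 24


TAPE_TB = 176  # data held on backup tape as of June 2009

current = restore_days(TAPE_TB, 0.5)   # current rate, about 0.5 TB/hour
upgraded = restore_days(TAPE_TB, 1.0)  # with an additional tape drive

# Worst-case stage durations in days, assumed to be strictly sequential.
stages = {
    "purchase approval": 7,
    "Isilon delivery": 21,
    "installation": 3,
    "data restoration": current,
}
total = sum(stages.values())

print(f"restore at 0.5 TB/h: {current:.1f} days")
print(f"restore at 1.0 TB/h: {upgraded:.1f} days")
print(f"end-to-end estimate: {total:.1f} days")
```

    Under these assumptions the restore alone takes just under 15 days, in line with the estimate quoted above, and the full sequence comes to roughly a month and a half.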

    HathiTrust and Insurance Coverage at the University of Michigan

    The Office of Financial Operations reviews and adds financial assets greater than $5,000 to the asset management system of the University of Michigan. The Property Control Office is then responsible for tagging financial assets with unique University of Michigan identifiers and tracking them. Risk Management Services administers the University's property insurance and will provide the reimbursement of replacement costs for items self-insured by Michigan. As of July 2009, the nature and extent of the University of Michigan's insurance coverage for HathiTrust hardware remained under review. The main contact with Risk Management Services in this matter has been Cyndi Mesa, Head of UM Library Finance.

    ⁴¹ Hanover, Cameron (ITCS TSM Group Storage Engineer). Personal email on 23 June 2009.

    ⁴² University of Michigan Administrative Information Services. Emergency Management, Business Continuity, and Disaster Recovery Planning (2007) retrieved from http://www.mais.umich.edu/projects/drbc_methodology.html on 6 July 2009.


    Scenario 2: Network Configuration Errors

    Review: Risks Involving Network Configuration Errors

    The following table summarizes the risks facing HathiTrust as the result of network configuration errors. Consideration is given to network connections within UM data centers as well as at UM's Hatcher Graduate Library (site of key administrative and development activities). The arrangement of these events reflects the relative severity of their respective consequences.

    Severity: High impact
    Events:
        - Loss of server network switch or outbound network switch
        - Loss of access to UMnet Backbone

    Severity: Moderate impact
    Events:
        - Extended loss of power at Hatcher Library could lead to loss of local servers and disruption of administrative and operational activities.

    Severity: Low impact
    Events:
        - Loss of power that threatens ability to connect to Local Area Network (LAN)/Backbone
            o The library remains (for now) a priority recipient of electricity from the UM power plant
            o Campus data centers have UPSs and redundant backup power
        - Failure of local/server-side connections
            o Should problems arise with connections to individual nodes, the clustered architecture of the Isilon system will allow read/write requests to be handled by alternate nodes.
            o If connections fail at one HT site, traffic can be handled by remaining site.

    HathiTrust's Solutions for Network Configuration Errors

    HathiTrust's continued access to the Internet via the UMnet Backbone is essential for its continued provision of service. The repository receives network infrastructure maintenance through UM's ITCS/ITCom; with its robust disaster planning in addition to the lessons learned from the Midwest blackout of 2003, ITCom guarantees continued network access in all but the most catastrophic scenarios. In the event of a widespread power outage, HathiTrust would be able to maintain access to the UMnet Backbone since data centers are equipped with redundant power supplies and the Hatcher Graduate Library is currently categorized as a priority recipient of power from the university. ITCS also has 17 generators which can be used to maintain power to network switches in the event of a blackout. The responsibilities and obligations of both parties are outlined in the Customer Network Infrastructure Maintenance Service Agreement.⁴³

    Extent of ITCom Support

    o "ITCom agrees to provide the Unit Network Infrastructure Maintenance to include data switches, routers, access points, hubs, uninterruptible power supplies (UPSs), firewalls, and other identified and agreed upon components." (ITCS sec. 1.0)

    ⁴³ Please refer to Appendix G (ITCS/ITCom Customer Network Infrastructure Maintenance Service Agreement).


    ITCom Responsibilities

    o "Provide and maintain the necessary materials and electronic components to operate the Unit Network Infrastructure." (sec. 5.2)

    o "Provide configuration and Network Infrastructure Administration support necessary to repair and maintain the Unit Network Infrastructure hardware and software covered by this agreement." (sec. 5.3)

    o "Monitor 24 hours/day and 365 days/year (24x365), supported protocols to the backbone interface of the Unit's network up to and including the extension to the first hub or switch." (sec. 5.6)

    o "Monitor 24 hours/day and 365 days/year (24x365), network interfaces on uninterruptible power supplies (UPS) that support the Unit network switches. Provide notification in the event that a UPS is activated, (input power is lost or degraded and system switches to battery power), deactivated, (input power is restored), or unreachable. Provide notification to the Unit Network Administrator when batteries degrade to the point of needing replacement." (sec. 5.7)

    o "Provide maintenance on the station cabling as installed by ITCom, or an approved UM vendor which met ITCom installation specifications." (sec. 5.8)

    o "Provide Preventative Maintenance (clean & vacuum) on each Customer Unit switch covered in this agreement yearly." (sec. 5.9)

    ITCom Services in Response to Outages or Degradation Impacting the Network

    o "A response within 30 minutes of the ITCom NOC notification or the Unit's call, to provide information to the Unit on specific steps that have been/will be taken to resolve the problem." (sec. 7.2.1)

    o "An on-site visit, if necessary, within two (2) hours of the response (i.e., the maximum on-site response time will be two and a half (2 1/2) hours). An update will be provided to the Unit Network Administrator if on-site and a best-guess ETR will be provided based on available facts. ITCom will continue to provide the Unit with updates every two hours during an outage." (sec. 7.2.1)

    o "If an outage is identified within the agreement service hours ITCom will resolve the outage even if the repair time extends beyond the service agreement hours." (sec. 7.2.1) (Repairs outside of the agreement hours result in additional labor expenses.)

    o "Conduct monitoring via SNMP POLLING at one-minute intervals." (sec. 7.2.1)
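    The response commitments in sec. 7.2.1 amount to a simple schedule. The sketch below computes it for a given notification time; it is an illustration under stated assumptions, not ITCom tooling. In particular, it assumes the two-hourly updates are counted from the initial response, which the agreement text quoted here does not specify, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

RESPONSE_WINDOW = timedelta(minutes=30)  # initial response after notification
ONSITE_WINDOW = timedelta(hours=2)       # on-site visit after the response
UPDATE_INTERVAL = timedelta(hours=2)     # status updates during an outage


def outage_schedule(notified: datetime, updates: int = 3):
    """Return the latest times allowed for each step of the response."""
    respond_by = notified + RESPONSE_WINDOW
    onsite_by = respond_by + ONSITE_WINDOW  # 2.5 hours after notification
    # Assumption: updates are counted from the initial response.
    update_times = [respond_by + UPDATE_INTERVAL * n
                    for n in range(1, updates + 1)]
    return respond_by, onsite_by, update_times


notified = datetime(2009, 6, 16, 9, 0)
respond_by, onsite_by, update_times = outage_schedule(notified)
print(respond_by, onsite_by, update_times[0])
```

    For a notification at 09:00, the first response is due by 09:30 and the on-site visit by 11:30, which matches the two-and-a-half-hour maximum stated in the agreement.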

    HathiTrust Responsibilities

    ITCom's responsibilities end at the first network switch; from there to its servers, HathiTrust is responsible for maintaining network connectivity and security. The repository uses Internet2 for communication and synchronization between the Ann Arbor and Indianapolis sites. Each Isilon node has dual 10 GB InfiniBand ports for internal (i.e., intra-cluster) communication and dual 1 GB Ethernet for external communication.

    Scenario 3: Network Security and External Attacks

    Review: Risks Involving Network Security and External Attacks

    The following table gives a general overview of the basic threat an external attack or network security breach poses to HathiTrust; entries are arranged by severity. The list, however, is not exhaustive and no attempt has been made to publicize potential vulnerabilities.

    Severity: High impact
    Events:
        - Unauthorized access to HathiTrust content leads to the infringement of copyrights.
        - Loss of data or functionality for an extended period of time as a result of malicious activity.

    Severity: Moderate impact
    Events:
        - HathiTrust services are temporarily unavailable as a result of malicious activity.

    Severity: Low impact
    Events:
        - The delivery of HathiTrust services slows as the result of malicious activity.
        - A security weakness exists within the system but remains unexploited.

    HathiTrust's Solutions for Network Security

    Malicious activity against HathiTrust could involve unauthorized access to a system or data, denial of service, or unauthorized changes to the system, software, or data. As an academic entity, the repository is seen as less of a target for such actions than commercial or governmental targets; despite this perceived lower risk, HathiTrust has not been lulled into a false sense of security. The repository takes seriously the potential for violations of its network and operating system security and therefore has instituted a program of periodic software updates in addition to the maintenance of an ITCom-supported firewall, authentication-required access, and other measures (such as throttling software to deter denial of service attacks). Because content is currently accepted from trusted sources (namely, Google and legacy digital collections from HathiTrust partners) the GROOVE process does not include a virus detection phase. As digital objects are ingested from a greater number of sources, additional security measures should be considered.

    o "HathiTrust staff apply security updates to the operating system and to networking devices as soon as they become available in order to minimize system vulnerability. As with new software releases, security updates are tested in a development environment before being released to production. Software packages that present a lower security risk and that have a greater potential to affect application behavior (web servers, language interpreters, etc.) are generally installed, configured and tested manually to allow for greater control in managing updates. Software updates are not applied automatically; moreover, updates that present a potential for having an impact on system behavior are applied and tested first in the development environment. If no impacts are seen, HathiTrust staff apply these updates in production after a testing period of at least one week." (HT TRAC C1.10)


    Scenario 4: Format Obsolescence

    Review: Risks Involving Format Obsolescence

    The following table outlines the threats posed by format obsolescence and arranges them according to their potential severity.

    Severity: High impact
    Events:
        - Applications and hardware are no longer able to read or display digital objects.
        - Errors in translating and reading files are not understood or acknowledged by repository users.

    Severity: Moderate impact
    Events:
        - Problems with the translation of file formats result in DIPs that do not faithfully reflect the original digital objects.

    Severity: Low impact
    Events:
        - Formats and associated applications change but retain compatibility with older versions of the file formats.

    HathiTrust's Solutions for Format Obsolescence

    An awareness and acknowledgement of the dangers of format obsolescence has led HathiTrust to implement proactive policies and procedures to ensure long-term access to the repository's content. The repository only accepts specific formats that meet rigorous specifications and, through the prior experience of University of Michigan personnel, has developed protocols for the successful migration of content from one format to another. In addressing the threat of format obsolescence, the preservation of the integrity and authenticity of deposited content has been an overarching concern.

    Selection of File Formats

    o "HathiTrust is committed to preserving the intellectual content and in many cases the exact appearance and layout of materials digitized for deposit. HathiTrust stores and preserves metadata detailing the sequence of files for the digital object. HathiTrust has extensive specifications on file formats, preservation metadata, and quality control methods, included in the University of Michigan digitization specifications, dated May 1, 2007."⁴⁴ (HT TRAC B1.1)

    o "HathiTrust currently ingests only documented acceptable preservation formats, including TIFF ITU G4 files stored at 600 dpi, JPEG or JPEG2000 files stored at several resolutions ranging from 200 dpi to 400 dpi, and XML files with an accompanying DTD (typically METS). HathiTrust supports these formats because of their broad acceptance as preservation formats and because the formats are documented, open and standards-based, giving HathiTrust an effective means to migrate its contents to successive preservation formats over time, as necessary. The Repository Administrators have undertaken such transformations in the past; moreover, HathiTrust offers end-user services that routinely transform digital objects stored in HathiTrust to presentation formats using many of the widely available software tools associated with HathiTrust's preservation formats. HathiTrust gives attention to data integrity (e.g., through checksum validation) as part of format choice and migration."⁴⁵

    o "Each format conforms to a well-documented and registered standard (e.g., ITU TIFF and JPEG2000) and, where possible, is also non-proprietary (e.g., XML)." (HT TRAC B4.2)

    ⁴⁴ Specifications are available at http://www.lib.umich.edu/lit/dlps/dcs/UMichDigitizationSpecifications20070501.pdf

    Format Migration Policies and Activities

    o "HathiTrust is committed to migrating the formats of materials created according to [its] specifications as technology, standards, and best practices in the digital library community change." (HT TRAC B1.1)

    o "HathiTrust staff members conduct migrations from one storage medium to another using tools that validate checksums internally. (Digital objects are stored both online and on tape, and the online storage system conducts regular scans to detect and correct data integrity problems.) A total file count is done following a large data transfer, and regularly scheduled integrity checks follow." (HT TRAC C1.7)

    o "[HathiTrust] has migrated large SGML-encoded collections to XML, and Latin-1 character encodings to UTF-8 Unicode. Our success in migrating from older formats to newer formats demonstrates our commitment to our collections and our ability to keep materials in our repository viable. All migrations are documented in change logs." (HT TRAC B4.2)
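    The verification steps described above (a total file count after a transfer, followed by checksum validation) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the choice of SHA-256 are assumptions, since the text does not name the tools or the checksum algorithm HathiTrust uses.

```python
import hashlib
from pathlib import Path


def checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its checksum."""
    return {str(p.relative_to(root)): checksum(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_transfer(source: Path, target: Path) -> list:
    """Return a list of problems; an empty list means the copy is intact."""
    src, dst = manifest(source), manifest(target)
    problems = []
    if len(src) != len(dst):              # total file count
        problems.append(f"file count differs: {len(src)} vs {len(dst)}")
    for name, digest in src.items():      # per-file checksums
        if name not in dst:
            problems.append(f"missing: {name}")
        elif dst[name] != digest:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

    Calling `verify_transfer` with the old and new storage locations returns an empty list when every file arrived unchanged; any entry in the list identifies a file that needs to be copied again or investigated.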

    ⁴⁵ HathiTrust. Preservation (2009) retrieved from http://www.hathitrust.org/preservation on 16 June 2009.


    Scenario 5: Core Utility and/or Building Failure

    Review: Risks Involving Core Utility or Building Failure

    The following table summarizes the dangers a utility or building failure poses to HathiTrust and ranks events by their potential severity.

    Severity           Events

    High Impact        o Extensive structural damage renders the MACC (or key elements of its infrastructure) unusable and necessitates the establishment of a hot site to recover and continue operations.
                       o Additional failure past tolerance in backup cooling or power infrastructure.

    Moderate Impact    o Failure of backup power past redundancy tolerance (failure of 2 generators).
                           - Data center coordinator may initiate load shed and shut down half of the MACC (but library racks will remain operational).
                       o Structural damage renders facility temporarily unsafe and/or unusable.

    Low Impact         o Loss of power.
                       o Loss of environmental control units within redundancy.

    HathiTrust's Solutions for Utility or Building Failure

    The continued delivery of HathiTrust's services depends upon the maintenance of power, environmental control, and security in its server environment at the Michigan Academic Computing Center (MACC) and other locations that host components of the repository. In this respect, HathiTrust is heavily reliant upon the infrastructure of the MACC as well as that of the Arbor Lakes Data Facility, home to one instance of the TSM Group's backup tape library. Both locations provide closely monitored and highly redundant environments that help ensure that HathiTrust's infrastructure remains secure and operable. At the same time, administrative and data management functions critical to the development and maintenance of the repository take place in the University of Michigan's Hatcher Graduate Library. The service and cooperation of Michigan's Plant Operations Division are therefore critical for the continued access to and use of this structure in the operation of HathiTrust.

    General Maintenance and Repairs in University of Michigan Facilities

    Facilities and maintenance issues on the University of Michigan campus are reported to the Plant Operations Division, the Department of Public Safety (DPS), and Occupational Safety and Environmental Health (OSEH) in addition to the impacted facility's manager. Repair work is coordinated by the University Library facilities manager in conjunction with administrators and workers from Plant Operations.

    The Michigan Academic Computing Center (MACC)

    The MACC hosts many of the key components of Michigan's University Library system as well as the technical infrastructure of HathiTrust. The University of Michigan does not own the building in which the data center is located but instead operates the MACC in conjunction with the Michigan Information Technology Center (MITC) Foundation and other partners. The MACC Server Hosting Service Level Agreement[46] lists the responsibilities of the data center as well as the repository; of particular significance are the MACC's agreements to:

    o Provide a controlled physical environment to support servers [with] room average temperature of between 65 and 75 degrees and 35-50% relative humidity [and] monitored environmentals (temperature, humidity, smoke, water, electrical). (sec. 4.1)

    o Provide adequate, conditioned, 60-cycle electrical service with adequate backup electrical capacity to support circuits, service, and outlets [and also to] provide Uninterruptible Power Supply (UPS) and generator backup. (sec. 4.2)

    o Provide 7x24 telephone contact for emergencies and for emergency access to facility. (sec. 4.4)

    In addition to features such as redundant electrical and environmental systems, the MACC maintains a full-time coordinator and staff who provide 24x7 responses to failures or malfunctions in the server environment. Alerts prompted by issues with the environmental systems or power are sent to the University of Michigan Network Operations Center (NOC) during non-business hours.

    o Overview: The MACC's redundancy is designed to ensure the safety and security of the data housed within. It consists of:

        - A dual power path from the property line to the power distribution units
        - Diesel-powered generators for electrical backup
        - Flywheels (not batteries) to provide power while the generators come on
        - State-of-the-art generators and flywheels for backup power
        - Three extra computer room air conditioners
        - Two extra dry coolers
        - Glycol loop for cooling with two parallel pathways with crossover valves at regular intervals.[47]

      A state-of-the-art monitoring system keeps track of 1,700 different parameters and automatically notifies staff of any irregularity.[48]

    o Environmental Controls and Monitoring

        - The MACC has 18 Computer Room Air Conditioning units (CRACs). At any given time, only 15 are necessary to maintain the required temperature and humidity. [Thus, the computer room has N5+1 redundancy in its cooling ability.] It also is equipped with a number of portable coolers to address specific cooling needs. The heat from the room is transferred to an under-floor glycol loop that releases the heat to the outdoors.[49]

    46 Please refer to Appendix H (MACC Server Hosting Service Level Agreement).

    47 Michigan Academic Computing Center. Vital Statistics (2009) retrieved from http://macc.umich.edu/about/vitalstatistics.php on 16 June 2009.

    48 Michigan Academic Computing Center (2009) retrieved from http://macc.umich.edu/index.php on 16 June 2009.

    49 Vital Statistics (2009) retrieved from http://macc.umich.edu/about/vitalstatistics.php on 16 June 2009.


        - The layout of the facility allows the front of the computer racks to face the cold aisles. These aisles have perforated floor tiles through which the cool air is pumped directly to the computers located there. Heat is discharged from the backs of the computers, which creates the hot aisles. This alternating arrangement facilitates the cooling process, as the hot air produced by the computers can be siphoned off before it mingles too much with the cooler air of the facility.[50]

        - Two separate smoke detection and fire alarm systems protect the MACC. One is for the building; the other is for the MACC itself. The two systems work together to activate alarm systems and notify the fire department and key personnel. In the event of an actual fire, the fire suppression system pipes will not fill with water unless there is a pressure drop caused by melting of one or more of the sprinkler heads.[51]

    o Backup Power

        - Three generators, each roughly the size of a rail car, provide backup power. Only two of the three are required to run the facility in the event of a power outage.[52]

        - The MACC uses environmentally responsible flywheels instead of batteries for power backup while the generators come online. The combination of generators and flywheels provides the facility with a fully redundant uninterruptible power system (UPS).[53]

        - The MACC has a contract with the UM Plant Operations Division for the delivery of diesel fuel for its generators in the event of an extended blackout.[54]

        - In the event that a backup generator is disabled, the MACC coordinator will initiate load shed, in which one half of the MACC will be shut down so that the other half (and requisite environmental systems) may continue to operate. The HathiTrust and UM Library racks are among those which will retain power should this response prove necessary.[55]
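    The redundancy figures quoted in this section (18 CRACs of which 15 are needed, three generators of which two are needed) reduce to simple arithmetic, sketched below. The class and function names are hypothetical, and the load-shed rule is an assumption: it follows the severity table's reading that load shed follows a failure past redundancy tolerance (two generators), which is one interpretation of the procedure described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundantPool:
    """A pool of identical units where only `required` of `installed` must run."""
    name: str
    installed: int
    required: int

    @property
    def spares(self) -> int:
        """Units that can fail before full demand is no longer met."""
        return self.installed - self.required

    def tolerates(self, failures: int) -> bool:
        """True if the pool still meets full demand after `failures` units fail."""
        return self.installed - failures >= self.required


# Figures quoted in this section of the report.
CRACS = RedundantPool("computer room air conditioners", installed=18, required=15)
GENERATORS = RedundantPool("diesel generators", installed=3, required=2)


def needs_load_shed(failed_generators: int) -> bool:
    """Assumed rule: shed load once generator capacity falls below full demand."""
    return not GENERATORS.tolerates(failed_generators)
```

    Under these assumptions the cooling system can lose three units and the backup power system one generator before capacity falls below what the full facility requires.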

    Arbor Lakes Data Facility (ALDF)

    The ALDF houses the TSM Group's infrastructure and one instance of the backup tape library that forms an integral part of HathiTrust's Disaster Recovery strategy. As the home of critical components of the UMnet Backbone, the ALDF provides a safe and secure location for one set of the repository's backup tapes. In the interest of security, this report will omit further information on the exact nature of the facility's power and environmental systems.

    50 Ibid.

    51 Ibid.

    52 Michigan Academic Computing Center (2009) retrieved from http://macc.umich.edu/index.php on 16 June 2009.

    53 Ibid.

    54 Gobeyn, Rene (MACC Data Center Coordinator). Personal interview on 23 June 2009.

    55 Ibid.


    Scenario 6: Software Failure or Obsolescence

    Review: Risks Involving Software Failure or Obsolescence

    The following table details various risks inherent to software failure or obsolescence and ranks them according to their severity.

    HathiTrust's Solutions for Software Issues

    The development and use of HathiTrust's tools and resources depend on highly functional software applications. Repository policies have therefore been crafted to ensure that these applications are thoroughly tested and regularly updated to minimize the threat of service outages as a result of software failure or obsolescence. HathiTrust furthermore employs open source applications that are well supported and enjoy widespread use and development within the digital library community.

    o Changes in software releases of all components of the system (from ingest to access) are developed and tested in an isolated development environment to prepare for release to production. When ready for release, developers record the changes made and increment version numbers of system components as appropriate using a version control system. New versions of software are released using automated mechanisms (in order to prevent manual errors). Major changes and upgrades in hardware architecture are recorded in monthly reports of unit activity, and thus are traceable to that level of detail. (HT TRAC C1.8)

    o Additionally, subsets of production data are available in the development environment to allow developers to ensure proper system behavior before releasing changes to production. (HT TRAC C1.9)

    o In order to design, build and modify software for the designated end user community, HathiTrust conducts an active usability program and seeks input from the Strategic Advisory Board of HathiTrust. Similarly, with regard to software development in support of the archiving needs of the Participating Libraries, HathiTrust focuses on the development of highly functional ingest and validation mechanisms. HathiTrust also seeks and responds to guidance from the Strategic Advisory Board with regard to archiving services. (HT TRAC C2.2)

    Severity           Events

    High Impact        o Software bug escapes detection in development environment and results in crash of application.

    Moderate Impact    o Software bug escapes detection in development environment and prevents full access to digital objects.
                       o Improper version of software is introduced to system (could have a greater or lesser impact depending on results of error and repository's ability to detect it).

    Low Impact         o Software bug escapes detection in development environment and prevents full use of system capabilities (i.e., rotation of images or additional functionality).


    Scenario 7: Operator Error

    Review: Risks Involving Operator Error

    The following table summarizes risks to HathiTrust posed by operator error; events are ranked according to their potential severity.

    HathiTrust's Solutions for Operator Error

    In any human enterprise, occasional operator error is unavoidable; HathiTrust strives to ensure that any such events are detected and resolved in a timely fashion.[56] To help avoid occurrences and mitigate their potential impact, HathiTrust has automated many procedures and also relies upon application assertions, which can notify administrators when processes are not operating correctly. Even if an error is introduced to the file system and then backed up, the TSM client saves up to seven versions of a file for up to six months so that an earlier version can be retrieved.
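    The retention rule stated above (up to seven versions, for up to six months) can be modeled in a few lines to show why a backed-up error remains recoverable. This is a simplified illustration and not a description of how the TSM client actually expires data; the names Version, retained, and last_good are hypothetical, and six months is approximated as 183 days.

```python
from datetime import datetime, timedelta
from typing import Callable, List, NamedTuple, Optional


class Version(NamedTuple):
    backed_up_at: datetime
    label: str


MAX_VERSIONS = 7                 # "up to seven versions"
MAX_AGE = timedelta(days=183)    # "up to six months", approximated in days


def retained(versions: List[Version], now: datetime) -> List[Version]:
    """Return the versions still retrievable, newest first."""
    newest_first = sorted(versions, key=lambda v: v.backed_up_at, reverse=True)
    recent = [v for v in newest_first if now - v.backed_up_at <= MAX_AGE]
    return recent[:MAX_VERSIONS]


def last_good(
    versions: List[Version],
    now: datetime,
    is_good: Callable[[Version], bool],
) -> Optional[Version]:
    """Return the newest retained version that passes `is_good`, or None."""
    return next((v for v in retained(versions, now) if is_good(v)), None)
```

    The sketch also makes the limit of the safeguard visible: if an error goes unnoticed for more than seven backup cycles or six months, no clean version remains within the retained set.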

    Ingest: The Google Return (Object-Oriented) Validation Environment (GROOVE) process is entirely automated to avoid the introduction of operator error to the process; steps include:

    o Identification of material for ingest
    o Decryption and unzipping of files
    o Format verification and validation with JHOVE
    o Lun Barcode and MD5 checksum validation
    o Creation of HathiTrust METS documents
    o Establishment of HathiTrust handles (persistent URLs)
    o Extension of the pairtree file directory (as new material enters the system)

    Archival Storage: Files stored within the repository are not accessed directly or manipulated by staff so that neither the zipped image and OCR files nor the METS document may be accidentally altered or deleted.

    Dissemination: The page-turner application references the stored image and then creates a .png (for TIFFs) or .jpg (for JPEG 2000s) file for display to the viewer.

    Data Management: New versions of software are released using automated mechanisms (in order to prevent manual errors). (HT TRAC C1.8)
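    Three of the automated ingest checks listed above lend themselves to a compact illustration. The sketch below is not GROOVE code. It assumes that the barcode check is a standard Luhn (mod 10) check digit, it uses MD5 as named in the step list, and it shows only the basic two-character splitting of a pairtree path (the identifier cleaning rules of the pairtree convention are omitted). All function names are hypothetical.

```python
import hashlib


def luhn_valid(number: str) -> bool:
    """Standard Luhn (mod 10) check over a string of ASCII decimal digits."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:      # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def md5_matches(payload: bytes, expected_hex: str) -> bool:
    """Compare the MD5 digest of `payload` with an expected hex string."""
    return hashlib.md5(payload).hexdigest() == expected_hex.lower()


def pairtree_path(identifier: str) -> str:
    """Split an identifier into two-character directory segments."""
    segments = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return "/".join(segments) + "/"
```

    Because each check is a pure function of its input, a pipeline built from such steps can reject a bad package without any manual inspection, which is the point of automating ingest.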

    56 Please refer to Appendix B (HathiTrust Outages from March 2008 through April 2009).

    Severity           Events

    High Impact        o Operator error results in the irreparable loss of data or damage to equipment.
                       o Operator error results in loss of key repository functions (ingest, storage, dissemination, etc.) for an extended period of time.

    Moderate Impact    o Operator error remains undetected and causes persistent problems in the system but has no long-term consequences.

    Low Impact         o Operator error is detected by normal procedures or via an activity log and can be readily corrected.


    Scenario 8: Physical Security Breach

    Review: Risks Involving a Physical Security Breach

    Maintaining the physical security of the HathiTrust infrastructure is yet another crucial element in the repository's efforts to manage risks and thereby lessen the chance that a disaster-type event occurs. Risks involve the damage and destruction of equipment and could even extend to unauthorized system access. Multiple levels of security exist at both the Michigan Academic Computing Center (MACC) and the Arbor Lakes Data Facility (ALDF) to protect HathiTrust from the acts of vandalism, destruction or malicious tampering. Details on the potential impacts of a physical security breach are covered in Scenario 1: Hardware Failure and Scenario 3: Network Security.

    HathiTrust's Solutions for Physical Security

    o Each of [the HathiTrust] storage or tape instances is physically secure (e.g., in a locked cage in a machine room) and only accessible to specified personnel.[57]

    Security at the MACC

    The MACC Server Hosting SLA states the data center staff will:

    o Provide services necessary to maintain a safe, secure, and orderly environment for all tenants of the MACC. (sec. 4.7)

    o Provide access control via HiD card and biometric readers for those listed on the Tenant Staff Authorized for Access list. (sec. 4.5)

    The MACC Website and the Michigan Academic Computing Center Operating Agreement[58] provide additional details concerning the resources and procedures that help protect HathiTrust's equipment at the MACC. The MACC Data Center Coordinator personally oversees the enforcement of security protocols and conducts regular audits of security logs and, when necessary, reviews surveillance video footage.

    o Security Systems

        - State-of-the-art security devices such as iris scanners, cameras, closed circuit television and on-call staff keep the data and machines housed in the MACC safe.[59]

        - Access to the data center will be by two-factor authentication (access card and iris scan) or escorted, supervised access. Access to the building will be by access card. (MACC OA, sec. 5.3.1)

        - Cameras throughout the corridor, security trap, and facility will be monitored and maintained by the Data Center Coordinator. (sec. 5.2.1)

    o Security Procedures

        - The Operations Advisory Committee will establish procedures for granting access cards to the facility to those whose jobs require hands-on access to systems. All requests for access cards will be vetted and approved by the Operations Advisory Committee at their next meeting. (sec. 5.3.2)

        - Everyone on the access list for the data center will be required to attend a training session before working in the data center and sign an access agreement stating policies they must observe while in the data center. (sec. 5.3.8)

    57 HathiTrust. Technology (2009) retrieved from http://www.hathitrust.org/technology on 15 June 2009.

    58 Please refer to Appendix I (Michigan Academic Computing Center Operating Agreement).

    59 Michigan Academic Computing Center. Vital Statistics (2009) retrieved from http://macc.umich.edu/about/vitalstatistics.php on 17 June 2009.

    Security at the ALDF

    As noted in the TSM Backup Service SLA, the University of Michigan's ITCS is responsible for physical security at the ALDF. (sec. 4.9) While this document will not detail specific features of the ALDF's operation, multiple levels of security and oversight are employed.


    Scenario 9: Natural or Manmade Disaster

    Review: Risks Involving a Natural or Manmade Disaster

    The following table details the risks to HathiTrust posed by a natural or manmade disaster; events are ranked by order of their severity. Due to possible overlap between this scenario and Scenario 1 (Hardware Failure), readers are encouraged to consult that earlier section.

    HathiTrust's Solutions for Natural or Manmade Catastrophic Events

    The University of Michigan Ann Arbor Campus Emergency Procedures (revised January 2008) has set procedures to address building evacuations (in the event of fire), tornadoes, severe weather, flooding, chemical/biological/radioactive spills, as well as bomb threats, civil disturbances, and acts of violence or terrorism.[60] In all cases, staff will follow the directions of Public Safety and not re-enter buildings or resume work until advised to do so by DPS or OSEH or someone from on-site incident command.

    In the event of a severe natural or manmade disaster, the repair and restoration of the physical locations of HathiTrust infrastructure would need to be coordinated between the repository and the appropriate facility managers. Such activity would rely upon the disaster recovery plans in place at the MITC Building (home of the MACC) and University of Michigan (which includes the Hatcher Graduate Library and the ALDF). It must be noted that an event which causes significant damage to an important structure or to a building's infrastructure could result in the loss of an instance of the repository for an extended period of time. In such a case, HathiTrust would need to set up an alternate hot site until structural restoration is complete (or a new facility has been found).

    60 Please see Appendix C (Washtenaw County Hazard Ranking List).

    Severity           Events

    High Impact        o Widespread damage to a data center and/or its infrastructure that forces an instance of the repository to find a new hot site with sufficient power supply, environmental controls, and security.
                       o Damage to work areas forces staff to relocate to a new center of operations.
                       o Extensive loss or damage to hardware requires large-scale replacement.
                       o With the extended loss of one site, HathiTrust loses redundancy (and possibly some functionality: i.e. the ability to ingest new material in Ann Arbor) and thus a central component of its disaster recovery and backup plans.
                       o An act of violence or terrorism occurs at or near HathiTrust facilities.

    Moderate Impact    o An event results in an extended outage at one site that exceeds the recovery time objective.
                       o Hardware sustains some damage and site is able to continue operation in a reduced capacity.
                       o An actual or threatened act of violence or terrorism forces the temporary evacuation or quarantine of HathiTrust facilities.

    Low Impact         o Local conditions result in a temporary outage at a HathiTrust site.


    Basic Disaster Recovery Strategies

    In the immediate aftermath of a large-scale manmade or natural disaster, the repository's immediate recovery will be enabled by its basic system architecture:

    o the initiative's technology concentrates on creating a minimum of two synchronized versions of high-availability clustered storage with wide geographic separation (the first two instances of storage are located in Ann Arbor, MI and Indianapolis, IN), as well as an encrypted tape backup (written to and stored in a separate facility outside of Ann Arbor).[61]

    The establishment of the mirror site in Indianapolis and the retention of multiple backup tapes at two locations in Ann Arbor ensure that a serious event at either location will not impede the continued functioning of the repository at the other. Consideration must be given as to how data at the Indianapolis site will be backed up and how key repository functions (such as ingest) will proceed if the Ann Arbor instance is offline for an extended period of time. Likewise, a long-term outage at the IU location would require HathiTrust to establish a third site for data backup (i.e., a location where additional copies of backup tapes could be stored).

    61 HathiTrust. Technology retrieved from http://www.hathitrust.org/technology on 15 June 2009.


    Scenario 10: Media Failure or Obsolescence

    Review: Risks Involving Media Failure or Obsolescence

    The following table summarizes risks to HathiTrust posed by the failure of the media used for its data backups. While the risks from this are limited (both copies of the tape backups would have to be impacted for data to be unavailable), the issue should nonetheless be addressed with regular test restorations and/or inspections of the media.

    HathiTrust's Solutions for Media Failure

    Given the nature of HathiTrust's storage system, this scenario is only a concern with regard to the digital magnetic tapes used by the TSM Group for backups.

    o Two tape copies of all backup data are made and these are stored in separate climate-controlled conditions in tape libraries at the MACC and the ALDF.

    o Content is transferred to new tape during data defragmentation (which occurs when existing tapes are 80% full).

    o If a degraded or otherwise bad section of tape is detected during a backup procedure, that tape is immediately marked as read-only.

        - Data is thenceforth written to a different tape; existing data on the bad tape will be copied to properly functioning media.

        - If data cannot be reclaimed from bad tape, the TSM Group would contact HathiTrust so that the backup of content can be properly completed.
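    The tape-handling rules above (mark a tape read-only when a bad section is detected, continue writing on a different tape, and defragment at 80% full) can be expressed as a small model. This is an illustrative sketch only and does not represent the TSM Group's software; the Tape class, the write_backup function, and the simulated bad_labels parameter are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional

RECLAIM_THRESHOLD = 0.80   # "80% full", per the text above


@dataclass
class Tape:
    label: str
    capacity_gb: float
    used_gb: float = 0.0
    read_only: bool = False

    @property
    def fill_ratio(self) -> float:
        return self.used_gb / self.capacity_gb

    def due_for_defragmentation(self) -> bool:
        """True once the tape has reached the stated fill threshold."""
        return self.fill_ratio >= RECLAIM_THRESHOLD


def write_backup(
    tapes: Iterable[Tape],
    size_gb: float,
    bad_labels: FrozenSet[str] = frozenset(),
) -> Optional[str]:
    """Write to the first usable tape; mark tapes that report errors read-only.

    Returns the label of the tape written, or None if no tape could take the data.
    """
    for tape in tapes:
        if tape.read_only:
            continue
        if tape.label in bad_labels:   # simulated media error during the write
            tape.read_only = True
            continue
        if tape.used_gb + size_gb <= tape.capacity_gb:
            tape.used_gb += size_gb
            return tape.label
    return None
```

    A None result corresponds to the case in which the backup cannot be completed on available media and the repository would need to be contacted.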

    Remaining Vulnerabilities

    There is some reason for concern in this area because the TSM Group does not have a regular program to monitor its media for physical degradation or impairment after data defragmentation. While the tapes are reported to be highly dependable, problems such as sticky shed (the hydrolysis of the tape's binder) could become an issue with older tapes. A regular program of tape validation or test restorations would provide an opportunity to check on the physical condition and data integrity of the tapes. Likewise, the creation of a schedule for the replacement of older tapes could avoid future problems with media degradation.

    Severity           Events

    High Impact        o Physical degradation (i.e. in tape binder, substrate, or magnetic content) affects both copies of older backup tapes.

    Moderate Impact    o Because backup tapes are not regularly tested or audited, the physical substrate of tapes may degrade over time.

    Low Impact         o Bad tape is detected during a tape backup.


    Conclusions and Action Items

    Conclusions

    As this report demonstrates, a variety of risk management strategies in addition to design elements, operating procedures, and service and support contracts endow HathiTrust with the ability to preserve its digital content and continue essential repository functions in the event of a range of disasters. The establishment of the Indianapolis mirror site, the performance of nightly tape backups, and the redundant power and environmental systems of the MACC reflect professional best practices and will enable HathiTrust to weather a wide range of foreseeable events. As it is, disasters often result from the unknown and the unexpected; while the aforementioned strategies are crucial components of a Disaster Recovery Plan, they must be supplemented with additional policies and procedures to ensure that, come what may, HathiTrust will be able to carry on as both an organization and a dedicated service provider.

    In the effort to secure HathiTrust's long-term continuity, the present document stands merely as a preliminary step in the establishment of a legitimate Disaster Recovery Plan. The data on HathiTrust's policies, procedures, and contracts consolidated herein should facilitate the data collection requisite to the initial phases of the planning process, but the core activities of formulating technical and administrative response strategies and delegating roles and responsibilities remain to be undertaken.

    The following section outlines recommendations and action items derived from research into the repository as well as from discussions with Cory Snavely and other HathiTrust staff members. Items have been separated into an approximate timeline of activity ranging from Short-Term through Long-Term and the arrangement within each category repres