[Data Centre Design] CDCP Study Notes - Data Centre (Part 13) - Data Centre Design - Operational Considerations

Service Level Management

  • The organization should maintain a service catalogue
  • The service catalogue is a key component of service delivery and should describe:
    • Unique identifier of the service
    • Description of the service should include:
      • Description of the service to be delivered
      • Service hours and exceptions
      • Availability requirements
    • Support:
      • Response times
      • Escalation and contact points
      • Cost
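
The catalogue fields listed above map naturally onto a small record type. The following is only a sketch: the class name, field names, sample services and the `duplicate_ids` helper are all hypothetical and not part of the course material. It shows one way to hold the fields and to check that identifiers really are unique.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceCatalogueEntry:
    service_id: str                 # unique identifier of the service
    description: str                # what is delivered
    service_hours: str              # service hours and exceptions
    availability_pct: float         # availability requirement
    response_time_minutes: int      # support response time
    escalation_contacts: List[str] = field(default_factory=list)
    monthly_cost: float = 0.0       # cost (currency is left open here)


def duplicate_ids(entries: List[ServiceCatalogueEntry]) -> List[str]:
    """Return service IDs that appear more than once (they should be unique)."""
    counts = Counter(entry.service_id for entry in entries)
    return sorted(service_id for service_id, n in counts.items() if n > 1)


# Hypothetical sample catalogue with one deliberate ID clash.
catalogue = [
    ServiceCatalogueEntry("COLO-001", "Rack colocation", "24x7", 99.9, 15,
                          ["noc@example.com"], 1200.0),
    ServiceCatalogueEntry("RH-001", "Remote hands", "Mon-Fri 08:00-18:00", 99.0, 60),
    ServiceCatalogueEntry("COLO-001", "Cage colocation", "24x7", 99.9, 15),
]
print(duplicate_ids(catalogue))
```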


  • An SLA (Service Level Agreement) is a legal document that provides a mechanism for accounting for the cost of non-conformity
  • An SLA should describe the service commitments at an appropriate level of detail
    • What is to be provided
    • What constitutes a violation of the SLA
  • An SLA should be maintained by regular reviews
  • Service levels should be monitored and reported against documented targets
  • Non-conformance should be reported, reviewed, and where appropriate, escalated
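
Monitoring service levels against a documented target can be sketched as follows. This assumes availability is measured as the share of agreed service minutes during which the service was up; the function names and the 99.9% target are hypothetical examples, not values from the course.

```python
def availability_pct(service_minutes: int, downtime_minutes: int) -> float:
    """Return availability as a percentage of the agreed service time."""
    if service_minutes <= 0:
        raise ValueError("service_minutes must be positive")
    if not 0 <= downtime_minutes <= service_minutes:
        raise ValueError("downtime_minutes must be between 0 and service_minutes")
    return 100.0 * (service_minutes - downtime_minutes) / service_minutes


def sla_report(service_minutes: int, downtime_minutes: int, target_pct: float) -> dict:
    """Compare measured availability with the documented target."""
    measured = availability_pct(service_minutes, downtime_minutes)
    return {
        "measured_pct": measured,
        "target_pct": target_pct,
        "conforms": measured >= target_pct,
    }


# A 30-day month of 24x7 service with 50 minutes of downtime,
# checked against a hypothetical 99.9% target.
report = sla_report(30 * 24 * 60, 50, 99.9)
print(report)
```

A report with `conforms` set to `False` is the kind of non-conformance that would then be reported, reviewed and, where appropriate, escalated.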


Safety measures

  • The organization should have an occupational health and safety policy
  • Policies, plans and procedures should be in place addressing emergency preparedness and response
    • Plan, act, evaluate, take corrective actions
  • Safety staff should be appointed with clearly defined roles, responsibilities and authorization levels, considering the following functions:
    • Risk manager
    • Safety manager
    • First Aid officer
    • Emergency warden
  • The organization should conduct (regular) safety awareness training for all staff
  • To protect individuals from exposure to workplace hazards and the risk of injury, staff should be familiar with the usage of Personal Protective Equipment (PPE), such as:
    • Ear protection
    • Safety glasses
    • Hard hat
    • Protection gloves/shoes
    • Insulated tools


Security measures

  • To manage entry control of individuals, a security matrix should be established using the following categories:
    • Organization staff
    • Contractors
    • Vendors/suppliers
    • Customers
    • Visitors
  • To manage entry control of incoming and outgoing goods, inspections (in a holding area) should take place to check for potential security risks as well as hazards
    • Goods (equipment), parcels, letters, etc.
  • Where applicable, security controls need to be monitored
  • Security patrols may be required for data centres where a higher level of security is needed
  • All individuals working at the data centre should attend security awareness training
  • The security awareness program should include the following:
    • Overall security policies
    • Specific security requirements of the department
    • Behavioural considerations
    • Security incident reporting structure


Facilities Maintenance

  • Several types of maintenance activities may take place:
    • Preventive / predictive / reactive (corrective)
  • The organization should have appropriate maintenance agreements in place, covering at least the following:
    • Legal entity name
    • Start and end date
    • Description of services provided
    • Qualifications and experience of personnel allocated
    • Commercial terms
    • Names and signatures of authorized officers


  • A maintenance schedule should be created and maintained
  • The schedule should be published on a need-to-know basis
  • The organization should keep track of scheduled events and the actual date and time of execution
  • Maintenance includes:
    • Equipment
    • Cleaning
    • Labeling
    • Documentation
    • Etc.

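  Preventive maintenance: routine maintenance carried out even when it may not be needed.

  Predictive maintenance: maintenance performed in real time, based on monitoring equipment performance and condition.

  Reactive (or corrective) maintenance: a failure has occurred and needs to be fixed.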
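
Keeping track of scheduled events against their actual execution time can be sketched with a small record type. Everything named here (`MaintenanceEvent`, `overdue`, the sample tasks and dates) is hypothetical and only illustrates the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class MaintenanceEvent:
    task: str
    kind: str                             # "preventive", "predictive" or "reactive"
    scheduled: datetime
    executed: Optional[datetime] = None   # None until the work has been done

    def delay(self) -> Optional[timedelta]:
        """Time between scheduled and actual execution, or None if not executed."""
        if self.executed is None:
            return None
        return self.executed - self.scheduled


def overdue(events: List[MaintenanceEvent], now: datetime) -> List[MaintenanceEvent]:
    """Return events past their scheduled time that have not been executed."""
    return [e for e in events if e.executed is None and e.scheduled < now]


# Hypothetical schedule entries.
events = [
    MaintenanceEvent("UPS battery check", "preventive",
                     datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 10, 30)),
    MaintenanceEvent("CRAC filter replacement", "preventive",
                     datetime(2024, 3, 5, 9, 0)),
]
print(events[0].delay())
print([e.task for e in overdue(events, datetime(2024, 3, 6, 0, 0))])
```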


Governance - documentation

  • The organization should ensure that the data centre establishes a fully functional document management system, addressing the following steps:
    • Creation
    • Classification
    • Approval
      • Creator / modifier / reviewer / approver
    • Publishing
      • Online (digital) / hard copy (paper)
    • Maintenance
    • Archiving
    • Destruction
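
The steps above can be read as an ordered lifecycle. The sketch below assumes a strictly linear flow, which is a simplification (in practice a document under maintenance would normally go back through approval); the stage names follow the list, while the function names are hypothetical.

```python
from typing import Optional

# Stages in the order given in the list above.
STAGES = ["creation", "classification", "approval", "publishing",
          "maintenance", "archiving", "destruction"]


def next_stage(current: str) -> Optional[str]:
    """Return the stage that follows `current`, or None after destruction."""
    index = STAGES.index(current)  # raises ValueError for an unknown stage
    return STAGES[index + 1] if index + 1 < len(STAGES) else None


def can_move(current: str, target: str) -> bool:
    """Allow a move only to the immediately following stage."""
    return next_stage(current) == target


print(next_stage("approval"))
print(can_move("creation", "publishing"))
```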


Governance - vendor management

  • Vendors should be selected and managed in a controlled fashion, considering the following activities:
    • Service requirements analysis
      • Technical / financial / commercial / legal
    • Request for Proposal (RFP)
    • Contract management
    • Vendor management
    • Performance (SLA) reviews
    • Retirement



[Data Centre Design] CDCP Study Notes - Data Centre (Part 12-2) - Data Centre Design - Designing a Scalable Network Infrastructure

Cabling recommendation TIA-942-B (2017)

  • Category 6, 6A, or 8 (Category 6A or higher is recommended)
  • OM3, OM4, or OM5 (OM4 or OM5 is recommended)
  • Adds MPO-16, MPO-24 and MPO-32 as options for termination of more than 2 fibres, in addition to the MPO-12 connector
  • Adds 75-ohm coaxial cables and connectors as specified in ANSI/TIA-568.4-D

Testing and verifying structured cabling

  • Get 3rd party compliance proof, such as UL / ETL, before confirming the use of a brand or new technologies, e.g. Cat 6A, Cat 8, OM4 and OM5
  • 100% testing and 3rd party testing to verify the installation
  • Copper cable parameters to test
    • Return loss / propagation delay / delay skew
    • ANEXT (Cat 6A only, seldom tested on-site)
  • Fibre testing
    • Return loss / insertion loss / link testing
    • Laser bandwidth DMD/ EMB not verifiable on site

Storage Area Networks (SAN)

  • Allows for fast, flexible and redundant network-wide data storage
  • SAN requires high-quality optic networks
    • FC-AL (Fibre Channel Arbitrated Loop)
      • Hub connects servers to storage
    • Switch fabric
      • High speed, low latency switches, preferred for enterprise-class networks
  • Most SANs use OM3/4 multimode fiber
  • Switch fabric needs lots of network points
    • Plan capacity - use structured cabling
    • Avoid point to point cabling clogging up the underfloor as it may result in poor cooling and a high risk of failure

Network redundancy

  • Network diversity
  • Network redundancy
  • Redundancy on the backbone
  • Redundancy in the data centre
    • Redundant network equipment in sub-racks
    • Redundant cable paths
    • Separate office building management networks
  • Ensure separate physical routes


Building to building connectivity (1) - Telco

  • Determine capacity needed
  • Evaluate local telco capability
    • Connectivity (speed/bandwidth)
    • Uptime guarantees
    • Service
    • Pricing
  • Budget appropriately
    • Monthly subscription and/or usage fees
  • Commonly an expensive option
  • But for off-site locations it is typically the only option for hardwired connections


Building to building connectivity (2) - Hardwire (dedicated cabling)

  • Copper/fibre cabling from building to building
  • One-time only investment
  • Time-consuming to install
  • Commonly an expensive option
  • What if you relocate?


Building to building connectivity (3) - Canopy (wireless coverage)

  • Direct connection between points or as AP (access point)
  • Range
    • > 190 km as AP
    • > 249 km as Point to Point
  • Very good transmission speed
    • Approx. 400 Mbps (PMP 450) with multi-point
    • Approx. 500 Mbps (PTP820) point-to-point
  • Security features (encryption)
  • Reasonably prone to EMF / EMP
  • One-time investment


Building to building connectivity (4) - FSO (wireless laser)

  • Free Space Optics (FSO)
  • Direct connection between points
    • Requires line of sight
  • Advantages
    • Many vendors
    • High-bandwidth
      • 1 Gb/s - 30 Gb/s
    • Good reach
      • 1.5 km - 4.4 km
    • Protocol independent
    • Quality transmission
    • Highly secure
    • License-free worldwide
    • One-time investment


Network monitoring system required capabilities

  • Flexible and versatile
  • Multi-vendor support
  • Supporting your network technologies
    • ATM / Frame Relay / MPLS VPN & TE etc.
  • Supports SNMP (Simple Network Management Protocol)
    • V1 & V2
    • SNMP V3 for secure networks
  • Support for RMON
  • Automated root cause analysis
  • Notification capabilities
  • Reporting capabilities
    • Offline / Online

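Data centre operators are advised to buy a flexible, versatile analysis system that supports many vendors. Such systems generally support SNMP V1 and V2, but V3, the version that allows secure monitoring, is not always supported. The system should also support RMON, which can analyse network connections in detail and offers functions such as traffic shaping. RMON can automatically analyse all hardware connections and find the cause of a problem. The system should notify operations and management staff regularly so that they stay continuously aware of the current state of the network. It can generate offline and real-time reports as needed.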


[Data Centre Design] CDCP Study Notes - Data Centre (Part 12-1) - Data Centre Design - Designing a Scalable Network Infrastructure

Structured cabling system

  • Network cabling is the foundation to support a high-availability data centre, IT equipment and its applications 
  • Proven products and contractors are crucial for the proper design, installation and maintenance of a cabling infrastructure; they will reduce downtime and improve
    • Operational efficiency
    • Manageability
    • Reliability
    • Availability

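The ICT network is the ultimate purpose of building the data centre facility, and the network cabling system is the foundation for all ICT equipment. Its product quality and design directly determine the availability of the data centre.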



Consider the following in planning the cabling infrastructure of the data centre:

  • Structured cabling (more expensive, installed in one go) vs on-demand cabling (added step by step)
  • Current requirements/ future growth
  • Which media (fibre, Cat6, Cat6A etc.)
  • Apparatus (i.e. type of patch panels etc.)
  • ANSI/TIA-942 data centre standard cabling requirements
  • Brand name preference/ global specification / global pricing

Structured cabling

  • Reduce the risk of downtime
  • Easy re-patching
  • Easy fault finding
  • Better cooling
  • Standardized length - easy stock

Copper cables

  • Unshielded / Shielded
  • Solid cables
    • Less insertion loss
    • Best used for permanent links
    • Length maximum 90 metres
  • Flexible / Stranded cables (patch cords)
    • Flexible
    • Higher losses
    • Best used for patch panel / short distances
    • Maximum length < 10 metres
  • Total channel (Solid + Flexible) = 100m
  • Cat 5E = 100 Mb/s, 1 Gb/s
  • Cat 6 = 1 Gb/s; Cat 6A = 10 Gb/s
  • Cat 7 or 7A = 10 Gb/s (Non-RJ45)
  • Cat 8 = 40 Gb/s (2 GHz, 30 metres, shielded cabling, channel with 2 connections ONLY, designed with a data centre focus)
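
The length limits above (90 m permanent link, under 10 m of patch cord, 100 m total channel) can be checked with a few lines of code. This is a sketch under two assumptions: the patch cord limit is treated as the combined length of all patch cords in the channel, and the limits are not meant for Cat 8, which the notes give as 30 metres. The function and constant names are hypothetical.

```python
from typing import List

MAX_PERMANENT_LINK_M = 90.0  # solid cable (permanent link)
MAX_PATCH_TOTAL_M = 10.0     # stranded patch cords, taken here as a combined total
MAX_CHANNEL_M = 100.0        # solid + flexible


def check_channel(permanent_link_m: float, patch_cords_m: List[float]) -> List[str]:
    """Return a list of limit violations; an empty list means the channel fits."""
    problems = []
    patch_total = sum(patch_cords_m)
    if permanent_link_m > MAX_PERMANENT_LINK_M:
        problems.append(
            f"permanent link {permanent_link_m} m exceeds {MAX_PERMANENT_LINK_M} m")
    if patch_total > MAX_PATCH_TOTAL_M:
        problems.append(
            f"patch cords {patch_total} m exceed {MAX_PATCH_TOTAL_M} m")
    if permanent_link_m + patch_total > MAX_CHANNEL_M:
        problems.append(
            f"channel {permanent_link_m + patch_total} m exceeds {MAX_CHANNEL_M} m")
    return problems


print(check_channel(85.0, [3.0, 2.0]))   # within all limits
print(check_channel(95.0, [3.0, 3.0]))   # link too long, channel too long
```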

Copper termination / patch panels

  • Structured cabling patch panels terminate cable links
    • Flat panel
    • Angled panel
  • Use the correct patch cables (RJ45)
    • Patch cable should always be the same class as, or a higher class than, the structured cable
    • Match shielded or unshielded twisted pair types
    • No on-site termination

Fibre cables

  • Not prone to EMF
  • Distance longer than copper
  • Lightweight and smaller than copper
  • 62.5/125 µm and 50/125 µm (multi-mode)
  • 8.3/125 µm (single-mode)
    • The first number represents the diameter of the core
    • The second number represents the size of the cladding
    • Both values are in microns
  • The actual diameter for a fibre 'cable' is much bigger because of protection over the cladding
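
The "core/cladding" notation above is easy to split programmatically. A minimal sketch follows, with a hypothetical function name, that parses a designation such as `62.5/125` into core and cladding diameters in microns.

```python
from typing import Tuple


def parse_fibre_size(designation: str) -> Tuple[float, float]:
    """Split a 'core/cladding' designation into two diameters in microns."""
    core_text, cladding_text = designation.split("/")
    core_um = float(core_text)
    cladding_um = float(cladding_text)
    if not 0 < core_um < cladding_um:
        raise ValueError("expected 0 < core < cladding")
    return core_um, cladding_um


for designation in ("62.5/125", "50/125", "8.3/125"):
    core_um, cladding_um = parse_fibre_size(designation)
    print(f"{designation}: core {core_um} um, cladding {cladding_um} um")
```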



Single-mode vs Multi-mode

  • Single-mode
    • Laser light source
    • Longer distance (several km)
    • Used in InfiniBand, carrier circuits, campus environments and other specialized applications
    • Standards: ITU G.652; OS1, OS2
  • Multi-mode
    • LED (100 Mb/s, 1 Gb/s), Laser light source (10 Gb/s and above)
    • The cable is more expensive than single-mode, but the equipment itself is cheaper
    • Standards: ISO 11801; OM1, OM2 for speeds up to 1 Gb/s, OM3 and OM4 for 10 Gb/s and up


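The core diameter of multi-mode fibre is much larger than that of single-mode fibre, typically 50-100 microns. It tolerates a wider range of light sources (they do not need to be very precise), so the average cost is lower, and it suits short-distance transmission at speeds up to 1 Gb/s.

The ISO 11801 standard classifies OM1, OM2, OM3 and OM4. OM3 and OM4 additionally define the fibre's bandwidth as EMB (Effective Modal Bandwidth), expressed in MHz·km.

[Figure: OM fibre classes and their EMB]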


Fibre terminations/patch panels

  • SC largely replaced by:
    • LC or mini LC
  • MPO now used for 40 Gb/s and higher speed links
  • Handle fibre with care, ensure connectors are always cleaned before mating
  • Don't exceed the bend radius for the cable



TIA-942 network cable / Rating levels illustration

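In the illustration, Rated I/II/III/IV indicate the redundancy requirements for cabling, components and spaces.
  • Rated I: no redundant components are required; the setup is relatively simple.
  • Rated II: at least two independently routed entrance pathways are required, with at least 20 metres between the routes.
  • Rated III: at least two independent main entrance rooms are required; they should be at least 20 metres apart and located in separate fire compartments.
  • Rated IV: at least two independent distribution rooms are required. This provides additional redundancy, but at the cost of routing that becomes extremely complex and hard to manage.

The redundancy configuration of the network entrances closely resembles the Rated 1/2/3/4 configuration of the power system: backup of the main feed, then backup of distribution, all the way to end-point switching just before the equipment. The two can be studied by analogy.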


[Data Centre Design] CDCP Study Notes - Data Centre (Part 11) - Data Centre Design - Physical Security & Safety and Auxiliary Systems

Questions that need to be addressed:

  • What are you trying to protect?
  • What are the threats?
  • What is the probability of the danger occurring?
  • What is the impact?
  • What is the risk identified?
    • Risk = impact * probability (risk assessment)
  • Does the identified risk require a control?
  • How does one design, build, implement (and subsequently during operations, monitor, review and improve) the control?
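
The formula Risk = impact * probability can be turned into a simple ranking. The sketch below assumes both factors are scored on a 1 to 5 scale and that a score of 10 or more calls for a control; the scale, the threshold and the sample threats are hypothetical and would come from the organization's own risk assessment method.

```python
def risk_score(impact: int, probability: int) -> int:
    """Risk = impact * probability, each scored 1-5 (the scale is an assumption)."""
    for name, value in (("impact", impact), ("probability", probability)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return impact * probability


def needs_control(score: int, threshold: int = 10) -> bool:
    """Flag risks at or above a hypothetical acceptance threshold."""
    return score >= threshold


# Hypothetical threats as (impact, probability) pairs.
threats = {
    "Unauthorised entry": (4, 2),
    "Flooding": (5, 1),
    "Power outage": (4, 4),
}
ranked = sorted(threats, key=lambda name: risk_score(*threats[name]), reverse=True)
for name in ranked:
    score = risk_score(*threats[name])
    print(name, score, "control required" if needs_control(score) else "accept/monitor")
```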

Physical security considerations

  • Perimeter protection and intrusion detection 
    • Fencing / Walls
      • Barbed wire facing outwards, slanted 45 degrees
    • Security control room/guard house
    • Boom barriers
    • Concrete bollards
    • CCTV(Closed-Circuit Television) cameras
    • Security zones
    • Access control (badge) system
    • Security awareness posters

CCTV cameras

  • CCTV
    • Recording on the hard disk
    • Motion detection cameras
    • Night vision camera
    • Event/face recognition
  • Install cameras so that the areas they monitor overlap, to prevent black spots / blind areas
  • Cameras and recorder must be on UPS
  • Recorder to be located in a secure area
    • Avoid placement inside the computer room
  • Copy of hard disk should be stored off-site or in another remote area of the building

Entry control

  • Areas can be secured in various ways
    • Revolving doors
    • Mantraps
    • Turnstiles
    • Cages —— No impact on cooling/fire suppression system, ensure that cages are installed slab-to-slab (subfloor of area up to the ceiling).
  • Door locks
    • Key lock —— Proper key management procedures
    • Electronic locks —— Card reader / security code / biometrics (fingerprint, iris scan etc.)

Physical safety considerations

  • Signage (regulatory and additional), using indicators for:
    • Location of fire extinguishers
    • Location of first-aid kit
    • Emergency numbers and contacts
  • Escape routes at each door
  • Emergency response plans
  • Safety awareness posters
    • Cardiopulmonary Resuscitation (CPR)
    • General safety practices

Monitoring system —— Data Centre monitoring requirements

  • To be able to see at a glance that everything is in a normal state
  • To have peace of mind that, should an alarm condition occur, the relevant personnel will be informed (24x7)
  • To have centralized monitoring capabilities that integrate with current monitoring software
  • To keep a history of alarms and trending data for analysis (provide detailed reports)


  • EMS —— Environmental Monitoring System
    • Monitors only
    • Most of the time low-level monitoring only (i.e. dry/alarm contacts)
    • Relatively inexpensive
    • Limited alarm contact inputs and limited notification capabilities
  • BMS —— Building Management System
    • Monitors and controls
    • Provides High-level monitoring (i.e. full parameter monitoring)
    • Relatively expensive
    • More detailed level compared to an EMS
  • Either system fits a certain purpose
    • The purpose is that abnormalities are noticed early so that actions can be taken to avoid disasters

DCIM —— Data Centre Infrastructure Management

  • DCIM integrates information technology and facility management disciplines to centralize monitoring, management and intelligent capacity planning of a data centre's critical systems
  • DCIM solutions vary considerably, but most focus on:
    • Asset management
    • Power monitoring
    • Environmental monitoring
    • Capacity planning
    • Change management


Water leak detection

  • Pad based
    • Covers certain areas only
    • Inexpensive
    • for smaller data centres only
  • Cable based
    • Placed under the raised floor along the perimeter and pipes
    • Detection only or distance monitoring
    • Need to keep cable clean
    • for large data centres

Notification system for monitoring system

  • Should be able to alert persons and groups relevant to the alarm detected
  • Should be able to have various thresholds, both severity-based and timing-based with corresponding alerts
  • Alerts to be communicated by
    • E-mail
    • SNMP(Simple Network Management Protocol)
    • SMS (Short Message Service)
    • Audible alarm
    • Voice dialling
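
Severity-based and timing-based thresholds can be combined as in the sketch below. It assumes temperature readings arrive as time-ordered `(time_s, temp_c)` samples and that a higher limit means a more severe alert; the class name, the 27/32 degree limits, the hold times and the channel lists are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Threshold:
    severity: str         # e.g. "warning" or "critical"
    limit_c: float        # temperature limit in degrees Celsius
    hold_seconds: int     # how long the limit must be exceeded before alerting
    channels: List[str]   # e.g. ["email", "sms"]


def active_alert(samples: List[Tuple[int, float]],
                 thresholds: List[Threshold]) -> Optional[Threshold]:
    """Return the highest threshold exceeded continuously for its hold time."""
    if not samples:
        return None
    latest_time = samples[-1][0]
    triggered = None
    for th in sorted(thresholds, key=lambda t: t.limit_c):
        # Walk backwards to find where the current excursion started.
        start = None
        for time_s, temp_c in reversed(samples):
            if temp_c < th.limit_c:
                break
            start = time_s
        if start is not None and latest_time - start >= th.hold_seconds:
            triggered = th
    return triggered


thresholds = [
    Threshold("warning", 27.0, 300, ["email"]),
    Threshold("critical", 32.0, 60, ["email", "sms", "audible"]),
]
samples = [(0, 25.0), (60, 28.0), (120, 29.0), (180, 33.0),
           (240, 34.0), (300, 33.0), (360, 33.0)]
alert = active_alert(samples, thresholds)
print(alert.severity, alert.channels)
```

The hold time keeps a brief spike from paging staff, while the shorter hold on the higher limit lets a serious excursion escalate quickly.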

What to monitor?

  • Monitor at least (basic)
    • Temperature / Humidity in various zones
    • UPS status
    • Water leakage
    • Fire suppression
    • Air conditioning
    • Standby generator set
  • Nice to have (advanced)
    • Breaker / PDU level monitoring
    • Rack door open / close
    • Power, temperature, humidity inside the rack
    • Pressure / airflow

[Data Centre Design] CDCP Study Notes - Data Centre (Part 10) - Data Centre Design - Water Supply & Fire Protection

Water supply in a Data Centre

  • Fire suppression
  • Sanitary facilities
  • Makeup water for cooling
  • Cleaning
  • General facilities maintenance (gardens etc.)

Backup water supply: Water storage tanks

  • for the fire suppression installation
  • for makeup water for cooling towers
  • for chilled water backup (buffer storage)
  • for off-peak chilled water storage (thermal storage)

Backup water supply: Well water / Retention pond

Well water

  • Cleanliness
  • Supply capability
  • Drilling could be very expensive

Retention pond

  • Fixed quantity
  • Large area required
  • Water temperature varies
  • Water quality checks on a regular basis

Fire protection and safety

  • Fires in data centres mostly originate from electrical sources
    • Equipment (overheating, zinc whiskers or dead shorts)
    • Electrical distribution (wiring, loose connections, sparks)
    • Light fixtures
  • Bad connections, overloading and dust are contributing factors
  • Various data centre audits have shown that a high percentage of data centres have (potential) issues with fire protection

Requirements for Data Centre fire suppression

  • Detect as early as possible
  • Safe for humans (as much as possible)
  • Environmentally friendly
  • Effective for fires in the data centre and supporting facilities
  • Cause no, or minimal, damage to sensitive equipment
  • Comply with national and building codes


  • NFPA 75 / NFPA 72
  • NFPA 2001 / ISO 14520
  • Local codes

Detection systems

  • VESDA (Very Early Smoke Detection Apparatus)
  • HSSD (Highly Sensitive Smoke Detection)
    • Works via air sampling
    • 1000 times more sensitive than standard smoke detectors
    • Care must be taken, especially during building works such as hacking, drilling, etc.

Smoke detectors for fire panels
  • Ionization detectors (more sensitive than photoelectric detectors)
    • Use a low-radiation (harmless) material (americium-241)
    • Ions create an electrical path inside the detector; when smoke enters, it disrupts this current and sets the alarm condition
  • Photoelectric detector
    • Uses a light source (i.e. LED etc.) and photocell
    • When smoke enters, deflection of light will activate photocell triggering an alarm
  • Sprinklers act as detection and activation
    • Slow, typically activates at 70 degrees Celsius or higher
    • Accidental damage will result in immediate water problems
  • Concealed sprinkler systems
    • More attractive finish / Less risk of damage
    • The cover plate drops off at 57 Celsius and the deflector drops down
    • Sprinkler head activates at 70 Celsius or higher

Fire suppression system

Wet / Dry sprinkler system considerations

  • Preferred to have dry pipe principle (pre-action)
  • Not harmful to humans
  • Environmentally friendly
  • Effective
  • Leaves water damage —— Drainage piping
  • Slow response
  • Widely used in data centres as a secondary system
  • Compulsory under most building codes for high-rise buildings

Fire suppression systems: 

  • Halon 1301: banned in most countries, depletes the ozone layer
  • Carbon Dioxide: lowest-priced, not allowed in occupied areas in most countries
  • FM200: widely used in data centres, harmful gas develops when FM200 is burned
  • Novec: cost-effective, clear gas, environmentally safe, cylinders need to be close to the hazard area, containers can be refilled on site
  • Inergen: more appropriate for larger computer rooms, needs 10 times more space compared to FM200
  • Argonite: reduces oxygen levels down to approx. 12.5%, generates loud noise
  • FE13: allows high nozzles (up to 8 metres, others 4 metres)
  • Pyrogen: aerosol extinguishing agent, no piping and no pressure cylinders required
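
The Argonite figure of roughly 12.5% oxygen gives a feel for how much inert gas ends up in the room. The sketch below uses a simple dilution model: it assumes perfect mixing and normal air at about 20.9% oxygen, and computes the share of the final room atmosphere made up of agent. It is an illustration only, not a design calculation; actual quantities are determined under standards such as NFPA 2001 / ISO 14520. The function name is hypothetical.

```python
AMBIENT_O2_PCT = 20.9  # approximate oxygen content of normal air


def inert_gas_fraction(target_o2_pct: float,
                       ambient_o2_pct: float = AMBIENT_O2_PCT) -> float:
    """Fraction of the final atmosphere that must be inert gas (simple dilution)."""
    if not 0 < target_o2_pct <= ambient_o2_pct:
        raise ValueError("target must be above 0 and at most the ambient level")
    return 1.0 - target_o2_pct / ambient_o2_pct


fraction = inert_gas_fraction(12.5)
print(f"Inert gas share of room atmosphere: {fraction:.1%}")
```

Under these assumptions about 40% of the room atmosphere is agent, which is consistent with the notes on inert gases needing far more cylinder space than FM200.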

Best practices for main fire suppression

  • Install VESDA/HSSD type of system
  • Use any of the gas-based systems as the primary fire suppression system
  • Use a pre-action sprinkler as a secondary system
  • Ensure that the room is properly sealed
  • Ensure that gas content is enough to achieve concentration levels required
  • Create extraction vents
  • Proper maintenance




Handheld extinguishers —— Class C (for Data centres)

  • Fires involving energized electrical equipment such as appliances of all kinds, motors, computers etc.
  • Extinguishers contain carbon dioxide, Halon, dry chemical or liquid extinguishing agent
  • Note: The indicator of classes can vary per country, class C can be indicated by class E

Signage and safety

  • Exit / Emergency signs to be located at: (can be designed in line with BS 5266)
    • Every escape door
    • In pathways leading to doors (arrows)
    • Must be visible from all areas in the hazard area
  • Evacuate area immediately
  • Alarm bell
  • Strobe light
  • Gas release abort - (Telephone / Intercom)
  • Gas manual release

Regulatory requirements / Best practice

  • EPO (Emergency Power Off) (to prevent the fire from spreading)
  • Auto unlock doors - Doors should use the swing-out principle (where code permits)
  • Escape routes should be clear and within the distance required by law
  • (Automatic) shutdown of air conditioners - Country dependent, check regulations
  • Integration with existing building fire panel
  • Gas manual release and gas abort buttons
  • Fulfil local fire codes