2022/05/20

[Data Centre Design] CDCP Study Notes - Data Centre (Part 13) - Data Centre Design - Operational Considerations

Service Level Management

  • The organization should maintain a service catalogue
  • The service catalogue is a key component of service delivery and should describe:
    • Unique identifier of the service
    • Description of the service should include:
      • Description of the service to be delivered
      • Service hours and exceptions
      • Availability requirements
    • Support:
      • Response times
      • Escalation and contact points
      • Cost

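Operations are an important part of the data centre life cycle, and service level management is an important aspect of operations. It requires the operator to pay attention to customers' needs and requirements. Because services are usually commercial in nature, it is also important that customers understand how to get support and give feedback on the service.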

  • An SLA (Service Level Agreement) is a legal document which provides a mechanism where the cost of non-conformity will be accounted for
  • An SLA should describe the service commitments at an appropriate level of detail
    • What is to be provided
    • What constitutes a violation of the SLA
  • An SLA should be maintained by regular reviews
  • Service levels should be monitored and reported against documented targets
  • Non-conformance should be reported, reviewed, and where appropriate, escalated

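The service level agreement (SLA) is an overarching document that generally describes the service commitments for the data centre and IT services listed in the service catalogue.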
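As a minimal sketch of "monitored and reported against documented targets", the check below compares measured availability with an SLA target. The service identifier, the 99.9% target and the 30-day reporting period are illustrative assumptions, not values from the course material.

```python
from dataclasses import dataclass


@dataclass
class SlaTarget:
    service_id: str          # unique identifier from the service catalogue (hypothetical)
    availability_pct: float  # documented availability target, e.g. 99.9


def measured_availability(period_minutes: int, downtime_minutes: float) -> float:
    """Availability over the reporting period, as a percentage."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def is_violation(target: SlaTarget, period_minutes: int, downtime_minutes: float) -> bool:
    """True when measured availability falls below the documented target."""
    return measured_availability(period_minutes, downtime_minutes) < target.availability_pct


target = SlaTarget("colo-power-01", 99.9)
month = 30 * 24 * 60  # assumed 30-day reporting period, in minutes
print(is_violation(target, month, downtime_minutes=30))
print(is_violation(target, month, downtime_minutes=60))
```

A real SLA would also state which outages count (for example, whether scheduled maintenance is excluded), which this sketch ignores.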


Safety

  • The organization should have an occupational health and safety policy
  • Policies, plans and procedures should be in place addressing emergency preparedness and response
    • Plan, act, evaluate, take corrective actions
  • Safety staff should be appointed which should have clearly defined roles, responsibilities and authorization levels, considering the following functions:
    • Risk manager
    • Safety manager
    • First Aid officer
    • Emergency warden
  • The organization should conduct (regular) safety awareness training for all staff
  • To protect individuals from exposure to workplace hazards and the risk of injury, staff should be familiar with the usage of Personal Protective Equipment (PPE), such as:
    • Ear protection
    • Safety glasses
    • Hard hat
    • Protection gloves/shoes
    • Insulated tools

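Safety comes first, so the data centre operator should have an occupational health and safety policy. Policies, plans and procedures should be established to handle emergencies. As part of safety, dedicated safety staff should be appointed. Safety is everyone's responsibility, and safety awareness training is not a one-off exercise but should be held regularly. It is also strongly recommended to give a safety induction to contractors who come on site for the first time. Beyond general safety practices, staff also need training on specific safety topics, including the use of personal protective equipment.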


Security

  • To manage entry control of individuals, a security matrix should be established using the following categories:
    • Organization staff
    • Contractors
    • Vendors/suppliers
    • Customers
    • Visitors
  • To manage entry control of incoming and outgoing goods, inspections (holding area) should take place for potential security risks as well as hazards
    • Goods (equipment), parcels, letters, etc.
  • Where applicable, security controls need to be monitored
  • Security patrol may be required for data centres where a higher level of security is required
  • All individuals working at the data centre should attend a security awareness training
  • The security awareness program should include the following:
    • Overall security policies
    • Specific security requirements of the department
    • Behavioural considerations
    • Security incident reporting structure

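To implement entry control, first classify the areas and determine which roles may enter each area. The next step is to define the roles, group them, and assign entry permissions. Access monitoring, alarm monitoring and CCTV monitoring are normally in place. Large data centre facilities that need a higher level of security may require security patrols, carried out by properly trained security guards. Everyone working in the data centre should attend security awareness training appropriate to their function. In addition, contractors and vendors working on site should receive a basic security awareness briefing so that they fully understand the data centre's security rules.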
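A security matrix can be pictured as a lookup from person category to permitted zones. The sketch below uses the five categories from the list above; the zone names and the permissions assigned to each category are hypothetical and would come from the site's own security policy.

```python
# Hypothetical zones and permissions; only the five categories come from the notes.
SECURITY_MATRIX = {
    "organization_staff": {"lobby", "office", "computer_room", "plant_room"},
    "contractors": {"lobby", "plant_room"},
    "vendors_suppliers": {"lobby", "holding_area"},
    "customers": {"lobby", "computer_room"},
    "visitors": {"lobby"},
}


def may_enter(category: str, zone: str) -> bool:
    """Default-deny lookup: unknown categories or zones are refused."""
    return zone in SECURITY_MATRIX.get(category, set())


print(may_enter("customers", "computer_room"))
print(may_enter("visitors", "computer_room"))
```

The default-deny behaviour is a design choice of the sketch: anything not explicitly listed in the matrix is refused.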


Facilities Maintenance

  • Several types of maintenance activities may take place:
    • Preventive / predictive / reactive (corrective)
  • The organization should have appropriate maintenance agreements in place and should cover the following, not limited to:
    • Legal entity name
    • Start and end date
    • Description of services provided
    • Qualifications and experience of personnel allocated
    • Commercial terms
    • Names and signatures by authorized officers

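Facilities maintenance covers the building structure and its surroundings; maintenance of the data centre infrastructure, such as mechanical, electrical and plumbing systems, is also an important part of it.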

  • A maintenance schedule should be created and maintained
  • The schedule should be published on a need to know basis
  • The organization should keep track of scheduled events and the actual date and time of execution
  • Maintenance includes:
    • Equipment
    • Cleaning
    • Labeling
    • Documentation
    • Etc.

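  Preventive maintenance - routine maintenance carried out even when it may not be needed.

  Predictive maintenance - maintenance triggered in real time, based on monitoring equipment performance and condition.

  Reactive (or corrective) maintenance - a failure has occurred and must be fixed.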

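The maintenance plan needs to be coordinated with the relevant departments. It should also comply with local regulations, including periodic mandatory inspections (such as emissions, pollution, electrical, and regular maintenance of fire protection installations).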
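The requirement to track scheduled events against their actual date and time of execution could be modelled roughly as follows. The record fields and the `overdue` helper are illustrative assumptions, not part of any particular maintenance system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class MaintenanceEvent:
    asset: str
    kind: str                            # "preventive", "predictive" or "reactive"
    scheduled: datetime                  # planned date and time
    executed: Optional[datetime] = None  # actual date and time, once done

    def delay_hours(self) -> Optional[float]:
        """Hours between scheduled and actual execution; None if not yet executed."""
        if self.executed is None:
            return None
        return (self.executed - self.scheduled).total_seconds() / 3600


def overdue(events: List[MaintenanceEvent], now: datetime) -> List[MaintenanceEvent]:
    """Events whose scheduled time has passed without a recorded execution."""
    return [e for e in events if e.executed is None and e.scheduled < now]


ups = MaintenanceEvent("UPS-A", "preventive",
                       datetime(2022, 5, 20, 9, 0), datetime(2022, 5, 20, 12, 0))
crac = MaintenanceEvent("CRAC-3", "preventive", datetime(2022, 5, 21, 9, 0))
print(ups.delay_hours())
print([e.asset for e in overdue([ups, crac], datetime(2022, 5, 22))])
```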


Governance - documentation

  • The organization should ensure that the data centre establishes a fully functional document management system, addressing the following steps:
    • Creation
    • Classification
    • Approval
      • Creator / modifier / reviewer / approver
    • Publishing
      • Online (digital) / hard copy (paper)
    • Maintenance
    • Archiving
    • Destruction

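Management should ensure that the data centre has a fully functional document management system that meets legal, statutory, regulatory, commercial and operational requirements. A data centre produces a large volume of documents, so document categories need to be defined.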
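The document life-cycle steps listed above can be sketched as an ordered set of states. Treating the steps as strictly linear is a simplifying assumption of this example; in practice a document under maintenance would normally loop back through approval and publishing.

```python
from enum import Enum


class DocState(Enum):
    CREATION = 1
    CLASSIFICATION = 2
    APPROVAL = 3
    PUBLISHING = 4
    MAINTENANCE = 5
    ARCHIVING = 6
    DESTRUCTION = 7


def next_state(state: DocState) -> DocState:
    """Advance one step in the (assumed linear) life cycle."""
    if state is DocState.DESTRUCTION:
        raise ValueError("a destroyed document has no further state")
    return DocState(state.value + 1)


print(next_state(DocState.APPROVAL).name)
```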


Governance - vendor management

  • Vendors should be selected and managed in a controlled fashion, considering the following activities:
    • Service requirements analysis
      • Technical / financial / commercial / legal
    • Request for Proposal (RFP)
    • Contract management
    • Vendor management
    • Performance (SLA) reviews
    • Retirement

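Vendors often play a key role in fulfilling the data centre's service commitments to its customers, so it is important to manage them in a controlled fashion. The first step is the service requirements analysis phase, which determines what is needed from vendors. Next, candidate vendors are identified. Once the vendors' bids have been received, the evaluation process begins and, if all goes well, ends with the signing of an agreement.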

2022/05/19

[Data Centre Design] CDCP Study Notes - Data Centre (Part 12-2) - Data Centre Design - Designing a Scalable Network Infrastructure

Cabling recommendation TIA-942-B (2017)

  • Category 6, 6A, or 8 (Category 6A or higher is recommended)
  • OM3, OM4, or OM5 (OM4 or OM5 is recommended)
  • Add MPO-16, MPO-24 and MPO-32 as options for termination of more than 2 fibres in addition to the MPO-12 connector
  • Add 75-ohm coaxial cables and connectors in ANSI/TIA-568.4-D


Testing and verifying structured cabling

  • Get 3rd party compliance proof such as UL / ETL before confirmation of using a brand and new technologies, e.g. Cat 6A, Cat 8, OM4 and OM5
  • 100% testing and 3rd party testing to verify the installation
  • Copper cable parameters to test
    • NEXT / PS-NEXT / ELFEXT/ PS-ELFEXT
    • Return loss / propagation delay / delay skew
    • ANEXT (Cat6A only, seldom test on-site)
  • Fibre testing
    • Return loss / insertion loss / link testing
    • Laser bandwidth DMD/ EMB not verifiable on site


Storage Area Networks (SAN)

  • Allows for fast, flexible and redundant network-wide data storage
  • SAN requires high-quality optic networks
    • FC-AL (Fiber Channel Arbitrated Loop)
      • Hub connects servers to storage
    • Switch fabric
      • High speed, low latency switches, preferred for enterprise-class networks
  • Most SANs use OM3/4 multimode fiber
  • Switch fabric needs lots of network points
    • Plan capacity - use structured cabling
    • Avoid point to point cabling clogging up the underfloor as it may result in poor cooling and a high risk of failure


Network redundancy

  • Network diversity
  • Network redundancy
  • Redundancy on the backbone
  • Redundancy in the data centre
    • Redundant network equipment in sub-racks
    • Redundant cable paths
    • Separate office building management networks
  • Ensure separate physical routes

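Network redundancy is relatively easy to achieve nowadays, but pay attention to whether the redundancy is physical or only logical. Some data centres still have physical single points of failure. Moreover, even in a fully physically redundant environment, the network still needs properly configured failover routing protocols, standby virtual IP addresses, and so on.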
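One way to look for physical single points of failure is to list the physical segments each cable path passes through and check for overlap. The segment names below are hypothetical; a real audit would take them from the cabling records.

```python
def shared_segments(path_a, path_b):
    """Physical segments (ducts, risers, trays, rooms) used by both paths."""
    return set(path_a) & set(path_b)


# Hypothetical routes: logically redundant, but both use the same riser.
primary = ["entrance_room_A", "riser_1", "tray_north"]
secondary = ["entrance_room_B", "riser_1", "tray_south"]

common = shared_segments(primary, secondary)
if common:
    print("Not physically diverse, shared:", sorted(common))
else:
    print("Paths are physically separate")
```

An empty result only shows that the listed segments differ; it says nothing about logical failover, which still has to be configured and tested separately.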


Building to building connectivity (1) - Telco

  • Determine capacity needed
  • Evaluate local telco capability
    • Connectivity (speed/bandwidth)
    • Uptime guarantees
    • Service
    • Pricing
  • Budget appropriately
    • Monthly subscription and/or usage fees
  • Commonly an expensive option
  • But if off-site, it is typically the only option for hardwired connections

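The first option is the local telecom provider (security is questionable). Tell the provider the locations to be connected, the required connection speed, the expected uptime and any service level requirements. Telecom providers typically charge a one-time installation fee followed by a monthly fee, so there is a recurring cost over the long term.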


Building to building connectivity (2) - Hardwire (dedicated cabling)

  • Copper/fibre cabling from building to building
  • One-time only investment
  • Time-consuming to install
  • Commonly an expensive option
  • What if you relocate

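Another option is to lay your own dedicated cabling between the buildings. Because of restrictions on public land, private cabling is usually not allowed there, so this option is mostly seen with users that own a large site, such as university campuses.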


Building to building connectivity (3) - Canopy (wireless coverage)

  • Direct connection between points or as AP (access point)
  • Range
    • > 190 km as AP
    • > 249 km as Point to Point
  • Very good transmission speed
    • Approx. 400 Mbps (PMP 450) with multi-point
    • Approx. 500 Mbps (PTP820) point-to-point
  • Security features (encryption)
  • Reasonably prone to EMF / EMP
  • One-time investment

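A wireless coverage system can span long distances. It provides good transmission but is also prone to security issues. It offers long-range transmission for a one-time investment, but needs regular maintenance.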


Building to building connectivity (4) - FSO (wireless laser)

  • Free Space Optics (FSO)
  • Direct connection between points
    • Requires line of sight
  • Advantages
    • Many vendors
    • High-bandwidth
      • 1 Gbps - 30 Gb/s
    • Good reach
      • 1.5 km - 4.4 km
    • Protocol independent
    • Quality transmission
    • Highly secure
    • License-free worldwide
    • One time investment

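A laser is used as the carrier, and two devices aimed at each other transmit the network data. The limitation is that the two devices need an unobstructed point-to-point connection; if a building or other obstacle is in the way, transmission is not possible. Because the link is point-to-point, it is relatively secure: once an attempt to intercept the signal is detected, the connection is dropped automatically. This equipment is generally a one-time investment, although some countries require an annual fee.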


Network monitoring system required capabilities

  • Flexible and versatile
  • Multi-vendor support
  • Supporting your network technologies
    • ATM / Frame Relay / MPLS VPN & TE etc.
  • Supports SNMP (Simple Network Management Protocol)
    • V1 & V2
    • SNMP V3 for secure networks
  • Support for RMON
  • Automated root cause analysis
  • Notification capabilities
  • Reporting capabilities
    • Offline / Online

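Data centre operators are advised to buy a flexible, versatile analysis system with broad multi-vendor support. Such systems generally support SNMP v1 and v2, but the v3 standard, which allows secure monitoring, is not always supported. The system should support RMON, which can analyse network connections in detail and provides functions such as traffic shaping. RMON can automatically analyse all hardware connections to find the root cause of a problem. The system should notify operations and management staff regularly so that they stay informed of the current state of the network, and it can generate offline and real-time reports as needed.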


2022/05/18

[Data Centre Design] CDCP Study Notes - Data Centre (Part 12-1) - Data Centre Design - Designing a Scalable Network Infrastructure

Structured cabling system

  • Network cabling is the foundation to support a high-availability data centre, IT equipment and its applications 
  • Proven products and contractors are crucial for the proper design, installations works and maintenance of a cabling infrastructure; it will reduce downtime and improve
    • Operational efficiency
    • Manageability
    • Reliability
    • Availability

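The ICT network is the ultimate purpose of building the data centre facility, and the network cabling system is the foundation for all ICT equipment. Its product quality and design directly determine the availability of the data centre.

The products and new technologies used in the data centre should preferably be tracked and recorded on a regular basis. Similar records of industry capability and service quality should be kept for service and product contractors (whitelist and blacklist).

Therefore, correctly designed and installed network cabling reduces data centre downtime, improves operational and management efficiency, and raises the reliability and availability of the data centre.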


Consider the following in planning the cabling infrastructure of the data centre:

  • Structured cabling (more expensive, done in one go) vs On-demand cabling (added incrementally)
  • Current requirements/ future growth
  • Which media (fibre, Cat6, Cat6A etc.)
  • Apparatus (i.e. type of patch panels etc.)
  • ANSI/TIA-942 data centre standard cabling requirements
  • Brand name preference/ global specification / global pricing


Structured cabling

  • Reduce the risk of downtime
  • Easy re-patching
  • Easy fault finding
  • Better cooling
  • Standardized length - easy stock


Copper cables

  • Unshielded / Shielded
  • Solid cables
    • Less insertion loss
    • Best used for permanent links
    • Length maximum 90 metres
  • Flexible / Stranded cables (patch cords)
    • Flexible
    • Higher losses
    • Best used for patch panel / short distances
    • Maximum length < 10 metres
  • Total channel (Solid + Flexible) = 100m
  • Cat 5E = 100 Mb/s, 1 Gb/s
  • Cat 6 = 1 Gb/s or Cat 6A = 10 Gb/s
  • Cat 7 or 7A = 10 Gb/s (Non-RJ45)
  • Cat 8 = 40 Gb/s (2 GHz, 30 metres, shielded cabling, 2-connection channel ONLY, designed with data centre focus)
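The length limits above (90 m solid permanent link, about 10 m of patch cords, 100 m total channel) can be checked with a small helper. This is a sketch for the 100 m channel categories only; it does not model the shorter 30 m Cat 8 channel, and treating exactly 10 m of patch cord as acceptable is an assumption made so that 90 m + 10 m adds up to the 100 m channel.

```python
MAX_PERMANENT_LINK_M = 90   # solid cable
MAX_PATCH_TOTAL_M = 10      # stranded patch cords, both ends combined
MAX_CHANNEL_M = 100         # solid + stranded


def check_channel(permanent_link_m, patch_cords_m):
    """Return a list of limit violations for one copper channel (empty if none)."""
    problems = []
    patch_total = sum(patch_cords_m)
    if permanent_link_m > MAX_PERMANENT_LINK_M:
        problems.append("permanent link exceeds 90 m")
    if patch_total > MAX_PATCH_TOTAL_M:
        problems.append("patch cords exceed 10 m in total")
    if permanent_link_m + patch_total > MAX_CHANNEL_M:
        problems.append("channel exceeds 100 m")
    return problems


print(check_channel(85, [3, 3]))
print(check_channel(92, [5, 5]))
```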


Copper termination / patch panels

  • Structured cabling patch panels terminate cable links
    • Flat panel
    • Angled panel
  • Use the correct patch cables (RJ45)
    • Patch cable should always be the same or class higher than structured cable
    • Match shielded or unshielded twisted pair types
    • No on-site termination


Fibre cables

  • Not prone to EMF
  • Distance longer than copper
  • Lightweight and smaller than copper
  • 62.5/125 µm and 50/125 µm (multi-mode)
  • 8.3/125 µm (single-mode)
    • The first number represents the diameter of the core
    • The second number represents the size of the cladding
    • Both values are in microns
  • The actual diameter for a fibre 'cable' is much bigger because of protection over the cladding

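Fibre has many advantages. Because it is not susceptible to electromagnetic fields, it can be installed almost anywhere, even near power sources. Compared with copper cable, fibre is lightweight and can be run over much longer distances.

The size of a fibre is expressed with two numbers. The first number (before the slash) is the diameter of the core, and the second number (after the slash) is the diameter of the cladding.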
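The core/cladding notation is easy to parse mechanically. In the sketch below, the 20 µm cut-off used to guess the fibre type is an arbitrary illustrative value chosen to sit between the 8.3 µm single-mode core and the 50 µm and 62.5 µm multi-mode cores in these notes; it is not taken from any standard.

```python
def parse_fibre_size(designation: str):
    """Split a designation such as '62.5/125' into (core, cladding) in microns."""
    core, cladding = designation.split("/")
    return float(core), float(cladding)


def likely_mode(core_um: float, cutoff_um: float = 20.0) -> str:
    """Rough guess at fibre type from the core diameter (illustrative cut-off)."""
    return "single-mode" if core_um < cutoff_um else "multi-mode"


for size in ("62.5/125", "50/125", "8.3/125"):
    core, cladding = parse_fibre_size(size)
    print(size, "core", core, "cladding", cladding, likely_mode(core))
```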


Single-mode vs Multi-mode

  • Single-mode
    • Laser light source
    • Longer distance (several km)
    • Used in InfiniBand, carrier circuits, campus environments and other specialized applications
    • Standards: ITU G.652; OS1, OS2
  • Multi-mode
    • LED (100 Mb/s, 1 Gb/s), Laser light source (10 Gb/s and above)
    • Cable is more expensive than single-mode, but the equipment itself is cheaper
    • Standards: ISO 11801; OM1, OM2 for speed up to 1Gbps, OM3 and OM4 for 10Gbps and up

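Single-mode fibre uses a laser light source; the drawback is that it usually carries only one frequency. As a result, it can generally span very long distances, up to several kilometres.

The core of multi-mode fibre is much larger than that of single-mode fibre, typically 50-100 microns. It tolerates a wider range of light sources (they do not need to be very precise), so the average cost is lower, and it suits short-distance transmission at speeds up to 1 Gbps.

The ISO 11801 standard classifies OM1, OM2, OM3 and OM4. OM3 and OM4 additionally specify the fibre's bandwidth, the EMB (Effective Modal Bandwidth), in MHz·km, as shown in the figure below:

In a data centre, the longest typical transmission distance is around a few hundred metres, so multi-mode fibre is used in most cases. Single-mode fibre is used more in long-distance applications such as telecom, and comparatively rarely inside data centres.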


Fibre terminations/patch panels

  • SC largely replaced by:
    • LC or mini LC
  • MPO now used for 40 Gb/s and higher speed links
  • Handle fibre with care, ensure connectors are always cleaned before mating
  • Don't exceed the bend radius for the cable

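Mating fibre connectors requires clean fibre end faces, so make sure they are cleaned before mating to prevent performance problems afterwards.

Because bending changes the angle at which light reflects inside the fibre, the bend radius has a large effect on fibre performance; observe the cable's bend radius requirements during installation.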


TIA-942 network cable / Rating levels illustration

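In the figure above, Rated I/II/III/IV indicate the redundancy requirements for cabling, components and spaces.
  • Rated I requires no redundant components and is relatively simple.
  • Rated II requires at least two incoming lines on independent routes, with at least 20 metres of separation between the routes.
  • Rated III requires at least two independent main entrance rooms, at least 20 metres apart and in separate fire compartments.
  • Rated IV requires at least two independent distribution rooms, which provides extra redundancy at the cost of routing that becomes very complex and hard to manage.

The redundancy arrangement for incoming network lines is very similar to the Rated 1/2/3/4 arrangement for the power system, going from backup of the main feed, to backup of distribution, to end-of-line switching just before the equipment, so the two can be studied by analogy.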


2022/05/17

[Data Centre Design] CDCP Study Notes - Data Centre (Part 11) - Data Centre Design - Physical Security & Safety and Auxiliary Systems

Questions need to be addressed:

  • What are you trying to protect?
  • What are the threats?
  • What is the probability of the danger occurring?
  • What is the impact?
  • What is the risk identified?
    • Risk = impact * probability (risk assessment)
  • Does the risk identified require a control?
  • How does one design, build, implement (and subsequently during operations, monitor, review and improve) the control?
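The Risk = impact * probability relation can be applied to a small risk register to rank threats and decide which ones need a control. The threats, the 1-5 scoring scales and the control threshold of 10 below are all hypothetical values for illustration.

```python
def risk_score(impact: int, probability: int) -> int:
    """Risk = impact * probability, both scored on an assumed 1-5 scale."""
    return impact * probability


# Hypothetical register: threat -> (impact, probability)
threats = {
    "flooding": (5, 1),
    "unauthorised_entry": (4, 3),
    "power_outage": (5, 2),
}

CONTROL_THRESHOLD = 10  # assumed risk appetite: scores at or above this need a control

ranked = sorted(threats, key=lambda t: risk_score(*threats[t]), reverse=True)
needs_control = [t for t in ranked if risk_score(*threats[t]) >= CONTROL_THRESHOLD]

for name in ranked:
    print(name, risk_score(*threats[name]))
```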


Physical security considerations

  • Perimeter protection and intrusion detection 
    • Fencing / Walls
      • Barbed wire facing outwards, slanted 45 degrees
    • Security control room/guard house
    • Boom barriers
    • Concrete bollards
    • CCTV(Closed-Circuit Television) cameras
    • Security zones
    • Access control (badge) system
    • Security awareness posters


CCTV cameras

  • CCTV
    • Recording on the hard disk
    • Motion detection cameras
    • Night vision camera
    • Event/face recognition
  • Install cameras in such a way that the areas they monitor overlap, to prevent black spots / blind areas
  • Cameras and recorder must be on UPS
  • Recorder to be located in a secure area
    • Avoid placement inside the computer room
  • Copy of hard disk should be stored off-site or in another remote area of the building


Entry control

  • Areas can be secured in various ways
    • Revolving doors
    • Mantraps
    • Turnstiles
    • Cages —— No impact on cooling/fire suppression system, ensure that cages are installed slab-to-slab (subfloor of area up to the ceiling).
  • Door locks
    • Key lock —— Proper key management procedures
    • Electronic locks —— Card reader / security code / biometrics (fingerprint, iris scan etc.)


Physical safety considerations

  • Signage (regulatory and additional), using indicators for:
    • Location of fire extinguishers
    • Location of first-aid kit
    • Emergency numbers and contacts
  • Escape routes at each door
  • Emergency response plans
  • Safety awareness posters
    • Cardiopulmonary Resuscitation (CPR)
    • General safety practices


Monitoring system —— Data Centre monitoring requirements

  • To have the ability to see at a glance everything is in a normal state
  • To have peace of mind that should an alarm condition occur, the relevant personnel will be informed (24x7 informing relevant staff)
  • Have centralized monitoring capabilities that integrate with current monitoring software
  • Keep a history of alarms and trending data for analysis (provide detailed reports)


EMS/BMS

  • EMS —— Environmental Monitoring System
    • Monitors only
    • Most of the time low-level monitoring only (i.e. dry/alarm contacts)
    • Relatively inexpensive
    • Limited alarm contact inputs and limited notification capabilities
  • BMS —— Building Management System
    • Monitors and controls
    • Provides High-level monitoring (i.e. full parameter monitoring)
    • Relatively expensive
    • More detailed level compared to an EMS
  • Either system fits a certain purpose
    • The purpose is that abnormalities are noticed early so that actions can be taken to avoid disasters


DCIM —— Data Centre Infrastructure Management

  • DCIM integrates information technology and facility management disciplines to centralize monitoring, management and intelligent capacity planning of a data centre's critical systems
  • Many variations exist, with most DCIM solutions focusing on:
    • Asset management
    • Power monitoring
    • Environmental monitoring
    • Capacity planning
    • Change management

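Many vendors now offer DCIM, but each case has to be assessed on its own merits. First establish what additional functions the DCIM provides; in particular, if the data centre's automation systems already cover the need well, weigh up again whether an additional DCIM system is really necessary.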


Water leak detection

  • Pad based
    • Covers certain areas only
    • Inexpensive
    • for smaller data centres only
  • Cable based
    • Placed under the raised floor along the perimeter and pipes
    • Detection only or distance monitoring
    • Need to keep cable clean
    • for large data centres


Notification system for monitoring system

  • Should be able to alert persons and groups relevant to the alarm detected
  • Should be able to have various thresholds, both severity-based and timing-based with corresponding alerts
  • Alerts to be communicated by
    • E-mail
    • SNMP(Simple Network Management Protocol)
    • SMS (Short Message Service)
    • Audible alarm
    • Voice dialling
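Severity-based and timing-based thresholds can be combined as in the sketch below: a critical reading alerts immediately, while a warning-level reading alerts only after it has persisted for a hold time. The temperature thresholds, the 60-second sampling interval, the 5-minute hold time and the channel mapping are assumptions for illustration.

```python
from typing import List, Optional

# Hypothetical mapping of severity to the notification channels listed above.
CHANNELS = {
    "warning": ["e-mail"],
    "critical": ["e-mail", "SMS", "audible alarm"],
}


def severity(temp_c: float, warn_c: float = 27.0, crit_c: float = 32.0) -> str:
    """Severity-based threshold on a single reading (assumed limits)."""
    if temp_c >= crit_c:
        return "critical"
    if temp_c >= warn_c:
        return "warning"
    return "normal"


def alert_level(readings: List[float], interval_s: int = 60, hold_s: int = 300) -> Optional[str]:
    """Timing-based rule on top of severity.

    Critical alerts at once; warning alerts only when the most recent
    readings have been abnormal for at least hold_s seconds.
    """
    if not readings:
        return None
    if severity(readings[-1]) == "critical":
        return "critical"
    trailing = 0
    for value in reversed(readings):
        if severity(value) == "normal":
            break
        trailing += 1
    if trailing * interval_s >= hold_s:
        return "warning"
    return None


level = alert_level([26.0, 28.0, 28.5, 28.0, 28.2, 28.1])
print(level, CHANNELS.get(level, []))
```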


What to monitor?

  • Monitor at least (basic)
    • Temperature / Humidity in various zones
    • UPS status
    • Water leakage
    • Fire suppression
    • Air conditioning
    • Standby generator set
  • Nice to have (advanced)
    • Breaker / PDU level monitoring
    • Rack door open / close
    • Power, temperature, humidity inside the rack
    • Pressure / airflow


[Data Centre Design] CDCP Study Notes - Data Centre (Part 10) - Data Centre Design - Water Supply & Fire Protection

Water supply in a Data Centre

  • Fire suppression
  • Sanitary facilities
  • Makeup water for cooling
  • Cleaning
  • General facilities maintenance (gardens etc.)


Backup water supply: Water storage tanks

  • for the fire suppression installation
  • for makeup water for cooling towers
  • for chilled water backup (buffer storage)
  • for off-peak chilled water storage (thermal storage)


Backup water supply: Well water / Retention pond

Well water

  • Cleanliness
  • Supply capability
  • Drilling could be very expensive

Retention pond

  • Fixed quantity
  • Large area required
  • Water temperature varies
  • Water quality checks on a regular basis


Fire protection and safety

  • Data centre fires typically originate from electrical sources
    • Equipment (overheating, zinc whiskers or dead shorts)
    • Electrical distribution (wiring, loose connections, sparks)
    • Light fixtures
  • Bad connections, overloading and dust are contributing factors
  • During various data centre audits it is proven that a high percentage of data centres have (potential) issues with fire protection


Requirements for Data Centre fire suppression

  • Detect as early as possible
  • Safe for humans (as much as possible)
  • Environmentally friendly
  • Effective for fires in the data centre and supporting facilities
  • Do not, or minimize, damage to sensitive equipment
  • Comply with national and building code


Standards

  • NFPA 75 / NFPA 72
  • NFPA 2001 / ISO 14520
  • Local codes


Detection systems

  • VESDA (Very Early Smoke Detection Apparatus)
  • HSSD (Highly Sensitive Smoke Detection)
    • Works via air sampling
    • 1000 times more sensitive than standard smoke detectors
    • Care must be taken, especially during building works such as hacking, drilling, etc.


Smoke detectors for fire panels
  • Ionization detectors (more sensitive than photoelectric detectors)
    • Uses low radiation (harmless) material (americium-241)
    • When smoke enters the detector the ions create an electrical path setting alarm condition
  • Photoelectric detector
    • Uses a light source (i.e. LED etc.) and photocell
    • When smoke enters, deflection of light will activate photocell triggering an alarm
  • Sprinklers act as detection and activation
    • Slow, typically activates at 70 degrees Celsius or higher
    • accidental damage will result in immediate water problems
  • Concealed sprinkler systems
    • More attractive finish / Less risk of damage
    • The cover plate drops off at 57 Celsius and the deflector drops down
    • Sprinkler head activates at 70 Celsius or higher

Fire suppression system

Wet / Dry sprinkler system considerations

  • Preferred to have dry pipe principle (pre-action)
  • Not harmful to humans
  • Environmentally friendly
  • Effective
  • Leaves water damage —— Drainage piping
  • Slow response
  • Widely used in data centres as a secondary system
  • Compulsory under most building codes for high-rise buildings


Fire suppression systems: 

  • Halon 1301 —— banned in most countries, depletes the ozone layer
  • Carbon Dioxide —— lowest-priced, not allowed in most countries in occupied areas
  • FM200 —— widely used in data centres, harmful gas will develop when FM200 is burned
  • Novec —— cost-effective, clear gas, environmentally safe, cylinders need to be close to the hazard area, containers can be refilled on site
  • Inergen —— more appropriate for the larger computer rooms, 10 times more space compared to FM200
  • Argonite —— reduces oxygen levels down to approx. 12.5%, loud noise generated
  • FE13 —— allows high nozzles (up to 8 metres, other 4 metres)
  • Pyrogen - aerosol extinguishing agent, no piping and no pressure cylinders required


Best practices for main fire suppression

  • Install VESDA/HSSD type of system
  • Use any of the gas-based systems as the primary fire suppression system
  • Use a pre-action sprinkler as a secondary system
  • Ensure that the room is properly sealed
  • Ensure that gas content is enough to achieve concentration levels required
  • Create extraction vents
  • Proper maintenance

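Ideally, a fire should of course be detected as early as possible, so a fast-response system such as VESDA is the most suitable. A gas-based suppression system serves as the primary system and causes almost no damage to data centre equipment. If required by regulation, a water-based system can be configured as the backup system.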

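The room needs to be sealed so that, once the suppression gas is released, it is contained within the room. Note that the gas may also leak through openings in cable trays. The quantity of gas has to be calculated carefully: it must reach a required concentration, so there can be neither too little nor too much.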
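As a rough illustration of why the quantity depends on room volume and target concentration, the sketch below uses the total-flooding relation commonly given for halocarbon clean agents, W = (V / S) × C / (100 − C), where V is the protected volume, S the agent's specific vapour volume at the design temperature and C the design concentration in percent. Treat the formula and all input values here as assumptions for illustration; actual sizing must follow the applicable standard (for example NFPA 2001 or ISO 14520) and the agent manufacturer's data.

```python
def agent_mass_kg(room_volume_m3: float,
                  specific_vapour_volume_m3_per_kg: float,
                  design_conc_pct: float) -> float:
    """Estimated agent mass for total flooding: W = (V / S) * C / (100 - C)."""
    if not 0 < design_conc_pct < 100:
        raise ValueError("design concentration must be between 0 and 100 percent")
    if specific_vapour_volume_m3_per_kg <= 0:
        raise ValueError("specific vapour volume must be positive")
    return (room_volume_m3 / specific_vapour_volume_m3_per_kg) * (
        design_conc_pct / (100.0 - design_conc_pct)
    )


# Hypothetical inputs: 200 m3 room, S = 0.137 m3/kg, 7% design concentration.
print(round(agent_mass_kg(200.0, 0.137, 7.0), 1), "kg")
```

Because the formula assumes a sealed volume, any leakage path (such as the cable tray openings mentioned above) lowers the concentration actually achieved.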

In addition, an extraction system is needed so that the gas can be drawn out of the room after it has been discharged.


Handheld extinguishers —— Class C (for Data centres)

  • Fires involving energized electrical equipment such as appliances of all kinds, motors, computers etc.
  • Extinguishers contain carbon dioxide, Halon, dry chemical or liquid extinguishing agent
  • Note: The indicator of classes can vary per country, class C can be indicated by class E


Signage and safety

  • Exit / Emergency signs to be located at: (can be designed according to BS 5266)
    • Every escape door
    • In pathways leading to doors (arrows)
    • Must be visible from all areas in the hazard area
  • Evacuate area immediately
  • Alarm bell
  • Strobe light
  • Gas release abort - (Telephone / Intercom)
  • Gas manual release


Regulatory requirements / Best practice

  • EPO (Emergency Power Off) (to prevent the fire from spreading)
  • Auto unlock doors - Doors should use the swing-out principle (code permitted)
  • Escape routes should be clear and within the distance required by law
  • (Automatic) shutdown of air conditioners - Country dependent, check regulation
  • Integration with existing building fire panel
  • Gas manual release and gas abort buttons
  • Fulfil local fire-codes


2022/05/15

[Data Centre Design] CDCP Study Notes - Data Centre (Part 9-3) - Data Centre Design - Cooling Infrastructure

Up Flow or Down Flow

  • Up Flow
    • Can be installed with or without a raised floor
    • Limited airflow guidance (need to use ducting)
  • Down Flow
    • Can 'only' be used with raised floor
    • Allows for air flow guidance through a raised floor

Wall flow
  • A new trend is a wall-flow concept
  • CRAHs are mounted in the service corridor and are blowing the air horizontal into the data centre
  • Racks are positioned directly on the slab
  • Racks can be taller and heavier
  • Return air is ducted to the service corridor
  • No need for the raised floor

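The CRAC units are installed in the service corridor, so no raised floor is needed. To avoid mixing hot and cold air, make sure the hot air is led back to the service corridor through ducts or through a suspended ceiling.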


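Raised-floor cooling or non-raised-floor cooling: which is better?

  • There is no single answer as to which option is best; it depends on the following factors:
    • Type of ICT equipment
    • Slab-to-slab height
    • Required cooling capacity
    • Flexibility
  • Raised floors are no longer the trend, because not every building was designed with a raised floor in mind.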


Cooling Concepts (1) —— with Raised Floor

  • Hot- and Cold-Aisle setup
    • Racks are placed Front-to-Front and Back-to-Back
    • Cold- and Hot-Air areas are separated
    • Some hot air will still flow into the cold air areas

  • A suspended ceiling can aid in guiding the hot air back to the air-conditioner without mixing with the cold-air areas
  • Increased efficiency


  • Air conditioners should be placed perpendicular to the hot aisle
    • Allow fast hot air to return
    • Allow more evenly equalised air pressure under the raised floor


Placement of equipment in the rack 

  • With traditional downflow cooling, the bottom of the rack will be colder than the top
    • Put high heat load approx. 30 cm above raised floor and not above half the rack height
    • Higher possible if you have enough CFM


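In a raised-floor setup, more cold air is supplied at the bottom of the rack, so place the equipment with the highest heat load as low as possible. Towards the top of the rack less cold air is available, so cooling capacity is lower; place equipment with lower heat loads in the upper part.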


Avoid leakage and short circuit air

  • Air leakage from the floor - Use grommets
  • Air leakage within the racks (back-wash) leads to inefficiencies - Use blanking panels


The maximum throw of the air conditioner

  • Consider the maximum throw of your air conditioner (if using perimeter cooling)
    • Typically 12 - 18m
  • Too long rooms require placement of perimeter air conditioners on both sides



Cooling Concepts (2) —— with Non-Raised Floor

In-Row cooling


  • In-row cooling can be deployed when using a non-raised floor setup
  • Cooling close to the heat load leads to good efficiencies for airflow
  • Fewer racks per sqm inside the computer room
  • In-row cooling units are
    • Direct expansion
    • Chilled water
    • Fluid cooled


Overhead duct

  • Overhead ducted cooling dumps cold air directly in front of the racks and extracts the hot air from the back
  • Ducts often have louvres/vents to regulate CFM/CMH
  • Ducts must be well designed
    • Ensure enough air volume can be dumped and extracted at the right locations
    • Air conditioner redundancy must be taken into account
  • Do not paint the ducts as paint might splinter over time causing particulates to contaminate the room
  • Inspect and clean ducts on a regular basis


Cooling Concepts (3) —— Supplemental Cooling

High-Density Cooling: Floor mount



  • Collects cool air from underfloor
  • Increases CFM to the rack
  • Flexible, Movable
  • Neighbouring racks could have a potential cooling impact


High-Density Cooling: Hot air fans

  • Collect hot air at the point of generation
  • Route directly to CRAC
  • Snap-on retrofit
  • Variable speed: as needed
  • Flexible, movable
  • Only assist with removal of heat
  • Does not assist in increasing the cold air supply


High-Density Cooling: Overhead supplement air

  • Traditional CRAC / HVAC downflow/throw units with raised floor principle need to be extensively extended
  • Ducted return to air conditioner direct from the rack
  • Top flow rack units in addition to raised floor cooling
  • Limit 10-25 kW


High-Density Cooling: Self-contained racks

  • Fully ducted supply and return, local rack cooler
  • Collect equipment exhaust air
  • Provide cooling coils in the rack
  • Specialized rack
  • Heat removal by water or refrigerant
  • Flexible / movable
  • Fire suppression considerations
  • Limit: 18-35 kW per rack
  • Cooling principles: water/dielectric/Liquid CO2/Refrigerant


Cold Aisle vs Hot Aisle Containment?



Cold Aisle Containment

  • The cold aisle area is contained
  • Cold air only goes where it needs to be, being the air-intake of the equipment
  • Air volume required for a cold aisle can be calculated by reviewing the CFM requirements of all the equipment inside the contained area
  • Typically recommended when containment is applied to the entire room
  • It is required that proper redundancy measures are taken as very little buffer air might be available in case of an air conditioner failure
  • Slight overpressure needs to be created
    • Ideally, air pressure is measured in the cold aisle area and connected to air conditioners to regulate the CFM/ CMH output
  • Be aware that CFM / CMH is variable in most of today's equipment
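The air volume for a contained cold aisle can be estimated by summing the airflow requirements of the equipment inside it. Where a device's CFM is not published, the sketch falls back on the common sensible-heat approximation for air near sea level, CFM ≈ 3.16 × W / ΔT(°F); that approximation, the equipment figures and the 20 °F temperature rise are assumptions for illustration.

```python
CFM_TO_CMH = 1.699  # 1 cubic foot per minute is about 1.699 cubic metres per hour


def estimate_cfm(power_w: float, delta_t_f: float) -> float:
    """Approximate airflow needed to remove power_w watts at a delta_t_f (°F) rise."""
    return 3.16 * power_w / delta_t_f


def required_cfm(equipment_cfm) -> float:
    """Total airflow for the contained aisle: sum of per-device requirements."""
    return sum(equipment_cfm)


# Hypothetical aisle: two devices with published CFM, one estimated from 4 kW load.
devices = [120.0, 250.0, estimate_cfm(4000.0, 20.0)]
total = required_cfm(devices)
print(round(total), "CFM =", round(total * CFM_TO_CMH), "CMH")
```

Since fan speeds in most current equipment vary with load and inlet temperature, a figure like this is a sizing starting point rather than a fixed requirement.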


Hot Aisle Containment

  • The hot aisle area is contained
  • Hot air is separated from the cold air and is guided directly back to the air conditioner or outside of the building
  • Cold air is flowing throughout the entire computer room
  • Typically recommended when only a small area needs to be contained
  • Cold air is going through the entire room 
    • Cold air should only go where it is required being the intake of the equipment
  • The whole room is now acting as a buffer area
  • Works well with in-row cooling
  • Hot aisle areas can sometimes be very hot and noisy 
    • Review local regulations for personnel working in such environments


[Data Centre Design] CDCP Study Notes - Data Centre (Part 9-2) - Data Centre Design - Cooling Infrastructure

Types of Air Conditioning:

Air-cooled, Self-contained

  • Based on refrigerant, condenser & compressor located in the outdoor unit
  • Pros:
    • Low cost
    • Low installation cost
    • Many choices are available
    • Easy to maintain
  • Cons:
    • Needs a heat exhaust pipe
    • Low sensible cooling capacity
    • Classified as a Comfort air conditioner
    • High operating cost
    • Not recommended for data centres

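It offers good adjustability, but it is not a precision air conditioner; temperature and humidity control fluctuates too much, so it is not suitable for data centres.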


Air-cooled, Split system (DX)

  • Based on refrigerant, the condenser outside
  • Often called 'DX' (Direct Expansion)
  • Pros:
    • 'Low' purchase cost
    • Many choices available
    • Easy to maintain
    • Easy to expand Air-conditioner
  • Cons:
    • Limitation to the length of pipe run and height difference between indoor and outdoor unit
    • Restrictions in certain buildings/countries
    • Each CRAC must have its own condenser

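Commonly called a CRAC unit, it is refrigerant-based. The evaporator sits inside the data centre and is piped to the condenser unit outside the building.

Some building codes do not allow condenser units to be placed on the outside of the building. Another issue is the limit on pipe length.

The system is easy to expand: capacity can be added by adding indoor and outdoor units.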


Fluid-cooled (Water-Glycol)

  • Based on the fluid, a complete refrigerant circuit in the CRAC, dry cooler outside
  • Pros:
    • Longer pipe runs are possible
    • Multiple CRACs can use the same dry cooler
    • Redundancy must be well planned
    • Built-in free cooling option
  • Cons:
    • Not easy to maintain (volume and quality of Glycol)
    • Higher cost than air-cooled
    • Not allowed in specific buildings
      • Glycol is only used in cold climates to avoid freezing

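Another system uses a mixture of water with (ethylene) glycol, similar to the antifreeze used in cars.

Because pumps are used, the pipe runs can be longer. It also means there are more components to maintain and more that can fail. One condenser unit can serve multiple CRACs in the data centre, which is an advantage when space is limited. However, connecting multiple CRACs to one condenser unit creates a single point of failure, so a redundancy scheme must be considered.

Overall, from a maintenance point of view this solution is more complex, and the overall running cost is higher.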


Chilled water

  • Based on water, the central refrigerant circuit is within the chiller
  • Pros:
    • Longer pipe runs are possible
    • No refrigerant in the data centre
    • Simple air-handling units in the data centre
    • Big free cooling potential in cold areas
    • Multiple CRAHs using the same chiller plants
  • Cons:
    • Very high initial cost
    • Risk of chiller failure (redundancy)
    • Typically only for large computer rooms

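Chilled-water cooling systems are very efficient and are better suited to data centres with a large cooling capacity requirement. As a result, chilled water is usually only viable for larger computer room setups.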

  • Chillers are located outdoors (air-cooled) or indoors (water-cooled)
  • Built as a central chiller plant linked to the CRAHs
  • Only water/glycol is used as a cooling medium in the piping system
  • High energy efficiency due to hybrid-mode and free cooling
  • System design must be well planned:
    • Acoustic
    • Central hydraulic pipework
    • Buffer tanks
    • Pumps
    • Redundancy etc.

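Chiller plants are mainly either air-cooled or water-cooled. A central chiller plant can supply cooling to the entire building. Redundancy must cover backup of both the electrical system and the chiller pipework, so that the facility is concurrently maintainable.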


Direct/indirect Air Handler

  • Located outdoors, using air cooling
  • mounted on the side / on the top of the building
  • supplying outdoor air directly to the data centre
  • Indirect air handlers
    • re-cooling the air over an air-to-air heat exchanger
    • pre-cooling of the outside air with an adiabatic system is possible
  • Direct air handlers 
    • appropriate filtering of the outdoor air needs to take place to avoid potential dust and gaseous contamination. 
  • Pros:
    • Energy efficient in cold / medium regions
    • lowest possible Power Usage Effectiveness (PUE)
    • The unit is located outside, with more white space
    • Downsizing of the electrical infrastructure
    • Fast return on investment
  • Cons:
    • High water usage (can be limited by the authorities)
    • Direct air handlers can pollute the data centre
    • Risk of legionella
    • A redundant water source is needed
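PUE, mentioned in the pros above, is the ratio of total facility power to the power delivered to IT equipment, so a value closer to 1.0 means less overhead for cooling and power distribution. The figures in this small example are hypothetical.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


# Hypothetical comparison for the same 1,000 kW IT load.
print(pue(1800.0, 1000.0))  # facility with mechanical cooling overhead
print(pue(1200.0, 1000.0))  # facility using mostly free cooling
```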