Data is a collection of raw facts from which conclusions can be drawn.
Letters, photographs, movies, word documents, etc. are all examples of data.
What are the two categories of data?
Structured and Unstructured
Describe Structured Data
Structured data is organized in rows and columns in a rigidly defined format so that applications can retrieve and process it efficiently.
Describe Unstructured Data
Data is unstructured if its elements cannot be stored in rows and columns, and is therefore difficult to query and retrieve by business applications.
Examples of unstructured data are images, PDFs, documents, audio / video, email attachments, x-rays, etc.
Information is the intelligence and knowledge derived from data.
Examples of intelligence could be the buying habits of customers and the health histories of patients.
What is the value of information to a business?
Identifying new business opportunities.
Identifying patterns that lead to changes in existing business.
Creating a competitive advantage.
How is the type of storage to be used determined?
The type of storage used is based on the type of data and the rate at which it is created and used.
Redundant Array of Independent Disks. RAID is used in all storage architectures such as DAS, SAN and so on.
Direct Attached Storage. Connects directly to the server (host) or a group of servers in a cluster. Storage can be either internal or external to the server. External DAS alleviates the challenges of limited internal storage capacity.
Storage Area Network. This is a dedicated, high-performance Fibre Channel (FC) network that facilitates block-level communication between servers and storage. Storage is partitioned and assigned to a server for accessing its data.
What are the benefits of SAN?
SAN offers scalability, availability, performance and cost benefits compared to DAS.
Network Attached Storage. Dedicated storage for file-serving applications. Connects to an existing communication network (LAN) and provides file access to heterogeneous clients.
What are the benefits of NAS?
NAS offers higher availability, scalability, performance and cost benefits compared to general purpose file servers.
What is IP SAN?
Internet Protocol Storage Area Network. One of the latest evolutions in storage architecture, IP SAN is a convergence of technologies used in SAN and NAS. It provides block-level communication across a LAN or WAN, resulting in greater consolidation and availability of data.
What are the five core elements of Data Center Infrastructure?
Application / User Interface
Database (More commonly referred to as a Database Management System)
Server and Operating System
Network
Storage Array
What are the seven key requirements for data center elements?
Availability
Security
Scalability
Performance
Data Integrity
Capacity
Manageability
What are the four activities within the Information Life Cycle Management Process?
Classifying Data and Applications
Implementing Policies
Managing the Environment
Organizing Storage Resources
**Classifying Data is the most difficult activity in the process**
What are the benefits of implementing Information Life Cycle Management?
Simplified Backup and Recovery
Lower Cost of Total Ownership
What are the three most basic components of a storage system environment?
Host
Connectivity
Storage
What are the physical components of a host?
Central Processing Unit (CPU)
Storage (Memory and Disks)
Input / Output (I/O) Device
What are the three methods of communication between I/O devices and the host?
User to Host (Keyboard, Mouse, etc.)
Host to Host (via Network Interface Card)
Host to Storage Device (via Host Bus Adapter)
**Note: Host Bus Adapters interface on the back end**
What are the logical components of a host?
Application
Operating System
File System
Volume Manager
Device Drivers
What are the two application data access classifications?
Block Level (Data stored and retrieved in Blocks specifying the LBA)
File Level (Data stored and retrieved by specifying the name and path of the files)
A defined format for communication between sending and receiving devices.
What are the three major communication protocols for system components?
Tightly Connected Entities
Directly Attached Entities
Network Connected Entities
Give three storage media options
Magnetic Tape
Optical Disk
Disk Drive
What are the key components of a disk drive?
Platter
Spindle
Read / Write Head
Actuator Arm Assembly
**All of these items are housed in the Head Disk Assembly**
What are the two ways of accessing data on a platter?
Cylinder, Head, Sector (CHS)
Logical Block Addressing (LBA)
What are the things that affect disk drive performance?
Disk Service Time
What are the components that comprise service time?
Seek Time
Rotational Latency
Data Transfer Rate
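As a rough worked example, disk service time is often estimated as seek time + rotational latency + data transfer time, where average rotational latency is the time for half a revolution. The sketch below assumes that simple additive model; the 5 ms seek, 15,000 RPM, 32 KB I/O size and 40 MB/s transfer rate are illustrative values, not the specification of any particular drive.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def transfer_time_ms(io_size_kb: float, transfer_rate_mb_s: float) -> float:
    """Time to move one I/O of io_size_kb at transfer_rate_mb_s (assumes 1 MB = 1024 KB)."""
    return io_size_kb / (transfer_rate_mb_s * 1024) * 1000


def service_time_ms(seek_ms: float, rpm: float, io_size_kb: float,
                    transfer_rate_mb_s: float) -> float:
    """Disk service time = seek time + rotational latency + data transfer time."""
    return seek_ms + rotational_latency_ms(rpm) + transfer_time_ms(io_size_kb, transfer_rate_mb_s)


# Illustrative drive: 5 ms average seek, 15,000 RPM, 32 KB I/O at 40 MB/s
ts = service_time_ms(5, 15_000, 32, 40)
# 1000 / ts is an upper bound on IOPS when the disk is 100% busy
print(f"service time ~ {ts:.2f} ms, max ~ {1000 / ts:.0f} IOPS")
```

At 15,000 RPM the average rotational latency works out to 2 ms, so for small I/Os seek time and rotation dominate, and transfer time is comparatively minor.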
What are the three seek time specifications?
Full Stroke
Average
Track to Track
Define 'Little's Law'
It relates the average number of requests in a system to the rate at which requests arrive and the average time each request spends in the system (response time):
N = a × R
N = Total number of requests in the system
a = The arrival rate
R = Average response time
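A quick numeric check of Little's Law. The arrival rate and response time below are illustrative values; both must be expressed in the same time unit (seconds here).

```python
def requests_in_system(arrival_rate: float, response_time: float) -> float:
    """Little's Law: N = a x R."""
    return arrival_rate * response_time


def average_response_time(requests: float, arrival_rate: float) -> float:
    """Little's Law rearranged: R = N / a."""
    return requests / arrival_rate


# Illustrative: 200 I/Os per second arrive and each spends 10 ms in the system
n = requests_in_system(200, 0.010)
print(f"average requests in the system: {n:.1f}")
```

The rearranged form is handy in practice: if a monitoring tool reports the average queue depth and the arrival rate, the average response time follows directly.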
What does RAID provide?
What are the components of a RAID Array?
What are the common RAID Levels?
Describe RAID 0
A striped array with no fault tolerance.
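To make striping concrete, the sketch below maps a logical block address on a RAID 0 set to the disk and on-disk offset that hold it. It assumes fixed-size strips laid out round-robin starting at disk 0; real arrays may use other layouts, and the 4-disk, 128-block geometry is illustrative.

```python
from typing import Tuple


def raid0_locate(lba: int, num_disks: int, strip_blocks: int) -> Tuple[int, int]:
    """Map an array LBA to (disk index, block offset on that disk) for RAID 0.

    Assumes strips of strip_blocks blocks placed round-robin across the disks.
    """
    strip = lba // strip_blocks          # strip number across the whole array
    offset_in_strip = lba % strip_blocks
    disk = strip % num_disks             # consecutive strips go to consecutive disks
    stripe = strip // num_disks          # stripe (row) number on every disk
    return disk, stripe * strip_blocks + offset_in_strip


for lba in (0, 130, 512):
    print(lba, "->", raid0_locate(lba, num_disks=4, strip_blocks=128))
```

Because no block is stored twice and there is no parity, losing any one disk loses every stripe, which is why RAID 0 has no fault tolerance.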
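Describe RAID 1
A mirrored array with no striping: each write is stored on two disks, so data remains available if either disk fails.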
Describe Nested RAID
Combines the benefits of multiple RAID configurations.
0+1: Striping & Mirroring. Commonly called a Mirrored Stripe. The process of striping across HDDs is performed, then the entire stripe is mirrored.
1+0: Mirroring & Striping. Referred to as a Striped Mirror. The incoming data is first mirrored and then both copies of data are striped across multiple HDDs.
Describe RAID Parity
Parity is a method of protecting striped data from HDD failure without the cost of mirroring.
An additional HDD is added to the stripe width to hold parity.
Parity is a mathematical construct that allows re-creation of the missing data.
It is a redundancy check that ensures full protection of data without maintaining a full set of duplicate data.
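RAID parity is commonly computed with a byte-wise XOR across the strips of a stripe. Because XOR is its own inverse, XOR-ing the surviving strips with the parity re-creates a lost strip. The sketch below is illustrative only; the three 2-byte strips are made-up data.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (the parity of a stripe)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all strips in a stripe must be the same non-empty set of sizes")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


data_strips = [b"\x0f\xf0", b"\x33\x33", b"\xaa\x55"]   # one stripe on three data disks
parity = xor_blocks(data_strips)                          # stored on the parity disk

# Disk 1 fails: XOR the surviving strips with parity to re-create its contents
rebuilt = xor_blocks([data_strips[0], data_strips[2], parity])
print(rebuilt == data_strips[1])
```

Only one extra disk's worth of capacity is spent, compared with a full duplicate set for mirroring, but rebuilding requires reading every surviving disk.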
Describe RAID 3
Stripes data for high performance and uses parity for improved fault tolerance.
Parity information is stored on a dedicated disk drive so that data can be reconstructed if a drive fails.
ALWAYS reads and writes complete stripes of data across all disks.
Provides good bandwidth for the transfer of large volumes of data.
Used in applications that involve large amounts of sequential data such as video streaming.
Describe RAID 4
Stripes data for high performance.
Uses parity for improved fault tolerance.
Unlike RAID 3, disks in RAID 4 can be accessed independently so that specific data elements can be read or written on a single disk without read or write of the entire stripe.
Describe RAID 5
Drives (strips) are independently accessible
Parity is distributed across all disks
Preferred for messaging, data mining, medium performance media serving and Relational Database Management System (RDBMS) implementations in which Database administrators (DBAs) optimize data access.
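Two RAID 5 ideas in code: distributing parity by rotating it across disks, and updating parity for an independent (small) write. The rotation used here, parity starting on the last disk and moving one disk left per stripe, is one common layout assumed for illustration; arrays differ. The update rule follows from XOR: new parity = old parity XOR old data XOR new data, which is why a small RAID 5 write typically costs four disk I/Os (read old data, read old parity, write new data, write new parity).

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk holding parity for a stripe under one assumed rotation scheme."""
    return (num_disks - 1 - stripe) % num_disks


def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity for a single-strip write, without reading the other data strips."""
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))


print([parity_disk(s, 4) for s in range(5)])  # [3, 2, 1, 0, 3]

# Stripe with data 0x01, 0x02, 0x04 has parity 0x07; overwrite 0x02 with 0x08
print(small_write_parity(b"\x07", b"\x02", b"\x08"))
```

Spreading parity this way avoids the single dedicated parity disk of RAID 4 becoming a write bottleneck.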
Describe RAID 6
Distributes parity across all disks
Can survive two disk failures
Rebuild operation may take longer due to the presence of two parity sets.
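The capacity cost of each protection scheme can be compared with a little arithmetic. This sketch assumes identical disks in a single RAID set and ignores hot spares, vault drives and formatting overhead; the minimum disk counts are the usual ones (3 for single parity, 4 for RAID 6).

```python
def usable_capacity_gb(level: str, num_disks: int, disk_gb: float) -> float:
    """Usable capacity of one RAID set built from identical disks."""
    if level == "0":
        data_disks = num_disks                      # striping only, no protection
    elif level in ("1", "1+0", "0+1"):
        if num_disks < 2 or num_disks % 2:
            raise ValueError("mirrored levels need an even number of disks")
        data_disks = num_disks // 2                 # half the disks hold copies
    elif level in ("3", "4", "5"):
        if num_disks < 3:
            raise ValueError("single-parity RAID needs at least 3 disks")
        data_disks = num_disks - 1                  # one disk's worth of parity
    elif level == "6":
        if num_disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        data_disks = num_disks - 2                  # two disks' worth of parity
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    return data_disks * disk_gb


for level, disks in (("0", 4), ("1+0", 8), ("5", 5), ("6", 6)):
    print(level, disks, usable_capacity_gb(level, disks, 1000))
```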
What is a 'Hot Spare'?
Refers to a spare HDD in a RAID array that temporarily replaces a failed HDD of a RAID set.
What is EMC²'s best practice concerning Hot Spares?
For every two Disk Array Enclosures (DAE) one Hot Spare will be used.
What is an intelligent Storage System?
RAID Arrays that are:
Highly optimized for I/O processing
Have large amounts of cache for improving I/O performance
Have operating environments that provide:
Intelligence for managing cache
Array resource allocation
Connectivity for heterogeneous hosts
Advanced array based local and remote replication options
What are the benefits of an intelligent storage system?
Easier data management
Improved data availability & protection
Enhanced business continuity & support
Improved security and access control
What are the components of an intelligent storage system?
Front End
Cache
Back End
Physical Disks
What is the function of the 'Front End' in an intelligent storage system?
The front end provides the interface between the storage system and the host. It consists of two components:
Front End Ports
Front End Controllers
What is the function of a front end port?
The front end ports enable hosts to connect to the intelligent storage system.
Each front end port has processing logic that executes the appropriate transport protocol, such as SCSI, FC or iSCSI for storage connections.
What is the function of a front end controller?
The front end controllers route data to and from cache via the internal data bus. When cache receives write data, the controller sends an acknowledgement message back to the host. Controllers optimize I/O processing by using command queuing algorithms.
Describe command queuing
Command queuing is a technique implemented on front end controllers. It determines the execution order of received commands and can reduce unnecessary drive head movements and improve disk performance.
What are the most commonly used command queuing algorithms?
First in First Out (FIFO): Default algorithm where commands are executed in the order in which they are received.
Seek Time Optimization: Commands executed based on optimizing read /write head movements which may result in reordering of commands.
Access Time Optimization: Commands are executed based on the combination of seek time optimization and an analysis of rotational latency for optimal performance.
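The difference between FIFO and seek time optimization can be seen by counting how far the head travels. This sketch models seek time optimization as a greedy "shortest seek first" choice, which is an assumption for illustration (real controller algorithms are vendor specific), and it does not model rotational latency, so it does not cover access time optimization. The starting cylinder and queue are made-up values.

```python
from typing import Iterable, List


def fifo(start: int, queue: Iterable[int]) -> List[int]:
    """FIFO: execute commands in arrival order (start is unused; kept for a uniform signature)."""
    return list(queue)


def seek_optimized(start: int, queue: Iterable[int]) -> List[int]:
    """Greedy shortest-seek-first: always service the pending command closest to the head."""
    pending, position, order = list(queue), start, []
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def total_seek_distance(start: int, order: List[int]) -> int:
    """Total cylinders the head travels to service commands in this order."""
    distance, position = 0, start
    for cyl in order:
        distance += abs(cyl - position)
        position = cyl
    return distance


queue = [10, 90, 15, 85]   # target cylinders, in arrival order; head starts at 50
for policy in (fifo, seek_optimized):
    order = policy(50, queue)
    print(policy.__name__, order, total_seek_distance(50, order))
```

A pure greedy policy can starve requests far from the head, which is one reason production schedulers add fairness rules on top of seek optimization.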
Cache is semiconductor memory where data is placed temporarily to reduce the time required to service I/O requests from the host.
Describe the ways that cache is implemented in write operations
Write Through: Data is placed into the cache and immediately written to disk, and an acknowledgement is sent to the host.
Write Back: Data is placed in the cache and an acknowledgement is sent to the host immediately; the data is committed to disk later.
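A toy model contrasting the two write policies. `WriteCache` is a hypothetical class written for illustration, not an array API: in write-through mode the block reaches "disk" before the acknowledgement, while in write-back mode it is only marked dirty until a flush commits it.

```python
from typing import Dict, Set


class WriteCache:
    """Hypothetical model of a storage-array write cache."""

    def __init__(self, write_back: bool) -> None:
        self.write_back = write_back
        self.cache: Dict[int, bytes] = {}
        self.disk: Dict[int, bytes] = {}
        self.dirty: Set[int] = set()   # blocks in cache not yet committed to disk

    def write(self, block: int, data: bytes) -> str:
        self.cache[block] = data
        if self.write_back:
            self.dirty.add(block)      # acknowledge now, commit to disk later
        else:
            self.disk[block] = data    # write-through: commit before acknowledging
        return "ACK"

    def flush(self) -> None:
        """Commit all dirty blocks from cache to disk."""
        for block in self.dirty:
            self.disk[block] = self.cache[block]
        self.dirty.clear()


wb = WriteCache(write_back=True)
wb.write(7, b"payload")
print(7 in wb.disk)   # not yet on disk
wb.flush()
print(7 in wb.disk)
```

The window between the acknowledgement and the flush is exactly where write-back data is at risk, which is what cache mirroring and vaulting protect against.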
What is a Read Cache Hit?
If the requested data is found in the cache it is called a read cache hit or a read hit and the data is sent to the host without any disk operation.
What is a Cache Miss?
If the requested data is not found in the cache, it is called a cache miss and the data must be read from disk.
Describe two cache management algorithms implemented by intelligent storage systems to proactively maintain a free set of pages.
Least Recently Used (LRU): An algorithm that continuously monitors data access in cache and identifies the cache pages that have not been accessed for a long time, so that they can be freed up or marked for reuse.
Most Recently Used (MRU): An algorithm that is the converse of LRU. In MRU, the pages that have been accessed most recently are freed up or marked for reuse.
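The sketch below shows which page each policy picks when space is needed. Real arrays free pages proactively in the background; this simplified, hypothetical `PageCache` frees one page on demand so the choice is easy to see. It relies on `collections.OrderedDict`, whose `move_to_end` and `popitem(last=...)` methods keep entries ordered from least to most recently used.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class PageCache:
    """Fixed-size cache that frees one page when full, using LRU or MRU."""

    def __init__(self, capacity: int, policy: str = "LRU") -> None:
        if policy not in ("LRU", "MRU"):
            raise ValueError("policy must be 'LRU' or 'MRU'")
        self.capacity = capacity
        self.policy = policy
        self.pages: "OrderedDict[Hashable, None]" = OrderedDict()  # least -> most recent

    def access(self, page: Hashable) -> Optional[Hashable]:
        """Access a page; return the page freed to make room, if any."""
        if page in self.pages:
            self.pages.move_to_end(page)      # hit: now the most recently used
            return None
        freed = None
        if len(self.pages) >= self.capacity:
            # LRU frees the oldest entry (front); MRU frees the newest entry (back)
            freed, _ = self.pages.popitem(last=(self.policy == "MRU"))
        self.pages[page] = None
        return freed


for policy in ("LRU", "MRU"):
    cache = PageCache(3, policy)
    for p in "ABCA":
        cache.access(p)
    print(policy, "frees", cache.access("D"))
```

MRU suits access patterns such as large sequential scans, where a page just read is unlikely to be needed again soon.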
Describe 'Watermarking' in cache management
Flushing is the process of committing data from the cache to the disk. On the basis of the I/O access rate and pattern, high and low levels called watermarks are set in cache to manage the flushing process. This process provides headroom in the write cache for improved performance. There are two watermarks:
- High Water Mark
- Low Water Mark
Describe 'Idle Flushing'
Idle Flushing occurs continuously, at a modest rate, when the cache utilization level is between the high and low watermarks.
Describe 'High Watermark Flushing'
Activated when cache utilization hits the high watermark. The system dedicates some additional resources to flushing. This type of flushing has minimal impact on host I/O processing.
Describe 'Forced Flushing'
Forced Flushing occurs in the event of a large I/O burst when the cache reaches 100% of its capacity, which significantly affects the I/O response time. In Forced Flushing, dirty pages are forcibly flushed to disk.
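The three flushing modes can be summarized as a function of cache utilization. The 40% and 80% watermark values are hypothetical settings, and returning "none" at or below the low watermark reflects the common description that flushing stops there (an assumption, since that case is not covered in these notes).

```python
def flushing_mode(utilization_pct: float, low_wm: float, high_wm: float) -> str:
    """Select a flushing mode from cache utilization and watermark settings."""
    if not 0 <= low_wm < high_wm < 100:
        raise ValueError("expected 0 <= low_wm < high_wm < 100")
    if utilization_pct >= 100:
        return "forced"            # cache full: dirty pages forcibly flushed
    if utilization_pct >= high_wm:
        return "high watermark"    # extra resources dedicated to flushing
    if utilization_pct > low_wm:
        return "idle"              # continuous, modest-rate flushing
    return "none"                  # assumption: flushing stops at/below the low watermark


for level in (30, 60, 85, 100):
    print(level, flushing_mode(level, low_wm=40, high_wm=80))
```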
Describe two methods of Cache Data Protection
The risk of losing data held in the cache can be mitigated by:
Cache mirroring: Each write to cache is held in two different memory locations on two independent memory cards.
Cache vaulting: A set of physical disks called vault drives is used to dump the contents of the cache in the event of a power failure.
In an intelligent storage system, what is the 'back end'?
The back end provides the interface between the cache and physical disks. From the cache, data is sent to the back end and then routed to the destination disk. The back end consists of two components:
Back End Ports:
Back End Controllers: Communicates with the disks when performing reads and writes and also provides additional, but limited temporary data storage.
What is a LUN?
Physical drives or groups of RAID Protected drives can be logically split into volumes known as Logical Unit Numbers (LUN). The use of LUNs improves disk utilization by only allocating the portion of disk space needed by the host thereby leaving the remainder of disk space to be allocated to other hosts.
What is LUN Masking?
LUN Masking is an access control mechanism that provides data access control by defining which LUNs a host can access.
LUN masking is typically implemented at the front end controller.
LUN Masking ensures that volume access by servers is controlled appropriately, preventing unauthorized or accidental use in a distributed environment.
Usually implemented on storage arrays.
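Conceptually, LUN masking is a lookup the array performs before serving an I/O: which LUNs may this initiator see? The masking table and WWNs below are hypothetical, and real arrays store this in their own configuration (for example, storage groups), not in a Python dict.

```python
from typing import Dict, Set

# Hypothetical masking table: host HBA WWN -> LUNs that host may access
MASKING: Dict[str, Set[int]] = {
    "10:00:00:00:c9:aa:bb:01": {0, 1, 2},
    "10:00:00:00:c9:aa:bb:02": {3},
}


def visible_luns(host_wwn: str) -> Set[int]:
    """LUNs presented to a host; hosts not in the table see none."""
    return MASKING.get(host_wwn, set())


def can_access(host_wwn: str, lun: int) -> bool:
    return lun in visible_luns(host_wwn)


print(can_access("10:00:00:00:c9:aa:bb:01", 1))
print(can_access("10:00:00:00:c9:aa:bb:02", 1))
```

Defaulting unknown hosts to an empty set is the safe design choice: a new or misconfigured server sees nothing until it is explicitly granted access.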
Describe the capabilities of a high end storage array?
**Also referred to as Active - Active Arrays**
- Large storage capacity
- Huge cache to service host I/Os
- Fault tolerance architecture
- Multiple front end ports and support for multiple interface protocols
- High scalability
- Ability to handle large amounts of concurrent I/Os
**Symmetrix is an example of a high end storage system**
- Designed for large enterprises
Describe the capabilities of a Midrange storage array
**Also referred to as Active - Passive Arrays**
- Host can perform I/Os to LUNs only through active paths
- Other paths remain passive until active path fails
- Have two controllers, each with cache, RAID controllers and disk drive interfaces
- Designed for small and medium enterprises
- Less scalable than a high end array
**CLARiiON is an example**
Describe the characteristics of the CLARiiON CX-4
Support for Ultraflex technology
Scalable up to 960 disks
Supports flash drives
Supports RAID 0, 1, 1+0, 3, 5, 6
Supports up to 16GB of cache per controller (2 controllers = 32GB total)
Supports storage based local and remote data replication via SnapView (Local) and MirrorView (Remote)
CLARiiON Messaging Interface (CMI)
Standby power supply
FLARE Storage Operating Environment
Describe the characteristics of the Symmetrix DMX-4
Incrementally scalable to 2,400 disks
Dynamic global cache memory (16GB - 512GB)
Advanced processing power
High data processing bandwidth (up to 128 GB/s)
Supports RAID 1, 1+0 (AKA 10 for mainframe), 5, 6
Storage based local and remote replication through TimeFinder (Local) and SRDF (Remote)
Utilizes Direct Matrix Architecture
Each memory director connects to each front end director
Uses the Enginuity OS
Describe the characteristics of the Symmetrix VMAX Series
96 to 2,400 drives up to 2 PB (3x more usable capacity)
One to eight VMAX engines
Up to 1TB global memory
Twice the host ports (FC, iSCSI, Gb Ethernet, FICON) up to 128 ports
8Gb/s FC, FICON and FC SRDF
Twice the back end connections for flash
Quad core 2.3GHz processors to provide more than twice the IOPS
What is DAS?
Direct Attached Storage is an architecture where storage connects directly to servers. Uses a block-level protocol for access. Internal HDDs and tape libraries are examples of DAS.
***Can be internal or external***
Describe Internal DAS
Internal DAS is internally connected to the host by a serial or parallel bus.
The physical bus has distance limitations and can only be sustained over short distances for high-speed connectivity. Most internal buses can only support a limited number of devices.
Describe External DAS
In External DAS Architectures, the server connects directly to the external storage device. In most cases, communication between the host and the storage device takes place over SCSI or FC protocol. External DAS overcomes the distance and device count limitations of Internal DAS.
What are the benefits of DAS?
Ideal for data provisioning
Quick deployment for small environments
Simple to deploy
Low capital expense
What are the four DAS connectivity options?
ATA and SATA
SCSI
Fibre Channel (FC)
Bus and Tag (primarily for external mainframe)
What are the two types of DAS Management?
Internal: Host provides disk partitioning and file system layout.
External: Array based management, lower TCO for managing data and storage infrastructure.
What are some of the challenges of DAS?
Scalability is limited:
- Number of connectivity ports to hosts
- Number of addressable disks
Downtime is required for maintenance with internal DAS
Limited ability to share resources:
- Array front end ports and storage space
- Results in islands of over- and underutilized storage pools
What is the definition of SCSI?
Small Computer System Interface. SCSI is all about an initiator sending a command to a target.
What does SCSI communication involve?
SCSI Initiator Device: Issues commands to SCSI target devices.
SCSI Target Device: Executes commands issued by initiators.
What are the versions of SCSI?
SCSI-1: Defined cable length, signaling characteristics, commands, and transfer modes; uses an 8-bit narrow bus (supports 8 devices)
SCSI-2: Defined the Common Command Set (CCS); 16-bit wide bus option; improved performance and reliability
SCSI-3: Latest version; comprises different but related standards rather than one large document.
*Can support between 8 and 16 devices
What is SCSI Addressing?
Used to uniquely identify hosts and devices on the bus by number (0-15). The UNIX naming convention identifies a disk by three identifiers: initiator ID, target ID, and LUN (the ctd format, e.g. c0t0d0).
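A small parser for the ctd naming convention, assuming names of the exact form `c<initiator>t<target>d<lun>`. Actual OS device names may carry extra suffixes (for example, slice numbers) that this sketch deliberately does not handle.

```python
import re
from typing import Dict

CTD_NAME = re.compile(r"c(\d+)t(\d+)d(\d+)")


def parse_ctd(name: str) -> Dict[str, int]:
    """Split a ctd-style disk name such as c0t4d1 into its three identifiers."""
    match = CTD_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a ctd name: {name!r}")
    initiator, target, lun = (int(g) for g in match.groups())
    if not 0 <= target <= 15:
        raise ValueError("SCSI target ID must be in the range 0-15")
    return {"initiator": initiator, "target": target, "lun": lun}


print(parse_ctd("c0t4d1"))
```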
Structure and Organization of FC Data
Exchange Operation (conversation): enables two N_ports to identify and manage a set of information units.
Sequence (Sentence): refers to a contiguous set of frames that are sent from one port to another.
Frame (word): the fundamental unit of data transfer at Layer 2. *Each frame can contain up to 2,112 bytes of payload
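Since a sequence is carried as a set of frames, the 2,112-byte payload limit sets the minimum number of frames a transfer needs. This sketch assumes every frame is filled to the maximum; a smaller payload per frame (2,048 bytes is used below purely as an example) can be passed to see the effect.

```python
import math

MAX_FRAME_PAYLOAD = 2112  # bytes: maximum FC frame payload


def frames_needed(sequence_bytes: int, payload_per_frame: int = MAX_FRAME_PAYLOAD) -> int:
    """Minimum frames needed to carry a sequence, assuming every frame is filled."""
    if sequence_bytes < 0 or payload_per_frame <= 0:
        raise ValueError("sizes must be non-negative and payload positive")
    return math.ceil(sequence_bytes / payload_per_frame)


one_mib = 1024 * 1024
print(frames_needed(one_mib), frames_needed(one_mib, 2048))
```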
What SCSI ID has the highest priority?
SCSI ID 7
What is a SCSI Port?
SCSI ports are physical connectors that the SCSI cable plugs into for communication with a SCSI device.
SCSI device may contain initiator port, target port and target / initiator port.
To cater to service requests from multiple devices, a SCSI device may also have multiple ports.
World Wide Names (WWN): a unique 64-bit identifier that is static to the port and used to physically identify it, like a NIC's MAC address. Every HBA and every array port has one burned in.
What is SAN?
Storage Area Network. Is a dedicated high speed network for block level access. Carries data between servers (AKA Hosts) and storage devices through FC switches.
Provides Block Level data access.
Consolidates resources centralizing storage and management
Scalability (theoretical limit 15 million nodes)
Fibre Channel Addressing: an FC address is dynamically assigned during fabric login and is used to communicate between nodes within a SAN, like an IP address on a NIC. Address format: 24 bits, dynamically assigned.
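The two identifiers side by side. A 24-bit FC address is conventionally split into three 8-bit fields (Domain ID, Area ID, Port ID), while a WWN is a fixed 64-bit value usually written as eight colon-separated hex bytes. The numeric values below are hypothetical.

```python
from typing import Tuple


def split_fc_address(fcid: int) -> Tuple[int, int, int]:
    """Split a 24-bit FC address into (Domain ID, Area ID, Port ID), 8 bits each."""
    if not 0 <= fcid <= 0xFFFFFF:
        raise ValueError("FC addresses are 24 bits")
    return (fcid >> 16) & 0xFF, (fcid >> 8) & 0xFF, fcid & 0xFF


def format_wwn(wwn: int) -> str:
    """Render a 64-bit WWN as eight colon-separated hex bytes."""
    if not 0 <= wwn < 1 << 64:
        raise ValueError("WWNs are 64 bits")
    return ":".join(f"{(wwn >> shift) & 0xFF:02x}" for shift in range(56, -8, -8))


print(split_fc_address(0x010A00))
print(format_wwn(0x10000000C9AABB01))
```

The 24-bit address space is also why the theoretical fabric limit is in the millions of nodes: 2^24 is roughly 16.7 million addresses, some of which are reserved.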
What are the components of SAN?
A SAN consists of three basic components:
These components can be further broken down into the following key elements:
Interconnecting Devices (such as FC switches or hubs)
SAN Management Software
Fibre Channel Protocol Stack (5)
FC-4: Upper Layer protocol
FC-2: Transport Layer
FC-1: Transmission Layer
FC-0: Physical Interface
FC-3 has not been implemented
Fiber Channel Architecture Overview:
Uses channel technology
High performance with low protocol overheads
FCP is SCSI-3 over FC network
Has five layers
What is Fibre Channel SAN and its components?
Moves blocks of data over fibre optic cables using SCSI commands between initiator and target.
Components: director/switch, host (node), storage (node), cables, management software to control ports/switches.
What are the two types of optical cables?
Single Mode: Can carry a single beam of light over distances of up to 10 km.
Multi Mode: Can carry multiple beams of light simultaneously over distances of up to 500 m.
(Note: multi mode cable can suffer from modal dispersion)
Fabric Login (FLOGI): between an N_Port and an F_Port, i.e. between a node (initiator or target) and the switch. 1st in the process.
Port Login (PLOGI): between two N_Ports (initial contact between initiator and target). 2nd in the process.
Process Login (PRLI): the N_Ports agree on the upper-layer protocol (SCSI) they will use to communicate. 3rd in the process.
What are the different types of SAN connectors?
Standard Connector (SC) Duplex Connectors
Lucent Connector (LC) Duplex Connector
Patch Panel Connectors
Straight Tip (ST) Simplex Connectors
Inter Switch Links - connects two or more FC Switches to each other using E-Ports.
Used to transfer host to storage data as well as the fabric management traffic from one switch to another. Also one of the scaling mechanisms in SAN connectivity
What are the different port types on SAN?
N_Port (node port): end point in the fabric to the switch.
NL_Port (node loop port): supports arbitrated loop topology. Goes into a HUB.
E_Port (expansion port): FC port that forms the connection between two FC Switches.
F_Port (fabric port): a port on a switch that connects an N_Port.
FL_Port (fabric loop port): a fabric port that participates in FC-AL. Connected to the NL_Ports on an FC-AL loop.
G_Port (generic port): can operate as an E_Port or an F_Port and determines its functionality automatically during initialization.
What are the three commonly used SAN Interconnecting Devices?
Hubs: Physically connect nodes in a logical loop or a physical star topology.
Switches: More intelligent than hubs and directly route data from one physical port to another.
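Directors: Larger than switches, with higher port counts and greater fault tolerance; typically deployed in data centers rather than departments.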
Describe the SAN Interconnectivity Option called FC-SW?
Fibre Channel switched fabric (FC-SW): provides interconnected devices, dedicated bandwidth, and scalability. Also known as fabric connect.
Describe the SAN Interconnectivity Option called FC-AL?
Fibre Channel Arbitrated Loop (FC-AL): devices are attached to a shared loop. Devices on the loop must arbitrate to gain control of the loop. At any given time, only ONE device can perform I/O operations on the loop.
What is the simplest form of SAN Interconnectivity?
Point to Point - two devices are connected directly to each other (like DAS).
Describe SAN Management Software?
A suite of tools used in a SAN to manage the interface between host and storage arrays.
Provides integrated management of SAN environment.
Web based GUI or CLI
What is a Core-Edge Fabric, and what are its two types?
Two types of switch tiers - the edge tier (comprised of switches) and the core tier (enterprise directors)
Single Core: all hosts are connected to the edge tier and all storage is connected to the core tier.
Dual Core: can be expanded to include more core switches - enables load balancing.
Describe the Fabric Topology Mesh and name the different types
Each switch is connected directly to the other switches by using ISLs. Promotes enhanced SAN connectivity.
Full Mesh: every switch is connected to every other switch in the topology; appropriate for a small number of switches (typically up to four).
Partial Mesh: several hops or ISLs may be required for traffic to reach its destination. Can cause latency issues.
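Why full mesh suits only a few switches: the number of ISLs grows quadratically. This sketch assumes exactly one ISL per switch pair; production designs often add redundant ISLs, which multiplies these counts.

```python
def full_mesh_isls(num_switches: int) -> int:
    """ISLs needed so that every switch links directly to every other switch."""
    return num_switches * (num_switches - 1) // 2


def e_ports_per_switch(num_switches: int) -> int:
    """In a full mesh, each switch uses one E_Port per peer switch."""
    return num_switches - 1


for n in (4, 8, 16):
    print(n, "switches:", full_mesh_isls(n), "ISLs,", e_ports_per_switch(n), "E_Ports each")
```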
Describe the term Zoning in Fabric Management
Zoning is an FC switch function that enables nodes within the fabric to be logically segmented into groups that can communicate with each other.
Access control done on the switch or fabric vs. LUN masking which is done on the array
Setting up a relationship between initiator and target.
What are the Storage Over IP protocol Options?
iSCSI:
- Is SCSI over IP
- Uses IP encapsulation
- Hardware-based gateway to Fibre Channel storage
- Used to connect servers
FCIP:
- Fibre Channel-to-IP bridge / tunnel
- Point to point
- Fibre Channel end points
**Used in DR Implementations**
What is iSCSI?
An IP-based protocol that establishes and manages connections between storage, hosts and bridge devices over IP.
Carries block-level data over IP-based networks, including Ethernet networks and the Internet
Is built on the SCSI protocol, encapsulating SCSI commands and data so that they can be transported using TCP/IP
Describe the components of a Zone
Members: nodes within the SAN that can be included in a zone
Zones: comprise a set of members
Zone Set: comprise of a group of zones that can be activated or deactivated as a single entity fabric
*Only one zone set per fabric can be active at a time
Describe the Types of Zoning
Zoning sets up the relationship that determines which initiators can see which targets
Port Zoning (hard zoning): uses FC addressing of the physical ports to define the zones (most secure - EMC general Practice).
WWN zoning (soft zoning): uses world wide names to define zones.
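A minimal model of zone members, zones and zone sets, including the rule that only one zone set is active per fabric. `Fabric` and the member names are hypothetical; members could be WWNs (WWN zoning) or switch port identifiers (port zoning), since the check itself is the same. Treating "no active zone set" as "no access" is an assumption; default behavior differs between switch vendors.

```python
from typing import Dict, Optional, Set

Zone = Set[str]            # zone members (hypothetical identifiers below)
ZoneSet = Dict[str, Zone]  # zone name -> members


class Fabric:
    """Many zone sets may be defined, but only one is active at a time."""

    def __init__(self, zone_sets: Dict[str, ZoneSet]) -> None:
        self.zone_sets = zone_sets
        self.active: Optional[str] = None

    def activate(self, name: str) -> None:
        if name not in self.zone_sets:
            raise KeyError(name)
        self.active = name  # activating one zone set replaces the previous one

    def can_communicate(self, a: str, b: str) -> bool:
        """Two members can talk only if some zone in the active set contains both."""
        if self.active is None:
            return False    # assumption: no active zone set means no access
        return any(a in zone and b in zone for zone in self.zone_sets[self.active].values())


fabric = Fabric({"prod": {"hostA_to_array": {"hba_A", "array_p1"},
                          "hostB_to_array": {"hba_B", "array_p2"}}})
fabric.activate("prod")
print(fabric.can_communicate("hba_A", "array_p1"), fabric.can_communicate("hba_A", "array_p2"))
```

Keeping one initiator per zone, as in this example, also prevents hosts from seeing each other, which limits disruption from fabric events.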
What are the components of iSCSI?
iSCSI host initiators:
Host computer using a NIC or iSCSI HBA to connect to storage
iSCSI initiator software may need to be installed
Storage array with embedded iSCSI capable network port
LAN for IP Storage Network:
Interconnected Ethernet switches and / or routers
What is NAS and what are the benefits?
Network attached storage. It is an IP based file-sharing device attached to a local area network.
What are the iSCSI host connectivity options
Standard NIC with software iSCSI initiator:
Code that can be loaded onto a host to provide the translation between the storage I/O calls and the network interface
TCP Offload Engine (TOE):
Moves the TCP processing load off the host CPU onto the NIC to free up processing cycles for application execution.
iSCSI HBA:
A network interface adapter with an integrated SCSI ASIC (Application Specific Integrated Circuit)
Simplest option for boot from SAN
What are the components of NAS?
NAS Head (CPU and Memory)
Operating System to manage NAS functions
NFS (UNIX) and CIFS (Microsoft)
Industry-standard storage protocols
Describe the NAS File Sharing Protocols
CIFS - Common Internet File System Protocol
Microsoft environment file-sharing protocol based on the Server Message Block (SMB) protocol
NFS - Network File System Protocol
UNIX environment file sharing protocol
Describe the NAS I/O Process
The requester packages the I/O request into TCP/IP and forwards it to a remote file system, which is handled by the NAS
The NAS converts the I/O into an appropriate physical storage request (block level I/O)
When the data is returned from the physical storage pool, the NAS processes and repackages it into a file protocol response.
The NAS packages this response into TCP/IP again and forwards it to the client through the network.
What are the three iSCSI Topologies?
Native Connectivity: Do not have any FC components; perform all communication over IP.
Bridged Connectivity: Enables the coexistence of FC with IP by providing iSCSI to FC bridging functionality.
Combining FCP and Native Connectivity
Describe the types of NAS Implementations
Integrated NAS: Has all components of NAS in a single enclosure. Connects to the IP network to provide connectivity to the clients and service the file I/O.
Gateway NAS: Has an independent NAS head and one or more storage arrays (2 protocols)
What are the two ways in which iSCSI discovery takes place?
Send Targets Discovery:
Initiator is manually configured with the target
Internet Storage Name Service (iSNS):
Initiators and targets automatically register themselves with iSNS server
iSNS is a client / server model
Describe how Managing an Integrated System (NAS Connectivity) works
Both the NAS component and the storage array are managed via NAS management software
Describe managing a Gateway System (NAS)
NAS component managed via NAS Management software and the storage array is managed via array management software
What are the two types of iSCSI names?
IQN: iSCSI Qualified Name
IQN (ex: iqn.2008-02.com.example:optional)
EUI: Extended Unique Identifier
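A format check for the two iSCSI name types, using the IQN example given in these notes. It assumes the usual layouts: `iqn.` + year-month + reversed domain name + an optional `:`-separated suffix, and `eui.` + 16 hex digits. This is a shape check only, not full validation against the iSCSI specification.

```python
import re
from typing import Dict, Optional

IQN = re.compile(r"iqn\.(\d{4}-\d{2})\.([^:]+)(?::(.+))?")
EUI = re.compile(r"eui\.[0-9A-Fa-f]{16}")


def parse_iqn(name: str) -> Dict[str, Optional[str]]:
    """Split an IQN into its date, naming authority and optional suffix."""
    match = IQN.fullmatch(name)
    if match is None:
        raise ValueError(f"not an IQN: {name!r}")
    date, authority, suffix = match.groups()
    return {"date": date, "naming_authority": authority, "suffix": suffix}


def is_eui(name: str) -> bool:
    """True if name looks like an EUI-format iSCSI name."""
    return EUI.fullmatch(name) is not None


print(parse_iqn("iqn.2008-02.com.example:optional"))
print(is_eui("eui.0300732A32598D26"))
```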
Celerra is a dedicated high-performance infrastructure for FILE LEVEL I/Os