
Architecture of GlusterFS as a Scalable File System


GlusterFS is a scalable file system implemented in C. Because it is open source, its features can be extended [8]. GlusterFS is a network file system that runs in user space and uses FUSE (Filesystem in Userspace) to connect to the kernel's virtual file system (VFS) layer [9].
Features can be easily added to or removed from GlusterFS [8]. GlusterFS has the following components:
• GlusterFS server storage pool – a set of storage nodes combined into a single global namespace. Members can be added to and removed from the pool dynamically.
• GlusterFS storage client – a client running any Linux file system can connect to the pool using NFS, CIFS, HTTP or FTP.
• FUSE – a fully functional file system can be designed using FUSE, and it will include features like: simple …
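The idea of a pool of storage nodes presenting one global namespace, with members joining and leaving dynamically, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not GlusterFS's own implementation: it splits a 32-bit hash space into equal ranges, one per brick, and places each file on the brick whose range contains a hash of the file name. `zlib.crc32` is a stand-in for GlusterFS's actual hash function, and the class name `ToyStoragePool` and the brick names are hypothetical.

```python
import zlib

HASH_SPACE = 2 ** 32  # size of the 32-bit hash space shared by all bricks


class ToyStoragePool:
    """Hypothetical pool that maps file names to bricks by hash range."""

    def __init__(self, bricks):
        self.bricks = list(bricks)

    def add_brick(self, brick):
        # Members can join dynamically; the hash ranges are recomputed on lookup.
        self.bricks.append(brick)

    def remove_brick(self, brick):
        # Members can leave dynamically; the remaining bricks absorb the space.
        self.bricks.remove(brick)

    def locate(self, filename):
        """Return the brick responsible for filename."""
        if not self.bricks:
            raise RuntimeError("storage pool has no bricks")
        h = zlib.crc32(filename.encode("utf-8"))  # stand-in hash, 0 <= h < 2**32
        range_size = HASH_SPACE // len(self.bricks)
        # min() keeps the top few hash values inside the last brick's range.
        index = min(h // range_size, len(self.bricks) - 1)
        return self.bricks[index]


pool = ToyStoragePool(["server1:/brick", "server2:/brick"])
for name in ["report.txt", "photo.jpg", "data.csv"]:
    print(name, "->", pool.locate(name))

pool.add_brick("server3:/brick")
print("report.txt ->", pool.locate("report.txt"))
```

Because every client can compute the same placement from the file name alone, the toy needs no central lookup table to find a file. The trade-off is that adding or removing a brick changes some placements, so in a system built this way existing files would have to be moved (rebalanced).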

That somewhat defeats the purpose of a high-availability storage cluster. In addition, the system time of all bricks must be synchronized. The lack of accessible disk space was clearly not GlusterFS's fault, and it is probably not a common scenario, but GlusterFS should at least report an error message.
2.4. Hadoop Distributed File System (HDFS)
The Hadoop Distributed File System (HDFS) is a scalable, portable file system written in Java for the Hadoop framework. HDFS provides shell commands and a Java application programming interface (API) [12]. Data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster. In this way, the map and reduce functions can be executed on smaller subsets of larger data sets, which provides the scalability needed for big data processing [12]. A Hadoop cluster nominally has a single namenode plus a cluster of datanodes, although redundancy options are available for the namenode because of its criticality. Each datanode serves blocks of data over the network using a block protocol specific to HDFS. The file system uses TCP/IP sockets for communication, and clients and nodes communicate using remote procedure calls (RPC).
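Splitting a file into blocks and running map and reduce over them can be illustrated with a small simulation. This is a sketch under stated assumptions, not HDFS or Hadoop code: the 128 MB block size matches the default `dfs.blocksize` in Hadoop 2 and later, but the datanode names, the round-robin placement, and the word-count job are hypothetical. In real HDFS, block placement is decided by the namenode.

```python
from collections import Counter

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default dfs.blocksize in Hadoop 2+


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # the last block may be short
        blocks.append((offset, length))
        offset += length
    return blocks


def assign_round_robin(blocks, datanodes):
    """Hypothetical placement: spread blocks over the datanodes in turn."""
    return {i: datanodes[i % len(datanodes)] for i in range(len(blocks))}


def map_block(text):
    """Map step: count the words in one block's data."""
    return Counter(text.split())


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


# A 300 MB file becomes two full 128 MB blocks and one 44 MB block.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks), [length // (1024 * 1024) for _, length in blocks])
print(assign_round_robin(blocks, ["dn1", "dn2", "dn3"]))

# Word count where each string stands in for the contents of one block.
block_texts = ["big data big", "data blocks", "big blocks"]
print(reduce_counts(map_block(t) for t in block_texts))
```

Each `map_block` call only needs its own block, so in a cluster the map work can run in parallel on the machines holding the blocks. The toy ignores records that straddle a block boundary, which a real job must handle.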

Fig 5. HDFS Architecture [19]

HDFS stores large files across multiple machines. It achieves reliability by replicating the data across multiple hosts, and hence, in theory, does not require redundant array of independent disks (RAID) storage on …
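Reliability through replication can be sketched as follows. The replication factor of 3 matches the default `dfs.replication` setting, but the placement below simply picks distinct datanodes at random and ignores the rack awareness that the HDFS namenode applies; the function and node names are hypothetical. The point it shows is that a block stays readable as long as at least one host holding a replica survives.

```python
import random

REPLICATION = 3  # default value of dfs.replication


def place_replicas(block_ids, datanodes, replication=REPLICATION, seed=0):
    """Hypothetical placement: each block goes to `replication` distinct nodes."""
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the replication factor")
    rng = random.Random(seed)  # seeded so the example is repeatable
    return {b: rng.sample(datanodes, replication) for b in block_ids}


def readable_blocks(placement, failed):
    """Return the blocks that still have a replica on a live datanode."""
    return {b for b, nodes in placement.items()
            if any(n not in failed for n in nodes)}


nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_replicas(range(4), nodes)
print(placement)

# With 3 replicas on distinct hosts, any 2 host failures leave every block readable.
failed = {"dn1", "dn2"}
print(len(readable_blocks(placement, failed)), "of", len(placement), "blocks readable")
```

Since redundancy comes from copies on separate machines rather than from disks inside one machine, losing a whole host is survivable, which is the reasoning behind not needing RAID on each node.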
