Questions On Google File System

4 Modern Distributed File Systems
4.1 GFS (Google File System)
Google File System (GFS) is a proprietary distributed file system that Google developed for its own use and first described in a 2003 ACM paper. Its design goal was to provide efficient, reliable access to large amounts of data using clusters of commodity hardware. Such inexpensive "commodity" machines have a high failure rate, so individual nodes fail often and the data stored on them can be lost; GFS therefore includes strategies for tolerating these failures, such as keeping multiple replicas of each chunk on different chunk servers. GFS also favors high sustained data throughput, even at the cost of latency.

In GFS, files are extremely rarely overwritten or shrunk; when they need to be modified, new data is usually appended to them rather than written over existing data.
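
To make the append-mostly access pattern concrete, here is a minimal Python sketch, assuming a toy file object that offers only appends and whole-file reads, with no overwrite or truncate operation. The class name `AppendOnlyFile` and the 8-byte chunk size are hypothetical choices for readability (GFS itself uses 64 MB chunks), and the sketch simply lets data flow across chunk boundaries; it does not model GFS's actual record-append semantics.

```python
class AppendOnlyFile:
    """Toy model of an append-mostly file stored as fixed-size chunks.

    There is deliberately no overwrite or truncate method: data can only be
    added at the end, mirroring the access pattern GFS is optimized for.
    """

    def __init__(self, chunk_size: int = 8) -> None:
        # Hypothetical tiny chunk size for illustration; GFS uses 64 MB chunks.
        self.chunk_size = chunk_size
        self.chunks: list[bytearray] = []

    def append(self, data: bytes) -> None:
        """Add bytes at the end of the file, opening a new chunk when the last one is full."""
        pos = 0
        while pos < len(data):
            if not self.chunks or len(self.chunks[-1]) == self.chunk_size:
                self.chunks.append(bytearray())
            last = self.chunks[-1]
            room = self.chunk_size - len(last)
            last.extend(data[pos:pos + room])
            pos += room

    def read_all(self) -> bytes:
        """Concatenate every chunk back into the file's contents."""
        return b"".join(bytes(chunk) for chunk in self.chunks)


f = AppendOnlyFile(chunk_size=8)
f.append(b"hello ")
f.append(b"world!")
print([bytes(c) for c in f.chunks])  # [b'hello wo', b'rld!']
print(f.read_all())                  # b'hello world!'
```

Because existing bytes are never rewritten, a reader can always trust that data already in a full chunk will not change, which is part of why append-mostly workloads are easier to replicate.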
A GFS cluster consists …

Changes are committed only after every chunk server holding a replica has acknowledged them. This strategy guarantees the completion and atomicity of the operation.
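
The all-acknowledgement rule can be sketched as an all-or-nothing replicated write. This is a simplified illustration, assuming hypothetical `Replica` objects that first stage a mutation and acknowledge it, and then commit it only once every replica has acknowledged; real GFS additionally uses a primary replica holding a lease to order mutations, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical chunk-server replica with staged (pending) and committed data."""
    name: str
    healthy: bool = True
    staged: list[bytes] = field(default_factory=list)
    committed: list[bytes] = field(default_factory=list)

    def stage(self, data: bytes) -> bool:
        """Buffer the mutation and acknowledge it; an unhealthy server sends no ack."""
        if not self.healthy:
            return False
        self.staged.append(data)
        return True

    def commit(self) -> None:
        """Make staged mutations durable."""
        self.committed.extend(self.staged)
        self.staged.clear()

    def abort(self) -> None:
        """Discard staged mutations."""
        self.staged.clear()


def replicated_write(replicas: list[Replica], data: bytes) -> bool:
    """Apply the change on every replica if all acknowledge it, otherwise on none."""
    acks = [r.stage(data) for r in replicas]  # ask every replica to stage the write
    if all(acks):
        for r in replicas:
            r.commit()
        return True
    for r in replicas:
        r.abort()
    return False


replicas = [Replica("cs1"), Replica("cs2"), Replica("cs3")]
print(replicated_write(replicas, b"record-1"))  # True: all three acknowledged
replicas[2].healthy = False
print(replicated_write(replicas, b"record-2"))  # False: cs3 did not acknowledge
print([r.committed for r in replicas])          # [[b'record-1'], [b'record-1'], [b'record-1']]
```

Since the failed second write is aborted everywhere, no replica ends up holding `record-2` while another lacks it, which is the atomicity property the text describes.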
A client application accesses a file by first querying the master server for the locations of the desired chunks; with this information, the client then contacts the chunk servers directly for further operations. If the chunks are currently being operated on (i.e., outstanding leases exist on them), however, the client cannot access them at that time.
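
The two-step access path can be sketched as follows, assuming a toy in-memory `Master` whose table maps a (file name, chunk index) pair to a chunk handle and the chunk servers holding its replicas. The client computes the chunk index from the byte offset and the 64 MB chunk size, asks the master only for metadata, and would then fetch data from a chunk server directly. The file path, handle value, server names, and `ChunkBusyError` are hypothetical, and the lease check simply encodes the rule as stated in this text.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses fixed-size 64 MB chunks


class ChunkBusyError(Exception):
    """Raised when a chunk has an outstanding lease (per the rule described above)."""


class Master:
    """Toy master that stores only metadata: chunk handles and replica locations."""

    def __init__(self) -> None:
        # (file name, chunk index) -> (chunk handle, chunk-server addresses)
        self.chunk_table: dict[tuple[str, int], tuple[int, list[str]]] = {}
        # Handles of chunks that currently have an outstanding lease.
        self.leased: set[int] = set()

    def lookup(self, filename: str, chunk_index: int) -> tuple[int, list[str]]:
        handle, locations = self.chunk_table[(filename, chunk_index)]
        if handle in self.leased:
            raise ChunkBusyError(f"chunk {handle:#x} is being operated on")
        return handle, locations


def locate(master: Master, filename: str, offset: int) -> tuple[int, int, list[str]]:
    """Client side: turn a byte offset into a chunk index and ask the master where it lives."""
    chunk_index = offset // CHUNK_SIZE
    handle, locations = master.lookup(filename, chunk_index)
    # From here the client would contact one of `locations` directly;
    # file data never flows through the master.
    return chunk_index, handle, locations


master = Master()
master.chunk_table[("/logs/web.log", 2)] = (0xABC, ["cs1", "cs4", "cs7"])
print(locate(master, "/logs/web.log", 150 * 1024 * 1024))  # (2, 2748, ['cs1', 'cs4', 'cs7'])
```

Keeping the master on the metadata path only is what lets a single master serve many clients: the bulk data transfer happens between clients and chunk servers.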
GFS is not implemented in the kernel of an operating system, but is instead provided as a user space library.
4.2 HDFS (Hadoop Distributed File System)
Hadoop Distributed File System (HDFS) was developed from the GFS design, so it has almost the same master/slave architecture as GFS. HDFS is designed to hold large amounts of data (terabytes or even petabytes) and distributes the data across a cluster of connected computers; as a core part of Hadoop, it typically handles very large files. It splits the data into smaller blocks (the HDFS counterpart of GFS chunks), 64 megabytes each by default in early Hadoop releases (Hadoop 2 and later default to 128 MB), and stores three copies of each block on different DataNodes (the HDFS counterpart of chunk servers). Fragmenting large data and distributing it across different DataNodes allows client applications to read data from distributed files and process it with MapReduce. Unlike the proprietary GFS, however, HDFS is an open-source system developed using GFS as a …
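
The splitting and replication described for HDFS can be illustrated with a short sketch, assuming the 64 MB block size and replication factor of three stated above. The function names and DataNode names are hypothetical, and the round-robin placement is a simplification chosen only to guarantee that the three copies land on distinct nodes; the default HDFS placement policy is rack-aware instead.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # block size used in the text (Hadoop 2+ defaults to 128 MB)
REPLICATION = 3       # copies kept of each block


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of every block; only the last block may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    n = len(datanodes)
    return {
        block: [datanodes[(block + i) % n] for i in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(200 * MB)
print([size // MB for size in blocks])  # [64, 64, 64, 8]
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4", "dn5"])
print(placement[3])                     # ['dn4', 'dn5', 'dn1']
```

Under these assumptions a 200 MB file occupies four blocks, and with three copies of each the cluster stores twelve block replicas (600 MB of raw disk) for it. Because the blocks sit on several DataNodes, MapReduce tasks can process different blocks in parallel on the nodes that hold them.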
