Posted

Hello everyone!

I have an assignment about configuring and comparing NFS and Hadoop.

Both of these distributed file system options are new to me, and I would like to know if anyone has previous experience with them.

My main question is: which one, NFS or Hadoop, is the better choice? Which is more suitable?

Thank you.

Posted

I draw a distinction between network and distributed systems: to me, the fact that something communicates over a network doesn't make it distributed. "Distributed" means split into shares handed out across devices, and that doesn't apply when the whole thing can only be given in one chunk to one device. So to me NFS is not a distributed file system, merely a network one, and Hadoop is not even a file system; it's a software development framework. In fact, these things are so different from each other that comparing the two is akin to comparing an octopus to an apple...

Posted

Maybe my question was not clear... I am sorry, English is not my native language. My assignment states that we need to create a private cloud to store and process gigabytes of sensor data, which will require a suitable distributed file system, and we need to decide which of NFS and Hadoop is the better option to achieve that. They are two different technologies used for the same purpose.

Posted (edited)

Worry not, English is not my native language either. They are different technologies used with different intentions. The questions you have to ask are: What format is the sensor data in? How is it processed, by what, and is that Hadoop-ready? Is it gigabytes in total or gigabytes per second of data? What is the total eventual workload?
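To show why the "gigabytes total or gigabytes per second" question matters, here is a rough back-of-envelope sketch. The function names and every number you would feed in are made up for illustration; nothing here comes from your assignment, so plug in your own figures.

```python
# Back-of-envelope sizing. All inputs are hypothetical placeholders.
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_gb(rate_gb_per_s: float) -> float:
    """Data accumulated in one day at a sustained ingest rate (GB/s)."""
    return rate_gb_per_s * SECONDS_PER_DAY


def days_to_fill(capacity_gb: float, rate_gb_per_s: float) -> float:
    """Days until a store of the given capacity (GB) is full at that rate."""
    return capacity_gb / daily_volume_gb(rate_gb_per_s)
```

A few gigabytes in total fits on a single disk, while a sustained rate of even a fraction of a gigabyte per second adds up to terabytes per day, which is a very different design problem.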

As I have already said, NFS is technically not a distributed file system. It simply allows you to share files (or devices, since everything in Linux is a file). It is not a distributed system: each file is located on one particular box, so the more boxes I add, the more shares I have to manage, and the more I have to automate to limit the risk of losing everything.

Hadoop is a software framework for highly distributed systems, but it is not without its limitations (at least by itself). Hadoop doesn't store files, Hadoop doesn't act as an easily queryable database, and you have to design your project with Hadoop in mind from the beginning: your software, your data, your processing algorithms, everything has to be tailored to this system.

Neither solution is a truly distributed (in the "cloud" sense) file system. NFS is a file system but is not distributed; Hadoop is distributed but is not a file system...
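A toy sketch of the distinction I keep making between "one file on one box" and "shares spread across boxes". This is only an illustration under my own assumptions: the function names, the box names and the round-robin placement are hypothetical, and this is not how NFS or Hadoop actually place data.

```python
def place_whole_files(files, boxes):
    """Network-share style: every file lives entirely on a single box."""
    return {name: boxes[i % len(boxes)] for i, name in enumerate(files)}


def place_chunks(file_size, chunk_size, boxes):
    """Distributed style: one file is cut into chunks spread over the boxes."""
    n_chunks = -(-file_size // chunk_size)  # ceiling division
    return [boxes[i % len(boxes)] for i in range(n_chunks)]
```

In the first case, losing a box loses whole files and you track which share holds what. In the second, a single file can be larger than any one box, but something has to keep track of where every chunk went.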

Edited by AtomicMaster
