2010-03-25

Why use (hard) links ?

Dear lazyweb,

When explaining inodes, directories and hard links, students regularly ask "Why would you (ever) use hard links ?".

I usually reply with three obvious reasons:
1. To avoid storing the same file twice on your system (and thus save disk space).
(They need to know that deleting a file that still has other hard links doesn't free any disk space.)
2. For compatibility with legacy applications that use old file locations (without the need to rewrite those applications).
3. Because Linux programs know the name they were called by, mke2fs, mkfs.ext2 and mkfs.ext3 can be the same file, but with distinct functionality.
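
To make reason 3 concrete for students, here is a minimal Python sketch. It assumes a made-up multi-call script (the names mktool, mktool.ext2 and mktool.ext3 are hypothetical, not real tools) that picks its behaviour from the name it was started with. All three names are hard links to one inode.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical multi-call script: it chooses its behaviour from the name it
# was invoked as. The tool names are invented for this sketch.
SCRIPT = '''\
import os, sys
name = os.path.basename(sys.argv[0])
fs_type = {"mktool.ext2": "ext2", "mktool.ext3": "ext3"}.get(name, "default")
print(name + " -> " + fs_type)
'''


def run_under_names(names):
    """Hard-link one script under several names and run each of them.

    Returns {name: (output line, inode number)}.
    """
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, names[0])
        with open(original, "w") as f:
            f.write(SCRIPT)
        for alias in names[1:]:
            # Same inode, extra directory entry: no second copy of the file.
            os.link(original, os.path.join(tmp, alias))
        for name in names:
            path = os.path.join(tmp, name)
            out = subprocess.run([sys.executable, path], capture_output=True,
                                 text=True, check=True).stdout.strip()
            results[name] = (out, os.stat(path).st_ino)
    return results


for output, inode in run_under_names(
        ["mktool", "mktool.ext2", "mktool.ext3"]).values():
    print(output, "(inode %d)" % inode)
```

Each name reports a different behaviour, while the inode number printed next to it is the same for all three.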

Sometimes these arguments fail to convince students of the usefulness of hard links. What would you say ?

17 comments:

bert said...

AFAIK hard links are also part of the rename implementation: first a hardlink is created, then the original link is removed. In the typical UNIX style, the individual steps of the implementation are available for use as well.
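
The two steps described here can be sketched in Python. This is only an illustration of the link-then-unlink idea, not how any particular kernel implements rename; the helper name rename_noclobber is made up for the example.

```python
import os
import tempfile


def rename_noclobber(src, dst):
    """Rename src to dst in two steps: link, then unlink.

    os.link() raises FileExistsError when dst already exists, so an
    existing file is never silently replaced.
    """
    os.link(src, dst)   # step 1: a second name for the same inode
    os.unlink(src)      # step 2: remove the old name


with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old.txt")
    new = os.path.join(tmp, "new.txt")
    with open(old, "w") as f:
        f.write("data")
    rename_noclobber(old, new)
    print(os.listdir(tmp))
```

Between the two steps the file is reachable under both names, so there is no moment at which it is missing. By contrast, os.rename() on POSIX systems replaces an existing destination without complaint.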

Paul Cobbaut said...

True, but that doesn't explain when someone should use this individual step...

Anonymous said...

For incremental backups (e.g. with rsync): you put a daily back-up in separate directories, so that every directory contains a full back-up of that date, but each back-up only takes up as much space as is needed for the differences with the previous back-up. Then when you don't need an old back-up anymore, you can safely remove its directory completely, without harming the back-ups of a later date.
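
A rough Python sketch of this scheme follows. It is not what rsync itself does internally (rsync offers the --link-dest option for this); the function name snapshot and the directory names are assumptions made for the example. Files that are unchanged since the previous snapshot are hard-linked, everything else is copied.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, dest, previous=None):
    """Copy the tree `source` to `dest`.

    Files identical to the one in `previous` become hard links to it,
    so they take up no additional data blocks.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(dest, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.normpath(os.path.join(dest, rel, name))
            old = (os.path.normpath(os.path.join(previous, rel, name))
                   if previous else None)
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # unchanged: share the inode
            else:
                shutil.copy2(src, dst)   # new or changed: real copy


with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "data")
    os.makedirs(data)
    with open(os.path.join(data, "notes.txt"), "w") as f:
        f.write("unchanged between days")
    day1 = os.path.join(tmp, "day1")
    day2 = os.path.join(tmp, "day2")
    snapshot(data, day1)
    snapshot(data, day2, previous=day1)
    shutil.rmtree(day1)  # dropping the old backup does not harm the new one
    with open(os.path.join(day2, "notes.txt")) as f:
        print(f.read())
```

Every snapshot directory looks like a full backup, and removing an old one only drops link counts for the files it shares with later snapshots.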

Dieter_be said...

Maybe the problem is not so much the arguments themselves. I find argument 1 very compelling.
Maybe you should give them some examples, such as rsnapshot, or how this allows you to create a simple but effective tag-based music categorisation system.

Gerry said...

Sharing unix sockets between chroots seems like a good example to me.
Think of a typical Postfix + MySQL setup running in a chrooted environment. You could share MySQL's unix socket with the chrooted Postfix by using a hardlink.

Paul Cobbaut said...

thanks!

Philip Paeps said...

all the things you mention could be done with softlinks too, at the expense of an inode and a couple of bytes.

The main reason to use hardlinks has been mentioned: it closes the rename race.

The reason there's a command line tool for it (and often a shell builtin) is probably so you can rename without races in shell scripts.

localhost said...

I had a race condition with Philip. He won. :)

Suggested reading material: Modern Operating Systems, by Andy Tanenbaum. I don't have the book on my shelf here at work, but the google says that page 736 may be interesting.

Paul Cobbaut said...

Google books says it's only 728 pages...

Anonymous said...

ditto on incremental backups

Dag said...

How about "providing different file permissions for the same data" ?

Not that I would recommend it by default, but 2 (hard)links to the same inode can have different permissions. So you could provide a file in the home-directory of a few people that each can access, even though they can't access each other's file. That's something you cannot do with symlinks and might serve a purpose.

Of course, hardlinks are bound to a single filesystem, and therefore you are limited in other ways.

Paul Cobbaut said...

Dag,
Permissions are stored in the inode. I wonder how you would make two hard links to the same inode with different permissions. AFAIK this is not possible.
Or am I missing something ?
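
This is easy to check with a small Python sketch (the file names first and second are arbitrary): change the mode through one name and read it back through the other.

```python
import os
import stat
import tempfile


def modes_after_chmod(mode=0o600):
    """Create two hard links, chmod one, return the mode seen via each name."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first")
        second = os.path.join(tmp, "second")
        with open(first, "w") as f:
            f.write("shared data")
        os.link(first, second)
        os.chmod(first, mode)  # change permissions through one name only
        return (stat.S_IMODE(os.stat(first).st_mode),
                stat.S_IMODE(os.stat(second).st_mode))


via_first, via_second = modes_after_chmod(0o600)
print(oct(via_first), oct(via_second))
```

Both names report the same mode, because the mode belongs to the inode and not to the directory entry.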

Dag Wieers said...

Paul: You are absolutely right. What I meant was that you may not have the permissions to access that file (originally), but because you have a hardlink in a path that you can access, it becomes accessible.

So what I did not make clear was that you also need access to the parent path(s) to be able to access a file, and those may not be hardlinked and can have different permissions.

Sorry for the confusion :-) Explaining it turned out to be harder than it sounded in my head... In fact I think I should have provided an example of some kind.

Paul Cobbaut said...

Yep Dag, that was what I was thinking you meant. Looks similar to the chroot argument from Gerry.
Thanks all again, it will be in linux-training.be as soon as I find the time.

paul

Playing with linux said...

Good article :)

Dice said...

Though it's been mentioned twice, I get the impression you might have underrated (or 'underconsidered' :) incremental backups.

It keeps the required amount of backup space (and thus bandwidth) at a minimum. This way, it's possible to create (really: update) a full backup in the blink of an eye.

Apple's Time Machine uses that concept:

https://secure.wikimedia.org/wikipedia/en/wiki/Time_Machine_%28Mac_OS%29#How_it_works

Also there's a good Linux GUI implementation utilizing hardlinks (with rsync):

http://backintime.le-web.org/documentation/

From a user's perspective this might actually be one of the major arguments.

raj said...

The only use case for hard links is the "if deleted" scenario.
Say you have two hard-linked names for a file: even if you delete either one, the data can still be accessed through the other. The disk space is freed only after the last link is deleted. This is not the case with soft links: if the original file is deleted, the space is freed immediately (assuming no other hard links or open file handles remain) and the soft links are left dangling.
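
The difference can be shown with a short Python sketch; the file names original, hard and soft are just placeholders for the example.

```python
import os
import tempfile


def survive_delete():
    """Delete the original name and report what each kind of link still sees."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        hard = os.path.join(tmp, "hard")
        soft = os.path.join(tmp, "soft")
        with open(original, "w") as f:
            f.write("payload")
        os.link(original, hard)      # second name for the same inode
        os.symlink(original, soft)   # separate small file holding a path
        links_before = os.stat(original).st_nlink
        os.remove(original)
        with open(hard) as f:
            hard_content = f.read()
        return {
            "links_before": links_before,
            "links_after": os.stat(hard).st_nlink,
            "hard_content": hard_content,
            "soft_resolves": os.path.exists(soft),       # follows the link
            "soft_entry_exists": os.path.lexists(soft),  # the link itself
        }


for key, value in survive_delete().items():
    print(key, "=", value)
```

The hard link still returns the data after the original name is gone, while the soft link remains as a directory entry that no longer resolves.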