6 min read

On Hard and Soft Links

We use hard links all the time. In fact, we use them much more than soft links (perhaps better known by their alternate names symbolic links or symlinks). As a matter of fact, every file in your filesystem(s) is a hard link.

When a hard link is created, it shares the same inode as its target. This is why hard links cannot be linked across filesystems, as each filesystem will have its own number of inodes. At first blush this may seem odd, but recall that many different filesystems are supported by and exist in an operating system (see virtual filesystem switch (VFS) for more information), and so there is no guarantee that the filesystem of a target supports hard linking.

Since they share the same inode, all the file attributes (that is, the metadata) that an inode stores in its data structure will be reflected across all the hard links. Obviously, this means that any changes to the contents of one link will be reflected in all the other hard links because the inode references the same data blocks. But not only that: file permissions, ownership, timestamps (create time, access time, modify time), et al.

Conversely, a soft link does not share the target’s inode. Rather, it just points to the file name of the target (although they both have separate and distinct inodes). This can have some unexpected consequences if you’re not careful.

For instance, if a symlink is created and then the target it points to is deleted, well, that’s a bad thing. You’re now left with a broken link and potential heartache:

$ touch 1
$ ln -s {1,2}
$ ls -l
total 0
-rw-rw-r-- 1 btoll btoll 0 Oct  3 15:42 1
lrwxrwxrwx 1 btoll btoll 1 Oct  3 15:42 2 -> 1
$ echo herp >> 1
$ cat {1,2}
herp
herp
$ rm 1
$ ls -l
total 0
lrwxrwxrwx 1 btoll btoll 1 Oct  3 15:42 2 -> 1
$ cat 2
cat: 2: No such file or directory

To create a hard link, the target link must already exist in the filesystem, and the call will fail if it is not present. Conversely, a soft link can be created for a non-existent target file. These symlinks that point to non-existent files are called dangling, orphaned or broken link.

Use the ln command-line program to create both a hard and soft links:

Hard link:

$ ln target_file.txt new_hard_link.txt

Soft link:

$ ln -s target_file.txt new_soft_link.txt

You can also create a new hard link with cp by supplying the -l -or --link flag.

For example, let’s create a file foo, hard link the file bar to it and then inspect the number of hard links to this particular inode as well as the inode number and the time of last access, respectively:

$ touch foo
$ ln {foo,bar}
$ stat --format "%h | %i | %x" {foo,bar}
2 | 7213164 | 2022-09-20 17:16:43.388685355 -0400
2 | 7213164 | 2022-09-20 17:16:43.388685355 -0400

Now, update the access time of the bar file. Observe that both the files have been updated, since the inode that they both share had its data structure modified:

$ touch bar
$ stat --format "%h | %i | %x" {foo,bar}
2 | 7213164 | 2022-09-20 17:19:49.208433336 -0400
2 | 7213164 | 2022-09-20 17:19:49.208433336 -0400

Let’s change the access rights:

$ stat --format %A {foo,bar}
-rw-rw-r--
-rw-rw-r--
$ chmod 4701 bar
$ stat --format %A {foo,bar}
-rws-----x
-rws-----x

Weeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

Hard links use the link system call, whereas soft links use the symlink system call.

Meanwhile, when deleting a file, i.e. using rm, it’s not really deleting the file but removing the link to it. Crucially, it’s not truly removed, or unlinked, from the filesystem until all the references to it in the inode reach zero (this can lead to some clever data retrieval operations for unwanted deletes). At this point, the content on disk in the data blocks is freed, and the file cannot be recovered.

See my article on On Recovering Deleted Files with lsof to take advantage on this knowledge of how inodes work to potentially recover deleted files.

Incidentally, both (additional) hard links and soft links are cheap, then, because they don’t create any new space on the hard disk, but some are cheaper than others. For instance, symlinks add a layer of indirection which a hard link doesn’t, because the kernel needs to dereference a pointer to lookup the actual target name. Probably not a big deal, but it’s good to know.

For instance, since a soft link will create its own inode, the kernel would first look up the inode number of the symbolic link in the inode table and then follow its reference to the actual inode of the target to be able to look up its file contents.

This example demonstrates that hard and soft links will get its own inode:

$ touch 1
$ ln -s {1,2}
$ ls -l
total 0
-rw-rw-r-- 1 btoll btoll 0 Oct  3 15:42 1
lrwxrwxrwx 1 btoll btoll 1 Oct  3 15:42 2 -> 1
$ stat --format %i {1,2}
6821092
6821093

So, why use a soft link over a hard link?

Some file systems don’t allow hard links to directories (although symlinks can be created to directories). This is because linking to directories could lead to circular references (hence a circular directory structure), and a filesystem needs to be a tree-like structure.

For example, on my Linux machine on an ext4 filesystem type, I tried to create a hard link to the current directory but was disallowed:

$ ln . derpy
ln: .: hard link not allowed for directory

The following example will demonstrate that filesystems will get their own allotted number of inodes. Observe that the number of the inode is greatly different depending on the filesystem (especially, when it was created, i.e., longer-living filesystems will (usually) have higher numbers assigned to inodes).

$ mkdir foo
$ sudo mount -t tmpfs -o size=150M tmpfs foo
$ file foo
foo: sticky, directory
$ touch derp.txt
$ touch foo/derp.text
$ stat derp.txt ; echo ; stat foo/derp.text
  File: derp.txt
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 802h/2050d      Inode: 6555906     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/   btoll)   Gid: ( 1000/   btoll)
Access: 2022-09-02 23:11:46.427818478 -0400
Modify: 2022-09-02 23:11:46.427818478 -0400
Change: 2022-09-02 23:11:46.427818478 -0400
 Birth: -

  File: foo/derp.text
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 100004h/1048580d        Inode: 3           Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/   btoll)   Gid: ( 1000/   btoll)
Access: 2022-09-02 23:11:52.371811864 -0400
Modify: 2022-09-02 23:11:52.371811864 -0400
Change: 2022-09-02 23:11:52.371811864 -0400
 Birth: -

Clearly, they have different device names and different inode numbers.

Here’s another example. Note that these three utilities are all hard links referencing the same inode:

$ ls -il /bin | ag 786461
786461 -rwxr-xr-x 3 root root   39144 Sep  5  2019 bunzip2
786461 -rwxr-xr-x 3 root root   39144 Sep  5  2019 bzcat
786461 -rwxr-xr-x 3 root root   39144 Sep  5  2019 bzip2

References