Reference Guide

Documentation for MooseFS 3.0.x and 2.0.x can be found at the link below:


 

If you are looking for documentation for older (1.6.x) versions of MooseFS, you can use the following links:

Installing MooseFS 1.6.x Step by Step Tutorial (English version)
MooseFS 1.6.x 分布式文件系统安装向导 (Chinese version)
Podręcznik instalacji systemu plików MooseFS 1.6.x krok po kroku (Polish version)

 

 

REQUIREMENTS FOR THE MASTER SERVER, CHUNK SERVERS AND CLIENTS

 

Master

 

As the managing server (master) is a crucial element of MooseFS, it should be installed on a machine that offers high stability and meets the access requirements of the whole system. It is advisable to use a server with a redundant power supply, ECC memory, and a RAID1/RAID5/RAID10 disk array. The managing server OS has to be POSIX compliant (systems verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris).

 

The most important factor in sizing requirements for the master machine is RAM, as the full file system structure is cached in RAM for speed. The master server should have approximately 300 MiB of RAM allocated to handle 1 million files on chunkservers.
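As a rough sketch of the sizing rule above, the helper below converts a file count into an approximate RAM figure. The function name is hypothetical (it is not a MooseFS tool), and it assumes the ratio of about 300 MiB per million files scales linearly.

```shell
# Hypothetical helper, not a MooseFS tool: estimate master RAM (in MiB)
# from the rule of thumb of roughly 300 MiB per 1 million files.
estimate_master_ram_mib() {
    files=$1
    # Integer arithmetic, rounded up to the next whole MiB.
    echo $(( (files * 300 + 999999) / 1000000 ))
}

estimate_master_ram_mib 25000000   # prints 7500
```

The result is only a starting point; the operating system and other services on the master need memory on top of it.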

 

The necessary size of HDD depends both on the number of files and chunks used (main metadata file) and on the number of operations made on the files (metadata changelog); for example the space of 20GiB is enough for storing information for 25 million files and for changelogs to be kept for up to 50 hours.

 

Metalogger

 

MooseFS metalogger just gathers metadata backups from the MooseFS master server - so the hardware requirements are not higher than for the master server itself; it needs about the same disk space. Similarly to the master server - the OS has to be POSIX compliant (Linux, FreeBSD, Mac OS X, OpenSolaris, etc.).

 

If you would like the metalogger machine to take over as the master server when the master fails, it should have at least the same amount of RAM and HDD as the main master server.

 

Chunkservers

 

Chunkserver machines should have appropriate disk space (dedicated exclusively for MooseFS) and POSIX compliant OS (verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris).

 

Minimal configuration should start from several gigabytes of storage space (only disks with more than 256 MB and chunkservers reporting more than 1 GB of total free space are accessible for new data).

 

Clients (mfsmount)

 

mfsmount requires FUSE to work; FUSE is available on several operating systems: Linux, FreeBSD, OpenSolaris and MacOS X, with the following notes:

  • In case of Linux a kernel module with API 7.8 or later is required (it can be checked with dmesg command - after loading kernel module there should be a line fuse init (API version 7.8)). It is available in fuse package 2.6.0 (or later) or in Linux kernel 2.6.20 (or later). Due to some minor bugs, the newer module is recommended (fuse 2.7.2 or Linux 2.6.24, although fuse 2.7.x standalone doesn't contain getattr/write race condition fix).
  • In case of FreeBSD we recommend using fuse-freebsd (https://github.com/glk/fuse-freebsd), which is a successor to fuse4bsd.
  • For MacOSX we recommend using OSXFUSE (http://osxfuse.github.com/), which is a successor to MacFUSE and has been tested on MacOSX 10.6 and 10.7.
 

MAKING AND INSTALLING

 

The preferred MooseFS deployment method is installation from the source.

Source package supports standard ./configure && make && make install procedure. Significant configure options are:

  • --disable-mfsmaster - don't build managing server (useful for plain node installation)

  • --disable-mfschunkserver - don't build chunkserver

  • --disable-mfsmount - don't build mfsmount and mfstools (they are built by default if fuse development package is detected)

  • --enable-mfsmount - make sure to build mfsmount and mfstools (error is reported if fuse development package cannot be found)

  • --prefix=DIRECTORY - install to given prefix (default is /usr/local)

  • --sysconfdir=DIRECTORY - select configuration files directory (default is ${prefix}/etc)

  • --localstatedir=DIRECTORY - select top variable data directory (default is ${prefix}/var; MFS metadata are stored in mfs subdirectory, i.e. ${prefix}/var/mfs by default)

  • --with-default-user=USER - user to run daemons as if not set in configuration files (default is nobody)

  • --with-default-group=GROUP - group to run daemons as if not set in configuration files (default is nogroup)

For example, to install MooseFS using system FHS-compliant paths on Linux, use: ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var/lib

make install respects the standard DESTDIR= variable, which allows the package to be installed in a temporary location (e.g. in order to create a binary package). Already existing configuration or metadata files won't be overwritten.

 

Managing server (master)

 

To install the master (managing server) process one needs to:

  • install mfs-master package (make install after running configure without --disable-mfsmaster option in case of installation from source files)

  • create a user with whose permissions the service is about to work (if such user doesn't already exist)

  • make sure that the directory for the metadata files exists and is writable by this user (make install run as root does this for the user and paths set up by configure, if the user existed before)

  • configure the service (file mfsmaster.cfg), paying special attention to TCP ports in use

  • add or create (depending both on the operating system and the distribution) a batch script starting the process mfsmaster

After the installation the managing server is started by running mfsmaster. If the mfsmaster program is executed by the root, it switches to a configured user; otherwise it runs as a user who executed it. In case of a server outage or an improper shutdown the mfsmetarestore utility will restore the file system information.

 

Metalogger

 

Metalogger daemon is installed together with mfs-master. The minimal requirements are not bigger than master itself; it can be run on any machine (e.g. any chunkserver), but the best place is the backup machine for MooseFS master (so in case of primary master machine failure it's possible to run master process in place of metalogger - see appropriate HOWTO for details). You can have as many metalogger processes/machines in your network as you like.

 

To install the metalogger process one needs to:

  • install mfs-master package (make install after running configure without --disable-mfsmaster option in case of installation from source files)
  • create a user with whose permissions the service is about to work (if such user doesn't already exist)
  • make sure that the directory for the metadata files exists and is writable by this user (make install run as root does this for the user and paths set up by configure, if the user existed before)
  • configure the service (file mfsmetalogger.cfg), paying special attention to TCP ports used (MASTER_PORT has to be the same as MATOML_LISTEN_PORT in mfsmaster.cfg on managing server)
  • add or create (depending both on the operating system and the distribution) a batch script starting the process mfsmetalogger

After the installation the metalogger is started by running mfsmetalogger. If the mfsmetalogger program is executed by the root, it switches to a configured user; otherwise it runs as a user who executed it.

 

Chunkservers

 

At least one data server (chunkserver) needs to be set up to work with the management server. These machines should have appropriate free space on disks and POSIX compliant OS (verified so far: Linux, FreeBSD, Mac OS X and OpenSolaris). Chunkserver stores data chunks/fragments as files on a common file system (eg. ext3, xfs, ufs). It is important to dedicate file systems used by MooseFS exclusively to it - this is necessary to manage the free space properly. MooseFS does not take into account that a free space accessible to it could be taken by other data. If it's not possible to create a separate disk partition, filesystems in files can be used (have a look at the following instructions).

 

Linux:

  • creating and formatting 2GB file:

    dd if=/dev/zero of=FILENAME bs=1024 count=1 seek=$((2*1024*1024-1))
    mkfs -t ext3 FILENAME

  • mounting:

    mount -t ext3 -o loop FILENAME MOUNTPOINT

FreeBSD:

  • creating and mounting:

    dd if=/dev/zero of=FILENAME bs=100m count=400

    mdconfig -a -t vnode -f FILENAME -u X

    newfs -m0 -O2 /dev/mdX

    mount /dev/mdX MOUNTPOINT

  • mounting a previously created file system:

    mdconfig -a -t vnode -f FILENAME -u X

    mount /dev/mdX MOUNTPOINT

Mac OS X:

  • Start "Disk Utility" from "/Applications/Utilities"

  • Select from menu "Images->New->Blank Image ..."

Note: on each chunkserver disk some space is reserved for growing chunks and thus inaccessible for creation of the new ones. Only disks with more than 256 MB and chunkservers reporting more than 1 GB of total free space are accessible for new data. Minimal configurations should start from several gigabytes of storage.
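The thresholds in the note above can be illustrated with a small sketch. The helper name is hypothetical, and it assumes that "total free space" means the sum over all disks of one chunkserver and that the free-space figures are supplied in MB; the actual accounting inside mfschunkserver may differ.

```shell
# Hypothetical helper: given free space per disk in MB, list the disks
# (by position) that would accept new chunks under the rule above.
# Assumes "total free space" is the sum over all disks of one chunkserver.
eligible_disks() {
    total=0
    for free in "$@"; do
        total=$((total + free))
    done
    # The chunkserver as a whole needs more than 1 GB (1024 MB) free.
    [ "$total" -gt 1024 ] || return 0
    i=0
    for free in "$@"; do
        i=$((i + 1))
        # Each disk needs more than 256 MB free.
        if [ "$free" -gt 256 ]; then
            echo "disk $i: $free MB free"
        fi
    done
}

eligible_disks 200 900 5000
```

With the example values, the first disk is skipped because it has less than 256 MB free, while the other two are listed.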

 

To install the data server (chunkserver):

  • isolate space intended for MooseFS as separate file systems, mounted at a defined point (e.g. /mnt/hd1, /mnt/hd2, etc.)

  • install mfs-chunkserver package (make install after running configure without --disable-mfschunkserver option in case of installation from source files)

  • create a user, with whose permissions the service is about to work (if such user doesn't already exist)

  • give this user permissions to write to all filesystems dedicated to MooseFS

  • configure the service (mfschunkserver.cfg file), paying special attention to used TCP ports (MASTER_PORT has to be the same as MATOCS_LISTEN_PORT in mfsmaster.cfg on managing server)

  • enter the list of mount points of the file systems dedicated to MooseFS in the file mfshdd.cfg

  • add or create (depending both on the operating system and distribution) batch script starting process mfschunkserver
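The port requirement in the steps above can be checked with a short script. This is a minimal sketch: the function names are hypothetical, and it assumes the configuration files use "KEY = VALUE" lines with "#" starting a comment. Options that are left commented out (so that the built-in default applies) are not seen by it.

```shell
# Read the value of KEY from a "KEY = VALUE" style config file,
# skipping commented-out lines. The file syntax is an assumption.
cfg_get() {
    awk -F= -v want="$2" '
        /^[ \t]*#/ { next }
        {
            key = $1; gsub(/[ \t]/, "", key)
            if (key == want) {
                val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val)
                print val
            }
        }' "$1"
}

# Succeed only if both values are set and the chunkserver's MASTER_PORT
# equals the master's MATOCS_LISTEN_PORT.
check_chunkserver_port() {
    master_cfg=$1
    chunk_cfg=$2
    listen=$(cfg_get "$master_cfg" MATOCS_LISTEN_PORT)
    target=$(cfg_get "$chunk_cfg" MASTER_PORT)
    [ -n "$listen" ] && [ "$listen" = "$target" ]
}
```

A possible call, assuming both files have been copied to one machine and live in /etc, is: check_chunkserver_port /etc/mfsmaster.cfg /etc/mfschunkserver.cfg && echo "ports match".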

Note: It's important which local IP address mfschunkserver uses to connect to mfsmaster. This address is passed by mfsmaster to MFS clients (mfsmount) and other chunkservers to communicate with the chunkserver, so it must be remotely accessible. The master address (MASTER_HOST) must therefore be set to one that the chunkserver reaches through the proper local address, usually one belonging to the same network as all MFS clients and other chunkservers. Generally, a loopback address (localhost, 127.0.0.1) can't be used as MASTER_HOST, as it would make the chunkserver inaccessible to any other host (such a configuration would work only on a single machine running all of mfsmaster, mfschunkserver and mfsmount).

 

After installation the data server is started simply by running mfschunkserver. If mfschunkserver program is executed by root, it switches to configured user; otherwise it runs as user who executed it.

 

Clients (mfsmount)

 

Installing MooseFS client:

  • install mfs-client package (make install after running configure without --disable-mfsmount option in case of installation from source files)

  • create a directory where MooseFS will be mounted (e.g. /mnt/mfs)

MooseFS is mounted with the following command:

mfsmount [-h master] [-p port] [-l path] [-w mount-point]

where master is the host name of the managing server, port is the same as given in MATOCU_LISTEN_PORT in file mfsmaster.cfg, path is mounted MooseFS subdirectory (default is /, which means mounting the whole file system), mount-point is the previously created directory for MooseFS.

 

USING MOOSEFS

 

Mounting the File System

 

After launching the managing server and the data servers (chunkservers; one is required, but at least two are recommended), one can mount the file system by starting the mfsmount process. MooseFS is mounted with the following command:

 

mfsmount mountpoint [-d] [-f] [-s] [-m] [-n] [-p] [-H MASTER] [-P PORT] [-S PATH] [-o OPT[,OPT...]]

where MASTER is the host name of the managing server, PORT is the same as given in MATOCU_LISTEN_PORT in file mfsmaster.cfg, PATH is mounted MooseFS subdirectory (default is /, which means mounting the whole file system), mountpoint is the previously created directory for MooseFS.

 

By starting the mfsmount process with the -m (or -o mfsmeta) option one can mount the auxiliary file system MFSMETA (which may be useful to restore a file accidentally deleted from the MooseFS volume or to free some space by removing a file before elapsing the quarantine time), for example:

mfsmount -m /mnt/mfsmeta

 

As of 1.6.x, mfsmount may be set up in /etc/fstab on Linux platforms so that MooseFS is mounted at system startup:

mfsmount /mnt/mfs fuse mfsmaster=MFSMASTER_IP,mfsport=MFSMASTER_PORT,_netdev 0 0

All available mfsmount options (apart from "debug") can be used here. MooseFS resources should now be mounted automatically during system startup. The entry can be tested without a restart by issuing mount -a -t fuse, which mounts all file systems of the given type listed in fstab. (The _netdev option is recognized by the startup scripts of at least some distributions, e.g. RedHat or Debian, and means that this is a network file system that can be mounted only after the network has been started.)
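To avoid typing the fstab entry by hand, it can be generated from the three values it depends on. The helper below is a hypothetical sketch that only prints the line in the format shown above; the IP address in the example is made up, and appending the result to /etc/fstab is left to the administrator.

```shell
# Hypothetical helper: print an /etc/fstab entry in the format shown above.
mfs_fstab_line() {
    mountpoint=$1
    master_ip=$2
    master_port=$3
    printf 'mfsmount %s fuse mfsmaster=%s,mfsport=%s,_netdev 0 0\n' \
        "$mountpoint" "$master_ip" "$master_port"
}

# Example values only; 192.168.0.1 is a placeholder address.
mfs_fstab_line /mnt/mfs 192.168.0.1 9421
```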

 

Basic operations

 

After mounting the file system one can perform all standard file operations (like creating files, copying, deleting, changing names, etc.). MooseFS is a network file system, so operations may be slower than on a local file system.

Free space on the MooseFS volume can be checked the same way as for local file systems, e.g. with the df command:

 

$ df -h | grep mfs

mfsmaster:9421 85T 80T 4.9T 95% /mnt/mfs

mfsmaster:9321 394G 244G 151G 62% /mnt/mfs-test

What is important is that each file can be stored in more than one copy. In such cases it takes proportionally more space than its nominal size. Additionally, files deleted during the quarantine time are kept in a "trash can", so they also take up space (their size also depends on the number of copies). Just like in other Unix file systems, in case of deleting a file opened by some other process, data is stored at least until the file is closed.

 

You may also like to have a look at this FAQ entry: When doing df -h on a filesystem the results are different from what I would expect.

 

Operations specific for MooseFS

 

Setting the goal

 

The "goal" (i.e. the number of copies for a given file) can be verified by the mfsgetgoal command and changed with the mfssetgoal command:

$ mfsgetgoal /mnt/mfs-test/test1

/mnt/mfs-test/test1: 2

$ mfssetgoal 3 /mnt/mfs-test/test1

/mnt/mfs-test/test1: 3

$ mfsgetgoal /mnt/mfs-test/test1

/mnt/mfs-test/test1: 3

 

Similar operations can be done on the whole directory trees with the mfsgetgoal -r and mfssetgoal -r commands:

$ mfsgetgoal -r /mnt/mfs-test/test2

/mnt/mfs-test/test2:

files with goal 2 : 36

directories with goal 2 : 1

$ mfssetgoal -r 3 /mnt/mfs-test/test2

/mnt/mfs-test/test2:

inodes with goal changed: 37

inodes with goal not changed: 0

inodes with permission denied: 0

$ mfsgetgoal -r /mnt/mfs-test/test2

/mnt/mfs-test/test2:

files with goal 3 : 36

directories with goal 3 : 1

 

The actual number of copies of a file can be verified with the mfscheckfile and mfsfileinfo commands:

$ mfscheckfile /mnt/mfs-test/test1

/mnt/mfs-test/test1:

3 copies: 1 chunks

$ mfsfileinfo /mnt/mfs-test/test1

/mnt/mfs-test/test1:

chunk 0: 00000000000520DF_00000001 / (id:336095 ver:1)

copy 1: 192.168.0.12:9622

copy 2: 192.168.0.52:9622

copy 3: 192.168.0.54:9622

 

Note: a zero length file contains no data, so despite the non-zero "goal" setting for such file, these commands will return an empty result.

 

When the number of copies of an already existing file is changed, the data is replicated or the surplus copies are removed with a delay. This can be verified using the commands described above.

 

Setting the "goal" for a directory is inherited for the new files and directories created within it (it does not change the number of copies of already existing files).

 

The summary of the contents of the whole tree (an enhanced equivalent of du -s, with information specific for MooseFS) can be called up with the command mfsdirinfo:

$ mfsdirinfo /mnt/mfs-test/test/

/mnt/mfs-test/test/:

inodes: 15

directories: 4

files: 8

chunks: 6

length: 270604

size: 620544

realsize: 1170432

The above summary displays the number of the directories, files, data fragments (chunks) used by the files, as well as the size of the disk's space taken by files in the directory (length - the sum of file sizes, size - with block size taken into account, realsize - total disk space utilization considering all copies of chunks).
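One way to read these numbers is to divide realsize by size, which gives a rough idea of how many copies of the data are stored on average. The function below is a hypothetical sketch that parses output in the format of the sample above; it assumes each value sits on its own "name: value" line.

```shell
# Read mfsdirinfo-style output on stdin and print realsize / size,
# i.e. roughly how many copies of the data are stored on average.
copies_ratio() {
    awk -F': *' '
        $1 ~ /^[ \t]*size$/     { size = $2 }
        $1 ~ /^[ \t]*realsize$/ { real = $2 }
        END { if (size > 0) printf "%.2f\n", real / size }'
}

# Values copied from the sample output above; prints 1.89
copies_ratio <<'EOF'
inodes: 15
directories: 4
files: 8
chunks: 6
length: 270604
size: 620544
realsize: 1170432
EOF
```

On a live system the same function could be fed directly, for example with mfsdirinfo /mnt/mfs-test/test/ | copies_ratio.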

 

Setting quarantine time for trash bin

 

A quarantine time of storing a deleted file in a "trash can" can be verified by the mfsgettrashtime command and changed with mfssettrashtime:

 

$ mfsgettrashtime /mnt/mfs-test/test1

/mnt/mfs-test/test1: 604800

$ mfssettrashtime 0 /mnt/mfs-test/test1

/mnt/mfs-test/test1: 0

$ mfsgettrashtime /mnt/mfs-test/test1

/mnt/mfs-test/test1: 0

 

These tools also have a recursive option that operates on whole directory trees:

 

$ mfsgettrashtime -r /mnt/mfs-test/test2

/mnt/mfs-test/test2:

files with trashtime 0 : 36

directories with trashtime 604800 : 1

$ mfssettrashtime -r 1209600 /mnt/mfs-test/test2

/mnt/mfs-test/test2:

inodes with trashtime changed: 37

inodes with trashtime not changed: 0

inodes with permission denied: 0

$ mfsgettrashtime -r /mnt/mfs-test/test2

/mnt/mfs-test/test2:

files with trashtime 1209600 : 36

directories with trashtime 1209600 : 1

 

Time is given in seconds (useful values: 1 hour is 3600 seconds, 24h - 86400 seconds, 1 week - 604800 seconds). Just as in the case of the number of copies, the storing time set for a directory is inherited for newly created files and directories. The number 0 means that a file after the removal will be deleted immediately and its recovery will not be possible.
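Since the tools take plain seconds, a small conversion helper can reduce arithmetic mistakes. The function below is a hypothetical convenience wrapper, not part of mfstools; the unit letters h, d and w are this sketch's own convention.

```shell
# Hypothetical helper: convert a count and a unit (h, d or w) to seconds,
# the form in which trash time is given.
trashtime_seconds() {
    count=$1
    case $2 in
        h) echo $((count * 3600)) ;;
        d) echo $((count * 86400)) ;;
        w) echo $((count * 604800)) ;;
        *) echo "unknown unit: $2" >&2; return 1 ;;
    esac
}

trashtime_seconds 2 w   # prints 1209600
```

It could then be combined with the real tool, for example: mfssettrashtime -r "$(trashtime_seconds 2 w)" /mnt/mfs-test/test2.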

 

Removed files may be accessed through a separately mounted MFSMETA file system. In particular it contains directories /trash (containing information about deleted files that are still being stored) and /trash/undel (designed for retrieving files). Only the administrator has access to MFSMETA (user with uid 0, usually root).

 

$ mfssettrashtime 3600 /mnt/mfs-test/test1

/mnt/mfs-test/test1: 3600

$ rm /mnt/mfs-test/test1

$ ls /mnt/mfs-test/test1

ls: /mnt/mfs-test/test1: No such file or directory

 

The name of the file that is still visible in the "trash" directory consists of an 8-digit hexadecimal i-node number and a path to the file relative to the mounting point with characters / replaced with the | character. If such a name exceeds the limits of the operating system (usually 255 characters), the initial part of the path is deleted.
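The naming rule can be sketched as follows. The helper is hypothetical and only reproduces the pattern described above (8 hexadecimal digits, then the relative path with / replaced by |); truncation of over-long names is not modelled.

```shell
# Hypothetical helper: build the name a deleted file gets in the trash
# directory from its i-node number and its path relative to the mount point.
# Truncation of over-long names is not modelled.
trash_entry_name() {
    inode=$1
    relpath=$2
    printf '%08X|%s\n' "$inode" "$(printf '%s' "$relpath" | tr '/' '|')"
}

trash_entry_name 80839 test1          # prints 00013BC7|test1
trash_entry_name 80839 dir/sub/file   # prints 00013BC7|dir|sub|file
```

Because the resulting names contain the | character, they have to be quoted whenever they are passed to shell commands.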

 

The full path of the file relative to the mounting point can be read by reading this special file, and changed by writing to it:

# ls -l /mnt/mfs-test-meta/trash/*test1

-rw-r--r-- 1 user users 1 2007-08-09 15:23 /mnt/mfs-test-meta/trash/00013BC7|test1

# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'

test1

# echo 'test/test2' > '/mnt/mfs-test-meta/trash/00013BC7|test1'

# cat '/mnt/mfs-test-meta/trash/00013BC7|test1'

test/test2

 

Moving this file to the trash/undel subdirectory causes restoring of the original file in a proper MooseFS file system - at path set in a way described above or the original path (if it was not changed).

# mv '/mnt/mfs-test-meta/trash/00013BC7|test1' /mnt/mfs-test-meta/trash/undel/

 

Note: if a new file with the same path already exists, restoring the file will not succeed.

Similarly, the file cannot be moved to trash/undel under a different file name.

 

Deleting the file from the "trash can" results in releasing space previously taken up by it (with a delay - the data is deleted asynchronously). In such cases it is impossible to restore the file.

 

It is also possible to change the number of copies or the time of storing files in the "trash can" with mfssetgoal and mfssettrashtime tools (like for the files on the proper MooseFS).

 

Besides the trash and trash/undel directories, MFSMETA holds a third directory, reserved, containing files that are intended for final removal but are still open. These files will be erased and their data will be deleted immediately after the last user closes them. Files in the reserved directory are named the same way as those in trash, but no further operations are possible for these files.

 

Taking snapshots

 

Another characteristic feature of the MooseFS system is the possibility of taking a snapshot of the file or directory tree with the mfsmakesnapshot command:

 

$ mfsmakesnapshot source ... destination

 

(In case of normal file duplication, data of the file can be changed by another process writing to the source file. mfsmakesnapshot prepares a copy of the whole file (or files) in one operation. Furthermore, until modification of any of the files takes place, the copy does not take up any additional space.)

 

After such operation, subsequent writes to the source file do not modify the copy (nor vice versa).

 

Alternatively, file snapshots can be created using mfsappendchunks utility, which works like mfssnapshot known from MooseFS 1.5:

$ mfsappendchunks destination-file source-file ...

When multiple source files are given, their snapshots are added to the same destination file, padding each to chunk boundary (64MB).
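The padding described above is ordinary round-up arithmetic. The sketch below is hypothetical and assumes that "64MB" means 64 * 1024 * 1024 bytes; it shows how much room one source file would occupy in the destination once padded to a chunk boundary.

```shell
# Hypothetical helper: round a file length up to the next chunk boundary.
# Assumes "64MB" means 64 * 1024 * 1024 bytes.
CHUNK_BYTES=$((64 * 1024 * 1024))

pad_to_chunk() {
    len=$1
    echo $(( (len + CHUNK_BYTES - 1) / CHUNK_BYTES * CHUNK_BYTES ))
}

pad_to_chunk 100000000   # prints 134217728 (two chunks)
```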

 

Additional attributes

 

Additional attributes of file or directory (noowner, noattrcache, noentrycache) can be checked, set or deleted using mfsgeteattr, mfsseteattr and mfsdeleattr utilities, which behave similarly to mfsgetgoal/mfssetgoal or mfsgettrashtime/mfssettrashtime. See mfstools manual page for details.

 

MOOSEFS MAINTENANCE

 

Starting MooseFS cluster

 

The safest way to start MooseFS (avoiding any read or write errors, inaccessible data or similar problems) is to run the following commands in this sequence:

  • start mfsmaster process
  • start all mfschunkserver processes
  • start mfsmetalogger processes (if configured)
  • when all chunkservers get connected to the MooseFS master, the filesystem can be mounted on any number of clients using mfsmount (you can check if all chunkservers are connected by checking master logs or CGI monitor).
 

Stopping MooseFS cluster

 

To safely stop MooseFS:

  • unmount MooseFS on all clients (using the umount command or an equivalent)
  • stop chunkserver processes with the mfschunkserver stop command
  • stop metalogger processes with the mfsmetalogger stop command
  • stop master process with the mfsmaster stop command.
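For a test setup where every MooseFS role runs on one machine, the stop order above might be scripted as in the following sketch. The function names and the DRY_RUN variable are this example's own inventions; in a real cluster the steps run on different hosts and have to be coordinated separately.

```shell
# Sketch of the stop order for a single host running every MooseFS role.
# With DRY_RUN set, commands are printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stop_moosefs() {
    mountpoint=${1:-/mnt/mfs}
    # Each step runs only if the previous one succeeded.
    run umount "$mountpoint" &&
    run mfschunkserver stop &&
    run mfsmetalogger stop &&
    run mfsmaster stop
}
```

Setting DRY_RUN=1 before calling stop_moosefs /mnt/mfs prints the four commands in order without touching the system, which is a cheap way to review the sequence first.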
 

Maintenance of MooseFS chunkservers

 

Provided that there are no files with a goal lower than 2 and no under-goal files (which can be checked with the mfsgetgoal -r and mfsdirinfo commands), it is possible to stop or restart a single chunkserver at any time. When you need to stop or restart another chunkserver afterwards, be sure that the previous one is connected and there are no under-goal chunks.
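The first half of that precondition can be checked mechanically. The function below is a hypothetical sketch that scans mfsgetgoal -r output, in the format of the samples shown earlier, for files with goal 1. It does not detect under-goal chunks, which still have to be checked separately.

```shell
# Read "mfsgetgoal -r" output on stdin; fail if any files have goal 1.
# The line format is taken from the sample output shown earlier.
no_goal_one_files() {
    ! grep -Eq '^[[:space:]]*files with goal[[:space:]]+1[[:space:]]*:'
}

# Sample input copied from the goal examples in this guide.
no_goal_one_files <<'EOF' && echo "no goal-1 files found"
/mnt/mfs-test/test2:
files with goal 2 : 36
directories with goal 2 : 1
EOF
```

On a live system the input would come from the tool itself, for example mfsgetgoal -r /mnt/mfs | no_goal_one_files.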

 

MooseFS metadata backups

 

There are two general parts of metadata:

  • main metadata file (metadata.mfs, named metadata.mfs.back when the mfsmaster is running), synchronized each hour
  • metadata changelogs (changelog.*.mfs), stored for last N hours (configured by BACK_LOGS setting)

The main metadata file needs regular backups with the frequency depending on how many hourly changelogs are stored. Metadata changelogs should be automatically replicated in real time. Since MooseFS 1.6.5, both tasks are done by mfsmetalogger daemon.
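Where an additional copy of the metadata is wanted besides the metalogger, something like the following sketch could be run from cron. The function name and directory layout are hypothetical; it simply copies the two kinds of files named above into a timestamped directory and makes no attempt to synchronize with the hourly metadata dump.

```shell
# Sketch: copy the main metadata file and changelogs from the master data
# directory into a timestamped backup directory. Paths are hypothetical.
backup_metadata() {
    data_dir=$1
    dest_root=$2
    dest="$dest_root/mfs-meta-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    cp "$data_dir/metadata.mfs.back" "$dest/" || return 1
    # Changelogs may be absent on a freshly started master.
    for log in "$data_dir"/changelog.*.mfs; do
        if [ -e "$log" ]; then
            cp "$log" "$dest/" || return 1
        fi
    done
    # Print the directory that was created.
    echo "$dest"
}
```

With the FHS-style paths from the configure example, a call might look like backup_metadata /var/lib/mfs /backup, where /backup is a placeholder for the real backup location.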

 

MooseFS master recovery

 

In case of mfsmaster crash (due to e.g. host or power failure) last metadata changelog needs to be merged into the main metadata file. It can be done with the mfsmetarestore utility; the simplest way to use it is:

$ mfsmetarestore -a

If the master data is stored in a location other than the one specified during MooseFS compilation, the actual path needs to be specified using the -d option, e.g.:

$ mfsmetarestore -a -d /storage/mfsmaster

 

MooseFS master recovery from a backup

 

In order to restore the master host from a backup:

  • install mfsmaster in normal way
  • configure it using the same settings (e.g. by retrieving mfsmaster.cfg file from the backup)
  • retrieve metadata.mfs.back file from the backup or metalogger host, place it in mfsmaster data directory
  • copy last metadata changelogs from any metalogger running just before master failure into mfsmaster data directory
  • merge metadata changelogs using mfsmetarestore command as specified before - either using mfsmetarestore -a, or by specifying actual file names using non-automatic mfsmetarestore syntax, e.g.

$ mfsmetarestore -m metadata.mfs.back -o metadata.mfs changelog.*.mfs

 

Please also read a mini howto about preparing a fail proof solution in case of outage of the master server. In that document we present a solution using CARP and in which metalogger takes over functionality of the broken master server.