Cromfs: Compressed ROM filesystem for Linux (user-space)
0. Contents
This is the documentation of cromfs-1.5.10.2.
1. Purpose

Cromfs is a compressed read-only filesystem for Linux.
It uses the LZMA compression algorithm from 7-zip
and a powerful block merging mechanism that is especially efficient
with gigabytes of large files containing a lot of redundancy.
The primary design goal of cromfs is compression power.
It is much slower than its peers, and uses more RAM.
If all you care about is "powerful compression"
and "random file access", then you will be happy with cromfs.
The creation of cromfs was inspired by Squashfs and Cramfs.
The downloading section is at the bottom
of this page.
2. News
3. Overview
- Data, inodes, directories and block lists are stored compressed
- Files are divided into fragments and those fragments are stored as
offsets to solid blocks (fblocks) containing data, meaning that parts
of different files are compressed together for effective compression,
and identical fragments are compressed only once.
- Duplicate inodes, files and even duplicate file portions are detected
and stored only once without extra overhead
- Most inode types recognized by Linux are supported (see comparisons).
- LZMA compression is used for fblocks.
In the general case, LZMA compresses better than gzip and bzip2.
- Being a filesystem, the files on a cromfs volume can be
randomly accessed in arbitrary order, by all the means one
would expect, including memory mapping.
- Works on 64-bit and 32-bit systems.
See
the documentation of the cromfs format for technical details
(also included in the source package as doc/FORMAT).
4. Limitations
- Filesystem is write-once, read-only. It is not possible to append
to a previously created filesystem, nor is it possible to mount it read-write.
- Max filesize: 2^64 bytes (16777216 TB), but 256 TB with default settings.
- Max number of files in a directory: 2^30 (smaller if filenames are longer, but still more than 100000 in almost all cases)
- Max number of inodes (all files, dirs etc combined): 2^60, but depends on file sizes
- Max filesystem size: 2^64 bytes (16777216 TB)
- There are no "." or ".." entries in directories. This does not matter in Linux.
- cromfs and mkcromfs are slower than their peers.
- The cromfs-driver consumes a lot of memory. It is not
suitable for very size-constrained systems.
- Maximum filename length: 4294967295 bytes
- Maximum symlink length: 65535 bytes
- Being a user-space filesystem, it might not be suitable for
root filesystems of rescue, tiny-Linux and installation disks.
(Facts needed.)
- For device inodes, a hardlink count of 1 is assumed.
(This has no effect on compression efficiency.)
5. Development status
Development status: Stable. (Really: progressive.)
(Fully functional release exists, but is updated from time to time.)
Cromfs has been in beta stage for over a year, during which time
very few bugs have been reported, and no known bugs remain at
this time.
It does not make sense to keep it as "beta" indefinitely,
but since there is never going to be a "final" version (new
versions may always be released), it is now labeled
as "progressive".
In practice, the author trusts it works as advertised, but as per GPL policy,
there is NO WARRANTY whatsoever. The entire risk to the quality and performance
of the program suite is with you.
#include "GNU gdb/show warranty"
6. Comparing to other filesystems
This is all probably very biased and hypothetical,
and by no means a scientific study, but here goes:
Feature | Cromfs | Cramfs (1.1) | Squashfs (4.2) | Cloop |
Compression unit | adjustable arbitrarily (2 MB default) | adjustable, must be power of 2 (4 kB default) | adjustable, must be power of 2 (1 MB max) | adjustable in 512-byte units (1 MB max) |
Files are compressed (up to block size limit) | Together | Individually | Individually, except for fragments | Together |
Maximum file size | 16 EB (2^64 bytes) (theoretical; actual limit depends on settings) | 16 MB (2^24 bytes) | 16 EB (2^64 bytes) (4 GB before v3.0) | Depends on slave filesystem |
Maximum filesystem size | 16 EB (2^64 bytes) | 272 MB | 16 EB (2^64 bytes) (4 GB before v3.0) | 16 EB (2^64 bytes) |
Duplicate whole file detection | Yes | No | Yes | No |
Hardlinks detected and saved | Yes | Yes | Yes, since v3.0 | Depends on slave filesystem |
Near-identical file detection | Yes (identical blocks) | No | No | No |
Compression method | LZMA | gzip (patches exist to use LZMA) | gzip, LZO (since 4.1), XZ (LZMA2, since 4.2) | gzip or LZMA |
Ownerships | uid,gid (since version 1.1.2) | uid,gid (but gid truncated to 8 bits) | uid,gid | Depends on slave filesystem |
Timestamps | mtime only | None | mtime only | Depends on slave filesystem |
Endianness safety | Theoretically safe (untested on bigendian) | Safe, but not exchangeable | Safe, but not exchangeable | Depends on slave filesystem |
Linux kernel driver | No | Yes | Yes | Yes |
Userspace driver | Yes (fuse) | No | An extraction tool (unsquashfs) | Yes (third-party, using fuse). Cloop itself provides an extraction tool (extract_compressed_fs), but it cannot be used to extract a single file. |
Windows driver | No | No | No | No |
Appending to a previously created filesystem | No | No | Yes | No (the slave filesystem can be decompressed, modified, and compressed again, but in a sense, so can every other of these.) |
Mounting as read-write | No | No | No | No |
Supported inode types | all | all | all | Depends on slave filesystem |
Fragmentation (good for compression, bad for access speed) | Depends on compression settings | None | File tails only | Depends on slave filesystem |
Holes (aka. sparse files); storage optimization of blocks which consist entirely of nul bytes | Any two identical blocks are merged and stored only once. | Supported | Supported | Depends on slave filesystem |
Padding (partially filled sectors, wastes space) | No | Unknown | Mostly not | Depends on slave filesystem, usually yes |
Extended attributes | No | Unknown | Unknown | Unknown, may depend on slave filesystem |
Note: If you notice that this table contains wrong information,
please contact me telling what it is and I will change it.
Note: cromfs now saves the uid and gid in the filesystem. However,
when the uid is 0 (root), the cromfs-driver returns the uid of the
user who mounted the filesystem, instead of root. Similarly for gid.
This is both for backward compatibility and for security.
If you mount as root, this behavior has no effect.
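The ownership rule in the note above can be sketched as a small function. This is only an illustration written in Python, not the driver's actual code; the names effective_id, stored_id and mounter_id are made up for this sketch.

```python
def effective_id(stored_id, mounter_id):
    """Return the uid (or gid) reported for a file.

    A stored id of 0 (root) is replaced by the id of the user who
    mounted the filesystem; any other id is reported unchanged.
    """
    return mounter_id if stored_id == 0 else stored_id


# A root-owned file mounted by user 1000 appears to be owned by 1000.
print(effective_id(0, 1000))    # 1000
# A file owned by uid 500 keeps its owner.
print(effective_id(500, 1000))  # 500
# Mounting as root changes nothing: root-owned files stay root-owned.
print(effective_id(0, 0))       # 0
```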
6.1. Compression tests
Note: I use the -e and -r options in all of these mkcromfs tests
to avoid unnecessary decompression+recompression steps, in order
to speed up the filesystem generation. This has no effect on the
compression ratio.
In this table, k equals 1024 bytes (2^10)
and M equals 1048576 bytes (2^20).
Note: Again, these tests have not been peer-verified so it is not
a real scientific study. But I attest that these are the results I got.
Item | 10783 NES ROMs (2523 MB) | Firefox 2.0.0.5 source code (233 MB) (MD5sum 5a6ca3e4ac3ebc335d473cd3f682a916) | Damn small Linux liveCD (113 MB) (size taken from "du -c" output in the uncompressed filesystem) |
Cromfs | mkcromfs -s65536 -c16 -a… -b… -f…: with 16M fblocks, 2k blocks: 198,553,574 bytes (v1.4.1); with 16M fblocks, 1k blocks: 194,813,427 bytes (v1.4.1); with 16M fblocks, ¼k blocks: 187,575,926 bytes (v1.5.0) | mkcromfs: with default options: 33,866,164 bytes (v1.5.2) (peak memory use (RSS): 97 MB, mostly comprising of memory-mapped files) | mkcromfs -f1048576: with 64k blocks (-b65536): 39,778,030 bytes (v1.2.0); with 16k blocks (-b16384): 39,718,882 bytes (v1.2.0); with 1k blocks (-b1024): 40,141,729 bytes (v1.2.0) |
Cramfs v1.1 | mkcramfs -b65536: dies prematurely, "filesystem too big" | mkcramfs: with 2M blocks (-b2097152): 65,011,712 bytes; with 64k blocks (-b65536): 64,618,496 bytes; with 4k blocks (-b4096): 77,340,672 bytes | mkcramfs -b65536: 51,445,760 bytes |
Squashfs v3.2 | mksquashfs -b65536 (using an optimized sort file): 1,185,546,240 bytes | mksquashfs: 49,139,712 bytes | mksquashfs -b65536: 50,028,544 bytes |
Cloop v2.05~20060829 | create_compressed_fs (using an iso9660 image created with mkisofs -R): using 7zip, 1M blocks (-B1048576 -t2 -L-1): 1,136,789,006 bytes | create_compressed_fs (using an iso9660 image created with mkisofs -RJ): using 7zip, 1M blocks (-B1048576 -L-1): 46,726,041 bytes (1 MB is the maximum block size in cloop) | create_compressed_fs (using an iso9660 image): using 7zip, 1M blocks (-B1048576 -L-1): 48,328,580 bytes; using zlib, 64k blocks (-B65536 -L9): 50,641,093 bytes |
7-zip (p7zip) v4.30 (an archive, not a filesystem) | 7za -mx9 -ma=2 a: with 32M blocks (-md=32m): 235,037,017 bytes; with 128M blocks (-md=128m): 222,523,590 bytes; with 256M blocks (-md=256m): 212,533,778 bytes | 7za -mx9 -ma=2 -md=256m a: 29,079,247 bytes (peak memory use: 2545 MiB) | 7za -mx9 -ma2 a: 37,205,238 bytes |
An explanation why mkcromfs beats 7-zip in the NES ROM packing test:
7-zip packs all the files together as one stream. The maximum dictionary
size in 32-bit mode is 256 MB.
(Note: The default for "maximum compression" is 32 MB.)
When 256 MB of data has been packed and more data comes in,
similarities between the first megabytes of data and the latest data are
not utilized. For example, Mega Man and Rockman are two
almost identical versions of the same image, but because there are more
than 400 MB of files between them when they are processed in
alphabetical order, 7-zip does not see that they are similar, and will
compress each one separately.
7-zip's chances could be improved by sorting the files so that it will
process similar images sequentially. It already attempts to accomplish
this by sorting the files by filename extension and filename, but it
is not always the optimal way, as shown here.
mkcromfs however keeps track of all blocks it has encoded, and will remember
similarities no matter how long ago they were added to the archive.
(See section 9.0.6, Block indexing, for how it does that.)
This is why it outperforms 7-zip in this case, even
when it only used 16 MB fblocks.
In the liveCD compressing test, mkcromfs does not beat 7-zip because this
advantage is too minor to overcome the overhead needed to provide random
access to the filesystem. It still beats cloop, squashfs and cramfs though.
6.2. Speed tests
Speed testing hasn't been done yet. It is difficult to test the speed,
because it depends on factors such as cache (with compressed filesystems,
decompression consumes CPU power but usually only needs to be done once)
and block size (bigger blocks need more time to decompress).
However, in the general case, it is quite safe to assume
that mkcromfs is the
slowest of all. The same goes
for resource testing (RAM).
cromfs-driver requires an amount of RAM proportional to a few factors.
It can be approximated with this formula:
Max_RAM_usage = FBLOCK_CACHE_MAX_SIZE × fblock_size + READDIR_CACHE_MAX_SIZE × 60k + 8 × num_blocks
Where
- fblock_size is the value of "--fblock" used when the filesystem was created
- FBLOCK_CACHE_MAX_SIZE is a constant defined in cromfs.cc (default: 10)
- READDIR_CACHE_MAX_SIZE is a constant defined in cromfs.cc (default: 3)
- 60k is an estimate of a large directory size (2000 files with average name length of 10-20 letters)
- num_blocks is the number of block structures in the filesystem
(maximum size is ceil(total_size_of_files / block_size),
but it may be smaller.)
For example, for a 500 MB archive with 16 kB blocks and 1 MB fblocks,
the memory usage would be around 10.4 MB
(10 × 1 MB + 3 × 60 kB + 8 × 32000 bytes).
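The formula can be evaluated mechanically. The sketch below is a plain transcription of it into Python; the function and parameter names are invented for this example, and the defaults are the constants quoted above (10 cached fblocks, 3 cached directories, 60k per directory). It uses the upper bound for num_blocks, so the real figure may be smaller.

```python
def driver_ram_estimate(total_size, block_size, fblock_size,
                        fblock_cache_max=10, readdir_cache_max=3,
                        dir_size_estimate=60 * 1024):
    """Approximate peak RAM use of cromfs-driver, in bytes."""
    # Upper bound on the number of blocks (ceiling division).
    num_blocks = -(-total_size // block_size)
    return (fblock_cache_max * fblock_size
            + readdir_cache_max * dir_size_estimate
            + 8 * num_blocks)


MB = 1024 * 1024
estimate = driver_ram_estimate(500 * MB, 16 * 1024, 1 * MB)
print(f"{estimate} bytes = {estimate / MB:.1f} MB")
# prints "10926080 bytes = 10.4 MB"
```

The fblock cache term dominates the result, which is why the fblock size is the first thing to adjust when memory is tight.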
7. Getting started
- Install the development requirements: make, gcc-c++ and fuse
- Remember that for fuse to work, the kernel must also include fuse support.
Do "modprobe fuse", then check that "/dev/fuse" exists and that it works.
- Configure the source code:
$ ./configure
It will automatically determine your software environment
(mainly, the features supported by your compiler).
- Build the programs:
$ make
This builds the programs "cromfs-driver", "cromfs-driver-static",
"util/mkcromfs", "util/cvcromfs" and "util/unmkcromfs".
- Create a sample filesystem:
$ util/mkcromfs . sample.cromfs
- Mount the sample filesystem:
$ mkdir sample
$ ./cromfs-driver sample.cromfs sample
- Observe the sample filesystem:
$ cd sample
$ du
$ ls -al
- Unmounting the filesystem:
$ cd ..
$ fusermount -u sample
8. Tips
8.0.1. To improve compression
To improve the compression, try these tips:
- Do not change --lzmafastbytes. The default value is 273,
which is the maximum possible.
- Specify values for --lzmabits , such as --lzmabits 2,0,3 .
This will make the final compression phase considerably
faster.
- Adjust the block size (--bsize) in mkcromfs. If your files
have a lot of identical content, aligned at a certain boundary,
use that boundary as the block size value. If you are uncertain,
use a small value (500-5000) rather than a bigger value (20000-400000).
Too small values will however make inodes large, so keep it sane.
Note: The value does not need to be a power of two.
- Adjust the fblock size (--fsize) in mkcromfs. Larger values
almost always give better compression. However, large values
also increase memory consumption when the filesystem is mounted,
so keep it sane. If uncertain, use the default value (2097152).
Note: The value does not need to be a power of two.
- Adjust the --autoindexperiod option (-A). A smaller value will
increase the chances of mkcromfs finding an identical block
from something it already processed (if your data has that
opportunity). Finding that two blocks are identical always
means better compression.
- Sort your files. Files which have similar or partially
identical content should be processed right after one another.
- Adjust the --bruteforcelimit option (-c). Larger values will require
mkcromfs to check more fblocks for each block it encodes (making the
encoding much slower), in the hope it improves compression.
Basically, --bruteforcelimit is a way to virtually multiply
the --fsize (thus improving compression) by an integer factor
without increasing the memory or CPU usage of cromfs-driver.
Using it is recommended, unless you want mkcromfs to be fast.
The upper limit on meaningful values for the -c option is the
number of fblocks on the resulting filesystem.
If uncertain, try something like the value of 33554432 / fsize.
For 2 MB fblocks, that would make -c16.
- You can approximate how many blocks your filesystem will
have by this formula: total_amount_of_unique_data / bsize.
- If the value is less than 65536, use the
--16bitblocknums (-2) option. It will theoretically save
(number_of_blocks*2) bytes of uncompressed room by making
inodes smaller.
- If the value is less than 16777216, use the
--24bitblocknums (-3) option. It will theoretically save
(number_of_blocks) bytes of uncompressed room by making
inodes smaller.
Due to LZMA compression, the saving in file size might become
negligible, but it will make cromfs-driver slightly faster,
and there are no speed penalties.
- Adjust the --lzmabits values. This affects the compression
phase of mkcromfs (the last phase after blockifying)
- Use "--lzmabits full" if you have
absolutely no regard for compression time — it will try each
and every combination of pb, lp and lc and choose the one that results
in best LZMA compression — for every compressed item separately.
It is 225 times slower than the normal way.
- Use "--lzmabits auto" if you want mkcromfs to use a heuristic
algorithm to choose the parameters based on a few experiments.
It is 27…200 times slower than the normal way,
depending on the data. This is enabled by default. Specifying
"full" or giving the values manually overrides it.
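Two of the rules of thumb above are simple arithmetic and can be written down as a sketch. The helper names below are invented for this example and are not part of mkcromfs; the thresholds and the 33554432 / fsize rule are taken from the tips above.

```python
def suggest_bruteforcelimit(fsize):
    """Rule of thumb from the tips: 33554432 / fsize."""
    return 33554432 // fsize


def suggest_blocknum_option(unique_data_bytes, bsize):
    """Pick the narrowest block-number option that fits the estimate.

    Returns None when neither option applies and the default
    block number width should be kept.
    """
    num_blocks = -(-unique_data_bytes // bsize)  # ceiling division
    if num_blocks < 65536:
        return "--16bitblocknums"
    if num_blocks < 16777216:
        return "--24bitblocknums"
    return None


MB = 1024 * 1024
print(suggest_bruteforcelimit(2 * MB))               # 16
print(suggest_blocknum_option(500 * MB, 16384))      # --16bitblocknums
print(suggest_blocknum_option(2523 * MB, 1024))      # --24bitblocknums
```

The block count here is only an estimate based on the amount of unique data, so a result close to a threshold should be treated with caution.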
8.0.2. To improve mkcromfs speed
To improve the filesystem generation speed, try these tips:
- Use the --decompresslookups option (-e), if you have the
diskspace to spare.
- Use a large value for the --randomcompressperiod option,
for example -r100000. This together with -e will significantly
improve the speed of mkcromfs, at the cost of temporary disk
space usage. A small value causes mkcromfs to randomly compress
one of the temporary fblocks more often. It has no effect on
the compression ratio of the resulting filesystem.
- Use the TEMP environment variable to control where the temp
files are written. Example: TEMP=~/cromfs-temp ./mkcromfs …
- Specify a low value for --lzmafastbytes in the mkcromfs command
line. This will cause LZMA to consume less memory and be faster,
at the cost of compression power. The default value is 273 (maximum).
The minimum possible value is 5.
- Use larger block size (--bsize). Smaller blocks mean more blocks
which means more work. Larger blocks are less work.
- Do not use the --bruteforcelimit option (-c). The default value 0
means that the candidate fblock will be selected straightforwardly.
- If you have a multicore system, add the --threads option.
Select --threads 2 if you have a dual core system, for example.
You can also use a larger value than the number of cores, but
the same guidelines apply as with -j in GNU make. Currently
this option does not affect compression power, so it is
recommended to use it.
- Use "--lzmabits 2,0,3" (or other values of your choice) to
make LZMA compression about 27 times faster, with a slight
cost of compression power. The default option is "auto",
which tests a number of different lzmabits values to end
up with hopefully optimal compression.
8.0.3. To control the memory usage
To control the memory usage, use these tips:
- Adjust the fblock size (--fsize). The memory used by cromfs-driver
is directly proportional to the size of your fblocks. It keeps at
most 10 fblocks decompressed in the RAM at a time. If your fblocks
are 4 MB in size, it will use 40 MB at max.
- In mkcromfs, adjust the --autoindexperiod option (-A). This will
not have effect on the memory usage of cromfs-driver, but it will
control the memory usage of mkcromfs. If you have lots of RAM, you
should use smaller --autoindexperiod (because it will improve the chances
of getting better compression results), and use bigger if you have less RAM.
- Find the CACHE_MAX_SIZE settings in cromfs.cc and edit them. This will
require recompiling the source. (In future, this should be made a command
line option for cromfs-driver.)
- In mkcromfs, adjust the block size (--bsize). The RAM usage of mkcromfs
is directly proportional to the number of blocks (and the filesystem size),
so smaller blocks require more memory and larger require less.
- Adjust the --blockindexmethod option. Different values of this option
have different effect on the virtual memory use of mkcromfs (it does
not affect cromfs-driver, though).
Use "--blockindexmethod none" and "-A0" if you want the smallest possible
memory usage for your selected block size. It has an impact on the compression
power, but you can compensate it by using a large value for the --bruteforcelimit
option instead, if you don't mind longer runtime.
8.0.4. To control the filesystem speed
To control the filesystem speed, use these tips:
- The speed of the underlying storage affects the speed of the filesystem.
- The bigger your fblocks (--fsize), the bigger the latencies are.
cromfs-driver caches the decompressed fblocks, but opening a non-cached
fblock requires decompressing it entirely, which will block the user
process for that period of time.
- The smaller your blocks (--bsize), the bigger the latencies are, because
there will be more steps to process for handling the same amount of data.
- Use the most powerful compiler and compiler settings available
for building cromfs-driver. This helps the decompression and cache lookups.
- Use fast hardware…
8.0.5. Using cromfs with automount
Since version 1.3.0, you can use cromfs in conjunction with the
automount (autofs) feature present in the Linux kernel. This allows
you to mount cromfs volumes automatically on demand, and unmount
them when they are not used, conserving free memory.
This line in your autofs file (such as auto.misc) will do the trick
(assuming the path you want is "books", and your volume
is located at "/home/myself/books.cromfs"):
books -fstype=fuse,ro,allow_other :/usr/local/bin/cromfs-driver\#/home/myself/books.cromfs
9. Understanding the concepts
Skip over this section if you don't consider yourself technically inclined.
cromfs workings are explained in a nutshell
here.
9.0.1. Inode
Every object in a filesystem (from user's side) is an "inode".
This includes at least symlinks, directories, files, fifos and device entries.
The inode contains the file attributes and its contents, but
not its name.
(The name is contained in a directory listing, along with the reference to the inode.)
This is the traditional way in *nix systems.
When a file is "hardlinked" into multiple locations in the filesystem,
the inode is not copied. The inode number is just listed in multiple
directories.
A symlink, however, is an entirely new inode unrelated to
the file it points to.
The file attributes and the file contents are stored separately.
In cromfs, the inode contains an array of
block numbers, which are necessary in finding the actual contents of the file.
9.0.2. Block
The contents of every file (denoted by the inode) are divided into "blocks".
The size of this block is controlled by the --bsize commandline parameter.
For example, if your file is 10000 bytes in size, and your bsize is 4000,
the file contains three blocks: 4000 + 4000 + 2000 bytes.
The inode thus contains three block numbers,
which refer to entries in the block table.
Only regular files, symlinks and directories have "contents" that need
storing. Device entries, for example, do not have associated contents.
The contents of a directory is a list of file names and inode numbers.
Every time mkcromfs stores a new block, a new block number is generated
to denote that particular block (this number is stored in the inode),
and a new data locator is stored
to describe where the block is found (the locator is stored in the block table).
If mkcromfs reused a previously generated data locator,
only the block number needs to be stored.
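The way a file is cut into blocks can be illustrated with a few lines of Python. This is only a sketch of the arithmetic described in this section; split_into_blocks is a name made up for the example and does not exist in mkcromfs.

```python
def split_into_blocks(file_size, bsize):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    if bsize <= 0:
        raise ValueError("bsize must be positive")
    full, tail = divmod(file_size, bsize)
    # All blocks are bsize bytes long, except possibly the last one.
    return [bsize] * full + ([tail] if tail else [])


print(split_into_blocks(10000, 4000))  # [4000, 4000, 2000]
print(split_into_blocks(8000, 4000))   # [4000, 4000]
```

The length of the returned list is the number of block numbers the inode has to hold, which is why very small --bsize values make inodes large.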
9.0.3. Fblock
Fblock is a storage unit in a cromfs filesystem.
It is the physical container of block data for multiple files.
When mkcromfs creates a new filesystem, it splits each file into blocks
(see above), and for each of those blocks, it determines which fblock
they go to. The maximum fblock size is mandated by the --fsize commandline
parameter.
Each fblock is compressed separately, so a few big fblocks compress better
than many small fblocks.
Cromfs automatically creates as many fblocks as are needed to store the
contents of the entire filesystem being created.
An fblock is merely a storage container.
Regardless of the sizes of the blocks and fblocks, an fblock may
contain any number of blocks, from 1 upwards (no upper limit).
It is beneficial for blocks to overlap, and this is an important
source of the power of cromfs.
The working principle behind fblocks is: What is the shortest
string that can contain all these substrings?
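To make the overlapping idea concrete, here is a minimal Python sketch of placing blocks into a single fblock. It is an illustration under simplifying assumptions, not the placement algorithm mkcromfs actually uses: it handles only one fblock, reuses a block if it already occurs anywhere in it, and otherwise appends the block while sharing the longest matching suffix/prefix. The name place_block is invented for this example.

```python
def place_block(fblock, block, fsize):
    """Store block in fblock (a bytearray) and return its offset.

    Returns None if the block does not fit within fsize bytes.
    """
    # 1. The block may already be present somewhere in the fblock.
    pos = fblock.find(block)
    if pos != -1:
        return pos
    # 2. Otherwise find the longest suffix of the fblock that is
    #    also a prefix of the block, and append only the rest.
    overlap = 0
    for n in range(min(len(fblock), len(block)), 0, -1):
        if fblock.endswith(block[:n]):
            overlap = n
            break
    if len(fblock) + len(block) - overlap > fsize:
        return None
    offset = len(fblock) - overlap
    fblock.extend(block[overlap:])
    return offset


fblock = bytearray()
for word in (b"digital", b"tall", b"organ", b"ant", b"ital"):
    print(word, place_block(fblock, word, fsize=64))
# offsets: digital 0, tall 4, organ 8, ant 11, ital 3
print(bytes(fblock))  # b'digitallorgant'
```

Five blocks totalling 23 bytes end up occupying 14 bytes of fblock space, before any LZMA compression is applied.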
9.0.4. Block number and block table
The filesystem contains a structure called "blktab" (block table),
which is a list of data locators.
This list is indexed by block number.
Each locator describes where to find the particular
block denoted by this block number.
At the end of the filesystem creation process, the blktab is compressed
and becomes "blkdata" before being written into the filesystem.
(These names are only useful when referencing the filesystem format
documentation; they are not found in the filesystem itself.)
9.0.5. Data locator
A data locator tells cromfs where to find the contents of a particular block.
It is composed of an fblock number and an offset into that fblock.
These locators are stored in the global block table, as explained above.
Multiple files may share the same data locators, and multiple data
locators may point to the same, partially overlapping data.
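The chain from inode to file contents (block numbers, then the block table, then fblock data) can be sketched as follows. This is a simplified in-memory model for illustration only: the real structures are compressed and stored in the format described in doc/FORMAT, and the names read_file, blktab and fblocks are used here merely as Python variables.

```python
def read_file(block_numbers, blktab, fblocks, bsize, file_size):
    """Reassemble a file's contents from its inode's block numbers."""
    out = bytearray()
    remaining = file_size
    for blocknum in block_numbers:
        fblock_num, offset = blktab[blocknum]  # the data locator
        length = min(bsize, remaining)         # the last block may be short
        out += fblocks[fblock_num][offset:offset + length]
        remaining -= length
    return bytes(out)


# One decompressed fblock and a block table with three data locators.
fblocks = [b"digitallorgant"]
blktab = [(0, 0), (0, 4), (0, 10)]

# Two files sharing block number 1 (the bytes "tall").
print(read_file([0, 1], blktab, fblocks, bsize=4, file_size=8))  # b'digitall'
print(read_file([1, 2], blktab, fblocks, bsize=4, file_size=7))  # b'tallgan'
```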
9.0.6. Block indexing (mkcromfs only)
When mkcromfs stores blocks, it remembers where it stored them, so that
if it later finds an identical block in another file (or the same file),
it won't need to search fblocks again to find a best placement.
The index is a map of block hashes to data locators and block numbers.
The --autoindexperiod (-A) setting can be used to extend this mechanism:
in addition to the blocks it has already encoded, mkcromfs will memorize more
locations in those fblocks. It creates "just in case" data locators
for future use, but does not actually save them in the block table unless
they are utilized later.
This helps compression when the number of fblocks searched (--bruteforcelimit)
is low compared to the number of fblocks generated, at the cost of memory
consumed by mkcromfs. It also has the potential to make mkcromfs faster
(but it may also make it slower).
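A toy version of such an index is sketched below. Several things in it are assumptions made for the sake of a runnable example: the hash function (SHA-1 from Python's hashlib, not necessarily what mkcromfs uses), the class and method names, and the reading of the autoindex period as "memorize one extra location every period bytes".

```python
import hashlib


class BlockIndex:
    """Maps a block's hash to (block number, data locator)."""

    def __init__(self):
        self._by_hash = {}

    @staticmethod
    def _key(block):
        return hashlib.sha1(bytes(block)).digest()

    def lookup(self, block):
        """Return (blocknum, (fblock_num, offset)) or None if unknown."""
        return self._by_hash.get(self._key(block))

    def remember(self, block, blocknum, locator):
        """Record where an encoded block was stored."""
        self._by_hash.setdefault(self._key(block), (blocknum, locator))

    def autoindex(self, fblock_num, fblock, bsize, period):
        """Memorize 'just in case' locations; they have no block number yet."""
        for offset in range(0, len(fblock) - bsize + 1, period):
            chunk = fblock[offset:offset + bsize]
            self._by_hash.setdefault(self._key(chunk),
                                     (None, (fblock_num, offset)))


index = BlockIndex()
index.remember(b"abcd", blocknum=0, locator=(0, 0))
index.autoindex(0, b"abcdefgh", bsize=4, period=2)
print(index.lookup(b"abcd"))  # (0, (0, 0))
print(index.lookup(b"cdef"))  # (None, (0, 2))
print(index.lookup(b"zzzz"))  # None
```

A smaller period stores more entries, which matches the trade-off described above: better chances of a match, at the cost of memory.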
9.0.7. Random compress period (mkcromfs only)
When mkcromfs runs, it generates a temporary file for each fblock of the
resulting filesystem. If your resulting filesystem is large, those temporary
fblocks will take even more space than the result.
To save disk space, mkcromfs compresses those fblocks when they are not
accessed. However, if it needs to access them again (to search the contents
for a match), it will need to decompress them first.
This compressing+decompressing may consume lots of time. It does not help
the size of the resulting filesystem; it only saves some temporary disk space.
If you are not concerned about temporary disk space, you should give
the --randomcompressperiod option a large number (such as 10000) to
prevent it from needlessly decompressing+compressing the fblocks
over and over again. This will improve the speed of mkcromfs.
The --decompresslookups option is related. If you use the
--randomcompressperiod option, you should also enable --decompresslookups.
By the way, the temporary files are written to wherever
your TEMP environment variable points.
TMP is also recognized.
9.0.8. Where are the inodes stored then?
All the inodes of the filesystem are also stored together, in a single file.
That file is packed like any other file: split into blocks and
scattered into fblocks. The data locator list of that file is stored
in a special inode called "inotab", which is not seen in any
directory. The "inotab" has its own place in the cromfs file.
10. Using cromfs in bootdisks and tiny Linux distributions
Cromfs can be used in bootdisks and tiny Linux distributions only
by starting the cromfs-driver from a ramdisk (initrd), and then
pivot_rooting into the mounted filesystem (but not before the
filesystem has been initialized; there is a delay of a few seconds).
Theoretical requirements to use cromfs in the root filesystem:
- Cromfs-driver should probably be statically linked
(the Makefile automatically builds a static version
since version 1.2.2).
- An initrd, that contains the cromfs-driver program
- Fuse driver in the kernel (it may be loaded from the initrd).
- Constructing an
unionfs
mount from a ramdisk
and the cromfs mountpoint to form a writable root
Do not use cromfs in machines that are low on RAM!
11. Other applications of cromfs
The compression algorithm in cromfs can be used to determine how similar
some files are to each other.
This is an example output of the following command:
$ unmkcromfs --simgraph fs.cromfs '*.qh' > result.xml
from a sample filesystem:
<?xml version="1.0" encoding="UTF-8"?>
<simgraph>
<volume>
<total_size>64016101</total_size>
<num_inodes>7</num_inodes>
<num_files>307</num_files>
</volume>
<inodes>
<inode id="5595"><file>45/qb5/ir/basewc.qh</file></inode>
<inode id="5775"><file>45/qb5/ir/edit.qh</file></inode>
<inode id="5990"><file>45/qb5/ir/help.qh</file></inode>
<inode id="6220"><file>45/qb5/ir/oemwc.qh</file></inode>
<inode id="6426"><file>45/qb5/ir/qbasic.qh</file></inode>
<inode id="18833"><file>c6ers/newcmds/toolib/doc/contents.qh</file></inode>
<inode id="19457"><file>c6ers/newcmds/toolib/doc/index.qh</file></inode>
</inodes>
<matches>
<match inode1="5595" inode2="5990"><bytes>396082</bytes><ratio>0.5565442944</ratio></match>
<match inode1="5595" inode2="6220"><bytes>456491</bytes><ratio>0.6414264256</ratio></match>
<match inode1="5990" inode2="6220"><bytes>480031</bytes><ratio>0.6732618693</ratio></match>
</matches>
</simgraph>
It reads a cromfs volume generated earlier, and outputs statistics of it.
Such statistics can be useful in refining further compression, or just
finding useful information regarding the redundancy of the data set.
It follows this DTD:
<!ENTITY % INTEGER "#PCDATA">
<!ENTITY % REAL "#PCDATA">
<!ENTITY % int "CDATA">
<!ELEMENT simgraph (volume, inodes, matches)>
<!ELEMENT volume (total_size, num_inodes, num_files)>
<!ELEMENT total_size (%INTEGER;)>
<!ELEMENT num_inodes (%INTEGER;)>
<!ELEMENT num_files (%INTEGER;)>
<!ELEMENT inodes (inode*)>
<!ELEMENT inode (file+)>
<!ATTLIST inode id %int; #REQUIRED>
<!ELEMENT file (#PCDATA)>
<!ELEMENT matches (match*)>
<!ELEMENT match (bytes, ratio)>
<!ATTLIST match inode1 %int; #REQUIRED>
<!ATTLIST match inode2 %int; #REQUIRED>
<!ELEMENT bytes (%INTEGER;)>
<!ELEMENT ratio (%REAL;)>
Once you have generated the filesystem, running the --simgraph query is
a relatively cheap operation (but still O(n^2) in the number of files):
it involves analyzing the structures created by mkcromfs, and does not
require any search on the actual file contents. However, the similarity
information it reports can only be as fine-grained as the options used
when generating the filesystem (level of compression) allow.
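The XML output is easy to post-process. As a sketch, the following Python snippet uses the standard xml.etree.ElementTree module to find the most similar pair of files; the SAMPLE string is a shortened copy of the example output above, and load_simgraph is a name made up for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<simgraph>
<volume><total_size>64016101</total_size><num_inodes>7</num_inodes><num_files>307</num_files></volume>
<inodes>
<inode id="5595"><file>45/qb5/ir/basewc.qh</file></inode>
<inode id="5990"><file>45/qb5/ir/help.qh</file></inode>
<inode id="6220"><file>45/qb5/ir/oemwc.qh</file></inode>
</inodes>
<matches>
<match inode1="5595" inode2="5990"><bytes>396082</bytes><ratio>0.5565442944</ratio></match>
<match inode1="5595" inode2="6220"><bytes>456491</bytes><ratio>0.6414264256</ratio></match>
<match inode1="5990" inode2="6220"><bytes>480031</bytes><ratio>0.6732618693</ratio></match>
</matches>
</simgraph>"""


def load_simgraph(xml_text):
    """Return ({inode id: [file names]}, [(id1, id2, bytes, ratio), ...])."""
    root = ET.fromstring(xml_text)
    names = {inode.get("id"): [f.text for f in inode.findall("file")]
             for inode in root.find("inodes").findall("inode")}
    matches = [(m.get("inode1"), m.get("inode2"),
                int(m.findtext("bytes")), float(m.findtext("ratio")))
               for m in root.find("matches").findall("match")]
    return names, matches


names, matches = load_simgraph(SAMPLE)
a, b, nbytes, ratio = max(matches, key=lambda m: m[3])
print(f"{names[a][0]} <-> {names[b][0]}: {nbytes} bytes, ratio {ratio:.3f}")
# prints "45/qb5/ir/help.qh <-> 45/qb5/ir/oemwc.qh: 480031 bytes, ratio 0.673"
```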
12. Copying and contributing
cromfs has been written by Joel Yliluoma, a.k.a. Bisqwit,
and is distributed under the terms of the
General Public License version 3 (GPL3).
The LZMA code from the LZMA SDK is in the public domain.
The LZO code from liblzo2.03 embedded within is licensed
under GPL version 2 or later.
Patches and other related material can be submitted to the author
by e-mail at: Joel Yliluoma <bisqwit@iki.fi>
The author also wishes to hear if you use cromfs, and for what you
use it and what you think of it.
You can discuss CROMFS at Freenode, on #cromfs.
12.1. Contribution wishes
The author wishes for the following things to be done
to this package.
- Topic: Mature enough to be included in distributions.
- Manual pages of each utility (hopefully somehow autogenerated
so that they won't be useless when new options are added)
- Improve the configure script to make it cope better
with different Fuse API versions
and different compiler versions
- Install and uninstall rules in Makefile
- Topic: Increasing usability
- A proof of concept example of utilizing cromfs
in a root filesystem (with initramfs)
- Add appending support (theoretically doable, just not very fast)
- Add threading in cromfs-driver.
Needs write-locks in fblock_cache and readdir_cache.
Possibly in BWT too.
Also blktab and fblktab if those are being changed.
- Topic: Documentation
- Graphical illustration on the filesystem structure
(fs consists of fblocks, and files are split in blocks
which are actually indexes to various fblocks)
- Document the modular structure of the source code
- Topic: Portability
- Topic: Increasing compression power
- A fast and powerful approximation of the
shortest common superstring algorithm
is needed in mkcromfs.
Input description: A set of strings S_1, …, S_n.
Problem description: What is the shortest string S'
such that each S_i, 1 ≤ i ≤ n, appears as a
substring of S'?
For example, for input
["digital","organ","tall","ant"],
it would produce "organtdigitall" or "digitallorgant".
Note: This problem seems to reduce to an Asymmetric Travelling
Salesman Problem, which is NP-hard or NP-complete.
The task here is to find a good approximation
that doesn't consume a lot of resources.
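For readers who want to experiment with the problem, one well-known heuristic is the greedy approach: repeatedly merge the two strings with the largest overlap. The Python sketch below implements that heuristic. It is not the method mkcromfs uses, it is not guaranteed to find the shortest superstring, and its cubic-or-worse running time makes it unsuitable for real filesystem data; it only serves to make the problem statement tangible.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_superstring(strings):
    """Greedy approximation of the shortest common superstring."""
    # Drop duplicates and strings already contained in another string.
    items = sorted(s for s in set(strings)
                   if not any(s != t and s in t for t in strings))
    while len(items) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    n = overlap(a, b)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        merged = items[i] + items[j][n:]
        items = [s for k, s in enumerate(items) if k not in (i, j)]
        items.append(merged)
    return items[0] if items else ""


result = greedy_superstring(["digital", "organ", "tall", "ant"])
print(result, len(result))  # digitallorgant 14
```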
13. Requirements
- GNU make and gcc-c++ are required to recompile the source code.
- The filesystem works under the Fuse
user-space filesystem framework. You need to install both the Fuse kernel
module and the userspace programs before mounting Cromfs volumes.
You need Fuse version 2.5.2 or newer.
- liblzo2-dev is recommended on i386 platforms.
If it is missing, mkcromfs will use a version shipped in the package.
14. Links
15. Downloading
The official home page of cromfs is at
http://iki.fi/bisqwit/source/cromfs.html.
Check there for new versions.
Additionally, the most recent source code (bleeding edge) for cromfs can also be downloaded by cloning the Git repository by:
Generated from
progdesc.php (last updated: Wed, 08 Jan 2014 07:36:49 +0200)
with docmaker.php (last updated: Wed, 08 Jan 2014 07:36:49 +0200)
at Wed, 08 Jan 2014 07:36:49 +0200