
Compression

From SOMWiki


Why should I use Data Compression?

Data compression reduces large, statistically redundant files to smaller ones. This lets us use our more expensive resources, such as drive space on the SAN, more efficiently.

For example, if you have a large dataset (over 1 GB in size) that you use rarely, perhaps once a quarter, compressing it saves disk space on the cluster. You can uncompress the file when you are ready to access it again.
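One way to find candidates for compression is to search for large files that have not been read recently. The following is a minimal sketch, assuming a POSIX find; find_cold_files is a hypothetical helper name, not an installed command, and access times may be unreliable on file systems mounted with noatime.

```shell
# Hypothetical helper: list files above a size threshold that have not been
# read for more than a given number of days. The size is given in 512-byte
# blocks, the unit POSIX find uses by default (1 GB is 2097152 blocks).
find_cold_files() {
    dir=$1
    min_blocks=$2
    min_days=$3
    find "$dir" -type f -size +"$min_blocks" -atime +"$min_days" -print
}

# Example call (commented out; ./data is an assumed path):
# find_cold_files ./data 2097152 90
```

The example call would list files over 1 GB under ./data that have not been accessed in more than 90 days.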

By using data compression, we can avoid enforcing disk quotas.

What types of compression software are available for use?

Compression software exists for every modern operating system. Some operating systems, such as Linux, Solaris, and Windows XP, come equipped with the ability to compress files in a particular format.

Software Compression Table

Software | Notes                                                                         | Solaris | Linux | OS X | Windows
---------+-------------------------------------------------------------------------------+---------+-------+------+---------------------------
gzip     | Standard UNIX utility, excellent compression, good cross-platform support     | Yes     | Yes   | Yes  | Supported by WinZip/WinRAR
bzip2    | Best overall patent-free compression; the preferred utility                   | Yes     | Yes   | Yes  | Supported by WinRAR
pbzip2   | Parallel version of the preferred utility                                     | Yes     | Yes   | Yes  | Supported by WinRAR
zip      | PKZIP compression, good for text or recursive directories                     | Yes     | Yes   | Yes  | Yes (native in Windows XP)
compress | Legacy compress, not recommended                                              | Yes     | Yes   | Yes  | No
rar      | Good for Windows files, not recommended for UNIX usage                        | Yes     | No    | No   | With the WinRAR utility
tar      | Not a compression utility, but the preferred method for archiving directories | Yes     | Yes   | Yes  | Supported by WinZip/WinRAR

How to use bzip2

Bzip2 offers the best overall compression of the utilities listed above and is our preferred method. We suggest using bzip2 together with the tar utility to back up entire directory structures. In some cases, you can reduce the size of a file by up to 90%.

How to compress a single file using bzip2

The syntax for compressing a single file is very simple. Note: use -9 for all compression; this is the maximum compression level available.

 bzip2 -9 <filename> 

Example:

 bzip2 -9 mydata.txt

This command will compress the file mydata.txt to mydata.txt.bz2 using the bzip2 utility.
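To see how much space a file actually saves, you can compare sizes before and after compression. The sketch below is illustrative only: it builds a made-up, highly redundant sample file in a temporary directory and uses the bzip2 -k option to keep the original alongside the compressed copy. The variable names are assumptions, and real datasets will compress less dramatically.

```shell
# Build a small, highly redundant sample file (a stand-in for a real dataset).
workdir=$(mktemp -d)
sample="$workdir/mydata.txt"
yes "sample line of repeated text" | head -n 5000 > "$sample"

# -k keeps the original; without it, bzip2 replaces mydata.txt with mydata.txt.bz2.
bzip2 -9 -k "$sample"

# Compare sizes and compute the percentage of space saved.
orig_bytes=$(wc -c < "$sample")
comp_bytes=$(wc -c < "$sample.bz2")
saved_pct=$(( (orig_bytes - comp_bytes) * 100 / orig_bytes ))
echo "original: $orig_bytes bytes, compressed: $comp_bytes bytes, saved: ${saved_pct}%"

rm -rf "$workdir"
```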

How to compress a single file using pbzip2

The syntax for compressing a single file is very simple. Note: use -9 for all compression; this is the maximum compression level available.

 pbzip2 -9 <filename> 

Example:

 pbzip2 -9 mydata.txt

This command will compress the file mydata.txt to mydata.txt.bz2 using the pbzip2 utility.
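On a shared machine you may want to limit how many processors pbzip2 uses. The following sketch uses the pbzip2 -p option for that, and falls back to plain bzip2 when pbzip2 is not installed so that the example still runs. The sample file and the choice of 2 processors are assumptions for illustration.

```shell
workdir=$(mktemp -d)
sample="$workdir/mydata.txt"
yes "sample line of repeated text" | head -n 5000 > "$sample"

# Use pbzip2 when it is installed; otherwise fall back to bzip2.
if command -v pbzip2 >/dev/null 2>&1; then
    pbzip2 -9 -p2 "$sample"      # -p2 limits pbzip2 to 2 processors
else
    bzip2 -9 "$sample"
fi

# pbzip2 writes ordinary bzip2 data, so bzip2 -t can verify the result.
if bzip2 -t "$sample.bz2"; then
    pbzip2_check="ok"
else
    pbzip2_check="failed"
fi
echo "integrity check: $pbzip2_check"

rm -rf "$workdir"
```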

How to uncompress a single file using bunzip2

The syntax for uncompressing a single file is very simple:

 bunzip2 <filename>
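Before uncompressing, it can be useful to check that the file is intact, or to read it without writing the uncompressed copy to disk. A minimal sketch, using a throwaway sample file in a temporary directory (the file contents and variable names are made up for illustration):

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/mydata.txt"
bzip2 -9 "$workdir/mydata.txt"            # leaves only mydata.txt.bz2

# Read the contents without uncompressing the file on disk.
first_line=$(bzip2 -dc "$workdir/mydata.txt.bz2" | head -n 1)

# Test integrity first, then restore the original file name.
bzip2 -t "$workdir/mydata.txt.bz2" && bunzip2 "$workdir/mydata.txt.bz2"

restored=$(cat "$workdir/mydata.txt")
echo "first line: $first_line"

rm -rf "$workdir"
```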

How to compress and archive an entire directory structure using bzip2 and tar

 tar -jcvf <backup file.tar.bz2> <directory structure> 

In the following example, we back up the directory structure crsp_data into the archive mydata.tar.bz2.

 tar -jcvf ./mydata.tar.bz2 ./crsp_data
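After creating an archive, it is worth listing its contents before removing the original directory. The sketch below builds a small stand-in for crsp_data in a temporary directory (the file names inside it are made up) and lists the archive with the tar -t option.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/crsp_data/daily"
echo "placeholder" > "$workdir/crsp_data/daily/a.txt"
echo "placeholder" > "$workdir/crsp_data/readme.txt"

# Create the archive from inside workdir so the stored paths stay relative.
( cd "$workdir" && tar -jcf ./mydata.tar.bz2 ./crsp_data )

# -t lists the archive contents without extracting anything.
listing=$(tar -jtf "$workdir/mydata.tar.bz2")
echo "$listing"

rm -rf "$workdir"
```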

How to uncompress an archive of a directory structure made with bzip2 and tar

 tar -jxvf <backup file.tar.bz2> -C <location> 

Example:

 tar -jxvf ./mydata.tar.bz2 -C /home/mydirectory/data

This will extract the archive mydata.tar.bz2, including the directory structure it contains, into /home/mydirectory/data.
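Note that tar does not create the directory given to -C; it must already exist. A minimal sketch of a safe restore, using temporary paths as stand-ins for the real locations:

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/crsp_data"
echo "placeholder" > "$workdir/crsp_data/a.txt"
( cd "$workdir" && tar -jcf ./mydata.tar.bz2 ./crsp_data )

# tar -C fails if the target directory is missing, so create it first.
dest="$workdir/restore/data"
mkdir -p "$dest"
tar -jxf "$workdir/mydata.tar.bz2" -C "$dest"

restored_text=$(cat "$dest/crsp_data/a.txt")
echo "restored: $restored_text"

rm -rf "$workdir"
```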

How to compress a file using gcomp (bzip2 over the grid)

 qsub /usr/local/bin/gcomp <file> 

Example:

 qsub /usr/local/bin/gcomp mydata.csv 

This will offload a parallel bzip2 job to one of the cluster machines. The grid chooses the least busy node, which saves computing resources on the machine you are logged in to. The parallel version of bzip2 will use all available processors in that machine to help speed up compression.



How to use gzip

Gzip is an excellent overall compression utility. We suggest using gzip together with the tar utility to back up entire directory structures. In some cases, you can reduce the size of a file by up to 70%.

How to compress a single file using gzip

The syntax for compressing a single file is very simple. Note: use -9 for all compression; this is the maximum compression level available.

 gzip -9 <filename> 

Example:

 gzip -9 mydata.txt

This command will compress the file mydata.txt to mydata.txt.gz using the gzip utility.
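If you want to keep the original file, you can write the compressed data to standard output with -c instead of compressing in place. The sketch below does this on a made-up sample file and then uses gzip -l to report the sizes and compression ratio. The variable names are assumptions for illustration.

```shell
workdir=$(mktemp -d)
sample="$workdir/mydata.txt"
yes "sample line of repeated text" | head -n 5000 > "$sample"

# -c writes to standard output, so the original file is left in place.
gzip -9 -c "$sample" > "$sample.gz"

# -l reports the compressed size, uncompressed size, and ratio.
gzip -l "$sample.gz"

gz_orig=$(wc -c < "$sample")
gz_comp=$(wc -c < "$sample.gz")

rm -rf "$workdir"
```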

How to uncompress a single file using gunzip

The syntax for uncompressing a single file is very simple:

 gunzip <filename>
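To confirm that a file survives the round trip unchanged, you can compare checksums before compressing and after uncompressing. A minimal sketch with a throwaway file, using the standard cksum utility (the file contents are made up):

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/mydata.txt"
sum_before=$(cksum < "$workdir/mydata.txt")

gzip -9 "$workdir/mydata.txt"             # leaves only mydata.txt.gz
gzip -t "$workdir/mydata.txt.gz"          # exit status 0 means the file is intact
gunzip "$workdir/mydata.txt.gz"           # restores mydata.txt

sum_after=$(cksum < "$workdir/mydata.txt")
if [ "$sum_before" = "$sum_after" ]; then
    echo "round trip OK"
fi

rm -rf "$workdir"
```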

How to compress and archive an entire directory structure using gzip and tar

 tar -zcvf <backup file.tar.gz> <directory structure> 

In the following example, we back up the directory structure crsp_data into the archive mydata.tar.gz.

 tar -zcvf ./mydata.tar.gz ./crsp_data
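When tar is run with -z, gzip compresses at its default level rather than at -9. If you want the maximum level, one option is to pipe the tar stream through gzip yourself. The sketch below assumes a small stand-in for crsp_data in a temporary directory; this approach also works with tar versions that lack the -z option.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/crsp_data"
echo "placeholder" > "$workdir/crsp_data/a.txt"

# Write the archive to standard output (-f -) and compress it with gzip -9.
( cd "$workdir" && tar -cf - ./crsp_data | gzip -9 > ./mydata.tar.gz )

# The result is a normal .tar.gz file that tar -z can list or extract.
gz_listing=$(tar -ztf "$workdir/mydata.tar.gz")
echo "$gz_listing"

rm -rf "$workdir"
```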

How to uncompress an archive of a directory structure made with gzip and tar

 tar -zxvf <backup file.tar.gz> -C <location>

Example:

 tar -zxvf ./mydata.tar.gz -C /home/mydirectory/data

This will extract the archive mydata.tar.gz, including the directory structure it contains, into /home/mydirectory/data.
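If you only need one file from a large archive, you can name it after the archive and tar will extract just that member. A minimal sketch, with made-up file names in a temporary directory; the member name must match the path exactly as tar -ztf prints it.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/crsp_data" "$workdir/out"
echo "wanted" > "$workdir/crsp_data/a.txt"
echo "other"  > "$workdir/crsp_data/b.txt"
( cd "$workdir" && tar -zcf ./mydata.tar.gz ./crsp_data )

# Extract only a.txt; b.txt stays inside the archive.
tar -zxf "$workdir/mydata.tar.gz" -C "$workdir/out" ./crsp_data/a.txt

single_text=$(cat "$workdir/out/crsp_data/a.txt")
if [ -e "$workdir/out/crsp_data/b.txt" ]; then
    other_extracted="yes"
else
    other_extracted="no"
fi
echo "extracted: $single_text (b.txt extracted: $other_extracted)"

rm -rf "$workdir"
```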
