SUMMARY: /usr/ucb/compress - safe to use?

From: Ove Hansen (hansen@SCR.SLB.COM)
Date: Wed Jun 17 1992 - 12:22:24 CDT


Firstly, thanks to everyone who answered my question about whether
/usr/ucb/compress in the SunOS distribution is safe to use, and which
compression algorithm it uses. I must be blind - *of course* my manual page
mentions that it's called the `Lempel-Ziv' algorithm *as well as* the article by
Welch. A lot of you asked for a summary, so here it is:

Many of you said you had never had problems with `compress', used it daily, and
didn't think there was anything wrong with it. The proof is in the pudding, as
they say, and a test run is included at the end. If you use quotas (and possibly
if the file system fills up) this is what you risk getting. The first part of
the test uses a modified `compress' (fixed as described below), the second part
uses the program supplied by Sun, `/usr/ucb/compress'.

The solution for me was to get the source code from the net and apply the
following fix, suggested by dl2n+@andrew.cmu.edu (the included test shows that
it works):

>> In the function copystat() it [compress.c] will have code looking like
>>
>> fclose(stdout);
>>
>> which doesn't check for error. This [should be] changed to
>>
>> if (fclose(stdout) == EOF) {
>>     perror(ofname);
>>     unlink(ofname);
>>     return;
>> }
>>
>> which reports error, destroys the runt compressed file and returns
>> before any damage can be done.

The modifications Sun has made to its version of `compress' are listed below.
I'm not really sure what I lose, as far as (1) and (2) are concerned, by using
the source from the net:

>> The only differences between the "compress" that comes
>> with SunOS 4.1.2, and the one that comes with 4.3-tahoe, are:
>> 1) signal handlers declared to return "void" rather than "int" in SunOS
>> version (matching the new-style definition of signal handlers);
>> 2) SIGSEGV caught slightly later;
>> 3) a little less eager to remove the *output* file if it gets a write
>> error or gets interrupted.
>> The compression algorithm is identical to the one in 4.3-tahoe.
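
To illustrate point (1) only: a new-style handler is declared to return `void'
and takes the signal number as its argument. The sketch below shows such a
handler removing a partial output file on interrupt. The names on_interrupt(),
install_cleanup() and pending_output are invented for illustration and do not
come from either version of the compress source.

```c
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical names, for illustration only. */
static const char *pending_output = NULL;  /* output file being written */
volatile sig_atomic_t interrupted = 0;     /* set once the handler has run */

/* New-style handler: returns void, receives the signal number. */
void on_interrupt(int sig)
{
    (void)sig;
    if (pending_output != NULL)
        unlink(pending_output);  /* discard the partial output file */
    interrupted = 1;
}

/* Remember the output name and install the handler for SIGINT.
 * Returns 0 on success, -1 if the handler could not be installed. */
int install_cleanup(const char *ofname)
{
    pending_output = ofname;
    return signal(SIGINT, on_interrupt) == SIG_ERR ? -1 : 0;
}
```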

The compression algorithm itself is 100% safe to use, as blymn@baobab.awadi.com
and many others have said:

>> The algorithm used is the Lempel-Ziv(-Welch?) compression algorithm,
>> it is a true lossless compression.

A lot of you suggested that the files be moved to /tmp, where there are of
course no disk quotas. Unfortunately there is no guarantee that /tmp will not
fill up, and people may forget to move the files, or manage to bypass any
scripts I would have to write to do this, in which case the problem will recur.

Others suggested `drop quotas' - I tried once on a couple of disks on my central
servers - that was just the start of a lot of problems - never again! (On local
disks used by just a few people, I don't bother to use quotas.)

I was not aware of `pack' (thanks to all of you who pointed it out), which is
part of the Sun System V option. `pack' seems to work fine with quotas: if they
are exceeded it gives a warning, `write error - file unchanged'. Unfortunately
it's not as efficient as `compress' for any files I have tried, and it doesn't
accept filenames longer than 12 characters!

chrome# pack muir.im8
pack: muir.im8: 80.5% Compression
chrome# compress -v muir.im8
muir.im8: Compression: 96.94% -- replaced with muir.im8.Z

[136]viridian% pack dbreport.all
pack: dbreport.all: 51.3% Compression
[139]viridian% pack dbreport.allx
pack: dbreport.allx: file name too long

`zip' and `zoo' have been recommended to me by a lot of people. `zip' is
supposedly even more efficient than `compress', and I will have a closer look at
both later. I did a search on `archie' for them, so if anyone wants to know an
ftp site for `compress', `zip' or `zoo', e-mail me. No other compression
software was mentioned.

Again, thanks to everyone who replied,
----------------------------------------------------------------------------
Ove Hansen e-mail : hansen@scr.slb.com
Schlumberger Cambridge Research Tel/fax: 0223-325246 / 0223-315486
P.O.Box 153, Cambridge CB3 0HG, England (International prefix for UK: 44)
============================================================================
[151]viridian% quota -v
There are no quotas on this system
Disk quotas for hansen (uid 58):
Filesystem             usage  quota  limit  timeleft  files  quota  limit  timeleft
/tmp_mnt/home/chrome1
                       42974  42000  43000  7.0 days   1072      0      0

[152]viridian% pwd
/tmp_mnt/home/chrome1/hansen/compresstest

[153]viridian% ls -l 2*                           <-- The original file is
-rw-r--r-- 1 hansen 178045 Jun 17 09:20 2             called `2', note size.

[154]viridian% ./compress -v 2                    <-- Using the MODIFIED
2: Compression: 83.90%2.Z: Disc quota exceeded        `compress' code...

[155]viridian% ls -l 2*                           <-- which does not replace
-rw-r--r-- 1 hansen 178045 Jun 17 09:20 2             the original file.

[156]viridian% /usr/ucb/compress -v 2             <-- but /usr/ucb/compress
2: Compression: 83.90% -- replaced with 2.Z           in the SunOS distribution
                                                      creates a .Z file...

[157]viridian% ls -l 2*                           <-- and replaces the original
-rw-r--r-- 1 hansen 24576 Jun 17 09:20 2.Z            with it. Seems OK,
                                                      doesn't it?

<< Here the quotas were changed on the server so that they wouldn't interfere >>

[158]viridian% /usr/ucb/uncompress -v 2           <-- Now let's uncompress `2'
2.Z: -- replaced with 2                               because we need it!

[159]viridian% ls -l 2*                           <-- And it's truncated, the
-rw-r--r-- 1 hansen 154115 Jun 17 09:20 2             user furious, and the
                                                      system administrator
                                                      loses more sleep while
                                                      thinking of a fix.
 


