SUMMARY: Need advice from a FORMAT expert

From: Peter Steele (peter@dragon.acadiau.ca)
Date: Fri Feb 05 1993 - 08:41:52 CST


My original question was a bit lengthy, but here's the crux of
what I asked:

> My question is this: What advantage/disadvantage does one
> disk geometry have over another ...
> How do the disk's physical characteristics come into play...

I had *many* replies, most (though not all) saying that the geometry
given to Sun's format command is largely an illusion when it comes to
SCSI disks. The Sun driver software just converts the cyl/head/sector
number into a block number and then asks the drive to retrieve that
block. The drive never sees the cylinder, head, or sector values; SCSI
commands address blocks only.
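
The conversion described above is plain arithmetic. Here is a minimal
sketch in Python, not taken from the Sun driver: the function names are
made up for illustration, and it assumes cylinders, heads, and sectors are
all numbered from 0.

```python
def chs_to_block(cyl, head, sector, heads, sectors_per_track):
    """Map a (cylinder, head, sector) triple to a linear block number.

    Assumption: all three values are numbered from 0.
    """
    if not 0 <= head < heads:
        raise ValueError("head outside the claimed geometry")
    if not 0 <= sector < sectors_per_track:
        raise ValueError("sector outside the claimed geometry")
    return (cyl * heads + head) * sectors_per_track + sector


def block_to_chs(block, heads, sectors_per_track):
    """Inverse mapping: the same block number under any claimed geometry."""
    cyl, rest = divmod(block, heads * sectors_per_track)
    head, sector = divmod(rest, sectors_per_track)
    return cyl, head, sector


if __name__ == "__main__":
    # The same block seen through two different claimed geometries.
    block = chs_to_block(1, 0, 0, heads=21, sectors_per_track=99)
    print(block)
    print(block_to_chs(block, heads=21, sectors_per_track=49))
```

Two different claimed geometries simply give two different names to the
same block, which is why the drive itself does not care which one is used.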

So, ultimately it doesn't matter what disk geometry you use. If
you specify something and can do a newfs without getting complaints,
then it will probably work. If you want to specify the real drive
geometry, it probably is a good idea. However, on many SCSI drives
today, there is actually a variable number of sectors per track,
so it is *impossible* to give format the real physical characteristics
of the disk. This is the case with the disk I was asking about. It's
a 2.9G Seagate disk, offering very high performance. I even called
Seagate to see what they recommended and the guy didn't really have
anything useful to say. He said if format/newfs accepts it, and
the geometry doesn't calculate an address beyond the size of the
disk, then it should be okay.

Some of the replies I got said that there might be some performance
degradation if a geometry is used that is really far from the real
geometry, although I'm not really convinced that this would be the
case.

Others suggested that if you want to save some disk space, you should
reduce the number of inodes that newfs generates by default and
decrease the minfree value. So, I created a 2G partition on the 2.9G drive
using

   newfs -i 20480 -m 3 -o time /dev/rsd2d

and the resulting filesystem was something like 400M larger than
when I used the newfs defaults. This -i option I chose reduced
the number of inodes from 970,000 to 97,000--more than enough
for our purposes.
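
For a rough feel of where the space goes, here is a back-of-the-envelope
sketch in Python. It is only an estimate under stated assumptions: an
on-disk inode size of 128 bytes (my assumption for the BSD fast file
system), an exact 2G partition, and no allowance for superblocks, cylinder
group rounding, or how df reports space. It should not be expected to
reproduce the figures I reported above; estimate_overhead is a made-up name.

```python
INODE_BYTES = 128  # assumed on-disk inode size; not taken from the post


def estimate_overhead(fs_bytes, bytes_per_inode, minfree_pct):
    """Return (inode count, bytes of inode table, bytes held back by minfree)."""
    inodes = fs_bytes // bytes_per_inode
    inode_table = inodes * INODE_BYTES
    reserve = (fs_bytes - inode_table) * minfree_pct // 100
    return inodes, inode_table, reserve


if __name__ == "__main__":
    size = 2 * 1024 ** 3  # assumed size of the 2G partition
    # (2048, 10) are the newfs defaults mentioned in the replies below;
    # (20480, 3) are the values used in the command above.
    for label, density, minfree in (("default", 2048, 10), ("tuned", 20480, 3)):
        inodes, table, reserve = estimate_overhead(size, density, minfree)
        print(label, inodes, table // 2 ** 20, "MB inodes,",
              reserve // 2 ** 20, "MB reserved")
```

The tenfold drop in inode count follows directly from the tenfold increase
in -i; the rest of the gain comes from the smaller minfree reserve.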

I'd like to thank all those who replied. I've included some of
the more interesting replies below:

> From: Mike Raffety <miker@il.us.swissbank.com>
>
> I came up with an even better geometry for your disk ... 383 x 79 x 188.
> That makes 5688316, losing only 131 sectors, and the two spare cylinders
> are fairly small this way.
>
> The geometry for SCSI disks is a total abstraction; a fiction, for the
> convenience of a Unix disk driver that still thinks it controls a disk
> based on its geometry. SCSI hides all that information, and rightfully
> so, since current disk technologies have varying numbers of sectors per
> track (more on the outside bigger/longer tracks, less on the inside).
>
> The SCSI spare cylinders and spare sectors are addresses entirely within
> the SCSI disk electronics; format will never be able to touch them, and
> so are not relevant to what format is asking for.
>
> (BTW, /usr/games/factor is VERY useful for coming up with three numbers
> that multiply out as closely as possible to an arbitrary number.)
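
The same kind of search can be done by brute force instead of with
factor. The Python sketch below is only illustrative: the limits on heads,
sectors, and cylinders are placeholder assumptions, not values checked
against format or the Sun driver, and the function names are made up. It
tries every heads x sectors pair and keeps the geometries that leave the
fewest blocks unaddressed.

```python
def wasted_blocks(total_blocks, cyls, heads, sectors):
    """Blocks left unaddressed by a cyls x heads x sectors geometry."""
    used = cyls * heads * sectors
    if used > total_blocks:
        raise ValueError("geometry addresses past the end of the disk")
    return total_blocks - used


def candidate_geometries(total_blocks, max_heads=255, max_sectors=255,
                         max_cyls=65535, limit=5):
    """Return up to `limit` tuples (wasted, cyls, heads, sectors), best first.

    The max_* limits are assumptions chosen for illustration only.
    """
    found = []
    for heads in range(1, max_heads + 1):
        for sectors in range(1, max_sectors + 1):
            cyls = total_blocks // (heads * sectors)
            if cyls < 1 or cyls > max_cyls:
                continue
            waste = wasted_blocks(total_blocks, cyls, heads, sectors)
            found.append((waste, cyls, heads, sectors))
    found.sort()
    return found[:limit]


if __name__ == "__main__":
    total = 5688447  # block count quoted for this drive in the replies
    print(wasted_blocks(total, 383, 79, 188))
    for entry in candidate_geometries(total):
        print(entry)
```

Note that this ranks only by leftover blocks; it does not account for the
size of the spare cylinders, which Mike also considered.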

> From: luisg@hadar.fai.com (Luis Galleguillos)
>
> Peter, you might go insane worrying about sectors and cylinders.
> You have a lot (repeat, a lot) of space now; no need to be more
> greedy or to be a miser.
>
> Translate the disk as follows:
>
> |-------------------------------------------------------------------|
> 5528 cyl
> 2.9GB ---->5528cyl
> 1MB ------>1.906 cyl approx.
>
> Let's say we divide the disk into 3 partitions:
> a 500MB
> b 1GB
> g 1.4GB
>
> The partitions would look like this:
>
> |--------|---------------------------|---------------------------------|
> 0        980                         2940                        5528cyl
>    980               1960                           2588
>
> Thus the second partition has 1960 cylinders and starts at cylinder 980.
>
> The last partition has only 1357.81 MB, which is close to the original
> design. This partition is simply what is left after all the others
> have been assigned, which takes full advantage of the entire disk.
>
> Luis

> From: matt@wbst845e.xerox.com (Matt Goheen)
>
> Well, the geometry of disks is really a holdover from the old days of
> SMD drives (and probably even earlier). In those days, you had to tell
> the controller the geometry of the disk because (I think) SMD commands
> were things like "seek to track 143, rotate the disk to position 32 and
> read from head 3". With SCSI disks it's more like "read block 5723".
> Another complication is that SCSI disks these days may not always have
> a uniform number of sectors per track (i.e. they may have fewer
> sectors on the inner tracks). In the old days, the disk driver was
> the thing responsible for optimizing transfers to/from the disk. The
> more the driver knew, the better it could do its job. This isn't
> really the case now. Disks have their own caches and drivers don't
> really know disk geometries.
>
> So, my answer to your question is -- use whatever works. There may be
> some limitations as far as cylinder group calculations go or other
> factors that might limit how large (or how small) certain parameters
> can be.
>
> Other than formatting, you may want to play with the number of inodes
> the file system has per cylinder group. This is typically much too
> large for most file systems (USENET news being a shining
> counter-example). For example, on the twenty partitions on my
> server, the average disk space used is around 60%, but the average
> number of inodes used is around 4% (I correct this now on new partitions).
> You can use the -i option to newfs to adjust this.
>
>
> - Matt Goheen

> From: Brian Styles <brian@mrc-bsu.cam.ac.uk>
>
> I can't make any useful contribution to the geometry discussion, apart from
> the observation that, with SCSI, it's rather an illusion anyway - it won't
> store its data in the way you are imagining.
>
> But don't forget to do some tuning with newfs: the inode density (-i flag)
> in particular. Depending on what is going onto those drives, you are likely
> to find the default (1 inode per 2048 bytes) excessively generous. To get
> some idea, use df -i on some similarly-populated partitions. The saving can
> be quite high if you use numbers like 8192. It is annoying to run out of inodes
> before real disk space, but the default assumes rather a lot of tiny files, by
> present standards (e.g. I notice that one of our users' home dir partitions,
> newfs'd _before_ we saw the light, is 98% full in space but only 7% in inodes).
> You might also try relaxing the minfree (-m) proportion below the default (10%).
> On any partition which is root-owned or substantially read-only (e.g. /usr/local)
> you can reduce it to zero. If you _do_ reduce it (and I can't see why anyone
> would need a 10% margin on a nearly 3GB drive), you may want to override the
> optimisation (-o) to "time", since it will otherwise opt for space with m<10%.
>
> We haven't found a way of imposing these parameters from inside suninstall,
> which is a nuisance, but I guess that's all blown away with Solaris 2+ !
>
> Good luck,
>
> Brian Styles
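
The df -i comparison Brian suggests can also be scripted. This is a small
Python sketch, assuming a Unix system where os.statvfs is available;
fs_usage is a made-up helper name. Some filesystems report zero total
inodes, in which case the inode figures come back as None.

```python
import os


def fs_usage(path):
    """Report space and inode usage for the filesystem holding `path`."""
    st = os.statvfs(path)
    used_blocks = st.f_blocks - st.f_bfree
    used_inodes = st.f_files - st.f_ffree
    return {
        "space_pct": 100.0 * used_blocks / st.f_blocks if st.f_blocks else None,
        "inode_pct": 100.0 * used_inodes / st.f_files if st.f_files else None,
        # Average bytes of data per inode actually in use: a rough guide
        # to what bytes-per-inode value the existing files would tolerate.
        "bytes_per_used_inode": (used_blocks * st.f_frsize / used_inodes
                                 if used_inodes > 0 else None),
    }


if __name__ == "__main__":
    print(fs_usage("/"))
```

A partition that is nearly full in space but only a few percent full in
inodes, like the one Brian mentions, is a candidate for a larger -i value
the next time it is rebuilt.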

> From: Perry_Hutchison.Portland@xerox.com
>
> Since a SCSI-interfaced disk deals only in block numbers across the
> interface, it should work with any combination of cyls/heads/sectors
> which does not exceed the actual capacity of the disk. However, the
> filesystem code attempts to optimize its access operations relative to
> the drive's geometry, so _in theory_ you should get better performance
> if the number of cylinders you claim in the format.dat matches the
> number which the drive actually has. A format.dat cylinder count which
> is a power of two times the actual count should also work pretty well.
>
> I'm a little puzzled by "2738 cylinders including 2 spares" -- this
> could mean that the drive internally reserves 2 of the 2738 as spares
> (and therefore acts externally as if it had only 2736), or that the
> manufacturer expects (as does SunOS) that 2 of the 2738 will be treated
> as spares by the OS.
>
> Dividing the claimed 5688447 blocks by 2738 cylinders does not come out
> even -- there are 1621 blocks left over. 2736 does not come out even
> either, but in this case there are only 303 blocks left over so 2736 may
> be the better number to use. This suggests that you specify either
> 2736 or 5472 cylinders in format.dat. Using 2736, we get 5688447 /
> 2736 = 2079 blocks per cylinder, and wonder of wonders this comes out
> even at 99 sectors per track using the manufacturer's figure of 21
> tracks per cylinder. So, what I would probably try would be 2736
> cylinders, 21 heads, and 99 sectors giving a total usable capacity of
> 2734 * 21 * 99 = 5683986 blocks, or 4461 "lost for spares and roundoff."
>
> Of course, if capacity is more important than performance, you might
> prefer to use your 5528x21x49 geometry, which loses only 2193 blocks --
> the difference of 2268 blocks is about 1Mb. Even if performance is a
> priority it might be worthwhile to do some testing as the effect of
> geometry mismatch on performance may not be all that significant.
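
The arithmetic in this reply is easy to check mechanically. A short Python
sketch follows; usable_blocks is a made-up helper, and the two alternate
cylinders and the 512-byte block size are assumptions carried over from
the discussion rather than values read from the drive.

```python
TOTAL_BLOCKS = 5688447  # block count quoted in the reply above
BLOCK_BYTES = 512       # assumed block size


def usable_blocks(cyls, heads, sectors, alt_cyls=2):
    """Blocks available once `alt_cyls` cylinders are set aside as alternates."""
    return (cyls - alt_cyls) * heads * sectors


if __name__ == "__main__":
    # Quotient and remainder when dividing the disk into whole cylinders.
    for cyls in (2738, 2736):
        print(cyls, divmod(TOTAL_BLOCKS, cyls))

    # Blocks lost to alternates and roundoff under each claimed geometry.
    lost_a = TOTAL_BLOCKS - usable_blocks(2736, 21, 99)
    lost_b = TOTAL_BLOCKS - usable_blocks(5528, 21, 49)
    print(lost_a, lost_b, (lost_a - lost_b) * BLOCK_BYTES)
```

Both geometries stay within the drive's capacity, and the gap between them
works out to a little over a megabyte, as Perry says.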

> From: Charles A Finnell <finnell@portia.mitre.org>
>
> The number of cylinders in each partition needs to be an exact multiple of the
> cylinder group size, which is 16. Otherwise, when newfs creates the partition,
> it will waste some cylinders, thereby foiling your attempts to maximize the
> usable space. Since the total number of cylinders on the disk is rarely an
> exact multiple of 16, I allocate the odd one as "h", and place it nearest the
> disk hub, at the highest addresses.
>
> -- Charlie <finnell%mdf@mwunix.mitre.org>
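
Charlie's rule can be sketched as follows, in Python. The group size of 16
is taken from his reply, plan_partitions is a made-up name, and the
example sizes are the cylinder counts from Luis's layout applied to a
5526-cylinder data area (5528 less 2 alternates), which is my assumption.

```python
def plan_partitions(total_cyls, wanted, group=16):
    """Lay out partitions whose sizes are multiples of `group` cylinders.

    `wanted` lists the desired sizes of all partitions except the last.
    The last partition takes the largest multiple of `group` that still
    fits, and the odd remainder is returned separately (Charlie's "h").
    Returns (list of (start, size), (leftover_start, leftover_size)).
    """
    plan, start = [], 0
    for size in wanted:
        size -= size % group  # round down to a whole number of groups
        plan.append((start, size))
        start += size
    if start > total_cyls:
        raise ValueError("requested partitions exceed the disk")
    rest = total_cyls - start
    last = rest - rest % group
    plan.append((start, last))
    return plan, (start + last, rest - last)


if __name__ == "__main__":
    plan, leftover = plan_partitions(5526, [980, 1960])
    print(plan)
    print(leftover)
```

The leftover is always smaller than one cylinder group, so very little is
given up by parking it at the end of the disk.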

> From: PHIL_LORGAN@NYMCS.Prime.COM
>
> The disk drive's physical geometry (i.e. sectors per track, total heads,
> physical cylinders, alternates) is very important. If /etc/format.dat does
> not have a match for the physical geometry of a particular drive, then you
> must use the type command in format and select "13: other". Several
> questions are asked which must be answered correctly based on the drive
> geometry and sector-per-track layout. Call your vendor for this type of
> drive and they should be able to furnish all the necessary info.

> From: vasey@issi.com
>
> My best guess: not much, especially considering that most of these
> disks are variable geometry drives, i.e., they have fewer sectors
> on the inner tracks (usually sub-divided by zones), and the controller
> remaps the SCSI block no. to the cyl/hd/sect address by a complicated
> scheme which is (fortunately ;^) hidden from the user.
>
> I suppose you could accidentally pick some combination of numbers and
> partition boundaries which would end up allocating critical sequences
> (eg, the i-list) inefficiently and make the disk thrash across zone
> boundaries occasionally, but that's not too likely, and virtually
> impossible to detect anyway, so I wouldn't lose much sleep over it.
>
> My personal preference is to use parameters that are fairly close to the
> physical characteristics of the unit, even if it means losing a few MB.
> And I occasionally gravitate to nice round numbers (eg, 15hd x 80sect)
> because they make space allocation easy (and pretty partition tables! ;^).
>
> Not very scientific, but hope this helps anyway ...
>
> ++ Ron vasey@issi.com International Software Systems Peace! ++
> 1+512+338-5724 9430 Research, Austin TX 78759 <><

> From: Wilson N G <noel@essex.ac.uk>
>
> Do NOT format disks with the wrong geometry; disk controllers are usually
> prepared to believe you if you try to specify a parameter that is smaller
> than the actual value, but seldom let you specify one that is bigger. The
> net result is that you can suddenly reduce the apparent size of your disk
> and, depending on the firmware involved, this can be permanent! A colleague
> of mine once decided to test a Prime drive by telling it it was only 5
> cylinders long, and Hey Presto, the drive was now irrevocably 5 cylinders
> long! It is a general principle that drives have already been formatted
> by the manufacturers using a far more stringent validation process than
> the one used by the format command, and have had most of the bad spots
> mapped out; all you do by reformatting is to risk losing track of some
> of them, so they crop up later in the middle of your data. Also, many
> modern disks don't have fixed geometry - they have more sectors per track
> on the outside cylinders than on the inside ones. My advice: get a label
> written on the drive that reflects the geometry given in the manual, and
> refrain from formatting it at all!
> Regards, Noel

> From: James.Ashton@syseng.anu.edu.au
>
> The disk drive will receive all its requests as block numbers so, as
> you have gathered, functionality is unaffected by the format geometry
> you give provided the total number of blocks is <= the number
> available. If you intend to run fast file systems on the partitions
> though you'll find that the further you deviate from the real values,
> the slower the file system will run. Mind you we're only talking a few
> percent unless you come up with a really pathological setup so perhaps
> you don't mind.
>
> Assuming you're using an FFS remember that you can save considerable
> space with judicious use of newfs parameters. Again assuming speed is
> not a big issue try using `newfs -m 2 -i 8192'. For a big file system
> without a large number of small files this will get you quite a bit
> more space than the default newfs values.
> ______________________________________________________________________________
> James Ashton System Administrator
> Department of Systems Engineering
> Voice +61 6 249 0681 Research School of Physical Sciences and Engineering
> FAX +61 6 249 2698 Australian National University
> Email James.Ashton@syseng.anu.edu.au GPO Box 4 Canberra ACT 2601 Australia

> From: acb@ziggy.csl.ncsu.edu (Andrew C. Burnette)
>
> Please call Seagate. Ask for the real specs.
>
> If you want more out of the disks, and trust them,
> simply use mkfs and supply the drive with a minfree value of 5% instead of the
> default 10%. This gives you an extra 5% free.
> I have done this on all my 1Gig drives and above without error; several
> have been running for more than three years (how long I have been here at NCSU).
>
> newfs -v will show you the mkfs call, and you can just control-c out of it,
> and replace the '10' near the end of the string with '5' and all will be well.
>
> Monkeying around with the other numbers to try and recover 1 meg of free
> space is dangerous. The numbers are REALLY specifications which describe
> the disk. Although SCSI-2 disks may use variable sectors per track,
> there are real numbers which the vendor will happily supply to you if asked.
> Good luck,
> --
> ******************************************************************************
> Andrew C. Burnette acb@ncsu.edu
> Electrical and Computer Engineering

> From: Christopher Lott AGSE <lott@informatik.uni-kl.de>
>
> I'm not a format expert but I followed your message
> very well and am very interested to hear what you
> come up with.
>
> I just went through a similar deliberation, although
> only with 500M drives, not 3G. I settled on using a
> setup that, uh, reflected what the engineering spec
> said. Unix does a lot of optimization for disk writes,
> and that work basically is for naught in the world of
> scsi drives because, as you apparently know well, all
> a scsi disk cares about are blocks.
>
> I too had real trouble pinning down where the red line
> that you dare not cross is. First, how many sectors
> does the scsi controller *really* need? The sun sysadmin
> book reports 4 cylinders; in your message you report 2.
> Then apparently the disk needs some sectors to store
> stuff internal just to it.
>
> A sufficiently smart disk should complain if the format
> tries to glom onto the sectors reserved for the disk.
> But what about the sectors that the cntlr needs? Will
> the cntlr silently overwrite the last few blocks of some
> filesystem one fine day?
>
> The disk guarantees X user sectors. What does this mean?
> I read it to mean that disk promises you can use every
> last one of these sectors and that the disk proper will
> never need even one of them. Fine, you knock off some
> for the controller and run. But how many?
>
> Another thing I've never been able to settle satisfactorily -
> where is the table of bad blocks written, and how is the
> responsibility shared between the disk and the o/s? I have
> extracted original manufacturer's defect lists from our
> disks before a format; where is that "original" list written?
> And when you remap a sector using the format program, then
> what?
>
> Finally, for the "atrks" parameter given to format - what
> does the o/s use those tracks for, and does it *really*
> use those tracks on a scsi drive?
>
> And who really did kidnap the Lindbergh baby? :-)
>
> When you get around to running newfs on those huge drives
> of yours, I'd recommend turning down the reserve %age from 10%
> to, oh, about 2% or less! Geez, 10% of just one of those
> huge drives is already 300Mb!! Boy, will you have fun running
> backups. Do you know about the Amanda backup system yet? It's
> a free system which does backups in parallel and is designed to
> use an Exabyte drive. We use it and it's GREAT. Anon ftp to
> ftp.cs.umd.edu and look in the amanda2 directory.
>
> chris...

-- 
Peter Steele        Unix Services Manager            peter.steele@acadiau.ca 
Acadia Univ., Wolfville, NS, Canada B0P 1X0  902-542-2201  Fax: 902-542-4364


