SUMMARY: ZFS in Production

From: Victor Engle <victor.engle_at_gmail.com>
Date: Tue Mar 23 2010 - 09:16:37 EDT
Sorry for the delay with this summary. I was surprised that production
use of zfs is pretty common. Unfortunately my client was not willing to
switch from his tried-and-true ufs.

The common thread in the responses was that zfs is particularly well
suited as storage for file services. It is less well suited for
databases, but still very good provided tuning best practices are
followed.

One responder pointed out that you can't take a LUN back from zfs as
you can with a Veritas disk group, for example.

All the responses I received follow...

#############################################################################
I wrote a perl script to take hourly snapshots and to remove old
ones.  The idea for it came from Apple's Time Machine concept.
The thinking behind it is that the users should be able to go back to a
snapshot several months old and copy back a file they deleted.
However, unlike Apple's Time Machine, this script does NOT make
snapshots on a second drive.  So, if the drive dies, the original data
and the snapshots all go with it.  I'll work on the next iteration
of the script to move the snapshots to a second drive/array.
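The responder's Perl script itself isn't reproduced in this summary; as
an illustration only, a minimal ksh sketch of the same idea might look
like the following (the filesystem name tank/home and the 90-day
retention are assumptions, not the responder's values):

   #!/bin/ksh
   # Hourly snapshot with simple pruning -- a sketch of the idea only,
   # not the responder's Perl script.  tank/home and the retention
   # count are hypothetical.
   FS=tank/home
   KEEP=2160                      # roughly 90 days of hourly snapshots

   /usr/sbin/zfs snapshot ${FS}@hourly-$(date +%Y%m%d%H%M)

   # count the hourly snapshots and destroy the oldest ones beyond $KEEP
   TOTAL=$(/usr/sbin/zfs list -H -t snapshot -o name -s creation | grep -c "^${FS}@hourly-")
   EXTRA=$((TOTAL - KEEP))
   if [ "$EXTRA" -gt 0 ]; then
       /usr/sbin/zfs list -H -t snapshot -o name -s creation | \
           grep "^${FS}@hourly-" | head -${EXTRA} | \
           while read snap; do
               /usr/sbin/zfs destroy "$snap"
           done
   fi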

###########################################################################

I've been using it for the past year, plus some.

I've been using the compression feature to conserve disk space.

I usually create one large pool, and create all the filesystems under
it. (All of the free space is shared by the pool, and not locked away in
separate filesystems.)
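
As a rough illustration of that layout (pool, filesystem, and device
names below are made up for the example):

   # one pool across the disks; every filesystem draws from the same free space
   zpool create tank mirror c1t2d0 c1t3d0
   zfs create tank/home
   zfs create tank/projects
   zfs set compression=on tank     # inherited by the child filesystems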

I've been using the snapshot feature to do point in time backups. It
greatly simplified the scripts I had written to do UFS snapshots with
'fssnap'.


#######################################################################

We've been using zfs for years.  You don't describe your application;
depending on the particulars, you may want to look into tuning:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
We make some modifications for Oracle with SAN disk, but for most uses
don't change anything.

As far as non-evil tuning, depending on your app you may benefit from
compression and other settings.  Even parts of Oracle benefit from
compression - we found that RMAN backups happen faster and save space
when sent to zfs with compression enabled.  Test out settings on your
app.
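
For example, a compressed filesystem as an RMAN backup destination
could be set up along these lines (the names are hypothetical; this is
just the pattern described above, not our exact setup):

   zfs create tank/rman
   zfs set compression=on tank/rman
   # after a few backups, see how much space compression is saving
   zfs get compressratio tank/rman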

Also be sure to read and understand the Best Practices guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

We use snapshots for many different situations.  On some servers we
snap partitions on a daily basis and keep these for 2-7 days,
providing us with instant recovery capabilities. To do this, be sure to
understand how space is consumed with snapshots.
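
A quick way to see what the snapshots are actually costing (the dataset
name is hypothetical, and usedbysnapshots requires a reasonably recent
zfs/pool version):

   zfs list -t snapshot -o name,creation,used,referenced
   zfs get usedbysnapshots tank/data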

We also snap before upgrades to provide rollback and comparison in
case something doesn't work as expected.

In some cases we snap for transmission to a remote site, using zfs
send/receive capabilities.  Still not automated or regular, though.
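
The basic pattern for that, sketched with hypothetical dataset and host
names, is a full send of the first snapshot followed by incrementals:

   # initial full copy to the remote pool
   zfs send tank/data@mon | ssh drhost zfs receive backup/data
   # thereafter, only the changes between snapshots cross the wire
   zfs send -i tank/data@mon tank/data@tue | ssh drhost zfs receive backup/data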

We're also investigating using snapshots for backup, e.g. snap
partitions and backup the snaps instead of the actual partitions.


When using snaps be sure to understand how they consume space, how
your application reacts to the snap (e.g. does the database need to be
down for consistency), and be sure to clean up after yourself.  We've
had some admins create a snap, forget about it for a year, and then
have it cause problems by consuming too much space.


Good luck,

-f
http://www.blackant.net/

###################################################################################

The attached text file is the perl script.  It has a "require" line
which I commented out.  That's a localism to get some variables
pre-loaded for our machines.  If you find some variables (e.g.
$HOSTNAME) missing, it's stuff you can figure out pretty early on.

This particular script is being used for a RAID array that has large
datasets.  The lab tells me that the original data can be re-created
so they don't want to back it up.  This is a way to guard against
accidental file deletions.  This is NOT a backup strategy.  HW = Sun
Blade 2000 + Anacapa RAID arrays.  Production and Dev.

We use ZFS for SAN LUNs to aggregate them into a large filesystem and
to lay OpenAFS on top.  (See www.openafs.org for an enterprise-class,
multi-platform filesystem.)  We also use ZFS to make filesystems out of
Sun J4400 JBOD arrays.  We have ~5-7 TB of production data that uses
ZFS at the moment.  All production.  We do not use this script to make
backup snaps, as OpenAFS has its own backup snapshots.

We run Solaris on Sun SPARC hardware, not x86, if that makes a
difference.


#####################################################################################

Awesome. Bloody easy and lots of value/features above/beyond ufs.

I've not moved to a zfs boot environment. But everything else on my
newer Solaris 10 systems is ZFS. I love it. I'm using it on a
StorageTek 2510 iSCSI device and on a J4200 multi-homed SAS device.
Also on the extra internal drives (beyond the hardware mirrored boot
drive).

You'll find recipes online for rotating snapshots. They take virtually
no resources unless/until they are needed for data replication. I
implemented a basically endless version that allows me to
recover data files from any day in the semester for our bio-imaging
class/laboratory.

   #!/bin/ksh

   # this should be run off cron before midnight every night.
   # it will generate a date-stamped snapshot of those zfs filesystems we deem important.
   # `date +%Y%m%d%H%M` generates a date stamp for the snapshot name of the form 200909151431,
   # which corresponds to Sept. 15, 2009 at 2:31pm. These can be viewed with `zfs list`.
   # for a quick overview of recovery, roll back, etc., see
   #       https://www.sun.com/offers/details/zfs_snapshots.xml

   /usr/sbin/zfs snapshot biopool/bioimaging@`date +%Y%m%d%H%M`
   /usr/sbin/zfs snapshot biopool/capstone@`date +%Y%m%d%H%M`
   /usr/sbin/zfs snapshot biopool/quantbiol@`date +%Y%m%d%H%M`
   /usr/sbin/zfs snapshot biopool/students@`date +%Y%m%d%H%M`
   /usr/sbin/zfs snapshot jpool/outreach@`date +%Y%m%d%H%M`


Of course, I have to clear those out manually, or create a script to
clear them out (a sketch follows below). However, I still haven't
gotten a clear policy statement from the faculty on what time frame
they need; and, since it
still isn't eating much space compared to what I have, it all still
sits there.
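
A possible cleanup companion to the snapshot script above, taking
advantage of the sortable YYYYmmddHHMM stamps in the snapshot names
(the cutoff is passed in rather than computed, since Solaris date(1)
has no simple "N days ago"; the script name is just an example):

   #!/bin/ksh
   # destroy snapshots whose embedded date stamp sorts before $1,
   # e.g. ./clean_snaps.ksh 201002010000
   CUTOFF=$1

   for fs in biopool/bioimaging biopool/capstone biopool/quantbiol \
             biopool/students jpool/outreach; do
       /usr/sbin/zfs list -H -t snapshot -o name | grep "^${fs}@" | \
       while read snap; do
           stamp=${snap#*@}
           # string comparison works because the stamps are fixed-width
           if [[ "$stamp" < "$CUTOFF" ]]; then
               /usr/sbin/zfs destroy "$snap"
           fi
       done
   done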

The one complaint I have is that there is no replacement for
ufsdump/ufsrestore. zfs send and receive only handle full file
systems. You cannot recover just one file or directory using those
tools. So, they aren't really functional for backups. Amanda can be
configured to use them, but it turns out that gnutar is more
functional in terms of the use cases that typically turn up. That's
one reason I haven't gone to zfs boot environments. I depend on fssnap
and ufsdump/ufsrestore configured with a wrapper in Amanda to backup
my primary partitions. It could be I just don't know enough yet or
don't have enough experience to have confidence, and, perhaps, zfs
might actually be better for boot environments. Anyway, my setup works
fine for me.

Something I haven't seen a clear analysis of yet is higher level
failure modes for zfs. OK, so I have raid 6 with a hot spare
configured using zfs raidz2. But what if I was away on vacation,
didn't pay attention, and 3 drives failed? Now, suppose I had
two of those raidz2 (or even more for that matter) configured into one
zpool. Do I lose the entire zpool? Or will it self heal and at least
give me what data happened to be on the surviving raid components?
Based on the lack of an answer for this, I don't use one giant zpool
with multiple raidz components for large scale storage. That might be
easier in terms of allocating space and the usage of space, but I
don't want to risk it.

So, I use multiple zpools rather than putting all my eggs in one
basket. This is much better than the old situation of ufs where you
have a gazillion drives with even more partitions, and you have to
manage who gets what space and where directories are mounted and all
that stuff. Some of my Solaris 9 systems with over 30 individual
drives have become a bear in that respect. With zfs, I can ask a
faculty member to buy a drive and I will allocate him a certain amount
of space, but I don't have to just mount that drive and give him a
directory on it. The disadvantage is that I can't just add a drive to
a zpool and have any kind of raid protection. I have to wait until I
have enough drives to make sense and set up a raidz. Then I can set
reservations if they are appropriate. I would like a more incremental
way of adding space to a zpool or even to a raidz component. Say, I
have 3 drives in a raidz and want to expand it to 5 drives. I have to
zfs send the data to something else, destroy the raidz and remake it,
and then bring all the data back.
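
Concretely, the shuffle described above looks something like this
(dataset, pool, and device names are hypothetical, and the scratch pool
has to be big enough to hold a copy of the data):

   zfs snapshot tank/data@move
   zfs send tank/data@move | zfs receive scratch/data
   zpool destroy tank
   zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0
   zfs send scratch/data@move | zfs receive tank/data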

I found that one of the people who works on zfs posted a specification of
how to do that and invited people to implement it (it's open source
after all). But, as far as I know, it hasn't been done.

Anyway, I've been using zfs for a couple of years and love it.

If you happen to have a StorageTek 2510 iSCSI device or something
comparable, I posted a very detailed summary on the sun managers list
on how to set that up with zfs. However, if you haven't got one, I
would recommend against it. The J series is just way easier to set up.
The J4200 or J4400 with SAS.

One thing I haven't done is to use the SSD drives to boost overall
speed. That hasn't been an issue for me. But, if raw speed really is
an issue, zfs can make use of an SSD for its write intent log and end
up running much faster. As far as I know this isn't possible with any
other existing file system. It's part of what allows the Sun 7000
series storage systems to beat out NetApp in price/performance.
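
On a sufficiently recent pool version, that looks like the following
(device names are hypothetical):

   # dedicate an SSD to the ZFS intent log (the slog)
   zpool add tank log c2t0d0
   # an SSD can also serve as an L2ARC read cache
   zpool add tank cache c2t1d0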



-- 
---------------

Chris Hoogendyk

#################################################################################################

We use ZFS extensively in all environments including production. We
have been migrating systems from Veritas and SVM OS filesystems to ZFS
with great success. Our newly jumpstarted systems are all using zfs
root. The best advantage is the ability to create a new boot
environment using snapshots, then patch that BE live and simply reboot
into it. That turns a 3-hour downtime for patching into 10 minutes.
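
The responder doesn't name the tooling, but on Solaris 10 with a ZFS
root this is typically done with Live Upgrade; an assumed sketch (the
BE name, patch directory, and patch IDs are placeholders):

   lucreate -n patched-be                  # clones the current BE via ZFS snapshots
   luupgrade -t -n patched-be -s /var/tmp/patches 141444-09 141445-09
   luactivate patched-be
   init 6                                  # reboot into the patched BE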

We use ZFS to create application filesystems using internal drives in
a mirrored pool. I've personally used it to manage SAN LUNs, though
this is not ready for prime-time since we absolutely need multipathing
and MPXIO does not make it straightforward to test that both paths are
actually active...it'll just tell you whether mpxio is turned on or
not. We need explicit tests for when we do SAN maintenance.

I've tested ZFS fault tolerance on internal drives (either system or
for app filesystems) and it works as advertised, and it's real easy to
replace failed disks.
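
A typical replacement on a mirrored pool looks roughly like this (pool
and device names are made up):

   zpool status -x                  # identify the faulted device
   zpool replace tank c1t2d0        # swap in the new disk in the same slot
   zpool status tank                # watch the resilver complete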

Dave Foster

##################################################################################################

Hi Victor,

I've been using zfs in a production oracle environment for about a year
without any issues, although we have not used the snapshot technology.

Pete

###################################################################################################

Hi Vic

I have implemented an early version of ZFS with 11 x T3 arrays and a
V480 for around 2 years now on a production server, providing both NFS
and Samba file serving to a workgroup of 50 people, with approx 8 TB
of data managed.  So far it has held together really well without
incident; very happy with it.  Weekly backups are approx 3 TB to ZFS
disk.

We snapshot every night and keep 14 days' worth of snapshots
available.  We also use snapshots for tape backups.  A real winner
with the Windows users, who are able to get back last week's version
of a file.

Peter

######################################################################################################

ZFS is the best thing that has come out of the OS industry in the past
few years, IMO.  I've used it for years now; I use ZFS to create Sun
virtual servers, each within its own ZFS filesystem.  I've created
scripts that send incremental changes to off-site servers, perfect for
DR scenarios.  The list goes on.  It can save a company hundreds of
thousands of dollars by virtualizing and replicating data.  I would
not hesitate to use it in a production environment.

##################################################################################################

Hi Victor,


Victor Engle wrote:
> ZFS has been available for some time now and I'm currently working on a
> small project where ZFS would be perfect for the customer. Before deciding
> though I wanted to ask this group about any experiences you may have with
> deploying ZFS in production environments.

No problems with 50+ servers with ZFS for the root FS,
and no problems with 200+ servers with ZFS for data FS.

> I'm especially interested in ways you may have leveraged ZFS snapshots.

Sorry, we don't use snapshots.



HTH

Tobias
#####################################################################################################

My biggest problem is the way that I/O to a pool with a bad disk slows
to a crawl until the disk is detached, despite the remaining mirrors.
SVM usually doesn't do that.  I've seen ZFS resilvers saturate the
disk subsystem too; a way to avoid starvation has been requested from
Sun for over a year.

################################################################################

Just remember one thing, Vic: you currently cannot take away devices
you add to a zpool, which would mean recreating your zpool if one of
the devices had to be removed.  I have used zfs in a prod environment,
and it all depends on what you are trying to do with it.  We are using
one of Sun's open storage products that has zfs built in, which was
purchased to replace 3 EMC CLARiiONs (note: not based on my
recommendation), and to say the transition has been anything but
flawless would be an understatement.  We have had nothing but
problems, problems, problems.  I am not trying to trash Sun or to
recommend against zfs for what you are doing; I am just sharing my
experience.

Thanks,
#############################################################################################

Hi Victor,

ZFS is certainly production ready. I have been using it for quite a
few years now in various configurations and
for me it has been very reliable.

Primarily I am using it in both ZFS mirrors and RAIDZs. The biggest
RAIDZ is a 35 TB configuration on a J4500 used as a staging area in
disk-to-disk-to-tape for Veritas NetBackup. It handles around 3-4 TB a
day of new data and data being flushed. This is on a 10 GigE-attached
T5220 server.

Another great use I have is for Squid web caches, where the ARC cache
proves to be a great addition for speeding up web queries. My caches
have between 1500 and 2000 simultaneous connections to them. Works a
treat there for a 24/7-type workload.

Also using it for compliance archiving of email, on another J4500.

If you really want to get a lot of info on who is using it, try having
a look through the zfs-discuss@opensolaris.org
archives. This question gets asked regularly, with plenty of good use
cases being reported. I have certainly
done so there in much more detail than above.

Hope this helps.

/Scott.


PS. Further to my previous email.

I have found snapshots to be fantastic in multi-stage upgrades of
various software. I have used them to great effect with a Blackboard
upgrade that involved multiple Oracle updates and schema changes. If
there was a problem at any stage, we just found the cause, rolled
back, fixed it, and then did the step over until we had a successful
outcome. At that point we snapshotted and moved on to the next step.
Very, very helpful with a 20 GB database!
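
The per-step pattern, sketched with a hypothetical dataset name (and
with the database shut down around each snapshot for consistency):

   zfs snapshot oradata/blackboard@pre-step3
   # ... run the upgrade step; if it fails, roll the filesystem back:
   zfs rollback oradata/blackboard@pre-step3
   # if it succeeds, checkpoint and move on to the next step:
   zfs snapshot oradata/blackboard@post-step3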

##################################################################################################

On a Solaris x86 system (actually a Nevada deployment) I've set up ZFS
snapshots for a client to do daily snapshots of his accounting
database, which is presented as a Samba share to the clients.

Haven't heard any complaints, etc., for the past 3 years...

#########################################################################################################

First off, ZFS is production-ready; at this point, with it having been
out for so long, that's not a concern.
I am using ZFS snapshots for a group of users who need multiple copies
of the same data from production, but need to modify it for each test
environment, which is an ideal use of ZFS snapshots + clones. I'm able
to provide 8+ sets of test data, and no additional blocks are used
until they change data within each test environment. Furthermore,
because it's test data, I've set ZFS compression on the filesystem to
save disk space; with some types of data ZFS compression is actually
faster, because if the data is highly compressible, then there is less
I/O. The whole process is very scriptable, and the only command used
is different variations of 'zfs'.
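
The pattern, with hypothetical dataset names, is roughly:

   zfs snapshot pool/prod@refresh
   zfs clone pool/prod@refresh pool/test1      # repeat for test2, test3, ...
   zfs set compression=on pool/test1
   # the clone shares blocks with the snapshot; only changed data uses new space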

RCA


###############################################################################################################
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers