[unisog] large volume of files per filesystem

Jim Ennis jim at pegasus.cc.ucf.edu
Fri Dec 28 15:32:50 GMT 2001


I am not using NFS for the filesystems in this setup.
Backup performance seems to be holding up on Solaris 8.  Once I can reach
my vendors, I'll pursue this problem in more detail.


Jim Ennis                        | jim at pegasus.cc.ucf.edu
Systems Administrator            | (407) 823-1701  |  Fax: (407) 823-5476
University of Central Florida    | Murphy's paradox:
                                 | Doing it the hard way is always easier.


On Thu, 27 Dec 2001, Patrick Darden wrote:

>
> If the difference is sol7 vs. sol8, then it could be NFS2 vs. NFS3.  NFS3
> incorporates many performance optimizations....
>
> --
> --Patrick Darden                Internetworking Manager
> --                              706.475.3312    darden at armc.org
> --                              Athens Regional Medical Center
>
>
> On Thu, 27 Dec 2001, Jim Ennis wrote:
>
> > I did some more testing with a Solaris 8 system with 11 million files on
> > it.  The full backup runs in about 5 hours and 15 minutes.  The machine
> > with the backup performance problem is running Solaris 7 and the
> > application is active during the backup.  Since the Solaris 8 system
> > backs up roughly six times faster (with even more files), either I am
> > getting I/O contention from &((*& webct or Solaris 7 has some file
> > system performance issues.
> >
> > The backups were done to the same backup server (a Sun E450 with
> > Netbackup 3.2 and a Sun L1800 tape library with 4 DLT7000 tape drives).
> >
> > I am trying to get some feedback from Veritas and Sun before working up an
> > upgrade plan.  Due to academic schedules, my next real window for a major
> > change would be May or, more likely, August.  But it looks like an OS
> > upgrade will be part of the upgrade plan.
> >
> >
> > Jim Ennis                        | jim at pegasus.cc.ucf.edu
> > Systems Administrator            | (407) 823-1701  |  Fax: (407) 823-5476
> > University of Central Florida    | Murphy's paradox:
> >                                  | Doing it the hard way is always easier.
> >
> >
> > On Wed, 26 Dec 2001, Patrick Darden wrote:
> >
> > >
> > > Good points.
> > >
> > > As far as file system enhancements go, here are some parallels that are
> > > already in operation:
> > >
> > > Qmail derives a huge speed enhancement from moving inboxes from one
> > > directory (/var/spool/mail) to many (/home/user).  Large sites enhance
> > > this further by spreading home directories out (/home/a/ausers
> > > /home/b/busers /home/c/cusers etc.).
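> > >
> > > A rough sketch of that kind of spread, in Python (the paths are just
> > > examples, not qmail's actual layout code):
> > >
> > >     import os
> > >
> > >     def home_dir(user):
> > >         # Fan users out across /home/a, /home/b, ... by first letter,
> > >         # so no single directory has to hold every mailbox.
> > >         return os.path.join("/home", user[0].lower(), user)
> > >
> > >     print(home_dir("alice"))   # /home/a/alice
> > >     print(home_dir("bob"))     # /home/b/bob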
> > >
> > > Squid uses 16 top-level directories, each holding 256 subdirectories.
> > > This speeds file access tremendously.  10M files is small time for Squid.
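> > >
> > > The path for an object comes from hashing it across the two directory
> > > levels; a simplified sketch in Python (not Squid's real code):
> > >
> > >     import hashlib
> > >
> > >     def cache_path(url, l1=16, l2=256):
> > >         # Split a hash of the URL across two directory levels, so
> > >         # millions of objects spread over 16 * 256 = 4096 directories.
> > >         h = int(hashlib.md5(url.encode()).hexdigest(), 16)
> > >         return "%02X/%02X/%032X" % (h % l1, (h // l1) % l2, h)
> > >
> > >     print(cache_path("http://www.ucf.edu/"))   # L1 dir / L2 dir / object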
> > >
> > > INN switched to what it calls CNFS, a cyclic buffer storage scheme.
> > > Instead of each news article being a separate file, it just rams the
> > > article in at the end of the appropriate buffer (alt or comp or rec...).
> > > Each buffer is one large file.  This saves inodes, reduces the block
> > > wastage you get from lots of small files, and makes disk access faster
> > > because you can use large blocks without huge space wastage.
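> > >
> > > The core of the idea fits in a few lines of Python (a sketch, not
> > > INN's implementation):
> > >
> > >     def store_article(buffer_path, article):
> > >         # Append the article to one big buffer file and remember
> > >         # where it landed: one file, one inode, no per-article
> > >         # create/open cost.
> > >         with open(buffer_path, "ab") as buf:
> > >             buf.seek(0, 2)             # make the end offset explicit
> > >             offset = buf.tell()
> > >             buf.write(article)
> > >         return offset, len(article)    # enough to read it back later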
> > >
> > > Storing files in a database could be the answer.  MySQL should be able
> > > to store and index files much more efficiently than flat files on
> > > inodes.  Oracle actually boasts about this capability.  Then you just
> > > back up one database file.
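> > >
> > > For illustration, here is the shape of the idea using SQLite from
> > > Python (SQLite rather than MySQL or Oracle only because it keeps
> > > everything in one file; the file names are invented):
> > >
> > >     import sqlite3
> > >
> > >     db = sqlite3.connect("files.db")   # the one file you back up
> > >     db.execute("CREATE TABLE IF NOT EXISTS blobs"
> > >                " (name TEXT PRIMARY KEY, data BLOB)")
> > >     with open("fragment0001.dat", "rb") as f:
> > >         db.execute("INSERT OR REPLACE INTO blobs VALUES (?, ?)",
> > >                    ("fragment0001.dat", f.read()))
> > >     db.commit()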
> > >
> > > Storing the info in a database instead of files might be cleaner, and it
> > > works very well.  If you get more than about 3000 users it pays bigtime to
> > > turn your passwd file into a database.
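> > >
> > > Roughly what that buys you, sketched with Python's dbm module (the
> > > entry is made up): keyed lookups instead of scanning a 3000-line
> > > flat file.
> > >
> > >     import dbm
> > >
> > >     with dbm.open("passwd.db", "c") as db:
> > >         db["jim"] = "jim:x:1001:100:Jim Ennis:/home/j/jim:/bin/sh"
> > >         print(db["jim"])   # one hashed lookup, no linear scan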
> > >
> > > Finally, although I have never used this backup program, the backup progs
> > > I have used sometimes allow file indexing for fast individual file/dir
> > > restores.  This is tremendously useful, but slows backups--especially if
> > > you have a lot of files vs. a lot of gigabytes.  I would check to see if
> > > indexing is turned on, and turn it off for a trial.
> > >
> > > --
> > > --Patrick Darden                Internetworking Manager
> > > --                              706.475.3312    darden at armc.org
> > > --                              Athens Regional Medical Center
> > >
> > >
> > > On Wed, 26 Dec 2001 lbuchana at csc.com wrote:
> > >
> > > > Hi,
> > > >
> > > > In the responses so far, I have not noticed any mention of the issue of
> > > > the tape drive being a bottleneck.  If you cannot feed data to the tape
> > > > drive fast enough to keep it streaming, you will have horrible
> > > > performance.  Any interruption in the data stream causes the tape drive
> > > > to stop, rewind, and wait for the next tape block.  There is at least
> > > > one tape drive on the market that has a variable write speed to reduce
> > > > or eliminate this problem, but I have no idea how well they work, as I
> > > > have never seen one.
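> > > >
> > > > Some rough arithmetic on the numbers earlier in this thread
> > > > (assuming a DLT7000's native rate of about 5 MB/s):
> > > >
> > > >     drive_rate = 5 * 1024 * 1024          # bytes/sec, DLT7000 native
> > > >     files = 11 * 1000 * 1000              # the 11 million files
> > > >     seconds = 5 * 3600 + 15 * 60          # the 5 hour 15 min backup
> > > >     print(files / seconds)                # ~582 file opens per second
> > > >     print(drive_rate * seconds / files)   # ~9 KB/file just to keep up
> > > >     # At that rate, any stall in stat/open/read starves the drive,
> > > >     # and it stops and repositions.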
> > > >
> > > > One method that I have used to reduce the number of times a tape drive
> > > > has to rewind during a backup is to use very large tape blocks.  How
> > > > well this works with modern hardware compression boards is something I
> > > > have never tested.
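> > > >
> > > > A toy sketch of the large-block approach in Python (the tape device
> > > > name is just a Solaris-style example):
> > > >
> > > >     import os
> > > >
> > > >     BLOCK = 256 * 1024   # 256 KB per write instead of, say, 10 KB
> > > >
> > > >     def copy_to_tape(src, tape="/dev/rmt/0n"):
> > > >         # Each os.write() becomes one tape block; fewer, larger
> > > >         # blocks mean fewer chances for the stream to stall.
> > > >         fd = os.open(tape, os.O_WRONLY)
> > > >         try:
> > > >             with open(src, "rb") as f:
> > > >                 while True:
> > > >                     chunk = f.read(BLOCK)
> > > >                     if not chunk:
> > > >                         break
> > > >                     os.write(fd, chunk)
> > > >         finally:
> > > >             os.close(fd)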
> > > >
> > > > Another issue to consider is reworking the application to reduce the number
> > > > of files.  At a user group meeting several years ago, a sys admin described
> > > > an application that was dealing with small gene fragments, and the user was
> > > > putting each fragment into a separate file.  The thrashing of opening and
> > > > closing thousands of files was killing system performance.  The sys admin
> > > > rewrote the user's application to use only two or three files.  The
> > > > application ran on the order of a thousand times faster and did not
> > > > interfere with other users of the system.
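> > > >
> > > > That kind of consolidation can be sketched in a few lines of Python
> > > > (file names are invented): pack the fragments into one data file
> > > > plus an index, so later runs open two files instead of thousands.
> > > >
> > > >     import glob
> > > >
> > > >     with open("fragments.dat", "wb") as data, \
> > > >          open("fragments.idx", "w") as idx:
> > > >         for path in sorted(glob.glob("frag_*.txt")):
> > > >             payload = open(path, "rb").read()
> > > >             # index line: name, byte offset, length
> > > >             idx.write("%s %d %d\n" % (path, data.tell(), len(payload)))
> > > >             data.write(payload)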
> > > >
> > > > My real point is that you need to look at the entire system.
> > > >
> > > > B Cing U
> > > >
> > > > Buck
> > > >
> > >
> > >
> >
>
>


