[unisog] large volume of files per filesystems

Patrick Darden darden at armc.org
Wed Dec 26 15:14:32 GMT 2001


Good points.  

As far as file system enhancements go, here are some parallels that are
already in operation:

Qmail derives a huge speed enhancement from moving inboxes from one
directory (/var/spool/mail) to many (/home/user).  Large sites enhance
this further by spreading home directories out (/home/a/ausers,
/home/b/busers, /home/c/cusers, etc.).
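
The hashing can be as simple as keying the directory on the first letter
of the username.  A quick sketch in Python (the layout is just an example,
not qmail's actual configuration):

import os

def hashed_home(username, root="/home"):
    # Spread users across one subdirectory per leading letter so no
    # single directory has to hold every mailbox.
    return os.path.join(root, username[0].lower(), username)

print(hashed_home("alice"))  # /home/a/alice
print(hashed_home("bob"))    # /home/b/bob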

Squid uses 16 top-level directories, each holding 256 subdirectories.  This
speeds file access tremendously.  10M files is small time for Squid.
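
Here is the same idea sketched in Python; the hash and naming scheme are
illustrative, not Squid's actual on-disk format.  Spreading 10M objects
over 16*256 = 4096 directories leaves only about 2,500 entries per
directory:

import hashlib
import os

def two_level_path(key, root="/var/spool/cache", l1=16, l2=256):
    # Hash the key into one of l1*l2 subdirectories so no single
    # directory grows unmanageably large.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    d1 = h % l1
    d2 = (h // l1) % l2
    return os.path.join(root, "%02X" % d1, "%02X" % d2, "%032X" % h)

print(two_level_path("http://example.com/index.html"))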

INN switched to what it calls a cyclic news file system (CNFS).  Instead of
each news article being a separate file, it now just rams the new article in
at the end of the appropriate cyclic buffer (alt or comp or rec...).  Each
buffer is a single file.  This saves inodes, reduces the block wastage you
get from lots of small files, and makes disk access faster because you can
use large blocks without huge space wastage.
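
Just to illustrate the general technique (this is not INN's actual CNFS
format; wraparound and index persistence are left out), you keep one big
spool file per group and record an (offset, length) pair for each article
instead of burning an inode per article:

def append_article(spool_file, article_bytes):
    # Append one article to a single spool file and return where it
    # landed so it can be read back later.
    with open(spool_file, "ab") as f:
        offset = f.tell()
        f.write(article_bytes)
    return offset, len(article_bytes)

def read_article(spool_file, offset, length):
    with open(spool_file, "rb") as f:
        f.seek(offset)
        return f.read(length)

off, size = append_article("alt.spool", b"Subject: test\n\nbody\n")
print(read_article("alt.spool", off, size))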

Storing files in a database could be the answer.  MySQL should be able to
store and index files much more efficiently than millions of flat files,
one per inode.  Oracle actually boasts about this capability.  Then you
just back up one database file.
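
For example, using Python's built-in SQLite module as a stand-in for MySQL
or Oracle (the table and column names are made up), one indexed table
replaces millions of small files, and the whole store backs up as a single
file:

import sqlite3

conn = sqlite3.connect("filestore.db")
conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB)")

def put(path, data):
    # One row per "file"; the primary-key index makes lookups fast.
    conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", (path, data))
    conn.commit()

def get(path):
    row = conn.execute("SELECT data FROM files WHERE path = ?",
                       (path,)).fetchone()
    return row[0] if row else None

put("reports/2001/dec.txt", b"quarterly numbers ...")
print(get("reports/2001/dec.txt"))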

Storing the info in a database instead of files might be cleaner, and it
works very well.  If you have more than about 3,000 users, it pays big time
to turn your passwd file into a database.
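
In miniature (illustrative only; a real system would use something like NIS
maps or nss_db rather than an ad-hoc file), a keyed database turns every
lookup from a linear scan of the whole passwd file into a single indexed
fetch:

import dbm

def build_db(passwd_path="passwd.txt", db_path="passwd.db"):
    # Index a passwd-style flat file by username.
    with dbm.open(db_path, "n") as db, open(passwd_path) as f:
        for line in f:
            user = line.split(":", 1)[0]
            db[user] = line.rstrip("\n")

def lookup(user, db_path="passwd.db"):
    # One keyed fetch instead of scanning thousands of lines.
    with dbm.open(db_path, "r") as db:
        key = user.encode()
        return db[key] if key in db else None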

Finally, although I have never used this particular backup program, the
backup programs I have used sometimes allow file indexing for fast
individual file/dir restores.  This is tremendously useful, but it slows
backups--especially if you have a lot of files vs. a lot of gigabytes.  I
would check whether indexing is turned on, and turn it off for a trial.

--
--Patrick Darden                Internetworking Manager             
--                              706.475.3312    darden at armc.org
--                              Athens Regional Medical Center


On Wed, 26 Dec 2001 lbuchana at csc.com wrote:

> Hi,
> 
> In the responses so far, I have not noticed any mention of the issue of the
> tape drive being a bottleneck.  If you cannot feed data to the tape drive
> fast enough to keep it streaming, you will have horrible performance.  Any
> interruption in the data stream causes the tape drive to stop, rewind, and
> wait for the next tape block.  There is at least one tape drive on the
> market that has a variable write speed to reduce or eliminate this problem,
> but I have no idea how well it works, as I have never seen one.
> 
> One method that I have used to reduce the number of times a tape drive has
> to rewind during a backup is to use very large tape blocks.  How well this
> works with modern hardware compression boards is something I have never
> tested.
> 
> Another issue to consider is reworking the application to reduce the number
> of files.  At a user group meeting several years ago, a sys admin described
> an application that was dealing with small gene fragments, and the user was
> putting each fragment into a separate file.  The thrashing of opening and
> closing thousands of files was killing system performance.  The sys admin
> rewrote the user's application to use only two or three files.  The
> application ran on the order of a thousand times faster and did not
> interfere with other users of the system.
> 
> My real point is that you need to look at the entire system.
> 
> B Cing U
> 
> Buck
> 


