[NLPL Task Force (A)] [uninett.no #210996] Fwd: [in5550-help] Broke Saga storage again, plz help :(
Thomas Röblitz via RT
support at metacenter.no
Mon May 11 07:24:49 UTC 2020
Good morning Stephan,
On Mon May 11 00:27:43 2020, oe at ifi.uio.no wrote:
> many thanks, thomas! that confirmed the interpretation i had pieced
> together, and i believe i now can make much better sense of the
> situation on Saga. also maria has in the meantime successfully tested
> my suspicion of needing to adjust group ownership of files she had
> moved from $HOME to $USERWORK.
Sounds good!
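For the record, the underlying reason is that mv(1) keeps a file's group owner, while a fresh copy inherits the group of the (setgid) destination directory. A minimal, self-contained demonstration (plain shell with GNU stat, runnable anywhere; all paths are scratch directories the sketch creates itself):

```shell
#!/bin/sh
# mv(1) preserves the group owner of a file; this is why data moved out
# of $HOME keeps counting against the ${USER}_g quota until chgrp'ed.
tmp=$(mktemp -d)
mkdir "$tmp/src" "$tmp/dst"
touch "$tmp/src/data"
before=$(stat -c %G "$tmp/src/data")   # group owner before the move
mv "$tmp/src/data" "$tmp/dst/data"
after=$(stat -c %G "$tmp/dst/data")    # group owner after the move
[ "$before" = "$after" ] && echo "mv preserved the group owner ($after)"
rm -rf "$tmp"
```

That preserved group is exactly what dusage then charges against the $HOME line.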
>
> so our immediate need is resolved; please feel free to close this
> ticket.
Even better.
>
> i have only one immediate follow-up question (or quibble, if you
> will): it seems unlikely there would be files with my ‘_g’ group owner
> below $SCRATCH, as my default group is $USER (without the ‘_g’ suffix).
Agree it's unlikely.
> hence, only moving (rather than copying) from my home directory or
> explicit use of chgrp(1) would create ‘_g’ files below $SCRATCH,
> right? thus, i am
True.
> inclined to attribute the observed ‘delay’ in quota overruns to the
> automated adjustment of group ownership in the project area.
Likely.
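Until that automatic pass runs, dusage keeps charging the moved bytes to the $HOME line. One way to see where ${USER}_g bytes actually live is to sum them per location. A sketch assuming GNU find/du; sum_group_usage is a throwaway helper of ours, not a Saga tool, and the /cluster paths are the ones from this ticket:

```shell
#!/bin/sh
# sum_group_usage DIR GROUP: total apparent size (bytes) of regular
# files below DIR whose group owner is GROUP.
sum_group_usage() {
    find "$1" -group "$2" -type f -print0 2>/dev/null \
        | du -sb --files0-from=- 2>/dev/null \
        | awk 'BEGIN {sum=0} {sum+=$1} END {print sum}'
}

# On Saga one would loop over the likely hiding places; guarded so the
# sketch is a no-op elsewhere. nn9447k is the project from this ticket.
if [ -d /cluster ]; then
    for dir in "$HOME" "$USERWORK" \
               /cluster/projects/nn9447k /cluster/shared/nlpl; do
        [ -n "$dir" ] && [ -d "$dir" ] || continue
        printf '%-40s %s bytes\n' "$dir" \
               "$(sum_group_usage "$dir" "${USER}_g")"
    done
fi
```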
>
> i would suggest rephrasing or extending the information about
> different storage areas, possibly with a specific note about Saga (if
> the underlying quota mechanisms were not based on group ownership on
> the lustre-based systems). the current page talks in terms of ‘areas’
> and locations, which had misled me to think about quota management in
> very different terms. there is one mention of having to think about
> group ownership, which eventually put me on the right path while
> looking at dusage(1). but overall, i think that page deserves
> clarification.
Agree. Will forward your request to relevant people.
>
> https://documentation.sigma2.no/files_storage/clusters.html
>
> also, come to think of it, i am no fan of automated adjustment of
> user or group ownership, but while you are at it: why not force group
> ownership to $USER (without the ‘_g’) below $USERWORK? and, finally,
> i think it
That sounds like an idea worth checking.
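Such an enforcement pass could be little more than a periodic find/chgrp sweep. A rough, hypothetical sketch (regroup is our name, not an existing Saga script; a real deployment would need to run with sufficient privileges):

```shell
#!/bin/sh
# regroup DIR FROM_GROUP TO_GROUP: re-group every file below DIR that
# currently belongs to FROM_GROUP (hypothetical helper, not a Saga tool).
regroup() {
    find "$1" -group "$2" -exec chgrp "$3" {} +
}

# On Saga this might run per user over the work area, e.g.:
#   regroup "/cluster/work/users/$USER" "${USER}_g" "$USER"
```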
> would be nice to document the mechanisms that adjust user and group
> ownership in various types of storage areas.
You're demanding ;)
Will let relevant colleagues know, so they can figure out how to improve the documentation.
Best regards
Thomas
>
> good night :-)! oe
>
>
> On Sun, 10 May 2020 at 23:40, Thomas Röblitz via RT
> <support at metacenter.no> wrote:
>
> > Hei Stephan,
> >
> > On Sun May 10 15:02:46 2020, oe at ifi.uio.no wrote:
> > > dear colleagues,
> > >
> > > every now and again, users under the NLPL project umbrella
> > > (NN9447k) run into disk quota issues on Saga. and, trying to
> > > advise in a current case, i realize i do not understand the
> > > set-up and constraints very well myself (despite reasonably
> > > careful reading of
> > > 'https://documentation.sigma2.no/files_storage/clusters.html' :-).
> >
> > Every user has a "personal" quota which we attempt to manage via
> > BeeGFS quota capabilities. As you pointed out below, BeeGFS uses
> > Unix user/group ownership to facilitate quota setting and
> > enforcement. For the personal quota (the line with $HOME), we use a
> > specific group named ${USER}_g for quota management. That is, every
> > file anywhere in the file system /cluster, not only those under
> > $HOME, is accounted for. We have tried to prevent users from
> > accidentally creating files elsewhere with group owner ${USER}_g,
> > e.g., by setting the setgid bit on project folders and setting
> > group ownership accordingly. However, when users move files around
> > (instead of copying them), group ownership doesn't change.
> > Typically, users who run into quota issues under $HOME (while
> > 'du -sh $HOME' shows less than 20 GiB usage) have files with group
> > ownership ${USER}_g under their $USERWORK or in any of the project
> > folders they have access to.
> >
> > The latter two cases can be checked with something like
> >
> >   find /cluster/work/users/marispau -group marispau_g -type f -print0 \
> >     | du -s --files0-from=- | awk 'BEGIN {sum=0} {sum+=$1} END {print sum}'
> >
> > and
> >
> >   find /cluster/projects/nn9447k -group marispau_g -type f -print0 \
> >     | du -s --files0-from=- | awk 'BEGIN {sum=0} {sum+=$1} END {print sum}'
> >
> > >
> > > i would be grateful if someone could explain in more detail how to
> > > read the various lines output by dusage(1):
> >
> > Me too ;)
> >
> > >
> > > ===============================================================================
> > > [oe at login-1.SAGA ~]$ groups
> > > oe dgi xa9910k nlpl ns9008k ns9052k nn9106k nn9447k oe_g
> > > [oe at login-1.SAGA ~]$ dusage -a
> > > ===============================================================================
> > > Block quota usage on: SAGA
> > > ===============================================================================
> > > File system   User/Group      Usage        Soft Limit   Hard Limit
> > > -------------------------------------------------------------------------------
> > > oe_g          $HOME           680.9 GiB    0 Bytes      0 Bytes
> >
> > I think the first two columns are swapped. The intention, however,
> > was to show the limit on "personal" usage under your $HOME
> > directory. In reality it just shows how much space all files with
> > group owner ${USER}_g use. Note: if limits are not set, you don't
> > have a backup!
> >
> > > oe            oe (u)          3.8 TiB      0 Bytes      0 Bytes
> >
> > That is how much all files with user owner 'oe' use in total, i.e.,
> > including files with any group ownership (oe_g, oe, nn9447k, ...).
> > Not sure why this is needed.
> >
> > > oe            oe (g)          1.2 MiB      0 Bytes      0 Bytes
> >
> > That is how much all files with group owner 'oe' use in total,
> > i.e., including files with any user ownership (oe, ... not sure
> > anything else is possible here). Not sure why this is needed.
> >
> > > nn9106k       nn9106k (g)     0 Bytes      0 Bytes      0 Bytes
> > > ns9008k       ns9008k (g)     0 Bytes      0 Bytes      0 Bytes
> > > ns9052k       ns9052k (g)     0 Bytes      0 Bytes      0 Bytes
> > > xa9910k       xa9910k (g)     0 Bytes      0 Bytes      0 Bytes
> >
> > All of the above are project quotas implemented via group
> > ownership. Contrary to what the limits suggest, there are actually
> > no limits set. See the output of beegfs-ctl:
> >
> > $ beegfs-ctl --getquota --gid --list nn9106k,ns9008k,ns9052k,xa9910k
> >
> > Quota information for storage pool Default (ID: 1):
> >
> > user/group || size || chunk files
> > name | id || used | hard || used | hard
> > --------------|------||------------|------------||---------|---------
> > xa9910k|205586|| 0 Byte| unlimited|| 0|unlimited
> > ns9008k|219008|| 0 Byte| unlimited|| 0|unlimited
> > ns9052k|219052|| 0 Byte| unlimited|| 0|unlimited
> > nn9106k|229106|| 0 Byte| unlimited|| 0|unlimited
> >
> > Personally, I believe that having access to these Unix groups while
> > their limits are unset could* be a configuration mistake. It might
> > allow a malicious user to circumvent storage policies, e.g., by
> > creating files with one of the above groups under $HOME, project or
> > shared folders.
> >
> > (*) We also have robinhood scanning the /cluster filesystem
> > regularly. I don't know much about it, but it might catch the
> > misuse scenario sketched above.
> >
> > > nlpl          nlpl (g)        3.6 TiB      0 Bytes      0 Bytes
> >
> > This seems to correspond to a shared folder. The limits aren't
> > actually set to 0; they are unlimited. See the output of beegfs-ctl:
> >
> > $ beegfs-ctl --getquota --gid nlpl
> >
> > Quota information for storage pool Default (ID: 1):
> >
> > user/group || size || chunk files
> > name | id || used | hard || used | hard
> > --------------|------||------------|------------||---------|---------
> > nlpl|205655|| 3.59 TiB| unlimited|| 7448133|unlimited
> >
> > > dgi           dgi (g)         0 Bytes      0 Bytes      0 Bytes
> >
> > Not sure what this group is used for.
> >
> > > nn9447k       nn9447k (g)     308.7 GiB    1.0 TiB      1.0 TiB
> >
> > The above is a project quota implemented via group ownership, in
> > this case with a limit > 0 plus some usage.
> >
> > > ===============================================================================
> > >
> > > i am guessing that BeeGFS quotas are organized around user and
> > > group ownership rather than around actual path locations on the
> > > '/cluster/' filesystem? group ownership appears to be
> > > automatically adjusted periodically on Saga, e.g. to 'oe_g' below
> > > my $HOME directory, to
> >
> > Yes (user and group ownership). Yes (automatically adjusted, unless
> > we forget to switch it back on after a maintenance ;).
> >
> > > 'nn9447k' below '/cluster/projects/nn9447k/', and to 'nlpl' below
> > > '/cluster/shared/nlpl/' (on this view, 'oe_g' means something like
> > > 'oe_home')? but no such automatic adjustment applies below
> > > $USERWORK?
> >
> > All as you figured out, I think.
> >
> > > that would mean that moving data from $HOME to $USERWORK will not
> > > affect the quota system, until i also run something like
> > > 'chgrp -R oe' below $USERWORK, right?
> >
> > Yep.
> >
> > >
> > > related to the above, is what is reported by dusage(1) (i.e. the
> > > BeeGFS quotas) real-time information, or is there some delay in
> > > how frequently this data is updated? i am asking because folk
> > > wisdom in our
> >
> > Not 100 % sure. It seems that I can get all usage & quota
> > information with combinations of different parameters to
> > beegfs-ctl, i.e.
> >
> > beegfs-ctl --getquota --uid oe
> > beegfs-ctl --getquota --gid --list oe,oe_g,nlpl,...
> >
> > Hence, all information should be real-time. On other systems, e.g.,
> > Fram or NIRD, this might be different.
> >
> > > group has it that there can be an hour or two of delay after one
> > > has overrun the quota on $HOME and freed up space, before one can
> > > again write to $HOME. now i am wondering whether the delay
> > > actually
> >
> > I'd rather think that some files with group ownership ${USER}_g are
> > somewhere else, e.g., in running jobs' $SCRATCH. Again, I'm not
> > 100 % sure. One could use the above find examples to look into all
> > possible locations ($HOME, $USERWORK, projects, shared plus all
> > jobs' $SCRATCH).
> >
> > > reflects that people move files into, say, the project area, but
> > > until the group ownership is automatically adjusted this data is
> > > still counted against their $HOME quota (because it is owned by
> > > the '_g' group)?
> >
> > Yep, this is (maybe) the more likely reason you see a delay: the
> > data still counts against $HOME until the automatic procedure
> > adjusts group ownership.
> >
> > >
> > > with thanks in advance, oe
> >
> > You're welcome ... and please let me know if you found a good/better
> > explanation!
> >
> > Thomas
> >
> > >
> > >
> > > ---------- Forwarded message ---------
> > > From: Maria Singstad Paulsen <marispau at ifi.uio.no>
> > > Date: Sun, May 10, 2020 at 1:49 PM
> > > Subject: SV: [in5550-help] Broke Saga storage again, plz help :(
> > > To: Stephan Oepen <oe at ifi.uio.no>
> > > Cc: in5550-help at ifi.uio.no <in5550-help at ifi.uio.no>
> > >
> > > Either I'm doing something wrong, or there is something wrong
> > > somewhere else. I have not uploaded or run my program since last
> > > night, so the quota should be updated by now. I'll just try to
> > > work around this, but it might be worth looking into if this
> > > behaviour is not what's intended, I reckon.
> > >
> > > What I don't really understand is why these numbers don't even
> > > seem to add up. dusage shows 18.0 GiB for $HOME, 18.7 GiB (u) and
> > > 798.7 MiB (g). Shouldn't files in $USERWORK count towards (g)
> > > here? By the looks of it, some of my files in $USERWORK must've
> > > been counted towards (g), but seemingly not the rest? If I'm
> > > reading this correctly, that is.
> > >
> > > [marispau at login-1.SAGA ~]$ du -h -s $HOME/*
> > > 16M /cluster/home/marispau/fastText
> > > 0 /cluster/home/marispau/loadingscript.sh
> > > 1,0K /cluster/home/marispau/remote-py-interpreter.sh
> > > 1,0K /cluster/home/marispau/sanity.py
> > > 1,5K /cluster/home/marispau/syspaths.txt
> > >
> > > [marispau at login-1.SAGA ~]$ du -h -s $HOME/.??*
> > > 30K /cluster/home/marispau/.bash_history
> > > 512 /cluster/home/marispau/.bash_profile
> > > 512 /cluster/home/marispau/.bashrc
> > > 20M /cluster/home/marispau/.cache
> > > 512 /cluster/home/marispau/.config
> > > 92K /cluster/home/marispau/.ipython
> > > 1,0K /cluster/home/marispau/.keras
> > > 512 /cluster/home/marispau/.lesshst
> > > 21K /cluster/home/marispau/.lmod.d
> > > 512 /cluster/home/marispau/.pki
> > > 66M /cluster/home/marispau/.pycharm_helpers
> > > 1,0K /cluster/home/marispau/.python_history
> > > 7,5K /cluster/home/marispau/.ssh
> > > 1,0K /cluster/home/marispau/.viminfo
> > >
> > > [marispau at login-1.SAGA ~]$ du -h -s $USERWORK/*
> > > 11M /cluster/work/users/marispau/dev
> > > 146M /cluster/work/users/marispau/exam
> > > 19G /cluster/work/users/marispau/in5550
> > > 69M /cluster/work/users/marispau/train
> > >
> > > [marispau at login-1.SAGA ~]$ dusage
> > > ===============================================================================
> > > Block quota usage on: SAGA
> > > ===============================================================================
> > > File system   User/Group      Usage        Soft Limit   Hard Limit
> > > -------------------------------------------------------------------------------
> > > marispau_g    $HOME           18.0 GiB     20.0 GiB     20.0 GiB
> > > marispau      marispau (u)    18.7 GiB     0 Bytes      0 Bytes
> > > marispau      marispau (g)    798.7 MiB    0 Bytes      0 Bytes
> > > nn9447k       nn9447k (g)     308.7 GiB    1.0 TiB      1.0 TiB
> > > dgi           dgi (g)         0 Bytes      0 Bytes      0 Bytes
> > > ===============================================================================
> > >
> > > ________________________________________
> > > From: Stephan Oepen <oe at ifi.uio.no>
> > > Sent: 10 May 2020 00:31
> > > To: Maria Singstad Paulsen
> > > Cc: in5550-help at ifi.uio.no
> > > Subject: Re: [in5550-help] Broke Saga storage again, plz help :(
> > >
> > > i don't think soft links are relevant here, they just create
> > > 'aliases' (another way of referencing a file). the file itself is
> > > either in your home directory or below your user work directory,
> > > and the location of the actual file determines where it is counted
> > > against your quota (no matter how many soft links may point to it
> > > from other directories).
> > >
> > > could it be the case that hidden files or directories in your
> > > $HOME account for what seems like a discrepancy between du and
> > > dusage? try the following
> > >
> > > du -h -s $HOME/.??*
> > >
> > > files or directories whose names start with a period are not
> > > matched by standard globbing (expansion of the '*' wildcard). the
> > > above will match all files or directories below $HOME that start
> > > with a period, are followed by another two arbitrary characters
> > > (the '?' in shell wildcarding), and then have an arbitrary,
> > > possibly empty suffix (the '*' wildcard). this complicated
> > > pattern is necessary to avoid matching '..' (which would be
> > > included by the simpler patterns '.*' or '.?*'), which would
> > > refer to the parent directory of your $HOME, i.e. the huge
> > > filesystem with all user home directories ... running du(1) on
> > > all of those would probably take a long time ...
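the globbing point above is easy to verify in a scratch directory (a small demo; the file names are invented):

```shell
#!/bin/sh
# Demonstrate that '.??*' matches 'long' hidden names but never '..'
# (nor very short hidden names like '.x', which have fewer than three
# characters -- a small caveat of this pattern).
tmp=$(mktemp -d)
cd "$tmp"
touch .bashrc .x regular
set -- .??*            # expand the glob into the positional parameters
echo "matched: $*"     # only .bashrc; '..', '.x' and 'regular' are skipped
cd / && rm -rf "$tmp"
```

note that '.??*' therefore also misses hidden names of fewer than three characters, like '.x'.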
> > >
> > > hth, oe
More information about the infrastructure mailing list