<div><div dir="auto">many thanks, thomas! that confirmed the interpretation i had pieced together, and i believe i now can make much better sense of the situation on Saga. also maria has in the meantime successfully tested my suspicion of needing to adjust group ownership of files she had moved from $HOME to $USERWORK.</div><div dir="auto"><br></div><div dir="auto"><span style="border-color:rgb(0,0,0)">so our immediate need is resolved; please feel free to close this ticket.</span><br></div><div dir="auto"><span style="border-color:rgb(0,0,0)"><br></span></div><div dir="auto">i have only one immediate follow-up question (or quibble, if you will): it seems unlikely there would be files with my ‘_g’ group owner below $SCRATCH, as my default group is $USER (without the ‘_g’ suffix). hence, only moving (rather than copying) from my home directory or explicit use of chgrp(1) would create ‘_g’ files below $SCRATCH, right? thus, i am inclined to attribute the observed ‘delay’ in quota overruns to be unlocked to automated adjustment of group ownership in the project area.</div><div dir="auto"><br></div><div dir="auto">i would suggest rephrasing or extending the information about different storage areas, possibly with a specific note about Saga (if the underlying quota mechanisms were not based on group ownership on the lustre-based systems). the current page talks in terms of ‘areas’ and locations, which had misled me to think about quota management in very different terms. there is one mention of having to think about group ownership, which eventually put me on the right path while looking at dusage(1). but overall, i think that page deserves clarification.<br></div><div dir="auto"><br></div><div dir="auto"><div><a href="https://documentation.sigma2.no/files_storage/clusters.html">https://documentation.sigma2.no/files_storage/clusters.html</a></div><br></div><div dir="auto">also, come to think of it, i am no fan of automated adjustment of user or group ownership, but while you are at it: why not force group ownership to $USER (without the ‘_g’) below $USERWORK? and, finally, i think it would be nice to document the mechanisms that adjust user and group ownership in various types of storage areas.<br></div><div dir="auto"><br></div><div dir="auto">good night :-)! oe</div><div dir="auto"><br></div></div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 10 May 2020 at 23:40 Thomas Röblitz via RT <<a href="mailto:support@metacenter.no">support@metacenter.no</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;padding-left:1ex;border-left-color:rgb(204,204,204)">Hei Stephan,<br>
<br>
On Sun May 10 15:02:46 2020, <a href="mailto:oe@ifi.uio.no" target="_blank">oe@ifi.uio.no</a> wrote:<br>
> dear colleagues,<br>
> <br>
> every now and again, users under the NLPL project umbrella (NN9447k)<br>
> run into disk quota issues on Saga. and, trying to advise in a<br>
> current case, i realize i do not understand the set-up and constraints<br>
> very well myself (despite reasonably careful reading of<br>
> '<a href="https://documentation.sigma2.no/files_storage/clusters.html" rel="noreferrer" target="_blank">https://documentation.sigma2.no/files_storage/clusters.html</a>' :-).<br>
<br>
every user has a "personal" quota, which we attempt to manage via BeeGFS quota capabilities. As you pointed out below, BeeGFS uses Unix user/group ownership to facilitate quota setting and enforcement. In the case of the personal quota (the line with $HOME), we use a specific group named ${USER}_g for quota management. That is, every file anywhere in the /cluster file system, not only those under $HOME, is accounted for. We have tried to prevent users from accidentally creating files elsewhere with group owner ${USER}_g, e.g., by setting the setgid bit on project folders and setting their group ownership accordingly (so that new files inherit the project group). However, when users move files around (instead of copying them), group ownership doesn't change. Typically, users who run into quota issues under $HOME (while 'du -sh $HOME' shows less than 20 GiB usage) have files with group ownership ${USER}_g under their $USERWORK or in one of the project folders they have access to.<br>
<br>
The latter two cases can be checked with something like<br>
<br>
find /cluster/work/users/marispau -group marispau_g -type f -print0 | du -s --files0-from=- | awk 'BEGIN {sum=0} {sum+=$1} END {print sum}'<br>
<br>
and<br>
<br>
find /cluster/projects/nn9447k -group marispau_g -type f -print0 | du -s --files0-from=- | awk 'BEGIN {sum=0} {sum+=$1} END {print sum}'<br>
<br>
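To make the mv-vs-cp distinction above concrete, a small sketch (file names are made up; the project path is just an example):<br>
<br>
touch /cluster/projects/nn9447k/new-file       # inherits group nn9447k via the setgid bit on the directory<br>
cp $HOME/data.tar /cluster/projects/nn9447k/   # the copy is a new file, so it also gets group nn9447k<br>
mv $HOME/model.bin /cluster/projects/nn9447k/  # a move keeps group ${USER}_g, so it still counts against the $HOME quota<br>
<br>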
> <br>
> i would be grateful if someone could explain in more detail how to<br>
> read the various lines output by dusage(1):<br>
<br>
Me too ;)<br>
<br>
> <br>
> ===============================================================================<br>
> [oe@login-1.SAGA ~]$ groups<br>
> oe dgi xa9910k nlpl ns9008k ns9052k nn9106k nn9447k oe_g<br>
> [oe@login-1.SAGA ~]$ dusage -a<br>
> ===============================================================================<br>
> Block quota usage on: SAGA<br>
> ===============================================================================<br>
> File system          User/Group          Usage         Soft Limit    Hard Limit<br>
> -------------------------------------------------------------------------------<br>
> oe_g                 $HOME               680.9 GiB     0 Bytes       0 Bytes<br>
<br>
I think the first two columns are swapped. The intention of this line, however, was to show the limit on "personal" usage under your $HOME directory. In reality it just shows how much space all files with group owner ${USER}_g use, wherever they live. Note: if these limits are not set, you don't have backup!<br>
<br>
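If you want to query that number directly, the same information should be available via beegfs-ctl (a sketch, assuming the tool is available to you on a login node):<br>
<br>
beegfs-ctl --getquota --gid ${USER}_g<br>
<br>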
> oe                   oe (u)              3.8 TiB       0 Bytes       0 Bytes<br>
<br>
That is how much space all files owned by user 'oe' use in total, i.e., including files with any group ownership (oe_g, oe, nn9447k, ...). I am not sure why this is needed.<br>
<br>
> oe                   oe (g)              1.2 MiB       0 Bytes       0 Bytes<br>
<br>
That is how much space all files with group owner 'oe' use in total, regardless of user ownership (oe, ... I am not sure anything else is possible here). Again, I am not sure why this is needed.<br>
<br>
> nn9106k              nn9106k (g)         0 Bytes       0 Bytes       0 Bytes<br>
> ns9008k              ns9008k (g)         0 Bytes       0 Bytes       0 Bytes<br>
> ns9052k              ns9052k (g)         0 Bytes       0 Bytes       0 Bytes<br>
> xa9910k              xa9910k (g)         0 Bytes       0 Bytes       0 Bytes<br>
<br>
All of the above are project quotas implemented via group ownership. Contrary to what the limit columns suggest, there are actually no limits set; see the output of beegfs-ctl:<br>
<br>
$ beegfs-ctl --getquota --gid --list nn9106k,ns9008k,ns9052k,xa9910k<br>
<br>
Quota information for storage pool Default (ID: 1):<br>
<br>
user/group || size || chunk files<br>
name | id || used | hard || used | hard<br>
--------------|------||------------|------------||---------|---------<br>
xa9910k|205586|| 0 Byte| unlimited|| 0|unlimited<br>
ns9008k|219008|| 0 Byte| unlimited|| 0|unlimited<br>
ns9052k|219052|| 0 Byte| unlimited|| 0|unlimited<br>
nn9106k|229106|| 0 Byte| unlimited|| 0|unlimited<br>
<br>
Personally, I believe that having access to these Unix groups while their limits are unset could* be a configuration mistake. It might allow a malicious user to circumvent storage policies, e.g., by creating files with one of the above groups as group owner under $HOME, project, or shared folders.<br>
<br>
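For illustration, the kind of circumvention meant here would be as simple as (hypothetical file name):<br>
<br>
chgrp ns9008k huge-dataset.tar   # usage now counts against a group without limits instead of ${USER}_g<br>
<br>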
(*) We also have robinhood scanning the /cluster filesystem regularly. I don't know much about it. Hence, this might catch the misuse scenario sketched above.<br>
<br>
> nlpl                 nlpl (g)            3.6 TiB       0 Bytes       0 Bytes<br>
<br>
This seems to correspond to a shared folder. The limits aren't actually set to 0, they are unlimited; see the output of beegfs-ctl:<br>
<br>
$ beegfs-ctl --getquota --gid nlpl<br>
<br>
Quota information for storage pool Default (ID: 1):<br>
<br>
user/group || size || chunk files<br>
name | id || used | hard || used | hard<br>
--------------|------||------------|------------||---------|---------<br>
nlpl|205655|| 3.59 TiB| unlimited|| 7448133|unlimited<br>
<br>
> dgi                  dgi (g)             0 Bytes       0 Bytes       0 Bytes<br>
<br>
Not sure what this group is used for.<br>
<br>
> nn9447k              nn9447k (g)         308.7 GiB     1.0 TiB       1.0 TiB<br>
<br>
The above is a project quota implemented via group ownership, in this case with a limit > 0 plus some usage.<br>
<br>
> ===============================================================================<br>
> <br>
> i am guessing that BeeGFS quotas are organized around user and group<br>
> ownership rather than around actual path locations on the '/cluster/'<br>
> filesystem? group ownership appears to be automatically adjusted<br>
> periodically on Saga, e.g. to 'oe_g' below my $HOME directory, to<br>
<br>
Yes (user and group ownership). And yes (automatically adjusted, except when we forget to switch it back on after a maintenance ;)).<br>
<br>
> 'nn9447k' below '/cluster/projects/nn9447k/', and to 'nlpl' below<br>
> '/cluster/shared/nlpl/' (on this view, 'oe_g' means something like<br>
> 'oe_home')? but no such automatic adjustment applies below $USERWORK?<br>
<br>
All as you figured out, I think.<br>
<br>
> that would mean that moving data from $HOME to $USERWORK will not<br>
> affect the quota system, until i also run something like 'chgrp -R oe'<br>
> below $USERWORK, right?<br>
<br>
Yep.<br>
<br>
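Concretely, a sketch of that adjustment (assuming, as on Saga, that the default group carries the plain user name):<br>
<br>
chgrp -R $USER $USERWORK<br>
<br>
or, touching only the files that are still charged to the $HOME quota:<br>
<br>
find $USERWORK -group ${USER}_g -exec chgrp $USER {} +<br>
<br>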
> <br>
> related to the above, is what is reported by dusage(1) (i.e. the<br>
> BeeGFS quotas) real-time information, or is there some delay in how<br>
> frequent this data is updated? i am asking because folk wisdom in our<br>
<br>
Not 100 % sure. It seems that I can get all usage & quota information with combinations of different parameters to beegfs-ctl, e.g.<br>
<br>
beegfs-ctl --getquota --uid oe<br>
beegfs-ctl --getquota --gid --list oe,oe_g,nlpl,...<br>
<br>
Hence, all information should be real-time. On other systems, e.g., Fram or NIRD, this might be different.<br>
<br>
> group has it that there can be an hour or two delay when one has<br>
> overrun the quota on $HOME and after freeing up space, before one can<br>
> again write to $HOME. now i am wondering whether the delay actually<br>
<br>
I'd rather think that some files with group ownership ${USER}_g are somewhere else, e.g., in running jobs' $SCRATCH. Again, I'm not 100 % sure. One could use the above find examples to look into all possible locations ($HOME, $USERWORK, projects, shared, plus all jobs' $SCRATCH).<br>
<br>
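A sketch for checking the fixed locations in one go (counts files rather than bytes; running jobs' $SCRATCH directories would have to be added separately):<br>
<br>
for d in $HOME $USERWORK /cluster/projects/nn9447k /cluster/shared/nlpl; do<br>
    echo "== $d"; find $d -group ${USER}_g -type f 2>/dev/null | wc -l<br>
done<br>
<br>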
> reflects that people move files into, say, the project area, but until<br>
> the group ownership is automatically adjusted this data is still<br>
> counted against their $HOME quota (because it is owned by the '_g'<br>
> group)?<br>
<br>
Yep, this is (maybe) the more likely reason you see a delay: the data keeps counting against the $HOME quota until the automatic procedure adjusts group ownership.<br>
<br>
> <br>
> with thanks in advance, oe<br>
<br>
You're welcome ... and please let me know if you find a good/better explanation!<br>
<br>
Thomas<br>
<br>
> <br>
> <br>
> ---------- Forwarded message ---------<br>
> From: Maria Singstad Paulsen <<a href="mailto:marispau@ifi.uio.no" target="_blank">marispau@ifi.uio.no</a>><br>
> Date: Sun, May 10, 2020 at 1:49 PM<br>
> Subject: SV: [in5550-help] Broke Saga storage again, plz help :(<br>
> To: Stephan Oepen <<a href="mailto:oe@ifi.uio.no" target="_blank">oe@ifi.uio.no</a>><br>
> Cc: <a href="mailto:in5550-help@ifi.uio.no" target="_blank">in5550-help@ifi.uio.no</a> <<a href="mailto:in5550-help@ifi.uio.no" target="_blank">in5550-help@ifi.uio.no</a>><br>
> <br>
> Either I'm doing something wrong, or there is something wrong<br>
> somewhere else. I have not uploaded or run my program since last<br>
> night, so the quota should be updated by now. I'll just try to work<br>
> around this, but it might be worth looking into if this behaviour is<br>
> not what's intended, I reckon.<br>
> <br>
> What I don't really understand is why these numbers don't even seem to<br>
> add up. dusage shows 18.0 GiB for $HOME, 18.7 GiB (u) and 798.7 MiB<br>
> (g). Shouldn't files in $USERWORK count towards (g) here? By the looks<br>
> of it, some of my files in $USERWORK must've been counted towards (g),<br>
> but seemingly not the rest? If I'm reading this correctly, that is.<br>
> <br>
> [marispau@login-1.SAGA ~]$ du -h -s $HOME/*<br>
> 16M /cluster/home/marispau/fastText<br>
> 0 /cluster/home/marispau/loadingscript.sh<br>
> 1,0K /cluster/home/marispau/remote-py-interpreter.sh<br>
> 1,0K /cluster/home/marispau/sanity.py<br>
> 1,5K /cluster/home/marispau/syspaths.txt<br>
> <br>
> [marispau@login-1.SAGA ~]$ du -h -s $HOME/.??*<br>
> 30K /cluster/home/marispau/.bash_history<br>
> 512 /cluster/home/marispau/.bash_profile<br>
> 512 /cluster/home/marispau/.bashrc<br>
> 20M /cluster/home/marispau/.cache<br>
> 512 /cluster/home/marispau/.config<br>
> 92K /cluster/home/marispau/.ipython<br>
> 1,0K /cluster/home/marispau/.keras<br>
> 512 /cluster/home/marispau/.lesshst<br>
> 21K /cluster/home/marispau/.lmod.d<br>
> 512 /cluster/home/marispau/.pki<br>
> 66M /cluster/home/marispau/.pycharm_helpers<br>
> 1,0K /cluster/home/marispau/.python_history<br>
> 7,5K /cluster/home/marispau/.ssh<br>
> 1,0K /cluster/home/marispau/.viminfo<br>
> <br>
> [marispau@login-1.SAGA ~]$ du -h -s $USERWORK/*<br>
> 11M /cluster/work/users/marispau/dev<br>
> 146M /cluster/work/users/marispau/exam<br>
> 19G /cluster/work/users/marispau/in5550<br>
> 69M /cluster/work/users/marispau/train<br>
> <br>
> [marispau@login-1.SAGA ~]$ dusage<br>
> ===============================================================================<br>
> Block quota usage on: SAGA<br>
> ===============================================================================<br>
> File system          User/Group          Usage         Soft Limit    Hard Limit<br>
> -------------------------------------------------------------------------------<br>
> marispau_g           $HOME               18.0 GiB      20.0 GiB      20.0 GiB<br>
> marispau             marispau (u)        18.7 GiB      0 Bytes       0 Bytes<br>
> marispau             marispau (g)        798.7 MiB     0 Bytes       0 Bytes<br>
> nn9447k              nn9447k (g)         308.7 GiB     1.0 TiB       1.0 TiB<br>
> dgi                  dgi (g)             0 Bytes       0 Bytes       0 Bytes<br>
> ===============================================================================<br>
> <br>
> ________________________________________<br>
> From: Stephan Oepen <<a href="mailto:oe@ifi.uio.no" target="_blank">oe@ifi.uio.no</a>><br>
> Sent: 10 May 2020 00:31<br>
> To: Maria Singstad Paulsen<br>
> Cc: <a href="mailto:in5550-help@ifi.uio.no" target="_blank">in5550-help@ifi.uio.no</a><br>
> Subject: Re: [in5550-help] Broke Saga storage again, plz help :(<br>
> <br>
> i don't think soft links are relevant here; they just create 'aliases'<br>
> (another way of referencing a file). the file itself is either in<br>
> your home directory or below your user work directory, and the<br>
> location of the actual file determines where it is counted against<br>
> your quota (no matter how many soft links may point to it from other<br>
> directories).<br>
> <br>
> could it be the case that hidden files or directories in your $HOME<br>
> account for what seems like a discrepancy between du and dusage? try<br>
> the following<br>
> <br>
> du -h -s $HOME/.??*<br>
> <br>
> files or directories whose names start with a period are not matched by<br>
> standard globbing (expansion of the '*' wildcard). the above will<br>
> match all files or directories below $HOME that start with a period,<br>
> are followed by another two arbitrary characters (the '?' in shell<br>
> wildcarding), and then have an arbitrary, possibly empty suffix (the<br>
> '*' wildcard). this complicated pattern is necessary to avoid<br>
> matching '..' (which would be included by the simpler patterns '.*' or<br>
> '.?*'), which would refer to the parent directory of your $HOME, i.e.<br>
> the huge filesystem with all user home directories ... running du(1)<br>
> on all of those would probably take a long time ...<br>
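> for instance, to see which names the pattern actually picks up (without<br>
> descending into the directories):<br>
> <br>
> ls -d $HOME/.??*<br>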
> <br>
> hth, oe<br>
<br>
<br>
<br>
</blockquote></div></div>