[NLPL Task Force (A)] Fwd: [sigma at uninett.no #180417] saga pilot testing
Stephan Oepen
oe at ifi.uio.no
Tue Sep 3 11:23:27 UTC 2019
apropos community; see below. oe
---------- Forwarded message ---------
From: Jørn Aslak Amundsen via RT <sigma at uninett.no>
Date: Thu, Mar 21, 2019 at 11:26 AM
Subject: [sigma at uninett.no #180417] saga pilot testing
To: <oe at ifi.uio.no>
Hi Stephan,
thank you for your interest in pilot testing. I think it will be very
interesting to have you and NLPL on board the pilot testing team.
Note that we anticipate Saga to provide project storage similar to
what is offered on Fram: A project on Fram will have a 1-10 TB project
directory in /cluster/projects/nnxxxxk. Additionally, a community
might have a community directory in /cluster/shared, for instance
/cluster/shared/nlpl in your case. I assume this directory should also
be accompanied by an nlpl group, and that you administer membership
in this group.
If this fits in your picture, please add the need for a community
directory when filling in the pilot user survey on
https://response.questback.com/uninett/sagapilottesting.
Mvh/MfG/Sincerely --Jørn Amundsen, UNINETT Sigma2 AS
> ________________________________________
> From: Stephan Oepen <oe at ifi.uio.no>
> Sent: Wednesday, March 20, 2019 23:50
> To: Jørn Aslak Amundsen
> Cc: infrastructure
> Subject: Re: saga pilot testing
>
> hi joern,
>
> i am copying the infrastructure task force from our NLPL project
> (bjoern lindi; martin matthiesen at CSC; and joerg tiedemann at
> helsinki university). NLPL has produced a community-maintained
> software and data installation that is largely parallel on Abel and
> the finnish Taito system.
>
> the software includes several discipline-specific tools but also some
> generic machine learning frameworks that we at the time were the first
> to make work on Abel (without containerization), e.g. TensorFlow and
> PyTorch (which require newer glibc versions than the standard RHEL6
> one). you can find some high-level background here:
>
> http://wiki.nlpl.eu/index.php/Infrastructure/software/catalogue
>
> as we are preparing for the transition from Abel to Saga, we will want
> to rebuild the NLPL project directory on the new system. that would
> require replicating the data resources and rebuilding the software
> modules. our current project directory on Abel (/projects/nlpl/) has
> two terabytes of storage, the one on Taito (/proj/nlpl/) fifteen.
> what we would actually need at this point is around four terabytes.
>
> because the software installations in the project directory are not
> easily relocatable, picking the location of the target directory
> beforehand is kind of important. for uniformity with the other
> systems, we would of course prefer a relatively 'simple' path, e.g.
> something like /projects/nlpl/.
>
> activating a good part of the user base on NN9447k will kind of depend
> on availability of at least parts of the NLPL project directory. do
> you think we could hope to have a directory available to us (and a
> Un*x group to control write access by me and other members of the NLPL
> infrastructure task force) by the start of the Saga test phase?
> community-maintained software, of course, lightens the support load on
> system administrators.
>
> in case the above is a bit cryptic still, should we try to talk by
> phone sometime next week? i have yet to forward your invitation to
> the trial period to the users on NN9447k, but i do expect there is a
> group of doctoral students who would jump eagerly at the opportunity
> to test-drive modern gpus :-). some of them, in fact, have recently
> been running on Taito, under the NLPL resource sharing umbrella.
>
> best wishes, oe