[NLPL Task Force (A)] saga pilot testing
Stephan Oepen
oe at ifi.uio.no
Wed Mar 20 22:50:46 UTC 2019
hi joern,
i am copying the infrastructure task force from our NLPL project
(bjoern lindi; martin matthiesen at CSC; and joerg tiedemann at
helsinki university). NLPL has produced a community-maintained
software and data installation that is kept largely parallel across
Abel and the finnish Taito system.
the software includes several discipline-specific tools but also some
generic machine learning frameworks that we were, at the time, the first
to get working on Abel (without containerization), e.g. TensorFlow and
PyTorch (which require newer glibc versions than the standard RHEL6
one). you can find some high-level background here:
http://wiki.nlpl.eu/index.php/Infrastructure/software/catalogue
as we are preparing for the transition from Abel to Saga, we will want
to rebuild the NLPL project directory on the new system. that would
require replicating the data resources and rebuilding the software
modules. our current project directory on Abel (/projects/nlpl/) has
two terabytes of storage, and the one on Taito (/proj/nlpl/) has fifteen.
what we would actually need on Saga at this point is around four terabytes.
because the software installations in the project directory are not
easily relocatable, picking the location of the target directory
beforehand is kind of important. for uniformity with the other
systems, we would of course prefer a relatively 'simple' path, e.g.
something like /projects/nlpl/.
activating a good part of the user base on NN9447k will kind of depend
on the availability of at least parts of the NLPL project directory. do
you think we could hope to have a directory available to us (and a
Un*x group to control write access by me and other members of the NLPL
infrastructure task force) by the start of the Saga test phase?
community-maintained software, of course, lightens the support load on
system administrators.
in case the above is still a bit cryptic, should we try to talk by
phone sometime next week? i have yet to forward your invitation to
the trial period to the users on NN9447k, but i do expect there is a
group of doctoral students who would jump eagerly at the opportunity
to test-drive modern gpus :-). some of them, in fact, have recently
been running on Taito, under the NLPL resource sharing umbrella.
best wishes, oe
On Tue, Mar 19, 2019 at 10:55 AM Jørn Aslak Amundsen
<jorn.amundsen at uninett.no> wrote:
>
> Dear Abel project administrator,
>
> Abel is about to be replaced with the new Saga machine during the autumn of this year.
>
> All projects on Abel hence need to be moved, mainly to the new Saga machine. Although Saga has newer hardware and software, it provides an architecture very similar to Abel: The same CPU architecture (Intel CPUs), 8 GPU nodes (NVIDIA Pascal), same OS (CentOS), same file system (BeeGFS) and same queueing system (Slurm). For further detail on Saga, please refer to https://www.sigma2.no/content/new-supercomputer-named-saga.
>
> We would like to invite your project to pilot testing (pre-production) on Saga. Pilot testing is targeted for the weeks 24 to 34, or 11 June - 23 August 2019. It is important to us for contractual reasons that we do a significant share of the pilot testing before week 29. Hence please prioritize weeks 24-28 if possible. You will receive information about when it is possible to log in for pilot testing in due time. Your project will not be charged for the CPU and/or GPU resources consumed during pilot testing.
>
> To participate in pilot testing, a user must be available to run test jobs for a minimum of three weeks during weeks 24-34. Please distribute this link
>
> https://response.questback.com/uninett/sagapilottesting
>
> to all of your participating users. Please only send the link to registered users on your project. The link will take you to a short questionnaire that gathers the information we need to support the pilot testing. Note that your availability information will not be disclosed.
>
> We would greatly appreciate it if you could inform us of your decision whether or not to participate in Saga pilot testing no later than 26 March 2019. Please reply to sigma at uninett.no.
>
> Yours Sincerely --Jørn Amundsen, UNINETT Sigma2 AS