[NLPL Task Force (A)] [usit-vd-rt] [rt.uio.no #3396801] newer CUDA versions on Abel
Stephan Oepen via RT
hpc-drift at usit.uio.no
Fri May 3 08:53:00 UTC 2019
hi ole,
one reason we are hardly making use of the ML nodes yet (or of the
NIRD Toolkit, for that matter) is what you might call NLPL lock-in: we
have collaboratively invested several person-years in a collection of
interoperable software and data modules (on Abel and Taito). a lot of
this is discipline-specific (software like Gensim, NLTK, spaCy,
OpenNMT, and some modules developed and maintained by NLPL partners);
some of the NLPL modules are not strictly speaking specific to NLP
(e.g. DyNet, PyTorch, and TensorFlow), but we have nevertheless found
it convenient to maintain our own installations: some users require
specific versions of these frameworks, or need to run against specific
Python versions (2.7, 3.5, and 3.7 are all in demand among NLPL
users), and then of course they expect to combine these with the
discipline-specific software modules. truth be told, i believe the
NLPL collaboration is an effective way of shielding you guys from a
much broader range of user demands :-).
we are eager to transition to Saga this summer, and i am of course
aware that the gpu environment on Abel is on its way out. i was
(maybe over-) optimistically assuming that wrapping newer CUDA and
cuDNN libraries into a module would be relatively cheap to do, i.e. i
did not expect the actual drivers on the nodes would need to be
upgraded. but maybe things are not that simple, actually?
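
a minimal sketch of the check behind this question, assuming that each
CUDA toolkit release needs a minimum NVIDIA driver version on the node,
so that a module alone only suffices where the installed driver is
already recent enough. the minimum versions in the table are taken from
NVIDIA's CUDA release notes for Linux and should be verified there; the
names MIN_DRIVER, parse_version, and driver_supports are hypothetical,
made up for illustration.

```python
# Minimum Linux driver version per CUDA toolkit release.
# Assumption: values as listed in NVIDIA's CUDA release notes; verify
# against the notes before relying on them.
MIN_DRIVER = {
    "9.2": (396, 26),
    "10.0": (410, 48),
}


def parse_version(text):
    """Turn a dotted version string such as '410.48' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver_version, cuda_version):
    """Return True if the driver meets the minimum for the given CUDA release."""
    return parse_version(driver_version) >= MIN_DRIVER[cuda_version]
```

the installed driver version on a gpu node can be read with
`nvidia-smi --query-gpu=driver_version --format=csv,noheader` and passed
to driver_supports() together with "9.2" or "10.0".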
cheers, oe
On Fri, May 3, 2019 at 10:36 AM Ole Saastad via RT
<hpc-drift at usit.uio.no> wrote:
>
> We are planning to phase out the GPUs and Abel within just a few
> months. The jurassic cards in Abel should have been replaced many
> years ago (and now are, in the form of the modern ML nodes).
>
> We do not really want to upgrade any of the software on these cards,
> which are also breaking down; they passed end of life many years ago.
>
> The modern options are the ML nodes with 4x 2080Ti cards, the NIRD
> Service Platform with its top-of-the-line V100 cards, and Saga (the
> Abel replacement) with its P100 cards.
>
> I suggest trying TensorFlow on the ML nodes; it is installed and works
> nicely, with outstanding performance.
>
> Regards,
> Ole
>
> On Fri, 2019-05-03 at 09:59 +0200, Stephan Oepen via RT wrote:
> > 2019-05-03 09:59:06: Request 3396801 was acted upon.
> > Transaction: Ticket created by oe
> > Queue: hpc-drift
> > Subject: newer CUDA versions on Abel
> > Owner: Nobody
> > Requestors: oe at ifi.uio.no
> > Status: new
> > Ticket <URL: https://rt.uio.no/Ticket/Display.html?id=3396801 >
> >
> >
> > dear colleagues,
> >
> > newer TensorFlow versions (which some of my NLPL users are requesting)
> > require CUDA 9.2 or (most recently) 10.0. could you make available
> > both as modules on Abel?
> >
> > in addition to CUDA, the current TensorFlow requirements include:
> >
> > + CUPTI (ships with the CUDA Toolkit)
> > + cuDNN SDK (>= 7.4.1)
> >
> > in the past, i believe the Abel CUDA modules have also included
> > suitable versions of cuDNN, so maybe there is nothing to worry about
> > here?
> >
> > with thanks in advance! oe
> >
> >
> --
> Ole W. Saastad, Dr.Scient.
> UiO/USIT/UVA/ITF/FI
> Besøk: Kristen Nygaards hus - Rom 2315
> Post: Gaustadalléen 23A, 0349 Oslo
> USIT, Postboks 1059 Blindern, 0316 Oslo
> Tel: +47-22840752