[NLPL Task Force (A)] [usit-vd-rt] [rt.uio.no #3396801] newer CUDA versions on Abel

Stephan Oepen oe at ifi.uio.no
Fri May 10 08:02:09 UTC 2019


hi again, colleagues,

i am sorry for being pushy about this one, but i tried working with
the available CUDA versions on Abel by installing a slightly older
TensorFlow version (which still supports CUDA 9.x).  unfortunately, it
turns out that version does not support Python 3.7, which has become
our default base version of Python as of earlier this year.  the users
requesting this package need at least Python version 3.6, because they
are extending a third-party module that is not supported in 3.5.  but
then it appears there is no available module for Python 3.6 currently?

i hope you are thinking primarily about Saga installations these days,
and i do understand one wants to minimize effort going into software
installations on Abel by now.  but i feel we cannot quite declare that
no further updates can be made and request that our users wait
patiently until after the summer, when Saga hopefully becomes
available.  and for the reasons explained in my earlier email, the ML
nodes are no immediate solution for this project either.

if the choice in this case were between a CUDA 10 module and an
installation of Python 3.6, could you provide whichever requires less
effort on your end?  truth be told, my preference would be to stick to
modern versions, i.e. Python 3.7, and move towards CUDA 10 ... provided
the libraries can just be installed as a module without a need to
upgrade the actual drivers?
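
to make the driver question concrete, here is a minimal sketch (in
Python) of the check i have in mind.  the minimum driver versions are
the ones i recall from the NVIDIA CUDA release notes for Linux and
should be double-checked there; the function names and the example
driver version are made up for illustration, as i do not know which
driver version the Abel gpu nodes actually run.

```python
# Minimum Linux driver version per CUDA toolkit release.
# Assumption: values as listed in NVIDIA's CUDA release notes; verify
# against the notes for the exact toolkit build before relying on them.
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "9.2": (396, 26),
    "10.0": (410, 48),
}


def parse_version(text):
    """Turn a dotted version string such as '410.48' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(cuda_version, driver_version):
    """Return True if the driver meets the minimum for the given toolkit."""
    return parse_version(driver_version) >= MIN_LINUX_DRIVER[cuda_version]


if __name__ == "__main__":
    # Hypothetical driver version, NOT the one actually installed on Abel.
    installed = "390.116"
    for cuda in sorted(MIN_LINUX_DRIVER, key=parse_version):
        verdict = "ok" if driver_supports(cuda, installed) else "driver too old"
        print("CUDA %s with driver %s: %s" % (cuda, installed, verdict))
```

the installed driver version can be read off on a gpu node with
`nvidia-smi --query-gpu=driver_version --format=csv,noheader`; if it is
below the minimum for CUDA 10.0, a module alone will likely not be
enough.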

with thanks in advance, oe

On Fri, May 3, 2019 at 10:53 AM Stephan Oepen via RT
<hpc-drift at usit.uio.no> wrote:
>
> hi ole,
>
> one reason we are hardly making use of the ML nodes yet (or of the
> NIRD Toolkit, for that matter) is what you might call NLPL lock-in: we
> have collaboratively invested several person years in a collection of
> interoperable software and data modules (on Abel and Taito).  a lot of
> this is discipline-specific (software like Gensim, NLTK, spaCy,
> OpenNMT, and some modules developed and maintained by NLPL partners);
> some of the NLPL modules are not strictly speaking specific to NLP
> (e.g. DyNet, PyTorch, and TensorFlow), but we have nevertheless found
> it convenient to maintain our own installations: some users require
> specific versions of these frameworks, or need to run against specific
> Python versions (2.7, 3.5, and 3.7 are all in demand among NLPL
> users), and then of course they expect to combine these with the
> discipline-specific software modules.  truth be told, i believe the
> NLPL collaboration is an effective way of shielding you guys from a
> much broader range of user demands :-).
>
> we are eager to transition to Saga this summer, and i am of course
> aware that the gpu environment on Abel is on its way out.  i was
> (maybe over-) optimistically assuming that wrapping newer CUDA and
> cuDNN libraries into a module would be relatively cheap to do, i.e. i
> did not expect the actual drivers on the nodes would need to be
> upgraded.  but maybe things are not that simple, actually?
>
> cheers, oe
>
> On Fri, May 3, 2019 at 10:36 AM Ole Saastad via RT
> <hpc-drift at usit.uio.no> wrote:
> >
> > We are planning to phase out the GPUs and Abel within just a few
> > months. The Jurassic cards in Abel should have been replaced many
> > years ago (and have been, in the form of the modern ML nodes).
> >
> > We do not really want to upgrade any of the software for these cards,
> > which are also breaking down; they have been past end of life for
> > many years.
> >
> > The modern options are the ML nodes with 4x 2080Ti cards, the NIRD
> > Service Platform with its top-of-the-line V100 cards, and Saga (the
> > Abel replacement) with its P100 cards.
> >
> > I suggest trying TensorFlow on the ML nodes; it is installed and works
> > nicely, with outstanding performance.
> >
> > Regards,
> > Ole
> >
> > On Fri, 2019-05-03 at 09:59 +0200, Stephan Oepen via RT wrote:
> > > 2019-05-03 09:59:06: Request 3396801 was acted upon.
> > >  Transaction: Ticket created by oe
> > >        Queue: hpc-drift
> > >      Subject: newer CUDA versions on Abel
> > >        Owner: Nobody
> > >   Requestors: oe at ifi.uio.no
> > >       Status: new
> > >  Ticket <URL: https://rt.uio.no/Ticket/Display.html?id=3396801 >
> > >
> > >
> > > dear colleagues,
> > >
> > > newer TensorFlow versions (which some of my NLPL users are
> > > requesting) require CUDA 9.2 or (most recently) 10.0.  could you
> > > make both available as modules on Abel?
> > >
> > > in addition to CUDA, the current TensorFlow requirements include:
> > >
> > > + CUPTI (ships with the CUDA Toolkit)
> > > + cuDNN SDK (>= 7.4.1)
> > >
> > > in the past, i believe the Abel CUDA modules have also included
> > > suitable versions of cuDNN, so maybe there is nothing to worry about
> > > here?
> > >
> > > with thanks in advance!  oe
> > >
> > >
> > --
> > Ole W. Saastad, Dr.Scient.
> > UiO/USIT/UVA/ITF/FI
> > Besøk: Kristen Nygaards hus - Rom 2315
> > Post: Gaustadalléen 23A, 0349 Oslo
> > USIT, Postboks 1059 Blindern, 0316 Oslo
> > Tel: +47-22840752
> >
> >
> >
>
>



