[NLPL Task Force (A)] CuDNN for fp16 training
Stephan Oepen
oe at ifi.uio.no
Tue May 12 14:41:49 UTC 2020
hi again, vinit:
are you still interested in fairseq? it does look like an interesting
package, though i am not quite sure we really have sufficient gpu
capacity for it :-).
in any case, the metacenter staff helped with the right NCCL version,
and i created a trial module (also including apex) which at least
appears to have some basic functionality:
$ module purge; module --ignore-cache load nlpl-fairseq/0.9.0/3.7
i would be grateful for a little more testing and hopefully a report
of good health?
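
as a side note on the fp16 / mixed-precision question below: half
precision halves memory, but its representable range is tiny (largest
finite value 65504, smallest subnormal ~6e-8), which is why packages
like apex keep fp32 master weights and scale the loss. a minimal numpy
sketch of those numerics (nothing fairseq- or cudnn-specific):

```python
import numpy as np

# fp16 stores 2 bytes per value instead of 4
x32 = np.ones(1024, dtype=np.float32)
x16 = x32.astype(np.float16)
print(x16.nbytes, x32.nbytes)        # 2048 4096

# but the representable range is much smaller:
# the largest finite fp16 value is 65504
print(np.finfo(np.float16).max)      # 65504.0
print(np.float16(70000.0))           # inf -- overflow

# small gradients underflow to zero, which is why
# mixed-precision training scales the loss before backprop
print(np.float16(1e-8))              # 0.0
```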
cheers, oe
On Fri, May 8, 2020 at 6:50 PM Stephan Oepen <oe at ifi.uio.no> wrote:
>
> hi vinit,
>
> from a quick glance at the README for fairseq, i found no mention of
> cuDNN? for all i recall, PyTorch (unlike TensorFlow) is independent
> of cuDNN too ... so why would you expect to benefit from cuDNN?
>
> fairseq does mention NCCL as a prerequisite (for distributed training)
> and optionally apex (for faster training). from my (still partial :-)
> understanding of apex, it provides mixed-precision support, so should
> not be necessary, or?
>
> i am assuming you have everything installed yourself into a local
> virtualenv? that will make it hard for others to debug, i fear. in
> principle, fairseq to me looks like something we might want to support
> as a ready-to-run NLPL module (bundle), but i cannot promise i will be
> able to look into that so quickly.
>
> cheers, oe
>
> On Fri, May 8, 2020 at 6:10 PM Vinit Ravishankar <vinitr at ifi.uio.no> wrote:
> >
> > Hi! I’m trying to figure out how to enable half-precision floating points in Python; I’m using the fairseq library [1], which has an fp16 flag, in conjunction with my own virtual environment (Python 3.7.3). I’m not using any modules; I haven’t needed them for regular multi-GPU work. Unfortunately, my program falls back to full-width floats because of a lack of support for fp16. This support was introduced in NVIDIA’s cuDNN, but loading any of the provided cuDNN modules results in a core dump the minute fairseq is loaded. Is there any recommended way to use these modules? Thanks!
> >
> > – Vinit
> >
> > 1. https://github.com/pytorch/fairseq
> >
> >