[NLPL Task Force (A)] [uninett.no #196965] Tensorflow issues, pt. 2
Thomas Röblitz via RT
support at metacenter.no
Fri Oct 25 19:46:42 UTC 2019
I can load tensorflow without any issues.
(python3.7) [thomarob at login-2.SAGA ~]$ salloc --account=nn9999k --time=02:00:00 --nodes=1 --partition=accel --gres=gpu:1 --ntasks-per-node=1 --mem=32G
salloc: Pending job allocation 76976
salloc: job 76976 queued and waiting for resources
salloc: job 76976 has been allocated resources
salloc: Granted job allocation 76976
salloc: Waiting for resource configuration
salloc: Nodes c7-8 are ready for job
[thomarob at login-2.SAGA ~]$ srun --pty python
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> quit()
python3.7 is a personal conda env with cudatoolkit 10.0.130 and tensorflow 1.13.1.
So, to me it seems that the installation requiring GLIBC_2.23 may be wrong. Comparing the GLIBC strings in the shared libraries of Henrik's and my installations shows some differences:
[thomarob at login-2.SAGA ~]$ strings .conda/envs/python3.7/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so | grep ^GLIBC
GLIBC_2.3
GLIBC_2.2.5
GLIBCXX_3.4.20
GLIBCXX_3.4.14
GLIBCXX_3.4.19
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.9
GLIBCXX_3.4.15
GLIBCXX_3.4
GLIBCXX_3.4.21
GLIBCXX_3.4.11
GLIBC_2.9
GLIBC_2.10
GLIBC_2.11
GLIBC_2.6
GLIBC_2.7
GLIBC_2.3.3
GLIBC_2.3.2
GLIBC_2.3.4
GLIBC_2.4
GLIBC_2.12
[thomarob at login-2.SAGA ~]$ sudo strings /cluster/home/hrn/.local/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so | grep ^GLIBC
[sudo] password for thomarob:
GLIBC_2.3
GLIBC_2.2.5
GLIBC_2.23
GLIBCXX_3.4.20
GLIBCXX_3.4.14
GLIBCXX_3.4.19
GLIBCXX_3.4.17
GLIBCXX_3.4.18
GLIBCXX_3.4.9
GLIBCXX_3.4.15
GLIBCXX_3.4
GLIBCXX_3.4.21
GLIBCXX_3.4.11
GLIBC_2.9
GLIBC_2.10
GLIBC_2.11
GLIBC_2.6
GLIBC_2.14
GLIBC_2.17
GLIBC_2.7
GLIBC_2.3.3
GLIBC_2.3.2
GLIBC_2.16
GLIBC_2.3.4
GLIBC_2.4
GLIBC_2.12
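The same comparison can be scripted. The following is a minimal sketch, not a definitive tool: it scans two shared libraries for GLIBC_x.y version tags (roughly what `strings ... | grep ^GLIBC` does, but skipping the GLIBCXX entries) and reports the tags that occur only in the second file. The function names are hypothetical and the paths are taken from the command line. For the two listings above, the tags present only in Henrik's library are GLIBC_2.14, GLIBC_2.16, GLIBC_2.17 and GLIBC_2.23.

```python
import re
import sys
from pathlib import Path

# Match GLIBC_<digits>[.<digits>...] but not GLIBCXX_... and not tags that are
# merely the tail of a longer identifier (hence the lookbehind).
_VERSION_RE = re.compile(rb"(?<![A-Za-z0-9_])GLIBC_(\d+(?:\.\d+)*)")


def glibc_versions(path):
    """Return the set of GLIBC version tags (e.g. '2.23') found in a binary."""
    data = Path(path).read_bytes()
    return {m.group(1).decode("ascii") for m in _VERSION_RE.finditer(data)}


def version_key(version):
    """Sort key that orders '2.9' before '2.10' (numeric, not lexical)."""
    return tuple(int(part) for part in version.split("."))


def compare(path_a, path_b):
    """Return (tags only in path_b, newest tag in path_b or None)."""
    a = glibc_versions(path_a)
    b = glibc_versions(path_b)
    only_b = sorted(b - a, key=version_key)
    newest_b = max(b, key=version_key) if b else None
    return only_b, newest_b


if __name__ == "__main__":
    # Usage (hypothetical script name): python glibc_diff.py working.so failing.so
    extra, newest = compare(sys.argv[1], sys.argv[2])
    print("only in second file:", ", ".join(extra) or "(none)")
    print("newest GLIBC tag in second file:", newest)
```

The newest tag in the failing library is the one the system libc has to provide, so it is the number to compare against the glibc version installed on the node.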
I think it would be great if there were always a minimal step-by-step sequence of commands that lets us reproduce the issue. An error message alone is usually too little, particularly when software environments not fully provided by us are involved.
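As one possible way to make such reports easier to compare, here is a small sketch that collects the basics in one go: the Python version, the interpreter path, the C library version as reported by `platform.libc_ver()`, and whether a given module imports. The function name and the report layout are assumptions for illustration only, not an agreed format.

```python
import importlib
import platform
import sys


def environment_report(module_name):
    """Collect interpreter and libc details and try to import module_name."""
    libc_name, libc_version = platform.libc_ver()
    report = {
        "python": platform.python_version(),
        "executable": sys.executable,
        # Empty string if the C library cannot be determined.
        "libc": f"{libc_name} {libc_version}".strip(),
        "module": module_name,
    }
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the full message; it names the missing symbol version and file.
        report["import_ok"] = False
        report["error"] = str(exc)
    else:
        report["import_ok"] = True
        report["module_file"] = getattr(module, "__file__", None)
    return report


if __name__ == "__main__":
    # Defaults to tensorflow; pass another module name as the first argument.
    name = sys.argv[1] if len(sys.argv) > 1 else "tensorflow"
    for key, value in environment_report(name).items():
        print(f"{key}: {value}")
```

Running it inside the same job allocation and environment where the failure occurs, and including the output together with the exact commands used, should make it clear which interpreter and which installation were actually picked up.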
Thomas
On Fri Oct 25 14:48:50 2019, oe at ifi.uio.no wrote:
> > <hrn at c7-8><~> python3
> > >>> import tensorflow
> > ...
> > ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found (required by
> > /cluster/home/hrn/.local/lib/python3.7/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)
> >
> >
> > GLIBC follows with the Linux kernel. In order to get a newer version
> > of GLIBC the Linux kernel must be upgraded and this will involve
> > upgrading all the software on Saga. We cannot do this. The problem is
> > that the library mentioned in the error message has not been compiled
> > on Saga. At the moment, I don't see how this problem can be solved.
>
> oh, no, are we about to run up against these issues again, already in
> the first few months of the many years that Saga will be in
> production?!
>
> it may seem as if newer versions of binary TensorFlow distributions
> (1.15.0 and 2.0.0) have mellowed their glibc requirements again, as
> the NLPL installations of those versions (on Saga) appear happy (and
> come from pre-compiled packages).
>
> but sooner or later there will of course be a growing number of
> packages that need to be compiled locally, and eventually likely also
> some that will be hard to compile on Saga.
>
> just for the record (for now :-), we had ended up creatively working
> around glibc requirements on the older Abel and Taito (in finland)
> systems by what one colleague called 'glibc gymnastics'; see:
>
> http://wiki.nlpl.eu/index.php/Infrastructure/software/glibc
>
> have a good weekend! oe