[NLPL Task Force (A)] Fwd: DellEMC Technical Support - [SRNumber:955846133]
Stephan Oepen
oe at ifi.uio.no
Mon Nov 6 10:33:31 UTC 2017
dear all,
just fyi: ‘ls.hpc’ (and, thus, the NLPL wiki) is back on-line now,
though there are remaining issues with the RAID controller, and we
will have to take the machine off-line once again sometime later this
week.
best, oe
---------- Forwarded message ----------
From: Kjell Andresen <kjell.andresen at usit.uio.no>
Date: Mon, Nov 6, 2017 at 11:07 AM
Subject: RE: DellEMC Technical Support - [SRNumber:955846133]
To: A.OConnor at dell.com
Cc: oe at ifi.uio.no, hpc-core at usit.uio.no, Sam Phung
<sam.phung at usit.uio.no>, adrian.helle at usit.uio.no, Kjetil Kirkebø
<kjetil.kirkebo at usit.uio.no>
[root at ls ~]# dmidecode -s system-serial-number
620SJ32
Hi all!
Thank you all for your time and hands helping out with ls.hpc.
Status now:
ls.hpc is up and running on the 4 system disks in RAID5
(the 4 x 1TB HDDs). The SSDs still have an issue, see below.
The HDD volume group is the only one visible in the OS [1].
In the console/iDRAC both RAID5 virtual disks are showing [2].
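For reference, a minimal way to cross-check what the OS sees against
the controller view (the LVM commands are standard; omreport assumes
Dell OMSA is installed, otherwise the iDRAC GUI shows the same):

# Block devices known to the kernel
cat /proc/partitions

# LVM physical volumes and volume groups as seen by the OS
pvs -o pv_name,vg_name,pv_size
vgs

# Virtual disks as reported by the PERC controller (needs Dell OMSA)
omreport storage vdisk controller=0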
There is one SSD, 0:1:7, blinking both green and amber on the same
LED. Its status is a little strange to me, check
http://folk.uio.no/kjell/usit/ls/IMG_3899.JPG
(it is another LiteOn IT ECE 40 disk).
Aaron: I think it would be wise to replace the SSD 0:1:7 before
removing the ssdhs-vg again and remaking it as RAID5 with one hot spare.
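If we go that route, a sketch of the clean teardown on the LVM side
before deleting the virtual disk in the controller (assuming no
logical volumes in ssdhs-vg are still mounted; the PV device name
below is hypothetical, check pvs first):

# Deactivate all logical volumes in the SSD volume group
vgchange -an ssdhs-vg

# Remove the volume group, then wipe the LVM label from the PV
vgremove ssdhs-vg
pvremove /dev/sdX    # hypothetical device name, verify with pvs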
I have collected a new TSR with RAID Controller Log at
http://folk.uio.no/kjell/usit/ls/TSR20171106105543_620SJ32.zip
oe: Last time I removed the SSD virtual disk we lost the HDD virtual
disk, so I think it is wise to wait for tonight's backup of the
mounted filesystems and to replace the SSD 0:1:7 before removing the
ssd-vg again, OK?
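On top of the nightly backup, an extra one-off copy of the most
important trees would be cheap insurance before we touch the
controller again, something like (host and paths here are only
examples, not our real backup target):

# Ad-hoc safety copy to another host before changing the RAID config
rsync -aH /usit/ls/system/ backuphost:/srv/ls-safety/system/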
We also had some problems getting into the console this morning after
entering the root password on the machine [3]. This was OK again
after powering off the machine and leaving it without any power for a
few minutes.
I am leaving ls.hpc.uio.no up and running as is until further notice
(pending the SSDs); the locally mounted disks are listed in [4].
/Kjell
[1]
Status of ls.hpc now:
----------------------------------
[root at jump-ojd kjell-drift]# ssh ls.hpc
Last login: Mon Nov 6 09:16:55 2017 from jump-ojd.uio.no
[root at ls ~]# uptime
09:54:09 up 3 min, 1 user, load average: 0.93, 0.64, 0.26
[root at ls ~]# vgs
VG #PV #LV #SN Attr VSize VFree
internvg 1 8 0 wz--n- 2.73t 1.05t
[root at ls ~]# uname -r; grubby --default-kernel
2.6.32-696.13.2.el6.x86_64
/boot/vmlinuz-2.6.32-696.13.2.el6.x86_64
[2]
The status of the vdisks in ls.hpc now:
http://folk.uio.no/kjell/usit/ls/ls-lc-vdisks.png
[3]
Error after rebooting ls.hpc from the console this morning:
http://folk.uio.no/kjell/usit/ls/IMG_3897.JPG
[4]
Mounted disks at ls.hpc now:
-----------------------------------------
[root at ls ~]# df -lh
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/internvg-root
2.0G 530M 1.3G 29% /
tmpfs 190G 4.0K 190G 1% /dev/shm
/dev/sdb1 488M 107M 356M 24% /boot
/dev/mapper/internvg-opt
31G 3.3G 26G 12% /opt
/dev/mapper/internvg-tmp
51G 224M 48G 1% /tmp
/dev/mapper/internvg-usr
152G 3.7G 141G 3% /usr
/dev/mapper/internvg-var
150G 14G 129G 10% /var
/dev/mapper/internvg-system
99G 1.9G 92G 2% /usit/ls/system
/dev/mapper/internvg-scratch
1008G 216G 742G 23% /usit/ls/scratch
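For the next reboot, a quick sketch to check that everything in fstab
actually came back (skips comments and swap; adjust as needed):

# Report any fstab filesystem that is not currently mounted
awk 'NF >= 2 && $1 !~ /^#/ && $3 != "swap" && $2 != "none" {print $2}' /etc/fstab |
while read mp; do
    mountpoint -q "$mp" || echo "NOT MOUNTED: $mp"
done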