Replace a failed drive in a software RAID 5 on iSCSI

Introduction

On my server, where I host a couple of virtual machines (VMs), I use a software RAID 5. This RAID 5 is built on top of iSCSI drives: four iSCSI targets that live on three QNAP NAS devices. So one NAS has two targets configured.

When a failure occurs

While reconfiguring a network switch, I had to reload its config. This caused one NAS to be disconnected from the network, which of course caused a failure on the software RAID. The following messages appeared in dmesg:

[69298.238316] connection4:0: detected conn error (1022)
[69729.155489] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4312322049, last ping 4312323328, now 4312324608
[69729.733780] connection2:0: detected conn error (1022)
[69849.987513] session2: session recovery timed out after 120 secs
[71820.937756] perf: interrupt took too long (12466 > 10041), lowering kernel.perf_event_max_sample_rate to 16000
[125542.257008] sd 5:0:0:0: rejecting I/O to offline device
[125542.514793] blk_update_request: I/O error, dev sdd, sector 16 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[125543.013298] md: super_written gets error=10
[125543.215817] md/raid:md0: Disk failure on sdd, disabling device.

This is to be expected. Once the switch was back and the NAS was reachable again, I checked the state of the software RAID with:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd[4](F) sdc[2] sde[0] sdf[1]
1572467712 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
bitmap: 2/4 pages [8KB], 65536KB chunk

unused devices: <none>

Notice that the device sdd is marked as failed (F). More details can be obtained by the following command:

mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Sun Feb 27 13:08:41 2022
Raid Level : raid5
Array Size : 1572467712 (1499.62 GiB 1610.21 GB)
Used Dev Size : 524155904 (499.87 GiB 536.74 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Sat Mar 12 06:25:43 2022
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 512K

Consistency Policy : bitmap

Name : darklord:0 (local to host darklord)
UUID : 75a05c94:d25d97da:56950464:c5aa539a
Events : 3429

Number Major Minor RaidDevice State
0 8 64 0 active sync /dev/sde
1 8 80 1 active sync /dev/sdf
2 8 32 2 active sync /dev/sdc
- 0 0 3 removed

4 8 48 - faulty /dev/sdd
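
The (F) marker in /proc/mdstat can also be picked up by a script, for example to send an alert. The following is a minimal Python sketch, not part of my actual setup: the function name failed_members is made up for illustration, and it assumes array names of the form md0, md1, and so on.

```python
import re

def failed_members(mdstat_text):
    """Return {array_name: [failed device names]} from /proc/mdstat text."""
    failed = {}
    for line in mdstat_text.splitlines():
        # Array lines look like: "md0 : active raid5 sdd[4](F) sdc[2] ..."
        match = re.match(r"^(md\d+) : (.*)$", line)
        if not match:
            continue
        name, rest = match.groups()
        # A member is written as "sdd[4]", with "(F)" appended when faulty.
        devices = re.findall(r"(\S+?)\[\d+\]\(F\)", rest)
        if devices:
            failed[name] = devices
    return failed

# Sample taken from the degraded state shown earlier.
sample = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4](F) sdc[2] sde[0] sdf[1]
      1572467712 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
"""

print(failed_members(sample))
```

On a real system the text would come from open("/proc/mdstat").read() instead of the sample string.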

So now we know this drive has failed, how do we fix it? Since this is an “iSCSI disk”, the drive is not really “faulty”: it only dropped out because the NAS was unreachable.

Fixing the raid 5 array

Fixing the RAID 5 array is actually quite simple. First we remove the failed drive:

mdadm --manage /dev/md0 --remove /dev/sdd
mdadm: hot removed /dev/sdd from /dev/md0

Next we re-add the /dev/sdd device back into the array:

mdadm --manage /dev/md0 -a /dev/sdd
mdadm: re-added /dev/sdd

Next is checking the raid 5 array:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd[4] sdc[2] sde[0] sdf[1]
1572467712 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [UUU_]
[================>....] recovery = 84.8% (444654656/524155904) finish=106.8min speed=12396K/sec
bitmap: 2/4 pages [8KB], 65536KB chunk

So the raid 5 array is rebuilding. After a while:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdd[4] sdc[2] sde[0] sdf[1]
1572467712 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
bitmap: 2/4 pages [8KB], 65536KB chunk

So all is good again.
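
As a side note, the finish estimate in the recovery line shown during the rebuild can be recomputed from the other numbers on that line. A small Python sketch of the arithmetic, assuming the counters are in 1 KiB blocks and the speed stays constant (the function name recovery_eta_minutes is made up for this example):

```python
import re

def recovery_eta_minutes(line):
    """Estimate remaining rebuild minutes from an mdstat recovery line."""
    done, total = map(int, re.search(r"\((\d+)/(\d+)\)", line).groups())
    speed_k = int(re.search(r"speed=(\d+)K/sec", line).group(1))
    remaining_k = total - done          # 1 KiB blocks still to rebuild
    return remaining_k / speed_k / 60   # seconds -> minutes

line = ("[================>....] recovery = 84.8% (444654656/524155904) "
        "finish=106.8min speed=12396K/sec")
print(round(recovery_eta_minutes(line), 1))
```

The result lands within a fraction of a minute of the finish=106.8min value reported by the kernel.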

Running kubernetes on a 10 node Raspberry PI cluster

Introduction

In an earlier article I designed and built a 10-node Pi cluster. The specs of this cluster are not bad. The cluster consists of:

5x Raspberry Pi 3B+, 4 cores, 1.2 GHz (Broadcom BCM2837, Cortex-A53)
5x Raspberry Pi 4 (4 GB), 4 cores, 1.5 GHz (Broadcom BCM2711, Cortex-A72)

The specs of the cluster:

Total Storage (GB)    : 320
Total RAM (GB)        : 25
Total CPU Cores       : 40
Total CPU GHz         : 13.5
Max Power Consumption : 130 Watt
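
For the curious, the totals above follow from the per-node numbers. A small Python sketch of the arithmetic; note that two inputs are assumptions that are not in the listing: 1 GB of RAM for each Pi 3B+ and one 32 GB SD card per node. The GHz total is simply the clock speed summed per node, not per core.

```python
from dataclasses import dataclass

@dataclass
class NodeType:
    count: int
    cores: int
    ghz: float
    ram_gb: int       # assumption for the Pi 3B+: 1 GB per board
    storage_gb: int   # assumption: one 32 GB SD card per node

nodes = [
    NodeType(count=5, cores=4, ghz=1.2, ram_gb=1, storage_gb=32),  # Pi 3B+
    NodeType(count=5, cores=4, ghz=1.5, ram_gb=4, storage_gb=32),  # Pi 4 (4 GB)
]

totals = {
    "storage_gb": sum(n.count * n.storage_gb for n in nodes),
    "ram_gb": sum(n.count * n.ram_gb for n in nodes),
    "cores": sum(n.count * n.cores for n in nodes),
    "ghz": sum(n.count * n.ghz for n in nodes),
}
print(totals)
```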

In this article I’m going to describe what my plans are for this cluster. And while the title is a dead giveaway, I’m not going into technical depth. If this article gets too long, I’ll split it up into multiple articles.

What to do with all this power?

With the cluster up and running, it’s time to do something with it. One of my goals is to learn CI/CD pipelines. I have a small personal GitLab server running, and the goal is to use CI/CD pipelines with Kubernetes. Now that I have a 10-node cluster, this should be possible.

Running Kubernetes on Raspberry PI

While it’s possible to run Kubernetes on Raspberry Pi, there are a couple of things to consider. The first is that most development is done for the major platforms such as AMD and Intel. This means that for armhf and ARM64 a lot of things won’t work out of the box.

Another thing to consider: Kubernetes itself is not designed to run on a low-powered platform like the Raspberry Pi. However, there are some Kubernetes flavors that focus on being lightweight, for example by reducing the memory footprint. More on this later. The bottom line is: yes, it is possible to run Kubernetes on Raspberry Pi, but there are some caveats.

What is this Kubernetes anyway?

To get an overview of what Kubernetes is and what it does, take a look at: What is Kubernetes – an overview. The gist of it is that Kubernetes can be seen as a framework for deploying applications in containers, managing those containers, and providing scaling and fail-over for the deployed application. Note that Kubernetes is also referred to as K8s, where the 8 stands for the eight letters between the K and the s. So don’t pronounce this as “K eight”, but as: Kubernetes.
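
The K8s abbreviation is a so-called numeronym: keep the first and last letter and replace everything in between by the number of letters dropped. A tiny Python illustration (the function name numeronym is just my own naming):

```python
def numeronym(word):
    """Keep the first and last letter, replace the rest by their count."""
    if len(word) <= 2:
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"

print(numeronym("Kubernetes"))  # K8s
```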

Starting with Kubernetes

Wanting to run Kubernetes on my Pi cluster is one thing. Actually getting it to run is quite a different story. So how to get Kubernetes running? There are a couple of challenges hidden in this question. The first one is: which flavor of Kubernetes to run? The second one is: how to learn Kubernetes?

Which flavor of Kubernetes to run?

There are a couple of lightweight Kubernetes distributions to choose from. Some of these, for example k3d and Minikube, allow you to run Kubernetes inside Docker on your local machine. For the Raspberry Pi I investigated the following two:

- MicroK8s
- K3s

Before installing Kubernetes onto a Raspberry Pi, note that you should choose a 64-bit OS. The reason for this is that most (Docker) containers are built for 64-bit OSes. Therefore I used Ubuntu Server (LTS) 64-bit for Raspberry Pi. Normally I would use Raspbian, or Raspberry Pi OS as it is called now, but since its 64-bit version was still in beta at the time of writing I switched to Ubuntu Server.
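
A quick way to verify that a node really runs a 64-bit OS is to look at the machine name the kernel reports. A minimal Python sketch; the helper name is_64bit_arm is made up here, and the set of names covers what I would expect to see ("aarch64" on Linux, "arm64" on some other systems), while a 32-bit Raspberry Pi OS typically reports "armv7l":

```python
import platform

def is_64bit_arm(machine):
    """True for the names commonly reported for 64-bit ARM."""
    return machine.lower() in {"aarch64", "arm64"}

# Check the machine this script runs on.
print(platform.machine(), is_64bit_arm(platform.machine()))
```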

Let’s get started with MicroK8s

As the first candidate I installed MicroK8s. The reason for this is that the installation instructions sound really simple: install MicroK8s, and off you go. As it turned out, it was not that simple, and eventually I had to give up on MicroK8s, simply because I could not get it stable.

After I installed MicroK8s I noticed that after a couple of hours the whole cluster stopped responding to kubectl commands. All I got were timeout errors. Rebooting nodes did not help. I did a lot of searching and browsing through the MicroK8s issues, and I did find that others were experiencing the same problems on Raspberry Pis, but no solution.

So after two weeks of fighting I gave up on MicroK8s and moved on to the next option.

Installing K3s on the Raspberry Pi cluster

According to the documentation the name of K3s comes from:

We wanted an installation of Kubernetes that was half the size in terms of memory footprint. Kubernetes is a 10-letter word stylized as K8s. So something half as big as Kubernetes would be a 5-letter word stylized as K3s. There is no long form of K3s and no official pronunciation.

Installing K3s can be done completely with k3s-ansible, which is a perfect fit, since I do a lot with Ansible. After cloning the repository I followed the instructions (which basically tell you to change the host.ini file and hostvar file and to run the Ansible playbook).

And right from the start I could tell that K3s was much more stable. I could deploy applications and remove them, and even after a couple of days the cluster was stable. Since the default version installed is v1.17.5+k3s1, I decided to upgrade the cluster. Due to my inexperience with Kubernetes, K3s, and how to upgrade, I managed to completely destroy my beautiful working cluster. So I started over: I flashed all my SD cards with Ubuntu Server, ran my own Ansible playbook to install the basics, and installed the latest version of Rancher’s K3s. And to my relief everything works perfectly. The cluster runs stable.

So in conclusion: MicroK8s sounds great, and I really wanted to use it, but I couldn’t get it to run stably. Keep in mind that at this point I have no experience with Kubernetes, so your mileage may vary. Thanks to the k3s-ansible repository I could get K3s up and running quickly. At this point I’m not interested in the details of how to install and configure Kubernetes. In the case of K3s it’s one binary anyway. For now I’m mainly interested in getting Kubernetes up and running and in learning how to use it.

Learning Kubernetes

Now that Kubernetes is up and running, where to start learning? I came across the Kubernetes 101 series by Jeff Geerling. I highly recommend watching his Kubernetes 101 YouTube videos. I learned a lot from watching them, and they gave me the basics to get started.

In the next article I’m going to describe how I got GitLab CI/CD pipelines working with my Kubernetes cluster.

Installing PyVISA on macOS 10.14.6 (Mojave)

In an earlier article I had some trouble installing the NI-VISA library for PyVISA, so this article is a quick update on that. It describes what I did to test the NI-VISA library. And honestly, I don’t know why it was not working.

First of all, when testing the installation of pyvisa with:

>>> import pyvisa
>>> rm = pyvisa.ResourceManager()
>>> rm.list_resources()

Make sure the equipment connected to the USB GPIB adapter is on. If the connected equipment is not on, you get an empty list of resources back.

Testing the NI-VISA library

The first thing I wanted to know was: When the NI-VISA library is not working, is that due to some configuration?

Testing can be a little annoying, since every time you reinstall the library, or uninstall and (re)install it, you have to reboot your machine. And I didn’t want to mess around too much, with the risk of wrecking some black-magic library configuration that might be almost impossible to fix.

So I figured: why not unpack the installation package and try the driver within the package directly?

Unpacking a .pkg file under macOS is really simple. First mount the downloaded .dmg package. In my case: NI-VISA_20.0.0.dmg

Once it’s mounted, I changed to my home-dir, and created a test directory.

cd ~
mkdir test-nivisa
cd test-nivisa

Next I copied the installation package (NI-VISA_Full_20.0.0.pkg) to this test dir:

cp /Volumes/NI-VISA\ 20.0.0/NI-VISA_Full_20.0.0.pkg .

Unpacking (or expanding) the install package is really easy:

pkgutil --expand NI-VISA_Full_20.0.0.pkg ./unpack

Note that the unpack dir is created while expanding the package. So don’t create the dir upfront! If you do, the command fails with:

pkgutil --expand NI-VISA_Full_20.0.0.pkg ./unpack
Error encountered while creating ./unpack. Error 17: File exists

In the test dir where the package is unpacked, a lot of other packages can be found. One of these packages contains the library I’m after. All of these packages contain a file called “Payload”, which is a gzipped archive that tar can extract.

To unpack this file for each package, the find command is our friend:

cd unpack
find ./ -name 'Payload' -exec tar xzvf {} \;

This will unpack every Payload file found below the current directory into the current directory. Since the “v” flag (verbose) is enabled, this outputs a lot of text (the files being extracted). There is a chance this will overwrite files, but this is not something I’m worried about, as long as I can use the NI-VISA library.

This library is called “VISA”, so a second find command is needed:

find ./ -name 'VISA'

Which gives the result:

.//VISA.framework/Versions/A/VISA
.//VISA.framework/VISA
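
The same search can also be done from Python, which is handy if you want to feed the result straight into PyVISA. A minimal sketch using only the standard library; find_named is a made-up helper name, and like find it also returns directories and symlinks with a matching name:

```python
from pathlib import Path

def find_named(root, name):
    """Recursively return all paths under root whose name equals name."""
    return sorted(Path(root).rglob(name))

# Search below the current directory, like: find ./ -name 'VISA'
for path in find_named(".", "VISA"):
    print(path)
```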

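Once I had the library I tested it with PyVISA. This can easily be done in a virtual environment. Note: since I had already tested this, the pyvisa package is already installed; the environment is also not activated in the session below (that would take source env/bin/activate), so pip reports the copy in my pyenv installation: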

python3 -m venv env
pip install pyvisa
Requirement already satisfied: pyvisa in /Users/edwin/.pyenv/versions/3.7.3/lib/python3.7/

python3
Python 3.7.3 (default, Dec 4 2019, 15:11:28)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyvisa
>>> rm = pyvisa.ResourceManager('./VISA.framework/VISA')
>>> rm.list_resources()
('GPIB0::9::INSTR',)
>>>

As can be seen on the last line:

('GPIB0::9::INSTR',)

The NI-VISA library works just fine. The actual library lives in:

/Library/Frameworks/VISA.framework/VISA

So I created a file .pyvisarc in my home dir (notice the dot (.) in front of the file name!).

This file contains:

cat ~/.pyvisarc
[Paths]
VISA library: /Library/Frameworks/VISA.framework/VISA
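
The file is INI-style, so it can be double-checked with Python's standard configparser before blaming PyVISA for anything. This is only a sketch to verify the entry; it is not PyVISA's own loading code, and the function name visa_library_path is made up for this example:

```python
import configparser

def visa_library_path(rc_text):
    """Return the 'VISA library' entry from .pyvisarc text, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    return parser.get("Paths", "VISA library", fallback=None)

# Same content as the ~/.pyvisarc shown above.
rc_text = """\
[Paths]
VISA library: /Library/Frameworks/VISA.framework/VISA
"""
print(visa_library_path(rc_text))
```

To check the real file, pass in the result of open(os.path.expanduser("~/.pyvisarc")).read() after importing os.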

So now when I use pyvisa-info (or pyvisa-shell) it works as well. pyvisa-info gives:

pyvisa-info
Machine Details:
Platform ID: Darwin-18.7.0-x86_64-i386-64bit
Processor: i386

Python:
Implementation: CPython
Executable: /Users/edwin/.pyenv/versions/3.7.3/bin/python3.7
Version: 3.7.3
Compiler: Clang 10.0.1 (clang-1001.0.46.4)
Bits: 64bit
Build: Dec 4 2019 15:11:28 (#default)
Unicode: UCS4

PyVISA Version: 1.11.3

Backends:
ivi:
Version: 1.11.3 (bundled with PyVISA)
#1: /Library/Frameworks/VISA.framework/VISA:
found by: auto
bitness: 64
Vendor: National Instruments
Impl. Version: National Instruments
Spec. Version: National Instruments
py:
Version: 0.5.1
ASRL INSTR: Available via PySerial (3.4)
USB INSTR: Available via PyUSB (1.1.1). Backend: libusb1
USB RAW: Available via PyUSB (1.1.1). Backend: libusb1
TCPIP INSTR: Available
TCPIP SOCKET: Available
GPIB INSTR:
Please install linux-gpib (Linux) or gpib-ctypes (Windows, Linux) to use this resource type. Note that installing gpib-ctypes will give you access to a broader range of funcionality.
No module named 'gpib'

So I really don’t know why it was not working the first time, and why it took almost a day of pulling my hair out. There are two things I can think of:

- I switch my USB adapter between macOS and a Windows 10 VM. Maybe I didn’t release the adapter properly from Windows 10?
- Or the adapter was not plugged in correctly?

I tried switching from MacOS to my Windows 10 VM multiple times, noticing it worked in Windows 10 perfectly, but not under MacOS.

Anyways, it works now. And hopefully the steps above might be useful to someone.