Startled by “component device mismatches” on RAID1 volumes

Post author By Ralf Bergs
Post date 2009-03-01

I was startled today by a message in syslog that seems to point to a problem with my RAID1 volumes:

Mar 1 01:13:54 server mdadm[961]: RebuildFinished event detected on md device /dev/md3, component device mismatches found: 512

This value is reflected in the following counter:
root:/etc/mdadm# cat /sys/block/md3/md/mismatch_cnt
512
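To look at this counter for every array at once rather than just md3, a small loop over sysfs may help. This is only a minimal sketch: report_mismatches is a made-up helper name, and the optional directory argument is an assumption added so the function can be tried against a test tree instead of the real /sys/block.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): print "<array>: <mismatch_cnt>"
# for every md array found under a sysfs root.
# $1 is optional and defaults to /sys/block.
report_mismatches() {
    sysroot="${1:-/sys/block}"
    for counter in "$sysroot"/md*/md/mismatch_cnt; do
        # When no array exists the glob stays unexpanded; skip that case.
        [ -r "$counter" ] || continue
        # Reduce ".../md3/md/mismatch_cnt" to the array name "md3".
        array="${counter#"$sysroot"/}"
        array="${array%%/*}"
        printf '%s: %s\n' "$array" "$(cat "$counter")"
    done
}
```

Calling report_mismatches without arguments reads the live counters; on the machine above it would list md3 alongside the other arrays.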

I tried to clarify this by googling around, but found no definitive answer as to whether this is an actual problem. However, I did find a way to resync the MD components so that no mismatches remain:

root:/etc/mdadm# echo repair >> /sys/block/md3/md/sync_action

After the repair has finished, you will notice that the counter shows the same number of mismatches again:

root:/etc/mdadm# cat /sys/block/md3/md/mismatch_cnt
512

This was to be expected, because the repair counts every mismatch it encounters while correcting it. So, if you then force another check, the counter should be down to 0:

root:/etc/mdadm# echo check >> /sys/block/md3/md/sync_action
root:/etc/mdadm# watch cat /proc/mdstat
[wait until check is finished]
root:/etc/mdadm# cat /sys/block/md3/md/mismatch_cnt
0
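The check, wait and read sequence can also be scripted instead of watching /proc/mdstat by hand. The following is a rough sketch, not a tested tool: check_and_report is a hypothetical name, the sysfs root and poll interval arguments are assumptions made for illustration, and it relies on sync_action reading back "idle" once the check has finished. Writing to sync_action needs root.

```shell
#!/bin/sh
# Hypothetical helper: start a check on one md array, wait for it to
# finish, then print the resulting mismatch count.
#   $1  array name, e.g. md3
#   $2  sysfs root (optional, defaults to /sys/block)
#   $3  poll interval in seconds (optional, defaults to 10)
check_and_report() {
    md_dir="${2:-/sys/block}/$1/md"
    interval="${3:-10}"
    echo check > "$md_dir/sync_action" || return 1
    # Assumption: sync_action reports "idle" when no check, repair or
    # resync is running any more.
    while [ "$(cat "$md_dir/sync_action")" != "idle" ]; do
        sleep "$interval"
    done
    cat "$md_dir/mismatch_cnt"
}
```

For the array from this post the call would be check_and_report md3, which prints the counter once the check is done.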

Tags raid


4 replies on “Startled by “component device mismatches” on RAID1 volumes”

Update: I found the following statement from Neil Brown, which seems to be a good explanation of why these differences may happen:

Suppose I memory-map a file and often modify the mapped memory. The system will at some point decide to write that block of the file to the device. It will send a request to raid1, which will send one request each to two different devices. They will each DMA the data out of that memory to the controller at different times so they could quite possibly get different data (if I changed the mapped memory between those two DMA requests). So the data on the two drives in a mirror can easily be different. If a ‘check’ happens at exactly this time it will notice.

Normally that block will be written out again (as it is still ‘dirty’) and again and again if necessary as long as I keep writing to the memory. Once I stop writing to the memory (e.g. close the file, unmount the filesystem) a final write will be made with the same data going to both devices. During this time we will never read that block from the filesystem, so the filesystem will never be able to see any difference between the two devices in a raid1.

So: if you are actively writing to a file while ‘check’ is running on a raid1, it could show up as a difference in mismatch_cnt. But you have to get the timing just right (or wrong).

Neil Brown also says that this problem is mostly found with swap on RAID1. The whole discussion is at: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=518834
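To see whether the swap case applies to a given machine, one can look for md devices in /proc/swaps. This is a minimal sketch under a few assumptions: swap_on_md is a made-up helper name, the optional file argument exists only so the function can be tried on sample input, and swap that sits on LVM or another layer on top of an md array will not be detected.

```shell
#!/bin/sh
# Hypothetical helper: list active swap areas that live directly on an
# md device. $1 is optional and defaults to /proc/swaps.
swap_on_md() {
    swaps_file="${1:-/proc/swaps}"
    # Line 1 is the column header; the first field is the device path.
    awk 'NR > 1 && $1 ~ /^\/dev\/md/ { print $1 }' "$swaps_file"
}
```

If swap_on_md prints nothing, no swap area is directly on an md array.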

It happened that I saw this error (again) a couple of days ago. Since it startled me again I googled for it — and came across my own above article as the first hit. I had simply forgotten that I had seen and investigated this error already a while ago… 🙂

In my case the mismatch count was exactly 128 and appeared only for /boot partitions on raid1. I think this is pretty normal, because the sequence of events was as follows:
1) OS installed on 2 HDDs with /boot and / partitions on RAID1 via mdadm
2) /dev/sdb at some point became faulty and was replaced
3) sfdisk -d /dev/sda | sfdisk /dev/sdb
4) mdadm /dev/md0 --add /dev/sdb1
5) echo -e "root (hd1,0)\nsetup (hd1)\nquit" | grub --batch

In step 5) grub writes directly to the disk, bypassing the raid1 layer, so /dev/sda1 and /dev/sdb1 get out of sync and the weekly cron job 99-raid-check complains about mismatches.
