On NetApp and IBM N series filers and gateways (or any storage running Data ONTAP with the WAFL file system) you might face a volume or aggregate inconsistency. Sadly, if this happens to you, you will need to plan downtime on the filer or gateway to fix it. The problem seems more common on large volumes and aggregates; I have usually seen it on volumes or aggregates larger than 1TB. Furthermore, it is widespread on filers and gateways running Data ONTAP releases prior to 7.2.4. If you are still running a Data ONTAP release prior to 7.2.4, I highly recommend you upgrade to 7.2.4 or later.
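To see which release a controller is currently running before deciding whether an upgrade is due, the standard `version` command can be used. A minimal sketch (the hostname `filer1` and the release string shown are illustrative):

```
filer1> version
NetApp Release 7.2.4: Wed Aug 9 00:27:38 PDT 2006
```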
With all that said, let's see what you do if you already have a volume or aggregate that has gone into restricted mode and is showing as inconsistent. In basic terms, you will need to run WAFL_check, which is a file system consistency check, much like running chkdsk on Windows. Unfortunately, up to this moment NetApp has not come up with a way to run WAFL_check on ONTAP filers/gateways without requiring downtime, so before following the instructions below you will need to plan an outage. How long that downtime lasts really depends on how many files/inodes you have on that volume/aggregate; it depends on the number of files rather than on their size. So plan your downtime and hope your WAFL_check finishes within the window.
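You can confirm that a volume or aggregate is in the restricted/inconsistent state before scheduling the outage with `vol status` and `aggr status`. A minimal sketch, assuming a volume named `vol01` on aggregate `aggr1` (names and output layout are illustrative; the key thing to look for is the `wafl inconsistent` flag in the status column):

```
filer1> vol status vol01
         Volume State       Status            Options
          vol01 restricted  raid_dp, flex
                            wafl inconsistent
filer1> aggr status aggr1
```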
I know you are looking for the steps only, but as I said earlier, plan a downtime and don't do this unless you have to. Here are the steps:
1- This step applies to clustered filers/gateways only; if you have a single filer/gateway, skip to step 2. If you are running a clustered filer/gateway, you will need to disable failover on both nodes by running:
Filer1> cf disable <== This ensures the cluster will not fail over to the second controller when you reboot the filer to run WAFL_check.
2- Follow the instructions below on the controller that owns the aggregate or volume you need to run WAFL_check against. Make sure you have completed step 1 on both filers if you are running a clustered configuration before proceeding:
Data ONTAP (test.com)
Total number of connected SCSI clients: 1 Number of r/w, online, mapped LUNs: 4
Warning: Rebooting with clustering disabled will terminate SCSITarget services and might cause data loss and application visible errors, or other OS failures on storage clients!!
CIFS local server on vfiler vfiler11 is shutting down…
CIFS local server on vfiler Vfiler12 is shutting down…
CIFS local server on vfiler vfiler11 has shut down…
CIFS local server on vfiler vfiler12 has shut down…
CIFS local server on vfiler vfiler0 is shutting down…
CIFS local server on vfiler vfiler0 has shut down…
Enter the number of minutes to wait before disconnecting :
11 minute left until termination (^C to abort)…[LCD:info] REBOOTING
*** Warm reboot…
Starting Press CTRL-C for special boot menu
Special boot options menu will be available.
Mon Feb 26 21:43:52 GMT [cf.ic.linkEstablished:info]: The Cluster Interconnect link has been established.
Mon Feb 26 21:43:53 GMT [cf.nm.nicTransitionUp:info]: Interconnect link 0 is UP
NetApp Release 7.0.5: Wed Aug 9 00:27:38 PDT 2006
Copyright (c) 1992-2006 Network Appliance, Inc.
Starting boot on Mon Feb 26 21:43:47 GMT 2007
Mon Feb 26 21:44:02 GMT [diskown.isEnabled:info]: software ownership has been enabled for this system
LUN Ownership using low range
Please choose one of the following:
(1) Normal boot.
(2) Boot without /etc/rc.
(3) Change password.
(4) Initialize owned disks (65 disks are owned by this filer).
(4a) Same as option 4, but create a flexible root volume.
(5) Maintenance mode boot.
Selection (1-5)? WAFL_check
In a cluster, you MUST ensure that the partner is (and remains) down,
or that takeover is manually disabled on the partner node,
because clustering software is not started or fully enabled
in WAFL_check mode.
FAILURE TO DO SO CAN RESULT IN YOUR FILESYSTEMS BEING DESTROYED
Continue with boot? y
add net 127.0.0.0: gateway 127.0.0.1
Mon Feb 26 21:48:01 GMT [cf.noDiskownShelfCount:info]: Disk shelf count functionality is not supported on software based disk ownership configurations.
Mon Feb 26 21:48:04 GMT [fmmbx_instanceWorke:info]: Disk disk1:0-4.125L1 is a primary mailbox disk
Mon Feb 26 21:48:04 GMT [fmmbx_instanceWorke:info]: normal mailbox instance on primary side
Mon Feb 26 21:48:10 GMT [fmmbx_instanceWorke:info]: Disk disk1:0-4.125L0 is a backup mailbox disk
Mon Feb 26 21:48:10 GMT [fmmbx_instanceWorke:info]: normal mailbox instance on backup side
Mon Feb 26 21:48:10 GMT [cf.fm.partner:info]: Cluster monitor: partner 'filer2'
Mon Feb 26 21:48:10 GMT [cf.fm.timeMasterStatus:info]: Acting as cluster time slave
Mon Feb 26 21:48:14 GMT [localhost: cf.fm.launch:info]: Launching cluster monitor
Mon Feb 26 21:48:14 GMT [localhost: cf.fm.partner:info]: Cluster monitor: partner 'filer2'
Mon Feb 26 21:48:14 GMT [localhost: cf.fm.notkoverClusterDisable:warning]: Cluster monitor: cluster takeover disabled (restart)
Mon Feb 26 21:48:15 GMT [localhost: cf.fsm.takeoverOfPartnerDisabled:notice]: Cluster monitor: takeover of Filer2 disabled (cluster takeover disabled)
Mon Feb 26 21:48:15 GMT [localhost: raid.cksum.replay.summary:info]: Replayed 0 checksum blocks.
Mon Feb 26 21:48:15 GMT [localhost: raid.stripe.replay.summary:info]: Replayed 0 stripes.
Check vol01? y
Check vol02? n
WAFL_check NetApp Release 7.0.5
Starting at Mon Feb 26 21:50:32 GMT 2007
Phase 1: Verify fsinfo blocks.
Phase 2: Verify metadata indirect blocks.
Phase 3: Scan inode file.
Phase 3a: Scan inode file special files.
Phase 3a time in seconds: 6
Phase 3b: Scan inode file normal files.
Phase 3b time in seconds: 2989
Phase 3 time in seconds: 2995
Phase 4: Scan directories.
Phase 4 time in seconds: 0
Phase 5: Check volumes.
Phase 5a: Check volume inodes
Phase 5a time in seconds: 0
Phase 5b: Check volume contents
Phase [5.1]: Verify fsinfo blocks.
Phase [5.2]: Verify metadata indirect blocks.
Phase [5.3]: Scan inode file.
Phase [5.3a]: Scan inode file special files.
Phase [5.3a] time in seconds: 20
Phase [5.3b]: Scan inode file normal files.
Phase [5.3b] time in seconds: 5
Phase [5.3] time in seconds: 26
Phase [5.4]: Scan directories.
Phase [5.4] time in seconds: 6
Phase [5.6]: Clean up.
Phase [5.6a]: Find lost nt streams.
Phase [5.6a] time in seconds: 5
Phase [5.6b]: Find lost files.
Phase [5.6b] time in seconds: 16
Phase [5.6c]: Find lost blocks.
Phase [5.6c] time in seconds: 0
Phase [5.6d]: Check blocks used.
Phase [5.6d] time in seconds: 722
Phase [5.6] time in seconds: 744
Clearing inconsistency flag on volume vol01.
Volume vol01 WAFL_check time in seconds: 776
Inconsistent vol vol01 marked clean.
WAFL_check output will be saved to file /vol/vol01/etc/crash/WAFL_check
Commit changes for aggregate aggr1 to disk? y
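Once the changes are committed and the filer boots back up normally, don't forget to re-enable cluster failover if you disabled it in step 1, and verify the cluster state. A minimal sketch (hostnames and the status output are illustrative):

```
Filer1> cf enable
Filer1> cf status
Cluster enabled, filer2 is up.
```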
* That's all there is to it. The only pain in the process is that you have to call for a downtime and hear all the kind words from your management or the business people. I hope the ONTAP folks fix this in the future.
Please leave a comment if you found this post useful, or if it did not work with a certain ONTAP version, to help others avoid it.
Eiad Al-Aqqad, VCDX#89
VMware Canada PSO