Storage

Setting up Data Protection on Tintri T540 – Replicate and Restore

If you read my earlier post about Tintri's VMstore T540, you know that setting up data protection on the T540 is very simple. There is not much to it, really; just a few clicks and you're replicating between appliances. Once you set the replication settings (assigning the replication partner and password) and add the purchased license on the appliance, you are ready to configure the VMs for protection.

Follow the steps below to set up replication of a VM and restore it on another T540. You'll find that it's very simple to set up and takes just a few minutes.


Add Replication settings to appliance

Fig4_TintriD

Add Replication License

Fig3_TintriC
Configuring VMs for Snapshot & Replication to another T540

  1. Log in to your T540 using the appliance's URL, http://x.x.x.x/
  2. Click the "Search VM" link (top right)
  3. A listing of all the VMs will be shown
  4. Find the virtual machine you wish to protect, right-click, and select Protect
  5. Select the snapshot schedule, the retention period, whether you want the snapshot to be replicated, and the alert threshold. When you are done, click Protect.
     Tintri_ReplicateA
  6. To verify the replication state, right-click the header bar of the virtual machine list and select Data Protection. This refreshes the Protection tab, showing the replication state, schedule, and retention.
     Tintri_ReplicateB

Restoring a Replicated VM on another T540

  1. Log in to the T540 hosting the snapshot using the appliance's URL, http://x.x.x.x/
  2. Click "Search VM" (top right)
  3. A listing of all the VMs will be shown. Click Snapshots, and this lists all snapshots held on the appliance.
     Tintri_Snapshot
  4. Locate the snapshot you want to recover. Right-click and select Clone.
     Tintri_Restore
  5. The Create New Virtual Machine window will open. Fill out the required fields and click Clone. A window appears at the bottom right informing you that the virtual machine is being added to the inventory.
     Tintri_RestoreA
  6. Sign into vCenter, find the virtual machine that you just cloned, complete any needed configuration changes, and power it on.

Tintri T540 – Simple VM storage management

It's the storage wars. Lots of new vendors on the storage array market are doing their dog and pony shows, trying to get a piece of the pie from familiar giants like NetApp and EMC. Having managed NetApp for a few years, I was comfortable with their filers, so why change?

After spending a few years managing our VMware environment and its storage, our environment had grown and started to show some performance bottlenecks. We needed to find a solution that would meet the demanding IO needs of our VMs and support the HA and DR requirements for our datacenter. This is where we decided to think outside of the box and look at all the new players in the storage industry. We started doing research and eventually decided on Tintri and their VMstore T540 appliance. If you haven't heard of Tintri before, they make a flash-based storage array designed specifically to serve as a datastore for VMs. Its purpose in life is storing VMs and making them run better and faster. The array is designed not only to meet the demanding IO needs of VMs but also to reduce the complexity of managing the storage behind them.

The T540 not only met our requirements but also gave us simple storage management. Even though there isn't very much to manage with the array, it still offers features that were key to us, such as NFS, hot cloning of VMs, deduplication, compression, and hardware replication. Tintri supports VMware vSphere 4.x and 5.x using NFS, RHEV, and even Hyper-V. VDI deployments of VMware Horizon View and Citrix XenDesktop can also run on Tintri.

The Tintri VMstore T540 is a 3U rackmount appliance that comes with:

  • Capacity: 26.4 TB (8x 300 GB SSD + 8x 3 TB HDD) with 13.5 TB usable space
  • Management Network: 2x 1GbE (RJ-45)
  • Data Network: Option 1: 2x 10GbE (10GBASE-SR LC fibre or SFP+ direct-attach copper), or Option 2: 2x 10GbE (RJ-45)
  • Replication Network (optional): 2x 1GbE (RJ-45) or 2x 1GbE (SFP)

From start to finish it takes about 30-60 minutes to rack the appliance in the datacenter (depending on how fast you can rack it) and connect it to vCenter. There is very little day-to-day management with the T540; tasks such as hot cloning VMs and setting up replication take minutes and are effortless. Once the T540 is racked, you give it an IP address and you are off configuring the appliance to connect to vCenter.

Easy and Simple

The following screenshots of the appliance's settings will show you how easy and simple it is to set up.

Datastore IP is just that: it's where you assign the IP address that the datastore will use when connected to vCenter.
Fig1_TintriA

Datastore IP

Set up snapshot schedules using the Snapshot tab
Fig2_TintriB

Snapshot Configuration

After supplying the replication license, shown in Fig3_TintriC, you can set up replication to another T540, shown in Fig4_TintriD.
Fig3_TintriC

Replication License

Fig4_TintriD

Replication Settings

Wow! I sound like a commercial for Tintri. There is no paid endorsement here, it's just the truth: the thing just works and does a great job at what it was designed to do. It runs just like they say it does.

To learn more about Tintri and their other products, the VMstore T620 and T650, check out Tintri's website.

Don't Forget to Assign Ownership to Disks on Your NetApp Filers

The other day a co-worker doing his rounds in the data center sent the team an email (photos attached) indicating amber lights on one of the shelves of our NetApp filer. Amber lights on a shelf of disks on your SAN usually aren't a good thing, so I jumped onto OnCommand System Manager to begin troubleshooting. It didn't take me very long to figure out the problem, because the first thing I saw after logging into OnCommand was a warning that I had "unowned disks".

Then the aha moment hit me: just the day before, we'd had two disks fail and get replaced via the configured AutoSupport. Gosh, I do like AutoSupport. Having a disk go bad overnight and getting a replacement before you even walk through the door is very convenient.

When these particular disks were replaced by another co-worker the day before, he had removed the bad disks and inserted the replacements. Since our filers are set up with software disk ownership, with disk auto-assign disabled, the disks were not assigned automatically to an owner. Ownership must be assigned before the filer can use the disks; otherwise they are useless and flash amber lights at you.

To assign ownership to the disk(s), SSH into your filer and do the following:

  1. Locate the disk(s) that don't have an owner by typing the following command:
    disk show -n
  2. Once you have the name of an unowned disk, assign ownership with this command:
    disk assign <disk_name>
    For example:
    disk assign 0b.16
    For multiple disks:
    disk assign 0b.43 0b.24 0b.27
    Or assign ownership to all unowned disks at once:
    disk assign all
  3. Run disk show -v to verify the disk assignments
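
On a 7-Mode filer the whole exercise looks something like the session sketched below. The disk name and serial number here are made up for illustration, and your output columns may differ slightly by Data ONTAP version:

    filer> disk show -n
      DISK       OWNER          POOL   SERIAL NUMBER
      0b.16      Not Owned      NONE   ZXY01234
    filer> disk assign 0b.16
    filer> disk show -v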

So there you go, a pretty simple fix. Next time you replace a failed drive don’t forget to give it an owner!

SnapManager for Exchange fails to run scheduled snaps after running an upgrade to 6.0.4

Sometimes fixes & patches introduce another set of issues that will give way to another set of new patches and fixes.

In our case, it was our upgrade to SnapManager for Exchange (SME) 6.0.4, which had fixes for some bugs we were facing. Everything seemed to go really well; the upgrades on the Exchange 2010 DAG member servers didn't hiccup one bit. This was too good to be true: an upgrade of SME and no issues so far. I had my fingers crossed and was hoping for the best; maybe luck would be in our corner.

No Joy…

After completing the upgrade on all servers I needed to run a test of some Exchange snaps. Got to make sure it works, right? I first started out running manual snaps on all the databases on each node. Those worked great, no problems.

So onward to the next test, which was to kick off a scheduled snap of the DAG databases. After kicking off a scheduled snap through Task Scheduler, the snaps failed to run. After some digging around and a few more tests, my co-worker discovered that there is a bug when you upgrade to SME 6.0.4 which causes scheduled snaps to fail.

According to NetApp's KB article 649767, it has to do with the value "0" not being selectable in the "retain up-to-the-minute restorability" option in the GUI of this release like it was in previous releases. When running the snaps through the GUI of SME 6.0.4, you can manually enter the value "0" and run the job immediately, and backups will work. The issue occurs when SME creates a scheduled job: it creates the job with the wrong parameter; it should be NoUtmRestore if you don't want to retain any transaction logs.

http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=649767

SME604_a

Getting Backups to work again…

To get scheduled backups to work again you will need to do one of two things:

  • Change the -RetainUTMDays and -RetainUTMBackups values to something other than "0". Changing the value will retain your transaction logs for the specified number of days or backups.
  • If you don't want to keep any transaction logs, manually modify the scheduled job, remove the -RetainUTMDays or -RetainUTMBackups parameter, and replace it with NoUtmRestore.
    • If you are running a DAG, remember that you will need to modify the scheduled job across all DAG members that have it.
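
As a rough illustration of the second option (the actual command line in the scheduled task is much longer and environment-specific; unrelated parameters are elided here and left as "..."), the edit amounts to swapping the retention parameter:

    Before (as created by SME 6.0.4 - scheduled snaps fail):
      new-backup ... -RetainUTMDays 0 ...
    After (edited by hand - scheduled snaps work):
      new-backup ... -NoUtmRestore ...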

SME604

Unable to resize a NetApp volume?

There may be times when you need or want to resize a NetApp volume, and normally this process is very easy. You can resize a NetApp volume using the OnCommand tool, FilerView, or even SSH directly into the filer. Any of these ways is perfectly fine; there is no wrong way to do it, except for when it fails.

A common cause of failed volume resizing I have seen is the "fs_size_fixed" error.

vol3

"fs_size_fixed" is a parameter that was enabled on the volume either during setup or during a SnapMirror relationship break. The parameter is there to prevent any accidental resizing of the volume. The only way to resize the volume is to remove the "fs_size_fixed" parameter by connecting directly to the filer through an SSH tool and running the following commands. Once the parameter is removed you will be able to resize the volume.

1. Connect to the filer (you can use Putty if you have it)
2. First verify that "fs_size_fixed" is enabled on the volume by typing: vol status [name of vol]

You will see in the status that "fs_size_fixed" is set to on

vol1

3. At the prompt type: vol options [name of vol] fs_size_fixed off

4. Confirm that the parameter has been disabled by typing: vol status [name of vol]

vol2
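
Putting it together, a 7-Mode session looks roughly like this. The volume name and the new size are illustrative; the final vol size command grows the volume by 100 GB once the option is off:

    filer> vol status vol_data
    filer> vol options vol_data fs_size_fixed off
    filer> vol status vol_data
    filer> vol size vol_data +100g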

Failed Login to NetApp Filer using SSH/Putty

NetApp filers can be accessed and managed many ways, including using Putty to SSH into the filer itself. In addition to FilerView, there is another web-based tool called NetApp OnCommand System Manager, a GUI that gives a very nice graphical performance chart detailing how hot your filers are running. The OnCommand tool is great for everyday management of the filers, but sometimes you will need to access the filers via Putty to run more advanced functions, e.g., killing a hung NDMP session.

We had an interesting issue today while trying to access one of our NetApp filers using Putty. Every time we tried to log into the filer with a Putty session, we would get an access denied error or the Putty session would simply close. What was odd was that it didn't happen for all of us storage engineers. Thinking that maybe our accounts were locked or maybe the access got removed, I started an OnCommand session and attempted to log into the filers.

Not a single hiccup. I logged in right away on every single filer we have. Hmmm... so I can log in with my credentials using OnCommand, but I can't when using a Putty session. Yet another storage engineer could log in to both, and we all have the same permissions. All filers were checked for locked accounts, including Active Directory; nothing was locked.

After some more head scratching, one of the other storage engineers stumbled upon a setting within OnCommand System Manager that was caching our passwords. Once the tick box to cache passwords was cleared, we were able to successfully log onto the filers.

To clear the cached passwords in OnCommand:

  1. Run OnCommand System Manager and log onto any filer
  2. In the top left-hand corner select Tools
  3. Select Options

oncommand

  4. Clear the Enable Cache Passwords tick box

Oncommand2

  5. Select Clear Existing Passwords
  6. Select Save and Close

Once the settings were changed, we were both able to Putty to the filers. Gotta love the gotchas of cached passwords.

Snapdrive services failing to start on Windows Server 2008 x64

SnapDrive for Windows is NetApp's storage management software that allows you to easily provision storage and back up and restore your data on a Windows server. It's a great tool when it works, but when it doesn't, it's a bear. I recently had the experience of troubleshooting some of our servers that had SnapDrive issues connecting to our filer. The servers' iSCSI connections were not affected, so the issue went unnoticed for some time until a request to expand LUNs was made... That's when it was discovered that the SnapDrive service was not running and failing to start.

When SnapDrive was opened, the MMC would crash, which resulted in the following error in the SnapDrive MMC:

Web Service Client Channel was unable to connect to the LUNProvisioningService instance on machine ServerName.
Could not connect to ‘net.tcp://ServerNameSnapDrive/LUNProvisioningService.’ The connection attempt lasted for a time span of 00:00:00. TCP error code 10061: No connection could be made because the target machine actively refused it 

The event that appeared in the application logs:

Log Name: Application

Source: SnapDrive
Date: 1/05/2013 10:41:33 AM
Event ID: 101
Task Category: Generic event
Level: Error
Keywords: Classic
User: N/A
Computer: myserverxxx.com
Description:
SnapDrive service failed to start.
Error code : SnapDrive Web Service failed to start Reason: ‘The TransportManager failed to listen on the supplied URI using the NetTcpPortSharing service: failed to start the service. Refer to the Event Log for more details.’

I immediately jumped onto NetApp's support site and started searching for known issues. One post indicated checking the permissions of the account accessing the filer and making sure it had local admin rights on the server. I knew that wasn't the issue because the account already had local admin rights; plus, SnapDrive had been working until recently, so permissions would be at the bottom of the list of culprits. The next few hits on the forums indicated that IIS Admin needed to be enabled and that the Net.Tcp Port Sharing service should be enabled. When I checked the services, IIS Admin wasn't even installed and Net.Tcp Port Sharing was in a disabled state. I attempted to re-enable the service, but it failed, as I expected it to. Odd, I thought. Where is the IIS Admin service? What would prevent these services from starting?
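
For reference, this is roughly how the port sharing service can be checked and re-enabled from an elevated command prompt (a sketch; the short service name is NetTcpPortSharing on Server 2008, and the space after start= is required by sc syntax):

    sc query NetTcpPortSharing
    sc config NetTcpPortSharing start= demand
    net start NetTcpPortSharing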

Since IIS Admin wasn't available, I went to Server Manager, confirmed it wasn't installed, and installed the feature. After the installation completed, I attempted to start the Net.Tcp Port Sharing service and the SnapDrive services again, but all of them failed. Back to scratching my head again.

It took some digging, but eventually I came to NetApp KB 2013168. The article noted the following: ".NetFramework and the Net.Tcp PortSharing Service. If .Net is not properly installed or the Net.Tcp PortSharing Service service are not functioning correctly, SnapDrive will not be able to connect to the LUNProvisioningServices and the ability to manage LUNs via the MMC can be impaired."

Oh snap! Anybody who knows me in "real" life knows how much the word .Net gets under my skin. I've had to deal with so many issues involving corrupted installs of .Net, or some Microsoft patch that would "break" .Net and the application that depended on it, that I've grown a hatred for the word .Net.

Now that I had something to go on, I followed the steps in the KB article for issue #2 and issue #3 (the symptoms I was experiencing):

Issue 2:
Directory permissions to C:\WINDOWS\Microsoft.NET\Framework\v3.0\Windows Communication Foundation\SMSvcHost.exe.
For the NT Authority\Local Service account to be able to start this service, users must have read and execute permissions to the above path.

Resolution to Issue 2:
Incorrect permissions were configured on the C:\windows directory.
Verify that users have read and execute permissions to the path C:\WINDOWS\Microsoft.NET\Framework\v3.0\Windows Communication Foundation\SMSvcHost.exe.

Well, permissions weren't it, because everything was there. Now onto issue #3.

Issue 3:
SnapDrive 6.x service did not start because the ‘Net.Tcp Port Sharing service’ will not stay started. This is a dependency SnapDrive 6.x has that earlier versions do not.

Resolution to Issue 3:

Reinstall Microsoft .Net.

Reinstall .Net? Great, this should be fun, I thought to myself. I confirmed via Add/Remove Programs that .Net 3.5 was installed, but the document referenced that SnapDrive required .Net 3.0 SP1, and that particular version was not listed anywhere. On a hunch, I went to Server Manager > Features to see if the .Net 3.0 Framework features were installed, and yes, they were! Using the Server Manager wizard I removed the .Net 3.0 Framework features, which requires a reboot to complete.

Once the uninstall completed, I re-installed the .Net 3.0 Framework using the same Server Manager wizard. When the installation completed, I rebooted the server for good measure. Once the server came back online, the SnapDrive service was running again. Whew! What a morning. Now onto expanding the LUNs as the application owner requested.