With virtualization on the rise, we sysadmins find ourselves managing a lot more servers than ever. Gone are the days of managing a couple of racks of pizza boxes; instead, one of those pizza boxes may host a hundred virtual servers by itself. And with so many servers, and clients doing the same fun things, we often find ourselves looking into the random "my server rebooted, why?" question.
When investigating a reboot, you can search the System event log for the event IDs below. Each one corresponds to a reboot and will help determine why it happened. After you find the actual reboot, check the other events around that time to see if anything led to or caused it, for example Windows Updates or a BSOD.
Event ID 1074
The process Explorer.EXE has initiated the restart of computer SERVER01 on behalf of user SERVER01\UserName for the following reason: Other (Planned)
Reason Code: 0x85000000
Shutdown Type: restart
Comment: Server updates
Event ID 6006
The Event log service was stopped.
Event ID 6005
The Event log service was started.
Event ID 109
The kernel power manager has initiated a shutdown transition.
Event ID 19
Installation Successful: Windows successfully installed the following update: Definition Update for Windows Defender - KB2267602 (Definition 1.173.438.0)
Event ID 22
Restart Required: To complete the installation of the following updates, the computer will be restarted within 15 minutes:
Event ID 13
The operating system is shutting down at system time.
Event ID 12
The operating system started at system time.
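If you check a lot of servers, it can help to build the filter once instead of clicking through Event Viewer. Below is a minimal sketch that builds an XPath query for the event IDs above and wraps it in a wevtutil command line (wevtutil ships with Windows; the choice of /f:text, /rd:true and a count of 50 is an assumption, adjust to taste). It only builds the command string; you would still run it on each server.

```python
# Event IDs from the list above that relate to a shutdown or restart.
REBOOT_EVENT_IDS = [1074, 6006, 6005, 109, 19, 22, 13, 12]


def build_xpath(event_ids):
    """Build an XPath filter that matches any of the given System log event IDs."""
    clauses = " or ".join(f"EventID={eid}" for eid in event_ids)
    return f"*[System[({clauses})]]"


def build_wevtutil_command(event_ids, count=50):
    """Return a wevtutil command that prints the newest matching System events as text."""
    # /rd:true reads newest first, /c limits the number of events returned.
    return f'wevtutil qe System /q:"{build_xpath(event_ids)}" /f:text /rd:true /c:{count}'
```

For example, build_wevtutil_command(REBOOT_EVENT_IDS) gives a line you can paste into an elevated command prompt; the timestamps in the output then tell you which other events to look at around the reboot.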
I work a lot with DFSr because we use it to keep some of our web farms, and some of our customers' private farms, replicated. I can tell you it sucks: it always breaks, and it's very hard to maintain. I'll caveat that by saying we probably shouldn't be using it for web farms with millions of little files; it seems to work fine for AD. Anyway, this is the most common issue you will run into with DFSr: the unexpected crash or shutdown. Neither of the nodes this occurred on actually crashed; in fact, they didn't even reboot or shut down. That doesn't matter, DFSr still crashed. Below is one example and the fix for it. It's obvious from the event what you need to do, but let's review anyway.
The one thing you HAVE to remember is to leave it alone. Do not touch it after you resume replication. The #1 mistake I see people make when troubleshooting DFSr is rebooting the server or restarting the service. DFSr keeps a journal (database) of all the changes to the replicated folders, so you can't just restart the service or reboot the server to fix this. That's like restarting SQL to recover a corrupted database. Instead, you need to recover that journal, which fortunately Microsoft tells you exactly how to do in the event log.
To get to the event log go to Control Panel --> Administrative Tools --> Event Viewer --> Applications and Services Logs --> DFS Replication.
Event ID 2213
The DFS Replication service stopped replication on volume C:. This occurs when a DFSR JET database is not shut down cleanly and Auto Recovery is disabled. To resolve this issue, back up the files in the affected replicated folders, and then use the ResumeReplication WMI method to resume replication.
1. Back up the files in all replicated folders on the volume. Failure to do so may result in data loss due to unexpected conflict resolution during the recovery of the replicated folders.
2. To resume the replication for this volume, use the WMI method ResumeReplication of the DfsrVolumeConfig class. For example, from an elevated command prompt, type the following command:
wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="32A74A78-0B49-11E2-93EE-806E6F6E6963" call ResumeReplication
Run the command given in step 2 of the event from an elevated command prompt (Run as administrator) to resume replication. Remember that each node in the DFSr replication group has a different volume GUID, so get the command from Event Viewer on each node and run it there. Example below.
wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="32A74A78-0B49-11E2-93EE-806E6F6E6963" call ResumeReplication
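Because the GUID differs per node, copying the wrong command is an easy mistake. Here is a small sketch that pulls the volumeGuid out of the Event ID 2213 message text and rebuilds the ResumeReplication command from it. The helper name is hypothetical, and it assumes you have already exported the event message as plain text; it does not run anything.

```python
import re

# Matches the volumeGuid="..." clause that Event ID 2213 embeds in its suggested command.
GUID_PATTERN = re.compile(r'volumeGuid="([0-9A-Fa-f-]{36})"')


def resume_command_from_event(event_text):
    """Return the ResumeReplication command for the GUID found in an Event ID 2213 message."""
    match = GUID_PATTERN.search(event_text)
    if match is None:
        raise ValueError("no volumeGuid found in event text")
    guid = match.group(1)
    # Same wmic call the event recommends, with this node's GUID filled in.
    return (r'wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig '
            f'where volumeGuid="{guid}" call ResumeReplication')
```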
After you run it you will see Event ID 2212 in the log.
The DFS Replication service has detected an unexpected shutdown on volume C:. This can occur if the service terminated abnormally (due to a power loss, for example) or an error occurred on the volume. The service has automatically initiated a recovery process. The service will rebuild the database if it determines it cannot reliably recover. No user action is required.
You may also see Event ID 2218
The DFS Replication service is in the second step of replication database consistency checks after an unexpected shutdown. The database will be rebuilt if it cannot be recovered. No user action is required.
Now you just need to wait for the database to recover. Depending on the number of files and how long replication has been down, it can take a few minutes, several hours, or even days. You MUST leave it alone. Do not reboot the server or restart DFSr; that will simply start the process all over again.
Once it is fully recovered you will see event ID 2214.
The DFS Replication service successfully recovered from an unexpected shutdown on volume C:. This can occur if the service terminated abnormally (due to a power loss, for example) or an error occurred on the volume. No user action is required.
Once you see that event you are good to go. More info in this MS KB.
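The event sequence above (2213, then 2212/2218, then 2214) is effectively a small state machine. As a sketch, assuming you already have the DFS Replication event IDs for a node in chronological order, this reports where the recovery stands, so you know whether to act or keep your hands off:

```python
# DFSR event IDs from the walkthrough above.
STOPPED = 2213              # replication stopped, run ResumeReplication
RECOVERING = {2212, 2218}   # recovery in progress, leave it alone
RECOVERED = 2214            # recovery finished


def dfsr_recovery_state(event_ids):
    """Given DFSR event IDs in chronological order, report the latest recovery state."""
    state = "unknown"
    for eid in event_ids:
        if eid == STOPPED:
            state = "stopped: run ResumeReplication"
        elif eid in RECOVERING:
            state = "recovering: leave it alone"
        elif eid == RECOVERED:
            state = "recovered"
        # Any other event ID leaves the state unchanged.
    return state
```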
You may also want to see this list of DFSr hotfixes for Windows Server 2008 and 2008 R2.
This is a very simple script to add a local administrator. You always want a backup admin account to get into a computer, because the built-in "Administrator" account should be disabled, after all. I use this script to create one, and it also generates a password I can store somewhere like KeePass. Here is what it does:
- Generates a 32 character complex password.
- Creates a local user.
- Adds the user to local Administrators group.
- Sets the password to never expire.
- Spits out the password to the console window so you can copy/pasta it into KeePass.
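# Load System.Web for the Membership password generator (Windows PowerShell / .NET Framework)
Add-Type -Assembly System.Web
# 32 characters; 8 non-alphanumeric characters is an assumed value, adjust as needed
$pass = [System.Web.Security.Membership]::GeneratePassword(32, 8)
NET USER username "$pass" /ADD /y
NET LOCALGROUP "Administrators" "username" /add
WMIC USERACCOUNT WHERE "Name='username'" SET PasswordExpires=FALSE
Write-Host "$pass" -foregroundcolor red -backgroundcolor yellow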
Make sure to replace every "username" with the username you wish to create. I use this at work to create a standard backup admin user on servers: it's always the same username, with a different password for each server.
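The password half of the script relies on System.Web, which as far as I know is only available in Windows PowerShell, not PowerShell 7. If you need the same kind of password elsewhere, here is a sketch of the idea using Python's secrets module. It guarantees at least one uppercase letter, lowercase letter, digit and symbol; the symbol set is an assumption, limited to characters that are easy to pass on a command line.

```python
import secrets
import string

# Symbol set is an assumption: avoids quotes, %, ^, &, |, <, > so the password
# can be passed to NET USER without extra escaping.
SYMBOLS = "!@#*()-_=+[]{};:.?"
CLASSES = [string.ascii_uppercase, string.ascii_lowercase, string.digits, SYMBOLS]


def generate_password(length=32):
    """Return a random password containing at least one character from every class."""
    if length < len(CLASSES):
        raise ValueError("length too short to include every character class")
    alphabet = "".join(CLASSES)
    # One character from each class, then fill the rest from the full alphabet.
    chars = [secrets.choice(cls) for cls in CLASSES]
    chars += [secrets.choice(alphabet) for _ in range(length - len(CLASSES))]
    # Shuffle so the guaranteed characters are not always at the front.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)
```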
I realized this doesn't work with the execution policy set to Restricted, so I made a bat file that runs it from your desktop after setting the execution policy to Unrestricted. I copy the two files to the desktop of the server (you can do this over RDP for any Server 2008 or later), then right-click the bat file and choose "Run as administrator". Here is the script for the bat file.
Update deuce: per a Redditor's suggestion, I replaced the command that changed the execution policy with one that simply bypasses it.
powershell -ExecutionPolicy Bypass -file %USERPROFILE%\Desktop\name-of-your-ps1-file.ps1
In a bind for disk space on an MSSQL server that you cannot restart? Here's a way to force a shrink of TempDB, if that's your issue. Be aware this can hurt performance, since you will be clearing the server's plan cache, which SQL uses to store execution plans after they are compiled. That means every plan has to be recompiled. But working for a small host with shared database servers that CANNOT go down during the day, I've been in this bind. This is the script I found to fix the issue.
USE tempdb;
GO
-- Clear the plan cache so cached objects stop holding TempDB space (plans will be recompiled)
DBCC FREEPROCCACHE;
-- Shrink tempdb data file (target size in MB)
DBCC SHRINKFILE ('tempdev', 1);
-- Shrink tempdb log file (target size in MB)
DBCC SHRINKFILE ('templog', 1);
This won't always work the first time, so just keep executing it until TempDB shrinks. I usually have to run it up to 10 times before TempDB gets down to less than 100MB. By default, TempDB should be around 7MB when it starts. Also, restarting SQL Server empties and re-initializes TempDB, so to speak: each time SQL Server starts, TempDB should be back to about 7MB. Of course, you should probably figure out what is filling your TempDB, but when you have 700 databases on one MSSQL server and you don't control any of them or know the developers, that's easier said than done.
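The "keep running it until it's small enough" routine is easy to automate. Here is a sketch of the retry loop with the database calls left as hypothetical callables: get_size_mb could run something like SELECT SUM(size) * 8 / 1024 FROM tempdb.sys.database_files (size is stored in 8KB pages), and run_shrink would execute the script above through whatever driver you use. The 100MB target and 10 attempts mirror the numbers above.

```python
def shrink_until(get_size_mb, run_shrink, target_mb=100, max_attempts=10):
    """Call run_shrink() until get_size_mb() drops below target_mb or attempts run out.

    get_size_mb and run_shrink are hypothetical callables supplied by the caller.
    Returns the number of shrink attempts made.
    """
    attempts = 0
    while get_size_mb() >= target_mb and attempts < max_attempts:
        run_shrink()
        attempts += 1
    return attempts
```

Capping the attempts matters: if something is actively filling TempDB, the loop stops instead of hammering a production server all day.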
How to increase the number of concurrent RDP connections in Microsoft’s remote desktop connection manager
An issue that has been plaguing me for years has been the number of RDP sessions I can connect to at once in Microsoft's remote desktop connection manager. RDman is an older piece of software that is simple and easy to use. It's also something we use at work every day. The problem is that MS hasn't updated it in a while and it's x86 (32bit). Each session takes up a decent amount of memory and once you get to the 1GB mark, you start getting errors like this.
Error possibly involving 'security settings':
Error HRESULT E_FAIL has been returned from a call to a COM component.
Error possibly involving 'server name':
Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Those errors usually go away if you sign out of a couple of sessions, and then you can sign in to more. But there is always a limit, anywhere from 6-10 connections at one time, and until you free something up you cannot sign in to the server that gave you the error.
One very simple solution is to just open multiple instances of RDman, but you're still limited to only 6-10 RDP sessions at one time. So how do we connect to more?
The problem is actually pretty simple to solve, if you have Visual Studio.
1. First, open the Command Prompt for VS2012. Or whichever version of Visual Studio you have installed. I happen to have 2012 since we have a dev on staff and we always have the latest license. Yay. Here is a link to how to get to the command shell for various operating systems.
2. CD into the directory where RDMan is installed. Most likely that is:
C:\Program Files (x86)\Remote Desktop Connection Manager
Your command should look like:
C:\Windows\system32>cd C:\Program Files (x86)\Remote Desktop Connection Manager
3. Now type the following command into the prompt:
editbin /LARGEADDRESSAWARE RDCMan.exe
4. You should get something like below.
C:\Program Files (x86)\Remote Desktop Connection Manager>editbin /LARGEADDRESSAW
Microsoft (R) COFF/PE Editor Version 11.00.61030.0
Copyright (C) Microsoft Corporation. All rights reserved.
5. That's it; now you should be able to connect to a lot more sessions. Unfortunately this isn't unlimited, so you still can't connect to 50 at a time. My max seems to be about 16 sessions at once, but that at least allows me to open a couple of instances so I can get to various environments. Where I work we have some complex clouds with 20 or more servers, and on patch night I often need to connect to a lot of them at the same time, especially for management purposes, so I can gracefully fail over services, etc.
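To confirm that editbin actually changed the file (for example, after an RDCMan update overwrote it), you can check the flag yourself. `dumpbin /headers RDCMan.exe` from the same Visual Studio prompt should report that the application can handle large (>2GB) addresses. If you don't have dumpbin handy, here is a sketch that reads the PE header directly: the COFF Characteristics field sits 22 bytes after the "PE\0\0" signature, and editbin /LARGEADDRESSAWARE sets bit 0x0020 in it.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # COFF Characteristics flag set by /LARGEADDRESSAWARE


def is_large_address_aware(data):
    """Return True if the PE image in `data` (bytes) has the large-address-aware flag set."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Characteristics is the last 2 bytes of the 20-byte COFF header after the signature.
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

Usage: open the executable in binary mode and pass its contents, e.g. is_large_address_aware(open(path, "rb").read()).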
If you want to skip all this, or don't have Visual Studio, you can download the copy of RDCMan I already edited. It should work. Click the link: Remote Desktop Connection Manager
Another alternative is a different remote desktop connection manager. One I have played around a little with is Terminals. It's free and open source. I'm not sure if it has the same limit, I haven't added enough connections to test it out. Find it here. Make sure when adding connections for 2008/2012 you check the box under RDP --> Extended Settings --> Enable NLA Authentication. (If you are using Network Level Auth, which you should.)
Recently I ran into an issue with a PowerEdge R620 and iDRAC. The server was up but reporting a memory error, and I couldn't get into iDRAC. This is the error I saw when trying to log in.
RAC0218: The maximum number of user sessions is reached
Very frustrating when I knew there was no one else logged in. A brief search online revealed that we can reset iDRAC through SSH, but that didn't help, since even SSH gave the same error when I tried to log in. The server is in a data center, and I avoid the data center like the plague. I know there are some sysadmins who just love the DC: the rows and rows of servers, the unbearable noise, the too-hot and too-cold aisles. Maybe I'm old, or I've been doing this too long ...
If the OS is working you can use a tool in Dell OpenManage to reset iDrac remotely.
1. Make sure you have Dell OpenManage installed on the server. Download here.
2. Next open a command prompt as Administrator and CD to "C:\Program Files\Dell\SysMgt\idrac".
3. Now run the command "racadm racreset soft" (without the quotes, of course). racadm is the iDRAC command-line admin tool, racreset is the subcommand, and soft is the parameter. This subcommand supports three reset types (hard, soft, and graceful), and you can also delay the reset. I recommend starting with a soft reset so you don't lose your settings. I imagine a hard reset would remove your login info, TCP/IP settings, etc., but to be honest I haven't tested it, since the servers I have are in production. You can find more info here.
- A hard reset resets the entire RAC and is as close to a power-on reset as can be achieved using software. The RAC log, database, and selected daemons are shutdown gracefully prior to the reset. A hard reset should be considered as a final effort. PCI configuration is lost.
- A soft reset is a microprocessor and microprocessor subsystem reset that resets the processor core to restart the software. PCI configurations are preserved. The RAC log, database, and selected daemons are shutdown gracefully prior to the reset.
- A graceful reset is the same as a soft reset.
- The user is allowed to select how many seconds of delay occur before the reset sequence is started. A valid delay entry is between 1-60 seconds. The default is 3 seconds.
4. After running the command you should see the message below.
RAC reset operation initiated successfully. It may take a few
minutes for the RAC to come online again.
5. Give it a few minutes, then try to log in to iDRAC through the web interface or SSH. I was able to after running this reset.
6. If you still cannot login you can try a hard reset. Run the command "racadm racreset hard".
7. If that doesn't work, there is one last option but you'll need to physically access the server. Shut the server down then pull the power from it. Make sure there is no AC power to the server. Then hold the power button on the server for 30 seconds. This should completely reset the iDrac. You may need to reconfigure your login information and TCP/IP settings.
The very first time I got serious about computers was when I decided to buy a video card to upgrade, so I could play Delta Force. I don't remember exactly what I had at the time; in fact, I don't even think it was my PC. I'm pretty sure it was my parents' PC, an ugly little HP or Compaq with integrated graphics (IGD) and only a PCI slot. So my first video card wasn't even AGP. Not that it mattered, since I had no idea there was a difference at the time. I just walked into CompUSA and bought something off the shelf. And that something was a Diamond Multimedia Stealth III S540 PCI 32M. The PCI was for the PCI slot, and the 32M was 32MB of RAM, which was actually twice what other video cards of the time had, like the Nvidia TNT and Voodoo cards. And I could play Delta Force, and Delta Force 2, and it was great.
And I still have that video card. And it still works. And I will love it forever and keep it forever, because it is the birthplace of what I do now.
Along with those pics I found some related links. Of course no one reviewed the crappy little PCI one, and the card wasn't that great.
Recently at work I ran into an odd issue with a customer and MSSQL backups. The customer had set up some maintenance plans to back up their databases on a schedule: fulls one day a week, followed by differential backups, and then rolling over. Pretty common. But when the poop hit the fan and the customer needed to do a restore, they found their diffs wouldn't work. When the customer restored the full and then tried to restore the diffs, they received an error:
This differential backup cannot be restored because the database has not been restored to the correct earlier state.
The reason for this error is that the differential backup is not part of the same timeline: another full backup was taken between the full you restored and the differential you are trying to restore, and a differential can only be restored on top of the full backup it is based on. OK, so let's find out when that other backup was taken. Below is a script to look up the backup history for a specific database. Just replace DBNAME with the name of your database.
-- Get backup history for the required database
SELECT TOP 100
s.database_name,
m.physical_device_name,
m.device_type, -- 7 = virtual device (e.g. a VSS-based backup)
CAST(CAST(s.backup_size / 1000000 AS INT) AS VARCHAR(14)) + ' ' + 'MB' AS bkSize,
CAST(DATEDIFF(second, s.backup_start_date,
s.backup_finish_date) AS VARCHAR(10)) + ' ' + 'Seconds' TimeTaken,
s.backup_start_date,
CAST(s.first_lsn AS VARCHAR(50)) AS first_lsn,
CAST(s.last_lsn AS VARCHAR(50)) AS last_lsn,
CAST(s.differential_base_lsn AS VARCHAR(50)) AS differential_base_lsn,
CASE s.[type]
WHEN 'D' THEN 'Full'
WHEN 'I' THEN 'Differential'
WHEN 'L' THEN 'Transaction Log'
END AS BackupType,
s.server_name,
s.recovery_model
FROM msdb.dbo.backupset s
INNER JOIN msdb.dbo.backupmediafamily m ON s.media_set_id = m.media_set_id
WHERE s.database_name = 'DBNAME' -- Replace DBNAME; remove this line to see all databases
ORDER BY s.backup_start_date DESC, s.backup_finish_date
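The reasoning behind the error can be sketched in a few lines. Assuming you have exported the backup history (type, start time, device, and whether the backup was copy-only) into a list, the full backup a differential depends on is normally the latest non-copy-only full taken before it. This is a simplification: the authoritative link is the differential's differential_base_lsn matching the base full's checkpoint_lsn in msdb.dbo.backupset. The Backup class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Backup:
    kind: str              # 'D' = full, 'I' = differential, as in backupset.type
    start: datetime        # backup_start_date
    device: str            # physical_device_name from backupmediafamily
    copy_only: bool = False


def differential_base(history, diff):
    """Return the full backup that `diff` most likely depends on, or None.

    Assumes the base is the latest non-copy-only full started before the
    differential; copy-only fulls do not reset the differential base.
    """
    fulls = [b for b in history
             if b.kind == "D" and not b.copy_only and b.start < diff.start]
    return max(fulls, key=lambda b: b.start, default=None)
```

If the base this returns is not the full you restored (say, its device is a GUID rather than your maintenance plan's backup path), that differential will fail with exactly the error above.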
The mystery unfolds. I found that a full backup had been taken by some mysterious device. I knew we had a DPM server taking virtual machine snapshots, but it isn't agent-based: it simply snapshots the virtual machine, not the SQL Server itself. So it wouldn't take a full backup, right? I thought I was right, since the time stamp of the full backup and the time the DPM job ran were hours apart, even accounting for time zone differences. On top of that, the full backups over the past month were all at different times.
But the device being used was a virtual device (device type 7), and its name was a GUID. I couldn't find anything else taking these backups on a regular schedule, so it had to be DPM. That's when I found this KB. It covers a different version of DPM and of the server OS, but issue #3 is what I was facing.
Consider the following scenario:
A virtual machine (VM) is being backed up on a server that is running Hyper-V.
At the same time, an application backup operation is being performed in the same VM.
In this scenario, some data is truncated from the application backup in the VM. Therefore, this behavior causes data loss.
Aside from the hotfix, that KB also gives a workaround (I did NOT apply the hotfix):
You can apply the following registry entry in a virtual machine to fix issue 3 for that virtual machine:
Value: 0 or 1
If this registry entry is created and its value is set to 1, application backup will not be affected by the virtual machine backup operation on the server that is running Hyper-V. If this registry entry does not exist, or if its value is 0, issue 3 occurs.
Voila: after creating that registry DWORD, the backups DPM took no longer truncated the logs on the SQL server. So going forward, if you are using Data Protection Manager to back up Hyper-V virtual machines, make sure you create that registry DWORD. If you don't, VSS inside the virtual machine will run a full backup of the MSSQL databases in response to DPM taking a snapshot, which in turn breaks any backups you have configured on the server.
Use the script below to back up a MySQL server running on Windows. Just create the bat file, then create a scheduled task to run it. The script appends the date to each file name and keeps files for up to 7 days. Adjust as needed.
*Backs up the files to C:\MySQLBackup. Make sure to create that directory.
*Make sure to change the username and password.
*Edit the lines below for your version of MySQL.
*PUSHD "C:\ProgramData\MySQL\MySQL Server 5.6\data"
*"C:\Program Files\MySQL\MySQL Server 5.6\bin\mysqldump.exe"
@ECHO OFF
:: Delete backups older than 7 days
FORFILES /P "C:\MySQLBackup" /M * /D -7 /C "CMD /C DEL @path"
:: MySQL DB user (change this)
SET dbuser=CHANGE_ME
:: MySQL DB user's password (change this)
SET dbpass=CHANGE_ME
:: Switch to the MySQL data directory and collect the folder names
PUSHD "C:\ProgramData\MySQL\MySQL Server 5.6\data"
:: Loop through the folders and use the folder names for the .sql filenames, so every database is collected automatically
:: Pass each name to mysqldump.exe and output an individual .sql file for each
:: The date stamp assumes a US-style date (e.g. "Tue 01/06/2015") and produces YYYYMMDD
FOR /D %%F IN (*) DO (
"C:\Program Files\MySQL\MySQL Server 5.6\bin\mysqldump.exe" -u %dbuser% -p%dbpass% -P 3306 "%%F" > "C:\MySQLBackup\%%F_%date:~-4,4%%date:~-10,2%%date:~-7,2%.sql"
)
POPD
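The naming and retention rules in the batch file are easy to get subtly wrong (the %date% substring offsets depend on the regional date format), so here is a sketch of what the script is meant to do, useful for checking the file names you actually end up with. FORFILES /D -7 selects files last modified on or before today minus 7 days, which is what expired() mirrors; the function names are hypothetical.

```python
from datetime import date, timedelta


def backup_filename(db_name, day):
    """Name a dump the way the batch file intends: <database>_<YYYYMMDD>.sql."""
    return f"{db_name}_{day:%Y%m%d}.sql"


def expired(file_days, today, keep_days=7):
    """Return the dates FORFILES /D -7 would select: on or before today minus keep_days."""
    cutoff = today - timedelta(days=keep_days)
    return [d for d in file_days if d <= cutoff]
```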
A while back I upgraded my cable modem from DOCSIS 2.0 to DOCSIS 3.0. I buy my own cable modem to save money over leasing one from Comcast; it pays for itself within a year. I was using a DOCSIS 2.0 modem and was pretty content with the results, but then a brand-new DOCSIS 3.0 modem went on sale for less than $100, so I upgraded. The difference is noticeable, and as you can see from the results below, the speed is definitely improved. I have the Xfinity Blast package, which is something like 50 Mbps down and 10 Mbps up.
Also, another great way of testing is torrents since they can chew up bandwidth very quickly. As you can see my download speeds on the CentOS torrent are great.