20,000 Emails

You know that feeling you get when you receive 20,000 emails in an hour?

I don’t. In fact, I don’t know if I have received 20,000 emails in total.

What I do know is that last week, somebody received 20,000 emails in the course of an hour. Then, they asked me to fix it.

And even worse, the emails were still being sent. Every minute it took me to solve this problem added at least 50 emails to the pile.

Worse yet, I wasn’t quite sure what was sending them emails. I did know, based on the from address, that it was coming from our database server – we have an email setup for SQL Server so it can email out various reports and other types of notifications. But I didn’t know what stored procedure in the server was sending the emails, and there are hundreds of stored procedures in there.

Every single email was the exact same. The subject line was the name of the report, and the body contained one line of text. Nothing in the email itself tells you where it comes from.

Trying to find things in databases can be hard sometimes, in my experience. I frequently ask the question “what updates this table?”, and given that any stored procedure in the entire database – or even a stored procedure on a different, linked database server – could be the cause, manually searching is a huge pain. Thankfully, this is a common enough problem that there are tools made for this use case, and so far I have used SQL Search.

SQL Search is basically Control+F (Find) for databases. For example, I can search for all stored procedures that reference a given table name. Or I can search for the name of a foreign key to find tables that reference other tables.

So what did I do? I used this tool and searched for part of the subject line in the email. And voila, I founded the exact stored procedure that sends the email.

A question the reader may ask at this point is why the procedure was sending so many emails. I didn’t know, at the time, and I was going to take the time to figure out. My only goal at that moment was to get it to stop.

I found the scheduled job that ran the stored procedure. It was still running, so I terminated it.

Problem Solved

All in a day’s work, amirite?

But of course, things are never that easy. As I began relaxing, thinking I had solved it, I noticed a tiny little wrinkle… the emails were still coming in.

I gave it about three minutes, because I figured a lot of those emails were probably still in transit or were being processed by the email server and hadn’t yet synced down. Which is pretty reasonable, I think. But several minutes later the emails kept coming in. That is, as they say, not good.

The next step I took was… let’s say not the brightest. But again, I was under pressure trying to solve this as quickly as possible (as it turns out, Outlook completely chokes when it tries loading that many emails, so this person wasn’t getting any work done), so I was willing to take a risk.

I thought that the stored procedure was still running, even though I had terminated the job. I thought maybe once the procedure starts it will keep running forever. In hindsight there is probably a way to terminate an individual running procedure as well, but that didn’t occur to me at the time.

So I restarted the database server. Or, more accurately, I restarted the SQL service, because that takes much less time than rebooting the windows server running underneath.

This is not normally something I would have done. Quite a few applications and reports rely on that database all day long, so there is no time during the day where you can just restart it without causing very visible disruption. But again, panicked me isn’t the sharpest bulb in the pudding drawer, so I did it.

Finally the emails stopped.

I had done it, for real this time. The emails stopped coming in, the database server was up and running again (surprisingly I didn’t receive any complaints – I guess if something is only down for two minutes people just wait a bit and try again), and life was good.

The person who was getting these emails in the first place told me that particular automated report wasn’t even useful anymore, so I disabled the job that ran it. To this day I still don’t know what exactly caused it to fail so spectacularly – presumably it was running in an infinite loop, for some reason. In any case, I didn’t have to worry about any more, because it could not run again, ever.

Later that day, a different proprietary application wasn’t working quite right. It was giving many people an error message about not being able to send emails. This was a very visible error that was affecting timecards, so it was a high priority to fix.

The problem, as somebody else discovered, was that the SQL Server service that sends emails wasn’t running. In fact, the SQL Agent service, which not only sends emails but also runs all the scheduled jobs, wasn’t running at all, for some reason. So they started the service back up, and the proprietary application started working again.

15 minutes later, I got a call. The emails had started sending again. And, in fact, they had started sending right when the database service was started. Coincidence? Of course not.

At this point I got very lucky. I don’t know why I thought of this, but I tried to figure out if SQL Server had an outgoing email queue. The stored procedure must have stopped running when I restarted the server earlier (I had stopped both the agent service and the actual database service), the job was disabled, and nothing would be queued up still on the email server. And yet, emails were still being sent, so I figured maybe the queue is on the SQL side.

And it was. SQL Server exposes the email queue as a database table, as with other metadata and server properties. I simply selected all emails with that specific subject line and mass-deleted them. There are thousands of emails left in the unsent queue.

This made the emails stop, again. And this time, because I did it the proper way, it didn’t restart again later.

Fortunately for me, these emails only went to one person. I can’t imagine the chaos if this went to a distribution list.

And yes, it takes a long, long time for Outlook to delete 20,000 emails.

 

Jacob Clarity

 

Leave a Reply