February 7, 2009
@ 03:22 PM

I just finished up a report which generates a list of customers whose birthdays fall within a specified date range. I wrote a simple query which did essentially:


DECLARE @startDate datetime
DECLARE @endDate datetime
SET @startDate = '2009-2-15'
SET @endDate = '2009-3-14'

SELECT c.iid,
        c.FirstName,
        c.LastName,
        c.BirthDate,
        c.Address1,
        c.Address2,
        c.City,
        c.State,
        c.ZipCode
   FROM Customers c
  WHERE c.BirthDate BETWEEN @startDate AND @endDate


Of course, that didn't work. That is, unless our customers happened to actually just been born (as of when I wrote this they wouldn't have been even born yet). Hmm....My first thought was maybe joining this to a date table to get the month and day split apart but that idea falls apart pretty quickly so I dismissed it. I'll come back to this idea in a second.

The other simple way to do this is to convert the date to a day of the year, so January 1st is 1, Jan. 2nd is 2, December 31st is 365. SQL Server includes a nice DATEPART() function to make this easy - you can specify that you want the day of the year with it.

So my query was rewritten as:

SELECT c.iid,
        c.FirstName,
        c.LastName,
        c.BirthDate,
        c.Address1,
        c.Address2,
        c.City,
        c.State,
        c.ZipCode
   FROM Customers c
  WHERE DATEPART(dayofyear, c.BirthDate)
BETWEEN DATEPART(dayofyear, @startDate) AND DATEPART(dayofyear, @endDate)


While this one works it's a bit slow. That's not so surprising since it has to use DATEPART on the rows to generate the day of the year in the WHERE clause. My actual code filters the customers a bit more but it was still a fair number of records. Ultimately, since this a summary/reporting table that gets populated and updated nightly I just added another (integer) column to store the precalculated day of the year number.

Surprisingly, this version isn't that much faster - maybe 15% or so. Apparently DATEPART is pretty quick.

It wasn't until after making these changes that I realized I could have added a day of the year column to my date table, done a join then used this column in my WHERE clause. That actually would have been easier if it had occurred to me sooner. I'm guessing performance is probably equivalent, especially since adding the column directly in the table didn't have a huge impact on the speed of the query.


 
Categories: SQL

October 28, 2008
@ 06:30 PM

I posted some info last night about how to move the TempDB database in SQL Server. That morning I had made the changes and just needed to restart SQL Server for the changes to take effect. This morning I restarted the SQL Server service and after it restarted, tried to log into our website (since it's the main interface to the data in SQL). I received a network connection error from ASP.NET which wasn't surprising, so I reset IIS (IISRESET from a DOS prompt). After it restarted, which only took 10 seconds or so, I tried hitting the site again. And nothing. The page sat there attempting to connect and finally timed out.

Uh oh, not good.

I still had Management Studio open so I tried a few test queries, which seemed to work OK. I opened up the Activity Monitor and didn't see any connections from ASP.NET. OK, so SQL Server is probably denying the connection or blocking for some reason. I took in a look in the new tempdb location and SQL had recreated the mdf/ldf files like I expected, so what's up?

I opened up the Event Viewer and saw a number of entries about the various databases starting up, which was good. Hey, what's that "Failure Audit" entry?

sqleventlog

"Login failed for user 'NT AUTHORITY\NETWORK SERVICE'. [CLIENT: 192.168.1.8]".

Hmmm...192.168.1.8 is the web server. What user does SQL Server run under? I opened up Services, found "SQL Server" and double clicked on it. Then I went to the "Log on" tab. It runs under NT AUTHORITY\NetworkService. Could it be??? I opened Explorer and right-clicked on the folder where tempdb was located and selected "Sharing and Security" then went to the Security tab. Sure enough, "NETWORK SERVICE" wasn't listed. I went ahead and added it to the folder level then test the site again.

Success!

What was really strange about this was that most of our SQL databases are (and have been) located in this same folder. I'm guessing the permissions for those was set on the file level, which is why they've been working OK. The other weird thing was that it was even able to create the tempdb files at all.


 
Categories: SQL

I run across this from time to time and can never remember the specifics, so I figured I'd document it here:

To move a SQL database from one drive to another drive (or just move it to a new folder):

Inside of Microsoft SQL Management Studio execute this script (of course, replace "NameOfDatabase" with the real database name):

  USE master
  GO
  sp_detach_db 'NameOfDatabase'
  GO

Copy the database (usually NameOfDatabase.MDF and NameOfDatabase_log.LDF, assuming they haven't been renamed) to the new location.

  NOTE: The paths are relative to the server SQL Server is running on.

  USE master
  GO
  sp_attach_db 'NameOfDatabase', 'E:\SQLServerDataFolder\NameOfDatabase.mdf', 'E:\SQLServerDataFolder\NameOfDatabase_log.ldf'
  GO

To move the TempDB system database to a new location:

USE master
  GO
  ALTER DATABASE tempdb modify file (name = tempdev, filename = 'E:\SQLServerDataFolder\tempdb.mdf')
  GO
  ALTER DATABASE tempdb modify file (name = templog, filename = 'E:\SQLServerDataFolder\templog.mdf')

Restart SQL Server for the changes to take effect. One it's restarted you can delete the tempdb.mdf and templog.mdf from the old location.

Some instructions for shrinking the tempdb database:

http://support.microsoft.com/kb/307487


 
Categories: SQL

September 27, 2008
@ 11:48 AM
One of the really killer features included in SQL Server 2005 was Common Table Expressions. One of the really nice uses for them is recursive queries. Imagine any kind of hierarchical set of date (org. chart, security which allows nested roles, parts/assemblies, etc.). You can use CTE's to walk up or down these trees to build it's result set. Let's look at a simple example of this. I'm going to create a table named "ItemGroups" which is nothing more than a listing of items which have a PK, Title, Description, and foreign key to a parent it may be a child of.
CREATE TABLE [dbo].[ItemGroups](
	[iid] [int] IDENTITY(1,1) NOT NULL,
	[Title] [varchar](60) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
	[Description] [text] COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[fk_ItemGroups] [int] NULL,
 CONSTRAINT [PK_ItemGroups] PRIMARY KEY CLUSTERED 
(
	[iid] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Then, I'm going to add some sample data to this table:


iid     Title                   Description                     fk_ItemGroups
1       Root                    This is the root                NULL
2       Child 1                 This is a child of root         1
3       Child 2                 This is a child of root         1
4       Grandchild 1            This is a child of Child 1      2
5       Grandchild 2            This is a child of Child 2      3
6       Great Grandchild 1      This is a child of Grandchild 1 4

If we draw this out as a "tree", it would look something like this (note the modern looking ASCII art...)

- Root
  - Child 1
    - Grandchild 1
       - Great Grandchild 1
  - Child 2
    - Grandchild 2

OK, great - let's suppose we want to walk up or down this tree from a known starting point. How might we use a CTE to do that?

It might help to understand the basic format of a CTE:

WITH SomeTableName (List of resulting fields)
(
   SELECT -- Starting point or anchor of the query
    UNION ALL
   SELECT -- Recursive portion of the query
)
SELECT -- Final select from SomeTableName

We have the "WITH" portion which describes what our CTE cursor would look like (we can reference this in the recursive portion of the query and in the final SELECT). Then we have the first SELECT which selects the starting record(s) for the recursive portion of our query. It's basically the starting point. The second SELECT pulls in matching records which are children or parents of the record in the anchor portion.

Let's see what that would look like against our table, assuming we want to walk down the hierarchy - in this code, we're going to be starting with the root node.

DECLARE @startNode int
SET @startNode = 1; -- Note the semicolon - it's required for the command
                    -- immediately before the CTE

WITH Items (iid, Title, Description, fk_ItemGroups) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid,
         ig.Title, 
         ig.Description,
         ig.fk_ItemGroups
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid,
         ig.Title, 
         ig.Description,
         ig.fk_ItemGroups
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON ig.fk_ItemGroups = Items.iid
)
SELECT *
  FROM Items

If we run this, here's our results (notice that the query automatically stops recursing when no more matches are found).


iid     Title                   Description                     fk_ItemGroups
1       Root                    This is the root                NULL
2       Child 1                 This is a child of root         1
3       Child 2                 This is a child of root         1
5       Grandchild 2            This is a child of Child 2      3
4       Grandchild 1            This is a child of Child 1      2
6       Great Grandchild 1      This is a child of Grandchild 1 4

If we change the starting node to 2 and rerun this, you'll see we only get Child 1 and it's children:

2       Child 1                 This is a child of root         1
4       Grandchild 1            This is a child of Child 1      2
6       Great Grandchild 1      This is a child of Grandchild 1 4

And if we change it to start at Grandchild 1, we get:

4       Grandchild 1            This is a child of Child 1      2
6       Great Grandchild 1      This is a child of Grandchild 1 4

What if we'd like to walk "up" the hierarchy instead? That's just as easy. In the recursive portion of the query, we need to change our join condition. The first query will return the record we want to start on (aliased as 'Item' in this example). To walk up the chain, our fk_ItemGroups will match our parents iid. So change the ON to: " Items.fk_ItemGroups = ig.iid".

Let's rerun the last query:

4       Grandchild 1    This is a child of Child 1      2
2       Child 1         This is a child of root         1
1       Root            This is the root                NULL

It might be useful to know how many levels deep of recursion were required to retrieve a row. We can modify our query to include this info by adding a new column, "Level". In our root query we set it to start at 0, and we increment it in the recursive portion of the query:

SET @startNode = 4; -- Note the semicolon - it's required for the command
                    -- immediately before the CTE

WITH Items (iid, Title, Description, fk_ItemGroups, [Level]) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid,
         ig.Title, 
         ig.Description,
         ig.fk_ItemGroups,
         0 AS Level
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid,
         ig.Title, 
         ig.Description,
         ig.fk_ItemGroups,
         Items.Level + 1 
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON Items.fk_ItemGroups = ig.iid
)
SELECT *
  FROM Items

iid     Title           Description                     fk_ItemGroups   Level
4       Grandchild 1    This is a child of Child 1      2               0
2       Child 1         This is a child of root         1               1
1       Root            This is the root                NULL            2

I've mostly ignored the final SELECT * FROM Items, but in a "real" query you tend to use this portion of it to pull in all your detail from various supporting tables.

In a few cases I've found that I've actually needed to walk up and down a hierarchy from a given starting point. I've ended up just creating two CTEs - one to walk up and one to walk down the hierarchy. I insert the results of each of them into a temp. variable, then pull the final results.

DECLARE @curItems TABLE (iid int);

-- Walks up the hierarchy
WITH Items (iid]) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON Items.fk_ItemGroups = ig.iid
)
INSERT INTO @curItems (iid) (SELECT iid FROM Items);

-- Walks down the hierarchy
WITH Items (iid]) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON ig.fk_ItemGroups = Items.iid
)
INSERT INTO @curItems (iid) (SELECT iid FROM Items)

-- Code which does final select here

DECLARE @curItems TABLE (iid int);

-- Walks up the hierarchy
WITH Items (iid]) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON Items.fk_ItemGroups = ig.iid
)
INSERT INTO @curItems (iid) (SELECT iid FROM Items);

-- Walks down the hierarchy
WITH Items (iid]) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON ig.fk_ItemGroups = Items.iid
)
INSERT INTO @curItems (iid) (SELECT iid FROM Items)

-- Code which does final select here

As you can see, it's pretty simple to use CTE's. The syntax looks a little weird at first but once you've written one or two queries it's pretty straightforward.


 
Categories: SQL

November 2, 2007
@ 08:31 PM

One thing I've learned while using SQL Server is (whenever possible), perform as much as you can on the server. That means either stored procedures or SQL commands passed up to the server. It also means thinking about problems as sets, as opposed to a more procedural approach. Visual FoxPro makes it easy to blur the lines between SQL commands and normal procedural code (mixing and matching as you'd like), so you tend to start getting locked into thinking about data on those terms. While some of this translates directly to SQL Server (obviously, the SQL language itself), other things don't.

I recently decided to add a new foreign key to a table; the structural specifics aren't particularly important so I won't go into too much detail about them. Essentially, I had a parent (A) and child table (B). There is a posting process which takes a snapshot (C) of some or all of the records in the child table for a specific parent. This snapshot includes a foreign key back to the original child record but didn't include a foreign key back to the parent (since it could be derived by joining to this child table). I began to realize that, because of the number of records involved (it could be as many as 250,000 records in the child (B) for a single parent record (A), which meant up to 250,000 records in our snapshot table) that most of the queries were really slow because of the extra hop involved in resolving this parent key. So, I decided to go ahead and add a foreign key into the snapshot table (C) to the parent (A). That also meant I needed to go back and fill in the new foreign key in the Snapshot (C) table.

 

Table

 

In VFP, you might just open the tables, SCAN through the Snapshot (C) table, do a SEEK into the Child (B) table for the foreign key back to the Parent (A) table and save it into the Snapshot (C) table. That works OK if the tables are local, what about with SQL Server? I guess you could take the same approach; pull a copy of the records from the Snapshot (C) and Child (B) tables locally, index them, then run through the same SCAN loop (or if you're really old school, set a relation and just do a REPLACE). But how long will that take - remember we're talking about pulling down up to 250,000 records in the Child (B) table, plus another 250,000 records from the Snapshot (C) table, then updating all of the records in Snapshot (C) table. Multiply that by the number of master records (in my case, only 20-30 of the master records had anywhere near 250,000 records but we're still talking millions of records).

Suddenly the above approach doesn't look like such a good idea. Thankfully, SQL includes provisions in the UPDATE command to do exactly what I wanted to. It would still take a while to run (I clocked it around 25 minutes), but it was a significantly better solution than a SCAN loop. Here's what the query ends up looking like:

UPDATE Snapshot
   SET Snapshot.fk_Parent = Child.fk_Parent
  FROM Child
INNER JOIN Snapshot
    ON Snapshot.fk_Child = Child.PrimaryKey

 

So what does this actually mean? We're basically saying, inside of our Snapshot (C) table set our new foreign key equal to the Parent (A) foreign key in the Child (B) table. The FROM/INNER JOIN defines how to match the record in the Snapshot (C) to the one in Child (B). Simple and fast.


 
Categories: SQL | VFP

October 29, 2007
@ 08:48 PM

Here's a link to an article about keys in SQL. Make sure you read the comments. I wonder where most people fall on this subject and the split (if any) between developers and DBA’s (and people have to fill both roles)?

Personally, I don’t give a crap about whether identity keys are “exposed physical locators”. How about GUIDs or Stored Procedures which generate an incrementing number instead? I’ll be honest, I’ve never used them to identify the physical location. It’s just that I’ve found that “natural keys aren’t” (I think I first heard that from Jim Booth http://www.jamesbooth.com ) and having a system generated key greatly simplifies any code I’ve needed to write against a SQL backend (VFP or SQL Server). I’ve worked with systems that exclusively used natural keys and invariably what seemed like a reasonable natural key suddenly wasn’t any more. This usually meant we now had to use multiple columns to uniquely identify the record (this usually ends up happening more than once on the same table). No thanks. I’m sure some people would argue that changes like this just means we didn’t do a proper design. Yeah, that could be the case. But I’d much rather use a style of development that was more flexible to change and more forgiving of developer errors than one that leads to nightmarish spaghetti code just because someone didn’t take something into account during whatever design process there was. The other way may lead to the most elegant, clear code ever written, but if it’s dependent on people never making any kinds of mistakes (or incorrect judgment calls), I’ll skip it, thanks.

 

If you’re interested, Joe has a blog where he answers questions about SQL in his own, abrasive, insulting (if sometimes funny) style.

 

http://joecelkothesqlapprentice.blogspot.com/

 

How many of his “rules” have you broken to actually deliver usable software (within the confines of the tools you’re using)? Why is there so much support built into these tools to do things the “wrong way”? Right now, I’m specifically thinking about Crystal Reports: where are you doing most of your data manipulation – in SQL Server or in CR? I guess it could be argued that the limitations of these various development tools doesn’t mean that the various practices they (sometimes) force on you are correct. Could be, but ultimately most of us Just-Need-to-Get-it-Done. “In theory” is nice, but “in practice” pays the bills. Do you think your customers would accept the excuse that you wouldn’t generate some report for them because the reporting tool doesn’t have the ability to do what they wanted and putting it into SQL isn’t the correct way to do it?

 

As a side note, it seems like this is a distinct personality trait with some “computer” people in general. You know the type: really smart, could be really helpful to have on the team but they just don’t play well with others. Fanboys (and girls) of them will usually say things like, “I know online he can be rough, but it personal he’s great”. In other words, that just means he’s an asshole. Being blunt (as it’s sometimes described) is OK as long as it’s tempered with the understanding that it’s not an excuse to be an asshat, online or in person.

 

(I just hope the above isn’t taken as some argument for writing crappy code). Write the best code you can for the given problem and time constraints. Refactor mercilessly.

 

Links

http://www.intelligententerprise.com/channels/bi/showArticle.jhtml?articleID=201806814
http://joecelkothesqlapprentice.blogspot.com/
http://www.jamesbooth.com


 
Categories: Soapbox | SQL