A day in the life (of a developer) RSS 2.0
 Saturday, September 27, 2008
One of the really killer features included in SQL Server 2005 was Common Table Expressions. One of the really nice uses for them is recursive queries. Imagine any kind of hierarchical set of date (org. chart, security which allows nested roles, parts/assemblies, etc.). You can use CTE's to walk up or down these trees to build it's result set. Let's look at a simple example of this. I'm going to create a table named "ItemGroups" which is nothing more than a listing of items which have a PK, Title, Description, and foreign key to a parent it may be a child of.
CREATE TABLE [dbo].[ItemGroups](
	[iid] [int] IDENTITY(1,1) NOT NULL,
	[Title] [varchar](60) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
	[Description] [text] COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[fk_ItemGroups] [int] NULL,
 CONSTRAINT [PK_ItemGroups] PRIMARY KEY CLUSTERED 
(
	[iid] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Then, I'm going to add some sample data to this table:


iid     Title                   Description                     fk_ItemGroups
1       Root                    This is the root                NULL
2       Child 1                 This is a child of root         1
3       Child 2                 This is a child of root         1
4       Grandchild 1            This is a child of Child 1      2
5       Grandchild 2            This is a child of Child 2      3
6       Great Grandchild 1      This is a child of Grandchild 1 4

If we draw this out as a "tree", it would look something like this (note the modern looking ASCII art...)

- Root
  - Child 1
    - Grandchild 1
       - Great Grandchild 1
  - Child 2
    - Grandchild 2

OK, great - let's suppose we want to walk up or down this tree from a known starting point. How might we use a CTE to do that?

It might help to understand the basic format of a CTE:

WITH SomeTableName (List of resulting fields)
(
   SELECT -- Starting point or anchor of the query
    UNION ALL
   SELECT -- Recursive portion of the query
)
SELECT -- Final select from SomeTableName

We have the "WITH" portion which describes what our CTE cursor would look like (we can reference this in the recursive portion of the query and in the final SELECT). Then we have the first SELECT which selects the starting record(s) for the recursive portion of our query. It's basically the starting point. The second SELECT pulls in matching records which are children or parents of the record in the anchor portion.

Let's see what that would look like against our table, assuming we want to walk down the hierarchy - in this code, we're going to be starting with the root node.

DECLARE @startNode int
SET @startNode = 1; -- Note the semicolon - it's required for the command
                    -- immediately before the CTE

WITH Items (iid, Title, Description, fk_ItemGroups) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid,
         ig.Title, 
         ig.Description,
         ig.fk_ItemGroups
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid,
         ig.Title, 
         ig.Description,
         ig.fk_ItemGroups
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON ig.fk_ItemGroups = Items.iid
)
SELECT *
  FROM Items

If we run this, here's our results (notice that the query automatically stops recursing when no more matches are found).


iid     Title                   Description                     fk_ItemGroups
1       Root                    This is the root                NULL
2       Child 1                 This is a child of root         1
3       Child 2                 This is a child of root         1
5       Grandchild 2            This is a child of Child 2      3
4       Grandchild 1            This is a child of Child 1      2
6       Great Grandchild 1      This is a child of Grandchild 1 4

If we change the starting node to 2 and rerun this, you'll see we only get Child 1 and it's children:

2       Child 1                 This is a child of root         1
4       Grandchild 1            This is a child of Child 1      2
6       Great Grandchild 1      This is a child of Grandchild 1 4

And if we change it to start at Grandchild 1, we get:

4       Grandchild 1            This is a child of Child 1      2
6       Great Grandchild 1      This is a child of Grandchild 1 4

What if we'd like to walk "up" the hierarchy instead? That's just as easy. In the recursive portion of the query, we need to change our join condition. The first query will return the record we want to start on (aliased as 'Item' in this example). To walk up the chain, our fk_ItemGroups will match our parents iid. So change the ON to: " Items.fk_ItemGroups = ig.iid".

Let's rerun the last query:

4       Grandchild 1    This is a child of Child 1      2
2       Child 1         This is a child of root         1
1       Root            This is the root                NULL

It might be useful to know how many levels deep of recursion were required to retrieve a row. We can modify our query to include this info by adding a new column, "Level". In our root query we set it to start at 0, and we increment it in the recursive portion of the query:

SET @startNode = 4; -- Note the semicolon - it's required for the command
                    -- immediately before the CTE

WITH Items (iid, Title, Description, fk_ItemGroups, [Level]) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid,
         ig.Title, 
         ig.Description,
         ig.fk_ItemGroups,
         0 AS Level
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid,
         ig.Title, 
         ig.Description,
         ig.fk_ItemGroups,
         Items.Level + 1 
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON Items.fk_ItemGroups = ig.iid
)
SELECT *
  FROM Items

iid     Title           Description                     fk_ItemGroups   Level
4       Grandchild 1    This is a child of Child 1      2               0
2       Child 1         This is a child of root         1               1
1       Root            This is the root                NULL            2

I've mostly ignored the final SELECT * FROM Items, but in a "real" query you tend to use this portion of it to pull in all your detail from various supporting tables.

In a few cases I've found that I've actually needed to walk up and down a hierarchy from a given starting point. I've ended up just creating two CTEs - one to walk up and one to walk down the hierarchy. I insert the results of each of them into a temp. variable, then pull the final results.

DECLARE @curItems TABLE (iid int);

-- Walks up the hierarchy
WITH Items (iid]) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON Items.fk_ItemGroups = ig.iid
)
INSERT INTO @curItems (iid) (SELECT iid FROM Items);

-- Walks down the hierarchy
WITH Items (iid]) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON ig.fk_ItemGroups = Items.iid
)
INSERT INTO @curItems (iid) (SELECT iid FROM Items)

-- Code which does final select here

DECLARE @curItems TABLE (iid int);

-- Walks up the hierarchy
WITH Items (iid]) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON Items.fk_ItemGroups = ig.iid
)
INSERT INTO @curItems (iid) (SELECT iid FROM Items);

-- Walks down the hierarchy
WITH Items (iid]) AS
( -- This is the 'Anchor' or starting point of the recursive query
  SELECT ig.iid
    FROM ItemGroups ig
   WHERE ig.iid = @startNode
   UNION ALL -- This is the recursive portion of the query
  SELECT ig.iid
    FROM ItemGroups ig
   INNER JOIN Items -- Note the reference to CTE table name
      ON ig.fk_ItemGroups = Items.iid
)
INSERT INTO @curItems (iid) (SELECT iid FROM Items)

-- Code which does final select here

As you can see, it's pretty simple to use CTE's. The syntax looks a little weird at first but once you've written one or two queries it's pretty straightforward.

Saturday, September 27, 2008 10:48:31 AM (Eastern Standard Time, UTC-05:00)  #    Comments [2] -

SQL
 Friday, November 02, 2007

One thing I've learned while using SQL Server is (whenever possible), perform as much as you can on the server. That means either stored procedures or SQL commands passed up to the server. It also means thinking about problems as sets, as opposed to a more procedural approach. Visual FoxPro makes it easy to blur the lines between SQL commands and normal procedural code (mixing and matching as you'd like), so you tend to start getting locked into thinking about data on those terms. While some of this translates directly to SQL Server (obviously, the SQL language itself), other things don't.

I recently decided to add a new foreign key to a table; the structural specifics aren't particularly important so I won't go into too much detail about them. Essentially, I had a parent (A) and child table (B). There is a posting process which takes a snapshot (C) of some or all of the records in the child table for a specific parent. This snapshot includes a foreign key back to the original child record but didn't include a foreign key back to the parent (since it could be derived by joining to this child table). I began to realize that, because of the number of records involved (it could be as many as 250,000 records in the child (B) for a single parent record (A), which meant up to 250,000 records in our snapshot table) that most of the queries were really slow because of the extra hop involved in resolving this parent key. So, I decided to go ahead and add a foreign key into the snapshot table (C) to the parent (A). That also meant I needed to go back and fill in the new foreign key in the Snapshot (C) table.

 

Table

 

In VFP, you might just open the tables, SCAN through the Snapshot (C) table, do a SEEK into the Child (B) table for the foreign key back to the Parent (A) table and save it into the Snapshot (C) table. That works OK if the tables are local, what about with SQL Server? I guess you could take the same approach; pull a copy of the records from the Snapshot (C) and Child (B) tables locally, index them, then run through the same SCAN loop (or if you're really old school, set a relation and just do a REPLACE). But how long will that take - remember we're talking about pulling down up to 250,000 records in the Child (B) table, plus another 250,000 records from the Snapshot (C) table, then updating all of the records in Snapshot (C) table. Multiply that by the number of master records (in my case, only 20-30 of the master records had anywhere near 250,000 records but we're still talking millions of records).

Suddenly the above approach doesn't look like such a good idea. Thankfully, SQL includes provisions in the UPDATE command to do exactly what I wanted to. It would still take a while to run (I clocked it around 25 minutes), but it was a significantly better solution than a SCAN loop. Here's what the query ends up looking like:

UPDATE Snapshot
   SET Snapshot.fk_Parent = Child.fk_Parent
  FROM Child
INNER JOIN Snapshot
    ON Snapshot.fk_Child = Child.PrimaryKey

 

So what does this actually mean? We're basically saying, inside of our Snapshot (C) table set our new foreign key equal to the Parent (A) foreign key in the Child (B) table. The FROM/INNER JOIN defines how to match the record in the Snapshot (C) to the one in Child (B). Simple and fast.

Friday, November 02, 2007 8:31:41 PM (Eastern Standard Time, UTC-05:00)  #    Comments [2] -

SQL | VFP
 Monday, October 29, 2007

Here's a link to an article about keys in SQL. Make sure you read the comments. I wonder where most people fall on this subject and the split (if any) between developers and DBA’s (and people have to fill both roles)?

Personally, I don’t give a crap about whether identity keys are “exposed physical locators”. How about GUIDs or Stored Procedures which generate an incrementing number instead? I’ll be honest, I’ve never used them to identify the physical location. It’s just that I’ve found that “natural keys aren’t” (I think I first heard that from Jim Booth http://www.jamesbooth.com ) and having a system generated key greatly simplifies any code I’ve needed to write against a SQL backend (VFP or SQL Server). I’ve worked with systems that exclusively used natural keys and invariably what seemed like a reasonable natural key suddenly wasn’t any more. This usually meant we now had to use multiple columns to uniquely identify the record (this usually ends up happening more than once on the same table). No thanks. I’m sure some people would argue that changes like this just means we didn’t do a proper design. Yeah, that could be the case. But I’d much rather use a style of development that was more flexible to change and more forgiving of developer errors than one that leads to nightmarish spaghetti code just because someone didn’t take something into account during whatever design process there was. The other way may lead to the most elegant, clear code ever written, but if it’s dependent on people never making any kinds of mistakes (or incorrect judgment calls), I’ll skip it, thanks.

 

If you’re interested, Joe has a blog where he answers questions about SQL in his own, abrasive, insulting (if sometimes funny) style.

 

http://joecelkothesqlapprentice.blogspot.com/

 

How many of his “rules” have you broken to actually deliver usable software (within the confines of the tools you’re using)? Why is there so much support built into these tools to do things the “wrong way”? Right now, I’m specifically thinking about Crystal Reports: where are you doing most of your data manipulation – in SQL Server or in CR? I guess it could be argued that the limitations of these various development tools doesn’t mean that the various practices they (sometimes) force on you are correct. Could be, but ultimately most of us Just-Need-to-Get-it-Done. “In theory” is nice, but “in practice” pays the bills. Do you think your customers would accept the excuse that you wouldn’t generate some report for them because the reporting tool doesn’t have the ability to do what they wanted and putting it into SQL isn’t the correct way to do it?

 

As a side note, it seems like this is a distinct personality trait with some “computer” people in general. You know the type: really smart, could be really helpful to have on the team but they just don’t play well with others. Fanboys (and girls) of them will usually say things like, “I know online he can be rough, but it personal he’s great”. In other words, that just means he’s an asshole. Being blunt (as it’s sometimes described) is OK as long as it’s tempered with the understanding that it’s not an excuse to be an asshat, online or in person.

 

(I just hope the above isn’t taken as some argument for writing crappy code). Write the best code you can for the given problem and time constraints. Refactor mercilessly.

 

Links

http://www.intelligententerprise.com/channels/bi/showArticle.jhtml?articleID=201806814
http://joecelkothesqlapprentice.blogspot.com/
http://www.jamesbooth.com

Monday, October 29, 2007 8:48:54 PM (Eastern Standard Time, UTC-05:00)  #    Comments [0] -

Soapbox | SQL


Navigation
Archive
<October 2008>
SunMonTueWedThuFriSat
2829301234
567891011
12131415161718
19202122232425
2627282930311
2345678
About the author/Disclaimer

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008
Paul Mrozowski / RCS Solutions, Inc.
Sign In
Statistics
Total Posts: 57
This Year: 32
This Month: 0
This Week: 0
Comments: 21
All Content © 2008, Paul Mrozowski / RCS Solutions, Inc.
DasBlog theme 'Business' created by Christoph De Baene (delarou)