The SQL Guru Answers your Questions...
Today's question comes from Tim H.:
|
Sean,
I went to a Microsoft sponsored SQL Server seminar recently where the
speaker mentioned that you should avoid using identity fields for primary
keys when the table has records inserted frequently (5000 concurrent
users?) - a type log. His argument was that it creates artificial "hot
spots" at the end of the table. Whatever that means.
1. What do you do if you need to refer to a record in a high volume table?
Have a beefy "where" clause? Multi-field primary key?
2. Is it better to have a unclustered primary key for high volume insert
table?
3. I heard of having a separate table that has the next available "ID";
taking the ID and then increment by 1 in a transaction. What do you think
of this suggestion?
I know that hardware is a factor, but just factor everything "all things
being equal". I am asking this question because I have over 5000 users
entering time into a log. After a person inserts a record, I need to update
a web page with the last 5 records they entered (as a reminder for them).
The kicker is I have to perform a Sum() on the TotalTime field, because the
user is constricted by the amount of hours they can enter per month. They
can only bill until their budget is depleted. I am tempted to have a field
in the user table that has their current budget amount thereby avoiding the
Sum() in the SQL statement. I am afraid of the amount becoming skewed for
whatever reason. Hopefully, batch updating will alleviate the possibility
of poor performance.
So, I have a table that has potential for high volume inserts and reads.
Looking forward to your thoughts.
Tim H.
|
Tim,
|
I went to a Microsoft sponsored SQL Server seminar recently where the
speaker mentioned that you should avoid using identity fields for primary
keys when the table has records inserted frequently (5000 concurrent
users?) - a type log. His argument was that it creates artificial "hot
spots" at the end of the table. Whatever that means.
|
To learn more about indexes, see our Related Links section!
A "Hot Spot" is what you end up on when your boss wants to know why the
application isn't performing well. Or is that the Hot Seat? :)
Anyway, the truth of that statement depends on the circumstances. Is this a
SQL 6.5 or 7.0 server? If 6.5, have you enabled row-level locking for the
table? Is the PK index clustered or nonclustered?
The problem he's referring to happened in SQL 6.5 if you weren't careful
with your database design. Recall that a clustered index physically sorts
the data pages in a table. If you put a clustered index on an IDENTITY
column, then all of your inserts will happen on the last page of the table -
and that page is locked for the duration of each IDENTITY. No big deal...
unless you have 5000 people that all want the last page. Then you have a
lot contention for that page, or a "hot spot".
You can turn on insert row-level locking in SQL 6.5, and that will only lock
the row being inserted, thus reducing lock contention. You can also move
your clustered index to a different column, thereby scattering the inserts
around the table.
SQL 7.0 doesn't have this problem because it adjusts the granularity of the
lock as needed. Lots of contention for the page? Let's bump the
granularity down to a row lock! It works very well. Lock contention
problem that were plaguing us in 6.5 are gone in 7.0.
1. What do you do if you need to refer to a record in a high volume table?
Have a beefy WHERE clause? Multi-field primary key?
|
IDENTITY columns work great for Primary Keys. They're small (4 bytes) if
you use an int data type. And small keys are awesome because they're faster
to index and search. IDENTITY columns in 6.5 suffer a problem if the server
is shut down unexpectedly... they forget what number they're on. I usually
recommend running DBCC CHECKIDENT on each table when the server starts up.
|
2. Is it better to have a unclustered primary key for high volume insert table?
|
Well, placing a clustered index (PK constraints create a clustered index by
default) on an IDENTITY column isn't all that useful. You're usually better
off placing it on a column(s) that you search on pretty frequently, such as
a name or a date. Especially if you tend to return a range of rows based on
the clustered index criteria, or sort by the key column(s). SQL Server
loves doing that, because it can just run right down the pages on disk..
zip, zip, zip.
Another factor to consider is that in SQL 7.0, nonclustered indexes use the
clustering key at the leaf level of the nonclustered index tree. So if you
have a clustered index with a huge key, your nonclustered indexes will be
larger as a result. Try to keep the size of the clustering key as small as
possible.
|
3. I heard of having a separate table that has the next available "ID";
taking the ID and then increment by 1 in a transaction. What do you think
of this suggestion?
|
Yep, we had to do that before SQL Server had auto-incrementing columns. Now
that we're living in the 21st century, there's not much reason to use this
tactic.
|
I know that hardware is a factor, but just factor everything "all things
being equal".
|
Buy fast disks. And lots o' memory (for data cache) will help with the
queries.
|
I am asking this question because I have over 5000 users
entering time into a log. After a person inserts a record, I need to update
a web page with the last 5 records they entered (as a reminder for them).
|
So the person ID sounds like a possible candidate for a clustered index. If
you're using 7.0, try the SELECT TOP 5 * FROM... syntax as well. SQL Server
can now optimize queries based on a limited rowcount.
The kicker is I have to perform a Sum() on the TotalTime field, because the
user is constricted by the amount of hours they can enter per month. They
can only bill until their budget is depleted. I am tempted to have a field
in the user table that has their current budget amount thereby avoiding the
Sum() in the SQL statement. I am afraid of the amount becoming skewed for
whatever reason.
|
Not a bad plan. With the judicious use of transactions, you can do both
your insert and your update to the person table without worry.
|
Hopefully, batch updating will alleviate the possibility of poor performance.
So, I have a table that has potential for high volume inserts and reads.
|
It's usually best to separate reporting queries and transaction processing.
You could, perhaps, move record out of the time table into a data warehouse
as they age.
Hope this helps,
Sean
Related Links:
Creating a Clustered Index
Technical Discussion of Nonclustered Indexes
An article on Clustered/Nonclustered Indexes