|
Sybase
Aggregate Queries
Unlike
many dialects of SQL, Transact SQL allows you to reference
columns which are
NOT part of a Group By clause in an aggregate query. Thus the following
is permissable...
select
salesman,
office,
avg(invoice)
from turnover
group
by salesman
which
might produce something like...
salesman
office
----------- ----------- --------
Al
Smith
Slough
1,654.89
Fred
Bloggs
Swindon 1,956.56
Jim
Smith
Devizes 2,243.98
Jim
Welsh
Slough
2,987.34
Paul
Jones Cirencester 2,654.23
Pete
Pring
Swindon 2,124.87
etc...
Note
that
the dependent column, 'office' has no relevance to the
aggregation - it's simply a look-up. This can lead to unexpected
effects if used without thought as in the following example, taken from
Sybase's own documentation...
select type, title_id, avg(price), avg(advance) from titles group by type
type title_id ------------ -------- ----- ------- business BU1032 13.73 6,281.25 business BU1111 13.73 6,281.25 business BU2075 13.73 6,281.25 business BU7832 13.73 6,281.25 mod_cook MC2222 11.49 7,500.00 mod_cook MC3021 11.49 7,500.00 UNDECIDED MC3026 NULL NULL popular_comp PC1035 21.48 7,500.00 popular_comp PC8888 21.48 7,500.00 popular_comp PC9999 21.48 7,500.00 psychology PS1372 13.50 4,255.00 psychology PS2091 13.50 4,255.00 psychology PS2106 13.50 4,255.00 psychology PS3333 13.50 4,255.00 psychology PS7777 13.50 4,255.00 trad_cook TC3218 15.96 6,333.33 trad_cook TC4203 15.96 6,333.33 trad_cook TC7777 15.96 6,333.33 (18 rows affected)
See
what's
happening? The title_id is completely irrelevant to the
averages in the row. At first glance it looks as if the
titles
are all generating the same figures but the figures are actually
based on the first column,
'type'. So the ability to use non-participating columns in aggregate
queries is a useful one but to be applied with care.
And
the other way around...
Sybase
also allows you to group by a column which does
not
appear in the query's
select statement. Thus...
select salesman,
count(invoice)
from turnover
group
by salesman, office
salesman
----------- -----------
Al
Smith
12
Al
Smith
NULL
Al
Smith
NULL
Al
Smith
NULL
Fred
Bloggs
19
Fred
Bloggs
NULL
Fred
Bloggs
NULL
Fred
Bloggs
NULL
Jim
Smith
9
Jim
Smith
NULL
Jim
Smith
NULL
Jim
Smith
NULL
and
so on...
Notice
the
'gotcha' here. Because, in this case, salesmen work out of
offices, there's no valid data for any salesman except at his own
office. On the other hand, if salesmen occassionaly worked out of other
offices, this might be usefull. Then again, it would be more usefull if
the office were
shown.
Look!
No hands...
Transact
SQL will allow you to group by an expression that does
not
contain an aggregate
function as in...
select
salesman,
avg(invoice),
invoice *
commission
from turnover
group by invoice * commission
salesman
----------- ----------- ----------
Al
Smith
1,654.89
221.87
Al
Smith
1,654.89
134.65
Al
Smith
1,654.89
206.34
...and
so on. Again, this example is somewhat meaningless but that's the
problem with aggregations, used without thought they can provide very
confusing output.
Nested
aggregates.
Rather
more helpfully, Sybase permits nesting of aggregate functions,
so that the following very usefull query is entirely legal...
select
max(avg(invoice))
from turnover
group by salesman
-------------
2987.34
Handling Null Values.
If
the
grouping expression produces a NULL value, the NULL is placed in
a row as if it were a real value. If more than one NULL is produced
they are all, quite logically, placed into a single NULL row...
select avg(payment),
count(*)
from salesman
group
by invoice
----------- -----------
NULL
2
1,225.00
1
1,500.00
4
1,750.00
1
and
so on. This is
one of those really usefull things that can be
extremely helpfull in certain circumstances, such as when you can't
understand why a report isn't delivering the expected result. Notice
that we used count(*) rather than count(payment) to find out how many
NULL rows we have. That's because counting NULLs against a column
returns nothing, no matter how many instances of NULL there are.
Where
There's a Where...
If
we use
a where clause in a grouping query, the where clause is
honoured first. In other words, only the set returned by the where
clause is grouped on. Hence...
select salesman,
office,
avg(invoice)
from turnover
where invoice > 3000 group
by salesman
might
produce something like...
salesman
office
----------- ----------- --------
Jim
Welsh
Slough
3,201.34
Paul
Jones Cirencester 3,501.76
Compare
this with our first example to see how dramatically this alters things.
However, there is another of those gotchas waiting because, if you do
this...
select salesman,
office,
avg(invoice),
avg(cost)
from turnover
where invoice > 3000
group
by salesman
you
might get this...
salesman
office
----------- ----------- -----------
-----------
Jim
Welsh
Slough
3,201.34 1098.12
Paul
Jones
Cirencester
3,501.76
875.32
which
makes these guys look like geniuses. The problem is that the last
column is a gold-plated, copper-bottomed lie.
Think about it. We told Sybase to find the average from the turnover
where the invoice value was greater than 3,000. We then told it to find
the average of the cost per invoice. What we didn't
do was tie the 'cost' column
to the where clause so it happily calculated the average across all the
invoices for each salesman, some of which were obviously very low. The
solution is to use a having clause...
Have
I Got a Clause for You...
The having clause works on a group by in the same way as a where clause
works on an ordinary query. Re-writing our example above as...
select salesman,
office,
avg(invoice),
avg(cost)
from turnover
group by
salesman
having invoice > 3000
Gets
us something
like this...
salesman
office
----------- ----------- -----------
-----------
Jim
Welsh
Slough
3,201.34 2,822.02
Paul
Jones
Cirencester
3,501.76 3,019.95
which
looks a lot more realistic. The having clause has dropped out all
the irrelevant rows and concentrated only on the invoices that we're
interested in.
Caveat
emptor...
Sybase
aggregate queries are one of the system's most powerful features
but, as always, with power comes responsibility. Always
test aggregate queries on
small but representative data sets to ensure that they really are doing
what you intend them to.
Find out more
by searching Google here...
|