The problem of this month concerns the following DB2 request :
SELECT * FROM SYSTEXT1 A ,SYSTEXT2 B WHERE TOPIC BETWEEN :VAR1 AND :VAR2 AND A.ITEM=B.ITEM
The transaction executed has a bad response time (2.3 seconds).
First step is to point out the DB2 statement into program involved in the problem.
To save time, I used SQLTUNE to dispaly the following panel :

Request NO 3, when opening cursor, seems to be the origin of this long elapse time.
Then I asked to display the DB2 request and this is the problem of the day.
In all performance problem analysis, once the request is pointed out, it is mandatory to know access path used by DB2 to reach data (this information maybe obtained by DB2 EXPLAIN request).
I asked SQLTUNE to display this access path and I obtained :

For memo,
SYSTEXT1 and SYSTEXT2 are tables used by ISQL HELP command.
SYSTEXT1 is used as an index and contains 2 columns, TOPIC which is the keyword for requested help
and ITEM which is a key for SYSTEXT2, table containing columns ITEM,SEQNO,LANGKEY and SQL/DS HELP containing explanation.
If access path to access SYSTEXT1 table seems correct, access path to SYSTEXT2 table (JOIN using MERGE/SCAN) is not acceptable.
To refresh your memory about different possible options used by DB2 to process a JOIN .
There are 2 possible methods..
- a)NESTED LOOP
this first method consist of selecting into the first table, rows that have to be joined to rows in the second tablee, and for such selected rows process a JOIN to second table in the most efficient way. This method is well adapted, when percentage of selected rows in the first table is low before joining the second table.
- b)MERGE SCAN
This second method consist of ordering both tables on columns involved in join condition, and access both ordered keys sequentially and select only matching keys.
This method is appropriate when a high percentage of first table has to be joined.
For our current problem, I know there are only a few rows to join compared with the size of these tables. But DB2 estimation is very different (20000
as seen of figure 2).
My job is to convince DB2, the JOIN will be processed for a very few rows.
In that case, this number of rows to join is the resulting rows for Where condition
'TOPIC BETWEEN :VAR1 AND :VAR2'
.
To understand how DB2 estimates number of resulting rows , let's use a general concept outside DB2 environment.
If I ask the more appropriate answer to the following question (appropriate means to avoid being completely wrong)
- First question.
In a room, there are english and american people. There is a total of 100 persons.
How many american people are in the room ?
You can answer 98 and if you are lucky you could get the right number but if you are wise your answer will be 50 to avoid being completely wrong.
- 2nd question
In the room there are men and women. There is a total of 100 persons.
How many women are in the room ?
Your answer will probably be 50.
- 3rd question
In the room there are english and american people, men and women. There is a total of 100 persons.
How many american women are in the room ?
Your answer should be 25 because 50% are americans and among these persons 50% are women.
The number of estimated elements has decreased from 50 to 25 by adding criteria to the " WHERE CLAUSE "
Let's try to process the DB2 request in the same way.
To lower estimated resulting rows by DB2 in the first table to process the JOIN,
we should add criteria to "where clause" :
WHERE TOPIC BETWEEN :VAR1 AND :VAR2
Table SYSTEXT1 contains only one extra column ITEM,
and it seems difficult to find a criteria on this column. TOPIC is the only available column to add criteria.
Let's try to add AND TOPIC BETWEEN :VAR1 AND :VAR2
The resulting request looks like :

Let's see the new DB2 access path using SQLTUNE

You can notice, number of estimated rows is 2000 and DB2 decides to process JOIN using
NESTED LOOP method.
After modifying the program and compiling, I try again the transaction and I get new statistics as follow :

My mission is accomplished, and I can invite you next month for a new problem analysis.