Due to the Hyperstage column-oriented data organization
and other Hyperstage-specific features, query optimization in Hyperstage
is slightly different than in traditional DBMS approaches.
- Hyperstage works well with data tables containing many columns,
where only necessary columns are accessed by query (as opposed to
SELECT *). The traditional approach suggests keeping records as
small as possible (for example, using schema normalization and table
decomposition). However, in Hyperstage, only necessary columns are
used in calculations. Therefore, queries with many limiting conditions
on many columns of the same table are especially well optimized
in Hyperstage.
- In traditional DBMS systems, better performance can be achieved
by creating indices. In Hyperstage, Knowledge Nodes are used instead
of indices (Knowledge Nodes are created automatically). To further
enhance performance, you can try to influence the data loading procedure
by keeping similar data (for example, for similar time frames) close
together. The order in which data is loaded may influence both compression
ratio and query speed.
- Avoid using OR in queries and, if possible, use IN instead.
In some cases, ORs can be translated to UNION ALL or IN. For example:
...WHERE a=1 OR a=2...
could
be replaced by
...WHERE a IN (1,2)...
- Try to replace correlated subqueries with joins and independent
subqueries.
- Executing queries in steps may also help with missing function
support. For instance, execute the bulk of the query in Hyperstage
and export the data to a MyISAM table. Then, execute the function
query on the result set.
Temp tables may be used to manage intermediate steps without
needing to do database cleanup.
To optimize your query performance, avoid the following, which
will result in the query being handled by the MySQL query engine:
- Using functions or type cast operators.
- Creating queries containing mixed Hyperstage and MySQL tables.
- Exporting data in mysql format. Note that mysql is the default
export format.
- Performing comparisons or arithmetical operations on two different
data types (such as numbers and text).
- Creating JOINs with the JOIN condition defined as NOT BETWEEN.