Due to the Hyperstage column-oriented data organization
and other Hyperstage-specific features, query optimization in Hyperstage
is slightly different than in traditional DBMS approaches.
- Hyperstage works
well with data tables containing many columns, where only necessary
columns are accessed by query (as opposed to SELECT *). The traditional approach
suggests keeping records as small as possible (for example, using
schema normalization and table decomposition). However, in Hyperstage,
only necessary columns are used in calculations. Therefore, queries
with many limiting conditions on many columns of the same table
are especially well optimized in Hyperstage.
- In traditional DBMS
systems, better performance can be achieved by creating indices.
In Hyperstage, Knowledge Nodes are used instead of indices (Knowledge Nodes
are created automatically). To further enhance performance, you
can try to influence the data loading procedure by keeping similar
data (for example, for similar time frames) close together. The
order in which data is loaded may influence both compression ratio
and query speed.
- Avoid using OR in
queries and, if possible, use IN instead. In some cases, ORs can be
translated to UNION ALL or IN. For example:
...WHERE a=1 OR a=2...
could
be replaced by
...WHERE a IN (1,2)...
- Try to replace correlated
subqueries with joins and independent subqueries.
- Executing queries
in steps may also help with missing function support. For instance,
execute the bulk of the query in Hyperstage and export the data
to a PostgreSQL table. Then, execute the function query on the result
set.
Temp tables may be used to manage intermediate steps without
needing to do database cleanup.
To optimize your query performance, avoid the following, which
will result in the query being handled by the PostgreSQL query engine:
- Using functions or
type cast operators.
- Creating queries
containing mixed Hyperstage and PostgreSQL tables.
- Performing comparisons
or arithmetical operations on two different data types (such as
numbers and text).
- Creating JOINs with
the JOIN condition defined as NOT BETWEEN.