Datasets in Quicksight run either in Spice or Direct query mode. Query mode can be seen in the bottom left corner with Spice and Direct query as options. We can switch between these two options as and when it’s required. It also has a refresh now option which does a full refresh on the data source and brings in all the new records to the dashboard.
Spice
Spice is a Super-fast parallel in-memory calculation engine, it incorporates in-memory and columnar storage technologies that extract data from tables of data sources defined in the dataset on schedule and store the data in its memory. Moreover, this spice dataset has the capacity to handle any number of requests that are made through the Quicksight dashboard because it automatically scales internally for a better user experience.
QuickSight manages the entire workload without placing a strain on the underlying data source. Therefore, it makes no difference if 10 users or 10,000 users access a dataset through one or more dashboards. SPICE layer in Quicksight was developed with performance in mind and offers lightning-fast dashboard loads. It sits between the user interface and the data source and is specifically designed for analytical queries in Quicksight.
Importing data directly from the S3 data source or uploading files such as CSV or XLS, are not allowed for querying the required field to create a new dataset for analysis, instead, all of the data will be moved to spice.
Spice Schedule
Spice datasets can be scheduled to refresh from once every 15 minutes to a year (15 minutes, 30 minutes, 1 hour, week, month, year). During the refresh schedule, the data from the source is moved to spice to hold the updated records which then produce the data on the dashboard.
Spice Capacity
Each AWS Region has a different allocation of SPICE capacity. All QuickSight users in a specific AWS Region share the SPICE capacity for each AWS account.
Administrators of QuickSight may see how much SPICE capacity is available in each AWS Region and how much of it is being used. Additionally, administrators have the option of adding more SPICE capacity or releasing any extra space.
A spice dataset can hold up to 500 million rows or 500GB for the Enterprise Edition, however, 25 million rows or 25GB for the Standard edition.
Both queries and files can be used to ingest data into SPICE. However, in terms of the file, we can have up to 2,000 columns. There can be up to 127 Unicode characters in each column name. Each manifest for an import from Amazon S3 can specify up to 1,000 files.
Features of Spice
- Highly scalable as it can enhance resources internally based on the number of requests.
- Data in spice is highly available even if the data source failed to ingest data.
- Super fast dashboard loads because the data is stored in the memory itself.
- Incremental or full data source refresh can be set for spice datasets to have updated records on schedule.
Direct query
When using the Direct query option, each time a dashboard or analysis that uses the dataset is loaded, queries are run against the backend data source.
The major advantage of having the dataset in a direct query is that there is no need for us to set a refresh to have the new records in the dashboard on schedule. It automatically pulls up all the records from the data source every time the dashboards get viewed by the users.
QuickSight can produce smart queries when used in direct query mode, joining only the necessary tables rather than all the tables defined in the dataset. You can specify which tables have unique keys when defining joins, and QuickSight will use this information to determine if it is safe to remove the table from the query if no fields are being fetched from it.
Custom SQL Query:
You can insert your query as a Custom SQL in the dataset if, for example, it already pulls data from 25 tables and you don’t want to remap every relationship in the UI (Joining the datasets using the user interface in Quicksight and it can be done by selecting the Add Data in the Quicksight data Prep). The Custom SQL can be combined with other tables in the dataset and will function as a regular table there. It is also flexible in terms of writing complex queries such as sub-queries and datasets with multiple joins.