
Datasets

Properly preparing your data is essential to ensure accurate and reliable results. This includes understanding the content, structure, and expected format of your dataset. A few restrictions apply to guarantee that Studio interprets your data correctly.

Studio is designed to work with time-series data, so each dataset is treated as a time series by default. Timestamps must be unique within a dataset; if duplicates exist, Studio will use the first occurrence along with its corresponding observations and ignore any subsequent duplicates.
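The duplicate-timestamp rule can be reproduced before upload so you know which rows will be kept. The following is a minimal sketch, not Studio's own implementation; the function name and the sample rows are illustrative assumptions.

```python
def drop_duplicate_timestamps(rows):
    """Keep the first row seen for each timestamp and skip later duplicates."""
    seen = set()
    unique_rows = []
    for row in rows:
        timestamp = row["Timestamp"]
        if timestamp in seen:
            continue  # a later duplicate: ignored, as described above
        seen.add(timestamp)
        unique_rows.append(row)
    return unique_rows


# Hypothetical sample rows; the last one repeats an earlier timestamp.
rows = [
    {"Timestamp": "2025-07-21 14:00:00", "Variable A": 2.8},
    {"Timestamp": "2025-07-21 15:00:00", "Variable A": 3.2},
    {"Timestamp": "2025-07-21 15:00:00", "Variable A": 9.9},
]
deduplicated = drop_duplicate_timestamps(rows)
print(deduplicated)
```

Running this on your own rows shows which observation survives for each repeated timestamp.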

Sampling Rate

Sampling rate is the number of samples recorded per unit of time in a time series, assuming the samples are evenly spaced. The sampling period is its inverse: the time difference between two consecutive samples or observations.

Note:
Sampling rate and period are automatically inferred for an uploaded dataset.
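Studio's inference method is not documented here, but one common approach is to take the most frequent gap between consecutive timestamps. The sketch below illustrates that idea under that assumption; the function name and sample timestamps are hypothetical, and the timestamp pattern follows the examples in the Expected Dataset Structure table.

```python
from collections import Counter
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def infer_sampling_period(timestamps):
    """Return the most common gap between consecutive timestamps."""
    parsed = sorted(datetime.strptime(t, TIMESTAMP_FORMAT) for t in timestamps)
    if len(parsed) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [later - earlier for earlier, later in zip(parsed, parsed[1:])]
    period, _count = Counter(gaps).most_common(1)[0]
    return period


# Hourly samples with one missing observation at 17:00.
timestamps = [
    "2025-07-21 14:00:00",
    "2025-07-21 15:00:00",
    "2025-07-21 16:00:00",
    "2025-07-21 18:00:00",
    "2025-07-21 19:00:00",
]
period = infer_sampling_period(timestamps)
samples_per_day = timedelta(days=1) / period  # sampling rate per day
print(period, samples_per_day)
```

Using the most common gap rather than the smallest one keeps a single gap or stray observation from changing the result.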

Missing Data

The term missing data refers to gaps or absent values within a dataset, where there is no valid measurement or recorded entry for a given variable. This typically occurs due to errors, corruption, or incompatibility.

In practice, this appears as:

  • Empty cells ("")
  • Strings such as na, nan, n/a, missing, or null
  • Non-numeric values in numeric columns
  • Values that are infinite or cannot be parsed

Missing data values are automatically detected and handled within Studio, ensuring that your models can still be built and that your analysis remains robust even in the face of incomplete data.
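To preview which values in a numeric column are likely to be treated as missing, the rules listed above can be approximated as follows. This is a sketch rather than Studio's detector: the function name is hypothetical, and the case-insensitive matching and whitespace trimming are assumptions.

```python
import math

# Marker strings listed above; matching them case-insensitively is an assumption.
MISSING_MARKERS = {"", "na", "nan", "n/a", "missing", "null"}


def is_missing_numeric(raw):
    """Return True if a raw cell from a numeric column counts as missing."""
    text = raw.strip().lower()
    if text in MISSING_MARKERS:
        return True
    try:
        value = float(text)
    except ValueError:
        return True  # non-numeric value in a numeric column
    return not math.isfinite(value)  # infinite values count as missing


cells = ["2.84", "", "N/A", "abc", "inf", "17"]
flags = [is_missing_numeric(cell) for cell in cells]
print(flags)
```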

Dataset Size

You can upload a file with up to 40,000 rows of data. Keeping your file under this limit ensures smooth processing and fast validation.

For example, three years of hourly time-series data (about 26,280 rows) provides the ideal balance of comprehensive seasonality and practicality. The long timeframe ensures that there are sufficient seasonal patterns to observe: a daily cycle repeats over 1,000 times, while a weekly pattern occurs 156 times. Three annual cycles also provide annual seasonality, which is key to robust trend detection and seasonal decomposition. Capturing three years of data helps Studio distinguish true seasonal effects from random fluctuations, allowing year-on-year trends to be identified with greater statistical confidence. The image below illustrates an example of three years of hourly time-series data.

Figure: Three-year hourly multivariate time series
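The figures in the three-year example follow from simple arithmetic, worked through below. The sketch assumes 365-day years (no leap day); the constant names are illustrative.

```python
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7
DAYS_PER_YEAR = 365  # assumption: leap days are ignored
ROW_LIMIT = 40_000   # upload limit stated above

years = 3
total_rows = years * DAYS_PER_YEAR * HOURS_PER_DAY             # one row per hour
daily_cycles = total_rows // HOURS_PER_DAY                     # complete days
weekly_cycles = total_rows // (HOURS_PER_DAY * DAYS_PER_WEEK)  # complete weeks

print(total_rows, daily_cycles, weekly_cycles, total_rows <= ROW_LIMIT)
```

Three years of hourly data therefore fits within the row limit while still covering each seasonal cycle many times.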

Expected Dataset Structure

The table below lists the expected columns.

| Column Name | Format / Description | Examples |
| --- | --- | --- |
| Timestamp | YYYY-mm-dd HH:MM:SS UTC. Date and time of the observation, in UTC | 2025-07-21 14:30:00, 2025-07-22 09:15:00 |
| Numeric Columns | Floating point or integer values representing measurements or features | 2.84, 0.13, 17 |
| Boolean Columns | True/False values | true, false |
| Categorical Columns | Nominal or ordinal categories. Can be used as features (not supported as target for classification in Forecasting) | A, B, Low, Medium, High |
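As an illustration of the structure above, the sketch below converts a local time to UTC and writes one CSV row. It follows the table's examples, which show no literal "UTC" suffix in the value; that reading is an assumption. Every column name other than Timestamp is hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def format_timestamp(moment):
    """Convert a timezone-aware datetime to UTC and render it as YYYY-mm-dd HH:MM:SS."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# A local reading taken at 16:30 in a UTC+2 zone, which is 14:30 UTC.
local_time = datetime(2025, 7, 21, 16, 30, tzinfo=timezone(timedelta(hours=2)))

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
# "Temperature", "Is Open" and "Level" are made-up column names.
writer.writerow(["Timestamp", "Temperature", "Is Open", "Level"])
writer.writerow([format_timestamp(local_time), 2.84, "true", "Low"])
csv_text = buffer.getvalue()
print(csv_text)
```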

Data Types

At its core, Studio supports two key data types: numerical and categorical. Numerical data consists of values that can be measured and represented using numbers, such as age, temperature, or height. Categorical data, by contrast, consists of values that represent categories or groups.

Time-series data can contain many types of variables. Numerical variables are quantifiable: they can be measured and represented by numbers. They include continuous variables, such as pressure, and discrete variables, such as the number of items sold.

Studio also supports Boolean variables, a special subtype of categorical data with only two possible values (0/1 or True/False). Other categorical variables, such as nominal, ordinal, and multi-class, are also supported.

Note:
As with sampling rate, Studio automatically determines the data types of an uploaded dataset.

Adding a Dataset

Adding a new dataset in Studio is simple. From the datasets page, first click the Add Dataset button. This will open the Add Dataset popup.

There are two steps to adding a dataset:

1. Upload File: upload a file containing time-series data.

2. Create Dataset: once validated, configure and personalise this uploaded data.

Upload File

Your dataset must meet some requirements before it can be added to Studio. Studio only supports CSV files, and your file must contain a column named Timestamp in the format YYYY-mm-dd HH:MM:SS UTC.

Figure: Upload File popup

Once you have selected and uploaded a file, it is validated. The validation process checks that uploaded files are correctly formatted and ready to be used for forecasting. When a file is submitted, it is automatically reviewed to confirm that required columns are present, timestamps can be read correctly, and the data is complete and consistent. The system also verifies that values are numeric where expected and that the time intervals make sense for analysis. If any issues are found, the validation will highlight them so they can be fixed before continuing. Once the dataset passes these checks, the pop-up will automatically take you to the second step of the process - Create Dataset.
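Some of these checks can be run locally before uploading. The sketch below is not Studio's validator: it covers only the Timestamp column, the timestamp format, and the 40,000-row limit, and the function name and messages are illustrative assumptions.

```python
import csv
import io
from datetime import datetime

ROW_LIMIT = 40_000
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # pattern taken from the documented examples


def check_csv(csv_text):
    """Return a list of problems found in the CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if "Timestamp" not in (reader.fieldnames or []):
        return ["missing required column: Timestamp"]
    rows = list(reader)
    problems = []
    if len(rows) > ROW_LIMIT:
        problems.append(f"too many rows: {len(rows)} > {ROW_LIMIT}")
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(rows, start=2):
        try:
            datetime.strptime(row["Timestamp"], TIMESTAMP_FORMAT)
        except (ValueError, TypeError):
            problems.append(
                f"line {line_number}: unreadable timestamp {row['Timestamp']!r}"
            )
    return problems


good = "Timestamp,Variable A\n2025-07-21 14:30:00,2.84\n"
bad = "Timestamp,Variable A\n21/07/2025,2.84\n"
print(check_csv(good))
print(check_csv(bad))
```

A file that passes these checks can still be rejected by Studio, since its validation also examines completeness, numeric values, and time intervals.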

Create Dataset

Once your uploaded file has been successfully validated, you can name your dataset and add a brief description. By default, Studio uses the name of the uploaded file as the dataset name.

Regarding Descriptions:
Your description must be at least ten characters long before the dataset can be successfully added to Studio.

Updating a Dataset

Any dataset that has previously been uploaded into Studio can be updated with new values. This may occur when new data becomes available or when existing values need to be corrected. Updates are always based on the Timestamp column, ensuring that new or modified records are correctly aligned with your existing dataset.

Warning:
It is vital that the structure of the dataset (columns and their names) remains identical to the original. Any changes in column names or order may cause the update to fail.
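Before uploading an update, you can compare its header row with the original file's. The sketch below shows one way to do that; the function name and message wording are hypothetical, and Studio's own structure check may differ.

```python
def structure_differences(original_columns, update_columns):
    """Describe how an update file's header differs from the original's."""
    original = list(original_columns)
    update = list(update_columns)
    if original == update:
        return []  # identical names in identical order
    differences = []
    missing = [name for name in original if name not in update]
    extra = [name for name in update if name not in original]
    if missing:
        differences.append(f"missing columns: {missing}")
    if extra:
        differences.append(f"unexpected columns: {extra}")
    if not missing and not extra:
        differences.append("same column names, but order or count differs")
    return differences


original = ["Timestamp", "Variable A", "Variable B"]
reordered = ["Timestamp", "Variable B", "Variable A"]
renamed = ["Timestamp", "Variable A", "Variable C"]
print(structure_differences(original, reordered))
print(structure_differences(original, renamed))
```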

How to Update a Dataset

1. Update Dataset

In the Datasets table, click the update icon for the dataset you wish to modify. This opens the Update Dataset popup.

2. Upload New Dataset

Updating a dataset follows a process similar to adding a new dataset:

  • Upload a supported file containing the updated data.

  • The system validates the file to ensure it matches the existing dataset structure.

  • Once validated, the new data can be added to the dataset.

When new data is added, only rows with timestamps after the end of the previously uploaded dataset are appended. The uploaded file can therefore have a different filename, or start immediately after the last timestamp in the existing dataset, and it will still be integrated successfully. The illustration below demonstrates how new data is merged with an existing dataset:

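Current

| Timestamp | Variable A | Variable B |
| --- | --- | --- |
| ... | ... | ... |
| 2025-07-21 | 2.8 | 0.04 |
| 2025-07-22 | 3.2 | 0.13 |

+

New (Uploaded)

| Timestamp | Variable A | Variable B |
| --- | --- | --- |
| ... | ... | ... |
| 2025-07-23 | 2.9 | 0.21 |
| 2025-07-24 | 3.7 | 0.16 |

Updated

| Timestamp | Variable A | Variable B |
| --- | --- | --- |
| ... | ... | ... |
| 2025-07-21 | 2.8 | 0.04 |
| 2025-07-22 | 3.2 | 0.13 |
| 2025-07-23 | 2.9 | 0.21 |
| 2025-07-24 | 3.7 | 0.16 |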
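The append rule illustrated above can be sketched as follows. This is an approximation for intuition, not Studio's merge logic. The function name is hypothetical, the dates are shortened as in the illustration, and the overlapping 2025-07-22 row in the upload is a made-up addition to show that earlier timestamps are not appended.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # the illustration shows dates only


def append_new_rows(current, uploaded):
    """Append only uploaded rows dated after the last existing timestamp."""
    last = max(datetime.strptime(row["Timestamp"], DATE_FORMAT) for row in current)
    new_rows = [
        row
        for row in uploaded
        if datetime.strptime(row["Timestamp"], DATE_FORMAT) > last
    ]
    return current + new_rows


current = [
    {"Timestamp": "2025-07-21", "Variable A": 2.8, "Variable B": 0.04},
    {"Timestamp": "2025-07-22", "Variable A": 3.2, "Variable B": 0.13},
]
uploaded = [
    {"Timestamp": "2025-07-22", "Variable A": 9.9, "Variable B": 0.99},  # not after the end
    {"Timestamp": "2025-07-23", "Variable A": 2.9, "Variable B": 0.21},
    {"Timestamp": "2025-07-24", "Variable A": 3.7, "Variable B": 0.16},
]
updated = append_new_rows(current, uploaded)
print([row["Timestamp"] for row in updated])
```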

Deleting a Dataset

Deleting a dataset is easy within Studio. Each row of the datasets table has a column with a delete button. When clicked, a pop-up will appear informing you of which dataset will be deleted. Here you can either confirm the deletion or cancel it.

Deleting Datasets:
When deleting a dataset, any experiments that use this dataset will also be automatically deleted.