I have collected data from an IMU sensor to build a gesture recognition application. How can I remove outliers before the model building process?
Outliers can significantly affect the performance of machine learning models, so it’s important to handle them appropriately.
1. Understand Your Data
 Visualize Data: Plot your IMU sensor data to visually inspect it for obvious outliers. Python tools such as Matplotlib or Seaborn are useful here. You can use the SensiML Python SDK to access segments of your data and explore them with these well-known libraries.
 Statistical Summary: Calculate statistical measures (mean, median, standard deviation) to understand the data distribution. Values that lie more than a chosen number of standard deviations from the mean can usually be treated as outliers, or studied in more detail to find the cause of their high variation.
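As a rough illustration of the statistical summary step, the sketch below computes the mean, median, and standard deviation for each sensor axis using only the Python standard library. The axis names and readings are made up for the example, and the `summarize` helper is hypothetical, not part of the SensiML Python SDK.

```python
import statistics

# Hypothetical accelerometer readings per axis (made-up values, arbitrary units).
imu = {
    "accel_x": [0.1, 0.2, 0.0, -0.1, 0.1, 9.5, 0.2, 0.0],
    "accel_y": [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 0.9, 1.1],
}

def summarize(values):
    """Return basic statistics for one sensor axis."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": std,
        # Largest distance from the mean, expressed in standard deviations.
        "max_dev_in_std": max(abs(v - mean) for v in values) / std,
    }

for axis, values in imu.items():
    stats = summarize(values)
    print(axis, {k: round(v, 3) for k, v in stats.items()})
```

A large gap between the mean and the median, as on the `accel_x` axis here, is often the first hint that a few extreme readings are pulling the average away from the bulk of the data.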
2. Define What Constitutes an Outlier
 Domain Knowledge: Use your knowledge of the application to define thresholds for what values are considered normal and what values are outliers.
 Statistical Methods: Use statistical techniques such as the z-score or the IQR (interquartile range) to identify outliers.
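The two statistical methods mentioned above can be sketched in a few lines of standard-library Python. The thresholds used here (3 standard deviations for the z-score, 1.5 times the IQR) are common conventions rather than values taken from this guide, and the gyroscope readings are invented for illustration.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Indices of values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]

def iqr_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical gyroscope magnitudes with one spike at index 10.
gyro = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.1,
        25.0, 1.0, 0.9, 1.2, 1.0, 0.8, 1.1, 1.0, 0.9, 1.0]

print(zscore_outliers(gyro))  # → [10]
print(iqr_outliers(gyro))     # → [10]
```

The z-score approach relies on the mean and standard deviation, which the outliers themselves distort, so on short recordings a single spike may not reach a threshold of 3. The IQR rule is based on quartiles and tends to be more robust in that situation.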
How to remove outliers in the Piccolo AI pipeline
This guide shows how to remove outliers from your IMU sensor data when building your model with the SensiML model builder.
1 In the Feature Extraction block of your pipeline, click the “+” sign and select an “Outlier Filter” block.
2 Define one or more outlier filters. The objective is to remove as much unwanted data as possible before the modeling process starts, so any prior data exploration and insight is very helpful at this stage. The available filters are:

Local Outlier Factor Filtering: The local outlier factor (LOF) algorithm is an unsupervised outlier detection method that measures the local density deviation of a given data point with respect to its neighbors. Samples that have a substantially lower density than their neighbors are considered outliers.

Zscore Filter: A z-score filter standardizes feature vectors by transforming each feature in the vector to have a mean of zero and a standard deviation of one. The z-score, or standard score, measures how many standard deviations a data point lies from the mean of the distribution. Feature vectors whose z-score falls outside a cutoff threshold are removed.

Sigma Outliers Filtering: A sigma outlier filter identifies and removes outliers from feature vectors based on their deviation from the mean. An outlier is defined as a data point that falls more than a certain number of standard deviations (sigma) from the mean of the distribution.

One Class SVM filtering: An unsupervised outlier detection method that estimates the support of a high-dimensional distribution. The implementation is based on libsvm.

Robust Covariance Filtering: An unsupervised outlier detection method for detecting outliers in a Gaussian-distributed dataset.

Isolated Forest Filtering: Returns the anomaly score of each sample using the Isolation Forest algorithm. Isolation Forest isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of that feature.
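To get a feel for how two of these filters behave before adding them to a pipeline, the sketch below runs the Local Outlier Factor and Isolation Forest algorithms on synthetic feature vectors. It uses scikit-learn as a stand-in and is not the Piccolo AI API. The data, the parameter values, and the choice to keep only vectors accepted by both detectors are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic feature vectors: 200 points in one cluster plus one far-away point.
rng = np.random.default_rng(0)
features = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
features = np.vstack([features, [[12.0, 12.0, 12.0]]])  # index 200 is the planted outlier

# Local Outlier Factor: compares each point's local density with that of its neighbors.
lof = LocalOutlierFactor(n_neighbors=20)
lof_labels = lof.fit_predict(features)  # -1 = outlier, 1 = inlier

# Isolation Forest: points isolated by only a few random splits score as anomalous.
forest = IsolationForest(n_estimators=100, random_state=0)
forest_labels = forest.fit_predict(features)
forest_scores = forest.score_samples(features)  # lower = more anomalous

# Keep only the vectors that both detectors accept.
keep = (lof_labels == 1) & (forest_labels == 1)
filtered = features[keep]
print(features.shape, "->", filtered.shape)
```

Combining detectors this way is conservative: a vector is dropped if either method flags it. Depending on how much data you can afford to lose, a single filter with a tuned threshold may be the better choice.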