- True Positives (TP): The number of actual anomalies correctly identified.
- False Positives (FP): The number of normal samples incorrectly classified as anomalies.
- True Negatives (TN): The number of normal samples correctly identified.
- False Negatives (FN): The number of actual anomalies incorrectly classified as normal.
How to interpret Anomaly Scores for One Class Support Vector Machines
Accepted Answer
Comments: 2
Hi @NCA,
In anomaly detection, whether a sample counts as an anomaly or as normal depends on the threshold applied to the anomaly scores. You can reverse the labeling of anomalies and normal samples, as long as you stay consistent. If your model, such as OCSVM, assigns negative scores to anomalies, make sure your ground truth uses the matching convention.
As for the groundTruth variable, you need one label per anomaly score. For your 31 test samples, create a binary array in which each entry records whether the sample is an anomaly (1) or normal (0). With that in place you can compute True Positives, False Positives, and the other counts for your model. Here is a brief snippet showing how you might set up groundTruth:
% Example ground truth for 31 samples (1 = anomaly, 0 = normal)
groundTruth = [0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0];
The snippet above gives you the labels you need to evaluate your anomaly detection model. I hope this answers your question: "Secondly I am assuming you created "groundTruth" so in my case I need to create a file termed "groundTruth" with a value of 0 or 1 against each of my "anomalyScores" for my 31 test samples?" Please let us know if you have any further questions.
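To make the link between groundTruth and the four counts concrete, here is a minimal sketch. It assumes the convention discussed above (negative score means anomaly); the six scores and labels are made-up illustrative values, not output from a real model.

```matlab
% Hypothetical anomaly scores for 6 samples (illustrative values only)
anomalyScores = [0.8; -0.3; 0.5; -1.2; 0.1; -0.4];
% Hypothetical ground truth: 1 = anomaly, 0 = normal
groundTruth = [0; 1; 0; 1; 1; 0];

% Assumed convention: a negative score flags the sample as an anomaly
predictedAnomaly = anomalyScores < 0;
isAnomaly = (groundTruth == 1);

TP = sum(predictedAnomaly & isAnomaly);    % anomalies flagged as anomalies
FP = sum(predictedAnomaly & ~isAnomaly);   % normal samples flagged as anomalies
TN = sum(~predictedAnomaly & ~isAnomaly);  % normal samples left unflagged
FN = sum(~predictedAnomaly & isAnomaly);   % anomalies that were missed

fprintf('TP = %d, FP = %d, TN = %d, FN = %d\n', TP, FP, TN, FN);
```

If your model uses the opposite sign convention, only the line that builds predictedAnomaly needs to change.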
More Answers (1)
Hi @NCA,
To address your query: "I am using One Class Support Vector Machines for anomaly detection. Here is the anomaly scores histogram (attached) for the model trained with 274 samples and tested with 31 samples. How do I determine the true/false prediction rates from the anomaly scores histogram."
Please see my response to your comments below.
First, I generated synthetic data, as shown in the code below. The rng(1) command sets the random number generator seed to 1 so that the results are reproducible. trainData holds 274 two-dimensional training samples drawn from a standard normal distribution (mean = 0, variance = 1). testData holds 31 normal samples followed by 10 anomalies, which are shifted by 5 units in both dimensions (41 test samples in total).
rng(1); % For reproducibility
trainData = randn(274, 2); % 274 samples for training
testData = [randn(31, 2); randn(10, 2) + 5]; % 31 normal samples and 10 anomalies
Next, I created a label vector for the training data in which every entry is 1, meaning that all training samples are treated as normal.
trainLabels = ones(size(trainData, 1), 1);
Then I trained the one-class SVM. The fitcsvm function trains the model on the training data and labels: 'KernelFunction', 'gaussian' selects a Gaussian kernel, 'Standardize', true standardizes each predictor to zero mean and unit variance, and 'ClassNames', [1; -1] defines the class labels for the model.
ocsvmModel = fitcsvm(trainData, trainLabels, 'KernelFunction', 'gaussian', 'Standardize', true, 'ClassNames', [1; -1]);
After that, I used the trained model to predict labels and scores for the test data. The score variable holds the anomaly scores, which indicate how likely each sample is to be an anomaly.
[predictedLabels, score] = predict(ocsvmModel, testData);
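The layout of score depends on how the model was trained. As far as I know, predict returns a single score column for a model trained by one-class learning, with negative values marking outliers, and one column per class otherwise. A minimal sketch, assuming that behaviour, that turns either layout into a "higher = more anomalous" score; the values below are made-up stand-ins for the real predict output, and anomalyScore is just an illustrative variable name.

```matlab
% Stand-in for the score output of predict (hypothetical values).
% Two columns here, assumed to follow the ClassNames order [1; -1].
score = [0.9 -0.9; -0.4 0.4; 0.2 -0.2];

if size(score, 2) == 1
    % Single column: negative means outlier, so flip the sign
    anomalyScore = -score;
else
    % Two columns: take the column of the anomaly class (-1)
    anomalyScore = score(:, 2);
end

% With this orientation, values above 0 are flagged as anomalies
isAnomaly = anomalyScore > 0;
disp(isAnomaly.');
```

Checking size(score) once after the first call to predict tells you which branch applies to your model.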
Then I created a figure with two subplots. The first shows a histogram of the anomaly scores for the test data, and the second shows a histogram of the anomaly scores for the training data.
figure;
% Subplot for test data
subplot(2, 1, 1);
histogram(score(:, 2), 30, 'FaceColor', 'b', 'FaceAlpha', 0.5);
title('Anomaly Scores Histogram - Test Data');
xlabel('Anomaly Score');
ylabel('Frequency');
% Subplot for training data
subplot(2, 1, 2);
[~, trainScores] = predict(ocsvmModel, trainData); % second output holds the scores
trainAnomalyScores = trainScores(:, 2); % same score column as used for the test data
histogram(trainAnomalyScores, 30, 'FaceColor', 'r', 'FaceAlpha', 0.5);
title('Anomaly Scores Histogram - Training Data');
xlabel('Anomaly Score');
ylabel('Frequency');
Next, I determined the true/false prediction rates. The threshold is set to 0, and scores greater than the threshold are classified as anomalies. The trueLabels vector holds the actual labels of the test data.
threshold = 0; % Set threshold for anomaly detection
predictions = score(:, 2) > threshold; % True if score indicates anomaly
% True labels: 1 for normal, -1 for anomaly
trueLabels = [ones(31, 1); -ones(10, 1)];
Then, I implemented code to calculate true positive, false positive, true negative and false negative based on the predictions and true labels.
TP = sum(predictions(trueLabels == -1));  % True Positives
FP = sum(predictions(trueLabels == 1));   % False Positives
TN = sum(~predictions(trueLabels == 1));  % True Negatives
FN = sum(~predictions(trueLabels == -1)); % False Negatives
The true positive rate (sensitivity) and false positive rate are calculated to evaluate the model's performance.
truePositiveRate = TP / (TP + FN);
falsePositiveRate = FP / (FP + TN);
Finally, the true positive and false positive rates are printed to the console, providing insight into the model's effectiveness in detecting anomalies.
fprintf('True Positive Rate: %.2f\n', truePositiveRate);
fprintf('False Positive Rate: %.2f\n', falsePositiveRate);
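Since the rates depend on where the threshold sits on the histogram, it can help to try a few thresholds and compare. The following is a minimal sketch of that idea; the scores, labels and candidate thresholds are made-up illustrative values, and it assumes the same orientation as above (higher score = more anomalous, 1 = normal, -1 = anomaly).

```matlab
% Hypothetical anomaly scores and true labels (illustrative values only)
anomalyScore = [-1.5; -0.8; -0.2; 0.3; 0.9; 1.4];
trueLabels   = [1; 1; -1; 1; -1; -1];   % 1 = normal, -1 = anomaly

thresholds = [-1, 0, 1];                % candidate thresholds to compare
tpr = zeros(size(thresholds));
fpr = zeros(size(thresholds));

for k = 1:numel(thresholds)
    pred = anomalyScore > thresholds(k);  % true if flagged as anomaly
    % Fraction of real anomalies that were flagged
    tpr(k) = sum(pred & (trueLabels == -1)) / sum(trueLabels == -1);
    % Fraction of normal samples that were flagged by mistake
    fpr(k) = sum(pred & (trueLabels == 1)) / sum(trueLabels == 1);
    fprintf('threshold %5.2f: TPR = %.2f, FPR = %.2f\n', thresholds(k), tpr(k), fpr(k));
end
```

Lowering the threshold catches more anomalies at the cost of more false alarms, so the choice depends on which error matters more in your application.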
Please see attached.
Please let me know if this helped resolve your problem. Please let me know if you have any further questions.