Different number of classes in training and testing data
Hello Everyone!
I am trying to develop a classifier for different human activities, i.e., standing, walking, sitting, lying in bed, etc.
I have a total of 6 classes in my dataset (all activities concatenated), and I am training multiple classifiers with K-fold cross-validation during training.
I split my data into a train and a test set, train my classifier, and then test it; the accuracies are pretty good (almost 95%). (I won't go into the details of segmentation, feature extraction, etc.)
The problem that I am facing right now is testing my classifier with completely new data. The following are some notable things about the new test data:
1) The new test data has 5 classes instead of 6.
2) The new test data contains continuous activities, e.g., a person performing all 5 activities in quick succession without any break, whereas for the training data each activity was recorded separately and I then concatenated all activities.
3) The accuracy I am getting is around 70% (pretty bad), and most of the misclassifications land in the class that is not present in the new test set; e.g., class 6 is not in the new test set, but when I check the predictions, the classifier assigns a major portion of the activities to class 6.
I cannot reduce the number of classes in my training set because, when I implement this in real time, the subjects will perform all 6 activities, of course not all at once but one by one; e.g., in real time the subject will perform activity 1 (the classifier should classify it as 1), then activity 2 (the classifier should classify it as 2), and so on.
Can you help me understand which parameters I should look at to improve the accuracy of my classifier on the new test data? 70% is not acceptable to me. Any suggestions about mistakes you can see here, and pointers toward the right path, would be appreciated.
Thank you so much :)
Answers (1)
Umar on 5 Jul 2024
        Hi Jamil,
When dealing with new test data that differs significantly from the training set, several factors need to be considered to enhance the classifier's performance. Here are some key parameters to observe and suggestions to improve the accuracy of the classifier for the new test data:
1. Feature Representation
Ensure that the features used for training the classifier are robust and representative of the activities. Feature extraction plays a crucial role in classification accuracy. Consider exploring different feature sets or enhancing the existing features to capture the nuances of continuous activities.
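As a concrete example, if your raw signals are segmented with a sliding window, a minimal sketch of per-window statistics could look like the code below. Names such as accel, winLen, and stepLen are placeholders for your own data and settings, and randn merely stands in for a recorded 3-axis signal.

% Minimal sketch of sliding-window feature extraction.
accel   = randn(6000,3);        % stand-in for an N-by-3 accelerometer signal
winLen  = 200;                  % samples per window
stepLen = 100;                  % 50% overlap between consecutive windows
nWin    = floor((size(accel,1) - winLen)/stepLen) + 1;

features = zeros(nWin, 9);      % per-axis mean, std, and range
for k = 1:nWin
    idx = (k-1)*stepLen + (1:winLen);
    win = accel(idx,:);
    features(k,:) = [mean(win), std(win), max(win) - min(win)];
end

Using the same window length and step for both the concatenated training recordings and the continuous test recordings helps keep the feature distributions comparable.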
2. Class Imbalance
Address the class imbalance between the training and test data. Since the new test data has fewer classes, the classifier may struggle with imbalanced class distributions. Techniques like oversampling, undersampling, or using class weights during training can help mitigate this issue.
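One way to do this in MATLAB, assuming XTrain and YTrain hold your training features and labels (placeholder names here), is to pass observation weights that are inversely proportional to class frequency; most fitc* training functions accept a 'Weights' argument. A rough sketch:

% Placeholder training data: 300 windows, 9 features, 6 classes.
XTrain = randn(300,9);
YTrain = randi(6,300,1);

% Weight each observation inversely to its class frequency.
yCat    = categorical(YTrain);
classes = categories(yCat);
counts  = countcats(yCat);
w       = zeros(numel(yCat),1);
for c = 1:numel(classes)
    w(yCat == classes{c}) = numel(yCat) / (numel(classes)*counts(c));
end

mdl = fitcecoc(XTrain, YTrain, 'Weights', w);   % weighted multiclass model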
3. Model Generalization
Check the generalization capability of your classifier. If the model is overfitting to the training data, it may not perform well on unseen data. Regularization techniques, such as dropout or L2 regularization, can prevent overfitting and improve generalization.
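A quick way to gauge this is to compare the resubstitution error with a k-fold cross-validation estimate; a large gap between the two usually indicates overfitting. A minimal sketch, again with placeholder data:

% Placeholder training data.
XTrain = randn(300,9);
YTrain = randi(6,300,1);

mdl      = fitcecoc(XTrain, YTrain);
cvMdl    = crossval(mdl, 'KFold', 5);
trainErr = resubLoss(mdl);        % error on the data the model was fit on
cvErr    = kfoldLoss(cvMdl);      % held-out error averaged over 5 folds
fprintf('Resubstitution error: %.3f, 5-fold CV error: %.3f\n', trainErr, cvErr);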
4. Sequence Learning
Given the nature of continuous activities in the new test data, consider incorporating sequence learning models like Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks. These models can capture temporal dependencies in the data and improve classification accuracy for sequential activities.
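With the Deep Learning Toolbox, a sequence-to-sequence LSTM classifier could be set up roughly as below. Everything here is an assumption about your data layout: XSeq would be a cell array of numFeatures-by-T sequences and YSeq the matching 1-by-T categorical label sequences; the dummy data only makes the sketch runnable.

numFeatures = 9;
numClasses  = 6;

% Dummy stand-in data: 20 sequences of 500 time steps each.
XSeq = arrayfun(@(k) randn(numFeatures,500), 1:20, 'UniformOutput', false)';
YSeq = arrayfun(@(k) categorical(randi(numClasses,1,500), 1:numClasses), ...
                1:20, 'UniformOutput', false)';

layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(100, 'OutputMode', 'sequence')   % emit a label at every time step
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', 'MaxEpochs', 30, 'MiniBatchSize', 16, ...
    'Shuffle', 'every-epoch', 'Verbose', false);

net = trainNetwork(XSeq, YSeq, layers, options);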
5. Transfer Learning
Explore transfer learning techniques to leverage knowledge from the existing classifier trained on concatenated activities. Fine-tuning the pre-trained model on the new test data or using it as a feature extractor can help adapt the classifier to the specific characteristics of the continuous activities.
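For classical (non-deep) classifiers there is no direct fine-tuning, but one possible adaptation, shown as a sketch below, is to treat the existing model as a feature extractor: feed its per-class scores into a small second classifier trained on a modest amount of labelled continuous-activity data. The names oldMdl, XCont, YCont, and XTestCont are hypothetical, and the random data only makes the example runnable.

% Stand-in for the classifier trained on concatenated activities.
oldMdl = fitcecoc(randn(300,9), randi(6,300,1));

% Small labelled sample of continuous activities (5 classes here).
XCont = randn(100,9);
YCont = randi(5,100,1);

[~, scores] = predict(oldMdl, XCont);     % per-class scores from the old model
adaptMdl    = fitctree(scores, YCont);    % lightweight model adapted to continuous data

% At run time, chain the two models.
XTestCont      = randn(50,9);
[~, newScores] = predict(oldMdl, XTestCont);
predLabels     = predict(adaptMdl, newScores);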
6. Evaluation Metrics
Apart from accuracy, consider using additional evaluation metrics like precision, recall, and F1-score to gain insights into the classifier's performance across different classes. This can help identify specific areas of improvement and guide adjustments in the classification strategy.
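Per-class precision, recall, and F1 can be derived directly from the confusion matrix. A minimal sketch, assuming YTest and YPred hold the true and predicted labels (the random labels below only stand in for real ones):

% Stand-in labels; replace with your actual test labels and predictions.
YTest = randi(6,200,1);
YPred = randi(6,200,1);

[C, order] = confusionmat(YTest, YPred);   % rows: true class, columns: predicted
precision  = diag(C) ./ sum(C,1)';         % TP over all predicted as that class
recall     = diag(C) ./ sum(C,2);          % TP over all truly in that class
f1         = 2*(precision.*recall) ./ (precision + recall);
table(order, precision, recall, f1)

In your case, the precision of class 6 and the recall of the remaining classes should make the "everything falls into class 6" problem very visible.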
By carefully analyzing these parameters and implementing the suggested strategies, you can enhance the classifier's accuracy for the new test data and address the challenges posed by the differences in class distribution and activity continuity. Continuous refinement and adaptation of the classifier will be key to achieving higher accuracy levels and ensuring reliable performance in real-time scenarios.