Review on Environment-Independent Pattern Recognition

*This topic integrates several papers from different conferences, including “Towards Environment Independent Device Free Human Activity Recognition” (MobiCom ’18) and “Learning Sleep Stages from Radio Signals: A Conditional Adversarial Architecture” (ICML ’17), among others.*

Problem formulation

Though strategies like dropout or data augmentation address overfitting to some extent, it still plagues deep neural networks in pattern recognition when the raw data contains too much target-irrelevant information. For Wi-Fi based activity recognition, the problem manifests as performance degradation across environments, because wireless signals carry substantial information about the environment itself (e.g., through penetration and reflection). In other words, an activity recognition model trained on a specific subject in a specific environment will not work well when applied to another subject’s activities recorded in a different environment.

Researchers broadly agree on the direction of a solution: discard the redundant information and keep what is important (a classic feature engineering problem).

A way out

A family of game-theoretic adversarial networks has been proposed. In its simplest form, there are three components: a feature extractor, an activity recognizer, and a domain discriminator. The feature extractor is essentially what I would call a backbone: it extracts a feature tensor directly from the raw data, which usually contains a great deal of extraneous information.
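
To make this concrete, here is a minimal PyTorch sketch of the three components. The layer sizes and names (`FeatureExtractor`, `n_features`, etc.) are my own illustrative assumptions, not the architectures from the papers:

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Backbone: maps raw input (e.g., flattened Wi-Fi CSI) to a compact feature vector."""
    def __init__(self, in_dim, n_features):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, n_features), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class ActivityRecognizer(nn.Module):
    """Task head: predicts the activity label from the extracted features."""
    def __init__(self, n_features, n_activities):
        super().__init__()
        self.head = nn.Linear(n_features, n_activities)

    def forward(self, z):
        return self.head(z)

class DomainDiscriminator(nn.Module):
    """Adversary: tries to predict which environment (domain) the features came from."""
    def __init__(self, n_features, n_domains):
        super().__init__()
        self.head = nn.Linear(n_features, n_domains)

    def forward(self, z):
        return self.head(z)
```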

The methodology of the proposed adversarial network is to weed out environment information. A game is therefore set up in which the competition between the activity recognizer and the domain discriminator ultimately yields a compact feature extractor that keeps only task-relevant information. In [1], Zhao et al. suggest computing losses separately under opposing objectives (predict the domain vs. predict the activity) so that the two heads “compete”; merging them into a final loss value, which requires some additional construction, ultimately makes the feature extractor’s backpropagation “collaborate” with the activity recognizer. Note that balancing parameters are needed to prevent the model from collapsing.
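
Below is a minimal sketch of one training step in the gradient-reversal (DANN) style, which is one common way to realize this “compete, then collaborate” dynamic; it is my simplification, not the exact construction from [1]. The weight `lam` is the kind of balancing parameter that guards against collapse:

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) the gradient on the backward pass."""
    @staticmethod
    def forward(ctx, z, lam):
        ctx.lam = lam
        return z.view_as(z)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def train_step(extractor, recognizer, discriminator, optimizer, x, y_act, y_dom, lam=0.1):
    optimizer.zero_grad()
    z = extractor(x)
    activity_loss = F.cross_entropy(recognizer(z), y_act)
    # The discriminator itself learns to predict the domain, but the reversed
    # gradient teaches the extractor to make that prediction impossible.
    domain_loss = F.cross_entropy(discriminator(GradReverse.apply(z, lam)), y_dom)
    loss = activity_loss + domain_loss  # the merged final loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

With the gradient reversal encoding the opposing objectives, a single optimizer over all three modules suffices; alternating optimizers for the discriminator and the rest of the network are the other common construction.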

Jiang et al. [2] bring in unsupervised learning to handle unlabeled data, a common challenge in activity recognition, and add further constraints (e.g., on prediction confidence) to consolidate performance.
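
As one illustration of a confidence-style constraint, an entropy penalty on the predictions for unlabeled samples pushes the recognizer toward confident outputs; this is a generic sketch of the idea, not necessarily the exact constraint from [2]:

```python
import torch.nn.functional as F

def confidence_penalty(logits):
    """Mean entropy of the softmax predictions; lower entropy means higher confidence."""
    log_p = F.log_softmax(logits, dim=1)
    return -(log_p.exp() * log_p).sum(dim=1).mean()

# Hypothetical usage: mix the supervised loss with the confidence term on unlabeled data.
# total_loss = activity_loss + beta * confidence_penalty(recognizer(extractor(x_unlabeled)))
```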

Discussion

Many other works offer solutions in different settings but follow the same high-level principle, e.g., [3], [4], [5].

Identifying the commonality shared across different domains has long been the goal of traditional feature engineering. While the works above essentially assume that manually discovering or computing a relevant statistical feature vector, as one would for conventional machine learning methods such as SVMs and decision trees, is difficult, there is notable research, albeit usually published quite a while ago, on describing useful features for pattern recognition in other scenarios [6].

Bearing that in mind, concatenating deterministic features onto the feature matrices produced by the backbone could be beneficial, which is in a sense supported by those works, and it offers another way to boost performance besides constraints as in [2].
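
A sketch of that idea: compute a few deterministic statistics from the raw signal and concatenate them with the backbone’s learned features before the classifier head. The specific statistics here (mean, standard deviation, range) are purely illustrative:

```python
import torch

def handcrafted_features(x):
    """Simple deterministic per-sample statistics; illustrative placeholders only."""
    return torch.stack(
        [x.mean(dim=1), x.std(dim=1), x.amax(dim=1) - x.amin(dim=1)], dim=1
    )

def fused_features(extractor, x):
    z = extractor(x)                 # learned features: (batch, n_features)
    h = handcrafted_features(x)      # deterministic features: (batch, 3)
    return torch.cat([z, h], dim=1)  # fused: (batch, n_features + 3)
```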

Thoughts

My personal intuition is that transfer learning could deteriorate severely if the training set is imbalanced and not carefully constructed, since the training set directly shapes the feature extractor. However, Huh et al. [7] claim that “most changes in the choice of pre-training data long thought to be critical, do not significantly affect transfer performance”, so a definitive answer is still missing.

The inexplicability of transfer learning unsettled me until I found that [8] actually answers my question. The researchers put forward a concept called “feature purification”, which describes the process by which the model purifies its hidden weights by removing dense mixtures; in the case above, those mixtures are the abstraction of environment-specific information. Allen-Zhu and Li also theorize about how the dense mixture forms (see Theorem E.1 in the paper), which fundamentally explains why we need the adversarial network.

From a rookie’s perspective, I see these works as important even beyond their current scope, but I think they would contribute more to the community if the development of theory could keep pace with application.

  1. Zhao, Mingmin, Shichao Yue, Dina Katabi, Tommi S. Jaakkola, and Matt T. Bianchi. “Learning sleep stages from radio signals: A conditional adversarial architecture.” In International Conference on Machine Learning, pp. 4100-4109. PMLR, 2017. 

  2. Jiang, Wenjun, Chenglin Miao, Fenglong Ma, Shuochao Yao, Yaqing Wang, Ye Yuan, Hongfei Xue et al. “Towards environment independent device free human activity recognition.” In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, pp. 289-304. 2018.

  3. Wang, Yaqing, Fenglong Ma, Zhiwei Jin, Ye Yuan, Guangxu Xun, Kishlay Jha, Lu Su, and Jing Gao. “EANN: Event adversarial neural networks for multi-modal fake news detection.” In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 849-857. 2018.

  4. Wang, Jindong, Yiqiang Chen, Lisha Hu, Xiaohui Peng, and Philip S. Yu. “Stratified transfer learning for cross-domain activity recognition.” In 2018 IEEE International Conference on Pervasive Computing and Communications (PerCom), pp. 1-10. IEEE, 2018.

  5. Wang, Jindong, Vincent W. Zheng, Yiqiang Chen, and Meiyu Huang. “Deep transfer learning for cross-domain activity recognition.” In Proceedings of the 3rd International Conference on Crowd Science and Engineering, pp. 1-8. 2018.

  6. Guyon, Isabelle, and André Elisseeff. “An introduction to variable and feature selection.” Journal of Machine Learning Research 3, no. Mar (2003): 1157-1182.

  7. Huh, Minyoung, Pulkit Agrawal, and Alexei A. Efros. “What makes ImageNet good for transfer learning?.” arXiv preprint arXiv:1608.08614 (2016). 

  8. Allen-Zhu, Zeyuan, and Yuanzhi Li. “Feature purification: How adversarial training performs robust deep learning.” arXiv preprint arXiv:2005.10190 (2020).