App Fingerprinting
- Date: Dec 2017
- Category: Data Science
- Key Tags: SVM, pyshark, Spark
A project to identify an app from its network traffic.
The dataset was collected by running 30,000 apps from 4 different categories. Eight features were extracted from the network packets, and a support vector machine (SVM) with a Radial Basis Function (RBF) kernel was trained on Spark, using 10-fold cross-validation, to identify an app from its network traffic. The model's accuracy is around 88.4%.
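For readers unfamiliar with the RBF kernel, the sketch below shows the similarity function it computes between two feature vectors, K(x, y) = exp(-gamma * ||x - y||^2). The function name `rbf_kernel`, the `gamma` value, and the example vectors are illustrative assumptions and are not taken from the project code.

```python
import math

def rbf_kernel(x, y, gamma=0.125):
    """Return exp(-gamma * ||x - y||^2) for two equal-length vectors.

    gamma=0.125 (1 / 8 features) is an assumed value for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Two made-up 8-feature vectors (placeholder values, not real traffic data).
a = [1.0, 0.0, 2.0, 0.5, 0.0, 1.0, 3.0, 0.0]
b = [1.0, 0.0, 2.0, 0.5, 0.0, 1.0, 3.0, 0.0]
print(rbf_kernel(a, b))  # identical vectors give 1.0
```

The value approaches 0 as two vectors move apart, so the SVM treats apps with similar traffic features as close neighbors.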
Group9_Final_Project
├── README.md                # Current readme file.
├── Group9Presentation.pptx  # PPT for presentation.
├── pcap_parser.py           # module to extract features from a single pcap file
├── data_parser.py           # module to parse all the pcap files and generate input data file
├── training.py              # train model using input data and get accuracy
└── app_data                 # some data for running the code
    ├── communication        # pcap files with label communication
    ├── finance              # pcap files with label finance
    └── social               # pcap files with label social
- Install tshark (https://www.wireshark.org/docs/man-pages/tshark.html)
- Python 3.6.2
Extract features and generate the input file for ML
python3 data_parser.py --input app_data --output output.csv
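As a rough illustration of what this step does, the sketch below walks a directory laid out like `app_data`, treats each subdirectory name as the label, and writes one CSV row per pcap file. It is not the project's `data_parser.py`: `placeholder_features` and `build_dataset` are hypothetical names, the file-size "feature" is a stand-in for the real 8 features (which this README does not list), and the column layout (features first, label last, no header) is an assumption.

```python
import csv
from pathlib import Path

def placeholder_features(pcap_path):
    """Hypothetical stand-in for the real per-file extractor in pcap_parser.py.

    Returns only the file size so the sketch runs without tshark; the
    project's actual 8 features are not described in this README.
    """
    return [pcap_path.stat().st_size]

def build_dataset(input_dir, output_csv, extract=placeholder_features):
    """Write one CSV row per pcap file: feature values followed by the label."""
    rows_written = 0
    with open(output_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Each subdirectory name (e.g. communication, finance, social) is the label.
        for label_dir in sorted(Path(input_dir).iterdir()):
            if not label_dir.is_dir():
                continue
            for pcap_path in sorted(label_dir.glob("*.pcap")):
                writer.writerow(extract(pcap_path) + [label_dir.name])
                rows_written += 1
    return rows_written
```

Calling `build_dataset("app_data", "output.csv")` would mirror the command above; a real extractor can be passed through the `extract` argument.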
Training with SVM
python3 training.py --input output.csv
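The training step evaluates the model with 10-fold cross-validation. The sketch below shows how sample indices can be divided into k train/test folds so that every sample is tested exactly once. The helper name `k_fold_indices` is hypothetical, and the folds here are contiguous and unshuffled for simplicity; the project's actual splitting strategy is not described in this README.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop

folds = list(k_fold_indices(25, k=10))
print(len(folds))   # 10 folds
print(folds[0][1])  # first test fold: [0, 1, 2]
```

The model is trained on each `train` list and scored on the matching `test` list, which gives one accuracy value per fold.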
Round 1 - kernel='rbf', probability=True
Accuracy: 0.93 (+/- 0.04)

Round 2 - kernel='sigmoid', probability=True
Accuracy: 0.74 (+/- 0.10)

Round 3 - kernel='poly', probability=True, degree=3
Accuracy: 0.87 (+/- 0.06)

Round 4 - kernel='poly', probability=True, degree=8
Accuracy: 0.80 (+/- 0.08)

Round 5 - kernel='poly', C=10, probability=True, degree=8
Accuracy: 0.84 (+/- 0.08)
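A line such as `Accuracy: 0.93 (+/- 0.04)` is commonly produced by averaging the per-fold accuracies and reporting twice their standard deviation as the spread. The sketch below assumes that convention; the README does not confirm how `training.py` computes the interval, and the function name `summarize_accuracy` and the scores are made up for illustration.

```python
import statistics

def summarize_accuracy(fold_scores):
    """Format fold accuracies as 'Accuracy: mean (+/- 2 * std)'."""
    mean = statistics.mean(fold_scores)
    # Population standard deviation; doubling it is an assumed convention.
    spread = 2 * statistics.pstdev(fold_scores)
    return "Accuracy: %0.2f (+/- %0.2f)" % (mean, spread)

# Made-up fold accuracies for illustration; not the project's measurements.
example_scores = [0.8, 0.9, 1.0, 0.9, 0.9]
print(summarize_accuracy(example_scores))
```

Under this reading, a larger value in parentheses means the accuracy varied more from fold to fold.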