Course Overview
This 9-hour course is for users who want to attain operational intelligence level 4 (business insights). It covers exploratory data analysis using statistical tools and custom visualizations.
Prerequisites
To be successful, students should have a solid understanding of the following courses:
- Intro to Splunk (Retired)
- Using Fields (SUF)
- Scheduling Reports & Alerts (SRA)
- Visualizations (SVZ)
- Working with Time (WWT)
- Statistical Processing (SSP)
- Comparing Values (SCV)
- Result Modification (SRM)
- Leveraging Lookups and Subsearches (LLS)
- Correlation Analysis (SCLAS)
- Search Under the Hood (SUH)
- Intro to Knowledge Objects (IKO)
- Creating Field Extractions (CFE)
- Search Optimization (SSO)
Course Objectives
- Analytics Framework
- Exploring and Visualizing Data
- Cleaning and Preprocessing Data
- Numerical and String-Based Clustering
- Data Correlation
- Meta Transactions
- Detecting Anomalies
- Forecasting
Outline: Exploring and Analyzing Data with Splunk (EADS)
Topic 1 – What Is Data Science?
- Define terms related to analytics and data science
- Describe the analytics workflow
- Describe Artificial Intelligence and Machine Learning
- Examine common Machine Learning myths
- Describe Splunk’s Machine Learning tools
Topic 2 – Exploratory Data Analysis
- Use bin and makecontinuous to restructure and visualize data
- Examine field statistics with fieldsummary
- Transform fields with eval and fillnull
- Clean text with the rex and cleantext commands
- Solve Anscombe’s Quartet
- Apply boxplots and 3D scatterplots to visualize data
Topic 3 – Event Clustering
- Take a behavior-based approach to clustering data
- Cluster numerical fields using the kmeans command
- Cluster based on string similarity with the cluster command
- Find patterns in clusters
Topic 4 – Correlations and Transactions
- Define correlation and co-occurrence
- Use SPL correlation commands
- Use the statistical tests from the Machine Learning Toolkit to correlate fields
- Use streamstats and chart commands to correlate data
Topic 5 – Anomaly Detection
- Define statistical outliers
- Use ad hoc methods of numerical anomaly detection
- Find numerical or categorical anomalies with the anomalydetection command
Topic 6 – Forecasting
- Define forecasting use cases
- Use the predict command to forecast future time series values