Exploring Data Using Graphics and Visualization


 
In this assignment, you will use the churn data: churn_data.txt
Read data into a data frame using the function read.csv() with the following options:
header=T, stringsAsFactors=F
Assume that you saved the file churn_data.txt in the C:/Datasets folder. Then you can read the file into a data frame as follows:
file <- "C:/Datasets/churn_data.txt"
churnData <- read.csv(file, stringsAsFactors = FALSE, header = TRUE)
A) Print the names of the columns.
Hint: colnames() function.
B) Print the number of rows and columns
Hint: dim()
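A minimal sketch for A) and B). The data frame below is a few hypothetical toy rows standing in for churn_data.txt, so the block runs on its own; with the real file, use the read.csv() call above instead.

```r
# Hypothetical toy rows standing in for churn_data.txt (replace with read.csv(...))
churnData <- data.frame(
  State = c("KS", "OH", "NJ"),
  Area.Code = c(415, 415, 408),
  Churn = c("False", "False", "True"),
  stringsAsFactors = FALSE
)

# A) column names
print(colnames(churnData))

# B) dimensions, returned as c(number of rows, number of columns)
print(dim(churnData))
```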
C) Count the number of records (customers) per state.
Hint: table() function.
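A sketch of C) on hypothetical toy rows: table() returns one count per distinct value of State, which is the number of rows for that state.

```r
# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(State = c("KS", "OH", "KS", "NJ", "KS"), stringsAsFactors = FALSE)

# Number of rows per distinct State value
stateCounts <- table(churnData$State)
print(stateCounts)

# Optional: largest counts first
print(sort(stateCounts, decreasing = TRUE))
```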
D) Find the mean, median, standard deviation, and variance of the night charges (the Night.Charge column).
The R functions to be used are mean(), median(), sd(), var().
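A sketch of D) with hypothetical toy values chosen so the results are easy to check by hand. Note that sd() and var() use the sample (n - 1) denominator; if the real column had missing values, each function would need na.rm = TRUE.

```r
# Hypothetical toy values standing in for churnData$Night.Charge
churnData <- data.frame(Night.Charge = c(10, 12, 8, 9, 11))

x <- churnData$Night.Charge
nightMean   <- mean(x)
nightMedian <- median(x)
nightSd     <- sd(x)   # sample standard deviation (n - 1 denominator)
nightVar    <- var(x)  # sample variance, equal to nightSd^2

cat("mean:", nightMean, " median:", nightMedian,
    " sd:", nightSd, " var:", nightVar, "\n")
```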
E) Find the maximum and minimum values of international charges (Intl.Charge), customer service calls (CustServ.Calls), and day charges (Day.Charge).
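A sketch of E) on hypothetical toy values. range() returns c(min, max), so applying it to each of the three columns with sapply() gives all six numbers in one small table; calling min() and max() on each column separately gives the same values.

```r
# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(
  Intl.Charge = c(2.5, 3.0, 1.0),
  CustServ.Calls = c(1, 4, 0),
  Day.Charge = c(30.5, 45.0, 12.25)
)

cols <- c("Intl.Charge", "CustServ.Calls", "Day.Charge")
# One column per feature; row 1 is the minimum, row 2 the maximum
extremes <- sapply(churnData[cols], range)
rownames(extremes) <- c("min", "max")
print(extremes)
```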
F) Use summary() function to print information about the distribution of the following features:
Eve.Charge, Night.Mins, Night.Calls, Night.Charge, Intl.Mins, Intl.Calls
What are the min and max values printed by the summary() function for these features?
Check textbook page 34 for a sample.
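A sketch of F) on hypothetical toy rows. summary() on a data frame prints the minimum, quartiles, mean, and maximum of every column; on a single numeric column it returns a named vector whose "Min." and "Max." entries answer the second part of the question.

```r
# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(
  Eve.Charge = c(16.5, 10.25, 8.0),
  Night.Mins = c(244.5, 254.0, 162.5),
  Night.Calls = c(91L, 103L, 104L),
  Night.Charge = c(11.0, 11.5, 7.25),
  Intl.Mins = c(10.0, 13.5, 12.0),
  Intl.Calls = c(3L, 3L, 5L)
)

cols <- c("Eve.Charge", "Night.Mins", "Night.Calls",
          "Night.Charge", "Intl.Mins", "Intl.Calls")
print(summary(churnData[cols]))

# Pulling the min and max out of the summary of one column
s <- summary(churnData$Night.Calls)
print(s[c("Min.", "Max.")])
```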
G) Use unique() function to print the distinct values of the following columns:
State, Area.Code, and Churn.
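A sketch of G) on hypothetical toy rows. Printing unique() of Churn is also a quick way to see exactly how the churn flag was stored in your copy of the file, which matters for the subsets in H) to L).

```r
# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(
  State = c("KS", "OH", "KS"),
  Area.Code = c(415, 415, 408),
  Churn = c("False", "False", "True"),
  stringsAsFactors = FALSE
)

# Distinct values of each column, in order of first appearance
for (col in c("State", "Area.Code", "Churn")) {
  cat(col, ":", unique(churnData[[col]]), "\n")
}
```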
H) Extract the subset of data for the churned customers (i.e., Churn=True). How many rows are in the subset?
Hint: Use subset() function. Check lecture notes and textbook for samples.
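A sketch of H) on hypothetical toy rows. One caveat, treated here as an assumption about your file: read.csv() turns a column that contains only True/False-style strings into a logical column, while values with any extra characters (for example a trailing period) stay as text. The grepl() test below matches TRUE, "True", and "True." alike; if unique(churnData$Churn) shows a logical column, subset(churnData, Churn) is enough.

```r
# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(
  State = c("KS", "OH", "NJ", "CA"),
  Churn = c("False", "True", "False", "True"),
  stringsAsFactors = FALSE
)

# Rows whose churn flag starts with "true", ignoring case
churned <- subset(churnData, grepl("^true", Churn, ignore.case = TRUE))
print(nrow(churned))
```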
I) Extract the subset of data for customers who made more than 3 customer service calls (CustServ.Calls). How many rows are in the subset?
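A sketch of I) on hypothetical toy values. "More than 3" is a strict comparison, so customers with exactly 3 calls are excluded.

```r
# Hypothetical toy values standing in for churnData$CustServ.Calls
churnData <- data.frame(CustServ.Calls = c(1, 4, 0, 5, 3))

manyCalls <- subset(churnData, CustServ.Calls > 3)  # strictly greater than 3
print(nrow(manyCalls))
```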
J) Extract the subset of churned customers with no international plan (Int.l.Plan) and no voice mail plan (VMail.Plan). How many rows are in the subset?
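A sketch of J) on hypothetical toy rows, combining three conditions with &. It assumes the plan columns hold "yes"/"no" as the variable list describes; tolower() guards against capitalization differences, and the churn test is the same grepl() check used for H).

```r
# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(
  Int.l.Plan = c("no", "no", "yes", "no"),
  VMail.Plan = c("no", "yes", "no", "no"),
  Churn = c("True", "True", "True", "False"),
  stringsAsFactors = FALSE
)

subsetJ <- subset(churnData,
                  grepl("^true", Churn, ignore.case = TRUE) &  # churned
                    tolower(Int.l.Plan) == "no" &              # no international plan
                    tolower(VMail.Plan) == "no")               # no voice mail plan
print(nrow(subsetJ))
```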
K) Extract the data for customers from California (i.e., State is CA) who did not churn but made more than 2 customer service calls.
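A sketch of K) on hypothetical toy rows. "Did not churn" is written as the negation of the grepl() churn test from H), under the same assumption about how Churn was stored.

```r
# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(
  State = c("CA", "CA", "CA", "NY"),
  Churn = c("False", "True", "False", "False"),
  CustServ.Calls = c(3, 4, 1, 5),
  stringsAsFactors = FALSE
)

caStayers <- subset(churnData,
                    State == "CA" &
                      !grepl("^true", Churn, ignore.case = TRUE) &  # did not churn
                      CustServ.Calls > 2)
print(caStayers)
```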
L) What is the mean number of customer service calls for customers who did not churn (i.e., Churn=False)?
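A sketch of L) on hypothetical toy rows: build a logical index for the customers who stayed and average CustServ.Calls over those rows. tapply() is shown as an alternative that returns the mean for each Churn value at once.

```r
# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(
  Churn = c("False", "True", "False", "False"),
  CustServ.Calls = c(1, 5, 2, 3),
  stringsAsFactors = FALSE
)

stayed <- !grepl("^true", churnData$Churn, ignore.case = TRUE)
meanCallsStayed <- mean(churnData$CustServ.Calls[stayed])
print(meanCallsStayed)

# Mean for every Churn value in one call
print(tapply(churnData$CustServ.Calls, churnData$Churn, mean))
```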
 
In this assignment, we will explore the churn data using graphics and visualization. One of the primary reasons for performing exploratory data analysis (EDA) is to investigate the variables: examine the distributions of the categorical variables, look at the histograms of the numeric variables, and explore the relationships among sets of variables.
Although we are not going to develop any models for this project, in a real-world project our task is to identify patterns in the data that will help to reduce the proportion of churners.
We will use the same data set as in the Week 2 assignment:
Data file: churn_data.txt
All graphics in this assignment must be plotted with the ggplot2 library, so you need to install it first:
install.packages("ggplot2")
Before using any functions from a library, you need to load it into your R session with
library(ggplot2)
Here is how you can read data into a data frame named churnData:
churnData <- read.csv(filePath, stringsAsFactors = FALSE, header = TRUE)
where filePath is the location of the churn_data.txt file. For example, if you saved the file in C:/tmp, then filePath should be "C:/tmp/churn_data.txt".
The variables in the file churn_data.txt are
State: Categorical, for the 50 states and the District of Columbia.
Account length: Integer-valued, how long account has been active.
Area code: Categorical
Phone number: Essentially a surrogate for customer ID.
International plan: Dichotomous categorical, yes or no.
Voice mail plan: Dichotomous categorical, yes or no.
Number of voice mail messages: Integer-valued.
Total day minutes: Continuous, minutes customer used service during the day.
Total day calls: Integer-valued.
Total day charge: Continuous, perhaps based on above two variables.
Total eve minutes: Continuous, minutes customer used service during the evening.
Total eve calls: Integer-valued.
Total eve charge: Continuous, perhaps based on above two variables.
Total night minutes: Continuous, minutes customer used service during the night.
Total night calls: Integer-valued.
Total night charge: Continuous, perhaps based on above two variables.
Total international minutes: Continuous, minutes customer used service to make international calls.
Total international calls: Integer-valued.
Total international charge: Continuous, perhaps based on above two variables.
Number of calls to customer service: Integer-valued.
Churn: Target. Indicator of whether the customer has left the company (true or false).
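Some of the categorical variables above are stored as numbers. A short sketch, assuming Area.Code is read as plain numbers (the toy rows below are hypothetical): converting it with factor() makes ggplot2 treat it as a category, giving one evenly spaced bar per area code instead of a continuous axis.

```r
# Hypothetical toy rows standing in for churn_data.txt (replace with read.csv(...))
churnData <- data.frame(Area.Code = c(415, 415, 408, 510))
str(churnData)  # Area.Code is numeric here, i.e. treated as a measurement

# Treat the area code as a category rather than a number
churnData$Area.Code <- factor(churnData$Area.Code)
print(levels(churnData$Area.Code))
```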
Part 1. Bar Charts
A bar chart is a histogram for discrete data: it records the frequency of every value of a categorical variable.
1.) Vertical Bar Charts
Plot the bar charts of State, Area.Code, Int.l.Plan, VMail.Plan, CustServ.Calls, and Churn.
Use the theme() function to change the text size, location, color, etc. (An example is given in the textbook on page 61.)
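A minimal ggplot2 sketch of a vertical bar chart, using hypothetical toy rows for State and the same axis.title.x and axis.text.x settings described for the State chart. geom_bar() counts the rows for each value on its own; for the other variables, change x = State to the column you want, wrapping numeric-coded categories such as Area.Code or CustServ.Calls in factor().

```r
library(ggplot2)

# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(State = c("KS", "OH", "KS", "NJ", "OH", "KS"),
                        stringsAsFactors = FALSE)

p <- ggplot(churnData, aes(x = State)) +
  geom_bar(fill = "steelblue") +  # one bar per State, height = row count
  labs(x = "State", y = "Count") +
  theme(
    axis.title.x = element_text(face = "bold", colour = "#990000", size = 12),
    axis.text.x  = element_text(face = "bold", angle = 90, vjust = 0.5, size = 11)
  )
print(p)
```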
The following is the bar chart for State. In it, the x-axis labels are bold and rotated 90 degrees, which can be set in the theme() function using
axis.text.x = element_text(face="bold", angle=90, vjust=0.5, size=11).
Similarly, the parameter colour="#990000" sets the color of the x-axis title. So the following options for axis.title.x and axis.text.x in the theme() function display the title and text of the x-axis as shown in the figure below:
axis.title.x = element_text(face="bold", colour="#990000", size=12), axis.text.x = element_text(face="bold", angle=90, vjust=0.5, size=11)
2.) Horizontal Bar Charts
Create the horizontal bar chart of CustServ.Calls.
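A minimal sketch with hypothetical toy values: draw an ordinary bar chart of factor(CustServ.Calls) and add coord_flip() so the bars run horizontally. In recent ggplot2 versions, mapping the variable to y instead of x is another way to get horizontal bars.

```r
library(ggplot2)

# Hypothetical toy values standing in for churnData$CustServ.Calls
churnData <- data.frame(CustServ.Calls = c(1, 1, 0, 2, 3, 1, 4, 0))

p <- ggplot(churnData, aes(x = factor(CustServ.Calls))) +
  geom_bar() +
  coord_flip() +  # swap axes so the bars are horizontal
  labs(x = "CustServ.Calls", y = "Count")
print(p)
```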
Hint: Textbook page 49.
3.) Horizontal Bar Charts with Sorted Categories
Create a horizontal bar chart of CustServ.Calls with the bars sorted by their counts.
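One way to sort the bars, sketched on hypothetical toy values (the textbook may use a different approach): count each category with table(), then set the factor levels in increasing order of count. Because coord_flip() draws the first level at the bottom, the most frequent category ends up at the top.

```r
library(ggplot2)

# Hypothetical toy values standing in for churnData$CustServ.Calls
churnData <- data.frame(CustServ.Calls = c(1, 1, 0, 2, 3, 1, 4, 0, 2, 2, 2))

# Levels ordered from least to most frequent
counts <- table(churnData$CustServ.Calls)
churnData$CallsSorted <- factor(churnData$CustServ.Calls,
                                levels = names(sort(counts)))

p <- ggplot(churnData, aes(x = CallsSorted)) +
  geom_bar() +
  coord_flip() +
  labs(x = "CustServ.Calls (sorted by count)", y = "Count")
print(p)
```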
Hint: Textbook pages 50-51.
Part 2. Histograms and Density Plots
The histogram and the density plot are two visualizations that help you quickly examine the distribution of a numerical variable.
A basic histogram bins a variable into fixed-width buckets and returns the number of data points that fall into each bucket. You can think of a density plot as a “continuous histogram” of a variable, except that the area under the density plot is equal to 1.
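To make these two definitions concrete, here is a small numeric check in base R (hist() and density(), used only for their numbers, not for plotting). The sample is simulated exponential data, a hypothetical stand-in rather than a churn column: the bucket counts add up to the number of points, and the trapezoid-rule area under the density estimate comes out close to 1.

```r
set.seed(1)
x <- rexp(500)  # hypothetical skewed sample, not taken from the churn data

# Histogram: fixed-width buckets of 0.5 covering the whole sample
h <- hist(x, breaks = seq(0, ceiling(max(x)), by = 0.5), plot = FALSE)
print(sum(h$counts))  # every point lands in exactly one bucket

# Density estimate and its area, by the trapezoid rule on the returned grid
d <- density(x)
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)
```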
1.) Plot the histograms of Account.Length, VMail.Message, Day.Mins, and Intl.Calls.
Based on the histograms, comment on whether any of them have outliers, are close to a normal distribution, are multimodal, or are skewed.
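A minimal ggplot2 histogram sketch, using simulated values as a hypothetical stand-in for Account.Length. binwidth sets the bucket width; it is worth trying a few widths, since the apparent shape (skew, extra modes) can change with it.

```r
library(ggplot2)

set.seed(42)
# Hypothetical simulated values standing in for churnData$Account.Length
churnData <- data.frame(Account.Length = round(runif(200, min = 1, max = 240)))

p <- ggplot(churnData, aes(x = Account.Length)) +
  geom_histogram(binwidth = 10, fill = "grey60", colour = "black")
print(p)
```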
The histogram for Account.Length is shown below:
2.) Plot the density plots of Account.Length, VMail.Message, Day.Mins, and Intl.Calls.
Based on the density plots, comment on whether any of them have outliers, are close to a normal distribution, are multimodal, or are skewed.
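The density version only swaps the geom. A sketch with simulated values as a hypothetical stand-in for Day.Mins:

```r
library(ggplot2)

set.seed(7)
# Hypothetical simulated values standing in for churnData$Day.Mins
churnData <- data.frame(Day.Mins = rnorm(300, mean = 180, sd = 50))

p <- ggplot(churnData, aes(x = Day.Mins)) +
  geom_density()  # smoothed estimate of the distribution
print(p)
```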
As a sample, the density plot for VMail.Message is shown below:
Part 3. Scatter Plots
In addition to examining variables in isolation, you’ll often want to look at the relationship between two variables.
Part A)
Plot the scatter plots for the pairs Eve.Mins vs. Day.Mins, Day.Mins vs. Day.Charge, Eve.Mins vs. Eve.Charge, and Day.Mins vs. Day.Calls.
Based on the plots, is there a relationship between the features in each pair?
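A minimal scatter-plot sketch on simulated data. The charge column is built as 0.2 times the minutes purely for illustration; that rate is a hypothetical choice, not a value taken from the churn data. cor() adds a one-number check of how linear a relationship is: a value near 1 or -1 means the points lie close to a straight line, a value near 0 means no linear trend.

```r
library(ggplot2)

set.seed(3)
# Hypothetical stand-in: charge = 0.2 * minutes is an illustrative rate only
mins <- runif(100, min = 0, max = 350)
churnData <- data.frame(Day.Mins = mins, Day.Charge = 0.2 * mins)

p <- ggplot(churnData, aes(x = Day.Mins, y = Day.Charge)) +
  geom_point(alpha = 0.5)
print(p)

# Pearson correlation of the two columns
r <- cor(churnData$Day.Mins, churnData$Day.Charge)
print(r)
```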
The scatter plot of Eve.Mins vs. Day.Mins is given below:
Part B)
For the scatter plots in Part A, add color to distinguish churn and no-churn data points. Simply add aes(color=Churn) to the geom_point() function as shown below:
geom_point(aes(color=Churn))
Part 4. Box Plots
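A box-and-whiskers plot describes the distribution of a continuous variable by plotting its five-number summary: the minimum, lower quartile (25th percentile), median (50th percentile), upper quartile (75th percentile), and maximum. In ggplot2's geom_boxplot(), however, the whiskers stop at the most extreme data points within 1.5 times the interquartile range (IQR) of the box; points beyond that are drawn individually as outliers.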
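A short base-R sketch of the 1.5 × IQR rule on a hypothetical vector with one extreme value. It uses quantile() with its default method (type 7), which should match how ggplot2's stat_boxplot() computes the box; the whiskers then end at the most extreme points inside the fences.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)  # hypothetical values, one extreme point

q <- quantile(x, c(0.25, 0.5, 0.75))  # lower quartile, median, upper quartile
iqr <- q[[3]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[3]] + 1.5 * iqr

outliers <- x[x < lower | x > upper]               # drawn as separate points
whiskers <- range(x[x >= lower & x <= upper])     # where the whiskers end
print(q)
print(outliers)
print(whiskers)
```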
Plot the box plots of CustServ.Calls, Night.Calls, and Intl.Charge by Churn.
Which of the features have outliers? (Can you spot them in the box plot?)
What is the median of Night.Calls for customers that did not churn? (from the box plot)
The following is the box plot of CustServ.Calls.
Hint: You can find detailed information and samples of box plots at
https://ggplot2.tidyverse.org/reference/geom_boxplot.html
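A minimal sketch of one box plot by Churn, using hypothetical toy values. ggplot_build() exposes the statistics ggplot2 actually drew (the middle column holds the medians and outliers lists the flagged points), which is one way to confirm a median you read off the plot. A logical Churn column also works here, since ggplot2 treats logicals as discrete.

```r
library(ggplot2)

# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(
  Churn = rep(c("False", "True"), each = 6),
  CustServ.Calls = c(0, 1, 1, 2, 1, 7, 1, 4, 5, 2, 4, 3),
  stringsAsFactors = FALSE
)

p <- ggplot(churnData, aes(x = Churn, y = CustServ.Calls)) +
  geom_boxplot(outlier.colour = "red")  # outliers drawn as red points
print(p)

# Computed box statistics, one row per Churn value
bp <- ggplot_build(p)$data[[1]]
print(bp[, c("lower", "middle", "upper")])
```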
Part 5. Dodged and Stacked Bar Charts
A) Display a dodged bar chart of Int.l.Plan by Churn.
Hint: Textbook pages 60-61.
B) Display a stacked bar chart of CustServ.Calls by Churn.
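A sketch of both charts on hypothetical toy rows. Mapping Churn to fill splits each bar by churn status; position = "dodge" places the pieces side by side, while position = "stack" (geom_bar()'s default) piles them into one bar. position = "fill" is a related option that rescales each bar to proportions.

```r
library(ggplot2)

# Hypothetical toy rows standing in for churn_data.txt
churnData <- data.frame(
  Int.l.Plan = c("no", "no", "yes", "no", "yes", "no"),
  CustServ.Calls = c(1, 4, 1, 0, 4, 1),
  Churn = c("False", "True", "True", "False", "False", "False"),
  stringsAsFactors = FALSE
)

# A) Dodged: churn and no-churn bars side by side for each plan value
dodged <- ggplot(churnData, aes(x = Int.l.Plan, fill = Churn)) +
  geom_bar(position = "dodge")

# B) Stacked: churn counts piled into one bar per number of calls
stacked <- ggplot(churnData, aes(x = factor(CustServ.Calls), fill = Churn)) +
  geom_bar(position = "stack") +
  labs(x = "CustServ.Calls")

print(dodged)
print(stacked)
```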