Unit 7
Discussion 1
Big data analytics can help solve some very hard problems. One example is detecting network traffic anomalies caused by diverse machine-generated traffic attacks. These are known as hit inflation attacks, which “refer to the fraudulent activities of generating charges for online advertisers without a real interest in the product advertised”. Such attacks can be detected by looking for anomalous deviations from the expected Internet Protocol (IP) size distribution, where the IP size is defined as “the number of users sharing the same source IP” (Sakr & Gaber, 2014). The ability to detect hit inflation attacks is critical to the well-being of online advertising because it helps ensure the healthy operation of many popular public Web-based services that are used daily, such as search engines, e-mail, maps, and other Web-based applications. However, the network traffic data is itself a large-scale data set. Processing such a data set efficiently to discover the IP size distributions for all publishers’ Web sites, and thereby detect network traffic anomalies, is a very challenging task.
Complete the reading assignment, and search the Library and Internet to find and study more references that discuss detecting network traffic anomalies based on the IP size distribution. Based on the results of your research, discuss the following concepts:
· Identify 1 method of detecting network traffic anomalies based on the IP size distribution.
· What are the design principles of the method?
· How does the method address the performance issue of processing such large-scale network traffic data?
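As a starting point for this discussion, the idea of comparing an observed IP size distribution with an expected one can be sketched in a few lines of Python. This is only a minimal illustration, not the method from the reading: the click records, the `EXPECTED` baseline, the use of total variation distance, and the 0.5 threshold are all assumptions made for the example. A real system would also have to estimate IP sizes and run the counting step on a distributed platform to cope with the data volume.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (publisher, source IP, user id).
CLICKS = [
    ("news", "10.0.0.1", "u1"),
    ("news", "10.0.0.2", "u2"),
    ("news", "10.0.0.3", "u3"),
    ("news", "10.0.0.3", "u4"),
    ("spam", "10.0.0.9", "b1"),
    ("spam", "10.0.0.9", "b2"),
    ("spam", "10.0.0.9", "b3"),
    ("spam", "10.0.0.9", "b4"),
]

# Assumed baseline: half of legitimate clicks come from IPs of size 1, half from size 2.
EXPECTED = {1: 0.5, 2: 0.5}


def ip_sizes(clicks):
    """Map each source IP to its IP size: the number of distinct users seen on it."""
    users_per_ip = defaultdict(set)
    for _publisher, ip, user in clicks:
        users_per_ip[ip].add(user)
    return {ip: len(users) for ip, users in users_per_ip.items()}


def size_distribution(clicks, sizes, publisher):
    """Fraction of a publisher's clicks that come from IPs of each size."""
    counts = Counter(sizes[ip] for pub, ip, _user in clicks if pub == publisher)
    total = sum(counts.values())
    return {size: n / total for size, n in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def is_anomalous(observed, expected, threshold=0.5):
    """Flag a publisher whose distribution deviates from the baseline by more than the threshold."""
    return total_variation(observed, expected) > threshold


sizes = ip_sizes(CLICKS)
for publisher in ("news", "spam"):
    observed = size_distribution(CLICKS, sizes, publisher)
    print(publisher, observed, is_anomalous(observed, EXPECTED))
```

In this toy data, the "news" publisher matches the assumed baseline, while every "spam" click comes from a single IP shared by four users, so only "spam" is flagged.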
Discussion 2
Unlike conventional distributed systems, such as supercomputer-based client-server systems or small-scale cluster systems, a cloud computing system has a unique network performance characteristic. The network bandwidth between different pairs of computers in the cloud can vary significantly; this is called “the bandwidth unevenness among different machine pairs” (Sakr & Gaber, 2014). When a very large-scale data set, such as a social network, Web graph, or information network (known as a large-scale graph data set), needs to be partitioned across many machines in a cloud computing system before it can be processed, the network performance problem caused by this bandwidth unevenness must be seriously considered and addressed. It affects the overall data processing performance because partitioning a very large-scale data set generates a very large amount of network traffic and involves a very large number of machines. Network performance is therefore a critical parameter in the design of any cloud computing-based method for partitioning and processing large-scale graph data sets.
Complete the reading assignment, and search the Library and Internet to find and study more references that discuss cloud computing-based large-scale graph data set partitioning and processing. Based on the results of your research, discuss the following topics:
· Identify 2 large-scale graph data set partitioning methods used in a cloud computing system.
· What are the design principles of each method?
· How does each method address the network performance issue caused by the bandwidth unevenness among different machine pairs?
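To make the bandwidth unevenness problem concrete, the sketch below scores how well a set of graph partitions is placed on machines. It is an illustrative cost model only, not one of the partitioning methods from the literature: the toy graph, the partition assignment `PART_OF`, the `BANDWIDTH` table, and the cost formula (cut edges divided by pairwise bandwidth) are all assumptions made for the example, and the brute-force search is only feasible for very small inputs.

```python
from itertools import permutations

# Hypothetical toy graph and an assumed assignment of vertices to partitions.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e"), ("e", "f")]
PART_OF = {"a": "P0", "b": "P0", "c": "P1", "d": "P1", "e": "P2", "f": "P2"}

# Assumed uneven bandwidth (arbitrary units) between machine pairs.
BANDWIDTH = {
    frozenset(("m0", "m1")): 10.0,
    frozenset(("m0", "m2")): 1.0,
    frozenset(("m1", "m2")): 1.0,
}


def cross_traffic(edges, part_of):
    """Count the edges cut between each unordered pair of distinct partitions."""
    traffic = {}
    for u, v in edges:
        a, b = part_of[u], part_of[v]
        if a != b:
            key = frozenset((a, b))
            traffic[key] = traffic.get(key, 0) + 1
    return traffic


def transfer_cost(traffic, placement, bandwidth):
    """Sum cut_edges / bandwidth over partition pairs for a partition -> machine placement."""
    cost = 0.0
    for pair, n_edges in traffic.items():
        a, b = tuple(pair)
        link = frozenset((placement[a], placement[b]))
        cost += n_edges / bandwidth[link]
    return cost


def best_placement(traffic, partitions, machines, bandwidth):
    """Try every one-to-one placement and keep the cheapest (toy inputs only)."""
    candidates = (
        dict(zip(partitions, perm)) for perm in permutations(machines, len(partitions))
    )
    return min(candidates, key=lambda p: transfer_cost(traffic, p, bandwidth))


traffic = cross_traffic(EDGES, PART_OF)
best = best_placement(traffic, ["P0", "P1", "P2"], ["m0", "m1", "m2"], BANDWIDTH)
print(best, transfer_cost(traffic, best, BANDWIDTH))
```

In this example, partitions P0 and P1 share three cut edges, so the cheapest placement puts them on the machine pair with the highest bandwidth (m0 and m1) and leaves P2 on m2.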