Unit 8 Discussion – One of the main purposes of processing big data


One of the main purposes of processing big data is to extract knowledge (sometimes called big knowledge) from the big data set. “Knowledge is the meaningfulness about the data” (Sakr & Gaber, 2014). Knowledge representation is usually shaped by the specific requirements of the problem-solving task. For example, if a problem-solving task involves time order, then a list may be a suitable data structure for the knowledge representation; if it involves no time order, then a set may be suitable instead. However, there is a universal standard for knowledge representation proposed by the World Wide Web Consortium (W3C) called the Resource Description Framework [RDF] (2014). RDF is the standard model for machine-readable data representation and is now commonly used to hold knowledge representations in big data processing applications. RDF is very helpful when you need to integrate the results of several big data processing applications: it can facilitate knowledge integration even when the underlying data schemas differ across the original data storage systems.
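RDF encodes knowledge as subject–predicate–object triples, so merging knowledge extracted from independently processed data sets reduces to a graph union. The following is a minimal sketch using Python's rdflib library (the library choice, the example.org namespace, and the patient example are illustrative assumptions, not part of this discussion) that merges facts about the same entity extracted from two differently structured sources:

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, FOAF

# Hypothetical namespace for the extracted knowledge.
EX = Namespace("http://example.org/knowledge/")

# Graph 1: knowledge extracted from one source (e.g., a relational table).
g1 = Graph()
patient = URIRef("http://example.org/patient/42")
g1.add((patient, RDF.type, EX.Patient))
g1.add((patient, FOAF.name, Literal("Jane Doe")))

# Graph 2: knowledge extracted from a second source with a different
# schema, referring to the same real-world entity by the same URI.
g2 = Graph()
g2.add((patient, EX.diagnosedWith, EX.Diabetes))

# Integration is a simple graph union; no schema alignment step is needed
# because both sources describe resources through shared URIs.
merged = g1 + g2
print(merged.serialize(format="turtle"))

The point of the sketch is the last two lines: because both sources express their results as RDF triples, integrating them does not require reconciling the schemas of the original storage systems.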


Complete the reading assignment, and search the Library and Internet to find and study more references that discuss how to extract knowledge from a large-scale data set by applying machine learning. Based on the results of your research, discuss the following:

· Identify 1 research work that extracts knowledge from a large-scale data set with specific real-world semantics (e.g., an informatics system for biomedical research) by applying machine learning.

· How is machine learning applied in this research work?

· How is the extracted knowledge represented?

· How does the research work address the performance issue of processing such large-scale data sets?

Justify your point of view and provide examples, as necessary.