Formative Assessment Reference
Q2.1
Problem Set 2
5% – Q 2.1 – Study the Application Case 1.7 in Chapter 1 titled “Gilt Groupe’s Flash Sales Streamlined by Big Data Analytics” and answer the two questions about the case study.
Application Case 1.7 Gilt Groupe’s Flash Sales Streamlined by Big Data Analytics
Gilt Groupe is an online destination offering flash sales on clothing and accessories from major brands. It offers its members exclusive discounts on high-end clothing and other apparel. After registering with Gilt, customers are sent e-mails containing a variety of offers and are given a 36- to 48-hour window to make purchases using them. There are about 30 different sales each day. While a typical department store turns over its inventory two or three times a year, Gilt does it 8 to 10 times a year. Thus, it has to manage its inventory extremely well or it could incur very high inventory costs. To do this, analytics software developed at Gilt keeps track of every customer click—from what brands, colors, and styles customers click on to what they end up buying. Gilt then tries to predict what these customers are most likely to buy and stocks inventory according to these predictions. Customers are sent customized alerts to sale offers based on the suggestions made by the analytics software.
That, however, is not the whole process. The software also monitors which of the recommended offers customers actually choose, in order to make more accurate predictions and increase the effectiveness of its personalized recommendations. Because some customers do not check e-mail very often, Gilt's analytics software keeps track of responses to offers and re-sends the same offer 3 days later to those customers who have not responded. Gilt also keeps track of what customers are saying about its products in general by analyzing Twitter feeds for sentiment. Gilt's recommendation software is based on Teradata Aster's technology solution, which includes Big Data analytics technologies.
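As a rough illustration of the re-mailing rule described above, the Python sketch below flags offers that have gone unanswered for 3 days for a second send. The record layout, field names, and data are hypothetical; the case does not describe Gilt's systems at this level of detail.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory offer log; in practice this would come from the
# retailer's clickstream and e-mail systems (fields here are illustrative only).
offers_sent = [
    {"customer_id": 101, "offer_id": "A17", "sent_at": datetime(2013, 2, 1), "responded": False},
    {"customer_id": 102, "offer_id": "A17", "sent_at": datetime(2013, 2, 1), "responded": True},
]

def offers_to_resend(offer_log, now, wait_days=3):
    """Return offers with no response after `wait_days`, so they can be re-sent."""
    cutoff = now - timedelta(days=wait_days)
    return [o for o in offer_log if not o["responded"] and o["sent_at"] <= cutoff]

for offer in offers_to_resend(offers_sent, now=datetime(2013, 2, 5)):
    print(f"Re-send offer {offer['offer_id']} to customer {offer['customer_id']}")
```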
QUESTIONS FOR DISCUSSION
• 1. What makes this case study an example of Big Data analytics?
• 2. What types of decisions does Gilt Groupe have to make?
What We Can Learn from This Application Case
There is continuous growth in the amount of structured and unstructured data, and many organizations are now tapping these data to make actionable decisions. Big Data analytics is now enabled by the advancements in technologies that aid in storage and processing of vast amounts of rapidly growing data.
Source: Asterdata.com, “Gilt Groupe Speaks on Digital Marketing Optimization,” asterdata.com/gilt_groupe_video.php (accessed February 2013).
Q4.1
5% – 4.1 – What are the reasons for the recent emergence of visual analytics? What is the difference between information visualization and visual analytics?
4.5 THE EMERGENCE OF DATA VISUALIZATION AND VISUAL ANALYTICS
As Seth Grimes (2009) has noted, there is a “growing palette” of data visualization techniques and tools that enable the users of business analytics and business intelligence systems to better “communicate relationships, add historical context, uncover hidden correlations and tell persuasive stories that clarify and call to action.” The latest Magic Quadrant on Business Intelligence and Analytics Platforms, released by Gartner in February 2013, further emphasizes the importance of visualization in business intelligence. As the chart shows, most of the solution providers in the Leaders quadrant are either relatively recently founded information visualization companies (e.g., Tableau Software, QlikTech, Tibco Spotfire) or well-established, large analytics companies (e.g., SAS, IBM, Microsoft, SAP, MicroStrategy) that are increasingly focusing their efforts on information visualization and visual analytics. Details on Gartner’s latest Magic Quadrant are given in Technology Insights 4.1.
TECHNOLOGY INSIGHTS 4.1 Gartner Magic Quadrant for Business Intelligence and Analytics Platforms
Gartner, Inc., the creator of Magic Quadrants, is a leading information technology research and advisory company. Founded in 1979, Gartner has 5,300 associates, including 1,280 research analysts and consultants, and numerous clients in 85 countries.
Magic Quadrant is a research method designed and implemented by Gartner to monitor and evaluate the progress and positions of companies in a specific, technology-based market. By applying a graphical treatment and a uniform set of evaluation criteria, Magic Quadrant helps users to understand how technology providers are positioned within a market.
Gartner changed the name of this Magic Quadrant from “Business Intelligence Platforms” to “Business Intelligence and Analytics Platforms” in 2012 to emphasize the growing importance of analytics capabilities to the information systems that organizations are now building. Gartner defines the business intelligence and analytics platform market as a software platform that delivers 15 capabilities across three categories: integration, information delivery, and analysis. These capabilities enable organizations to build precise systems of classification and measurement to support decision making and improve performance.
Figure 4.5 illustrates the latest Magic Quadrant for Business Intelligence and Analytics platforms. Magic Quadrant places providers in four groups (niche players, challengers, visionaries, and leaders) along two dimensions: completeness of vision (x-axis) and ability to execute (y-axis). As the quadrant clearly shows, most of the well-known BI/BA providers are positioned in the “leaders” category while many of the lesser known, relatively new, emerging providers are positioned in the “niche players” category.
Right now, most of the activity in the business intelligence and analytics platform market is from organizations that are trying to mature their visualization capabilities and to move from descriptive to diagnostic (i.e., predictive and prescriptive) analytics. The vendors in the market have overwhelmingly concentrated on meeting this user demand. If there were a single market theme in 2012, it would be that data discovery/visualization became a mainstream architecture. For years, data discovery/visualization vendors—such as QlikTech, Salient Management Company, Tableau Software, and Tibco Spotfire—received more positive feedback than vendors offering OLAP cube and semantic-layer-based architectures. In 2012, the market responded:
• MicroStrategy significantly improved Visual Insight.
• SAP launched Visual Intelligence.
• SAS launched Visual Analytics.
• Microsoft bolstered PowerPivot with Power View.
• IBM launched Cognos Insight.
• Oracle acquired Endeca.
• Actuate acquired Quiterian.
FIGURE 4.5 Magic Quadrant for Business Intelligence and Analytics Platforms.
Source: gartner.com.
This emphasis on data discovery/visualization from most of the leaders in the market—which are now promoting tools with business-user-friendly data integration, coupled with embedded storage and computing layers (typically in-memory/columnar) and unfettered drilling—accelerates the trend toward decentralization and user empowerment of BI and analytics, and greatly enables organizations’ ability to perform diagnostic analytics.
Source: Gartner Magic Quadrant, released on February 5, 2013, gartner.com (accessed February 2013).
In business intelligence and analytics, the key challenges for visualization have revolved around the intuitive representation of large, complex data sets with multiple dimensions and measures. For the most part, the typical charts, graphs, and other visual elements used in these applications usually involve two dimensions, sometimes three, and fairly small subsets of data sets. In contrast, the data in these systems reside in a data warehouse. At a minimum, these warehouses involve a range of dimensions (e.g., product, location, organizational structure, time), a range of measures, and millions of cells of data. In an effort to address these challenges, a number of researchers have developed a variety of new visualization techniques.
Visual Analytics
Visual analytics is a recently coined term that is often used loosely to mean nothing more than information visualization. What is usually meant by visual analytics, however, is the combination of visualization and predictive analytics. Whereas information visualization is aimed at answering “what happened” and “what is happening” and is closely associated with business intelligence (routine reports, scorecards, and dashboards), visual analytics is aimed at answering “why is it happening” and “what is more likely to happen,” and is usually associated with business analytics (forecasting, segmentation, correlation analysis). Many information visualization vendors are adding analytics capabilities so that they can call themselves visual analytics solution providers. One of the top long-time analytics solution providers, SAS Institute, is approaching it from the other direction: it is embedding its analytics capabilities into a high-performance data visualization environment that it calls visual analytics.
Visual or not visual, automated or manual, online or paper based, business reporting is not much different from telling a story. Technology Insights 4.2 provides a different, unorthodox viewpoint on better business reporting.
TECHNOLOGY INSIGHTS 4.2 Telling Great Stories with Data and Visualization
Everyone who has data to analyze has stories to tell, whether it’s diagnosing the reasons for manufacturing defects, selling a new idea in a way that captures the imagination of your target audience, or informing colleagues about a particular customer service improvement program. And when it’s telling the story behind a big strategic choice so that you and your senior management team can make a solid decision, providing a fact-based story can be especially challenging. In all cases, it’s a big job. You want to be interesting and memorable; you know you need to keep it simple for your busy executives and colleagues. Yet you also know you have to be factual, detail oriented, and data driven, especially in today’s metric-centric world.
It’s tempting to present just the data and facts, but when colleagues and senior management are overwhelmed by data and facts without context, you lose. We have all experienced presentations with large slide decks, only to find that the audience is so overwhelmed with data that they don’t know what to think, or they are so completely tuned out, they take away only a fraction of the key points.
Start engaging your executive team and explaining your strategies and results more powerfully by approaching your assignment as a story. You will need the “what” of your story (the facts and data) but you also need the “who?,” the “how?,” the “why?,” and the often missed “so what?” It’s these story elements that will make your data relevant and tangible for your audience. Creating a good story can aid you and senior management in focusing on what is important.
Why Story?
Stories bring life to data and facts. They can help you make sense and order out of a disparate collection of facts. They make it easier to remember key points and can paint a vivid picture of what the future can look like. Stories also create interactivity—people put themselves into stories and can relate to the situation.
Cultures have long used storytelling to pass on knowledge and content. In some cultures, storytelling is critical to their identity. For example, in New Zealand, some of the Maori people tattoo their faces with moko. A moko is a facial tattoo containing a story about ancestors—the family tribe. A man may have a tattoo design on his face that shows features of a hammerhead to highlight unique qualities about his lineage. The design he chooses signifies what is part of his “true self” and his ancestral home.
Likewise, when we are trying to understand a story, the storyteller navigates to finding the “true north.” If senior management is looking to discuss how they will respond to a competitive change, a good story can make sense and order out of a lot of noise. For example, you may have facts and data from two studies, one including results from an advertising study and one from a product satisfaction study. Developing a story for what you measured across both studies can help people see the whole where there were disparate parts. For rallying your distributors around a new product, you can employ a story to give vision to what the future can look like. Most importantly, storytelling is interactive—typically the presenter uses words and pictures that audience members can put themselves into. As a result, they become more engaged and better understand the information.
So What Is a Good Story?
Most people can easily rattle off their favorite film or book. Or they remember a funny story that a colleague recently shared. Why do people remember these stories? Because they contain certain characteristics. First, a good story has great characters. In some cases, the reader or viewer has a vicarious experience where they become involved with the character. The character then has to be faced with a challenge that is difficult but believable. There must be hurdles that the character overcomes. And finally, the outcome or prognosis is clear by the end of the story. The situation may not be resolved—but the story has a clear endpoint.
Think of Your Analysis as a Story—Use a Story Structure
When crafting a data-rich story, the first objective is to find the story. Who are the characters? What is the drama or challenge? What hurdles have to be overcome? And at the end of your story, what do you want your audience to do as a result?
Once you know the core story, craft your other story elements: define your characters, understand the challenge, identify the hurdles, and crystallize the outcome or decision question. Make sure you are clear about what you want people to do as a result. This will shape how your audience recalls your story. With the story elements in place, write out the storyboard, which represents the structure and form of your story. Although it’s tempting to skip this step, it is better first to understand the story you are telling and then to focus on the presentation structure and form. Once the storyboard is in place, the other elements will fall into place. The storyboard will help you to think about the best analogies or metaphors, to clearly set up the challenge or opportunity, and to finally see the flow and transitions needed. The storyboard also helps you focus on key visuals (graphs, charts, and graphics) that you need your executives to recall.
In summary, don’t be afraid to use data to tell great stories. Being factual, detail oriented, and data driven is critical in today’s metric-centric world but it does not have to mean being boring and lengthy. In fact, by finding the real stories in your data and following the best practices, you can get people to focus on your message—and thus on what’s important. Here are those best practices:
• 1. Think of your analysis as a story—use a story structure.
• 2. Be authentic—your story will flow.
• 3. Be visual—think of yourself as a film editor.
• 4. Make it easy for your audience and you.
• 5. Invite and direct discussion.
Source: Elissa Fink and Susan J. Moore, “Five Best Practices for Telling Great Stories with Data,” 2012, white paper by Tableau Software, Inc., tableausoftware.com/whitepapers/telling-stories-with-data (accessed February 2013).
High-Powered Visual Analytics Environments
Due to the increasing demand for visual analytics coupled with fast-growing data volumes, there is a strong movement toward investing in highly efficient visualization systems. With its latest move into visual analytics, the statistical software giant SAS Institute is now among those leading this wave. Its new product, SAS Visual Analytics, is a high-performance, in-memory solution for exploring massive amounts of data in a very short time (almost instantaneously). It empowers users to spot patterns, identify opportunities for further analysis, and convey visual results via Web reports or mobile platforms such as tablets and smartphones. Figure 4.6 shows the high-level architecture of the SAS Visual Analytics platform. At one end of the architecture are universal Data Builder and Administrator capabilities, leading into the Explorer, Report Designer, and Mobile BI modules, which collectively provide an end-to-end visual analytics solution.
Some of the key benefits SAS claims for Visual Analytics are:
• Empower all users with data exploration techniques and approachable analytics to drive improved decision making. SAS Visual Analytics enables different types of users to conduct fast, thorough explorations on all available data. Subsetting or sampling of data is not required. Easy-to-use, interactive Web interfaces broaden the audience for analytics, enabling everyone to glean new insights. Users can look at more options, make more precise decisions, and drive success even faster than before.
• Answer complex questions faster, enhancing the contributions from your analytic talent. SAS Visual Analytics augments the data discovery and exploration process by providing extremely fast results to enable better, more focused analysis. Analytically savvy users can identify areas of opportunity or concern from vast amounts of data so further investigation can take place quickly.
• Improve information sharing and collaboration. Large numbers of users, including those with limited analytical skills, can quickly view and interact with reports and charts via the Web, Adobe PDF files, and iPad mobile devices, while IT maintains control of the underlying data and security. SAS Visual Analytics provides the right information to the right person at the right time to improve productivity and organizational knowledge.
FIGURE 4.6 An Overview of SAS Visual Analytics Architecture.
Source: SAS.com.
FIGURE 4.7 A Screenshot from SAS Visual Analytics.
Source: SAS.com.
• Liberate IT by giving users a new way to access the information they need. Free IT from the constant barrage of demands from users who need access to different amounts of data, different data views, ad hoc reports, and one-off requests for information. SAS Visual Analytics enables IT to easily load and prepare data for multiple users. Once data is loaded and available, users can dynamically explore data, create reports, and share information on their own.
• Provide room to grow at a self-determined pace. SAS Visual Analytics provides the option of using commodity hardware or database appliances from EMC Greenplum and Teradata. It is designed from the ground up for performance optimization and scalability to meet the needs of any size organization.
Figure 4.7 shows a screenshot from the SAS Visual Analytics platform in which a time-series forecast and the confidence intervals around the forecast are depicted. A wealth of information on SAS Visual Analytics, along with access to the tool itself for teaching and learning purposes, can be found at teradatauniversitynetwork.com.
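Figure 4.7 itself cannot be reproduced here, but the kind of output it depicts (a time-series forecast with confidence bands) can be approximated with open-source tools. The sketch below uses Python's statsmodels library on synthetic data purely as an illustration; it is not SAS Visual Analytics, and the numbers are invented.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly sales series (illustrative data only).
rng = pd.date_range("2010-01", periods=36, freq="MS")
noise = np.random.default_rng(0).normal(0, 5, 36)
sales = pd.Series(100 + 2 * np.arange(36) + noise, index=rng)

# Fit a simple ARIMA model and forecast 6 months ahead with 95% intervals,
# the same kind of forecast-plus-confidence-band view shown in Figure 4.7.
model = ARIMA(sales, order=(1, 1, 1)).fit()
forecast = model.get_forecast(steps=6)
summary = pd.DataFrame({
    "forecast": forecast.predicted_mean,
    "lower_95": forecast.conf_int().iloc[:, 0],
    "upper_95": forecast.conf_int().iloc[:, 1],
})
print(summary)
```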
SECTION 4.5 REVIEW QUESTIONS
• 1. What are the reasons for the recent emergence of visual analytics?
• 2. Look at Gartner’s Magic Quadrant for Business Intelligence and Analytics Platforms. What do you see? Discuss and justify your observations.
• 3. What is the difference between information visualization and visual analytics?
• 4. Why should storytelling be a part of your reporting and data visualization?
• 5. What is a high-powered visual analytics environment? Why do we need it?
Q5.7
5% – Q 5.3 – Study the Application Case 5.7 in Chapter 5 titled “Predicting Customer Buying Patterns – the Target Story” and answer the two questions for discussion.
5.7 DATA MINING PRIVACY ISSUES, MYTHS, AND BLUNDERS
Data Mining and Privacy Issues
Data that is collected, stored, and analyzed in data mining often contains information about real people. Such information may include identification data (name, address, Social Security number, driver’s license number, employee number, etc.), demographic data (e.g., age, sex, ethnicity, marital status, number of children, etc.), financial data (e.g., salary, gross family income, checking or savings account balance, home ownership, mortgage or loan account specifics, credit card limits and balances, investment account specifics, etc.), purchase history (i.e., what is bought from where and when, either from the vendor’s transaction records or from credit card transaction specifics), and other personal data (e.g., anniversary, pregnancy, illness, loss in the family, bankruptcy filings, etc.). Most of these data can be accessed through third-party data providers. The main question here is the privacy of the person to whom the data belongs. In order to maintain the privacy and protection of individuals’ rights, data mining professionals have ethical (and often legal) obligations. One way to accomplish this is to de-identify the customer records before data mining is applied, so that the records cannot be traced to an individual. Many publicly available data sources (e.g., CDC data, SEER data, UNOS data, etc.) are already de-identified. Prior to accessing these data sources, users are often asked to consent that under no circumstances will they try to identify the individuals behind those figures.
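As a minimal sketch of the de-identification idea described above, the Python snippet below drops direct identifiers and replaces the record key with a one-way pseudonym. The field names and record are hypothetical, and real de-identification regimes (handling quasi-identifiers, salting hashes, and so on) go well beyond this.

```python
import hashlib

def deidentify(record, direct_identifiers=("name", "address", "ssn", "drivers_license")):
    """Drop direct identifiers and replace the record key with a one-way pseudonym.

    A minimal illustration only; production de-identification also handles
    quasi-identifiers (e.g., birth date, ZIP code) and uses a secret salt.
    """
    pseudonym = hashlib.sha256(record["customer_id"].encode()).hexdigest()[:12]
    cleaned = {k: v for k, v in record.items()
               if k not in direct_identifiers and k != "customer_id"}
    cleaned["pseudonym"] = pseudonym
    return cleaned

raw = {"customer_id": "C-1001", "name": "Jane Doe", "ssn": "123-45-6789",
       "age": 34, "gross_family_income": 82000, "purchase_total": 415.20}
print(deidentify(raw))
```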
There have been a number of instances in the recent past where companies shared their customer data with others without seeking the explicit consent of their customers. For instance, as most of you might recall, in 2003 JetBlue Airways provided more than a million passenger records to Torch Concepts, a U.S. government contractor. Torch subsequently augmented the passenger data with additional information such as family size and Social Security numbers—information purchased from a data broker called Acxiom. The consolidated personal database was intended to be used for a data mining project aimed at developing potential terrorist profiles. All of this was done without notification or consent of the passengers. When news of the activities got out, however, dozens of privacy lawsuits were filed against JetBlue, Torch, and Acxiom, and several U.S. senators called for an investigation into the incident (Wald, 2004). Similar, though less dramatic, privacy-related news has come out in the recent past about popular social network companies that allegedly were selling customer-specific data to other companies for personalized target marketing.
There was another peculiar story about privacy concerns that made it into the headlines in 2012. In this instance, the company did not even use any private and/or personal data. Legally speaking, there was no violation of any laws. It was about Target and is summarized in Application Case 5.7.
Application Case 5.7 Predicting Customer Buying Patterns—The Target Story
In early 2012, an infamous story appeared concerning Target’s practice of predictive analytics. The story was about a teenage girl who was being sent advertising flyers and coupons by Target for the kinds of things that a new mother-to-be would buy from a store like Target. The story goes like this: An angry man went into a Target outside of Minneapolis, demanding to talk to a manager: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?” The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture, and pictures of smiling infants. The manager apologized and then called a few days later to apologize again. On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”
As it turns out, Target figured out that a teen girl was pregnant before her father did. Here is how they did it. Target assigns every customer a Guest ID number (tied to their credit card, name, or e-mail address) that becomes a placeholder keeping a history of everything they have bought. Target augments this data with any demographic information that it has collected from them or bought from other information sources. Using this information, Target looked at historical buying data for all the women who had signed up for Target baby registries in the past. They analyzed the data from all directions, and soon enough some useful patterns emerged. For example, lotions and special vitamins were among the products with interesting purchase patterns. Lots of people buy lotion, but the analysts noticed that women on the baby registry were buying larger quantities of unscented lotion around the beginning of their second trimester. Another analyst noted that sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium, and zinc. Many shoppers purchase soap and cotton balls, but when someone suddenly starts buying lots of scent-free soap and extra-big bags of cotton balls, in addition to hand sanitizers and washcloths, it signals that they could be getting close to their delivery date. In the end, Target was able to identify about 25 products that, when analyzed together, allowed them to assign each shopper a “pregnancy prediction” score. More important, they could also estimate a woman’s due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.
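Target has not published its model, but the general mechanics, turning purchase quantities of a handful of "signal" products into a single propensity score, can be sketched with an off-the-shelf classifier. Everything below (the product features, the tiny training set, the choice of logistic regression) is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented training data: purchase quantities of a few "signal" products
# (columns) for shoppers who did / did not later appear on a baby registry.
# The real model reportedly used about 25 products; these numbers are made up.
X_train = np.array([
    [3, 2, 4, 1],   # unscented lotion, supplements, cotton balls, scent-free soap
    [0, 0, 1, 0],
    [4, 3, 5, 2],
    [1, 0, 0, 0],
])
y_train = np.array([1, 0, 1, 0])  # 1 = later joined a baby registry

model = LogisticRegression().fit(X_train, y_train)

# Score a new shopper: the predicted probability plays the role of a
# "pregnancy prediction" score used to time coupon mailings.
new_shopper = np.array([[2, 2, 3, 1]])
print(f"propensity score: {model.predict_proba(new_shopper)[0, 1]:.2f}")
```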
If you look at this practice from a legal perspective, you would conclude that Target did not use any information that violates customer privacy; rather, they used transactional data that most every other retail chain is collecting and storing (and perhaps analyzing) about their customers. What was disturbing in this scenario was perhaps the targeted concept: pregnancy. There are certain events or concepts that should be off limits or treated extremely cautiously, such as terminal disease, divorce, and bankruptcy.
QUESTIONS FOR DISCUSSION
• 1. What do you think about data mining and its implications concerning privacy? What is the threshold between knowledge discovery and privacy infringement?
• 2. Did Target go too far? Did they do anything illegal? What do you think they should have done? What do you think they should do now (quit these types of practices)?
Sources: K. Hill, “How Target Figured Out a Teen Girl Was Pregnant Before Her Father Did,” Forbes, February 13, 2012; and R. Nolan, “Behind the Cover Story: How Much Does Target Know?” NYTimes.com, February 21, 2012.
Data Mining Myths and Blunders
Data mining is a powerful analytical tool that enables business executives to advance from describing the nature of the past to predicting the future. It helps marketers find patterns that unlock the mysteries of customer behavior. The results of data mining can be used to increase revenue, reduce expenses, identify fraud, and locate business opportunities, offering a whole new realm of competitive advantage. As an evolving and maturing field, data mining is often associated with a number of myths, including the following (Zaima, 2003):
Myth: Data mining provides instant, crystal-ball-like predictions.
Reality: Data mining is a multistep process that requires deliberate, proactive design and use.
Myth: Data mining is not yet viable for business applications.
Reality: The current state-of-the-art is ready to go for almost any business.
Myth: Data mining requires a separate, dedicated database.
Reality: Because of advances in database technology, a dedicated database is not required, even though it may be desirable.
Myth: Only those with advanced degrees can do data mining.
Reality: Newer Web-based tools enable managers of all educational levels to do data mining.
Myth: Data mining is only for large firms that have lots of customer data.
Reality: If the data accurately reflect the business or its customers, a company can use data mining.
Data mining visionaries have gained enormous competitive advantage by understanding that these myths are just that: myths.
The following 10 data mining mistakes are often made in practice (Skalak, 2001; Shultz, 2004), and you should try to avoid them:
• 1. Selecting the wrong problem for data mining.
• 2. Ignoring what your sponsor thinks data mining is and what it really can and cannot do.
• 3. Leaving insufficient time for data preparation. It takes more effort than is generally understood.
• 4. Looking only at aggregated results and not at individual records. IBM’s DB2 IMS can highlight individual records of interest.
• 5. Being sloppy about keeping track of the data mining procedure and results.
• 6. Ignoring suspicious findings and quickly moving on.
• 7. Running mining algorithms repeatedly and blindly. It is important to think hard about the next stage of data analysis. Data mining is a very hands-on activity.
• 8. Believing everything you are told about the data.
• 9. Believing everything you are told about your own data mining analysis.
• 10. Measuring your results differently from the way your sponsor measures them.
SECTION 5.7 REVIEW QUESTIONS
• 1. What are the privacy issues in data mining?
• 2. How do you think the discussion between privacy and data mining will progress? Why?
• 3. What are the most common myths about data mining?
• 4. What do you think are the reasons for these myths about data mining?
• 5. What are the most common data mining mistakes/blunders? How can they be minimized and/or eliminated?
Q6.2
5% – Q 6.2 – Study the Application Case 6.1 in Chapter 6 titled “Neural Networks are Helping to Save Lives in the Mining Industry” and answer the two questions for discussion.
Application Case 6.1 provides an interesting example of the use of neural networks as a prediction tool in the mining industry.
Application Case 6.1 Neural Networks Are Helping to Save Lives in the Mining Industry
In the mining industry, most of the underground injuries and fatalities are due to rock falls (i.e., fall of hanging wall/roof). The method that has been used for many years in the mines when determining the integrity of the hanging wall is to tap the hanging wall with a sounding bar and listen to the sound emitted. An experienced miner can differentiate an intact/solid hanging wall from a detached/loose hanging wall by the sound that is emitted. This method is subjective. The Council for Scientific and Industrial Research (CSIR) in South Africa has developed a device that assists any miner in making an objective decision when determining the integrity of the hanging wall. A trained neural network model is embedded into the device. The device then records the sound emitted when a hanging wall is tapped. The sound is then preprocessed before being input into a trained neural network model, and the trained model classifies the hanging wall as either intact or detached.
Mr. Teboho Nyareli, working as a research engineer at CSIR, who holds a master’s degree in electronic engineering from the University of Cape Town in South Africa, used NeuroSolutions, a popular artificial neural network modeling software developed by NeuroDimensions, Inc., to develop the classification-type prediction models. The multilayer perceptron-type ANN architecture that he built achieved better than 70 percent prediction accuracy on the hold-out sample. Currently, the prototype system is undergoing a final set of tests before being deployed as a decision aid, followed by the commercialization phase. The following figure shows a snapshot of NeuroSolutions’ model-building platform.
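The case does not disclose the details of CSIR's network, but a classifier of the kind described, a small multilayer perceptron trained on preprocessed acoustic features and evaluated on a hold-out sample, can be sketched with scikit-learn. The features, data, and network size below are synthetic placeholders, not CSIR's.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for preprocessed sound features (e.g., spectral energies)
# extracted from sounding-bar taps; label 1 = detached hanging wall, 0 = intact.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(0, 0.5, 200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A small multilayer perceptron, similar in spirit to the model described above.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print(f"hold-out accuracy: {mlp.score(X_test, y_test):.2f}")
```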
QUESTIONS FOR DISCUSSION
• 1. How did neural networks help save lives in the mining industry?
• 2. What were the challenges, the proposed solution, and the obtained results?
Source: NeuroSolutions customer success story, neurosolutions.com/resources/nyareli.html (accessed February 2013).
Q7.1
5% – Q 7.1 – Study the Opening Vignette in Chapter 7 titled “Machine Versus Men on Jeopardy!: The Story of Watson” and answer questions 1, 2, 3, and 4 from the questions for the opening vignette.
7.1 OPENING VIGNETTE: Machine Versus Men on Jeopardy!: The Story of Watson
Can machine beat the best of man in what man is supposed to be the best at? Evidently, yes, and the machine’s name is Watson. Watson is an extraordinary computer system (a novel combination of advanced hardware and software) designed to answer questions posed in natural human language. It was developed in 2010 by an IBM Research team as part of a DeepQA project and was named after IBM’s first president, Thomas J. Watson.
BACKGROUND
Roughly 3 years ago, IBM Research was looking for a major research challenge to rival the scientific and popular interest of Deep Blue, the computer chess-playing champion, which would also have clear relevance to IBM business interests. The goal was to advance computer science by exploring new ways for computer technology to affect science, business, and society. Accordingly, IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show, Jeopardy! The extent of the challenge included fielding a real-time automatic contestant on the show, capable of listening, understanding, and responding—not merely a laboratory exercise.
COMPETING AGAINST THE BEST
In 2011, as a test of its abilities, Watson competed on the quiz show Jeopardy!, which was the first ever human-versus-machine matchup for the show. In a two-game, combined-point match (broadcast in three Jeopardy! episodes during February 14–16), Watson beat Brad Rutter, the biggest all-time money winner on Jeopardy!, and Ken Jennings, the record holder for the longest championship streak (75 days). In these episodes, Watson consistently outperformed its human opponents on the game’s signaling device, but had trouble responding to a few categories, notably those having short clues containing only a few words. Watson had access to 200 million pages of structured and unstructured content consuming four terabytes of disk storage. During the game Watson was not connected to the Internet.
Meeting the Jeopardy! challenge required advancing and incorporating a variety of QA technologies (text mining and natural language processing), including parsing, question classification, question decomposition, automatic source acquisition and evaluation, entity and relation detection, logical form generation, and knowledge representation and reasoning. Winning at Jeopardy! required accurately computing confidence in its answers. The questions and content are ambiguous and noisy, and none of the individual algorithms is perfect. Therefore, each component must produce a confidence in its output, and individual component confidences must be combined to compute the overall confidence of the final answer. The final confidence is used to determine whether the computer system should risk choosing to answer at all. In Jeopardy! parlance, this confidence is used to determine whether the computer will “ring in” or “buzz in” for a question. The confidence must be computed during the time the question is read and before the opportunity to buzz in. This is roughly between 1 and 6 seconds, with an average of around 3 seconds.
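IBM's actual confidence model is learned rather than hand-weighted, but the decision it drives (combine per-component confidences into one overall score and "buzz in" only when that score clears a risk threshold) can be sketched in a few lines of Python. The weights and threshold below are invented for illustration.

```python
def overall_confidence(component_confidences, weights=None):
    """Combine per-component confidences into one score (a weighted average here;
    DeepQA actually learns how to stack and combine these scores)."""
    if weights is None:
        weights = [1.0] * len(component_confidences)
    return sum(c * w for c, w in zip(component_confidences, weights)) / sum(weights)

def should_buzz_in(component_confidences, threshold=0.7):
    """Ring in only if the combined confidence clears the risk threshold."""
    return overall_confidence(component_confidences) >= threshold

# Hypothetical confidences from three answer-scoring components for one clue.
print(should_buzz_in([0.85, 0.60, 0.75]))  # True for the default threshold
```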
HOW DOES WATSON DO IT?
The system behind Watson, which is called DeepQA, is a massively parallel, text mining–focused, probabilistic evidence-based computational architecture. For the Jeopardy! challenge, Watson used more than 100 different techniques for analyzing natural language, identifying sources, finding and generating hypotheses, finding and scoring evidence, and merging and ranking hypotheses. Far more important than any particular technique, however, was how they were combined in DeepQA so that overlapping approaches could bring their strengths to bear and contribute to improvements in accuracy, confidence, and speed.
DeepQA is an architecture with an accompanying methodology, which is not specific to the Jeopardy! challenge. The overarching principles in DeepQA are massive parallelism, many experts, pervasive confidence estimation, and integration of the latest and greatest in text analytics.
• Massive parallelism: Exploit massive parallelism in the consideration of multiple interpretations and hypotheses.
• Many experts: Facilitate the integration, application, and contextual evaluation of a wide range of loosely coupled probabilistic question and content analytics.
• Pervasive confidence estimation: No component commits to an answer; all components produce features and associated confidences, scoring different question and content interpretations. An underlying confidence-processing substrate learns how to stack and combine the scores.
• Integrate shallow and deep knowledge: Balance the use of strict semantics and shallow semantics, leveraging many loosely formed ontologies.
Figure 7.1 illustrates the DeepQA architecture at a very high level. More technical details about the various architectural components and their specific roles and capabilities can be found in Ferrucci et al. (2010).
FIGURE 7.1 A High-Level Depiction of DeepQA Architecture.
CONCLUSION
The Jeopardy! challenge helped IBM address requirements that led to the design of the DeepQA architecture and the implementation of Watson. After 3 years of intense research and development by a core team of about 20 researchers, Watson is performing at human expert levels in terms of precision, confidence, and speed at the Jeopardy! quiz show.
IBM claims to have developed many computational and linguistic algorithms to address different kinds of issues and requirements in QA. Even though the internals of these algorithms are not known, it is evident that they made the most of text analytics and text mining. Now IBM is working on a version of Watson to take on surmountable problems in healthcare and medicine (Feldman et al., 2012).
QUESTIONS FOR THE OPENING VIGNETTE
• 1. What is Watson? What is special about it?
• 2. What technologies were used in building Watson (both hardware and software)?
• 3. What are the innovative characteristics of DeepQA architecture that made Watson superior?
• 4. Why did IBM spend all that time and money to build Watson? Where is the ROI?
• 5. Conduct an Internet search to identify other previously developed “smart machines” (by IBM or others) that compete against the best of man. What technologies did they use?