Amazon Price: N/A (as of February 21, 2018 19:10)
If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must. Demand for operations-specific material has skyrocketed now that Hadoop is becoming the de facto standard for truly large-scale data processing in the data center. Eric Sammer, Principal Solution Architect at Cloudera, shows you the particulars of running Hadoop in production, from planning, installing, and configuring the system to providing ongoing maintenance.
Rather than run through all possible scenarios, this pragmatic operations guide calls out what works, as demonstrated in critical deployments.

- Get a high-level overview of HDFS and MapReduce: why they exist and how they work
- Plan a Hadoop deployment, from hardware and OS selection to network requirements
- Learn setup and configuration details with a list of critical properties
- Manage resources by sharing a cluster across multiple groups
- Get a runbook of the most common cluster maintenance tasks
- Monitor Hadoop clusters, and learn troubleshooting with the help of real-world war stories
- Use basic tools and techniques to handle backup and catastrophic failure
Amazon Price: $43.75 (list price $45.00; you save $1.25, or 3%) (as of February 21, 2018 16:19)
There are millions of searchable data sources on the Web, and to a large extent their contents can only be reached through their own query interfaces. There is enormous interest in making the data in these sources easily accessible. There are two general approaches to achieving this objective. The first is to surface the contents of these sources from the deep Web and add them to the index of regular search engines. The second is to integrate the searching capabilities of these sources and support integrated access to them. In this book, we introduce the state-of-the-art techniques for extracting, understanding, and integrating the query interfaces of deep Web data sources. These techniques are critical for producing an integrated query interface for each domain. The interface serves as the mediator for searching all data sources in the concerned domain. While query interface integration is only relevant to the deep Web integration approach, the extraction and understanding of query interfaces are critical for both deep Web exploration approaches.
This book aims to provide in-depth and comprehensive coverage of the key technologies needed to create high-quality integrated query interfaces automatically. The following technical issues are discussed in detail: query interface modeling, query interface extraction, query interface clustering, query interface matching, query interface attribute integration, and query interface integration.
Table of Contents: Introduction / Query Interface Representation and Extraction / Query Interface Clustering and Categorization / Query Interface Matching / Query Interface Attribute Integration / Query Interface Integration / Summary and Future Research
Amazon Price: N/A (as of February 21, 2018 08:12)
Millions of public Twitter streams harbor a wealth of data, and once you mine them, you can gain some valuable insights. This concise book offers a collection of recipes to help you extract nuggets of Twitter information using easy-to-learn Python tools. Each recipe offers a discussion of how and why the solution works, so you can quickly adapt it to fit your particular needs. The recipes include techniques to:

- Use OAuth to access Twitter data
- Create and analyze graphs of retweet relationships
- Use the streaming API to harvest tweets in real time
- Harvest and analyze friends and followers
- Discover friendship cliques
- Summarize webpages from short URLs
This book is a perfect companion to O’Reilly's Mining the Social Web.
Amazon Price: N/A (as of February 21, 2018 03:33)
Get Started Fast with Apache Hadoop® 2, YARN, and Today’s Hadoop Ecosystem
With Hadoop 2.x and YARN, Hadoop moves beyond MapReduce to become practical for virtually any type of data processing. Hadoop 2.x and the Data Lake concept represent a radical shift away from conventional approaches to data usage and storage. Hadoop 2.x installations offer unmatched scalability and breakthrough extensibility that supports new and existing Big Data analytics processing methods and models.
Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem (Addison-Wesley Data & Analytics Series)
Amazon Price: $45.00 (as of February 21, 2018 13:04)
Harness the power of social media to predict customer behavior and improve sales
Social media is the biggest source of Big Data. Because of this, 90% of Fortune 500 companies are investing in Big Data initiatives that will help them predict consumer behavior to produce better sales results. Written by Dr. Gabor Szabo, a Senior Data Scientist at Twitter, and Dr. Oscar Boykin, a Software Engineer at Twitter, Social Media Data Mining and Analytics shows analysts how to use sophisticated techniques to mine social media data, obtaining the information they need to generate amazing results for their businesses.
Social Media Data Mining and Analytics isn't just another book on the business case for social media. Rather, this book provides hands-on examples of applying state-of-the-art tools and technologies to mine social media, drawing on Twitter, Facebook, Pinterest, Wikipedia, Reddit, Flickr, Web hyperlinks, and other rich data sources. In it, you will learn:

- The four key characteristics of online services: users, social networks, actions, and content
- The full data discovery lifecycle: data extraction, storage, analysis, and visualization
- How to work with code and extract data to create solutions
- How to use Big Data to make accurate customer predictions
Szabo and Boykin wrote this book to provide businesses with the competitive advantage they need to harness the rich data that is available from social media platforms.