APACHE HADOOP ON AZURE AND WINDOWS
MICROSOFT’S APACHE HADOOP-BASED SERVICES FOR AZURE AND ENTERPRISE
ELASTIC MAPREDUCE FOR AZURE AND ENTERPRISE PRIVATE CLOUDS
Brad Sarsfield, Engineering Architect, Microsoft Big Data | Hadoop
March 2012 | revision 1.02
ISOTOPE BRIDGES BI TO COLLABORATION TO CLOUD
"The next frontier is all about uniting the power of the cloud with the power of data to gain insights that simply weren’t possible even just a few years ago"
Ted Kummert, CVP Business Platforms
SQL PASS, October 2011
BIG DATA IS HERE AND HADOOP IS CENTER STAGE
• 15 out of 17 sectors in the US have more data stored per company than the US Library of Congress
• 140,000-190,000 more deep analytical talent positions and 1.5 million more data-savvy managers in the US alone
• 50-60% increase in the number of Hadoop developers within organizations already using Hadoop within a year
• $250 billion potential annual value to Europe’s public sector
• $300 billion potential annual value to US healthcare

ECONOMIC CONTEXT AND EXEMPLAR
Special Report: The CEO’s Guide to Hadoop
Learn how large corporations are coping with the increasing flow of unstructured data by using a free software program called Hadoop
http://www.businessweek.com/technology/special-reports/ceo-guide-to-hadoop.html
THE 4Vs OF BIG DATA: VOLUME, VELOCITY, VARIABILITY, AND VARIETY
• Isotope is designed to enable solution building with all key dimensions in mind
• Deep integration and coordination with existing Microsoft enterprise, cloud, and BI tools
VIBRANT ECOSYSTEM IN ENTERPRISE AND CLOUD WITH MICROSOFT
[Figure: ecosystem map. Technologies named: Cassandra, Hadoop, BackType, MR/GFS, SimpleDB, Hive, Oozie, Bigtable, Dynamo, Scribe, PigLatin, Pig, HBase, Dremel, EC2/EMR/S3. Microsoft: Internal [Dryad | Cosmos] and External [Isotope | Azure | Excel | BI | SQL DW | LTH]]
• Scalable machine learning and data mining [Mahout]
• Statistical modeling and analysis [R]
• Coordination and workflow [Oozie, Cascading]
• Data integration and transformation [Sqoop, Flume]
• Social network analytics and petascale graph learning [Pegasus]
• Real-time stream analytics and business intelligence merged with petascale computation [HStreaming]
• Scale-out caching and storage [Cassandra, HBase, Riak, Redis, Couchbase, S3]
• Cloud-oriented data warehousing, pattern discovery, and transformation [Hive, Pig]
ENTER ISOTOPE
Isotope is the internal codename for Microsoft’s suite of products to support Hadoop on Windows and Azure
OUR DIFFERENTIATORS FOR CLOUD AND ENTERPRISE
[Figure: diagram labels. Un- and semi-structured: sensors, crawlers, devices, bots, apps. Structured: EIS, ERP, CRM, LOB apps. Platform: Hadoop, SQL Reporting, SQL Analysis, SQL Data Warehousing. Business users: interactive reports with Crescent, Excel with PowerPivot, embedded BI]
• Self-service business intelligence at any scale, on premises or in the cloud
• Complete integration of information assets from log files to collaboration artifacts to enterprise data stores
• Familiar and integrated tools for analytics, insight, exploration, modeling, and strategic decision making
• Transparent, federated identity and security management for all big data services
• High availability data protection and recovery services for enterprises through cloud
• Enterprise-grade support for all services, frameworks, and tools
A SEAMLESS OCEAN OF INFORMATION PROCESSING AND ANALYTICS
[Figure: diagram labels. HADOOP [Azure and Enterprise]. Programming models: Java OM, Streaming OM, HiveQL, PigLatin, .NET/C#/F#, (T)SQL. OCEAN OF DATA: NOSQL [unstructured, semi-structured, structured], ETL, HDFS. Sources: EIS / ERP, RDBMS, File System, OData [RSS], Azure Storage]
PROJECT ISOTOPE OFFERINGS
• Bi-directional connectors between Hadoop and SQL and PDW
• ODBC driver for Hadoop
• Hive plug-in for Excel
• Hosted elastic Hadoop service on Azure
• Microsoft’s Apache Hadoop-based solution for Windows Azure
• Microsoft’s Apache Hadoop-based solution for Windows Server
• JavaScript support for Hadoop, with web-based interactive environment
• Contributions back to the open source community via the Apache Foundation
HIVE PLUG-IN FOR EXCEL
• Connect Excel directly to Hive
• Browse Hive objects – tables, columns, etc.
• Construct and issue queries
HOSTED ELASTIC HADOOP SERVICE ON AZURE
• Elastic MapReduce, Hive, PigLatin, .NET, JavaScript, and integration with BI, DW, and Office Collaboration tools
• Simple management UI
• Full Hadoop compatibility
• Native support for Azure Blob Storage from HDFS
MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS AZURE
• One-click deployment of Hadoop on Azure cluster
MICROSOFT’S APACHE HADOOP-BASED SOLUTION FOR WINDOWS
• All standard Hadoop modules supported: Hadoop | HDFS | Pig | Hive | Monitoring Pages
• One-click installer
• Simplified cluster configuration
• Integration with Microsoft ecosystem: System Center | Active Directory | etc.
ISOTOPE.JS: OUR VB MOMENT FOR BIG DATA
• Write MapReduce jobs in JavaScript
• Interactive development environment
• Interactive data query and analytics of petascale datasets
• Hive command line for interactive Hive
• Charting and graphing for insight and analytics visualization

// MapReduce functions in JavaScript (word count)
// ------------------------------------------------------------------

// Split each input line on non-letter characters and emit (word, 1).
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

// Sum the counts emitted for one word and write the total.
var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};
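The word-count functions on this slide can be tried without a cluster. The sketch below is an illustration only: `makeIterator` and `runLocal` are hypothetical helpers, not part of Isotope.js, and they merely imitate the `context.write` and `hasNext()`/`next()` shapes that the slide's code relies on. How the hosted service actually feeds these functions is not shown in the deck.

```javascript
// Word-count map and reduce functions, as on the slide.
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};

// Hypothetical helper: wraps an array in the hasNext()/next() iterator
// shape that the reduce function expects.
function makeIterator(items) {
    var index = 0;
    return {
        hasNext: function () { return index < items.length; },
        next: function () { return items[index++]; }
    };
}

// Hypothetical helper: runs map over every line, groups the emitted values
// by key (a stand-in for the shuffle phase), then runs reduce once per key.
function runLocal(mapFn, reduceFn, lines) {
    var grouped = {};
    var mapContext = {
        write: function (key, value) {
            if (!Object.prototype.hasOwnProperty.call(grouped, key)) {
                grouped[key] = [];
            }
            grouped[key].push(value);
        }
    };
    lines.forEach(function (line, offset) {
        mapFn(offset, line, mapContext);
    });

    var results = {};
    var reduceContext = {
        write: function (key, value) { results[key] = value; }
    };
    Object.keys(grouped).sort().forEach(function (key) {
        reduceFn(key, makeIterator(grouped[key]), reduceContext);
    });
    return results;
}

// Two sample input lines, chosen only for illustration.
var counts = runLocal(map, reduce, ["Big data is here", "Hadoop is center stage"]);
console.log(counts);
```

Running this in Node.js prints an object that maps each lower-cased word to its count. Grouping by key in memory only works for small inputs; it is meant to make the map, shuffle, and reduce steps visible, not to replace a Hadoop job.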
GIVING BACK AND PARTICIPATING IN THE HADOOP COMMUNITY
• Microsoft will be working with the community to contribute back significant code to the Apache Foundation
• Microsoft has announced a partnership with Hortonworks to help accelerate our open source support
"We are excited to work with Microsoft to help make Apache Hadoop a compelling platform for storing and processing data. Hortonworks welcomes Microsoft to the Hadoop ecosystem and looks forward to lending our deep domain expertise to help accelerate the delivery of Microsoft’s Apache Hadoop-based solution for Windows Server and service for Windows Azure."
Eric Baldeschwieler, CEO, Hortonworks
SUMMARY
• Please visit HadoopOnAzure.com to start using Microsoft’s elastic services for Apache Hadoop
• Please visit www.microsoft.com/bigdata to learn more about project codename "Isotope" and the broader ecosystem of products and services Microsoft is delivering in 2012 and beyond