NoSQL Databases and Polyglot Persistence: A Curated GuidemyNoSQLTumblr (3.0; @nosql)https://nosql.mypopescu.com/Autoscaling, welcome to Google Compute Engine<a href="http://googlecloudplatform.blogspot.com/2014/11/autoscaling-welcome-to-google-compute.html">Autoscaling, welcome to Google Compute Engine</a>: <blockquote> <p>Autoscaling allows customers to build more cost effective and resilient applications. Using Compute Engine Autoscaling, you can ensure that exactly the right number of Compute Engine instances are available at any given time to handle your application’s workload. This saves you money when your application’s usage is low, and ensures your application is responsive when utilization is high.</p> </blockquote> <p><strong>Autoscaling</strong> is the Holy Grail of distributed systems. The promise is that the system can adapt, both up and down, to its needs, requirements, and SLAs. Basically, the system delivers the performance demanded of it and maximum availability, both at optimal cost.</p> <p>The first step in finding this <em>Holy Grail</em> is being able to describe the needs, requirements, and SLAs of the system.</p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">Autoscaling, welcome to Google Compute Engine</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> <!--quid:7ec3849189058165104d3d6fa708d17dc84c5759-->https://nosql.mypopescu.com/post/103465069225https://nosql.mypopescu.com/post/1034650692252014年11月24日 07:41:21 -0800distributed systemsscalabilityAurora for MySQL is coming<a href="http://smalldatum.blogspot.com/2014/11/aurora-for-mysql-is-coming.html">Aurora for MySQL is coming</a>: <p>Mark Callaghan takes a look at: </p> <ol> <li>Amazon’s participation in the MySQL community: none</li> <li>some of the 
things said during the presentations: performance seems to be inflated</li> <li>compatibility with existing MySQL features, especially the InnoDB engine</li> <li>features: very similar to my <a href="http://nosql.mypopescu.com/post/102599302892/amazon-aurora-in-bullet-points" target="_blank">Amazon Aurora in bullet points</a></li> </ol> <blockquote> <p>What is Aurora? I don’t know and we might never find out. I assume it is a completely new storage engine rather than a new IO layer under InnoDB.</p> </blockquote> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">Aurora for MySQL is coming</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> <!--quid:c0448d27ce09952ddac25a72fa0742f1c026fbbc-->https://nosql.mypopescu.com/post/103454088542https://nosql.mypopescu.com/post/1034540885422014年11月24日 03:14:13 -0800AuroraAmazonMedium uses Neo4j and Go for GoSocial service<a href="https://medium.com/medium-eng/how-medium-goes-social-b7dbefa6d413">Medium uses Neo4j and Go for GoSocial service</a>: <p>Medium’s social graph is stored in Neo4j and exposed through a Go service:</p> <blockquote> <p>It makes a lot of sense to store social data in a graph database. Medium users, posts and collections are represented by graph nodes, and the edges between them describe relationships — users following users, users recommending posts, or users editing collections, to name a few common examples. Using a graph database also makes our queries simpler: we don’t have to do any complicated joins or other query wizardry.</p> </blockquote> <p>It’s hard to deny that when looking at highly connected data the first answer is <em>almost</em> always a graph database. Once the amount of data stored grows, you start thinking about how you access that data. 
In many cases, the predominant answer is not traversals.</p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">Medium uses Neo4j and Go for GoSocial service</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> <!--quid:5c1359f638cb3e50aa6f5e37c3fa72f960921ce4-->https://nosql.mypopescu.com/post/103451536694https://nosql.mypopescu.com/post/1034515366942014年11月24日 01:44:29 -0800Neo4jGographdbgraph databaseStripe's Hadoop tools open sourced<a href="https://stripe.com/blog/four-new-hadoop-projects">Stripe's Hadoop tools open sourced</a>: <p>Stripe has put on <a href="http://github.com/stripe" rel="external nofollow" target="_blank">GitHub</a> 4 Hadoop related projects they’ve developed internally:</p> <ol> <li>a dashboard for Hadoop jobs</li> <li>a Scala framework for distributed learning</li> <li>a database for serving data in SequenceFile format</li> <li>a collection of command-line utilities.</li> </ol> <p>As a side note, Stripe is using Cloudera Impala with Parquet.</p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/post/103272099222" rel="permalink" style="color:red" target="_blank">Stripe’s Hadoop tools open sourced</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p>https://nosql.mypopescu.com/post/103272099222https://nosql.mypopescu.com/post/1032720992222014年11月22日 02:48:00 -0800HadoopImpalaParquetMapReduceBigDataNoSQL databases, Hadoop, Big Data: Pinned tabs Nov.19th<p><strong><a href="http://www.b-eye-network.com/blogs/vanderlans/archives/2014/10/querygrid_is_ne.php" title="QueryGrid is New Data Federation Technology by Teradata - Blog: Rick van der Lans - BeyeNETWORK " 
id="86f5dfdb0aa3ac6aad752e6035044289df0e9dff" rel="external nofollow" target="_blank">01</a></strong>: Teradata QueryGrid is the technology that allows querying both Teradata/AsterData and external data stored in Hadoop or Oracle. <a href="#86f5dfdb0aa3ac6aad752e6035044289df0e9dff" class="ptl" target="_blank">★</a></p> <hr> <p><strong><a href="http://www.marklogic.com/press-releases/marklogic-sets-standard-for-modern-database/" title="MarkLogic Sets the Standard for Modern Database Technology | MarkLogic " id="cf879df76d55dfe5e1917b6d4704d6a2095ca83c" rel="external nofollow" target="_blank">02</a></strong>: MarkLogic 8 will bring a server-side JavaScript engine, an RDF triple store with support for SPARQL 1.1, and bitemporal data management. <a href="#cf879df76d55dfe5e1917b6d4704d6a2095ca83c" class="ptl" target="_blank">★</a></p> <p><em>I still believe that MarkLogic should position itself as a real-time search solution.</em></p> <hr> <p><strong><a href="http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery" title="What’s Coming to Cassandra in 3.0: Improved Hint Storage and Delivery : DataStax " id="d6aad96e77ebb5ba32a6a9422f13c283a9bb46d2" rel="external nofollow" target="_blank">03</a></strong>: For Cassandra 3.0, there’s a completely revamped and optimized solution for handling <strong>hinted handoff</strong> that uses a kind of commit log instead of a Cassandra system table, thus avoiding the associated overhead. <a href="#d6aad96e77ebb5ba32a6a9422f13c283a9bb46d2" class="ptl" target="_blank">★</a></p> <hr> <p><strong><a href="http://www.pcworld.idg.com.au/article/559848/hp-plugs-vertica-analytics-engine-into-hadoop/" title="HP plugs the Vertica analytics engine into Hadoop - PC World Australia " id="3e3b8702a29791b3981500908808791e495000d9" rel="external nofollow" target="_blank">04</a></strong>: YASH. Yet another SQL-on-Hadoop. This one from HP Vertica. 
<a href="#3e3b8702a29791b3981500908808791e495000d9" class="ptl" target="_blank">★</a></p> <hr> <p><strong><a href="http://www.cmswire.com/cms/big-data/mapr-teradata-ink-deal-bad-timing-for-hortonworks-027253.php" title="MapR, Teradata Ink Deal, Bad Timing for Hortonworks? " id="859c0a35eae584dcdd22d0696ae1ef7bf3c9eda5" rel="external nofollow" target="_blank">05</a></strong>: Teradata and MapR are signing a partnership to collaborate on the integration and co-development of joint products. Some might say this could affect <a href="http://nosql.mypopescu.com/post/103036005105/it-aint-easy-making-money-in-open-source-thoughts-on" target="_blank">Hortonworks’s IPO</a>. <a href="#859c0a35eae584dcdd22d0696ae1ef7bf3c9eda5" class="ptl" target="_blank">★</a></p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/post/103110030252" rel="permalink" style="color:red" target="_blank">NoSQL databases, Hadoop, Big Data: Pinned tabs Nov.19th</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p>https://nosql.mypopescu.com/post/103110030252https://nosql.mypopescu.com/post/1031100302522014年11月20日 12:41:27 -0800TeradataMarkLogicdocument databaseThe states and transitions of a Couchbase node<p>The different states and the transitions of a Couchbase node in a diagram:</p> <p><img alt="Couchbase node states and transitions" src="https://64.media.tumblr.com/f8e32e25678455fb3300db9c8f682ecb/tumblr_nf9vqrpZjj1qavt6co1_1280.jpg" width="580" height="326"/></p> <p><a href="http://blog.couchbase.com/lifecycle-node-couchbase-server-demystified-adding-removing-nodes-rebalancing-failover" rel="external nofollow" target="_blank">This post</a> describes the states and actions that can trigger the transitions. 
One interesting aspect is that state changes are not applied immediately and you can <em>commit</em> multiple such changes at once when satisfied with the new topology.</p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">The states and transitions of a Couchbase node</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> <!--quid:c81f93728a8fb611621b0dcfbcca7c0ee88a4b43-->https://nosql.mypopescu.com/post/103125141850https://nosql.mypopescu.com/post/1031251418502014年11月20日 07:29:34 -0800Couchbasekey-value storedocument databaseCan MapReduce Solve Planning Problems?<a href="https://www.voxxed.com/blog/2014/11/can-mapreduce-solve-planning-problems-3/">Can MapReduce Solve Planning Problems?</a>: <p><a href="http://en.wikipedia.org/wiki/Betteridge's_law_of_headlines" rel="external nofollow" target="_blank">Betteridge’s law of headlines</a>.</p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">Can MapReduce Solve Planning Problems?</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> <!--quid:d2e0f3b8589c10c00ce98d0d4cf0d7c177f43da9-->https://nosql.mypopescu.com/post/103116398183https://nosql.mypopescu.com/post/1031163981832014年11月20日 04:01:45 -0800MapReduceIt Ain’t Easy Making Money in Open Source: Thoughts on the Hortonworks's IPO Filing<a href="http://www.enterpriseirregulars.com/80464/aint-easy-making-money-open-source-thoughts-hortonworks-s-1/">It Ain’t Easy Making Money in Open Source: Thoughts on the Hortonworks's IPO Filing</a>: <p>Dave Kellogg’s in-depth look at Hortonworks’s IPO filing, a comparison with Red Hat’s model, and a definitely interesting 
hypothesis and conclusion:</p> <blockquote> <p>While Hadoop and big data are unarguably huge trends driving the industry and while the future of Hadoop looks very bright indeed, on reading the Hortonworks S-1, the reader is drawn to the inexorable conclusion that it’s hard to make money in open source, or more crassly, it’s hard to make money when you give the shit away.</p> </blockquote> <p>Others:</p> <ul> <li><a href="http://nosql.mypopescu.com/post/102949496417/hortonworks-ipo-why-now-or-better-who-will-benefit" target="_blank">Gartner’s Merv Adrian</a></li> <li><a href="http://nosql.mypopescu.com/post/102949958827/hortonworks-filling-for-ipo-the-marketing-of-going" target="_blank">InfoWorld’s Yves de Montcheuil</a></li> <li><a href="http://nosql.mypopescu.com/post/102342581262/game-on-hortonworks-files-for-ipo" target="_blank">myself</a></li> </ul> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">It Ain’t Easy Making Money in Open Source: Thoughts on the Hortonworks’s IPO Filing</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> <!--quid:8566fd5d9b2a4feac20e405ed99c08e1629a62a2-->https://nosql.mypopescu.com/post/103036005105https://nosql.mypopescu.com/post/1030360051052014年11月19日 04:25:17 -0800HortonworksHadoop marketCouchDB's long road to clustering<a href="http://www.infoworld.com/article/2848127/nosql/couchdb-20-counters-mongodb-with-improved-scaling.html">CouchDB's long road to clustering</a>: <p>The keyword is <em>partially</em>:</p> <blockquote> <p>CouchDB’s long road to clustering can be partially traced to conscious design decisions and philosophical choices made by CouchDB’s creators. As Lehnardt explained, "CouchDB has always said no to features that we know couldn’t be scalable in a cluster or even doable in a cluster. 
This puts us in a position to migrate upward seamlessly."</p> </blockquote> <p>Had this happened two years ago, CouchDB would actually have gotten somewhere.</p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">CouchDB’s long road to clustering</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> <!--quid:1c5b3375f7fbb1aa56528fade1b3f18bbb504c3a-->https://nosql.mypopescu.com/post/103034903968https://nosql.mypopescu.com/post/1030349039682014年11月19日 03:52:23 -0800CouchDBCloudantdocument databaseApache CouchDB 2.0 gets clustering support<blockquote> <p>At ApacheCon Europe 2014, the Apache CouchDB™ project today announced a Developer Preview release of its CouchDB 2.0 document database. The Developer Preview release brings all-new clustering technology to the Open Source NoSQL database, enabling a range of big data capabilities that include being able to store, replicate, sync, and process large amounts of data distributed across individual servers, data centers, and geographical regions in any deployment configuration, including private, hybrid, and multi-cloud.</p> </blockquote> <p>I’m not sure who wrote <a href="https://blogs.apache.org/foundation/entry/apache_couchdb_adds_clustering_and" rel="external nofollow" target="_blank">the ASF PR announcement</a>, but if it were me, I would simply have posted "Apache CouchDB 2.0 features clustering support. Finally. 
&lt;/eom&gt;"</p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">Apache CouchDB 2.0 gets clustering support</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> <!--quid:2caf77fe91db6194f2be893cf3a925e50fdb7b3d-->https://nosql.mypopescu.com/post/103034595127https://nosql.mypopescu.com/post/1030345951272014年11月19日 03:43:06 -0800CouchDBdocument databaseThe data flow and the massive historical Tweet index<a href="https://blog.twitter.com/2014/building-a-complete-tweet-index">The data flow and the massive historical Tweet index</a>: <p>We rarely have the opportunity to learn about the <em>almost</em> complete architecture and data flow for a massive data indexing solution. Twitter’s blog post covers many aspects of their indexing solution, starting with design goals and getting down to the technical details.</p> <blockquote> <p>But our long-standing goal has been to let people search through every Tweet ever published.</p> </blockquote> <p>My notes:</p> <ul> <li>half a trillion documents</li> <li>average latency under 100ms</li> <li>(super tuned) SSD used as storage</li> <li>4 components: batch data aggregation and preprocess pipeline, inverted index builder, Earlybird shards and roots; <em>what are the Earlybird roots?</em></li> <li>ingestion processes tweets in one-day batches; 
it is run every day; in this process tweets are scored and partitioned</li> <li>Hadoop for ETL: ingestion process is run on Hadoop, with the output being stored in HDFS</li> <li>Mesos is used to parallelize the inverted index creation; results are stored in HDFS</li> <li> <p>after praising the high parallelism and statelessness of the index builders, some coordination using ZooKeeper is mentioned:</p> <blockquote> <p>These inverted index builders can coordinate with each other by placing locks on ZooKeeper, which ensures that two builders don’t build the same segment. Using this approach, we rebuilt inverted indices for nearly half a trillion Tweets in only about two days (fun fact: our bottleneck is actually the Hadoop namenode).</p> </blockquote> </li> <li> <p>the Earlybird shards are the storage of the inverted index partitioned by time and then hash; partitioning by time tiers will allow growing the storage without affecting the current time tiers</p> </li> <li>the Earlybird roots are the endpoint for the client API; they forward requests to the corresponding Earlybird shards, merge results, etc;</li> <li>not very sure how Earlybird roots decide what time tiers should not receive a query</li> <li>no words about the actual Earlybird storage; can it be <a href="https://blog.twitter.com/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale" rel="external nofollow" target="_blank">Manhattan</a>?</li> <li>no details about the query processor</li> <li>this project started in 2012; the full index was completely built in 2014</li> </ul> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">The data flow and the massive historical Tweet index</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> 
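<p>A minimal sketch of the partitioning described in the notes above (time tier first, then hash), plus one guess at how a root could skip time tiers that cannot match a query. Everything here is an illustrative assumption rather than Twitter’s implementation: the tier names and boundaries, <code>PARTITIONS_PER_TIER</code>, the MD5-based hash, and the date-range filter in <code>tiers_for_query</code> are all hypothetical.</p>

```python
import hashlib
from datetime import date

# Hypothetical time tiers: (name, first day inclusive, last day inclusive).
TIERS = [
    ("tier-2006-2010", date(2006, 1, 1), date(2010, 12, 31)),
    ("tier-2011-2012", date(2011, 1, 1), date(2012, 12, 31)),
    ("tier-2013-2014", date(2013, 1, 1), date(2014, 12, 31)),
]
PARTITIONS_PER_TIER = 4  # assumed; the post does not give a number


def tier_for(day):
    """Return the name of the time tier that contains `day`."""
    for name, start, end in TIERS:
        if start <= day <= end:
            return name
    raise ValueError(f"no tier covers {day}")


def shard_for(tweet_id, day):
    """Pick a shard: time tier first, then a stable hash of the tweet id."""
    digest = hashlib.md5(str(tweet_id).encode("utf-8")).digest()
    partition = int.from_bytes(digest[:4], "big") % PARTITIONS_PER_TIER
    return (tier_for(day), partition)


def tiers_for_query(since, until):
    """Tiers whose date range overlaps [since, until]; the rest are skipped."""
    return [name for name, start, end in TIERS if start <= until and since <= end]
```

<p>With these assumed tiers, a query limited to 2012-01-01 through 2013-03-01 would be sent only to the two newest tiers, and appending a new tier to <code>TIERS</code> leaves the shard assignment of older tweets unchanged.</p>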
<!--quid:cf7eed8b8b2e8fd2945c284b07eb5d2de4131e14-->https://nosql.mypopescu.com/post/103029869612https://nosql.mypopescu.com/post/1030298696122014年11月19日 00:53:43 -0800full text indexingWhat skills is a recruiting company looking for in a data scientist<a href="http://www.burtchworks.com/2014/11/17/must-have-skills-to-become-a-data-scientist/">What skills is a recruiting company looking for in a data scientist</a>: <p>For the <em>technical</em> part the list goes like this: </p> <ol> <li>SAS and/or R</li> <li>Python</li> <li>Hadoop</li> <li>SQL</li> <li>unstructured data</li> </ol> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/post/103023903087" rel="permalink" style="color:red" target="_blank">What skills is a recruiting company looking for in a data scientist</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p>https://nosql.mypopescu.com/post/103023903087https://nosql.mypopescu.com/post/1030239030872014年11月18日 22:21:27 -0800data scienceWhy Couchbase Lite is so strategically important for you?<a href="http://www.odbms.org/blog/2014/11/mobile-data-management-interview-bob-wiederhold-2/">Why Couchbase Lite is so strategically important for you?</a>: <p>In an interview with Bob Wiederhold<sup id="fnref-2-fn-Widerhold"><a class="footnote-ref" href="#fn-2-fn-Widerhold" target="_blank">1</a></sup>, Roberto V. Zicari asks: "why Couchbase Lite is so strategically important?"</p> <blockquote> <p><em>Bob Wiederhold</em>: First, because the world is going mobile. That is indisputable. Mobile initiatives top the list of every IT department. As I said above, if you don’t have a mobile data management offering, you are not looking at the complete needs of the developer or the enterprise.</p> <p>Second, let’s level set on Couchbase Lite. 
Couchbase Lite is our offering for an embedded mobile JSON database.</p> <p>Our complete mobile offering, Couchbase Mobile, includes Couchbase Server – for data management in the cloud, and Sync Gateway for synchronization of data stored on the device with other devices, or the database in the cloud. Today, because connectivity is unknown, data synchronization challenges force developers to either choose a total online (data stored in the cloud), or total offline (data stored on the device) data management strategy.</p> </blockquote> <p>Maybe I’m seeing things from the wrong perspective:</p> <ol> <li>data syncing between the disconnected device and the central databases needs to see very low contention; resolving conflicts on the device would be much more difficult than having a server component resolve them;</li> <li>as far as I can tell, the king of storage on mobile phones is SQLite; I somehow doubt that JSON + map/reduce can beat it;</li> <li>while I’m not an expert in iOS services, I think CloudKit already covers the local-to-remote storage sync problem.</li> </ol> <p>What am I missing?</p> <div class="footnote"> <hr> <ol> <li id="fn-2-fn-Widerhold"> <p>Bob Wiederhold is the CEO of Couchbase. 
<a class="footnote-backref" href="#fnref-2-fn-Widerhold" title="Jump back to footnote 1 in the text" target="_blank">↩</a></p> </li> </ol> </div> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/post/103022354617" rel="permalink" style="color:red" target="_blank">Why Couchbase Lite is so strategically important for you?</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p>https://nosql.mypopescu.com/post/103022354617https://nosql.mypopescu.com/post/1030223546172014年11月18日 21:53:15 -0800Couchbasekey-value storedocument databaseHortonworks's filing for IPO: The marketing of going public<a href="http://www.infoworld.com/article/2847244/big-data/the-marketing-of-going-public.html">Hortonworks's filing for IPO: The marketing of going public</a>: <p>Pretty much the <a href="http://nosql.mypopescu.com/post/102949496417/hortonworks-ipo-why-now-or-better-who-will-benefit" target="_blank">same perspective about Hortonworks’s filing for IPO</a> from Yves de Montcheuil (<em>InfoWorld</em>):</p> <blockquote> <p>By filing first among Hadoop distribution vendors, Hortonworks is guaranteed to get the lion’s share of publicity for the foreseeable future. Any competitor who follows suit will be perceived as a copycat. 
And since it’s unlikely that said competitors can produce a more attractive balance sheet anyway, they would pretty much be in the same type of criticism.</p> </blockquote> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/post/102949958827" rel="permalink" style="color:red" target="_blank">Hortonwork’s filling for IPO: The marketing of going public</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p>https://nosql.mypopescu.com/post/102949958827https://nosql.mypopescu.com/post/1029499588272014年11月18日 02:25:28 -0800HortonworksHadoop marketHortonworks IPO - Why Now? Or better, who will benefit from the IPO<a href="http://blogs.gartner.com/merv-adrian/2014/11/17/hortonworks-ipo-why-now/">Hortonworks IPO - Why Now? Or better, who will benefit from the IPO</a>: <p>Merv Adrian is looking at 3 possible reasons for <a href="http://nosql.mypopescu.com/post/102342581262/game-on-hortonworks-files-for-ipo" target="_blank">Hortonworks’s filing for IPO</a> by switching the <em>why</em> question to <em>who will benefit</em> from this IPO. As for the <em>why now</em> part, <a href="http://nosql.mypopescu.com/post/102342581262/game-on-hortonworks-files-for-ipo" target="_blank">the main question I’ve also asked myself</a>, this seems to be the general answer:</p> <blockquote> <p>Ultimately, it’s unlikely that Hortonworks will be alone as a public company for long. MapR told the Wall Street Journal they want to IPO next year, and they claim to have more customers, high margins and "efficient cash management." Cloudera says they "are not ready yet" though they have lower rate of losses, and also claim more customers. At the end of the day, the answer may be rather simple. And again, answering a question with a question: if not now, when? 
There may not be a better time.</p> </blockquote> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/post/102949496417" rel="permalink" style="color:red" target="_blank">Hortonworks IPO - Why Now? Or better, who will benefit from the IPO</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p>https://nosql.mypopescu.com/post/102949496417https://nosql.mypopescu.com/post/1029494964172014年11月18日 02:07:00 -0800HortonworksDesign consideration for Kayos messaging and durable queueing<a href="https://github.com/Damienkatz/Kayos-Design/blob/master/kayosdesign.md">Design consideration for Kayos messaging and durable queueing</a>: <p>More details about <a href="http://nosql.mypopescu.com/post/102048750227/nosql-databases-hadoop-big-data-pinned-tabs-nov-7th#43b2139d76dede91787422d1aaf6f875ec0ffce4" target="_blank">Damien Katz’s new message queue project</a>: it has a name, Kayos, and some goals:</p> <blockquote> <p>Build a fast, low cost, fault tolerant messaging and queueing system that offers predictable performance and can take advantage of high end dedicated hardware as well as unreliable, commodity infrastructure like EC2. 
We want to support message de-duplication (newer versions of messages eliminate older versions) while also maintaining strict consistency (ordered synchronous delivery), causal consistency (ordered asynchronous delivery) and eventual consistency (unordered asynchonous delivery).</p> </blockquote> <p>At the end of the long road ahead, "<em>Shit be awesome yo</em>".</p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">Design consideration for Kayos messaging and durable queueing</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> <!--quid:d30466f0a64d0af7d8a4c606cdf2bf90f3361ccd-->https://nosql.mypopescu.com/post/102870100290https://nosql.mypopescu.com/post/1028701002902014年11月17日 04:58:23 -0800KayosKafka and Samza: Distributed stream processing in practice<p>Fantastic slide deck from Martin Kleppmann. These 2 screenshots below are a good summary of the talk, but I strongly encourage you to go through the 42 slides. <em>Totally worth the time</em>.</p> <p><img alt="Kafka and Samza: distributed stream processing in practice" src="https://64.media.tumblr.com/77017ba31fb28c52314e9e29b6d75ea3/tumblr_nf4f6p0NUN1qavt6co1_1280.jpg" width="580" height="432"/></p> <p><img alt="Kafka and Samza: distributed stream processing in practice" src="https://64.media.tumblr.com/95d4672a7494be3b2921275c6f299388/tumblr_nf4f7uS3PU1qavt6co1_1280.jpg" width="580" height="432"/></p> <p><a href="http://nosql.mypopescu.com/post/1209550007/nosql-databases-and-the-unix-philosophy" target="_blank">The parallel between the Unix philosophy and the new (big) data solutions</a> shows up <a href="http://nosql.mypopescu.com/post/102666392027/what-do-you-have-to-say-for-the-skeptics-of-hadoop-who" rel="external nofollow" target="_blank">quite frequently</a>. 
There’s an inherent extra complexity in the big data platform due to their distributed nature. But for some of these tools the rule of <em>"doing one thing and doing it well"</em> was relaxed; maybe too relaxed. And in some cases there’s less than optimal openness towards integration.</p> <div class="embedded smartembed speakerdeck"> <script async class="speakerdeck-embed" data-id="d34613904cb2013218e606b8621c13fd" data-ratio="1.33333333333333" src="//speakerdeck.com/assets/embed.js"></script> <div class="smartembed-ref-speakerdeck"><a href="https://speakerdeck.com/ept/kafka-and-samza-distributed-stream-processing-in-practice" rel="nofollow external" target="_blank">Kafka and Samza: Distributed stream processing in practice</a></div> </div> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/" rel="permalink" style="color:red" target="_blank">Kafka and Samza: Distributed stream processing in practice</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p> <!--quid:64e10e083404a467d9093a5cbab2f322aeedf521-->https://nosql.mypopescu.com/post/102868229019https://nosql.mypopescu.com/post/1028682290192014年11月17日 04:06:00 -0800KafkaSamzaWhat do you have to say for the skeptics of Hadoop who think that the ecosystem is getting too complex with too many overlapping projects doing almost similar things?<a href="http://www.infoq.com/news/2014/11/hortonworks-enterprise-push">What do you have to say for the skeptics of Hadoop who think that the ecosystem is getting too complex with too many overlapping projects doing almost similar things?</a>: <blockquote> <p>There is a truth to the point of growing complexity of the entire ecosystem but there is also a misattribution of the complexity that comes with it.</p> <p>Unlike many other unified single-stack architectures that came before, the Hadoop platform is built around individual layers 
of individual responsibilities. This is the Unix philosophy; each of these layers is built in order to perform one thing and one thing well. This not only helps in delineating responsibilities, but it also helps in a much faster evolution. Remember that several different open developer communities are working on each layer. Sometimes, this does mean there are two or more disjoint sets of developers that work on the same layer, but that’s okay – either each of those projects carve out their niche or the single best project simply emerges. In a truly open community, a meritocracy, no single vendor ultimately decides the best approach.</p> </blockquote> <p>The other side of the coin is that to get things working you either have to put a lot of time and money into it or you’ll need to use one of the vendors’ distros. There’s nothing wrong with having vendor distros (polish, automation, testing, and documentation are always welcome), but their raison d’être shouldn’t just be the complexity of the environment. Ideally, setting things up should be possible without too much hassle. 
But the Linux world proves that the convenience of distros cannot be challenged.</p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/post/102666392027" rel="permalink" style="color:red" target="_blank">What do you have to say for the skeptics of Hadoop who think that the ecosystem is getting too complex with too many overlapping projects doing almost similar things?</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p>https://nosql.mypopescu.com/post/102666392027https://nosql.mypopescu.com/post/1026663920272014年11月14日 20:43:43 -0800HadoopMapReduceBigDataCan hard drives' failure be predicted?<a href="https://www.backblaze.com/blog/hard-drive-smart-stats/">Can hard drives' failure be predicted?</a>: <p>Hardware failure is one of the major causes of system failure and, implicitly, of deteriorating quality of service. Predicting hardware failures would allow taking proactive measures, thus reducing the chances of downtime.</p> <p>Unfortunately, for a large number of hardware components this is not possible. <strong>But</strong> Backblaze, the company providing a consumer online backup solution, has published some results showing that hard drive failures <strong>can be predicted</strong>, and that by analysing only 5 metrics (out of over 70 available):</p> <blockquote> <p>From experience, we have found the following 5 SMART metrics indicate impending disk drive failure:</p> <ul> <li>SMART 5 – Reallocated_Sector_Count.</li> <li>SMART 187 – Reported_Uncorrectable_Errors.</li> <li>SMART 188 – Command_Timeout.</li> <li>SMART 197 – Current_Pending_Sector_Count.</li> <li>SMART 198 – Offline_Uncorrectable.</li> </ul> </blockquote> <p>The rest of the post dives into each of these. 
If other large cluster users—I’m thinking of Amazon, Facebook, Google, Microsoft here—could back these findings, the results could have a significant impact on operating storage. </p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/post/102623883667" rel="permalink" style="color:red" target="_blank">Can hard drives’ failure be predicted?</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" target="_blank">NoSQL database</a>©myNoSQL)</p>https://nosql.mypopescu.com/post/102623883667https://nosql.mypopescu.com/post/1026238836672014年11月14日 10:27:00 -0800Amazon Aurora in bullet points<ul> <li>relational database engine</li> <li>part of the Amazon Relational Database Service products (i.e. fully managed database)</li> <li>MySQL-compatible</li> <li>supports migrating data from Amazon RDS MySQL</li> <li>auto-scaling storage in 10GB increments and up to 64TB</li> <li>uses SSD-powered storage</li> <li>automatically replicated on 3 availability zones with 2 replicas per AZ</li> <li>replicas share storage with the primary instance</li> <li>can have up to 15 replicas improving read throughput</li> <li>writes require quorum</li> <li><em>I read this somewhere but cannot find it anymore</em>: writes: 100k/s, reads: 500k/s</li> <li>continuous backups with 1-second granularity point-in-time restoration</li> <li>backups go to Amazon S3</li> <li>designed for 99.99% availability</li> </ul> <p>The rest of the story can be read in <a href="http://aws.amazon.com/blogs/aws/highly-scalable-mysql-compat-rds-db-engine/" rel="external nofollow" target="_blank">Jeff Barr’s post</a>.</p> <p class="cc" style="font-style: italic; font-size: 0.9em;"> Original title and link: <a href="http://nosql.mypopescu.com/post/102599302892" rel="permalink" style="color:red" target="_blank">Amazon Aurora in bullet points</a> (<a href="http://nosql.mypopescu.com" style="display:none;visibility:hidden;" 
target="_blank">NoSQL database</a>©myNoSQL)</p>https://nosql.mypopescu.com/post/102599302892https://nosql.mypopescu.com/post/1025993028922014年11月14日 00:53:43 -0800AuroraAmazon
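<p>The replication bullets in the Aurora list above imply six copies of the data (3 availability zones with 2 copies each). As a rough worked example, assuming a simple majority write quorum (the list does not state the quorum size, so the formula below is an assumption, and the function names are made up for illustration):</p>

```python
ZONES = 3            # from the bullet list
COPIES_PER_ZONE = 2  # from the bullet list
TOTAL_COPIES = ZONES * COPIES_PER_ZONE


def majority_quorum(copies):
    """Smallest number of acknowledgements that forms a strict majority."""
    return copies // 2 + 1


# Assumption: writes need a simple majority of all copies.
WRITE_QUORUM = majority_quorum(TOTAL_COPIES)


def can_write(failed_copies):
    """True if enough copies remain reachable to acknowledge a write."""
    return TOTAL_COPIES - failed_copies >= WRITE_QUORUM
```

<p>Under this assumption a write needs 4 of 6 acknowledgements, so losing a whole availability zone (2 copies) still leaves writes possible, while losing a zone plus one more copy does not.</p>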
