30 06, 2014

Google launches DataFlow (a successor to MapReduce)

June 30th, 2014

I'm in San Francisco, ready to attend the 2014 Spark Summit tomorrow. As I have already mentioned on this blog, Apache Spark is one technology that has emerged as a potential alternative to MapReduce/Hadoop. But it seems that it is not the only one. Last week, also here in San Francisco, at its Google I/O 2014 conference, Google unveiled its successor [...]

31 05, 2014

Adaptive MapReduce Scheduling in Shared Environments

May 31st, 2014

Jordà Polo presented our latest research on MapReduce at the 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, held in Chicago. In this paper we present a MapReduce task scheduler for shared environments in which MapReduce is executed along with other resource-consuming workloads, such as transactional applications. All workloads may potentially share the same data store, some [...]

22 05, 2014

Is Hadoop showing its age?

May 22nd, 2014

In my opinion, yes! The Hadoop framework is showing its age, and new processing models are a must, not only for performance but also because of Hadoop's lack of flexibility. In a way, the same thing is happening in Big Data management. Due to the lack of flexibility of queries, NoSQL databases are adding new query [...]

21 04, 2014

Spark Ecosystem

April 21st, 2014

In a previous post we introduced Spark, a framework that will play an important role in the Big Data area. A good starting point for understanding what Spark is can be found on this page from Databricks; however, let me reproduce an overview in this post. Spark runs on top of existing Hadoop clusters to provide enhanced and additional functionality. Although Hadoop [...]

20 04, 2014

Spark: Big Data Analytics Beyond Hadoop

April 20th, 2014

Hadoop is definitely the de facto standard for large-scale data processing across nearly every industry and enterprise. However, as the "Volume", "Variety" and "Velocity" of data increase, Hadoop as a batch processing framework cannot cope with the requirement for real-time analytics. As we saw in our Technology Basics for Data Scientist course, the scientific community is offering alternatives such as the Storm framework, which provides event [...]

13 12, 2011

Big Data: An opportunity for entrepreneurs and companies

December 13th, 2011

The emergence of Linux empowered innovative developers and, together with the Linux, Apache, MySQL and PHP software stack (LAMP, which completely changed the web application landscape), allowed them to build powerful web servers from open source code. All of this led to the creation of new companies in the ICT sector, being [...]