24 11, 2014

Building a Unified Data Pipeline in Spark

November 24th, 2014

The Spark community gave an excellent reception to the latest session of the Barcelona Spark meetup, which featured Aaron Davidson (Apache Spark committer and Software Engineer at Databricks) speaking about ‘Building a Unified Data Pipeline in Spark’. If you missed the presentation or want to revisit it, check out the recorded video here (talk in English). Some pictures of the session are included as well. Thank you very much to Aaron [...]

22 11, 2014

Strata + Hadoop World in Barcelona 2014: Videos & Slides

November 22nd, 2014

The conference is over, and from my point of view it was a great success. The conference program was very good, with great networking opportunities and a good sponsor pavilion. I really enjoyed it. Let me tell the organisers that Barcelona is delighted to welcome conferences like Strata+Hadoop, and all the attendees I spoke with were excited to be in Barcelona. Congratulations for [...]

13 11, 2014

Get certified for Apache Spark in Barcelona

November 13th, 2014

As all my students know, I think that Hadoop is showing its age and Apache Spark is exploding. Let me share with you an important opportunity to get the Developer Certification for Apache Spark in Barcelona. Yes, I said in Barcelona: at the upcoming Strata + Hadoop World next week at the CCIB (Centre de Convencions Internacional de Barcelona). If you want to learn more you can [...]

9 10, 2014

Databricks-Spark comes to Barcelona!

October 9th, 2014

We did it: a meetup with engineers coming from the USA to tell us first-hand what is cooking with Spark at Databricks! This fourth meeting will feature Aaron Davidson (Apache Spark committer and Software Engineer at Databricks) and Paco Nathan (Community Evangelism Director at Databricks), who will talk to us about 'Building a Unified Data Pipeline in Spark' (talk in English). [...]

15 07, 2014

Big Data Open Source Landscape: Processing Technologies

July 15th, 2014

Hadoop is a well-established software framework that analyses structured and unstructured big data and distributes applications across thousands of servers. Hadoop was created in 2005, and afterwards several projects appeared in the Hadoop space that tried to complement it. Sometimes those technologies overlap with each other and sometimes they are partially complementary. I will try to describe a brief map [...]

22 05, 2014

Is Hadoop showing its age?

May 22nd, 2014

In my opinion, yes! The Hadoop framework is showing its age and new processing models are a must, not only for performance but also because of its lack of flexibility. In a way, the same thing is happening in Big Data management. Due to the lack of flexibility of queries, NoSQL databases are adding new query [...]

21 04, 2014

Spark Ecosystem

April 21st, 2014

In a previous post we introduced Spark, a framework that will play an important role in the Big Data area. This page from Databricks is a good starting point for understanding what Spark is; however, let me reproduce an overview in this post. Spark runs on top of existing Hadoop clusters to provide enhanced and additional functionality. Although Hadoop [...]

20 04, 2014

Spark: Big Data Analytics Beyond Hadoop

April 20th, 2014

Hadoop is definitely the de facto standard for large-scale data processing across nearly every industry and enterprise. However, as the "Volume", "Variety" and "Velocity" of data increase, Hadoop as a batch processing framework cannot cope with the requirement for real-time analytics. As we saw in our Technology Basics for Data Scientist course, the scientific community is offering alternatives like the Storm framework, which provides event [...]

5 04, 2014

Hadoop distribution: Main Players

April 5th, 2014

MAIN PLAYERS: Apache Hadoop is the most popular framework used for processing large amounts of data in the Big Data arena. It is clear that Hadoop is here to stay. That is why I always tell my students that it is important to know how it works. For the courses I teach where we do not have lab [...]

13 12, 2013

Scholarship to pursue a PhD in our research group in Barcelona

December 13th, 2013

La Caixa scholarship to pursue a PhD in our research group in Barcelona in the area of advanced data analytics (ref. BSC-Autonomic 01/2014). The call for scholarships for doctoral studies at Spanish universities from the Obra Social la Caixa has just opened, and our research group has a researcher position to pursue the [...]

15 11, 2013

How to get started programming in Big Data-Hadoop?

November 15th, 2013

Hadoop is one of the most popular platforms in the Big Data world and probably the best gateway into programming in this new world. But because of its characteristics, many programmers may not find it easy to start working with it. That is why, when I finished writing this take-home hands-on assignment for one of the courses I teach [...]