Wednesday, June 1, 2011

Review of “Data Analysis with Open Source Tools” by Philipp K. Janert

9:11 by Rafael Flores · 0 comments

An in-depth book on data analysis with graph tools.

First of all, I have to state clearly that this book has “run over me” :) It is a very good and comprehensive exercise by Mr Janert on how to produce “readable” graphs (read: information) on top of massive data volumes, all with open source tools such as gnuplot, matplotlib, R, NumPy, Chaco, etc. So what made it somewhat hard for me? The fact that I mistook it for what it is not: this is not a book of samples or “how-to” code that you can easily run in your app (HTML- or OS-based). Instead it goes much deeper than that, explaining the math that supports the data analysis, much of the statistical theory underlying the data analysis processes, etc. Don’t get me wrong: I really think that’s of great value, and I thank Mr Janert for that. But given that I read the book on my Kindle during commutes or trips, it’s been a bit “tough” to get into the “thinking mode” that is required to fully appreciate the value of this book. So, if you plan to read it at home or in a quiet place, giving it the care and attention it deserves, I am sure you’ll find the book a great one.

There are many good things about it too: the Workshops provided are very good step-by-step descriptions of the process Mr Janert follows to solve each problem. Given that the subject of the book is dense, as I said, this seems like the best way to help the reader understand what has been discussed.

Many kinds of graphs (jitter plots, scatter plots, mosaic plots, Kohonen maps, etc.) and the logic underlying them (logarithms, Pareto, regression, estimation, Monte Carlo simulation, etc.) are covered in this book. So I find it a great source of information that works perfectly as a reference book when developing a project that requires graphical analysis tools on big volumes of data.