Monday, June 11, 2012
The Art of Readable Code from O'Reilly is a quick, easy read with a lot of useful ideas for new programmers. It weighs in at 180 pages, but generous use of whitespace and a number of (mostly on-topic) comic panels make it seem shorter.
Part one covers naming, code layout, and writing comments; parts two and three cover the meat of refactoring; and part four discusses testing and gives an example of applying the ideas in the book to a small coding project.
The book's examples are in C++, Python, Java, and JavaScript. I would have appreciated seeing some examples in other languages as well (Haskell or Scala might be good candidates), especially where that language might obviate or change the advice given.
Truth in advertising: O'Reilly sent me a free copy of this book to review.
Thursday, February 16, 2012
The Art of R: interview and mini-review
The Art of R Programming is an approachable guide to the R programming language. While tutorial in nature, it should also serve as a reference.
Author Norman Matloff comes from an academic background, and this shows through in the text. His writing is formal, well organized, and tends toward a pedagogical style. This is not a breezy, conversational book.
Matloff approaches R from a programmer's perspective, rather than a statistician's. This approach shows through in several of the chapters: Ch 9, Object-Oriented Programming; Ch 13, Debugging; Ch 14, Performance Enhancement; Ch 15, Interfacing R to Other Languages; and Ch 16, Parallel R. I do wish he had spoken to using R with Ruby as well as C/C++ and Python. I also would have liked to see a chapter on Functional Programming with R, especially after the teaser in the Introduction.
I asked Norm and an R-using friend if they could help me get my head around things a little better, and the following mini-interview is the result.
Almost every language has some kind of math support. Why bother with R? Where does it fit in a programmer's toolkit?
Norm: It's crucial to have matrix support, not necessarily in terms of linear algebra operations but at least having good matrix subsetting capability. MATLAB and the Python extension NumPy have this, but I'm not sure how far they go with it. And since MATLAB is not a free product (in fact very expensive) I'm summarily excluding it anyway. :-)
Second, R has a very rich graphics capability, which really sets it apart from the others. You can see some nice examples (with the underlying R code) in The R Graph Gallery.
Third, R is "statistically correct." It was created by top professional statisticians in industry and academia.
Russel: As something of a polyglot, I find that each language comes with something of an attitude of how problems should be approached. The grammatical structure and keyword vocabulary of each language drives a way of thinking about problems, as well as what sorts of libraries must be created to cover what may be base structures and functions in other languages. R has a particularly rich data representation vocabulary which lends itself very nicely to a data-centric problem solving mindset. While many more general-purpose languages can, with appropriate libraries, deal well with data, R reduces the cognitive load required for working with multidimensional data sets. In my (relatively limited) work with R, I've come to think of R as a domain-specific language that happens to have some general-purpose functionality, while other languages such as Ruby, Python, Perl, etc., are general-purpose languages with many domain-specific libraries.
I really feel drawn to the idea that languages drive approaches to problem solving. It reminds me of the PragProg idea of a language of the year. With that in mind, what do you think a dynamic language (Perl, Python, Ruby, etc.) programmer is going to find new and different in R? What about a programmer coming from a systems programming language (C, C++, etc.)?
Russel: There is much in R which is from the "dynamic language" camp you mentioned: dynamically typed variables, an interactive shell, dynamically loaded libraries, etc. These will be pretty quickly noticeable to a C/C++/Java/C# programmer.
The structure and forethought enforced by those languages are part of their value proposition: they push programmers into design paradigms and ways of thinking that scale up well, while dynamic languages, with their looser syntax rules, do not enforce that sort of engineering discipline on the programmer. For highly organized people who think in very structured ways, dynamic languages are "freeing", while less structured thinkers can find that the lack of enforced structure puts a lot of onus on them to be disciplined in their coding as program sizes grow. For example, a simple flat namespace is great for a small program of a few dozen lines, but namespacing becomes much more important as programs grow to thousands of lines and dozens of individual functions or components -- especially as programs become the shared workspace of multiple programmers.
I personally use R as a dynamic language, most of the time not even writing programs in it so much as using it in interpreted mode for data analysis and "analysis prototyping." In that sense, R does for data analysis what dynamic languages do for task automation: it allows you to easily play with scenarios and prototype your thinking about data quickly and easily. You can then codify the best of those techniques into a small (or large) program that can automate that work for various data sets.
Similarly, R has a very powerful and interactive help system. Most packages not only have a quickly available set of API and help documents, but also sample data sets built right into the library. From the command line, R users can get examples of how to use almost any library, with sample data included specifically for that particular library.
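To give a flavor of what that looks like in practice, here's a quick sketch of my own (not an example from the interview) using base R's built-in help system and the mtcars sample data set that ships with every R install:

```r
# Pull up the help page for linear models
?lm

# Run the worked examples straight from lm's own help page
example(lm)

# Load a sample data set bundled with base R, then peek at it
data(mtcars)
head(mtcars)   # first six rows of the Motor Trend car data
```

Everything above is part of a stock R installation; most CRAN packages follow the same pattern, shipping their own help pages, runnable examples, and sample data.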
R has some inconsistencies from its history that can make it feel more "old school" in some ways. For example, there are two object models, and the older (S3-style) object model is widely used in older libraries. However, it's nowhere near as "bolted-on" as in languages like Perl or C. R has an extremely rich set of libraries easily available via CRAN (a la CPAN), but the flip side of this wealth is that these libraries work in many different ways, expecting data in various formats, etc. Again, it's not as spotty as CPAN or the Python Cheese Shop, or even PEAR (most packages are quite good), but it can leave some beginners feeling a little lost when they want to accomplish a certain task. That's pretty common in the open source world, of course, but can be an issue.
R's rich first-class data types build a foundation that is nicely added to by the various libraries and simple interactive shell. Enough libraries are written in native code that performance is generally top notch. For my part, I almost always find that the available libraries far exceed my generally limited statistical needs, so I rarely find myself needing to rewrite some particular statistical code. I'm not a statistician, so I find it quite valuable to not have to worry about that aspect of the work I'm doing in any given project. Additionally, the rich libraries generally spur me on to doing a richer analysis of the data than I would if I did not have such a fully-featured tool available.
Norm, in the Introduction of your book, you talk about R as a functional language. I wish there had been a chapter on this. Can you give some examples of what you mean? Russel, do you have any thoughts about R as an FP language?
Russel: Many languages have recognized the value of functional constructs and added at least simple implementations of lambda and map functions, first-class functions, and the like. FP is generally considered to be more easily parallelized, and should thus scale better on modern multi-core and CUDA-like systems. This will be quite advantageous in large data processing jobs.
Norm: Every operation in R is a function. For instance, the operation

y = x[5]

is really a call to the "[" function:

y = "["(x, 5)

This is brought up throughout the book, starting with the vector chapter.
The biggest implication of this, in my opinion, is in performance. One can often speed up a computation by a factor in the hundreds by exploiting the FP nature of R.
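As a rough illustration of the kind of speedup Norm alludes to (this sketch is mine, not an example from the book): the loop below invokes the "+" function from interpreted code once per element, while sum() does the same work in a single call into compiled code, which is where the large speedups come from.

```r
x <- as.numeric(1:100000)

# Interpreted loop: one "+" function call per element
slow_sum <- function(v) {
  total <- 0
  for (i in seq_along(v)) {
    total <- total + v[i]
  }
  total
}

# Vectorized: a single call that loops in compiled code
stopifnot(slow_sum(x) == sum(x))

# Compare system.time(slow_sum(x)) with system.time(sum(x))
# to see the gap on your own machine.
```

The same pattern applies to R's apply family and vectorized arithmetic generally: pushing the iteration down into a built-in function is usually the first performance fix to try.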
What are some of the things you've done with R that show off its power and/or niche?
Russel: R works beautifully for many types of data analysis problems. I recently used R to generate annotated graphs of Bayesian content filter scorings against timestamps, with lowess smoothing, regression lines, and other enhancements, all built into the graphs without additional effort. This was done for all permutations of the 5 variables used in the study, which had tens of thousands of data points. I was using this as a script because of my need to regenerate the graphs repeatedly, but before I'd codified that process, I used R in a "tweak and go" sort of way, as R lends itself well to ad hoc data exploration. Adding and removing data attributes, filtering data, generating data models, regressions, etc., are all easy to do on the fly.
Norm: A fun application I've done is R code to analyze the differences and similarities between the various dialects of Chinese. It can be used as a learning aid for those who know one Chinese dialect but not another. This is an example in my book, in the chapter on data frames.
If you're interested in adding R to your arsenal of programming tools, this is a great way to get started.
Truth in posting—No Starch Press sent me a free copy of this book to review.
Wednesday, March 23, 2011
Review - Eloquent Ruby
Watching everyone else reading and learning made me want to get in on the action. Fortunately, there's a new book out from Russ Olsen (@russolsen) — Eloquent Ruby. I had the opportunity to interview Russ (look for it to show up soon) about his book, and Addison-Wesley was kind enough to send me a copy of it.
Eloquent Ruby is primarily aimed at people coming to Ruby from other languages. It aims to explore and explain the idioms common to our community, and I think it does a great job of it.
It will serve current Rubyists well too. I learned several things from it as I read, and cemented other concepts as well. Some of the book's explanations have crept their way into discussions about Ruby here at work. It's good stuff.
Section One, on basics, is a rich source of little gems that will help you use Ruby's built-in classes more effectively. Section Two, on modules and classes, helps you build better classes of your own. Section Three, which covers metaprogramming, dives into an oft-discussed but underused side of Ruby to push your Ruby-fu to the limit. Section Four covers a variety of things that either don't fit into the other sections or build on concepts from them.
With this book, Russ has hit another one out of the park. Go grab a copy for yourself.
If you're interested in Design Patterns in Ruby you can read my review here. You can also read a previous interview with Russ.
Wednesday, September 08, 2010
Reading List Update 9/8/2010
The recent news that GDB now supports D makes The D Programming Language jump up a notch or two on my reading list.
I've finished 52 Loaves: One Man's Relentless Pursuit of Truth, Meaning, and a Perfect Crust; it was a fun read. I really identified with his trip to the French monastery. It seemed like a great climax to his year, with the perfect denouement as he came home to bake his final loaves.
With a chance to get involved with a local restaurant group (not behind the counter though), the food books are still winning out in my what-to-read-next decisions.
Tuesday, August 31, 2010
My Reading List on 8/31/2010
Thanks to Prentice Hall and Addison-Wesley, who gave me three new books, my reading list has bulked back up. Here's what I'm working through at the moment:
- The Freebies
- UNIX and Linux System Administration Handbook (4th Edition) — I'm really excited about this one; I've loved the first three editions, and this looks like a really solid revamping of a classic in the sysadmin field.
- The D Programming Language — I'd like to see how well D stacks up to C and C++ (though I've got pretty minimal chops with either).
- Distributed Programming with Ruby — another Ruby book? I've always got room for one of those.
- Three More I'm Working On.
- The Little MLer — I'm tired of waiting for a good OCaml book, and this looks like the best option for getting up to speed in the ML world.
- Charcuterie: The Craft of Salting, Smoking, and Curing — yeah, besides languages and communities, I like to hack food. I've had a couple of goes (each) at jerky, bacon, and sausage. Now, it's time to take a step up to the big leagues.
- 52 Loaves: One Man's Relentless Pursuit of Truth, Meaning, and a Perfect Crust — Another fun food book.
Tuesday, October 13, 2009
Leveraging Books
As I talk about leveraging community to be more effective at what you do, let's start out with books. I think this is a good theme to develop because it really shows how the three levels of passive, engaged, and committed involvement provide successively more benefit. Books are also an easy gateway into improving yourself because people are used to reading as a learning method — we did it in school, and we're used to picking up a book on a programming language and moving on from there.
Just picking up a book and reading it is a pretty passive approach. You're letting the author push information to you without doing anything to better assimilate it. Even at this level there are some things you can do though:
- reading intentionally, as espoused in The Passionate Programmer
- working on exercises presented in the book, or that you come up with yourself
- or just taking notes in the margins or in a lab book about how you plan on using the ideas presented.
To really get the most out of the book, it helps to work with other people. Join a reading group (or start one). You don't have to be super formal about it, just get together with some friends over lunch or on-line. Set up a reading schedule and talk about it. Joshua Kerievsky has put together a great guide to book study groups. Even if you're going for something less structured than he discusses, there are some great ideas to be mined there.
It takes more effort, and sometimes means stepping out of your comfort zone, to be committed rather than just passively involved. The rewards are tremendous though. I'd encourage everyone to use books to become better at whatever it is you do. What books are you reading/studying? What are you doing to wring more value out of them?
Tuesday, July 07, 2009
Finding or Keeping a Tech Job -- An interview with Andy Lester and Chad Fowler
Andy Lester (@theworkinggeek) and Chad Fowler (@chadfowler), the authors of Land the Tech Job You Love and The Passionate Programmer, respectively, agreed to do a joint interview with me. It was a lot of fun to talk with these guys; I hope you enjoy reading this interview as much as I enjoyed doing it.
Your books look like great companions to each other. Did you interact at all when writing them?
Chad We didn't interact much, no. We actually "met" each other because we were writing complementary books. It was good fortune more than anything else. Surprisingly, we don't overlap much in content despite the lack of a coordinated effort.
Andy I was happy when I found out that Chad was updating "My Job Went To India", because that book was one of my reasons for writing my book. It was inspiring to me to see an author who had such a positive, proactive way to look at one's career. I remember reading it, and on every page there was one of those "Yeah, that's exactly right!" moments.
There are a lot of people who hack code, fewer who hack communities, and still fewer who hack themselves. Your books (along with a few others) seem aimed at this last group. Why do you think people are more willing to work on code than on habits, health, or career?
Andy Because it's uncomfortable to admit that you might have areas for improvement. When you're improving code, you're working on something with no feelings. Besides, cleaning up code, even your own, shows positive results in only a few minutes. Changing your habits takes time, and is difficult, never mind having to admit that you might be imperfect.
Chad I think there's also a common belief that we can change everything except ourselves. That we're somehow stuck with the "self" we were born with and it's a static thing. It's a self-fulfilling misconception in that the commonly held nature of the belief makes it indeed harder to change ourselves than to change the things around us. But paradoxically, your self is the one thing you have complete and total control over.
It's also scarier to tackle big stuff like health, habits, and career than it is to work on relatively inconsequential things like code. I semi-recently read The War of Art and my major takeaway was that we tend to procrastinate the things that are most important to us, because we're afraid of tackling them. So, says the book, you can figure out which things are important by looking at which big things you're avoiding. Interesting idea. Not universally true but I've found it valuable to keep in mind anyway.
I think doing things intentionally is an underlying theme in both books. The stress of day to day work, or being out of work often pushes our intentions out the window. What advice can you give us about keeping focused on them?
Chad One thing I've learned over years of putting a lot of thought into how to best manage my career is that I am always going to be "too busy" to do the important things. I think that's true for most people. Parkinson's Law applies outside the workplace just as well as it does to an individual project or set of tasks. Almost everyone I know is "busy" no matter what they've got going on.
Andy It's all about Quadrant Two.
In The 7 Habits of Highly Effective People, Stephen Covey draws a square with two axes: Importance and Urgency. Things that are important and urgent get done automatically. Things that aren't important shouldn't get done. That leaves one quadrant, Quadrant Two, which is where you find activities that are important but not urgent. The problem is carving the time out of your day or your week to do those things.
Parkinson's Law takes hold here as Chad says, but we also find that it's amplified by urgency of everything else. The new website has to go live by the first day of the trade show, or a security patch sends us scrambling to update a dozen servers, or your boss needs a new report by the end of the week. All that urgency leaves us drained, but ignoring the Quadrant Two activities that allow us to move quickly when things get crazy only makes the urgency all the more stressful.
Chad So it's really a matter of prioritizing (sorry for the obvious answer). One thing I do not advocate is cutting down on fun, relaxation, health, or family time. Those are the natural things to let slip when you're faced with stress (after you let career development slip). But when you sacrifice the "living" part of life, you burn out fast. And from my experience, burnout is the fastest ticket to mediocrity.
So if burnout is the fastest way to being unremarkable, it's the thing we have to attack most ruthlessly. How do you do it? I don't have a definitive answer but my current philosophy is that if you're faced with too much to do and are stressing, you should take inventory of all the stuff you don't want to do and stop doing those things.
It may sound like I'm advocating laziness — and I am, but in the same way Larry Wall does. Some of the stuff we hate doing really isn't worth doing. We can just stop doing it. But most of what we hate doing still needs to be done. As programmers, we have a unique advantage here: automation. We don't have to rely on tools others have created to automate our work. We can do it ourselves.
Andy As a long-time Perl user and advocate, it's no surprise I love Larry's view on laziness. The corollary to laziness is the idea that "machines should work, people should think." Any time you're spending doing some sort of work that the machine should be doing is wasted time, because you're spending your brain power, which is incredibly valuable, doing things that can be done with computer power, which is cheap and getting cheaper every day. We have these amazingly powerful tools, if we just put the time and mental effort into making them do even more.
The barrier to entry, however, is the unwillingness to increase our knowledge to let us know how to make the computer do a given task, or to take the time to make it happen. Larry calls this "false laziness." If you've ever said "It's faster for me to retype this than to hack together a conversion tool for this data", that might well be false laziness. What about the next time you need to convert similar data? It doesn't take many iterations before you say "Geez, I should have spent the time up front."
It takes determination to get over the hump, but after a while you get into the groove and you find yourself saying "I don't mind doing extra work up front because I know I'll get more brain cycles back days down the road." More free brain cycles = less burnout.
Chad If you start automating everything in the workplace, you'll not only make your life better, freeing yourself to focus on the important task of leveling up as a software developer. You'll also save your company money, reduce the turnaround time for tasks, and reduce the chance of human errors.
I think it was Martin Fowler who said, "If you can't change your organization, change your organization". How do we choose between investing in ourselves (in place) and investing in finding a better place?
Andy It's simple, if not easy: cost-benefit analysis. The problem most people run into is not knowing what benefits they are looking for. If we only go by reflex, we might look at salary, perceived job security, and the technical specifics of the job we have and the one we might be going to. Unfortunately, those three factors are rarely the only relevant issues. What about your co-workers? Work/life balance? The industry you'll be working in?
The one crucial point I tell anyone who is unhappy with a job and looking to go elsewhere is to make sure that when they finally look to make the jump, they are moving to somewhere good, not running away from somewhere bad. Running away leaves you in a position of weakness. You're more likely to make desperate moves, take unnecessary risks, and accept jobs or salaries that you would ordinarily turn down.
Chad Andy's point about going to good vs. running from bad is profoundly right.
I'd add that it's easier to hope for external forces to change your situation than to change your own situation. So I'd recommend starting with the assumption that there is a better way to perceive and respond to any work situation until that proves to be incorrect.
Also, I think a lot of "knowledge workers" tend to develop an unhealthy attachment and relationship to their employers. We come to think of a job as a place to go and live. Therefore we establish a sense of entitlement about how things should be and how "fair" work life should be. Ultimately, the employer/employee relationship is a series of business transactions. It's not a family or a home. Remembering that it's a business relationship can help you make better decisions both for yourself and for your job.
Lots of people don't understand the technology job space. What makes it so different than working in other industries?
Andy As far as job hunting, your peers are as smart as you are, and you're held to a higher standard than most other industries. The hiring manager is probably as much of a geek as you are, and is going to scrutinize you like the detail-oriented person she is. We also understand the Internet in a way that other professionals usually don't. For example, I've seen plenty of articles for non-techie job hunters that warn that your online activity could be checked out by a potential employer, so watch out with those drunken frat party photos on MySpace. Show that article to anyone in our industry and he'll say "Duh, of course."
We also have to perform at a higher level to show that we can do the job. It's not enough to come in for an interview, answer some questions, and hope you get picked. You need to show that you can do the job, either by showing prior work that you've done, or by telling compelling stories about your background. If you're not going to take the time and effort to step up the level at which you're job hunting, someone else will, and you'll be shut out.
Chad I don't think it's very different from other "knowledge work" industries. The one big difference is that technologists (particularly programmers) have the ability to build up a portfolio of work that anyone can use. Software is everywhere, so it's possible for a programmer to create something that can touch literally anyone's life. And software doesn't cost per-unit like, say, a piece of furniture does. This means that if a software developer creates something on his or her own time, it can be shared at no cost with anyone they want to share it with. From the perspective of the job market, this is a powerful tool. There aren't many industries where a candidate could give a piece of their work to every potential employer to actually keep and use. I think programmers should take advantage of this and create Free software to distribute as part of the job search process. (There are countless other benefits to creating Free software that I'm not focusing on here obviously).
Andy Samples of your work are crucial to the job search. I think that anyone hiring programmers who doesn't see samples of the code written by the candidate is crazy. You wouldn't hire a chef without tasting his cooking, so why are programmers different? Ten-minute exercises at the interview like "Write a function to do X" may weed out the bottom feeders, but seeing a sizable chunk of code tells so much more. I can tell so much about a programmer by reading five pages of code that verbal discussion just doesn't bring out. How does she name her variables? Does she create beautiful code? What gets documented? Has the code been maintained, or is it a pile of hacks and bolt-ons?
Even if the hiring manager doesn't ask for code samples at the interview, bring them with you anyway. Seeing samples of your good work lowers the risk in the mind of the manager. Given two candidates who are roughly similar in skills, but only one can show evidence of his working abilities, who do you think the manager is going to pick?
The problem with code samples is that many people are not at liberty to disclose what they've worked on in their jobs. You don't have to work for a military contractor or a stealth startup to run into this problem. Working on Free Software or open source software is your way around this. You can work on existing projects or start one of your own, and your code is available for anyone in the world to see, especially your future employer.
If you've read each other's book, what's the best advice you took from it?
Chad What I like about Andy's book is how tactical and detailed it gets. For example, his description of the actual interview process is spot on. For those of us who haven't gone through interviewing in a long time, it presents a clear walk-through of what to expect on a real interview. It's like actually being there but with Andy sitting by you giving advice the whole time. For me, I can imagine that taking the pressure out of the situation if I were nervous about an interview. If you follow all of his advice on the interview day, you'll likely be one of the most prepared candidates the interviewers ever see.
Andy The advice about making sure that you're the worst guy in the room is tough. It struck me when I first read it in the first version, and I try always to remember it.
The idea comes from jazz guitarist Pat Metheny's advice to "always be the worst guy in any band you're in," because you'll be surrounded by better players and will naturally play better and will learn along the way. When you're the best player in a band, you're less likely to learn from others.
I find myself mostly applying this to open source projects, where I'm surrounded by fantastic programmers from around the world. I need to remember to appreciate the skills of those around me, and learn as much as I can. The tough part is that it's so intimidating.
A lot of hackers find the non-programming aspects of our jobs unpleasant or worse. What's been the hardest non-programming task to adapt to for you?
Andy As always, it's dealing with other people, and remembering the robustness principle of "Be conservative in what you do; be liberal in what you accept from others" when dealing with others from work.
For being conservative on output, it's always tough to remember that our geek argot and our overly direct way of saying things can be off-putting to others. It's especially dangerous because one slip of the tongue can leave you marked as a jerk for quite a while.
Easier, although still frustrating, is being liberal in what I accept as input. Say I've got someone who's reporting a bug in the software, and he says "Yeah, I tried to update a batch, and it didn't work," and I've got to go through the process of "How did it not work?" and "What specifically did you try?" and all the classic debugging questions. When I first started programming, I'd be so mad that the person I was talking to didn't say all the right things, or didn't share my thought process. It took a while to learn to accept that.
These days they're minor annoyances, but they don't bother me. It's like being annoyed at the rain, but not letting it ruin my day.
Chad For me it was probably before I became a programmer. But it still applies, because it happened while I was a hacker of a different sort: a musician. At nights I worked my "real" career as a professional saxophonist. In the mornings, I had an extra job as a forklift operator. I actually loved both jobs. Some days I even preferred the forklift job.
I was really good at the forklift job. As a part-time contractor, I got great satisfaction from beating all of the full-time employees on the truck dock in productivity every day by a significant percentage. I would basically get to work and run for my entire shift, either on foot or virtually on a forklift. It was a rewarding job, and the bosses were taking notice. They wanted to bring me on full time, which would give me benefits and a great deal more pay. Especially given my musician's living, it was a tempting offer.
But I was painfully introverted. I was so shy, in fact, that it was almost a problem on the truck dock. People constantly took shots at me because I was such a pushover and so obviously uncomfortable around people.
I knew this was going to be a life-long limitation if I didn't do something about it.
So I quit my beloved fork lift job and started working as a waiter.
It was a miserable experience. At first I panicked daily at having to interact with group after group of strangers while also juggling their orders, their special requests, and (physically juggling) their plates. I was the worst waiter I've ever witnessed. I would sometimes leave work with less money than when I started, due to the way waiters have to give a percentage of sales to the supporting restaurant staff.
But over time, though I never really became a passable waiter (great respect to all of you who have ever done that job successfully!), that experience gave me the comfort interacting with people that has probably been the single most important career development move I will ever make. I threw myself way out of my comfort zone and have become the sort of person who is (hilariously) described as "the most extroverted programmer ever" and that sort of silly thing. It's been the key to my success in corporate environments, as well as the change that enables me to do things like organize and speak at conferences, give training, do on-site consulting for clients, and basically most of what I make my living at now.
What about you? What's been the hardest thing for you to pull into your work-a-day lives?
Monday, June 01, 2009
Communities, Publishers, and Conferences.
JavaOne [h]as a *very* different feel than that of a Ruby show, obviously :P — Leah Silber
Really starting to believe that small, short, regional conferences are the way to go. — Andrew O'Brien
I think events of many sizes can be worthwhile -- they just have different profiles and risks/rewards. — David Black
Some of the discussion recently on Twitter has made me think about how we organized MountainWest RubyConf. We've been very focused on keeping it intimate and engaging. From the comments I hear, it seems like we did a good job. It certainly feels like we've got the community behind us, and that can only help us get better each year.
It also reminded me of some of my earlier writing about publishing. I think there's a lot of overlap in building community for a conference and for a book/publisher.
To me, I guess it boils down to this: conferences and books with strong ties to the community feel better. What do you think?
This post is part of a collection of articles about Publishing, Growing Markets, and Books.
Monday, April 13, 2009
Book Review: Beautiful Architecture
I've been reading O'Reilly's Beautiful Architecture lately. While I'm not as sold on it as I was on editor Diomidis Spinellis's earlier book, Code Reading[1], it's still a keeper.
Spinellis and his co-editor Georgios Gousios have done a good job of selecting interesting essayists and of putting their works together into a collection that feels solid. Reading it will certainly make you think more about your own project's architecture.
In the Preface, the editors put forward a collection of Principles, Properties, and Structures for architecture. These would be a great way to index the contents of the book. Chapters 3-12 (covering Enterprise, Systems, and End User Applications) each begin with a table showing which of these principles, properties, and structures they touch on. Sadly, that's the extent of the use to which they're put. I would have loved an appendix guiding me to the sections of the essays with more detail on 'Entropy Resistance', 'Buildability', or 'Dependency'.
Like most anthologies, it has chapters that different people will like more than others. To me, some of the real winners are Peter Goodliffe's "A Tale of Two Systems", Jim Blandy's "GNU EMACS", and Till Adam and Mirko Boehm's "When the Bazaar Sets Out to Build Cathedrals". There's also plenty of meat at the OS or Enterprise level if that's where you'd rather read. Which chapters stood out to you?
Whether it's something you're building on the weekends or on the job, Beautiful Architecture will certainly do your code base good.
1 I wrote about Code Reading in my blog post Three More Good Books a while ago.
Thursday, April 02, 2009
Book Review: Real World Haskell
"Real World Haskell? Isn't that an oxymoron?" I heard the question asked in one way or another many times as I lugged the book between meetings (looking for spare minutes to read it). As the authors explained in my Real World Haskell interview, Functional Programming languages generally, and Haskell specifically, might once have been confined to the ivory tower, but no longer. And this book is one great way to help bring the benefits of Haskell to your coding projects.
I'm still not sold on Static vs. Dynamic Typing, and Ruby remains my language of choice, but I've got to say that Haskell is not nearly as intimidating as it once was. Maybe with enough intentional use it will be a tool I reach for more often without having to think about it.
Real World Haskell is a big, solid book with a lot to commend it. It's well organized, easy to read, and loaded with good examples. Best of all, it's written by long-term members of the Haskell community, so you're getting idiomatic code and well reasoned explanations by guys who have been there. The only down side is a somewhat weak index.
If you're a rubyist looking to understand more about Haskell or Functional Programming in general, this is the book for you. In fact, if you're a rubyist, you should be looking at this kind of book in general. I can't wait to see RWH reading groups start up within Ruby Brigades ... it will certainly make us better programmers.
This post is sponsored by Cowork Utah, a member-based alternative for “cubicle averse” solopreneurs, writers, web developers, designers, and other independents who prefer working in a stimulating environment away from home. We’re free agents who collaborate while maintaining our autonomy. Specifically, our primary focus revolves around web 2.0 social media tools and strategies.
Thursday, March 26, 2009
Author Interview - Venkat Subramaniam
Here's the third of three Scala book interviews. Venkat Subramaniam (@venkat_s), author of the Prags' Programming Scala, has a lot to say about Scala and why you should be learning and using it.
You can find more posts about Scala (and Haskell and Erlang) on my Functional Programming page.
Functional Programming languages seem to be an increasingly popular topic. Why? What do FP languages teach or do for us?
Venkat Functional Programming has been around for a long time; programmers have spent the past few decades rediscovering it.
A few decades ago, low level languages like Assembly and C were popular. As OOP and C++ became prominent, it became increasingly clear that memory management is quite unmanageable. We raised the bar and let the platform deal with memory management and garbage collection. We figured that by relying on a higher level of abstraction we could focus our effort on application development rather than memory management. This, among other things, brought languages like Java and C# to prominence. When programming in these languages, we don't have to worry about memory management anymore.
As we moved on to create highly responsive applications, we began to realize that dealing with threading and concurrency is another real pain. As soon as you learn how to create a thread, the next thing you learn is how to limit and control it. Multithreaded applications are plagued with issues related to contention, deadlocks, correctness, and the complexity of the low level APIs. The growing availability of multicore processors and multiprocessors is only exacerbating these concerns. We're eager to put an end to concurrency issues the way we took care of memory management. This is where functional programming comes in.
Functional Programming advocates assignment-less programming and higher order functions with no side effects. By higher order functions, we mean functions that accept functions as parameters. You pass functions, return functions, and nest functions. Furthermore, these functions rely solely on the input you send them. They're not influenced by any external shared state and, in turn, do not affect any external shared state. They promote immutability. This eliminates the concern of contention and the need for synchronization, and it makes it easier to understand and verify the behavior of such functions. However, threads still need to communicate and exchange information. Functional Programming languages like Scala provide an actor based message passing model to realize this.
What makes Scala the right FP language for people to pick up?
Venkat Scala is a relatively new language; other prominent FP languages came before it. However, quite a few things make Scala special and attractive. It is by far the most powerful statically typed, highly expressive, yet concise language that runs on the Java Virtual Machine (JVM). Scala is fully object oriented, intermixes with Java seamlessly, and provides very strong static typing without the ceremony that Java adds. Though Scala is entirely a new language, for a moment entertain the thought that it's a refactored version of Java, stripped of its ceremony and threading API, then enhanced with sensible type inference and an actor based concurrency model. If Java were redesigned in the 21st century, it might look a lot like Scala. So, if you are programming on the JVM and are interested in taking advantage of functional programming capabilities, Scala is a great choice.
Why is the JVM a good platform for Scala and FP?
Venkat Or should we pose the question the other way around and ask why Scala and FP are a good choice on the JVM? I think these two questions go hand in hand.
The real strength of Java is not the language but the platform, the JVM. There are three things going for it. First, the capability of the virtual machine itself is superb in terms of performance and scalability. It did not start out that way; however, we've seen tremendous improvements over the past few years. Second, a rich set of libraries and frameworks is available for the various tasks that enterprise applications demand. Name it, and there is something available already for you (the concern in this space is often the availability of too many options, not too few!). Third, influenced by the two factors above, a significant number of enterprise applications already run on the JVM platform.
Some developers have the luxury of starting out on green field projects and do not have to integrate with other existing applications and components. They have the ability to choose any language and style they desire.
However, a large number of developers don't have that choice. They develop applications that run on and integrate with things on the JVM. Migrating to another language or another platform to enjoy the benefits of functional programming is really not an option for them. Scala provides a tremendous opportunity here. They can stay right on the powerful platform they're on, continue to develop and maintain their existing Java applications, and at the same time take advantage of functional programming and the other related benefits that Scala brings to the platform.
Since Scala code compiles down to bytecode and integrates seamlessly with Java, you can take advantage of Scala to the extent that you desire and that makes sense for the project. You can have a small part of your application written in Scala and the rest in Java. As you get comfortable and make the paradigm shift, more and more code, and even entire applications, can be in Scala. All the while, you can continue to integrate with and utilize other components and services running on the JVM. So, the JVM is not simply a good platform. It is a compelling platform on which to utilize and benefit from a functional style of programming.
What kinds of problems are a good fit for Scala?
Scala's strengths are strong, sensible typing; conciseness; pattern matching; and functional style. The first three can help with your traditional programming needs. It will take you less code to achieve day to day tasks like working with XML, parsing data, manipulating collections of data, ... Scala can help you do the heavy lifting with less effort. The last strength I mentioned, along with the actor based model, will help you develop highly concurrent applications without the pain and perils of thread synchronization. On one hand, your concurrent application will be smaller in size. On the other hand, it will be devoid of the uncertainty about program correctness that comes from state contention and deadlocking.
Design patterns and algorithms look different when implemented in different languages. Can you give us some examples of elegant patterns or algorithms in Scala?
The language idioms have a great influence on the design. A number of patterns are easier to implement in Java than in C++. Similarly, languages that support closures and provide conciseness make lots of things easier and more elegant.
Let's explore one example.
You are asked to total values in a range from 1 to a given number. You can write a simple loop to do this. Now, you are asked to total only select values, say only even numbers. You could duplicate the code that totals and put a condition statement in it. Now, you are asked to support different criteria, select even number, or select odd numbers, or prime numbers, etc. Spend a few minutes thinking about how you can achieve this in Java.
I can think of two ways to implement this in Java.
Approach 1:
Create an abstract class with an abstract method that evaluates whether a value should be selected. Use this abstract method in determining the total. This follows the template method pattern. Here is the code:
// Java code
public abstract class TotalHelper {
  abstract public boolean isOKToUse(int number);

  public int totalSelectValuesInRange(int upperLimit) {
    int sum = 0;
    for (int i = 1; i <= upperLimit; i++) {
      if (isOKToUse(i)) sum += i;
    }
    return sum;
  }
}
Now to use this method, you have to extend the class and override the isOKToUse() method. Each time you need a different criterion, you have to write yet another class. Quite a bit of work.
Approach 2:
Create an interface with the isOKToUse() method and call that method from within the totalSelectValuesInRange() method as shown below:
//Java code
public interface OKToUse {
public boolean isOKToUse(int number);
}
//Java code
public class TotalHelper {
  public int totalSelectValuesInRange(int upperLimit, OKToUse okToUse) {
    int sum = 0;
    for (int i = 1; i <= upperLimit; i++) {
      if (okToUse.isOKToUse(i)) sum += i;
    }
    return sum;
  }
}
In this case you don't have to create a separate class that derives from TotalHelper; however, you do have to create (inner) classes that implement the OKToUse interface. Here you are using the strategy pattern to solve the problem.
While both of the above approaches work, either one takes a lot of code to implement. You can use an IDE to generate quite a bit of it, but that is still a lot of code you have to look at and maintain.
Let's see how you can solve the above problem in Scala. We will follow the lines of the second approach above, but without the interface. Since you can pass functions to functions, the code will be a lot simpler to write. At the same time, since Scala is statically typed, you have compile time type checking to ensure you are dealing with the right types in your code. Let's take a look:
def totalSelectValuesInRange(upperLimit: Int, isOKToUse: Int => Boolean) = {
val range = 1 to upperLimit
  (0 /: range) { (sum, number) =>
    sum + (if (isOKToUse(number)) number else 0)
  }
}
The totalSelectValuesInRange() function takes two parameters. The first one is the upper limit for the range you want to work with. The second parameter is a function which accepts an Int and returns a Boolean. Within the totalSelectValuesInRange() function, you create a range and for each element in the range, you include the element in the sum if it passes the criteria evaluated by the function given in the parameter.
Here are two examples of using the above code:
Console println "Total of even numbers from 1 to 10 is " +
totalSelectValuesInRange(10, _ % 2 == 0)
Console println "Total of odd numbers from 1 to 10 is " +
totalSelectValuesInRange(10, _ % 2 == 1)
The output from the above code is shown below:
>scala totalExample.scala
Total of even numbers from 1 to 10 is 30
Total of odd numbers from 1 to 10 is 25
When looking at the above example, don't focus on the syntax; focus on the semantics. You can't program in any language without understanding its syntax, but once you have a grasp of the syntax, you can navigate the code quickly. This is where the language's conciseness helps. There are quite a few interesting things to observe in the above example.
- The code is very concise.
- You did not spend time creating interfaces. Instead you passed functions around.
- You did not specify the type repeatedly. When you called totalSelectValuesInRange, Scala verified that the operation you perform on the parameter to the function (represented by the underscore) is a valid operation on an Int. For example, if you wrote
totalSelectValuesInRange(10, _.length()> 0)
Scala will give you a compile time error as shown below:
>scala totalExample.scala
(fragment of totalExample.scala):15: error: value length is not a member of Int
totalSelectValuesInRange(10, _.length() > 0)
                             ^
one error found
!!! discarding <script preamble>
Notice how it recognized that the parameter represented by _ is an Int.
- You did not perform a single assignment (val represents immutable data; you can't change it or assign to it once you create it). In the Java example you initialized sum to 0 and then continued to update it. In the Scala example, however, you did not assign a value to any variable at all. This last feature comes in very handy when dealing with concurrency.
Imagine for a minute that the numbers you'd like to total are the values of your stock holdings. You can write a little code that concurrently fetches the stock prices from the web and determines the total value you have invested. You can then total these values without any concurrency issues. And you'd be surprised: you can do that in about two dozen lines of Scala code, as shown below.
import scala.actors._
import Actor._
val symbols = Map("AAPL" -> 200, "GOOG" -> 500, "IBM" -> 300)
val receiver = self
symbols.foreach { stock =>
val (symbol, units) = stock
actor { receiver ! getTotalValue(symbol, units) }
}
Console println "Total worth of your investment as of today is " + totalWorth()
def getTotalValue(symbol : String, units : Int) = {
  val url = "http://ichart.finance.yahoo.com/table.csv?s=" + symbol +
    "&a=00&b=01&c=" + ((new java.util.Date()).getYear() + 1900) // getYear() returns the year minus 1900
val data = io.Source.fromURL(url).mkString
units * data.split("\n")(1).split(",")(4).toDouble
}
def totalWorth() = {
val range = 1 to symbols.size
  (0.0 /: range) { (sum, index) =>
    sum + receiveWithin(10000) { case price : Double => price }
  }
}
The actors helped you dispatch separate concurrent calls to the Yahoo web service to fetch the price of each symbol you hold. You multiplied the response you got by the number of units to determine the total value. Finally, you messaged that back to the calling actor. In the totalWorth() method you received the responses from those calls using the receiveWithin() method and added them up.
Friday, March 06, 2009
Author Interview: Assaf Arkin - Ruby In Practice
With the recent release of Ruby in Practice, I've contacted both Assaf Arkin (@assaf) and Jeremy McAnally (@jm) to do some interviews about their book. These will be posted on my best posts about the best books page.
Assaf got his answers back to me first, and I liked them so much I decided to post them as a stand alone interview. Look for Jeremy's responses soon. Until then, enjoy Assaf's!
This book has been a long time coming, how does it feel to see it finally hitting the book shelves?
Assaf Like sitting down for a cup of water after a long run.
If you had a chance to start it today, what are the big things you'd want to address?
Assaf Talk more about BDD. I've liked RSpec since I first started using it, but the latest releases are such a leap in making specs easier to write and maintain. That, and Cucumber, which I'm now getting into. It works at a higher level, describing scenarios or features, and it means that you can drive tests directly from the design and have that design be part of the source code, not in a separate, unmaintained file.
Background processing is something I missed when I first got into Ruby. With Java I had more options for queuing, scheduling, and batching. That has changed, and now when I'm evaluating options for a new project, there are a lot of good ones to choose from. That would make great material for a chapter in the book.
Ruby recently got a serious kick in the XML pants. First libxml came back from the dead with renewed force, then Nokogiri showed up from nowhere, and hpricot made huge strides. I know people who looked at Ruby before, saw REXML, and decided to go elsewhere. It's time for them to reconsider Ruby.
I really like where ActiveRecord is going, though I'm always very cautious around ORMs because of their tendency to turn relational databases into dumb object stores. I'm using relational databases because they're relational, and I'd rather see the language take on some of these relational aspects. I think Ruby can do interesting things here because it's not so OO dogmatic and is much more malleable. I use ActiveRecord outside Rails a lot, and that shows in the book, where we just use ActiveRecord whenever we need a database in a context other than the database chapter. I think a chapter about advanced ActiveRecord techniques would be great.
Not entirely related to Ruby, but I think this will interest a lot of people: the development workflow. We talk about specs and testing and how to use these as a starting point to build and refactor code. Git hit a soft spot with Ruby developers, and there are some practices worth talking about. Also the different tools you would use to take code from development through testing, staging, and finally production deployment.
Phusion Passenger. We had to wrap up the book before we could cover it.
What would you drop?
Assaf Tough. We tried to cover things that other books don't talk about, so I'll have to look at the books that have just come out before I can answer that, but if there's a good book on a subject, I would drop that subject.
Except for testing/BDD. The RSpec book is coming out soon and I would recommend it, but I would still have a chapter devoted to testing. I think testing should be talked about more often, and I take every opportunity to introduce people to BDD as a better way to design and build your application.
What's the most exciting thing happening in Ruby and its community today?
Assaf For me, Rails 2.3, which is coming out soon. For the apps I'm working on, 2.3 is a major step forward in smoothing out the rough bumps of previous releases. I'm continually amazed by the things that are happening around Rails. By intentionally not solving all the world's problems, it's leading to an amazing ecosystem of add-ons and plugins. My latest favorite is Scrooge, which inspects your queries and what you do with the results, builds a profile out of that, and uses it to optimize your code.
Testing. Manual testing sucks the life out of me, and so do bad test frameworks. I love what you can do with Ruby, even the prospect of using Ruby code to test Java applications. I mentioned RSpec and Cucumber; there's also Shoulda and Webrat, and just a lot of continuous improvement that keeps making my life easier.
Ruby 1.9. Faster, better, but not entirely backwards compatible, so that's still work in progress waiting for all the libraries to catch up.
Github. It's a fabulous service that puts all others to shame. It's a social network built around sharing code, and a great tool when working in teams, open source or not. It's also an amazing resource. Sourceforge and even more modern services like Google Code force you to make multiple mouse clicks before you can see the first line of code. Github puts it right there in front of you on the welcome page. It's telling you "come, look at the code, learn from the knowledge of others".
What's next for you?
Assaf Two projects I'm currently involved with which are taking Ruby in interesting directions.
The first is Singleshot. It's a task manager or, if you prefer the workflow term, a worklist manager. It gives you a task list that you can manage, allows you to delegate, and also allows various applications to delegate tasks to you. It's classic enterprise technology, and we're trying to make something that puts usability at its forefront. That, and being a test case for a Web API that takes hypermedia seriously.
The other is Buildr, a build system that we're using for large applications that have a build cycle. So yes, we're using Ruby code to build applications written mostly in languages like Java, Scala, and Flex. Ruby is ideal here because you can use it declaratively, at the level of defining what a project is, but also for grunt work like moving files around or calling command line tools. It's the first strictly Ruby project going on at Apache.
Other than the fact that you're both great guys, why should Rubyists run out and shell out their hard earned money for this book?
Assaf Because they're lazy. I know a lot of people who come to Ruby with a strong background in Java and the mentality that you need big APIs, big frameworks, dependency injection, XML configuration, and scalable failover containers, and a thousand lines of code later you've managed to read a list of names from a file and sort them alphabetically.
Complexity kills, so one thing I tried to show in the book is how you can size the complexity of the solution to the complexity of the problem. Maybe, instead of setting up an ESB with connectors on each side and a pipeline in the middle, you can write a few lines of Ruby script and get the same job done in a tenth of the time. Even when, and this is one of my favorite examples, the job requires moving XML messages from WebSphere MQ to Salesforce. Ruby will get you there faster and cheaper than big architecture tooling.
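Assaf's read-names-from-a-file-and-sort-them job really is only a couple of lines of Ruby. A minimal sketch (the file contents and names here are invented for illustration):

```ruby
require "tempfile"

# Hypothetical input: one name per line.
file = Tempfile.new("names")
file.write("Yukihiro\nAssaf\nJeremy\n")
file.rewind

# The whole read-clean-sort job, sized to the problem.
sorted = file.read.lines.map(&:strip).sort
puts sorted
```

No frameworks, no configuration; the solution is exactly as big as the problem.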
Monday, February 16, 2009
Matt Bauer Interview
A little over a year ago, I picked up a copy of Visualizing Data, and wrote a review of it. About a month ago, I discovered that Matt Bauer (@mattbauer) was writing Data Processing and Visualization with Ruby. The book's out as a rough cut, but not yet on bookshelves. It looks plenty interesting though, so I asked Matt to join me for a quick interview.
Update: Just in case you want a direct link to the rough cut, it's here.
Data visualization seems to be an increasingly popular topic. What are some interesting ways in which you're seeing it used?
Matt I think some of the interactive animations of data are really impressive. They're a great way to show a lot of data and its interactions. The recent code swarm videos are an example of this. I've also seen animations that loop and allow various data dimensions to be added and removed to see whether they have an effect on an aggregate. It's a great way to quickly identify which data dimension is responsible for some observed change. Take, for example, an international shipping company that's having 10% of its shipments from Hong Kong arrive late to San Diego. The company likely has a lot of data, such as origin, vessel, crew, inspections, route, weather, destination ports, times, contents, maintenance records, and a ton more data dimensions. Using a looping animation that shows the path of packages on a global map over time, various data dimensions or groups of data dimensions could be added to see if the path color (representing delay time) changes at all. You could even do a Minard style map. The ability to interact with the data is a much faster way to understand it than looking at a number of individual static graphs.
Converting data into audio is also an interesting way to represent large amounts of data when looking for abnormalities. The idea is rather simple actually. Each data dimension is a separate track or instrument, with the overall beat determined by one dimension. For example, requests per second could determine the beat, drums the database activity, and a hi-hat the memcached cache misses. It can take some time to create a pleasant enough orchestration, but once the right instruments are assigned to the data dimensions, it makes it incredibly easy to hear problems. It's much like a mechanic listening to an engine and knowing whether it's working properly. Again, this works best for doing a quick check of a system, such as when a user calls, since listening to it all the time is more likely to cause a headache than avoid one.
How does Tufte fit in to all of this?
Matt Tufte really comes into play for that second set of graphs. If his ideas and principles are followed, you should have a successful graph, illustration, table, report, etc. That's not to say to use only the graphs, illustrations, tables, and reports he uses. It's to say to come up with your own graphs, illustrations, tables, and reports that work with the data you have. Just make sure you stay true to his ideas and principles.
Can you give us a quick walk through of your approach to finding the right visualization for a dataset?
Matt My approach is twofold, as there are two graphs (graph sets) for most data. The first set of graphs is for figuring out what the hell the data is. It could have a logarithmic distribution, maybe exponential. Maybe four of the variables are dependent but the other two aren't. The point is, you need a number of graphs to figure it out. I often start with a simple scatter plot and go from there. This isn't so bad with software like Tableau or other graphing programs. Once I know what I'm looking at, I move on to the second set of graphs. The purpose of the second set is to sell the next person on what you see in the data as quickly as possible. It's the second set of graphs that takes the most time.
Can you talk to me a bit about the commonalities and differences between data mining, collective intelligence, and the kind of data processing you're writing about?
Matt Collective intelligence is made up of multiple components: cognition, cooperation, and coordination. Of the three parts, data mining can be used to provide cognition. That is, data mining, or determining patterns from data, can be used to predict future events, which is a necessary part of collective intelligence. What I'm writing about is dimensional data modeling, which is the technique used to enable data warehousing and data mining. When I talk to less technical people, I tell them I'm writing a book on how to use all the data they collect to make business decisions that will result in increased profits. The book starts with a couple of chapters on dimensional data modeling theory. It then shows how to implement the theories in an RDBMS and query them using ActiveRecord. ActiveRecord works, but it's not the best pattern to use. As a result, I next talk about using Coal, a dimensional data modeling framework I've developed and used on a number of projects. I'm in the process of extracting it and open sourcing it; soon, I hope. I also talk about extracting, transforming (cleaning up/normalizing), and loading data to and from various systems. The book ends with discussions of visualization techniques ranging from sparklines to MPEG videos.
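The shape of dimensional modeling Matt describes can be sketched in plain Ruby: a fact table of numeric measures keyed into dimension tables, rolled up along a dimension attribute. This is a toy illustration of the idea (all table names and values invented), not Matt's Coal framework or his book's ActiveRecord approach:

```ruby
# Dimension table: surrogate key => descriptive attributes.
DIM_PRODUCT = {
  1 => { name: "widget", category: "hardware" },
  2 => { name: "gadget", category: "hardware" },
  3 => { name: "ebook",  category: "digital"  }
}

# Fact table: each row holds foreign keys plus numeric measures.
FACT_SALES = [
  { product_id: 1, units: 10, revenue: 100.0 },
  { product_id: 2, units:  5, revenue:  75.0 },
  { product_id: 3, units: 20, revenue:  60.0 }
]

# Roll revenue up along the product dimension's category attribute.
def revenue_by_category
  FACT_SALES.group_by { |row| DIM_PRODUCT[row[:product_id]][:category] }
            .transform_values { |rows| rows.sum { |r| r[:revenue] } }
end
```

In a warehouse, the star-schema JOIN plus GROUP BY does the same roll-up over millions of fact rows.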
How does dimensional data modeling fit into non-relational DBs (e.g., CouchDB or BerkeleyDB, which you mentioned earlier)?
Matt The most popular non-general-purpose RDBMS systems out there are probably the OLAP systems from companies like Microsoft, Oracle, and IBM. I'm not positive, but I think they're often a general purpose RDBMS with additional code for doing cubes and aggregations quickly. CouchDB and BerkeleyDB, as you mention, aren't RDBMS systems. BerkeleyDB is, for the most part, a really excellent, fast, highly concurrent B-tree and hash table. That's not to belittle it; it's just the best way to explain it. It's a great place to start if you want to build a database system yourself. In fact, MySQL in the beginning used it as its backend. You could use BerkeleyDB as a store for dimensional data. One thing to remember, though, is that BerkeleyDB doesn't have a query language. So unless you have a fixed set of queries, you'll likely have to write code to break your query language down into gets and puts for BerkeleyDB to work. CouchDB could work as a dimensional data store too, but I don't think I would use it. It's the same reason I don't like most DBs out there as dimensional data stores: they store the data inefficiently for the task at hand. Most RDBMSs are row stores, meaning they store all the attributes (columns) of a row together. This works great for transactional systems where most calls are like User.find(1) and you need to operate on the entire state of the User model. It's not great when you're just concerned with the age attribute for all rows. The real solution is to use a column store like MonetDB or Vertica. I personally would like to build a better open source one but am having problems finding the time. With a column store, each column in a row is stored separately on disk. This makes a query on a column for all rows very fast. It also allows for great compression and encoding. Column stores have shown 100x-1000x improvements compared to row stores.
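The row-store versus column-store distinction is easy to picture in Ruby terms. This is a toy model of the layouts, not how a real engine arranges bytes on disk; the records are invented:

```ruby
# Row store: all attributes of each record kept together.
row_store = [
  { id: 1, name: "Ann", age: 34 },
  { id: 2, name: "Bob", age: 28 },
  { id: 3, name: "Cat", age: 41 }
]

# Column store: each attribute kept as its own contiguous array.
column_store = {
  id:   [1, 2, 3],
  name: ["Ann", "Bob", "Cat"],
  age:  [34, 28, 41]
}

# "Average age over all rows": the row store walks every whole record...
row_avg = row_store.sum { |r| r[:age] }.fdiv(row_store.size)

# ...while the column store touches only the one age array.
col_avg = column_store[:age].sum.fdiv(column_store[:age].size)
```

On disk the difference is dramatic: the column scan reads only the bytes of the age column, which is where the 100x-1000x figures come from.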
Ruby isn't the fastest language around, what makes it the right language for data processing and visualization?
Matt Ruby doesn't have the fastest execution time, but I'd argue no language is going to have a fast enough execution time. The truth is that when processing large datasets you often run into physical limitations. For example, a 100GB dataset on a Fibre Channel drive theoretically takes about 2 minutes just to read. So even before you add code, you're looking at a minimum of 2 minutes. A faster language cannot change that. So in order to speed things up you have to look for better algorithms, such as optimized B-trees, encoding, compression, indexes, projections, etc.
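That back-of-the-envelope number is easy to check. A quick sketch, assuming roughly 800 MB/s of sustained sequential throughput for the drive (my figure for illustration, not Matt's):

```ruby
dataset_mb = 100 * 1000  # a 100GB dataset, expressed in MB
throughput = 800.0       # assumed sustained read speed in MB/s

seconds = dataset_mb / throughput
minutes = seconds / 60.0

puts "#{minutes.round(1)} minutes just to read the data" # about 2 minutes
```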
So if execution time isn't the deciding factor, why Ruby? Why not Java, C, or Erlang? I think there are two main reasons. The first is Ruby's ability to easily access and transform data, and to integrate with almost anything. The success of a data processing project often rests on the quality and quantity of the data to process. Ruby, with its scripting ability and large number of gems, makes it easy to create programs to fetch data from a variety of databases, web services, web sites (scraping), ftp sites, etc. Of course, data from multiple sources often has different names for the same thing, and this is also where Ruby shines. Ruby's regular expressions, blocks, and dynamic typing make data transformation much easier than in other languages.
The second main reason to use Ruby is to interact with the diverse set of components needed to query and report on data sets. This includes everything from data stores like BerkeleyDB, PostgreSQL, and Vertica, to visualization libraries like Processing, Graphviz, ImageMagick, and FFmpeg. In short, you can use the best tool for each job and control them all with a single language: Ruby.
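The kind of name reconciliation Matt describes might look something like this in Ruby; the source field names are hypothetical:

```ruby
# Three sources naming the same attributes three different ways.
records = [
  { "First Name" => "Ada",   "LAST_NAME" => "Lovelace" },
  { "fname"      => "Alan",  "lname"     => "Turing" },
  { "first-name" => "Grace", "last-name" => "Hopper" }
]

# Regular expressions map each source's spelling to one canonical key.
CANONICAL = {
  /\A(first[\s_-]?name|fname)\z/i => :first_name,
  /\A(last[\s_-]?name|lname)\z/i  => :last_name
}

clean = records.map do |rec|
  rec.each_with_object({}) do |(key, value), out|
    CANONICAL.each { |pattern, name| out[name] = value if key =~ pattern }
  end
end

clean.each { |r| puts "#{r[:first_name]} #{r[:last_name]}" }
```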
What makes you the right person to write about it?
Matt I've been intrigued with data ever since college. My degree is actually in biochemistry, but I worked at the Space Science Engineering Center providing support to scientists as they studied weather data from satellites, sea buoys, inframeters, Antarctic ice cores, and other remote sensing equipment. Most of the work was done in C or Fortran, and the data was typically structured as a number of matrices, some of which were absolutely huge. It wasn't uncommon for a program to take four days to run on the latest SGI Origin hardware available. After college I worked for a number of places that dealt with very large datasets, including the United States Postal Service and later the Federal Reserve, and now I work as a consultant. Over the years I've had to build everything from databases to visualization systems. I've also spent a lot of time working with the end users of such systems to understand how they interact with data, through everything from typical 2D graphs to complex animations to completely immersive cave systems. It's quite easy to have two people interpret the exact same complex data visualization completely differently. I know what makes a successful project, and maybe more importantly, I also know what guarantees complete failure.
Monday, February 09, 2009
Real World Haskell: Pre-Reading Survey
A long time ago, I was an aficionado of a language that told me that the three traits of a great programmer are laziness, impatience, and hubris. Then I discovered Ruby, which taught me about the Principle Of Least Surprise and that programming should be fun. Now, in Real World Haskell, Bryan O'Sullivan, John Goerzen, and Don Stewart promise me three things as I read their book to learn about Haskell: novelty, power, and enjoyment. That sounds like a pretty good deal to me.
After conducting an interview with Bryan, John, and Don I kept looking for a break in my reading list where I could put RWH, and I finally decided to make the room instead of waiting for it to occur on its own. Yesterday, my copy arrived.
I was immediately struck by the size of the book. Programming in Haskell (which I wrote about, briefly, here) is a relatively modest 155 pages while RWH weighs in at 640 — and what I've read so far is very approachable.
Reading through the ToC and Introduction, I've built the following list of questions I want to keep in mind as I read:
- What value do I gain from strict, static typing? How does this compare to the value I gain from strict, dynamic typing?
- What about Lazy Evaluation?
- What about Polymorphism?
- Why bother with whitespace?
- How do I think in Haskell?
- What about Composition and code re-use? (How does it compare to Factor/Forth?)
- How do I keep code readable? How is this different than in Ruby?
- How does the FFI work? How does this compare to Ruby's?
- How does concurrent programming work? How does this compare with Erlang? With Ruby?
- How does STM fit into things? What should this teach me about threads? About Actors?
- How do I profile, benchmark, and optimize?
I've also put together three little goals for myself. By the time I finish the book, I want to use Haskell to write:
- a wiki
- a twitter scanner
- a log analyzer
I'll try to write about my progress through the book, insights into the questions above, and progress toward my three goals. Feel free to share your thoughts as well.
Thursday, February 05, 2009
Beautiful Architecture: my survey
In my last post, I alluded to my efforts to read more intentionally and admitted that I chose to jump off of the wagon at nearly the first opportunity. I haven't abandoned my attempt, and I thought it might be worth sharing some of my thoughts and notes as I survey[0] Beautiful Architecture.
The first thing that jumped out at me is a statement on the page before the ToC: "All royalties from this book will be donated to Doctors Without Borders." While this has nothing to do with the text, I appreciate passion and commitment. Being willing to make a commitment like this strikes a chord with me.
Moving on to the table of contents, I found that each chapter is an essay from a different person or team. I recognized some, but not all of them. Who are they? Why did Diomidis and Georgios select them? This isn't a knock on the book, but a recognition that I need to do some studying to understand who these people are and why I should be listening to them. Perhaps knowing more about their backgrounds will also help me better understand their positions (and biases) and improve the value of the book to me.
Some of the chapter titles stand out to me as well. "A Tale of Two Systems: a Modern-Day Software Fable", "Data Grows Up: The Architecture of the Facebook Platform", "When the Bazaar Sets Out to Build Cathedrals", and "Rereading the Classics" all evoke a desire to dig into them and see what they have to say to me.
As I step into the preface, I start to find some questions that I really want to find the answers to:
- How will architecture impact my role in infrastructure and operations?
- How should I approach data centricity vs. application centricity?
- What can I learn from functional programming/architectural approaches?
- What trade-offs should I be looking at between stability, extensibility, performance, and aesthetics?
- How do I define beautiful architecture, and do I see that beauty in my projects or the systems I work on in my day job?
Pragmatic Thinking and Learning recommends a five step approach to reading that Andy calls SQ3R — Survey, Question, Read, Recite, Review. Not only does this look like a great idea, it reminds me of the SPQR shirt I used to wear as a classics geek in high school, so you know I've got to try it.
Beautiful Architecture: a first look
I recently got my copy of Beautiful Architecture. Following the SQ3R reading pattern I picked up from Pragmatic Thinking and Learning, I started my survey in the Table of Contents, and what did I find but a chapter on Emacs and Creeping Featurism by my friend, Jim Blandy.
All discipline shot, I stopped my survey and jumped in to read jimb's essay. It was all I expected it to be — jimb's a really smart guy after all. The only nit that I'd pick is that in one spot he sells emacs short, saying:
...Emacs has only a limited understanding of the semantic structure of the programs it edits, and can't offer comparable [refactoring] support.
In truth, all emacs needs to provide refactoring support for a language is an external program it can call on to provide that support. In Ruby-land, I have high hopes for combinations of tools like reek, flay, and RFactor underlying emacs and making ruby refactoring easy.
Thursday, January 29, 2009
Wicked Cool Ruby Scripts Review
No Starch Press has put out a number of books that I've really enjoyed (The Manga Guide to Statistics and Ruby By Example among them), so I was very excited to see Wicked Cool Ruby Scripts announced.
The book is subtitled "Useful Scripts That Solve Difficult Problems", so I was hoping to find a good collection of idiomatic scripts that I could recommend to folks getting started with Ruby. That's not really what I found though.
The book contains 58 scripts which represent a fairly wide swath of problems, but most of the programs are so short that they don't really show good, idiomatic Ruby. I question why some of the scripts are included (e.g., adding a user to a Linux system — writing a wrapper around useradd doesn't seem useful, even as an exercise).
I'm not saying that a reader won't learn something from this book, but I don't think it would be a good first or second book on Ruby. If you're already a rubyist and you're looking for a book with some good ideas, this might be a good book to pick up.
Friday, January 23, 2009
Author Interview: Relax with CouchDB (Round 2)
I had a great opportunity to trade emails with Jan Lehnardt in a second interview about Relax with CouchDB. This time, we touched on TDD, refactoring, and of course, the book.
The initial chapters have been available for over a month now, gathering feedback. What's been the biggest change you've made due to feedback?
Jan We still have things to integrate, but we took a lot of notes. The biggest thing we've seen is where we tried to explain concepts in CouchDB by contrasting them with how things are done in the RDBMS world. Production systems often do not follow theory by the book, for performance reasons (denormalization comes to mind). So we say that in CouchDB your data is denormalized, and thus fast, and actually true to the "CouchDB theory", but now people are (rightfully) pointing out that this means the RDBMS systems have been used wrongly. The fact is: we don't want to say bad things about the RDBMS world. We just tried to explain things by comparison, because a lot of people coming to CouchDB have an RDBMS background, so we thought it was a good idea to contrast the two.
We learned that this is not the best approach and we are moving things a little towards explaining CouchDB on its own instead of comparing it to relational databases in the first chapters. Again, I'm not saying anybody is more right or wrong here, it was just poor choice on our part because we didn't know we'd cause such a ruckus :)
PS: CouchDB is not a relational database and we all support the idea of using the right tool for the job. This is sometimes an RDBMS and sometimes CouchDB :)
As I looked over Chapter 4, one blurb stood out to me: Applications "live" inside design documents. You can replicate design documents just like everything else in CouchDB. Because design documents can be replicated, whole CouchApps can be replicated. Can you explain this in a little more depth?
Jan CouchDB is an application server in disguise. It can host HTML+CSS+JavaScript applications like any other web server, but it also provides an HTTP API to a database system. This is the perfect basis to write standalone applications using the web-technologies everybody knows.
CouchDB's replication becomes a distribution channel not only for data (what books do you have in your library?) but also entire applications (I enhanced the library application to also handle my board game collection, do you want my patch?). Think of GitHub, but for applications and peer to peer distribution.
You can also read more about this topic in a series of blog posts by Chris.
Refactoring is on my mind a lot right now, and with that comes testing. How testable are CouchDB apps? What kinds of tools or frameworks exist to do testing?
Jan We are currently working with TDD experts to find a good solution to allow CouchApp developers to test their applications inside-out.
Since this is all web-technology, we expect we can re-use some of the existing tools. We just want to go the extra mile and make it really easy for the developer.
What about refactoring proper, what's the state of the art in CouchDB refactoring?
Jan That depends a bit on what you mean. Refactoring CouchApps has not been tackled yet. But CouchDB is schema-free, so you can just play around and change things. Documents (including the design documents that hold your application) are versioned, so you can go back to an old revision (not forever, but for a little while) if you screwed up.
About refactoring your data: Say you have an app that stores user profiles, and you started out with separate fields for first and last name. But user feedback and UI design showed that a single `name` field is better suited for your app. Your map function to get all first and last names originally looked like this:
function(doc) {
  emit([doc.firstname, doc.lastname], null);
}
And your new one looks like this:
function(doc) {
  emit(doc.name, null);
}
You can consolidate both to support legacy data:
function(doc) {
  if(doc.name) {
    emit(doc.name, null);
  } else {
    emit(doc.firstname + " " + doc.lastname, null);
  }
}
You change your UI-code to deal with a single `name` value and this view will consolidate old and new documents.
Yes, this is a little dirty, but also pretty neat. At some point, you'd want to clean up all your data and get rid of the special cases. Our offhand suggestion is that for minor versions, where you want to add features quickly and make updates painless, you use the duck-typing above; for major versions you take the time to consolidate the cruft, update your data properly, and prune your map and reduce functions.
This is good advice (hopefully), but we might be able to provide you with tools and libraries that handle the dirty work for you so you can concentrate on improving your app instead of fussing with the database. After all, this should be relaxing!
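A one-time cleanup of the kind Jan describes could be sketched as a pure document transformation; the document shape follows his example, while fetching and saving through CouchDB's HTTP API is left to whatever client library you use:

```ruby
# Fold legacy firstname/lastname fields into the single name field.
def migrate_profile(doc)
  return doc if doc["name"] # already migrated, leave it alone

  migrated = doc.merge(
    "name" => [doc["firstname"], doc["lastname"]].compact.join(" ")
  )
  migrated.delete("firstname")
  migrated.delete("lastname")
  migrated
end

old_doc = { "_id" => "user-1", "firstname" => "Jan", "lastname" => "Lehnardt" }
puts migrate_profile(old_doc)["name"] # "Jan Lehnardt"
```

Run across every document in the database (with each result saved back as a new revision), this makes the legacy branch of the consolidated map function unnecessary, and the view can be pruned back to the simple single-field emit.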