May 29th, 2010
|03:54 am - Off the Rails|
Warning: Serious Computer Geekery Ahead.
I've been throwing together scripts using AppleScript for years. It's quirky, with some nifty features and some incredibly irritating ones, but mostly, it's the easiest way to talk to the secret scripting communication ports on well-written Mac software. It's absolutely amazing what you can do when you can leverage the uber-clever search functions of TextWrangler, the sophisticated contact-management knowledge of AddressBook, and/or the webalicious DOM models of Safari.
But AppleScript is not, in the end, a 'real' programming language. It's a scripting language. It's great for whipping little widgets together. Apple offered me a bit of a sop by building AppleScript Studio into Xcode, their incredibly impressive program construction system that comes free with every Mac. However, the vast majority of the Mac OS is written in Objective-C, and AppleScript Studio was a scheme that let somebody using AppleScript access parts of it, but not all of it.
I eventually decided I needed to get on board with a full-fledged general-purpose programming language. Back in the day, I'd done plenty of programming in BASIC before moving to Pascal and Modula-2. In the meantime, the larger body of programmers moved toward C. Modula-2 is like a Bentley or a Lexus: ginormous engine, but richly outfitted with leather, real wood, and every imaginable safety feature. C is like a Ferrari: incredibly fast and lean, hot, noisy, and waaaay too easy to wrap around a tree. A total pain in the ass to keep running, too. Objective-C, like C++, is a vast improvement over plain old C, but C was all about efficient use of the hardware, not efficient use of the programmer's time, and Objective-C still has so much of its progenitor in it that I had no interest in dealing with it. Instead of utter caca, it was merely tripe, but I'm not putting it on my plate.
I looked at what else came already included with the Mac OS. (BTW, you would be *amazed* at what comes already installed on every Mac. Postfix, the industrial-grade industry-standard mail server. Apache, the super-industrial-grade ubiquitous web server. BIND, the key component of the domain name system. And on and on.) There were four other programming languages ready-to-roll under the hood: the venerable and immensely-well-supported Perl, the indispensable-for-websites PHP, the newer, sophisticated Python, and something brand-new to me: Ruby.
I'll skip ahead: I picked Ruby, which has gone on to take the world of programming by storm. A big part of why Ruby became so popular so fast is because of Ruby on Rails, a new middleware program written in Ruby. Usually just called "Rails" these days, it functions much like Microsoft's Active Server Pages, or ColdFusion, or CGI, or (my former favorite) Tango, or Apple's WebObjects. It's a program that sits between a web server and a database (thus 'middleware'). Somebody surfs to a web site, the web server (say, Apache) figures out which page somebody wants, hands that info over to the middleware, which pulls data from the database, fills in the blanks on the page or otherwise constructs a lovely chunk of data-enhanced HTML, and gives it back to the web server to pass along the Internet to the person on the far end.
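That fill-in-the-blanks step is easy to sketch in plain Ruby using ERB, the templating library that ships in Ruby's standard library. The product data here is invented, and real middleware does far more, but this is the core move: data in, HTML out.

```ruby
require "erb" # templating library from Ruby's standard library

# A hypothetical row, as if just pulled out of the database
product = { :name => "Blue Widget", :price => 12.50, :in_stock => true }

# The page template, with blanks for the middleware to fill in
template = ERB.new(<<~HTML)
  <h1><%= product[:name] %></h1>
  <p>Price: $<%= format("%.2f", product[:price]) %></p>
  <p><%= product[:in_stock] ? "In stock" : "Sold out" %></p>
HTML

# The finished chunk of data-enhanced HTML, ready to hand back
# to the web server
page = template.result(binding)
puts page
```

Rails wraps this same idea (templates plus data) in a great deal of machinery, but the shape of the transaction is the same.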
I have really been looking forward to getting into Ruby on Rails. I really like Ruby itself. I think it's a brilliantly designed language, and it really is all about making my work easier, not the computer's. To date, most of my Ruby work has been making my Mac do fabulous new tricks, so I've used attachments like Appscript, which lets Ruby gracefully access all those scripting portals on my well-written Mac software, and RubyCocoa, which hooks Ruby into all the infrastructure that is the Mac operating system itself. Appscript lets me use Ruby to make OmniGraffle construct bus routing charts for me from timetables, or have Photoshop construct thumbnails of my Fanucci cards, rounding off the corners like the real thing. RubyCocoa lets me create a whole new program that opens its own windows, lets me drag files onto it, and all that other tasty Mac-alicious behavior.
But now I've got an assignment to turbocharge the fundamental flow of information for a business. Enough with writing stuff down on paper, filing it, copying it into QuickBooks for billing, typing it into Word to print labels, retyping it into a spreadsheet for the catalog, and making somebody reformat it again to provide a list of products for the web site. That's insanely wasteful. It's time to fix that, so when somebody in the manufacturing dept. makes a new thing, QuickBooks automatically becomes aware of the change in inventory; the label printer figures out for itself what to list for the ingredients; the web site can reflect what's really in stock; and the next catalog printed will figure out for itself what new products have arrived and which old ones have been discontinued.
I started working with data and databases by leaping headfirst into the deep end. No FoxPro or Access or FileMakerPro for me. No, my first real work with a database was Microsoft SQLServer 6.5, which Microsoft bought from a company that had been providing this thing for minicomputers. And the data that I needed it to hold was the data behind Alexandria Digital Literature. Millions of recommendations from users; an incredibly complex relational design (a story can be written by one or more people, can have multiple titles, appear in myriad editions, be split into multiple volumes or appear as part of one volume, and so forth); and a particular query (the recommending query) that frequently needed to be run, and that required massive amounts of data to be re-analyzed every time.
One of the features I really came to appreciate about SQLServer is how protective, even paranoid, it could be about the data it was holding. The way data is actually organized in a database is known as its schema. SQLServer supported sophisticated tools like foreign key constraints (you aren't allowed to list person #554 as the author of this story unless there actually is an entry for person #554 in the 'author' table, and you are not allowed to erase person #554 unless and until there are no stories left that list them as an author), check constraints (a person's login name must have at least one character in it), unique constraints (no user can have the same username as another), transactions (these three changes to the database belong together. If we get to the third one, and it turns out that there's a problem, then the other two entries are undone as well, so that the entire group is all-or-nothing) and other tools. More than once, the web site suddenly would start throwing off error messages because there'd been a power blip or something had happened, and SQLServer locked down the entire database. "I cannot swear for certainty that every last bit of data is entirely intact, so I'm freezing everything. Nothing changes until I'm sure it's all good!"
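In SQL, those protections look something like the following sketch. The table and column names here are invented for illustration, not AlexLit's actual schema, but each constraint matches one of the examples above:

```sql
-- Hypothetical schema; names are illustrative only.
CREATE TABLE person (
    person_id   integer PRIMARY KEY,
    login_name  varchar(40) NOT NULL,
    -- check constraint: a login name must have at least one character
    CONSTRAINT login_not_empty CHECK (char_length(login_name) >= 1),
    -- unique constraint: no two users share a login name
    CONSTRAINT login_unique UNIQUE (login_name)
);

CREATE TABLE story_author (
    story_id  integer NOT NULL,
    person_id integer NOT NULL,
    -- foreign key constraint: person #554 must exist before being listed
    -- as an author, and can't be erased while stories still cite them
    CONSTRAINT author_exists
        FOREIGN KEY (person_id) REFERENCES person (person_id)
);

-- transaction: these changes belong together, all-or-nothing
BEGIN;
INSERT INTO person (person_id, login_name) VALUES (554, 'some_author');
INSERT INTO story_author (story_id, person_id) VALUES (1001, 554);
COMMIT;  -- if anything fails before this point, both inserts are undone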
I really came to appreciate that. It's really easy to make some little mistake when programming some data entry web page or when re-arranging information, and being able to give the database engine very specific and sophisticated rules about what was related to what, in what way, so that it could categorically refuse to accept a modification that would break the rules, was a great back-stop for human error. The schema is a critical foundation upon which all the utility of the data will be built, and enforcing the integrity of the data is vital. Errors can quietly accumulate until somebody finally notices that something funny's going on, and by then, it might be impossible to figure out what bits of data are bogus, poisoning the entire database.
Which brings me back to Ruby on Rails. I'll take a moment to name a couple of web sites in particular that were built using Ruby on Rails. Hulu. Twitter. So I've been all "oo, this is gonna be cool."
Well, that lasted about two hours. The more I worked on getting a copy of Rails up and running, the more horrified and incredulous I became. You see, there's a really strong fundamental philosophical premise built into Rails. You can fling together a whole database-driven website in just a couple of hours because of the system known as "scaffolding." You basically say "OK, Rails, there's gonna be a thing called a User, and Users will have usernames, and realnames, and passwords, and email-addresses," and Rails leans over to the database and constructs a table to hold that information, and then it leans over the other way and builds a web page to show you that information. If you tell Rails that a password is a series of characters (at least six, no more than 100), then it knows how to tell the database what to do to store that information and what to tell the web server whenever it needs to present that information.
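For the curious, that "leaning over to the database" is driven by a Ruby file called a migration, which the scaffold generator writes for you. A sketch of what one looks like (field names hypothetical; this only runs inside a Rails app, not on its own):

```ruby
# Hypothetical scaffold-generated migration for the User example above.
# Rails turns this into CREATE TABLE statements for whichever database
# engine you've configured.
class CreateUsers < ActiveRecord::Migration
  def self.up
    create_table :users do |t|
      t.string :username
      t.string :realname
      t.string :password
      t.string :email_address
      t.timestamps
    end
  end

  def self.down
    drop_table :users
  end
end
```

Notice what's absent: nothing here says "and make the database enforce it."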
It's designed to work with a bunch of different database engines: Oracle, SQLServer, PostgreSQL, MySQL, SQLite, and more. As far as Rails is concerned, the database is just a box to throw things in and get them out again.
In order to do that, Rails needs to live at the intersection of the feature sets of these databases. I kept reading "How to get started with Ruby on Rails" tutorials that were so excited about the fact that you didn't need to know anything about how the data was actually stored. Rails took care of that for you! Don't worry about the fact that, because SQLite doesn't enforce foreign key constraints, Rails won't include them when building tables in any database, because you can tell Rails to check the data instead. And you can use one database when you're testing, and then a whole different one when you deploy the website! Isn't that cool?
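The "tell Rails to check the data instead" part looks something like this (a sketch in 2010-era ActiveRecord syntax; the model is hypothetical and this only runs inside a Rails app):

```ruby
# Hypothetical Rails model: these checks run in Ruby, not in the database.
class User < ActiveRecord::Base
  validates_uniqueness_of :username                   # SELECT, then INSERT
  validates_length_of     :password, :within => 6..100
end
```

The catch is that a uniqueness validation works by doing a SELECT to look for duplicates and then doing the INSERT; two simultaneous requests can both pass the SELECT and both insert the "unique" value. A database-level unique constraint closes that window, and the validation alone does not.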
If what you care about is getting a web site up and running in a weekend, then treating your database as if it were a bunch of Excel worksheets might be enough. But if what you care about is the integrity of your data, that's two and a half tons of horseshit. I'm using PostgreSQL as the database engine for this project because, of the database engines I've listed so far, only Oracle, PostgreSQL, and SQLServer are adequately trustworthy. Access, FileMakerPro, and SQLite are toy databases, and MySQL is also missing a number of abilities I consider pretty important. Rails, since it isn't a database itself, cannot possibly provide the level of security and integrity that a real, industrial-strength database engine can, and (to its credit) it admits that. I'm just flabbergasted that so many programmers are so unconcerned about their databases getting all screwed up.
Derek Sivers wrote a blog entry entitled "7 reasons I switched back to PHP after 2 years on Rails," and in a comment by William Pietri, I read: "Treating the database as the main focus, rather than an implementation detail, is, from the Rails perspective, the wrong approach. And Rails is very opinionated software; it does its thing well, but if you want it to do something else, it's not a good match."
Whew. It's not just me.
In the years I've worked with the AlexLit database, despite my attempts to construct and enforce data integrity, I've still seen an awful lot of anomalies get in there. At one point, my database was certain that we'd had three months' worth of people using the site three years in the future. (I think the server's clock had gotten screwed up.) I had some sales records that showed the purchaser as nonexistent. ("No, 'null' is NOT an acceptable value for 'sold to'!") Hardware is so fast, and PostgreSQL and MySQL are so affordable (i.e., 'free') that I think it's remarkably stupid to blow off all the functionality that they offer in order to be 'platform agnostic.'
I am amazed. Astounded. And unquestionably, I'm off the Rails.
Current Mood: flummoxed
|Date: August 25th, 2010 11:17 pm (UTC)|
Maybe. I'd have to say, though, that 60% of the crappiness that I've run across is related to not embracing normalization: "Never put the same piece of data in the database more than once." 38% of the remaining crappiness is failing to install the proper constraints. OTOH, a fully normalized database means that when it's time to collect various bits of information together to present to the users, the command to get that info might be a multi-table monstrosity. So programmers *don't* commit to fully normalizing, because they don't want to deal with writing the queries required to get the data back out. Instead, somebody else years later gets to pull their hair out when *their* query doesn't work because the data integrity has been compromised.
I'm still in the 'data first' camp, although I have sympathy for the monster-query-phobic people. That's what Views are for, and boy have I been putting them to work on my current project.
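A view lets you write the monster join once and then query it like an ordinary table. A sketch (table and column names invented, not from any real schema):

```sql
-- Write the multi-table join once...
CREATE VIEW story_listing AS
SELECT s.story_id, t.title, p.login_name AS author
  FROM story s
  JOIN story_title  t ON t.story_id  = s.story_id
  JOIN story_author a ON a.story_id  = s.story_id
  JOIN person       p ON p.person_id = a.person_id;

-- ...then everybody else gets to pretend it's one simple table:
SELECT title, author FROM story_listing WHERE author = 'some_author';
```

The fully normalized tables stay intact underneath; only the convenience layer knows about the join.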
The other potential downside to full normalization is performance. Thoughtful, careful indexing is absolutely essential to avoid bogging down. And using a competent DB engine. I have a query I've been running on SQLServer for years, and have ported to Postgres. I tried it on MySQL with the same indexing structure, and what the other two DBs did in two to four seconds, MySQL took 45 minutes to do. It was a seven-table join. I am *so* not impressed with MySQL.