about me

Nathan McWilliams, a 23 year old web developer living in San Francisco.

work

At my first full-time job with salesforce.com. Before that, I was a graduate student at Carnegie Mellon University.

not work

My latest project is a website + iphone app that will help you learn scriptures.

music

One of my hobbies is creating music on the piano. It's amateur at best, but it's all original. Listen in.

words

I like to write both articles and short stories. Take a look.

portfolio

My work ranges from complete websites to forward-looking business analysis. Explore.

Stream

August 31

August 29

August 23

6:21am

Top track for past 2 months: Kazham


6:20am

Top artist for past 5 weeks: Keiko Matsui

August 8

August 5

July 24

9:35pm

Loading large SQL data files in Rails 3

So what's the best way to load large sql data sets in rails apps?

I've just started learning ruby on rails, and I'm trying to port one of my existing sites to it. Part of the conversion is dumping and importing a rather large set of MySql tables. These tables contain the text for whole books. In some cases, we are talking 30,000+ records.

This made it unfeasible to use the existing conventions I've found so far for bulk loading, or "seeding" data into the database. Using any kind of fixture or seed mechanism would be too slow; I really just needed to load the SQL statements directly into the database.

However I wanted to do this all through rails, using a rake task if possible

Rails 3 comes with a seeds.rb file under /db which lets you run arbitrary Ruby code. This is loaded when you run the task "rake db:seed"

Using that, I wrote some code that will read in sql files and load them directly into the database. Here's what I put in my seeds.rb file:

base_dir = "#{RAILS_ROOT}/db/data"

bulk_load = [Book, Chapter, Verse]

bulk_load.each do |table|
  if table.first.nil? # only bulk load into empty tables
    f = File.new "#{base_dir}/#{table.table_name}.sql"
    
    while statements = f.gets("") do
      ActiveRecord::Base.connection.execute(statements)
    end  
  end
end

Basically, I'm creating an array of several model classes that need to load bulk data. I loop through this array, and if the model doesn't have any records I look for an sql file with the same name as the table.

For example, the Book model will look for books.sql. The sql file itself comes directly from a MySql dump.

Now one hangup is that for my larger data sets, it was just too much to load the file into memory all at once. Fortunately the MySql dump uses multiple INSERT statements for every 350 records or so.

Using a basic regex expression, I inserted an extra newline "\n" before each INSERT statement in the sql file.

The line f.gets("") will actually read the file in chunks, separating on two newline characters (\n\n). Each chunk is then executed directly with the ActiveRecord's execute function.

Ultimately, this lets me populate my tables with the contents of my books all from the command line using rake db:seed.

I wonder how this fits into "the Rails way".

July 22

July 21

July 15

4:42pm

Top track for past 3 weeks: Somnia Memorias (Parasite Eve)


4:42pm

Top album for past 1 week: NKJV Audio Bible: Fully Dramatized

June 20

3:30pm

Officer blocked newlyweds from ER during bride's stroke
Dugg in Odd Stuff

June 19

10:20pm

How defensive can deers be? (video)
Dugg in Pets & Animals


10:12pm

Actual map of LOST island
Dugg in Television


June 13

3:33am

Top album for past 5 months: NKJV Audio Bible: Fully Dramatized

June 12

11:09am

Top artist for past 1 week: Keiko Matsui

June 4

6:51pm

Caught in the oil - The Big Picture - Boston.com
Dugg in Environment

May 21

4:06pm

Top track for past 4 months: Somnia Memorias (Parasite Eve)

May 11

6:07pm

Organic Potatoes FAIL (pic)
Dugg in Food & Drink


6:02pm

Did He Find Brains In His KFC?
Dugg in Food & Drink