So what's the best way to load large sql data sets in rails apps?
I've just started learning ruby on rails, and I'm trying to port one of my existing sites to it. Part of the conversion is dumping and importing a rather large set of MySql tables. These tables contain the text for whole books. In some cases, we are talking 30,000+ records.
This made it unfeasible to use the existing conventions I've found so far for bulk loading, or "seeding" data into the database. Using any kind of fixture or seed mechanism would be too slow; I really just needed to load the SQL statements directly into the database.
However I wanted to do this all through rails, using a rake task if possible
Rails 3 comes with a seeds.rb file under /db which lets you run arbitrary Ruby code. This is loaded when you run the task "rake db:seed"
Using that, I wrote some code that will read in sql files and load them directly into the database. Here's what I put in my seeds.rb file:
base_dir = "#{RAILS_ROOT}/db/data"
bulk_load = [Book, Chapter, Verse]
bulk_load.each do |table|
if table.first.nil? # only bulk load into empty tables
f = File.new "#{base_dir}/#{table.table_name}.sql"
while statements = f.gets("") do
ActiveRecord::Base.connection.execute(statements)
end
end
end
Basically, I'm creating an array of several model classes that need to load bulk data. I loop through this array, and if the model doesn't have any records I look for an sql file with the same name as the table.
For example, the Book model will look for books.sql. The sql file itself comes directly from a MySql dump.
Now one hangup is that for my larger data sets, it was just too much to load the file into memory all at once. Fortunately the MySql dump uses multiple INSERT statements for every 350 records or so.
Using a basic regex expression, I inserted an extra newline "\n" before each INSERT statement in the sql file.
The line f.gets("")
will actually read the file in chunks, separating on two newline characters (\n\n). Each chunk is then executed directly with the ActiveRecord's execute function.
Ultimately, this lets me populate my tables with the contents of my books all from the command line using rake db:seed.
I wonder how this fits into "the Rails way".