Cached Externals
December 3rd, 2008
Jan. 14, 2009 Update: The techniques discussed in this post are still relevant, but the specific externals apply only to Mephisto 0.8. See Upgrading to Mephisto 0.8.1 for information about the externals that I'm currently caching.
Cached Externals is a Rails plugin that extends Capistrano to speed up deployments by not deploying vendor gems and plugins unless they change. The way it achieves the performance gain also results in explicit management of the cached gems and plugins, something worthwhile in its own right. In this post I'll walk you through installing and using the plugin by describing how I used it for this Mephisto blog application. gitting Started with Mephisto describes how the application was set up, and Deploying Mephisto with Capistrano to DreamHost provides the pre-Cached Externals deployment details. Basically the app is just a customized copy of Mephisto in a git repository. The vendor directory holds Rails 2.0.2 and TZInfo along with all the gems and plugins that are part of the Mephisto distribution. The repository is kept locally on a laptop, so Capistrano's copy strategy is used to deploy it to the production server.
Using Cached Externals
The first step is to install Cached Externals. It's a Rails plugin on github, so if you're using Rails 2.1.0 or later you can install it directly:
[mac]$ script/plugin install git://github.com/37signals/cached_externals.git
but since the Mephisto implementation is using Rails 2.0.2, I had to install it manually:
[mac]$ git clone --depth 1 git://github.com/37signals/cached_externals.git \
> vendor/plugins/cached_externals
[mac]$ rm -rf vendor/plugins/cached_externals/.git
Once the plugin is installed it's just a matter of following the instructions in the README.rdoc and deciding which externals to cache. Capistrano 2 is the minimum requirement; this blog is deployed using 2.5.2, so no problem there. The next step is to make sure this line is in the Capfile:
Dir['vendor/plugins/*/recipes/*.rb'].each { |plugin| load(plugin) }
It was in mine. If it's not in yours then you'll need to add the line or rerun capify .
To support an external gem or plugin you need to let Cached Externals know about it by creating an entry for it in a config/externals.yml file. The gem or plugin needs to be in an external repository that's accessible locally and on the deployment target. The YAML entry for each external contains information about the type of repository, where it is and which revision is required.
If you know the repository and revision information for your gems and plugins then creating the YAML file should be easy. If you don't have all of that information, or don't want to list it, then start with the big externals. It's a good idea to have an explicit list of this information for all of your gems and plugins, but you won't see nearly as much of a performance benefit from the smaller externals. For my Mephisto application I'd installed the Rails and TZInfo gems myself, but the other gems and plugins came bundled with Mephisto, so I didn't have any of their revision information. Since I'm only merging in the Mephisto master branch and not maintaining the application details, I was only interested in the performance gains, not the management benefit. Digging up data for all the gems and plugins would have been more effort than it was worth, especially for some of the unversioned plugins. So, instead of trying to cache everything, I checked for low-hanging fruit:
[mac]$ cd vendor
[mac]$ du -sk *
[mac]$ cd plugins
[mac]$ du -sk *
[mac]$ cd ../..
Of course Rails was the monster at about 9.8 MB, TZInfo was 3.8 MB, and the other 4 gems totaled only about 700 KB, with the largest being 336 KB. All 25 plugins totaled 4.8 MB, with RSpec weighing in at 2.1 MB, RSpec on Rails at about 600 KB, and the next largest plugin at only about 300 KB. So by caching only 4 of the 31 externals (Rails, TZInfo, RSpec and RSpec on Rails) I could cover 16.3 of the 19.1 MB.
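If you'd rather not eyeball du output, a short Ruby script can do the same survey. This is a sketch under stated assumptions: sizes_in_kb and cached_share are hypothetical helper names, the sizes are apparent file sizes rather than the disk blocks du reports, and symlinks are not followed. The usage line at the end just reproduces the arithmetic above with the post's own figures (the smaller gems and plugins lumped together): 16.3 of 19.1 MB is roughly 85 percent.

```ruby
require "find"

# Rough stand-in for `du -sk *`: total the apparent size (in KB) of the
# regular files under each immediate child of +dir+. Unlike du, this sums
# file sizes rather than allocated blocks, and it skips symlinks.
def sizes_in_kb(dir)
  Dir.children(dir).sort.each_with_object({}) do |name, sizes|
    bytes = 0
    Find.find(File.join(dir, name)) do |path|
      bytes += File.size(path) if File.file?(path) && !File.symlink?(path)
    end
    sizes[name] = (bytes / 1024.0).round(1)
  end
end

# Percentage of the total size covered by the externals named in +cached+.
def cached_share(sizes, cached)
  total = sizes.values.sum
  (100.0 * cached.sum { |name| sizes.fetch(name) } / total).round(1)
end

# Figures from this post, in MB; "other plugins" is 4.8 - 2.1 - 0.6.
sizes = {
  "rails" => 9.8, "tzinfo" => 3.8, "rspec" => 2.1, "rspec_on_rails" => 0.6,
  "other gems" => 0.7, "other plugins" => 2.1
}
puts cached_share(sizes, %w[rails tzinfo rspec rspec_on_rails]) # prints 85.3
```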
After finding the repositories on github and RubyForge (I'll describe how I found the exact revision information later) I created this config/externals.yml file:
vendor/rails: # v2.0.2
  :type: git
  # DH old git checkout (1.4.4.4) doesn't support -q
  :scm_verbose: true
  :repository: git://github.com/rails/rails.git
  :revision: e454cf721ea2deca1b1b991339a5d30161dcfb9c
vendor/tzinfo-0.3.12:
  :type: subversion
  :repository: http://tzinfo.rubyforge.org/svn/trunk/tzinfo
  :revision: "210"
vendor/plugins/rspec: # pre 1.1.0
  :type: subversion
  :repository: http://rspec.rubyforge.org/svn/trunk/rspec
  :revision: "2962"
vendor/plugins/rspec_on_rails: # pre 1.1.0
  :type: subversion
  :repository: http://rspec.rubyforge.org/svn/trunk/rspec_on_rails
  :revision: "2962"
The file assumes that Subversion and git are installed on both the local deployment PC and on the deployment target. If you aren't sure if you have the correct source code control software on the client and target then check for it. For example, I used git --version and svn --version on my hosting service to see what was installed. If you're missing an SCM that you need then either install it, delete the appropriate entries in your externals.yml or move the gem or plugin to another external repository that you can support.
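Checking for the clients can also be scripted. The following Ruby sketch is one way to do it, assuming the :type values used in this post (git and subversion) map to the git and svn commands; load_externals, required_scms and scm_available? are hypothetical helpers, not part of Cached Externals. Because the file uses Ruby-symbol keys like :type, a safe YAML load has to permit Symbol.

```ruby
require "yaml"

# Assumed mapping from the :type values used in this post to the
# command-line client each one needs. Unknown types raise KeyError.
SCM_COMMANDS = { "git" => "git", "subversion" => "svn" }.freeze

# Parse externals.yml text. Its option keys are Ruby symbols (:type,
# :repository, :revision), so Symbol must be permitted for a safe load.
def load_externals(yaml_text)
  YAML.safe_load(yaml_text, permitted_classes: [Symbol]) || {}
end

# Sorted list of the SCM commands needed by the loaded externals.
def required_scms(externals)
  externals.values.map { |opts| SCM_COMMANDS.fetch(opts[:type].to_s) }.uniq.sort
end

# True if `cmd --version` runs successfully; system returns nil when the
# command cannot be found at all.
def scm_available?(cmd)
  system(cmd, "--version", out: File::NULL, err: File::NULL) ? true : false
end

if File.exist?("config/externals.yml")
  externals = load_externals(File.read("config/externals.yml"))
  required_scms(externals).each do |cmd|
    puts "#{cmd}: #{scm_available?(cmd) ? 'found' : 'missing'}"
  end
end
```

Run it from the application root locally, and again on the deployment target, to see which clients are missing in each place.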
Here are some additional notes about the YAML file:
- Cached Externals creates a new SCM instance for each repository, so any SCM option variables that are normally set in deploy.rb need to be set for each cached item.
- In my case :scm_verbose had to be set because my DreamHost server has an old version of git (1.4.4.4), and the -q (--quiet) flag that's passed to git-checkout when :scm_verbose isn't set won't work with old versions of git.
- I used the SHA1 code for the Rails revision, but it's a tagged release, so v2.0.2 could have been used instead.
- I always use the http:// link for RubyForge Subversion repositories. The svn:// transport to RubyForge isn't reliable for me, but the http:// transport works fine. I've never investigated why, but I do know it's a common issue.
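Since SCM options have to be repeated for every entry, a small script can add them for you. This is a minimal sketch, assuming the file has already been loaded into a Hash with symbol option keys (for example with YAML.safe_load and permitted_classes: [Symbol]); with_default_option is a hypothetical helper name, not something Cached Externals provides.

```ruby
# Copy a default SCM option into every entry of the given :type unless the
# entry already sets that option. Returns a new Hash and leaves the input
# untouched, so the result can be reviewed and then written back with
# YAML.dump.
def with_default_option(externals, type, key, value)
  externals.transform_values do |opts|
    if opts[:type].to_s == type && !opts.key?(key)
      opts.merge(key => value)
    else
      opts
    end
  end
end
```

For the DreamHost case above this would be with_default_option(externals, "git", :scm_verbose, true).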
The next step is to remove the externals referenced in externals.yml from the vendor directory. If you're following along, I'd recommend being prepared for any problems before doing this, either by having a clean repository (everything's committed or stashed) or by making a backup copy of the vendor items:
[mac]$ rm -rf vendor/rails
[mac]$ rm -rf vendor/tzinfo-0.3.12
[mac]$ rm -rf vendor/plugins/rspec
[mac]$ rm -rf vendor/plugins/rspec_on_rails
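Instead of typing each rm -rf by hand, the paths can be taken straight from config/externals.yml so the list of removed directories can't drift from the list of cached ones. A hypothetical Ruby sketch, assuming it's pointed at the application root: it defaults to a dry run that only returns the paths it would remove, and it skips anything that is already a symlink, since that is presumably a link into the cache rather than a vendored copy.

```ruby
require "yaml"
require "fileutils"

# Remove the vendored copies of everything listed in config/externals.yml.
# With dry_run: true (the default) nothing is deleted; the method only
# returns the directories it would remove.
def remove_vendored_externals(app_root, dry_run: true)
  config = File.join(app_root, "config", "externals.yml")
  externals = YAML.safe_load(File.read(config), permitted_classes: [Symbol]) || {}
  targets = externals.keys.map { |rel| File.join(app_root, rel) }
  # Only real directories; existing symlinks and missing paths are left alone.
  targets = targets.select { |path| File.directory?(path) && !File.symlink?(path) }
  FileUtils.rm_rf(targets) unless dry_run
  targets
end
```

Calling remove_vendored_externals(Dir.pwd) first shows what would go; adding dry_run: false actually removes it.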
Somehow that feels uncomfortable, but no worries, the files will get replaced when the local cached copies are retrieved. Well, actually the files will be placed in a ../shared/externals/vendor directory and symlinks to them will be created in the proper locations in the vendor directory. Yes, the shared directory will be at the same level as the application directory. Here's how to load up the cache:
[mac]$ cap local externals:setup
That will result in a big retrieval so after it's done it's a good time to make sure everything looks correct with git status. Here's what I saw with git status after loading the local cache using the externals.yml file from above:
[mac]$ git status
# On branch robseaman
# Changed but not updated:
#   (use "git add/rm..." to update what will be committed)
#
#	deleted:    vendor/rails/REVISION_5fa0457542b0ff541d0a80ff8c3561eec8e35959
#	modified:   vendor/tzinfo-0.3.12/lib/tzinfo/ruby_core_support.rb
#	modified:   vendor/tzinfo-0.3.12/test/tc_ruby_core_support.rb
#
# Untracked files:
#   (use "git add ..." to include in what will be committed)
#
#	vendor/plugins/rspec
#	vendor/plugins/rspec_on_rails
#	vendor/rails
#	vendor/tzinfo-0.3.12
no changes added to commit (use "git add" and/or "git commit -a")
The untracked files were the symlinks, no problem. The deleted rails/REVISION_5fa045... was just an empty touch file that was created when Rails was frozen into the vendor directory. It doesn't indicate that the 5fa04575 revision should have been used in the externals.yml file; 5fa04575... is the SHA1 of the latest commit on github at the time Rails was downloaded. There were no other issues in the Rails directory, so it looked good. Ah, but the TZInfo files looked like trouble, so I used git-diff to find out:
[mac]$ git diff
diff --git a/vendor/rails/REVISION_5fa0457542b0ff541d0a80ff8c3561eec8e35959 \
b/vendor/rails/REVISION_5fa0457542b0ff541d0a80ff8c3561eec8e35959
deleted file mode 100644
index e69de29..0000000
diff --git a/vendor/tzinfo-0.3.12/lib/tzinfo/ruby_core_support.rb \
b/vendor/tzinfo-0.3.12/lib/tzinfo/ruby_core_support.rb
old mode 100644
new mode 100755
diff --git a/vendor/tzinfo-0.3.12/test/tc_ruby_core_support.rb \
b/vendor/tzinfo-0.3.12/test/tc_ruby_core_support.rb
old mode 100644
new mode 100755
Hmm, the only difference was the addition of execute permission on the files. I'm not sure why the settings were different but it didn't seem like a big deal to me so I considered it good.
I held off on explaining how I found the external revision numbers until I'd covered cap local externals:setup, git status and git diff, because they were a big part of my search. I started by picking likely candidates. For Rails this was just a matter of using the tags drop-down on github and selecting the revision for v2.0.2. For TZInfo I used svnX to browse the repository on RubyForge, and the 0.3.12 release comments were near the top. For the RSpec plugins I examined the CHANGES file in Mephisto's vendor copy, which pointed me to 1.1.0. I found 1.1.0 and other old versions of RSpec and RSpec on Rails in a Subversion repository on RubyForge, so I used svnX again and plugged the 1.1.0 release revision into externals.yml. Then I ran cap local externals:setup and used a combination of git-status and git-diff to see what was different. Rails and TZInfo matched but the RSpec plugins didn't. There were enough differences that I ran git-diff on one file at a time, trying to pick the file most likely to help me find the correct version. From the diff of the CHANGES file I could tell that Mephisto was using a pre-1.1.0 edge version. Looking at the check-in history with svnX gave me a pretty good idea how far back to go, so I trashed the ../shared/externals/vendor/plugins directory, re-ran cap local externals:setup and checked the differences again. The CHANGES file didn't have every detail, so I ended up repeating this a few times, narrowing it down each time until I found a match.
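I narrowed the range by hand, but if each candidate checkout can be classified as too old, too new, or an exact match, the narrowing is a binary search over revision numbers. A minimal Ruby sketch under that assumption: the block stands in for whatever check you use (for instance diffing a checkout's CHANGES file against the vendored copy), and find_matching_revision is a hypothetical name. The usage line simulates the check with a known answer.

```ruby
# Binary search over candidate revisions. The block must return a positive
# number when +rev+ is older than the vendored copy, a negative number when
# it is newer, and 0 when it matches exactly. Array#bsearch in find-any mode
# then needs only about log2(n) checks. Returns nil if nothing matches.
def find_matching_revision(revisions, &compare)
  revisions.sort.bsearch { |rev| compare.call(rev) }
end

# Simulated check: pretend the vendored copy corresponds to revision 2962.
vendored = 2962
p find_matching_revision((2800..3100).to_a) { |rev| vendored <=> rev } # prints 2962
```

Each probe still means a cap local externals:setup run and a diff, so halving the range each time is what keeps the number of runs small.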
Given that I was setting this up for an application that I knew little about, I was lucky. If I had wanted to do this for a gem or plugin that had been customized, I would have had to put the custom version in an external repository like github or leave it out of the cache. Also, all of the externals I looked for were on github or RubyForge, were well tagged, and were good about documenting version information in their distributions. Searching for the RSpec revision gave me a clue about how much of a pain this could have been if good tagging and release documentation practices hadn't been followed. The lesson here is not to wait until final deployment to start keeping track of externals; do it as the externals are installed.
After you're satisfied with your shared cache you'll want to commit the results to source control. For me that meant:
[mac]$ git add .
[mac]$ git commit -a -m "Cached Externals: rails, tzinfo & rspec/rspec_rails"
cap local externals:setup needs to be run any time externals.yml is updated. And, if you're using git, there's one last related housekeeping item. Cached Externals includes post-checkout and post-merge git hooks that automatically make sure the correct symlinks are used. They work by running cap local externals:setup when switching between branches, pulling and merging. Before installing the hooks, check (using ls .git/hooks) that you don't already have your own versions of them. I didn't have either, so I used the Cached Externals rake task to install the hooks:
[mac]$ rake git:hooks:install
If you already have one or both of the hooks then you'll need to merge in the Cached Externals versions from the plugin's script/git-hooks directory or decide whether you want your old hooks or the Cached Externals hooks.
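The ls check can also be scripted. A small Ruby sketch, assuming the hook names are post-checkout and post-merge as described above; existing_hooks is a hypothetical helper. git only runs hooks that are executable, so the sketch reports a hook only when an executable file with that exact name exists; *.sample files and non-executable templates are not counted.

```ruby
# Hooks that Cached Externals installs, per the description above.
HOOKS = %w[post-checkout post-merge].freeze

# Return the names from HOOKS that are already active in +repo_root+.
def existing_hooks(repo_root)
  hooks_dir = File.join(repo_root, ".git", "hooks")
  HOOKS.select do |name|
    path = File.join(hooks_dir, name)
    File.file?(path) && File.executable?(path)
  end
end

existing_hooks(".").each do |name|
  puts "#{name} already exists; merge it with the Cached Externals version"
end
```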
At this point I did a git-tag to mark the application as a release and then did a new deployment:
$ cap deploy
It's not mandatory to deploy right after setting up Cached Externals, but I wanted to isolate it as a separate release in case anything went wrong. The first deployment was slower because it had to clone the external repositories. I also noticed that git's progress messages were reported as errors. You'll probably see the same thing; there will be a lot of messages that look like this:
...
*** [err :: blog.robseaman.com] 11% (186/1690) done
*** [err :: blog.robseaman.com] 12% (203/1690) done
*** [err :: blog.robseaman.com] 13% (220/1690) done
...
It looks bad but it's not a problem. It just means git writes its progress reports to the error stream (stderr), so Capistrano labels them err. Of course it's a good idea to verify that the symlinks and directories look correct after the deployment. When I checked my results everything was fine.
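One way to do that verification is a short Ruby check run in the deployed release directory. A sketch under stated assumptions: check_external_links is a hypothetical helper, and it assumes every path listed in config/externals.yml should end up as a symlink to a directory in the cache. It returns an empty Hash when everything resolves.

```ruby
require "yaml"

# For each path in config/externals.yml, report anything that is not a
# symlink, or whose link target is missing or not a directory.
def check_external_links(app_root)
  config = File.join(app_root, "config", "externals.yml")
  externals = YAML.safe_load(File.read(config), permitted_classes: [Symbol]) || {}
  externals.keys.each_with_object({}) do |rel, problems|
    path = File.join(app_root, rel)
    if !File.symlink?(path)
      problems[rel] = "not a symlink"
    elsif !File.directory?(path) # File.directory? follows the link
      problems[rel] = "target is missing or not a directory"
    end
  end
end
```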
After the first run things should be fast until one or more externals need updating. For the Mephisto blog my deployment gzip file went from 3.2 MB to under 1 MB with what seemed to be a corresponding time savings.
Additional Information
Cached Externals' main purpose is to speed up deployments, but the way it accomplishes this inherently helps you track and share the externals that you decide to cache. Externals management already has several solutions: Piston, Braid, the subtree merge strategy, git submodules and svn:externals are some of the more popular ways to manage vendor externals. Gary Lam has put together a github repository for his Dependancy Management Talk with a Keynote presentation, example implementations and notes that cover Cached Externals, the solutions I mentioned and others.
Gary's repository is a great resource and this note in his notes.markdown file caught my attention:
* script/generate has issues with symlinks? did not detect generators from symlinked plugins
When I first read Gary's note I did try the rspec_model generator script from the rspec_on_rails plugin after the symlinks were in place and it worked fine. After looking into it more I found out that the problem Gary refers to is an issue with some versions of Rails. It looks like the problem was fixed in February 2007, inadvertently broken as part of another fix in March 2008 and finally fixed again in June, 2008. Going by dates it would mean generators in symlinked plugins worked from Rails 1.2.3 to 2.0.2, were broken in Rails 2.0.3 and have worked fine since 2.1.1. The Mephisto blog uses Rails 2.0.2, so it's not surprising that I didn't see the issue. If you run into this problem then you'll need to update your Rails version, eliminate the affected plugin from the cache or temporarily replace the symlink with the real thing when you use a generator.
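The last workaround, temporarily swapping the symlink for the real thing, can be wrapped in a helper so the link is always put back. A minimal sketch; with_real_copy is a hypothetical name, and anything written inside the copied plugin directory while the block runs is discarded when the symlink is restored (generators normally write into the application, not the plugin).

```ruby
require "fileutils"

# Temporarily replace the symlink at +link+ with a real copy of its target,
# run the block, then restore the original symlink, even if the block raises.
def with_real_copy(link)
  target = File.readlink(link)                          # may be a relative path
  source = File.expand_path(target, File.dirname(link)) # resolve it against the link's dir
  File.unlink(link)
  begin
    FileUtils.cp_r(source, link)
    yield
  ensure
    FileUtils.rm_rf(link)
    File.symlink(target, link)
  end
end
```

For example, with_real_copy("vendor/plugins/rspec_on_rails") { system("script/generate", "rspec_model", "Post") }, where Post is just a placeholder model name.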
Jamis Buck, the author of Cached Externals, discussed the importance of deployment performance and how Cached Externals differs from other Capistrano caching techniques in his CachedExternals: managing application dependencies post on the Signal vs. Noise blog. Finally, I should mention that I found out about Cached Externals from a Rails Envy podcast.