Media72 Hosting Articles and Tips
Archive for November, 2007
Domain Registration Site Down
Thursday, November 29th, 2007
Rails Development on Windows
Thursday, November 29th, 2007
Processing HTML with Hpricot
Wednesday, November 21st, 2007
In this world of Web 2.0 mashups and easy API access, it is quite refreshing how easy it is to pull data from third-party sites and re-mash it into something new. Unfortunately, not everyone has been bitten by this bug, so we as developers sometimes have to do a little more leg work to get the information we need. A common technique is the screen scrape, where your application acts like a browser and parses the HTML returned from the third-party server.
Although this should be simple enough, anyone who has ever tried it knows the pain of wrestling with regular expressions to find the tags you need. Luckily, we Rubyists have the Hpricot library, which takes the hard work out of parsing HTML. Hpricot lets developers access HTML elements via CSS selectors and XPath, so you can target specific tags really easily. And because it is written in C, it is pretty fast too.
Installation
Hpricot is a gem, so installation is as easy as:
gem install hpricot
Then just require the library at the top of the Ruby file:
require 'hpricot'
Usage
Let's take this HTML snippet:
<html>
<head>
<title>Snippet</title>
</head>
<body>
<div id="container">
<div id="navigation">
<ul>
<li><a href="/">Home</a></li>
<li><a href="/contact">Contact</a></li>
</ul>
</div>
<div id="sub-content">
<p>This would be some sort of sidebar</p>
</div>
<div id="content">
<p>This is paragraph 1</p>
<p>This is paragraph 2</p>
</div>
</div>
</body>
</html>
We can easily pull out the content of the paragraphs like this (assuming the HTML is already stored in the variable @html):
doc = Hpricot(@html)
pars = []
doc.search("div[@id=content]/p").each do |p|
  pars << p.inner_html
end
Yep - that's it. You now have an array with two elements, one for the text of each p tag in the content div. Notice that the p tag in the sub-content div isn't pulled in.
It doesn't end there, though. You can also manipulate the HTML, which can come in handy if you wanted to, say, create a quick and dirty mobile version. To remove the sub-content div from the mobile version, we could do this:
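For comparison, the same XPath idea can be tried without installing anything. The sketch below is not Hpricot: it swaps in REXML, which ships with Ruby, and it assumes the markup is well-formed XML (real-world HTML often is not, which is where a forgiving parser like Hpricot earns its keep). The SNIPPET constant and the paragraphs_in helper are hypothetical names used only for illustration.

```ruby
require 'rexml/document'

# A trimmed, well-formed version of the snippet above (hypothetical constant).
SNIPPET = <<~HTML
  <html>
    <body>
      <div id="sub-content">
        <p>This would be some sort of sidebar</p>
      </div>
      <div id="content">
        <p>This is paragraph 1</p>
        <p>This is paragraph 2</p>
      </div>
    </body>
  </html>
HTML

# Return the text of every <p> that is a direct child of the div with the given id.
def paragraphs_in(markup, div_id)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//div[@id='#{div_id}']/p").map(&:text)
end

pars = paragraphs_in(SNIPPET, 'content')
puts pars
```

As in the Hpricot version, the sidebar paragraph is skipped because the XPath expression only matches p elements directly inside the content div.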
doc = Hpricot(@html)
doc.search("div[@id=sub-content]").remove
puts doc
The resultant HTML no longer has a div with the id sub-content!
Adding a new class to the navigation ul is as simple as:
doc = Hpricot(@html)
doc.search("div[@id=navigation]/ul").set("class", "nav")
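The two manipulations above (removing an element and setting an attribute) can also be sketched with REXML from Ruby's standard distribution. Again, this is a stand-in rather than Hpricot, it only works on well-formed markup, and the PAGE constant and mobile_html variable are hypothetical names.

```ruby
require 'rexml/document'

# Hypothetical, well-formed stand-in for the snippet used above.
PAGE = <<~HTML
  <div id="container">
    <div id="navigation">
      <ul>
        <li><a href="/">Home</a></li>
      </ul>
    </div>
    <div id="sub-content">
      <p>This would be some sort of sidebar</p>
    </div>
  </div>
HTML

doc = REXML::Document.new(PAGE)

# Drop the sidebar: every matching element is detached from its parent.
REXML::XPath.match(doc, "//div[@id='sub-content']").each(&:remove)

# Add class="nav" to each <ul> directly inside the navigation div.
REXML::XPath.match(doc, "//div[@id='navigation']/ul").each do |ul|
  ul.add_attribute('class', 'nav')
end

mobile_html = doc.to_s
puts mobile_html
```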
This is just the tip of the iceberg - the library is really powerful and simple to use. Go and check out the official page for more (less trivial) examples.
Disclaimer: You should make sure you have permission from the website owner before screen-scraping their site.
Preparing for Rails 2.0: Controller-based exception handling
Wednesday, November 7th, 2007
Since Ruby is a pure object-oriented language, exceptions play a big role in the flow of control. Before Rails 2.0, you had the choice of rescuing exceptions locally or overriding the rescue_action method in your controller.
The former method gave you really fine-grained control of what to do in the case of an exception:
begin
  user.save!
rescue ActiveRecord::RecordInvalid
  render :action => 'new'
end
In this case, if the ActiveRecord::RecordInvalid exception is raised (the save! method raises it if validation fails), Rails will render the 'new' action. It became clear, though, that wrapping the same calls in begin/rescue blocks is time consuming and not very DRY, which is where rescue_action became helpful:
def rescue_action(exception)
  # Test the exception's class with is_a?; an instance is never == to its class
  if exception.is_a?(ActionView::TemplateError)
    render :template => 'errors/404'
  else
    super
  end
end
This (rather contrived) example traps any ActionView::TemplateError and renders the 404.erb file in the app/views/errors directory. You can drop that method into any controller (including ApplicationController), but there is still a lot of work involved in setting it up and making sure each controller performs the correct action for a given exception. This is why Rails 2.0 introduces rescue_from.
rescue_from is a class-level declaration available in every controller that lets you name a method to run when a particular exception is raised. So the above example would become:
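One detail worth checking when writing a handler like this is how the exception is matched. In plain Ruby an exception object is never == to its class, so the class test needs is_a? (or Class#===, which is what rescue relies on). The sketch below uses a hypothetical TemplateMissing class and response_for helper in place of ActionView::TemplateError and a real controller, so that it runs without Rails.

```ruby
# Hypothetical stand-in for ActionView::TemplateError.
class TemplateMissing < StandardError; end

# Name the response a handler would pick for the given exception.
def response_for(exception)
  if exception.is_a?(TemplateMissing)
    'errors/404'
  else
    'default'
  end
end

error = TemplateMissing.new('no such template')

equal_to_class  = (error == TemplateMissing)   # instance compared with its class
matched_by_is_a = error.is_a?(TemplateMissing) # class membership test
matched_by_case = (TemplateMissing === error)  # the test rescue performs

puts equal_to_class, matched_by_is_a, matched_by_case
puts response_for(error)
puts response_for(ArgumentError.new('other'))
```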
rescue_from ActionView::TemplateError, :with => :render_404

def render_404(exception)
  render :template => 'errors/404'
end
or, if you prefer to use an inline block:
rescue_from(ActionView::TemplateError) { render :template => 'errors/404' }
In the current preview release (PR), the rescue_from technique can only catch exceptions of an exact type, so it won't catch subclassed exceptions: if you had a custom exception called MyTemplateError that subclasses ActionView::TemplateError, the above code won't handle it. The good news is that a patch to edge Rails fixes this issue, so the release version of Rails 2.0 will work as expected.
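The difference between exact-type and subclass-aware matching can be sketched in plain Ruby. This is not the Rails implementation: MiniController, handler_for, and the two exception classes are hypothetical, and the lookup simply contrasts comparing classes with == against is_a?.

```ruby
# Hypothetical exception hierarchy standing in for the Rails classes.
class TemplateError < StandardError; end
class MyTemplateError < TemplateError; end

# A toy controller with a rescue_from-style class macro (not the Rails code).
class MiniController
  # Handlers are stored as [exception_class, method_name] pairs.
  def self.handlers
    @handlers ||= []
  end

  def self.rescue_from(klass, options)
    handlers << [klass, options[:with]]
  end

  rescue_from TemplateError, :with => :render_404

  # Find a handler name; exact: true mimics matching on the exact class only.
  def handler_for(exception, exact: false)
    pair = self.class.handlers.find do |klass, _name|
      exact ? exception.class == klass : exception.is_a?(klass)
    end
    pair && pair.last
  end
end

controller = MiniController.new
error = MyTemplateError.new('subclassed')

exact_match    = controller.handler_for(error, exact: true)
subclass_match = controller.handler_for(error)

p exact_match
p subclass_match
```

With exact matching the subclassed error finds no handler, while the is_a? lookup resolves it to the handler registered for its parent class.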
This article provided by sitepoint.com.



