I was thinking about how Facebook updates pick the URL out of a post and then construct a nice-looking preview. I think what it takes is scanning the textarea for a URL, then using Ajax to ask the server to fetch a summary of the page at that URL. When the data comes back, insert it into a template and reveal it.
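The URL-scanning step can be illustrated with a small regular expression. This is only a minimal sketch in Python, not the code behind this post: `URL_PATTERN` and `find_first_url` are hypothetical names, and the pattern is a deliberately loose assumption about what counts as a URL.

```python
import re

# Hypothetical pattern: an http(s) URL runs until whitespace, a quote,
# or an angle bracket. Real-world URL detection is messier than this.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def find_first_url(text):
    """Return the first http(s) URL found in text, or None."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the URL.
    # (Assumption: this also trims a closing parenthesis that a few
    # legitimate URLs do end with.)
    return match.group(0).rstrip(".,;:!?)")
```

In the browser the same idea would run against the textarea's contents on each change, and the matched URL would be what gets sent to the server.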
This is a republication of a post from January 2012. I would not write it this way today.
This script does much of the server-side cleverness. What’s interesting is that you can get a pretty good summary using some simple heuristics. The “semantic web” doesn’t (yet) exist, but the typical article has a lot of metadata, H1 and H2 tags, and a big block of text for the article body. This script manages to pull those out most of the time.
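As a rough illustration of those heuristics, here is a minimal sketch in Python using only the standard library. It is not the script this post refers to: `SummaryParser` and `summarize` are hypothetical names, and the specific choices (prefer the title tag over the first H1, treat the longest paragraph as the article body) are assumptions made for the example.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collect the title, meta description, headings, and paragraphs."""

    CAPTURED = ("title", "h1", "h2", "p")

    def __init__(self):
        super().__init__()
        self.meta_description = None
        self.texts = {tag: [] for tag in self.CAPTURED}
        self._current = None  # tag whose text is being collected, if any
        self._buffer = []

    def _flush(self):
        # Store the collected text (whitespace-normalized) under its tag.
        if self._current is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.texts[self._current].append(text)
        self._current = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content")
        elif tag in self.CAPTURED:
            self._flush()  # copes with unclosed tags such as a bare <p>
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._flush()

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # keep text from a tag left open at end of input


def summarize(html):
    """Return a dict with a guessed title, headings, description, and body."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    texts = parser.texts
    # Assumption: the longest paragraph stands in for the article body.
    body = max(texts["p"], key=len, default=None)
    return {
        "title": (texts["title"] or texts["h1"] or [None])[0],
        "headings": texts["h1"] + texts["h2"],
        "description": parser.meta_description or body,
        "body": body,
    }
```

Passing a fetched page's HTML to `summarize` gives a small dictionary that could be serialized and handed back to the Ajax caller to fill in the preview template.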
Also, as of PHP 5.3, we have a goto statement. Finally! While goto is famously “considered harmful”, it’s the right tool for dealing with state machines.
A lot of functions were cribbed from PHP.net.