{"rowid": 186, "title": "The Web Is Your CMS", "contents": "It is amazing what you can do these days with the services offered on the web. Flickr stores terabytes of photos for us and converts them automatically to all kinds of sizes, finds people in them and even allows us to edit them online. YouTube does much the same job with videos, LinkedIn allows us to maintain our CV, Delicious our bookmarks and so on.\n\nWe don\u2019t have to do these tasks ourselves any more, as all of these systems also come with ways to use the data in the form of Application Programming Interfaces, or APIs for short. APIs give us raw data when we send requests telling the system what we want to get back.\n\nThe problem is that every API has a different idea of what counts as a simple way of accessing this data and of which format to give it back in.\n\nMaking it easier to access APIs\n\nWhat we need is a way to abstract the pains of different data formats and authentication methods away from the developer, and this is the purpose of the Yahoo Query Language, or YQL for short.\n\nLibraries like jQuery and YUI make it easy and reliable to use JavaScript in browsers (yes, even IE6) and YQL allows us to access web services and even the data embedded in web documents in a simple fashion \u2013 SQL style.\n\nSelect * from the web and filter it the way I want\n\nYQL is a web service that takes a few inputs itself:\n\n\n\tA query that tells it what to get, update or access\n\tAn output format \u2013 XML, JSON, JSON-P or JSON-P-X\n\tA callback function (if you chose JSON-P or JSON-P-X)\n\n\nYou can try it out yourself \u2013 check out this link to get back Flickr photos for the search term \u2018santa\u2019 in XML format. The YQL query for this is:\n\nselect * from flickr.photos.search where text=\"santa\"\n\nThe easiest way to take your first steps with YQL is to look at the console. 
There you get sample queries, access to all the data sources available to you and you can easily put together complex queries. In this article, however, let\u2019s use PHP to put together a web page that pulls in Flickr photos, blog posts, Videos from YouTube and latest bookmarks from Delicious.\n\nCheck out the demo and get the source code on GitHub.\n\n<?php\n  /* YouTube RSS */\n  $query = 'select description from rss(5) where url=\"http://gdata.youtube.com/feeds/base/users/chrisheilmann/uploads?alt=rss&v=2&orderby=published&client=ytapi-youtube-profile\";';\n  /* Flickr search by user id */\n  $query .= 'select farm,id,owner,secret,server,title from flickr.photos.search where user_id=\"11414938@N00\";';\n  /* Delicious RSS */\n  $query .= 'select title,link from rss where url=\"http://feeds.delicious.com/v2/rss/codepo8?count=10\";';\n  /* Blog RSS */\n  $query .= 'select title,link from rss where url=\"http://feeds.feedburner.com/wait-till-i/gwZf\"';\n  /* The YQL web service root with JSON as the output */\n  $root = 'http://query.yahooapis.com/v1/public/yql?format=json&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys';\n  /* Assemble the query */\n  $query = \"select * from query.multi where queries='\".$query.\"'\";\n  $url = $root . '&q=' . 
urlencode($query);\n  /* Do the curl call (access the data just like a browser would) */\n  $ch = curl_init(); \n  curl_setopt($ch, CURLOPT_URL, $url); \n  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); \n  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);\n  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);\n  $output = curl_exec($ch); \n  curl_close($ch);\n  $data = json_decode($output);\n  $results = $data->query->results->results;\n  /* YouTube output */\n  $youtube = '<ul id=\"youtube\">';\n  foreach($results[0]->item as $r){\n\t$cleanHTML = undoYouTubeMarkupCrimes($r->description);\n\t$youtube .= '<li>'.$cleanHTML.'</li>';\n  }\n  $youtube .= '</ul>';\n  /* Flickr output */\n  $flickr = '<ul id=\"flickr\">';\n  foreach($results[1]->photo as $r){\n\t$flickr .= '<li>'.\n\t\t\t   '<a href=\"http://www.flickr.com/photos/codepo8/'.$r->id.'/\">'.\n\t\t\t   '<img src=\"http://farm' .$r->farm . '.static.flickr.com/'.\n\t\t\t   $r->server . '/' . $r->id . '_' . $r->secret . \n\t\t\t   '_s.jpg\" alt=\"'.$r->title.'\"></a></li>';\n  }\n  $flickr .= '</ul>';\n  /* Delicious output */\n  $delicious = '<ul id=\"delicious\">';\n  foreach($results[2]->item as $r){\n\t$delicious .= '<li><a href=\"'.$r->link.'\">'.$r->title.'</a></li>';\n  }\n  $delicious .= '</ul>';\n  /* Blog output */\n  $blog = '<ul id=\"blog\">';\n  foreach($results[3]->item as $r){\n\t$blog .= '<li><a href=\"'.$r->link.'\">'.$r->title.'</a></li>';\n  }\n  $blog .= '</ul>';\n  function undoYouTubeMarkupCrimes($str){\n\t$cleaner = preg_replace('/555px/','100%',$str);\n\t$cleaner = preg_replace('/width=\"[^\"]+\"/','',$cleaner);\n\t$cleaner = preg_replace('/<tbody>/','<colgroup><col width=\"20%\"><col width=\"50%\"><col width=\"30%\"></colgroup><tbody>',$cleaner);\n\treturn $cleaner;\n  }\n?>\n\nWhat we are doing here is creating a few different YQL statements and queuing them together with the query.multi table. Each of these can be run inside YQL itself. 
Check out the YouTube, Flickr, Delicious and Blog examples in the console if you don\u2019t believe me. The benefit of using this table is that we don\u2019t make an individual request for each query but get all the data in one single request \u2013 which means a much better performing solution, as the YQL server farm is faster on the web than our own servers.\n\nWe point the query to the YQL web service end point and get the resulting data using cURL. All that we need to do then is to convert the returned data to HTML lists that can be printed out inside an HTML template.\n\nMixing, matching and using HTML as a data source\n\nThis was a simple example of what YQL can do for you. Where it gets really powerful, however, is in mixing and matching different APIs. YQL is also a good tool to get information from HTML documents. By using the html table you can load the content of an HTML document (which gets fixed automatically by HTML Tidy) and use XPath to filter down results to what you need. Take the following example, which takes headlines from the news.bbc.co.uk homepage and runs the results through Yahoo\u2019s Term Extractor API to give you a list of currently hot topics.\n\nselect * from search.termextract where context in (\n  select content from html where url=\"http://news.bbc.co.uk\" and xpath=\"//table[@width=800]//a\"\n)\n\nTry it out in the console or see the results here. 
In English, this means:\n\n\n\tGo to http://news.bbc.co.uk and get me the HTML\n\tRun it through HTML Tidy to clean it up.\n\tGet me only the links inside the table that has a width attribute with the value 800\n\tGet only the content of the link and for each of the links\n\t\n\t\tTake the content and send it as context to the Yahoo Term Extractor API\n\t\n\t\n\nIf we choose JSON-P as the output format we can use the outcome directly in JavaScript (see this demo or see its source):\n\n<ul id=\"hottopics\"></ul>\n<script type=\"text/javascript\">\nfunction hottopics(o){\n  var res = o.query.results.Result,\n\t  all = res.length,\n\t  topics = {},\n\t  out = [],\n\t  html = '',\n\t  i=0;\n  /* create hash from topics to prevent repetition */\t \n  for(i=0;i<all;i++){\n\ttopics[res[i]] = res[i];\n  };\n  for(i in topics){\n\tout.push(i);\n  };\n  html = '<li>' + out.join('</li><li>') + '</li>';\n  document.getElementById('hottopics').innerHTML = html;\n};\n</script>\n<script type=\"text/javascript\" src=\"http://query.yahooapis.com/v1/public/yql?q=select%20content%20from%20search.termextract%20where%20context%20in%20(select%20content%20from%20html%20where%20url%3D%22http%3A%2F%2Fnews.bbc.co.uk%22%20and%20xpath%3D%22%2F%2Ftable%5B%40width%3D800%5D%2F%2Fa%22)&format=json&callback=hottopics\"></script>\n\nUsing JSON, we can also use PHP, which means the demo works for everybody \u2013 not only those with JavaScript enabled (see this demo or see its source):\n\n<ul id=\"hottopics\"><li>\n<?php\n$url = 'http://query.yahooapis.com/v1/public/yql?q=select%20content'.\n\t   '%20from%20search.termextract%20where%20context%20in'.\n\t   '%20(select%20content%20from%20html%20where%20url%3D%22'.\n\t   'http%3A%2F%2Fnews.bbc.co.uk%22%20and%20xpath%3D%22%2F%2F'.\n\t   'table%5B%40width%3D800%5D%2F%2Fa%22)&format=json';\n$ch = curl_init(); \ncurl_setopt($ch, CURLOPT_URL, $url); \ncurl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); \ncurl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 
false);\ncurl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);\n$output = curl_exec($ch); \ncurl_close($ch);\n$data = json_decode($output);\n$topics = array_unique($data->query->results->Result);\necho join('</li><li>',$topics);\n?>\n</li></ul>\n\nSummary\n\nThis article could only scratch the surface of YQL. Not only do you have read access to the web, you can also write to web services. For example, you can update Twitter, post to your WordPress blog or shorten a URL with bit.ly. Using Open Tables you can add any web service to the YQL interface, and you can even run server-side JavaScript, which is useful, for example, to return Flickr photos as HTML or to get the HTML content from a document that needs POST data.\n\nThe web of data is already here, and using YQL you don\u2019t have to be a web services expert to use it and be part of it.", "year": "2009", "author": "Christian Heilmann", "author_slug": "chrisheilmann", "published": "2009-12-17T00:00:00+00:00", "url": "https://24ways.org/2009/the-web-is-your-cms/", "topic": "code"}