{"rowid": 16, "title": "URL Rewriting for the Fearful", "contents": "I think it was Marilyn Monroe who said, \u201cIf you can\u2019t handle me at my worst, please just fix these rewrite rules, I\u2019m getting an internal server error.\u201d Even the blonde bombshell hated configuring URL rewrites on her website, and I think most of us know where she was coming from.\n\nThe majority of website projects I work on require some amount of URL rewriting, and I find it mildly enjoyable \u2014 I quite like a good rewrite rule. I suspect you may not share my glee, so in this article we\u2019re going to go back to basics to try to make the whole rigmarole more understandable.\n\nWhen we think about URL rewriting, usually that means adding some rules to an .htaccess file for an Apache web server. As that\u2019s the most common case, that\u2019s what I\u2019ll be sticking to here. If you work with a different server, there\u2019s often documentation specifically for translating from Apache\u2019s mod_rewrite rules. I even found an automatic converter for nginx.\n\nThis isn\u2019t going to be a comprehensive guide to every URL rewriting problem you might ever have. That would take us until Christmas. If you consider yourself a trial-and-error dabbler in the HTTP 500-infested waters of URL rewriting, then hopefully this will provide a little bit more of a basis to help you figure out what you\u2019re doing. If you\u2019ve ever found yourself staring at the white screen of death after screwing up your .htaccess file, don\u2019t worry. As Michael Jackson once insipidly whined, you are not alone.\n\nThe basics\n\nRewrite rules form part of the Apache web server\u2019s configuration for a website, and can be placed in a number of different locations as part of your virtual host configuration. By far the simplest and most portable option is to use an .htaccess file in your website root. Provided your server has mod_rewrite available, all you need to do to kick things off in your .htaccess file is:\n\nRewriteEngine on\n\nThe general formula for a rewrite rule is:\n\nRewriteRule URL/to/match URL/to/use/if/it/matches [options]\n\nWhen we talk about URL rewriting, we\u2019re normally talking about one of two things: redirecting the browser to a different URL; or rewriting the URL internally to use a particular file. We\u2019ll look at those in turn.\n\nRedirects\n\nRedirects match an incoming URL, and then redirect the user\u2019s browser to a different address. These can be useful for maintaining legacy URLs if content changes location as part of a site redesign. Redirecting the old URL to the new location makes sure that any incoming links, such as those from search engines, continue to work. \n\nIn 1998, Sir Tim Berners-Lee wrote that Cool URIs don\u2019t change, encouraging us all to go the extra mile to make sure links keep working forever. I think that sometimes it\u2019s fine to move things around \u2014 especially to correct bad URL design choices of the past \u2014 provided that you can do so while keeping those old URLs working. That\u2019s where redirects can help.\n\nA redirect might look like this\n\nRewriteRule ^article/used/to/be/here.php$ /article/now/lives/here/ [R=301,L]\n\nRewriting\n\nBy default, web servers closely map page URLs to the files in your site. On receiving a request for http://example.com/about/history.html the server goes to the configured folder for the example.com website, and then goes into the about folder and returns the history.html file.\n\nA rewrite rule changes that process by breaking the direct relationship between the URL and the file system. \u201cWhen there\u2019s a request for /about/history.html\u201d a rewrite rule might say, \u201cuse the file /about_section.php instead.\u201d\n\nThis opens up lots of possibilities for creative ways to map URLs to the files that know how to serve up the page. Most MVC frameworks will have a single rule to rewrite all page URLs to one single file. That file will be a script which kicks off the framework to figure out what to do to serve the page.\n\nRewriteRule ^for/this/url/$ /use/this/file.php [L] \n\nMatching patterns\n\nBy now you\u2019ll have noted the weird ^ and $ characters wrapped around the URL we\u2019re trying to match. That\u2019s because what we\u2019re actually using here is a pattern. Technically, it is what\u2019s called a Perl Compatible Regular Expression (PCRE) or simply a regex or regexp. We\u2019ll call it a pattern because we\u2019re not animals.\n\nWhat are these patterns? If I asked you to enter your credit card expiry date as MM/YY then chances are you\u2019d wonder what I wanted your credit card details for, but you\u2019d know that I wanted a two-digit month, a slash, and a two-digit year. That\u2019s not a regular expression, but it\u2019s the same idea: using some placeholder characters to define the pattern of the input you\u2019re trying to match.\n\nWe\u2019ve already met two regexp characters.\n\n\n\t^\n\tMatches the beginning of a string\n\t$\n\tMatches the end of a string\n\n\nWhen a pattern starts with ^ and ends with $ it\u2019s to make sure we match the complete URL start to finish, not just part of it. There are lots of other ways to match, too:\n\n\n\t[0-9]\n\tMatches a number, 0\u20139. [2-4] would match numbers 2 to 4 inclusive.\n\t[a-z]\n\tMatches lowercase letters a\u2013z\n\t[A-Z]\n\tMatches uppercase letters A\u2013Z\n\t[a-z0-9]\n\tCombining some of these, this matches letters a\u2013z and numbers 0\u20139\n\n\nThese are what we call character groups. The square brackets basically tell the server to match from the selection of characters within them. You can put any specific characters you\u2019re looking for within the brackets, as well as the ranges shown above. \n\nHowever, all these just match one single character. [0-9] would match 8 but not 84 \u2014 to match 84 we\u2019d need to use [0-9] twice.\n\n[0-9][0-9]\n\nSo, if we wanted to match 1984 we could to do this:\n\n[0-9][0-9][0-9][0-9] \n\n\u2026but that\u2019s getting silly. Instead, we can do this:\n\n[0-9]{4}\n\nThat means any character between 0 and 9, four times. If we wanted to match a number, but didn\u2019t know how long it might be (for example, a database ID in the URL) we could use the + symbol, which means one or more.\n\n[0-9]+\n\nThis now matches 1, 123 and 1234567.\n\nPutting it into practice\n\nLet\u2019s say we need to write a rule to match article URLs for this website, and to rewrite them to use /article.php under the hood. The articles all have URLs like this:\n\n2013/article-title/\n\nThey start with a year (from 2005 up to 2013, currently), a slash, and then have a URL-safe version of the article title (a slug), ending in a slash. We\u2019d match it like this:\n\n^[0-9]{4}/[a-z0-9-]+/$\n\nIf that looks frightening, don\u2019t worry. Breaking it down, from the start of the URL (^) we\u2019re looking for four numbers ([0-9]{4}). Then a slash \u2014 that\u2019s just literal \u2014 and then anything lowercase a\u2013z or 0\u20139 or a dash ([a-z0-9-]) one or more times (+), ending in a slash (/$).\n\nPutting that into a rewrite rule, we end up with this:\n\nRewriteRule ^[0-9]{4}/[a-z0-9-]+/$ /article.php\n\nWe\u2019re getting close now. We can match the article URLs and rewrite them to use article.php. Now we just need to make sure that article.php knows which article it\u2019s supposed to display.\n\nCapturing groups, and replacements\n\nWhen rewriting URLs you\u2019ll often want to take important parts of the URL you\u2019re matching and pass them along to the script that handles the request. That\u2019s usually done by adding those parts of the URL on as query string arguments. For our example, we want to make sure that article.php knows the year and the article title we\u2019re looking for. That means we need to call it like this:\n\n/article.php?year=2013&slug=article-title\n\nTo do this, we need to mark which parts of the pattern we want to reuse in the destination. We do this with round brackets or parentheses. By placing parentheses around parts of the pattern we want to reuse, we create what\u2019s called a capturing group. To capture an important part of the source URL to use in the destination, surround it in parentheses.\n\nOur pattern now looks like this, with parentheses around the parts that match the year and slug, but ignoring the slashes:\n\n^([0-9]{4})/([a-z0-9-]+)/$ \n\nTo use the capturing groups in the destination URL, we use the dollar sign and the number of the group we want to use. So, the first capturing group is $1, the second is $2 and so on. (The $ is unrelated to the end-of-pattern $ we used before.)\n\nRewriteRule ^([0-9]{4})/([a-z0-9-]+)/$ /article.php?year=$1&slug=$2 \n\nThe value of the year capturing group gets used as $1 and the article title slug is $2. Had there been a third group, that would be $3 and so on. In regexp parlance, these are called back-references as they refer back to the pattern.\n\nOptions\n\nSeveral brain-taxing minutes ago, I mentioned some options as the final part of a rewrite rule. There are lots of options (or flags) you can set to change how the rule is processed. The most useful (to my mind) are:\n\n\n\tR=301\n\tPerform an HTTP 301 redirect to send the user\u2019s browser to the new URL. A status of 301 means a resource has moved permanently and so it\u2019s a good way of both redirecting the user to the new URL, and letting search engines know to update their indexes.\n\tL\n\tLast. If this rule matches, don\u2019t bother processing the following rules.\n\n\nOptions are set in square brackets at the end of the rule. You can set multiple options by separating them with commas:\n\nRewriteRule ^([0-9]{4})/([a-z0-9-]+)/$ /article.php?year=$1&slug=$2 [L]\n\nor\n\nRewriteRule ^about/([a-z0-9-]+).jsp/$ /about/$1/ [R=301,L] \n\nCommon pitfalls\n\nOnce you\u2019ve built up a few rewrite rules, things can start to go wrong. You may have been there: a rule which looks perfectly good is somehow not matching. One common reason for this is hidden behind that [L] flag. \n\nL for Last is a useful option to tell the rewrite engine to stop once the rule has been matched. This is what it does \u2014 the remaining rules in the .htaccess file are then ignored. However, once a URL has been rewritten, the entire set of rules are then run again on the new URL. If the new URL matches any of the rules, that too will be rewritten and on it goes. \n\nOne way to avoid this problem is to keep your \u2018real\u2019 pages under a folder path that will never match one of your rules, or that you can exclude from the rewrite rules.\n\nUseful snippets\n\nI find myself reusing the same few rules over and over again, just with minor changes. Here are some useful examples to refer back to.\n\nExcluding a directory\n\nAs mentioned above, if you\u2019re rewriting lots of fancy URLs to a collection of real files it can be helpful to put those files in a folder and exclude it from rewrite rules. This helps solve the issue of rewrite rules reapplying to your newly rewritten URL. To exclude a directory, put a rule like this at the top of your file, before your other rules. Our files are in a folder called _source, the dash in the rule means do nothing, and the L flag means the following rules won\u2019t be applied.\n\nRewriteRule ^_source - [L]\n\nThis is also useful for excluding things like CMS folders from your website\u2019s rewrite rules\n\nRewriteRule ^perch - [L] \n\nAdding or removing www from the domain\n\nSome folk like to use a www and others don\u2019t. Usually, it\u2019s best to pick one and go with it, and redirect the one you don\u2019t want. On this site, we don\u2019t use www.24ways.org so we redirect those requests to 24ways.org.\n\nThis uses a RewriteCond which is like an if for a rewrite rule: \u201cIf this condition matches, then apply the following rule.\u201d In this case, it\u2019s if the HTTP HOST (or domain name, basically) matches this pattern, then redirect everything:\n\nRewriteCond %{HTTP_HOST} ^www.24ways.org$ [NC]\nRewriteRule ^(.*)$ http://24ways.org/$1 [R=301,L]\n\nThe [NC] flag means \u2018no case\u2019 \u2014 the match is case-insensitive. The dots in the domain are escaped with a backslash, as a dot is a regular expression character which means match anything, so we escape it because we literally mean a dot in this instance.\n\nRemoving file extensions\n\nSometimes all you need to do to tidy up a URL is strip off the technology-specific file extension, so that /about/history.php becomes /about/history. This is easily achieved with the help of some more rewrite conditions.\n\nRewriteCond %{REQUEST_FILENAME} !-f\nRewriteCond %{REQUEST_FILENAME} !-d\nRewriteCond %{REQUEST_FILENAME}.php -f\nRewriteRule ^(.+)$ $1.php [L,QSA]\n\nThis says if the file being asked for isn\u2019t a file (!-f) and if it isn\u2019t a directory (!-d) and if the file name plus .php is an actual file (-f) then rewrite by adding .php on the end. The QSA flag means \u2018query string append\u2019: append the existing query string onto the rewritten URL.\n\nIt\u2019s these sorts of more generic catch-all rules that you need to watch out for when your .htaccess gets rerun after a successful match. Without care they can easily rematch the newly rewritten URL.\n\nLogging for when it all goes wrong\n\nAlthough not possible within your .htaccess file, if you have access to your Apache configuration files you can enable rewrite logging. This can be useful to track down where a rule is going wrong, if it\u2019s matching incorrectly or failing to match. It also gives you an overview of the amount of work being done by the rewrite engine, enabling you to rearrange your rules and maximise performance.\n\nRewriteEngine On\nRewriteLog \"/full/system/path/to/rewrite.log\"\nRewriteLogLevel 5\n\nTo be doubly clear: this will not work from an .htaccess file \u2014 it needs to be added to the main Apache configuration files. (I sometimes work using MAMP PRO locally on my Mac, and this can be pasted into the snappily named Customized virtual host general settings box in the Advanced tab for your site.)\n\nThe white screen of death\n\nOne of the most frustrating things when working with rewrite rules is that when you make a mistake it can result in the server returning an HTTP 500 Internal Server Error. This in itself isn\u2019t an error message, of course. It\u2019s more of a notification that an error has occurred. The real error message can usually be found in your Apache error log.\n\nIf you have access to your server logs, check the Apache error log and you\u2019ll usually find a much more descriptive error message, pointing you towards your mistake. (Again, if using MAMP PRO, go to Server, Apache and the View Log button.)\n\nIn conclusion\n\nRewriting URLs can be a bear, but the advantages are clear. Keeping a tidy URL structure, disconnected from the technology or file structure of your site can result in URLs that are easier to use and easier to maintain into the future.\n\nIf you\u2019re redesigning a site, remember that cool URIs don\u2019t change, so budget some time to make sure that any content you move has a rewrite rule associated with it to keep any links working.\n\nFurther reading\n\nTo find out more about URL rewriting and perhaps even learn more about regular expressions, I can recommend the following resources.\n\n\n\tFrom the horse\u2019s mouth, the Apache mod_rewrite documentation\n\tParticularly useful with that documentation is the RewriteRule Flags listing\n\tYou may wish to don sunglasses to follow the otherwise comprehensive Regular-Expressions.info tutorial\n\tFriend of 24 ways, Neil Crosby has a mod_rewrite Beginner\u2019s Guide which I\u2019ve found handy over the years.\n\n\nAs noted at the start, this isn\u2019t a fully comprehensive guide, but I hope it\u2019s useful in finding your feet with a powerful but sometimes annoying technology. Do you have useful snippets you often use on projects? Feel free to share them in the comments.", "year": "2013", "author": "Drew McLellan", "author_slug": "drewmclellan", "published": "2013-12-01T00:00:00+00:00", "url": "https://24ways.org/2013/url-rewriting-for-the-fearful/", "topic": "code"}