\nThis code only worked in Internet Explorer and in some versions of Safari, but that was plenty of people to befriend. However, MySpace was prepared for this: they also filtered the word javascript from
\nSamy discovered that by inserting a line break into his code, MySpace would not filter out the word javascript. The browser would continue to run the code just fine! Samy had now broken past MySpace\u2019s first line of defence and was able to start running code on his profile page. Now he started looking at what he could do with that code.\nalert(document.body.innerHTML)\nSamy wondered if he could inspect the page\u2019s source to find the details of other MySpace users to befriend. To do this, you would normally use document.body.innerHTML, but MySpace had filtered this too.\nalert(eval('document.body.inne' + 'rHTML'))\nThis isn\u2019t a problem if you build up JavaScript code inside a string and execute it using the eval() function. This trick also worked with XMLHttpRequest.onReadyStateChange, which allowed Samy to send friend requests to the MySpace API and install the JavaScript code on his new friends\u2019 pages.\nOne final obstacle stood in his way. The same origin policy is a security mechanism that prevents scripts hosted on one domain interacting with sites hosted on another domain.\nif (location.hostname == 'profile.myspace.com') {\n document.location = 'http://www.myspace.com'\n + location.pathname + location.search\n}\nSamy discovered that only the http://www.myspace.com domain would accept his API requests, and requests from http://profile.myspace.com were being blocked by the browser\u2019s same-origin policy. By redirecting the browser to http://www.myspace.com, he discovered that he could load profile pages and successfully make requests to MySpace\u2019s API. Samy installed this code on his profile page, and he waited.\n\nOver the course of the next day, over a million people unwittingly installed Samy\u2019s code into their MySpace profile pages and invited their friends. The load of friend requests on MySpace was so large that the site buckled and shut down. It took them two hours to remove Samy\u2019s code and patch the security holes he exploited. Samy was raided by the United States secret service and sentenced to do 90 days of community service.\nThis is the power of installing a little bit of JavaScript on someone else\u2019s website. It is called cross site scripting, and its effects can be devastating. It is suspected that cross-site scripting was to blame for the 2018 British Airways breach that leaked the credit card details of 380,000 people.\nSo how can you help protect yourself from cross-site scripting?\nAlways sanitise user input when it comes in, using a library such as sanitize-html. Open source tools like this benefit from hundreds of hours of work from dozens of experienced contributors. Don\u2019t be tempted to roll your own protection. MySpace was prepared, but they were not prepared enough. It makes no sense to turn this kind of help down.\nYou can also use an auto-escaping templating language to make sure nobody else\u2019s HTML can get into your pages. Both Angular and React will do this for you, and they are extremely convenient to use.\nYou should also implement a content security policy to restrict the domains that content like scripts and stylesheets can be loaded from. Loading content from sites not under your control is a significant security risk, and you should use a CSP to lock this down to only the sources you trust. CSP can also block the use of the eval() function.\nFor content not under your control, consider setting up sub-resource integrity protection. This allows you to add hashes to stylesheets and scripts you include on your website. Hashes are like fingerprints for digital files; if the content changes, so does the fingerprint. Adding hashes will allow your browser to keep your site safe if the content changes without you knowing.\nnpm audit: Protecting yourself from code you don\u2019t own\nJavaScript and npm run the modern web. Together, they make it easy to take advantage of the world\u2019s largest public registry of open source software. How do you protect yourself from code written by someone you\u2019ve never met? Enter npm audit.\nnpm audit reviews the security of your website\u2019s dependency tree. You can start using it by upgrading to the latest version of npm:\nnpm install npm -g\nnpm audit\nWhen you run npm audit, npm submits a description of your dependencies to the Registry, which returns a report of known vulnerabilities for the packages you have installed.\n\nIf your website has a known cross-site scripting vulnerability, npm audit will tell you about it. What\u2019s more, if the vulnerability has been patched, running npm audit fix will automatically install the patched package for you!\nSecuring your site like it\u2019s 2019\nThe truth is that since the early days of the web, the stakes of a security breach have become much, much higher. The web is so much more than fandom and mailing DVDs - online banking is now mainstream, social media and dating websites store intimate information about our personal lives, and we are even inviting the internet into our homes.\nHowever, we have powerful new allies helping us stay safe. There are more resources than ever before to teach us how to write secure code. Tools like Angular and React are designed with security features baked-in from the start. We have a new generation of security tools like npm audit to watch over our dependencies.\nAs we roll over into 2019, let\u2019s take the opportunity to reflect on the security of the code we write and be grateful for the everything we\u2019ve learned in the last twenty years.", "year": "2018", "author": "Katie Fenn", "author_slug": "katiefenn", "published": "2018-12-01T00:00:00+00:00", "url": "https://24ways.org/2018/securing-your-site-like-its-1999/", "topic": "code"}
{"rowid": 256, "title": "Develop Your Naturalist Superpowers with Observable Notebooks and iNaturalist", "contents": "We\u2019re going to level up your knowledge of what animals you might see in an area at a particular time of year - a skill every naturalist* strives for - using technology! Using iNaturalist and Observable Notebooks we\u2019re going to prototype seasonality graphs for particular species in an area, and automatically create a guide to what animals you might see in each month.\n*(a Naturalist is someone who likes learning about nature, not someone who\u2019s a fan of being naked, that\u2019s a \u2018Naturist\u2019\u2026 different thing!)\nLooking for critters in rocky intertidal habitats\nOne of my favourite things to do is going rockpooling, or as we call it over here in California, \u2018tidepooling\u2019. Amounting to the same thing, it\u2019s going to a beach that has rocks where the tide covers then uncovers little pools of water at different times of the day. All sorts of fun creatures and life can be found in this \u2018rocky intertidal habitat\u2019\nA particularly exciting creature that lives here is the Nudibranch, a type of super colourful \u2018sea slug\u2019. There are over 3000 species of Nudibranch worldwide. (The word \u201cnudibranch\u201d comes from the Latin nudus, naked, and the Greek \u03b2\u03c1\u03b1\u03bd\u03c7\u03b9\u03b1 / brankhia, gills.)\n\u200b\n\nThey are however quite tricky to find! Even though they are often brightly coloured and interestingly shaped, some of them are very small, and in our part of the world in the Bay Area in California their appearance in our rockpools is seasonal. We see them more often in Summer months, despite the not-as-low tides as in our Winter and Spring seasons.\nMy favourite place to go tidepooling here is Pillar Point in Half Moon bay (at other times of the year more famously known for the surf competition \u2018Mavericks\u2019). The rockpools there are rich in species diversity, of varied types and water-coverage habitat zones as well as being relatively accessible.\n\u200b\n\nI was rockpooling at Pillar Point recently with my parents and we talked to a lady who remarked that she hadn\u2019t seen any Nudibranchs on her visit this time. I realised that having an idea of what species to find where, and at what time of year is one of the many superpower goals of every budding Naturalist. \nUsing technology and the croudsourced species observations of the iNaturalist community we can shortcut our way to this superpower!\nFinding nearby animals with iNaturalist\nWe\u2019re going to be getting our information about what animals you can see in Pillar Point using iNaturalist. iNaturalist is a really fun platform that helps connect people to nature and report their findings of life in the outdoors. It is also a community of nature-loving people who help each other identify and confirm those observations. iNaturalist is a project run as a joint initiative by the California Academy of Sciences and the National Geographic Society.\nI\u2019ve been using iNaturalist for over two years to record and identify plants and animals that I\u2019ve found in the outdoors. I use their iPhone app to upload my pictures, which then uses machine learning algorithms to make an initial guess at what it is I\u2019ve seen. The community is really active, and I often find someone else has verified or updated my species guess pretty soon after posting. \nThis process is great because once an observation has been identified by at least two people it becomes \u2018verified\u2019 and is considered research grade. Research grade observations get exported and used by scientists, as well as being indexed by the Global Biodiversity Information Facility, GBIF.\n\u200b\n\niNaturalist has a great API and API explorer, which makes interacting and prototyping using iNaturalist data really fun. For example, if you go to the API explorer and expand the Observations : Search and fetch section and then the GET /observations API, you get a selection of input boxes that allow you to play with options that you can then pass to the API when you click the \u2018Try it out\u2019 button.\n\u200b\n\nYou\u2019ll then get a URL that looks a bit like\nhttps://api.inaturalist.org/v1/observations?captive=false &geo=true&verifiable=true&taxon_id=47113&lat=37.495461&lng=-122.499584 &radius=5&order=desc&order_by=created_at \nwhich you can call and interrrogate using a programming language of your choice.\nIf you would like to see an all-JavaScript application that uses the iNaturalist API, take a look at OwlsNearMe.com which Simon and I built one weekend earlier this year. It gets your location and shows you all iNaturalist observations of owls near you and lists which species you are likely to see (not adjusted for season).\nRapid development using Observable Notebooks\nWe\u2019re going to be using Observable Notebooks to prototype our examples, pulling data down from iNaturalist. I really like using visual notebooks like Observable, they are great for learning and building things quickly. You may be familiar with Jupyter notebooks for Python which is similar but takes a bit of setup to get going - I often use these for prototyping too. Observable is amazing for querying and visualising data with JavaScript and since it is a hosted product it doesn\u2019t require any setup at all.\nYou can follow along and play with this example on my Observable notebook. If you create an account there you can fork my notebook and create your own version of this example. \nEach \u2018notebook\u2019 consists of a page with a column of \u2018cells\u2019, similar to what you get in a spreadsheet. A cell can contain Markdown text or JavaScript code and the output of evaluating the cell appears above the code that generated it. There are lots of tutorials out there on Observable Notebooks, I like this code introduction one from Observable (and D3) creator Mike Bostock.\nDeveloping your Naturalist superpowers\nIf you have an idea of what plants and critters you might see in a place at the time you visit, you can hone in on what you want to study and train your Naturalist eye to better identify the life around you.\nFor our example, we care about wildlife we can see at Pillar Point, so we need a way of letting the iNaturalist API know which area we are interested in.\nWe could use a latitide, longitude and radius for this, but a rectangular bounding box is a better shape for the reef. We can use this tool to draw the area we want to search within: boundingbox.klokantech.com\n\u200b\n\nThe tool lets you export the bounding box in several forms using the dropdown at the bottom left under the map givese We are going to use the \u2018DublinCore\u2019 format as it\u2019s closest to the format needed by the iNaturalist API.\n westlimit=-122.50542; southlimit=37.492805; eastlimit=-122.492738; northlimit=37.499811\nA quick map primer:\nThe higher the latitude the more north it is\nThe lower the latitude the more south it is\nLatitude 0 = the equator\n\nThe higher the longitude the more east it is of Greenwich\nThe lower the longitude the more west it is of Greenwich\nLongitude 0 = Greenwich\nIn the iNaturalst API we want to use the parameters nelat, nelng, swlat, swlng to create a query that looks inside a bounding box of Pillar Point near Half Moon Bay in California:\nnelat = highest latitude = north limit = 37.499811\nnelng = highest longitude = east limit = -122.492738\nswlat = smallest latitude = south limit = 37.492805\nswlng = smallest longitude = west limit = 122.50542\nAs API parameters these look like this:\n?nelat=37.499811&nelng=-122.492738&swlat=37.492805&swlng=122.50542\nThese parameters in this format can be used for most of the iNaturalist API methods.\nNudibranch seasonality in Pillar Point\nWe can use the iNaturalist observation_histogram API to get a count of Nudibranch observations per week-of-year across all time and within our Pillar Point bounding box.\nIn addition to the geographic parameters that we just worked out, we are also sending the taxon_id of 47113, which is iNaturalists internal number associated with the Nudibranch taxon. By using this we can get all species which are under the parent \u2018Order Nudibranchia\u2019. \nAnother useful piece of naturalist knowledge is understanding the biological classification scheme of Taxanomic Rank - roughly, when a species has a Latin name of two words eg \u2018Glaucus Atlanticus\u2019 the first Latin word is the \u2018Genus\u2019 like a family name \u2018Glaucus\u2019, and the second word identifies that particular species, like a given name \u2018Atlanticus\u2019. \nThe two Latin words together indicate a specific species, the term we use colloquially to refer to a type of animal often differs wildly region to region, and sometimes the same common name in two countries can refer to two different species. The common names for the Glaucus Atlanticus (which incidentally is my favourite sea slug) include: sea swallow, blue angel, blue glaucus, blue dragon, blue sea slug and blue ocean slug! Because this gets super confusing, Scientists like using this Latin name format instead.\nThe following piece of code asks the iNaturalist Histogram API to return per-week counts for verified observations of Nudibranchs within our Pillar Point bounding box:\npillar_point_counts_per_week = fetch(\n \"https://api.inaturalist.org/v1/observations/histogram?taxon_id=47113&nelat=37.499811&nelng=-122.492738&swlat=37.492805&swlng=-122.50542&date_field=observed&interval=week_of_year&verifiable=true\"\n ).then(response => {\n return response.json();\n})\nOur next step is to take this data and draw a graph! We\u2019ll be using Vega-Lite for this, which is a fab JavaScript graphing libary that is also easy and fun to use with Observable Notebooks. \n(Here is a great tutorial on exploring data and drawing graphs with Observable and Vega-Lite)\nThe iNaturalist API returns data that looks like this:\n{\n \"total_results\": 53,\n \"page\": 1,\n \"per_page\": 53,\n \"results\": {\n \"week_of_year\": {\n \"1\": 136,\n \"2\": 20,\n \"3\": 150,\n \"4\": 65,\n \"5\": 186,\n \"6\": 74,\n \"7\": 47,\n \"8\": 87,\n \"9\": 64,\n \"10\": 56,\nBut for our Vega-Lite graph we need data that looks like this:\n[{\n \"week\": \"01\",\n \"value\": 136\n}, {\n \"week\": \"02\",\n \"value\": 20\n}, ...]\nWe can convert what we get back from the API to the second format using a loop that iterates over the object keys:\nobjects_to_plot = {\n let objects = [];\n Object.keys(pillar_point_counts_per_week.results.week_of_year).map(function(week_index) {\n objects.push({\n week: `Wk ${week_index.toString()}`,\n observations: pillar_point_counts_per_week.results.week_of_year[week_index]\n });\n })\n return objects;\n}\nWe can then plug this into Vega-Lite to draw us a graph:\nvegalite({\n data: {values: objects_to_plot},\n mark: \"bar\",\n encoding: {\n x: {field: \"week\", type: \"nominal\", sort: null},\n y: {field: \"observations\", type: \"quantitative\"}\n },\n width: width * 0.9\n})\n\nIt\u2019s worth noting that we have a lot of observations of Nudibranchs particularly at Pillar Point due in no small part to the intertidal monitoring research that Alison Young and Rebecca Johnson facilitate for the California Achademy of Sciences. \nSo, what if we want to look for the seasonality of observations of a particular species of adorable sea slug? We want our interface to have a select box with a list of all the species you might find at any time of year. We can do this using the species_counts API to create us an object with the iNaturalist species ID and common & Latin names.\npillar_point_nudibranches = {\n let api_results = await fetch(\n \"https://api.inaturalist.org/v1/observations/species_counts?taxon_id=47113&nelat=37.499811&nelng=-122.492738&swlat=37.492805&swlng=-122.50542&date_field=observed&verifiable=true\"\n ).then(r => r.json())\n\n let species_list = api_results.results.map(i => ({\n value: i.taxon.id,\n label: `${i.taxon.preferred_common_name} (${i.taxon.name})`\n }));\n\n return species_list\n}\nWe can create an interactive select box by importing code from Jeremy Ashkanas\u2019 Observable Notebook: add import {select} from \"@jashkenas/inputs\" to a cell anywhere in our notebook. Observable is magic: like a spreadsheet, the order of the cells doesn\u2019t matter - if one cell is referenced by any other cell then when that cell updates all the other cells refresh themselves. You can also import and reference one notebook from another!\nviewof select_species = select({\n title: \"Which Nudibranch do you want to see seasonality for?\",\n options: [{value: \"\", label: \"All the Nudibranchs!\"}, ...pillar_point_nudibranches],\n value: \"\"\n})\nThen we go back to our old favourite, the histogram API just like before, only this time we are calling it with the value created by our select box ${select_species} as taxon_id instead of the number 47113.\npillar_point_counts_per_month_per_species = fetch(\n `https://api.inaturalist.org/v1/observations/histogram?taxon_id=${select_species}&nelat=37.499811&nelng=-122.492738&swlat=37.492805&swlng=-122.50542&date_field=observed&interval=month_of_year&verifiable=true`\n).then(r => r.json())\nNow for the fun graph bit! As we did before, we re-format the result of the API into a format compatible with Vega-Lite:\nobjects_to_plot_species_month = {\n let objects = [];\n Object.keys(pillar_point_counts_per_month_per_species.results.month_of_year).map(function(month_index) {\n objects.push({\n month: (new Date(2018, (month_index - 1), 1)).toLocaleString(\"en\", {month: \"long\"}),\n observations: pillar_point_counts_per_month_per_species.results.month_of_year[month_index]\n });\n })\n return objects;\n}\n(Note that in the above code we are creating a date object with our specific month in, and using toLocalString() to get the longer English name for the month. Because the JavaScript Date object counts January as 0, we use month_index -1 to get the correct month)\nAnd we draw the graph as we did before, only now if you interact with the select box in Observable the graph will dynamically update!\nvegalite({\n data: {values: objects_to_plot_species_month},\n mark: \"bar\",\n encoding: {\n x: {field: \"month\", type: \"nominal\", sort:null},\n y: {field: \"observations\", type: \"quantitative\"}\n },\n width: width * 0.9\n})\nNow we can see when is the best time of year to plan to go tidepooling in Pillar Point if we want to find a specific species of Nudibranch.\n\u200b\n\nThis tool is great for planning when we to go rockpooling at Pillar Point, but what about if you are going this month and want to pre-train your eye with what to look for in order to impress your friends with your knowledge of Nudibranchs?\nWell\u2026 we can create ourselves a dynamic guide that you can with a list of the species, their photo, name and how many times they have been observed in that month of the year!\nOur select box this time looks as follows, simpler than before but assigning the month value to the variable selected_month.\nviewof selected_month = select({\n title: \"When do you want to see Nudibranchs?\",\n options: [\n { label: \"Whenever\", value: \"\" },\n { label: \"January\", value: \"1\" },\n { label: \"February\", value: \"2\" },\n { label: \"March\", value: \"3\" },\n { label: \"April\", value: \"4\" },\n { label: \"May\", value: \"5\" },\n { label: \"June\", value: \"6\" },\n { label: \"July\", value: \"7\" },\n { label: \"August\", value: \"8\" },\n { label: \"September\", value: \"9\" },\n { label: \"October\", value: \"10\" },\n { label: \"November\", value: \"11\" },\n { label: \"December\", value: \"12\" },\n ],\n value: \"\"\n })\nWe then can use the species_counts API to get all the relevant information about which species we can see in month=${selected_month}. We\u2019ll be able to reference this response object and its values later with the variable we just created, eg: all_species_data.results[0].taxon.name.\nall_species_data = fetch(\n `https://api.inaturalist.org/v1/observations/species_counts?taxon_id=47113&month=${selected_month}&nelat=37.499811&nelng=-122.492738&swlat=37.492805&swlng=-122.50542&verifiable=true`\n).then(r => r.json())\nYou can render HTML directly in a notebook cell using Observable\u2019s html tagged template literal:\n\n\n
If you go to Pillar Point ${\n {\"\": \"\",\n \"1\":\"in January\",\n \"2\":\"in Febrary\",\n \"3\":\"in March\",\n \"4\":\"in April\",\n \"5\":\"in May\",\n \"6\":\"in June\",\n \"7\":\"in July\",\n \"8\":\"in August\",\n \"9\":\"in September\",\n \"10\":\"in October\",\n \"11\":\"in November\",\n \"12\":\"in December\",\n }[selected_month]\n } you might see\u2026 \n\n
\n${all_species_data.results.map(s => `
${s.taxon.name} \n
Seen ${s.count} times
\n
\n`)}\n
\nThese few lines of HTML are all you need to get this exciting dynamic guide to what Nudibranchs you will see in each month!\n\u200b\n\nPlay with it yourself in this Observable Notebook.\nConclusion\nI hope by playing with these examples you have an idea of how powerful it can be to prototype using Observable Notebooks and how you can use the incredible crowdsourced community data and APIs from iNaturalist to augment your naturalist skills and impress your friends with your new \u2018knowledge of nature\u2019 superpower.\nLastly I strongly encourage you to get outside on a low tide to explore your local rocky intertidal habitat, and all the amazing critters that live there.\nHere is a great introduction video to tidepooling / rockpooling, by Rebecca Johnson and Alison Young from the California Academy of Sciences.", "year": "2018", "author": "Natalie Downe", "author_slug": "nataliedowne", "published": "2018-12-18T00:00:00+00:00", "url": "https://24ways.org/2018/observable-notebooks-and-inaturalist/", "topic": "code"}
{"rowid": 243, "title": "Researching a Property in the CSS Specifications", "contents": "I frequently joke that I\u2019m \u201creading the specs so you don\u2019t have to\u201d, as I unpack some detail of a CSS spec in a post on my blog, some documentation for MDN, or an article on Smashing Magazine. However waiting for someone like me to write an article about something is a pretty slow way to get the information you need. Sometimes people like me get things wrong, or specifications change after we write a tutorial. \nWhat if you could just look it up yourself? That\u2019s what you get when you learn to read the CSS specifications, and in this article my aim is to give you the basic details you need to grab quick information about any CSS property detailed in the CSS specs.\nWhere are the CSS Specifications?\nThe easiest way to see all of the CSS specs is to take a look at the Current Work page in the CSS section of the W3C Website. Here you can see all of the specifications listed, the level they are at and their status. There is also a link to the specification from this page. I explained CSS Levels in my article Why there is no CSS 4.\nWho are the specifications for?\nCSS specifications are for everyone who uses CSS. You might be a browser engineer - referred to as an implementor - needing to know how to implement a feature, or a web developer - referred to as an author - wanting to know how to use the feature. The fact that both parties are looking at the same document hopefully means that what the browser displays is what the web developer expected.\nWhich version of a spec should I look at?\nThere are a couple of places you might want to look. Each published spec will have the latest published version, which will have TR in the URL and can be accessed without a date (which is always the newest version) or at a date, which will be the date of that publication. If I\u2019m referring to a particular Working Draft in an article I\u2019ll typically link to the dated version. That way if the information changes it is possible for someone to see where I got the information from at the time of writing.\nIf you want the very latest additions and changes to the spec, then the Editor\u2019s Draft is the place to look. This is the version of the spec that the editors are committing changes to. If I make a change to the Multicol spec and push it to GitHub, within a few minutes that will be live in the Editor\u2019s Draft. So it is possible there are errors, bits of text that we are still working out and so on. The Editor\u2019s Draft however is definitely the place to look if you are wanting to raise an issue on a spec, as it may be that the issue you are about to raise is already fixed.\nIf you are especially keen on seeing updates to specifications keep an eye on https://drafts.csswg.org/ as this is a list of drafts, along with the date they were last updated.\nHow to approach a spec\nThe first thing to understand is that most CSS Specifications start with the most straightforward information, and get progressively further into the weeds. For an author the initial examples and explanations are likely to be of interest, and then the property definitions and examples. Therefore, if you are looking at a vast spec, know that you probably won\u2019t need to read all the way to the bottom, or read every section in detail.\nThe second thing that is useful to know about modern CSS specifications is how modularized they are. It really never is a case of finding everything you need in a single document. If we tried to do that, there would be a lot of repetition and likely inconsistency between specs. There are some key specifications that many other specifications draw on, such as:\n\nValues and Units\nIntrinsic and Extrinsic Sizing\nBox Alignment\n\nWhen something is defined in another specification the spec you are reading will link to it, so it is worth opening that other spec in a new tab in order that you can refer back to it as you explore.\nResearching your property\nAs an example we will take a look at the property grid-auto-rows, this property defines row tracks in the implicit grid when using CSS Grid Layout. The first thing you will need to do is find out which specification defines this property.\nYou might already know which spec the property is part of, and therefore you could go directly to the spec and search using your browser or look in the navigation for the spec to find it. Alternatively, you could take a look at the CSS Property Index, which is an automatically generated list of CSS Properties.\nClicking on a property will take you to the TR version of the spec, the latest published draft, and the definition of that property in it. This definition begins with a panel detailing the syntax of this property. For grid-auto-rows, you can see that it is listed along with grid-auto-columns as these two properties are essentially identical. They take the same values and work in the same way, one for rows and the other for columns.\nValue\nFor value we can see that the property accepts a value
. The next thing to do is to find out what that actually means, clicking will take you to where it is defined in the Grid spec.\nThe value is defined as accepting various values:\n\n\nminmax( , )\nfit-content( \n\nWe need to head down the rabbit hole to find out what each of these mean. From here we essentially go down line by line until we have unpacked the value of track-size.\n is defined just below as:\n\n\n\nmin-content\nmax-content\nauto\n\nSo these are all things that would be valid to use as a value for grid-auto-rows.\nThe first value of is something you will see in many specifications as a value. It means that you can use a length unit - for example px or em - or a percentage. Some properties only accept a in which case you know that you cannot use a percentage as the value. This means that you could have grid-auto-rows with any of the following values.\ngrid-auto-rows: 100px;\ngrid-auto-rows: 1em;\ngrid-auto-rows: 30%;\nWhen using percentages, it is important to know what it is a percentage of. As a percentage has to resolve from something. There is text in the spec which explains how column and row percentages work.\n\n\u201c values are relative to the inline size of the grid container in column grid tracks, and the block size of the grid container in row grid tracks.\u201d\n\nThis means that in a horizontal writing mode such as when using English, a percentage when used as a track-size in grid-auto-columns would be a percentage of the width of the grid, and a percentage in grid-auto-rows a percentage of the height of the grid.\nThe second value of is also defined here, as \u201cA non-negative dimension with the unit fr specifying the track\u2019s flex factor.\u201d This is the fr unit, and the spec links to a fuller definition of fr as this unit is only used in Grid Layout so it is therefore defined in the grid spec. We now know that a valid value would be:\ngrid-auto-rows: 1fr;\nThere is some useful information about the fr unit in this part of the spec. It is noted that the fr unit has an automatic minimum. This means that 1fr is really minmax(auto, 1fr). This is why having a number of tracks all at 1fr does not mean that all are equal sized, as a larger item in any of the tracks would have a large auto size and therefore would be larger after spare space had been distributed.\nWe then have min-content and max-content. These keywords can be used for track sizing and the specification defines what they mean in the context of sizing a track, representing the min and max-sizing contributions of the grid tracks. You will see that there are various terms linked in the definition, so if you do not know what these mean you can follow them to find out.\nFor example the spec links max-content contribution to the CSS Intrinsic and Extrinsic Sizing specification. This is one of those specs which is drawn on by many other specifications. If we follow that link we can read the definition there and follow further links to understand what each term means. The more that you read specifications the more these terms will become familiar to you. Just like learning a foreign language, at first you feel like you have to look up every little thing. After a while you remember the vocabulary.\nWe can now add min-content and max-content to our available values.\ngrid-auto-rows: min-content;\ngrid-auto-rows: max-content;\nThe final item in our list is auto. If you are familiar with using Grid Layout, then you are probably aware that an auto sized track for will grow to fit the content used. There is an interesting note here in the spec detailing that auto sized rows will stretch to fill the grid container if there is extra space and align-content or justify-content have a value of stretch. As stretch is the default value, that means these tracks stretch by default. Tracks using other types of length will not behave like this.\ngrid-auto-rows: auto;\nSo, this was the list for , the next possible value is minmax( , ). So this is telling us that we can use minmax() as a value, the final (max) value will be and we have already unpacked all of the allowable values there. The first value (min) is detailed as an . If we look at the values for this, we discover that they are the same as , minus the value:\n\n\nmin-content\nmax-content\nauto\n\nWe already know what all of these do, so we can add possible minmax() values to our list of values for .\ngrid-auto-rows: minmax(100px, 200px);\ngrid-auto-rows: minmax(20%, 1fr);\ngrid-auto-rows: minmax(1em, auto);\ngrid-auto-rows: minmax(min-content, max-content);\nFinally we can use fit-content( . We can see that fit-content takes a value of which we already know to be either a length unit, or a percentage. The spec details how fit-content is worked out, and it essentially allows a track which acts as if you had used the max-content keyword, however the track stops growing when it hits the length passed to it.\ngrid-auto-rows: fit-content(200px);\ngrid-auto-rows: fit-content(20%);\nThose are all of our possible values, and to round things off, check again at the initial value, you can see it has a little + sign next to it, click that and you will be taken to the CSS Values and Units module to find that, \u201cA plus (+) indicates that the preceding type, word, or group occurs one or more times.\u201d This means that we can pass a single track size to grid-auto-rows or multiple track sizes as a space separated list. Below the box is an explanation of what happens if you pass in more than one track size:\n\n\u201cIf multiple track sizes are given, the pattern is repeated as necessary to find the size of the implicit tracks. The first implicit grid track after the explicit grid receives the first specified size, and so on forwards; and the last implicit grid track before the explicit grid receives the last specified size, and so on backwards.\u201d\n\nTherefore with the following CSS, if five implicit rows were needed they would be as follows:\n\n100px\n1fr\nauto\n100px\n1fr\n\n.grid {\n display: grid;\n grid-auto-rows: 100px 1fr auto;\n}\nInitial\nWe can now move to the next line in the box, and you\u2019ll be glad to know that it isn\u2019t going to require as much unpacking! This simply defines the initial value for grid-auto-rows. If you do not specify anything, created rows will be auto sized. All CSS properties have an initial value that they will use if they are invoked as part of the usage of the specification they are in, but you do not set a value for them. In the case of grid-auto-rows it is used whenever rows are created in the implicit grid, so it needs to have a value to be used even if you do not set one.\nApplies to\nThis line tells us what this property is used for. Some properties are used in multiple places. For example if you look at the definition for justify-content in the Box Alignment specification you can see it is used in multicol containers, flex containers, and grid containers. In our case the property only applies for grid containers.\nInherited\nThis tells us if the property can be inherited from a parent element if it is not set. In the case of grid-auto-rows it is not inherited. A property such as color is inherited, so you do not need to set it on each element.\nPercentages\nAre percentages allowed for this property, and if so how are they calculated. In this case we are referred to the definition for grid-template-columns and grid-template-rows where we discover that the percentage is from the corresponding dimension of the content area.\nMedia\nThis defines the media group that the property belongs to. In this case visual.\nComputed Value\nThis details how the value is resolved. The grid-auto-rows property again refers to track sizing as defined for grid-template-columns and grid-template-rows, which tells us the computed value is as specified with lengths made absolute.\nCanonical Order\nIf you have a property\u2013generally a shorthand property\u2013which takes multiple values in a set order, then those values need to be serialized in the order detailed in the grammar for that property. In general you don\u2019t need to worry about this value in the table.\nAnimation Type\nThis details whether the property can be animated, and if so what type of animation. This is useful if you are trying to animate something and not getting the result that you expect. Note that just because something is listed in the spec as animatable does not mean that browsers will have implemented animation for that property yet!\nThat\u2019s (mostly) it!\nSometimes the property will have additional examples - there is one underneath the table for grid-auto-rows. These are worth looking at as they will highlight usage of the property that the spec editor has felt could use an example. There may also be some additional text explaining anythign specific to this property.\nIn selecting grid-auto-rows I chose a fairly complex property in terms of the work we needed to do to unpack the value. Many properties are far simpler than this. However ultimately, even when you come across a complex value, it really is just a case of stepping through the definitions until you come to the bottom of the rabbit hole.\nBeing able to work out what is valid for each property is incredibly useful. It means you don\u2019t waste time trying to use a value that doesn\u2019t work for that property. You also may find that there are values you weren\u2019t aware of, that solve problems for you.\nFurther reading\nSpecifications are not designed to be user manuals, and while they often contain examples, these are pretty terse as they need to be clear to demonstrate their particular point. The manual for the Web Platform is MDN Web Docs. Pairing reading a specification with the examples and information on an MDN property page such as the one for grid-auto-rows is a really great way to ensure that you have all the information and practical usage examples you might need.\nYou may also find useful:\n\nValue Definition Syntax on MDN.\nThe MDN Glossary defines many common terms.\nUnderstanding the CSS Property Value Syntax goes into more detail in terms of reading the syntax.\nHow to read W3C Specs - from 2001 but still relevant.\n\nI hope this article has gone some way to demystify CSS specifications for you. Even if the specifications are not your preferred first stop to learn about new CSS, being able to go directly to the source and avoid having your understanding filtered by someone else, can be very useful indeed.", "year": "2018", "author": "Rachel Andrew", "author_slug": "rachelandrew", "published": "2018-12-14T00:00:00+00:00", "url": "https://24ways.org/2018/researching-a-property-in-the-css-specifications/", "topic": "code"}
{"rowid": 255, "title": "Inclusive Considerations When Restyling Form Controls", "contents": "I would like to begin by saying 2018 was the year that we, as developers, visual designers, browser implementers, and inclusive design and experience specialists rallied together and achieved a long-sought goal: We now have the ability to fully style form controls, across all modern browsers, while retaining their ease of declaration, native functionality and accessibility.\nI would like to begin by saying all these things. However, they\u2019re not true. I think we spent the year debating about what file extension CSS should be written in, or something. Or was that last year? Maybe I\u2019m thinking of next year.\nReturning to reality, styling form controls is more tricky and time consuming these days rather than flat out \u201chard\u201d. In fact, depending on the length of the styling-leash a particular browser provides, there are controls you can style quite a bit. As for browsers with shorter leashes, there are other options to force their controls closer to the visual design you\u2019re tasked to match.\nHowever, when striving for custom styled controls, one must be careful not to forget about the inherent functionality and accessibility that many provide. People expect and deserve the products and services they use and pay for to work for them. If these services are visually pleasing, but only function for those who fit the handful of personas they\u2019ve been designed for, then we\u2019ve potentially deprived many people the experiences they deserve.\nQuick level setting\nGetting down to brass tacks, when creating custom styled form controls that should retain their expected semantics and functionality, we have to consider the following:\n\nMany form elements can be styled directly through standard and browser specific selectors, as well as through some clever styling of markup patterns. We should leverage these native options before reinventing any wheels.\nIt is important to preserve the underlying semantics of interactive controls. We must not unintentionally exclude people who use assistive technologies (ATs) that rely on these semantics. \nMake sure you test what you create. There is a lot of underlying complexity to form controls which may not be immediately apparent if they\u2019re judged solely by their visual presentation in a single browser, or with limited AT testing.\n\nVisually resetting and restyling form controls\nOver the course of 2018, I worked on a project where I tested and reported on the accessibility impact of styling various form controls. In conducting my research, I reviewed many of the form controls available in HTML, testing to see how malleable they were to direct styling from standardized CSS selectors. \nAs I expected, controls such as the various text fields could be restyled rather easily. However, other controls like radio buttons and checkboxes, or sub-elements of special text fields like date, search, and number spinners were resistant to standard-based styling. These particular controls and their sub-elements required specific pseudo-elements to reset and allow for restyling of some of their default presentation.\nSee the Pen form control styling comparisons by Scott (@scottohara) on CodePen.\nhttps://codepen.io/scottohara/pen/gZOrZm/\nOver the years, the ability to directly style form controls has been something many people have clamored for. However, one should realize the benefits of being able to restyle some of these controls may involve more effort than originally anticipated. \nIf you want to restyle a control from the ground up, then you must also recreate any :active, :focus, and :hover states for the control\u2014all those things that were previously taken care of by browsers. Not only that, but anything you restyle should also work with Windows High Contrast mode, styling for dark mode, and other OS-level settings that browser respect without you even realizing. \n\n You ever try playing with the accessibility settings of your display on macOS, or similar Windows setting?\n \nIt is also worth mentioning that any browser prefixed pseudo-elements are not standardized CSS selectors. As MDN mentions at the top of their pages documenting these pseudo-elements:\n\nNon-standard\nThis feature is non-standard and is not on a standards track. Do not use it on production sites facing the Web: it will not work for every user. There may also be large incompatibilities between implementations and the behavior may change in the future.\n\nWhile this may be a deterrent for some, it\u2019s my opinion the risks are often only skin-deep. By which I mean if a non-standard selector does change, the control may look a bit quirky, but likely won\u2019t cease to function. A bug report which requires a CSS selector change can be an easy JIRA ticket to close, after all.\nCan\u2019t make it? Fake it.\nInternet Explorer 11 (IE11) is still neck-and-neck with other browsers in vying for the number 2 spot in desktop browser share. Due to IE not recognizing vendor-prefixed appearance properties, some essential controls like checkboxes won\u2019t render as intended. \nAdditionally, some controls like select boxes, file uploads, and sub-elements of date fields (calendar popups) cannot be modified by just relying on styling their HTML selectors alone. This means that unless your company designs and develops with a progressive enhancement, or graceful degradation mindset, you\u2019ll need to take a different approach in styling.\nGetting clever with markup and CSS\nThe following CodePen demonstrates how we can create a custom checkbox markup pattern. By mindfully utilizing CSS sibling selectors and positioning of the native control, we can create custom visual styling while also retaining the functionality and accessibility expectations of a native checkbox.\nSee the Pen Accessible Styled Native Checkbox by Scott (@scottohara) on CodePen.\nhttps://codepen.io/scottohara/pen/RqEayN/\nCustomizing checkboxes by visually hiding the input and styling well-placed markup with sibling selectors may seem old hat to some. However, many variations of these patterns do not take into account how their method of visually hiding the checkboxes can create discovery issues for certain screen reader navigation methods. For instance, if someone is using a mobile device and exploring by touch, how will they be able to drag their finger over an input that has been reduced to a single pixel, or positioned off screen?\nAs we move away from the simplicity of declaring a single HTML element and using clever CSS and markup patterns to create restyled form controls, we increase the need for additional testing to ensure no expected behaviors are lost. In other words, what should work in theory may not work in practice when you introduce the various different ways people may engage with a form control. It\u2019s worth remembering: what might be typical interactions for ourselves may be problematic if not impossible for others.\nLimitations to cleverness\nCreative coding will allow us to apply more consistent custom styles to some of the more problematic form controls. There will be a varied amount of custom markup, CSS, and sometimes JavaScript that will be needed to preserve the control\u2019s inherent usability and accessibility for each control we take this approach to.\nHowever, this method of restyling still doesn\u2019t solve for the lack of feature parity across different browsers. Nor is it a means to account for controls which don\u2019t have a native HTML element equivalent, such as a switch or multi-thumb range slider? Maybe there\u2019s a control that calls for a visual design or proposed user experience that would require too much fighting with a native control\u2019s behavior to be worth the level of effort to implement. Here\u2019s where we need to take another approach.\nUsing ARIA when appropriate\nSometimes we have no other option than to roll up our sleeves and start building custom form controls from scratch. Fair warning though: just because we\u2019re not leveraging a native HTML control as our foundation, it doesn\u2019t mean we have carte blanche to throw semantics out the window. Enter Accessible Rich Internet Applications (ARIA).\nARIA is a set of attributes that can modify existing elements, or extend HTML to include roles, properties and states that aren\u2019t native to the language. While divs and spans have no meaningful semantic information for us to leverage, with help from the ARIA specification and ARIA Authoring Practices we can incorporate these elements to help create the UI that we need while still following the first rule of Using ARIA:\n\nIf you can use a native HTML element or attribute with the semantics and behavior you require already built in, instead of re-purposing an element and adding an ARIA role, state or property to make it accessible, then do so.\n\nBy using these documents as guidelines, and testing our custom controls with people of various abilities, we can do our best to make sure a custom control performs as expected for as many people as possible.\nExceptions to the rule\nOne example of a control that allows for an exception to the first rule of Using ARIA would be a switch control.\nSwitches and checkboxes are similar components, in that they have both on/checked and off/unchecked states. However, checkboxes are often expected within the context of forms, or used to filter search queries on e-commerce sites. Switches are typically used to instantly enable or deactivate a particular setting at a component or app-based level, as this is their behavior in the native mobile apps in which they were popularized.\nWhile a switch control could be created by visually restyling a checkbox, this does not automatically mean that the underlying semantics and functionality will match the visual representation of the control. For example, the following CodePen restyles checkboxes to look like a switch control, but the semantics of the checkboxes remain which communicate a different way of interacting with the control than what you might expect from a native switch control.\nSee the Pen Switch Boxes - custom styled checkboxes posing as switches by Scott (@scottohara) on CodePen.\nhttps://codepen.io/scottohara/pen/XyvoeE/\nBy adding a role=\"switch\" to these checkboxes, we can repurpose the inherent checked/unchecked states of the native control, it\u2019s inherent ability to be focused by Tab key, and Space key to toggle state.\nBut while this is a valid approach to take in building a switch, how does this actually match up to reality?\nDoes it pass the test(s)?\nWhether deconstructing form controls to fully restyle them, or leveraging them and other HTML elements as a base to expand on, or create, a non-native form control, building it is just the start. We must test that what we\u2019ve restyled or rebuilt works the way people expect it to, if not better.\nWhat we must do here is run a gamut of comparative tests to document the functionality and usability of native form controls. For example:\n\n\nIs the control implemented in all supported browsers?\nIf not: where are the gaps? Will it be necessary to implement a custom solution for the situations that degrade to a standard text field? \nIf so: is each browser\u2019s implementation a good user experience? Is there room for improvement that can be tested against the native baseline? \n\n\nTest with multiple input devices.\nWhere the control is implemented, what is the quality of the user experience when using different input devices, such as mouse, touchscreen, keyboard, speech recognition or switch device, to name a few. \nYou\u2019ll find some HTML5 controls (like date pickers and number spinners) have additional UI elements that may not be announced to AT, or even allow keyboard accessibility. Often these controls can be adjusted by other means, such as text entry, or using arrow keys to increase or decrease values. If restyling or recreating a custom version of a control like these, it may make sense to maintain these native experiences as well.\n\n\nHow well does the control take to custom styles?\nIf a control can be styled enough to not need to be rebuilt from scratch, that\u2019s great! But make sure that there are no adverse affects on the accessibility of it. For instance, range sliders can be restyled and maintain their functionality and accessibility. However, elements like progress bars can be negatively affected by direct styling. \nAlways test with different browser and AT pairings to ensure nothing is lost when controls are restyled. \n\n\nDo specifications match reality?\nIf recreating controls to get around native limitations, such as the inability to style the options of a select element, or requiring a Switch control which is not native to HTML, do your solutions match user expectations? \nFor instance, selects have unique picker interfaces on touch devices. And switches have varied levels of support for different browser and screen reader pairings. Test with real people, and check your analytics. If these experiences don\u2019t match people\u2019s expectations, then maybe another solution is in order? \n\n\nWrapping up\nWhile styling form controls is definitely easier than it\u2019s ever been, that doesn\u2019t mean that it\u2019s at all simple, nor will it likely ever be. The level of difficulty you\u2019re going to face is going to depend entirely on what it is you\u2019re hoping to style, add-on to, or recreate. And even if you build your custom control exactly to specification, you\u2019ll still be reliant on browsers and assistive technologies being able to fully understand the component they\u2019ve been presented.\nForms and their controls are an incredibly important part of what we need the Internet for. Paying bills, scheduling appointments, ordering groceries, renewing your license or even ordering gifts for the holidays. These are all important tasks that people should be able to complete with as little effort as possible. Especially since for some, completing these tasks online might be their only option.\n2018 didn\u2019t end up being the year we got full customization of form controls sorted out. But that\u2019s OK. If we can continue to mindfully work with what we have, and instead challenge ourselves to follow inclusive design principles, well thought out Form Design Patterns, and solve problems with an accessibility first approach, we may come to realize that we can get along just fine without fully branded drop downs. \nAnd hey. There\u2019s always next year, right?", "year": "2018", "author": "Scott O'Hara", "author_slug": "scottohara", "published": "2018-12-13T00:00:00+00:00", "url": "https://24ways.org/2018/inclusive-considerations-when-restyling-form-controls/", "topic": "code"}
{"rowid": 249, "title": "Fast Autocomplete Search for Your Website", "contents": "Every website deserves a great search engine - but building a search engine can be a lot of work, and hosting it can quickly get expensive.\nI\u2019m going to build a search engine for 24 ways that\u2019s fast enough to support autocomplete (a.k.a. typeahead) search queries and can be hosted for free. I\u2019ll be using wget, Python, SQLite, Jupyter, sqlite-utils and my open source Datasette tool to build the API backend, and a few dozen lines of modern vanilla JavaScript to build the interface.\n\nTry it out here, then read on to see how I built it.\nFirst step: crawling the data\nThe first step in building a search engine is to grab a copy of the data that you plan to make searchable.\nThere are plenty of potential ways to do this: you might be able to pull it directly from a database, or extract it using an API. If you don\u2019t have access to the raw data, you can imitate Google and write a crawler to extract the data that you need.\nI\u2019m going to do exactly that against 24 ways: I\u2019ll build a simple crawler using wget, a command-line tool that features a powerful \u201crecursive\u201d mode that\u2019s ideal for scraping websites.\nWe\u2019ll start at the https://24ways.org/archives/ page, which links to an archived index for every year that 24 ways has been running.\nThen we\u2019ll tell wget to recursively crawl the website, using the --recursive flag.\nWe don\u2019t want to fetch every single page on the site - we\u2019re only interested in the actual articles. Luckily, 24 ways has nicely designed URLs, so we can tell wget that we only care about pages that start with one of the years it has been running, using the -I argument like this: -I /2005,/2006,/2007,/2008,/2009,/2010,/2011,/2012,/2013,/2014,/2015,/2016,/2017\nWe want to be polite, so let\u2019s wait for 2 seconds between each request rather than hammering the site as fast as we can: --wait 2\nThe first time I ran this, I accidentally downloaded the comments pages as well. We don\u2019t want those, so let\u2019s exclude them from the crawl using -X \"/*/*/comments\".\nFinally, it\u2019s useful to be able to run the command multiple times without downloading pages that we have already fetched. We can use the --no-clobber option for this.\nTie all of those options together and we get this command:\nwget --recursive --wait 2 --no-clobber \n -I /2005,/2006,/2007,/2008,/2009,/2010,/2011,/2012,/2013,/2014,/2015,/2016,/2017 \n -X \"/*/*/comments\" \n https://24ways.org/archives/ \nIf you leave this running for a few minutes, you\u2019ll end up with a folder structure something like this:\n$ find 24ways.org\n24ways.org\n24ways.org/2013\n24ways.org/2013/why-bother-with-accessibility\n24ways.org/2013/why-bother-with-accessibility/index.html\n24ways.org/2013/levelling-up\n24ways.org/2013/levelling-up/index.html\n24ways.org/2013/project-hubs\n24ways.org/2013/project-hubs/index.html\n24ways.org/2013/credits-and-recognition\n24ways.org/2013/credits-and-recognition/index.html\n...\nAs a quick sanity check, let\u2019s count the number of HTML pages we have retrieved:\n$ find 24ways.org | grep index.html | wc -l\n328\nThere\u2019s one last step! We got everything up to 2017, but we need to fetch the articles for 2018 (so far) as well. They aren\u2019t linked in the /archives/ yet so we need to point our crawler at the site\u2019s front page instead:\nwget --recursive --wait 2 --no-clobber \n -I /2018 \n -X \"/*/*/comments\" \n https://24ways.org/\nThanks to --no-clobber, this is safe to run every day in December to pick up any new content.\nWe now have a folder on our computer containing an HTML file for every article that has ever been published on the site! Let\u2019s use them to build ourselves a search index.\nBuilding a search index using SQLite\nThere are many tools out there that can be used to build a search engine. You can use an open-source search server like Elasticsearch or Solr, a hosted option like Algolia or Amazon CloudSearch or you can tap into the built-in search features of relational databases like MySQL or PostgreSQL.\nI\u2019m going to use something that\u2019s less commonly used for web applications but makes for a powerful and extremely inexpensive alternative: SQLite.\nSQLite is the world\u2019s most widely deployed database, even though many people have never even heard of it. That\u2019s because it\u2019s designed to be used as an embedded database: it\u2019s commonly used by native mobile applications and even runs as part of the default set of apps on the Apple Watch!\nSQLite has one major limitation: unlike databases like MySQL and PostgreSQL, it isn\u2019t really designed to handle large numbers of concurrent writes. For this reason, most people avoid it for building web applications.\nThis doesn\u2019t matter nearly so much if you are building a search engine for infrequently updated content - say one for a site that only publishes new content on 24 days every year.\nIt turns out SQLite has very powerful full-text search functionality built into the core database - the FTS5 extension.\nI\u2019ve been doing a lot of work with SQLite recently, and as part of that, I\u2019ve been building a Python utility library to make building new SQLite databases as easy as possible, called sqlite-utils. It\u2019s designed to be used within a Jupyter notebook - an enormously productive way of interacting with Python code that\u2019s similar to the Observable notebooks Natalie described on 24 ways yesterday.\nIf you haven\u2019t used Jupyter before, here\u2019s the fastest way to get up and running with it - assuming you have Python 3 installed on your machine. We can use a Python virtual environment to ensure the software we are installing doesn\u2019t clash with any other installed packages:\n$ python3 -m venv ./jupyter-venv\n$ ./jupyter-venv/bin/pip install jupyter\n# ... lots of installer output\n# Now lets install some extra packages we will need later\n$ ./jupyter-venv/bin/pip install beautifulsoup4 sqlite-utils html5lib\n# And start the notebook web application\n$ ./jupyter-venv/bin/jupyter-notebook\n# This will open your browser to Jupyter at http://localhost:8888/\nYou should now be in the Jupyter web application. Click New -> Python 3 to start a new notebook.\nA neat thing about Jupyter notebooks is that if you publish them to GitHub (either in a regular repository or as a Gist), it will render them as HTML. This makes them a very powerful way to share annotated code. I\u2019ve published the notebook I used to build the search index on my GitHub account. \n\u200b\n\nHere\u2019s the Python code I used to scrape the relevant data from the downloaded HTML files. Check out the notebook for a line-by-line explanation of what\u2019s going on.\nfrom pathlib import Path\nfrom bs4 import BeautifulSoup as Soup\nbase = Path(\"/Users/simonw/Dropbox/Development/24ways-search\")\narticles = list(base.glob(\"*/*/*/*.html\"))\n# articles is now a list of paths that look like this:\n# PosixPath('...24ways-search/24ways.org/2013/why-bother-with-accessibility/index.html')\ndocs = []\nfor path in articles:\n year = str(path.relative_to(base)).split(\"/\")[1]\n url = 'https://' + str(path.relative_to(base).parent) + '/'\n soup = Soup(path.open().read(), \"html5lib\")\n author = soup.select_one(\".c-continue\")[\"title\"].split(\n \"More information about\"\n )[1].strip()\n author_slug = soup.select_one(\".c-continue\")[\"href\"].split(\n \"/authors/\"\n )[1].split(\"/\")[0]\n published = soup.select_one(\".c-meta time\")[\"datetime\"]\n contents = soup.select_one(\".e-content\").text.strip()\n title = soup.find(\"title\").text.split(\" \u25c6\")[0]\n try:\n topic = soup.select_one(\n '.c-meta a[href^=\"/topics/\"]'\n )[\"href\"].split(\"/topics/\")[1].split(\"/\")[0]\n except TypeError:\n topic = None\n docs.append({\n \"title\": title,\n \"contents\": contents,\n \"year\": year,\n \"author\": author,\n \"author_slug\": author_slug,\n \"published\": published,\n \"url\": url,\n \"topic\": topic,\n })\nAfter running this code, I have a list of Python dictionaries representing each of the documents that I want to add to the index. The list looks something like this:\n[\n {\n \"title\": \"Why Bother with Accessibility?\",\n \"contents\": \"Web accessibility (known in other fields as inclus...\",\n \"year\": \"2013\",\n \"author\": \"Laura Kalbag\",\n \"author_slug\": \"laurakalbag\",\n \"published\": \"2013-12-10T00:00:00+00:00\",\n \"url\": \"https://24ways.org/2013/why-bother-with-accessibility/\",\n \"topic\": \"design\"\n },\n {\n \"title\": \"Levelling Up\",\n \"contents\": \"Hello, 24 ways. Iu2019m Ashley and I sell property ins...\",\n \"year\": \"2013\",\n \"author\": \"Ashley Baxter\",\n \"author_slug\": \"ashleybaxter\",\n \"published\": \"2013-12-06T00:00:00+00:00\",\n \"url\": \"https://24ways.org/2013/levelling-up/\",\n \"topic\": \"business\"\n },\n ...\nMy sqlite-utils library has the ability to take a list of objects like this and automatically create a SQLite database table with the right schema to store the data. Here\u2019s how to do that using this list of dictionaries.\nimport sqlite_utils\ndb = sqlite_utils.Database(\"/tmp/24ways.db\")\ndb[\"articles\"].insert_all(docs)\nThat\u2019s all there is to it! The library will create a new database and add a table to it called articles with the necessary columns, then insert all of the documents into that table.\n(I put the database in /tmp/ for the moment - you can move it to a more sensible location later on.)\nYou can inspect the table using the sqlite3 command-line utility (which comes with OS X) like this:\n$ sqlite3 /tmp/24ways.db\nsqlite> .headers on\nsqlite> .mode column\nsqlite> select title, author, year from articles;\ntitle author year \n------------------------------ ------------ ----------\nWhy Bother with Accessibility? Laura Kalbag 2013 \nLevelling Up Ashley Baxte 2013 \nProject Hubs: A Home Base for Brad Frost 2013 \nCredits and Recognition Geri Coady 2013 \nManaging a Mind Christopher 2013 \nRun Ragged Mark Boulton 2013 \nGet Started With GitHub Pages Anna Debenha 2013 \nCoding Towards Accessibility Charlie Perr 2013 \n...\n\nThere\u2019s one last step to take in our notebook. We know we want to use SQLite\u2019s full-text search feature, and sqlite-utils has a simple convenience method for enabling it for a specified set of columns in a table. We want to be able to search by the title, author and contents fields, so we call the enable_fts() method like this:\ndb[\"articles\"].enable_fts([\"title\", \"author\", \"contents\"])\nIntroducing Datasette\nDatasette is the open-source tool I\u2019ve been building that makes it easy to both explore SQLite databases and publish them to the internet.\nWe\u2019ve been exploring our new SQLite database using the sqlite3 command-line tool. Wouldn\u2019t it be nice if we could use a more human-friendly interface for that?\nIf you don\u2019t want to install Datasette right now, you can visit https://search-24ways.herokuapp.com/ to try it out against the 24 ways search index data. I\u2019ll show you how to deploy Datasette to Heroku like this later in the article.\nIf you want to install Datasette locally, you can reuse the virtual environment we created to play with Jupyter:\n./jupyter-venv/bin/pip install datasette\nThis will install Datasette in the ./jupyter-venv/bin/ folder. You can also install it system-wide using regular pip install datasette.\nNow you can run Datasette against the 24ways.db file we created earlier like so:\n./jupyter-venv/bin/datasette /tmp/24ways.db\nThis will start a local webserver running. Visit http://localhost:8001/ to start interacting with the Datasette web application.\nIf you want to try out Datasette without creating your own 24ways.db file you can download the one I created directly from https://search-24ways.herokuapp.com/24ways-ae60295.db\nPublishing the database to the internet\nOne of the goals of the Datasette project is to make deploying data-backed APIs to the internet as easy as possible. Datasette has a built-in command for this, datasette publish. If you have an account with Heroku or Zeit Now, you can deploy a database to the internet with a single command. Here\u2019s how I deployed https://search-24ways.herokuapp.com/ (running on Heroku\u2019s free tier) using datasette publish:\n$ ./jupyter-venv/bin/datasette publish heroku /tmp/24ways.db --name search-24ways\n-----> Python app detected\n-----> Installing requirements with pip\n\n-----> Running post-compile hook\n-----> Discovering process types\n Procfile declares types -> web\n\n-----> Compressing...\n Done: 47.1M\n-----> Launching...\n Released v8\n https://search-24ways.herokuapp.com/ deployed to Heroku\nIf you try this out, you\u2019ll need to pick a different --name, since I\u2019ve already taken search-24ways.\nYou can run this command as many times as you like to deploy updated versions of the underlying database.\nSearching and faceting\nDatasette can detect tables with SQLite full-text search configured, and will add a search box directly to the page. Take a look at http://search-24ways.herokuapp.com/24ways-b607e21/articles to see this in action.\n\u200b\n\nSQLite search supports wildcards, so if you want autocomplete-style search where you don\u2019t need to enter full words to start getting results you can add a * to the end of your search term. Here\u2019s a search for access* which returns articles on accessibility:\nhttp://search-24ways.herokuapp.com/24ways-ae60295/articles?_search=acces%2A\nA neat feature of Datasette is the ability to calculate facets against your data. Here\u2019s a page showing search results for svg with facet counts calculated against both the year and the topic columns:\nhttp://search-24ways.herokuapp.com/24ways-ae60295/articles?_search=svg&_facet=year&_facet=topic\nEvery page visible via Datasette has a corresponding JSON API, which can be accessed using the JSON link on the page - or by adding a .json extension to the URL:\nhttp://search-24ways.herokuapp.com/24ways-ae60295/articles.json?_search=acces%2A\nBetter search using custom SQL\nThe search results we get back from ../articles?_search=svg are OK, but the order they are returned in is not ideal - they\u2019re actually being returned in the order they were inserted into the database! You can see why this is happening by clicking the View and edit SQL link on that search results page.\nThis exposes the underlying SQL query, which looks like this:\nselect rowid, * from articles where rowid in (\n select rowid from articles_fts where articles_fts match :search\n) order by rowid limit 101\nWe can do better than this by constructing a custom SQL query. Here\u2019s the query we will use instead:\nselect\n snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet,\n articles_fts.rank, articles.title, articles.url, articles.author, articles.year\nfrom articles\n join articles_fts on articles.rowid = articles_fts.rowid\nwhere articles_fts match :search || \"*\"\n order by rank limit 10;\nYou can try this query out directly - since Datasette opens the underling SQLite database in read-only mode and enforces a one second time limit on queries, it\u2019s safe to allow users to provide arbitrary SQL select queries for Datasette to execute.\nThere\u2019s a lot going on here! Let\u2019s break the SQL down line-by-line:\nselect\n snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet,\nWe\u2019re using snippet(), a built-in SQLite function, to generate a snippet highlighting the words that matched the query. We use two unique strings that I made up to mark the beginning and end of each match - you\u2019ll see why in the JavaScript later on.\n articles_fts.rank, articles.title, articles.url, articles.author, articles.year\nThese are the other fields we need back - most of them are from the articles table but we retrieve the rank (representing the strength of the search match) from the magical articles_fts table.\nfrom articles\n join articles_fts on articles.rowid = articles_fts.rowid\narticles is the table containing our data. articles_fts is a magic SQLite virtual table which implements full-text search - we need to join against it to be able to query it.\nwhere articles_fts match :search || \"*\"\n order by rank limit 10;\n:search || \"*\" takes the ?search= argument from the page querystring and adds a * to the end of it, giving us the wildcard search that we want for autocomplete. We then match that against the articles_fts table using the match operator. Finally, we order by rank so that the best matching results are returned at the top - and limit to the first 10 results.\nHow do we turn this into an API? As before, the secret is to add the .json extension. Datasette actually supports multiple shapes of JSON - we\u2019re going to use ?_shape=array to get back a plain array of objects:\nJSON API call to search for articles matching SVG\nThe HTML version of that page shows the time taken to execute the SQL in the footer. Hitting refresh a few times, I get response times between 2 and 5ms - easily fast enough to power a responsive autocomplete feature.\nA simple JavaScript autocomplete search interface\nI considered building this using React or Svelte or another of the myriad of JavaScript framework options available today, but then I remembered that vanilla JavaScript in 2018 is a very productive environment all on its own.\nWe need a few small utility functions: first, a classic debounce function adapted from this one by David Walsh:\nfunction debounce(func, wait, immediate) {\n let timeout;\n return function() {\n let context = this, args = arguments;\n let later = () => {\n timeout = null;\n if (!immediate) func.apply(context, args);\n };\n let callNow = immediate && !timeout;\n clearTimeout(timeout);\n timeout = setTimeout(later, wait);\n if (callNow) func.apply(context, args);\n };\n};\nWe\u2019ll use this to only send fetch() requests a maximum of once every 100ms while the user is typing.\nSince we\u2019re rendering data that might include HTML tags (24 ways is a site about web development after all), we need an HTML escaping function. I\u2019m amazed that browsers still don\u2019t bundle a default one of these:\nconst htmlEscape = (s) => s.replace(\n />/g, '>'\n).replace(\n /Autocomplete search\n\n
\nAnd now the autocomplete implementation itself, as a glorious, messy stream-of-consciousness of JavaScript:\n// Embed the SQL query in a multi-line backtick string:\nconst sql = `select\n snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet,\n articles_fts.rank, articles.title, articles.url, articles.author, articles.year\nfrom articles\n join articles_fts on articles.rowid = articles_fts.rowid\nwhere articles_fts match :search || \"*\"\n order by rank limit 10`;\n\n// Grab a reference to the \nconst searchbox = document.getElementById(\"searchbox\");\n\n// Used to avoid race-conditions:\nlet requestInFlight = null;\n\nsearchbox.onkeyup = debounce(() => {\n const q = searchbox.value;\n // Construct the API URL, using encodeURIComponent() for the parameters\n const url = (\n \"https://search-24ways.herokuapp.com/24ways-866073b.json?sql=\" +\n encodeURIComponent(sql) +\n `&search=${encodeURIComponent(q)}&_shape=array`\n );\n // Unique object used just for race-condition comparison\n let currentRequest = {};\n requestInFlight = currentRequest;\n fetch(url).then(r => r.json()).then(d => {\n if (requestInFlight !== currentRequest) {\n // Avoid race conditions where a slow request returns\n // after a faster one.\n return;\n }\n let results = d.map(r => `\n \n
\n
${htmlEscape(r.author)} - ${r.year}
\n
${highlight(r.snippet)}
\n
\n `).join(\"\");\n document.getElementById(\"results\").innerHTML = results;\n });\n}, 100); // debounce every 100ms\nThere\u2019s just one more utility function, used to help construct the HTML results:\nconst highlight = (s) => htmlEscape(s).replace(\n /b4de2a49c8/g, ''\n).replace(\n /8c94a2ed4b/g, ' '\n);\nThis is what those unique strings passed to the snippet() function were for.\nAvoiding race conditions in autocomplete\nOne trick in this code that you may not have seen before is the way race-conditions are handled. Any time you build an autocomplete feature, you have to consider the following case:\n\nUser types acces\nBrowser sends request A - querying documents matching acces*\nUser continues to type accessibility\nBrowser sends request B - querying documents matching accessibility*\nRequest B returns. It was fast, because there are fewer documents matching the full term\nThe results interface updates with the documents from request B, matching accessibility*\nRequest A returns results (this was the slower of the two requests)\nThe results interface updates with the documents from request A - results matching access*\n\nThis is a terrible user experience: the user saw their desired results for a brief second, and then had them snatched away and replaced with those results from earlier on.\nThankfully there\u2019s an easy way to avoid this. I set up a variable in the outer scope called requestInFlight, initially set to null.\nAny time I start a new fetch() request, I create a new currentRequest = {} object and assign it to the outer requestInFlight as well.\nWhen the fetch() completes, I use requestInFlight !== currentRequest to sanity check that the currentRequest object is strictly identical to the one that was in flight. If a new request has been triggered since we started the current request we can detect that and avoid updating the results.\nIt\u2019s not a lot of code, really\nAnd that\u2019s the whole thing! The code is pretty ugly, but when the entire implementation clocks in at fewer than 70 lines of JavaScript, I honestly don\u2019t think it matters. You\u2019re welcome to refactor it as much you like.\nHow good is this search implementation? I\u2019ve been building search engines for a long time using a wide variety of technologies and I\u2019m happy to report that using SQLite in this way is genuinely a really solid option. It scales happily up to hundreds of MBs (or even GBs) of data, and the fact that it\u2019s based on SQL makes it easy and flexible to work with.\nA surprisingly large number of desktop and mobile applications you use every day implement their search feature on top of SQLite.\nMore importantly though, I hope that this demonstrates that using Datasette for an API means you can build relatively sophisticated API-backed applications with very little backend programming effort. If you\u2019re working with a small-to-medium amount of data that changes infrequently, you may not need a more expensive database. Datasette-powered applications easily fit within the free tier of both Heroku and Zeit Now.\nFor more of my writing on Datasette, check out the datasette tag on my blog. And if you do build something fun with it, please let me know on Twitter.", "year": "2018", "author": "Simon Willison", "author_slug": "simonwillison", "published": "2018-12-19T00:00:00+00:00", "url": "https://24ways.org/2018/fast-autocomplete-search-for-your-website/", "topic": "code"}