This code only worked in Internet Explorer and in some versions of Safari, but that was plenty of people to befriend. However, MySpace was prepared for this: they also filtered the word javascript from
.
Samy discovered that by inserting a line break into his code, MySpace would not filter out the word javascript. The browser would continue to run the code just fine! Samy had now broken past MySpace’s first line of defence and was able to start running code on his profile page. Now he started looking at what he could do with that code.
alert(document.body.innerHTML)
Samy wondered if he could inspect the page’s source to find the details of other MySpace users to befriend. To do this, you would normally use document.body.innerHTML, but MySpace had filtered this too.
alert(eval('document.body.inne' + 'rHTML'))
This isn’t a problem if you build up JavaScript code inside a string and execute it using the eval() function. This trick also worked with XMLHttpRequest.onReadyStateChange, which allowed Samy to send friend requests to the MySpace API and install the JavaScript code on his new friends’ pages.
One final obstacle stood in his way. The same origin policy is a security mechanism that prevents scripts hosted on one domain interacting with sites hosted on another domain.
if (location.hostname == 'profile.myspace.com') {
document.location = 'http://www.myspace.com'
+ location.pathname + location.search
}
Samy discovered that only the http://www.myspace.com domain would accept his API requests, and requests from http://profile.myspace.com were being blocked by the browser’s same-origin policy. By redirecting the browser to http://www.myspace.com, he discovered that he could load profile pages and successfully make requests to MySpace’s API. Samy installed this code on his profile page, and he waited.
Over the course of the next day, over a million people unwittingly installed Samy’s code into their MySpace profile pages and invited their friends. The load of friend requests on MySpace was so large that the site buckled and shut down. It took them two hours to remove Samy’s code and patch the security holes he exploited. Samy was raided by the United States secret service and sentenced to do 90 days of community service.
This is the power of installing a little bit of JavaScript on someone else’s website. It is called cross site scripting, and its effects can be devastating. It is suspected that cross-site scripting was to blame for the 2018 British Airways breach that leaked the credit card details of 380,000 people.
So how can you help protect yourself from cross-site scripting?
Always sanitise user input when it comes in, using a library such as sanitize-html. Open source tools like this benefit from hundreds of hours of work from dozens of experienced contributors. Don’t be tempted to roll your own protection. MySpace was prepared, but they were not prepared enough. It makes no sense to turn this kind of help down.
You can also use an auto-escaping templating language to make sure nobody else’s HTML can get into your pages. Both Angular and React will do this for you, and they are extremely convenient to use.
You should also implement a content security policy to restrict the domains that content like scripts and stylesheets can be loaded from. Loading content from sites not under your control is a significant security risk, and you should use a CSP to lock this down to only the sources you trust. CSP can also block the use of the eval() function.
For content not under your control, consider setting up sub-resource integrity protection. This allows you to add hashes to stylesheets and scripts you include on your website. Hashes are like fingerprints for digital files; if the content changes, so does the fingerprint. Adding hashes will allow your browser to keep your site safe if the content changes without you knowing.
npm audit: Protecting yourself from code you don’t own
JavaScript and npm run the modern web. Together, they make it easy to take advantage of the world’s largest public registry of open source software. How do you protect yourself from code written by someone you’ve never met? Enter npm audit.
npm audit reviews the security of your website’s dependency tree. You can start using it by upgrading to the latest version of npm:
npm install npm -g
npm audit
When you run npm audit, npm submits a description of your dependencies to the Registry, which returns a report of known vulnerabilities for the packages you have installed.
If your website has a known cross-site scripting vulnerability, npm audit will tell you about it. What’s more, if the vulnerability has been patched, running npm audit fix will automatically install the patched package for you!
Securing your site like it’s 2019
The truth is that since the early days of the web, the stakes of a security breach have become much, much higher. The web is so much more than fandom and mailing DVDs - online banking is now mainstream, social media and dating websites store intimate information about our personal lives, and we are even inviting the internet into our homes.
However, we have powerful new allies helping us stay safe. There are more resources than ever before to teach us how to write secure code. Tools like Angular and React are designed with security features baked-in from the start. We have a new generation of security tools like npm audit to watch over our dependencies.
As we roll over into 2019, let’s take the opportunity to reflect on the security of the code we write and be grateful for the everything we’ve learned in the last twenty years.",2018,Katie Fenn,katiefenn,2018-12-01T00:00:00+00:00,https://24ways.org/2018/securing-your-site-like-its-1999/,code
256,Develop Your Naturalist Superpowers with Observable Notebooks and iNaturalist,"We’re going to level up your knowledge of what animals you might see in an area at a particular time of year - a skill every naturalist* strives for - using technology! Using iNaturalist and Observable Notebooks we’re going to prototype seasonality graphs for particular species in an area, and automatically create a guide to what animals you might see in each month.
*(a Naturalist is someone who likes learning about nature, not someone who’s a fan of being naked, that’s a ‘Naturist’… different thing!)
Looking for critters in rocky intertidal habitats
One of my favourite things to do is going rockpooling, or as we call it over here in California, ‘tidepooling’. Amounting to the same thing, it’s going to a beach that has rocks where the tide covers then uncovers little pools of water at different times of the day. All sorts of fun creatures and life can be found in this ‘rocky intertidal habitat’
A particularly exciting creature that lives here is the Nudibranch, a type of super colourful ‘sea slug’. There are over 3000 species of Nudibranch worldwide. (The word “nudibranch” comes from the Latin nudus, naked, and the Greek βρανχια / brankhia, gills.)
They are however quite tricky to find! Even though they are often brightly coloured and interestingly shaped, some of them are very small, and in our part of the world in the Bay Area in California their appearance in our rockpools is seasonal. We see them more often in Summer months, despite the not-as-low tides as in our Winter and Spring seasons.
My favourite place to go tidepooling here is Pillar Point in Half Moon bay (at other times of the year more famously known for the surf competition ‘Mavericks’). The rockpools there are rich in species diversity, of varied types and water-coverage habitat zones as well as being relatively accessible.
I was rockpooling at Pillar Point recently with my parents and we talked to a lady who remarked that she hadn’t seen any Nudibranchs on her visit this time. I realised that having an idea of what species to find where, and at what time of year is one of the many superpower goals of every budding Naturalist.
Using technology and the croudsourced species observations of the iNaturalist community we can shortcut our way to this superpower!
Finding nearby animals with iNaturalist
We’re going to be getting our information about what animals you can see in Pillar Point using iNaturalist. iNaturalist is a really fun platform that helps connect people to nature and report their findings of life in the outdoors. It is also a community of nature-loving people who help each other identify and confirm those observations. iNaturalist is a project run as a joint initiative by the California Academy of Sciences and the National Geographic Society.
I’ve been using iNaturalist for over two years to record and identify plants and animals that I’ve found in the outdoors. I use their iPhone app to upload my pictures, which then uses machine learning algorithms to make an initial guess at what it is I’ve seen. The community is really active, and I often find someone else has verified or updated my species guess pretty soon after posting.
This process is great because once an observation has been identified by at least two people it becomes ‘verified’ and is considered research grade. Research grade observations get exported and used by scientists, as well as being indexed by the Global Biodiversity Information Facility, GBIF.
iNaturalist has a great API and API explorer, which makes interacting and prototyping using iNaturalist data really fun. For example, if you go to the API explorer and expand the Observations : Search and fetch section and then the GET /observations API, you get a selection of input boxes that allow you to play with options that you can then pass to the API when you click the ‘Try it out’ button.
You’ll then get a URL that looks a bit like
https://api.inaturalist.org/v1/observations?captive=false &geo=true&verifiable=true&taxon_id=47113&lat=37.495461&lng=-122.499584 &radius=5&order=desc&order_by=created_at
which you can call and interrrogate using a programming language of your choice.
If you would like to see an all-JavaScript application that uses the iNaturalist API, take a look at OwlsNearMe.com which Simon and I built one weekend earlier this year. It gets your location and shows you all iNaturalist observations of owls near you and lists which species you are likely to see (not adjusted for season).
Rapid development using Observable Notebooks
We’re going to be using Observable Notebooks to prototype our examples, pulling data down from iNaturalist. I really like using visual notebooks like Observable, they are great for learning and building things quickly. You may be familiar with Jupyter notebooks for Python which is similar but takes a bit of setup to get going - I often use these for prototyping too. Observable is amazing for querying and visualising data with JavaScript and since it is a hosted product it doesn’t require any setup at all.
You can follow along and play with this example on my Observable notebook. If you create an account there you can fork my notebook and create your own version of this example.
Each ‘notebook’ consists of a page with a column of ‘cells’, similar to what you get in a spreadsheet. A cell can contain Markdown text or JavaScript code and the output of evaluating the cell appears above the code that generated it. There are lots of tutorials out there on Observable Notebooks, I like this code introduction one from Observable (and D3) creator Mike Bostock.
Developing your Naturalist superpowers
If you have an idea of what plants and critters you might see in a place at the time you visit, you can hone in on what you want to study and train your Naturalist eye to better identify the life around you.
For our example, we care about wildlife we can see at Pillar Point, so we need a way of letting the iNaturalist API know which area we are interested in.
We could use a latitide, longitude and radius for this, but a rectangular bounding box is a better shape for the reef. We can use this tool to draw the area we want to search within: boundingbox.klokantech.com
The tool lets you export the bounding box in several forms using the dropdown at the bottom left under the map givese We are going to use the ‘DublinCore’ format as it’s closest to the format needed by the iNaturalist API.
westlimit=-122.50542; southlimit=37.492805; eastlimit=-122.492738; northlimit=37.499811
A quick map primer:
The higher the latitude the more north it is
The lower the latitude the more south it is
Latitude 0 = the equator
The higher the longitude the more east it is of Greenwich
The lower the longitude the more west it is of Greenwich
Longitude 0 = Greenwich
In the iNaturalst API we want to use the parameters nelat, nelng, swlat, swlng to create a query that looks inside a bounding box of Pillar Point near Half Moon Bay in California:
nelat = highest latitude = north limit = 37.499811
nelng = highest longitude = east limit = -122.492738
swlat = smallest latitude = south limit = 37.492805
swlng = smallest longitude = west limit = 122.50542
As API parameters these look like this:
?nelat=37.499811&nelng=-122.492738&swlat=37.492805&swlng=122.50542
These parameters in this format can be used for most of the iNaturalist API methods.
Nudibranch seasonality in Pillar Point
We can use the iNaturalist observation_histogram API to get a count of Nudibranch observations per week-of-year across all time and within our Pillar Point bounding box.
In addition to the geographic parameters that we just worked out, we are also sending the taxon_id of 47113, which is iNaturalists internal number associated with the Nudibranch taxon. By using this we can get all species which are under the parent ‘Order Nudibranchia’.
Another useful piece of naturalist knowledge is understanding the biological classification scheme of Taxanomic Rank - roughly, when a species has a Latin name of two words eg ‘Glaucus Atlanticus’ the first Latin word is the ‘Genus’ like a family name ‘Glaucus’, and the second word identifies that particular species, like a given name ‘Atlanticus’.
The two Latin words together indicate a specific species, the term we use colloquially to refer to a type of animal often differs wildly region to region, and sometimes the same common name in two countries can refer to two different species. The common names for the Glaucus Atlanticus (which incidentally is my favourite sea slug) include: sea swallow, blue angel, blue glaucus, blue dragon, blue sea slug and blue ocean slug! Because this gets super confusing, Scientists like using this Latin name format instead.
The following piece of code asks the iNaturalist Histogram API to return per-week counts for verified observations of Nudibranchs within our Pillar Point bounding box:
pillar_point_counts_per_week = fetch(
""https://api.inaturalist.org/v1/observations/histogram?taxon_id=47113&nelat=37.499811&nelng=-122.492738&swlat=37.492805&swlng=-122.50542&date_field=observed&interval=week_of_year&verifiable=true""
).then(response => {
return response.json();
})
Our next step is to take this data and draw a graph! We’ll be using Vega-Lite for this, which is a fab JavaScript graphing libary that is also easy and fun to use with Observable Notebooks.
(Here is a great tutorial on exploring data and drawing graphs with Observable and Vega-Lite)
The iNaturalist API returns data that looks like this:
{
""total_results"": 53,
""page"": 1,
""per_page"": 53,
""results"": {
""week_of_year"": {
""1"": 136,
""2"": 20,
""3"": 150,
""4"": 65,
""5"": 186,
""6"": 74,
""7"": 47,
""8"": 87,
""9"": 64,
""10"": 56,
But for our Vega-Lite graph we need data that looks like this:
[{
""week"": ""01"",
""value"": 136
}, {
""week"": ""02"",
""value"": 20
}, ...]
We can convert what we get back from the API to the second format using a loop that iterates over the object keys:
objects_to_plot = {
let objects = [];
Object.keys(pillar_point_counts_per_week.results.week_of_year).map(function(week_index) {
objects.push({
week: `Wk ${week_index.toString()}`,
observations: pillar_point_counts_per_week.results.week_of_year[week_index]
});
})
return objects;
}
We can then plug this into Vega-Lite to draw us a graph:
vegalite({
data: {values: objects_to_plot},
mark: ""bar"",
encoding: {
x: {field: ""week"", type: ""nominal"", sort: null},
y: {field: ""observations"", type: ""quantitative""}
},
width: width * 0.9
})
It’s worth noting that we have a lot of observations of Nudibranchs particularly at Pillar Point due in no small part to the intertidal monitoring research that Alison Young and Rebecca Johnson facilitate for the California Achademy of Sciences.
So, what if we want to look for the seasonality of observations of a particular species of adorable sea slug? We want our interface to have a select box with a list of all the species you might find at any time of year. We can do this using the species_counts API to create us an object with the iNaturalist species ID and common & Latin names.
pillar_point_nudibranches = {
let api_results = await fetch(
""https://api.inaturalist.org/v1/observations/species_counts?taxon_id=47113&nelat=37.499811&nelng=-122.492738&swlat=37.492805&swlng=-122.50542&date_field=observed&verifiable=true""
).then(r => r.json())
let species_list = api_results.results.map(i => ({
value: i.taxon.id,
label: `${i.taxon.preferred_common_name} (${i.taxon.name})`
}));
return species_list
}
We can create an interactive select box by importing code from Jeremy Ashkanas’ Observable Notebook: add import {select} from ""@jashkenas/inputs"" to a cell anywhere in our notebook. Observable is magic: like a spreadsheet, the order of the cells doesn’t matter - if one cell is referenced by any other cell then when that cell updates all the other cells refresh themselves. You can also import and reference one notebook from another!
viewof select_species = select({
title: ""Which Nudibranch do you want to see seasonality for?"",
options: [{value: """", label: ""All the Nudibranchs!""}, ...pillar_point_nudibranches],
value: """"
})
Then we go back to our old favourite, the histogram API just like before, only this time we are calling it with the value created by our select box ${select_species} as taxon_id instead of the number 47113.
pillar_point_counts_per_month_per_species = fetch(
`https://api.inaturalist.org/v1/observations/histogram?taxon_id=${select_species}&nelat=37.499811&nelng=-122.492738&swlat=37.492805&swlng=-122.50542&date_field=observed&interval=month_of_year&verifiable=true`
).then(r => r.json())
Now for the fun graph bit! As we did before, we re-format the result of the API into a format compatible with Vega-Lite:
objects_to_plot_species_month = {
let objects = [];
Object.keys(pillar_point_counts_per_month_per_species.results.month_of_year).map(function(month_index) {
objects.push({
month: (new Date(2018, (month_index - 1), 1)).toLocaleString(""en"", {month: ""long""}),
observations: pillar_point_counts_per_month_per_species.results.month_of_year[month_index]
});
})
return objects;
}
(Note that in the above code we are creating a date object with our specific month in, and using toLocalString() to get the longer English name for the month. Because the JavaScript Date object counts January as 0, we use month_index -1 to get the correct month)
And we draw the graph as we did before, only now if you interact with the select box in Observable the graph will dynamically update!
vegalite({
data: {values: objects_to_plot_species_month},
mark: ""bar"",
encoding: {
x: {field: ""month"", type: ""nominal"", sort:null},
y: {field: ""observations"", type: ""quantitative""}
},
width: width * 0.9
})
Now we can see when is the best time of year to plan to go tidepooling in Pillar Point if we want to find a specific species of Nudibranch.
This tool is great for planning when we to go rockpooling at Pillar Point, but what about if you are going this month and want to pre-train your eye with what to look for in order to impress your friends with your knowledge of Nudibranchs?
Well… we can create ourselves a dynamic guide that you can with a list of the species, their photo, name and how many times they have been observed in that month of the year!
Our select box this time looks as follows, simpler than before but assigning the month value to the variable selected_month.
viewof selected_month = select({
title: ""When do you want to see Nudibranchs?"",
options: [
{ label: ""Whenever"", value: """" },
{ label: ""January"", value: ""1"" },
{ label: ""February"", value: ""2"" },
{ label: ""March"", value: ""3"" },
{ label: ""April"", value: ""4"" },
{ label: ""May"", value: ""5"" },
{ label: ""June"", value: ""6"" },
{ label: ""July"", value: ""7"" },
{ label: ""August"", value: ""8"" },
{ label: ""September"", value: ""9"" },
{ label: ""October"", value: ""10"" },
{ label: ""November"", value: ""11"" },
{ label: ""December"", value: ""12"" },
],
value: """"
})
We then can use the species_counts API to get all the relevant information about which species we can see in month=${selected_month}. We’ll be able to reference this response object and its values later with the variable we just created, eg: all_species_data.results[0].taxon.name.
all_species_data = fetch(
`https://api.inaturalist.org/v1/observations/species_counts?taxon_id=47113&month=${selected_month}&nelat=37.499811&nelng=-122.492738&swlat=37.492805&swlng=-122.50542&verifiable=true`
).then(r => r.json())
You can render HTML directly in a notebook cell using Observable’s html tagged template literal:
If you go to Pillar Point ${
{"""": """",
""1"":""in January"",
""2"":""in Febrary"",
""3"":""in March"",
""4"":""in April"",
""5"":""in May"",
""6"":""in June"",
""7"":""in July"",
""8"":""in August"",
""9"":""in September"",
""10"":""in October"",
""11"":""in November"",
""12"":""in December"",
}[selected_month]
} you might see…
${all_species_data.results.map(s => `
${s.taxon.name}
Seen ${s.count} times
`)}
These few lines of HTML are all you need to get this exciting dynamic guide to what Nudibranchs you will see in each month!
Play with it yourself in this Observable Notebook.
Conclusion
I hope by playing with these examples you have an idea of how powerful it can be to prototype using Observable Notebooks and how you can use the incredible crowdsourced community data and APIs from iNaturalist to augment your naturalist skills and impress your friends with your new ‘knowledge of nature’ superpower.
Lastly I strongly encourage you to get outside on a low tide to explore your local rocky intertidal habitat, and all the amazing critters that live there.
Here is a great introduction video to tidepooling / rockpooling, by Rebecca Johnson and Alison Young from the California Academy of Sciences.",2018,Natalie Downe,nataliedowne,2018-12-18T00:00:00+00:00,https://24ways.org/2018/observable-notebooks-and-inaturalist/,code
243,Researching a Property in the CSS Specifications,"I frequently joke that I’m “reading the specs so you don’t have to”, as I unpack some detail of a CSS spec in a post on my blog, some documentation for MDN, or an article on Smashing Magazine. However waiting for someone like me to write an article about something is a pretty slow way to get the information you need. Sometimes people like me get things wrong, or specifications change after we write a tutorial.
What if you could just look it up yourself? That’s what you get when you learn to read the CSS specifications, and in this article my aim is to give you the basic details you need to grab quick information about any CSS property detailed in the CSS specs.
Where are the CSS Specifications?
The easiest way to see all of the CSS specs is to take a look at the Current Work page in the CSS section of the W3C Website. Here you can see all of the specifications listed, the level they are at and their status. There is also a link to the specification from this page. I explained CSS Levels in my article Why there is no CSS 4.
Who are the specifications for?
CSS specifications are for everyone who uses CSS. You might be a browser engineer - referred to as an implementor - needing to know how to implement a feature, or a web developer - referred to as an author - wanting to know how to use the feature. The fact that both parties are looking at the same document hopefully means that what the browser displays is what the web developer expected.
Which version of a spec should I look at?
There are a couple of places you might want to look. Each published spec will have the latest published version, which will have TR in the URL and can be accessed without a date (which is always the newest version) or at a date, which will be the date of that publication. If I’m referring to a particular Working Draft in an article I’ll typically link to the dated version. That way if the information changes it is possible for someone to see where I got the information from at the time of writing.
If you want the very latest additions and changes to the spec, then the Editor’s Draft is the place to look. This is the version of the spec that the editors are committing changes to. If I make a change to the Multicol spec and push it to GitHub, within a few minutes that will be live in the Editor’s Draft. So it is possible there are errors, bits of text that we are still working out and so on. The Editor’s Draft however is definitely the place to look if you are wanting to raise an issue on a spec, as it may be that the issue you are about to raise is already fixed.
If you are especially keen on seeing updates to specifications keep an eye on https://drafts.csswg.org/ as this is a list of drafts, along with the date they were last updated.
How to approach a spec
The first thing to understand is that most CSS Specifications start with the most straightforward information, and get progressively further into the weeds. For an author the initial examples and explanations are likely to be of interest, and then the property definitions and examples. Therefore, if you are looking at a vast spec, know that you probably won’t need to read all the way to the bottom, or read every section in detail.
The second thing that is useful to know about modern CSS specifications is how modularized they are. It really never is a case of finding everything you need in a single document. If we tried to do that, there would be a lot of repetition and likely inconsistency between specs. There are some key specifications that many other specifications draw on, such as:
Values and Units
Intrinsic and Extrinsic Sizing
Box Alignment
When something is defined in another specification the spec you are reading will link to it, so it is worth opening that other spec in a new tab in order that you can refer back to it as you explore.
Researching your property
As an example we will take a look at the property grid-auto-rows, this property defines row tracks in the implicit grid when using CSS Grid Layout. The first thing you will need to do is find out which specification defines this property.
You might already know which spec the property is part of, and therefore you could go directly to the spec and search using your browser or look in the navigation for the spec to find it. Alternatively, you could take a look at the CSS Property Index, which is an automatically generated list of CSS Properties.
Clicking on a property will take you to the TR version of the spec, the latest published draft, and the definition of that property in it. This definition begins with a panel detailing the syntax of this property. For grid-auto-rows, you can see that it is listed along with grid-auto-columns as these two properties are essentially identical. They take the same values and work in the same way, one for rows and the other for columns.
Value
For value we can see that the property accepts a value
. The next thing to do is to find out what that actually means, clicking will take you to where it is defined in the Grid spec.
The value is defined as accepting various values:
minmax( , )
fit-content(
We need to head down the rabbit hole to find out what each of these mean. From here we essentially go down line by line until we have unpacked the value of track-size.
is defined just below as:
min-content
max-content
auto
So these are all things that would be valid to use as a value for grid-auto-rows.
The first value of is something you will see in many specifications as a value. It means that you can use a length unit - for example px or em - or a percentage. Some properties only accept a in which case you know that you cannot use a percentage as the value. This means that you could have grid-auto-rows with any of the following values.
grid-auto-rows: 100px;
grid-auto-rows: 1em;
grid-auto-rows: 30%;
When using percentages, it is important to know what it is a percentage of. As a percentage has to resolve from something. There is text in the spec which explains how column and row percentages work.
“ values are relative to the inline size of the grid container in column grid tracks, and the block size of the grid container in row grid tracks.”
This means that in a horizontal writing mode such as when using English, a percentage when used as a track-size in grid-auto-columns would be a percentage of the width of the grid, and a percentage in grid-auto-rows a percentage of the height of the grid.
The second value of is also defined here, as “A non-negative dimension with the unit fr specifying the track’s flex factor.” This is the fr unit, and the spec links to a fuller definition of fr as this unit is only used in Grid Layout so it is therefore defined in the grid spec. We now know that a valid value would be:
grid-auto-rows: 1fr;
There is some useful information about the fr unit in this part of the spec. It is noted that the fr unit has an automatic minimum. This means that 1fr is really minmax(auto, 1fr). This is why having a number of tracks all at 1fr does not mean that all are equal sized, as a larger item in any of the tracks would have a large auto size and therefore would be larger after spare space had been distributed.
We then have min-content and max-content. These keywords can be used for track sizing and the specification defines what they mean in the context of sizing a track, representing the min and max-sizing contributions of the grid tracks. You will see that there are various terms linked in the definition, so if you do not know what these mean you can follow them to find out.
For example the spec links max-content contribution to the CSS Intrinsic and Extrinsic Sizing specification. This is one of those specs which is drawn on by many other specifications. If we follow that link we can read the definition there and follow further links to understand what each term means. The more that you read specifications the more these terms will become familiar to you. Just like learning a foreign language, at first you feel like you have to look up every little thing. After a while you remember the vocabulary.
We can now add min-content and max-content to our available values.
grid-auto-rows: min-content;
grid-auto-rows: max-content;
The final item in our list is auto. If you are familiar with using Grid Layout, then you are probably aware that an auto sized track for will grow to fit the content used. There is an interesting note here in the spec detailing that auto sized rows will stretch to fill the grid container if there is extra space and align-content or justify-content have a value of stretch. As stretch is the default value, that means these tracks stretch by default. Tracks using other types of length will not behave like this.
grid-auto-rows: auto;
So, this was the list for , the next possible value is minmax( , ). So this is telling us that we can use minmax() as a value, the final (max) value will be and we have already unpacked all of the allowable values there. The first value (min) is detailed as an . If we look at the values for this, we discover that they are the same as , minus the value:
min-content
max-content
auto
We already know what all of these do, so we can add possible minmax() values to our list of values for .
grid-auto-rows: minmax(100px, 200px);
grid-auto-rows: minmax(20%, 1fr);
grid-auto-rows: minmax(1em, auto);
grid-auto-rows: minmax(min-content, max-content);
Finally we can use fit-content( . We can see that fit-content takes a value of which we already know to be either a length unit, or a percentage. The spec details how fit-content is worked out, and it essentially allows a track which acts as if you had used the max-content keyword, however the track stops growing when it hits the length passed to it.
grid-auto-rows: fit-content(200px);
grid-auto-rows: fit-content(20%);
Those are all of our possible values, and to round things off, check again at the initial value, you can see it has a little + sign next to it, click that and you will be taken to the CSS Values and Units module to find that, “A plus (+) indicates that the preceding type, word, or group occurs one or more times.” This means that we can pass a single track size to grid-auto-rows or multiple track sizes as a space separated list. Below the box is an explanation of what happens if you pass in more than one track size:
“If multiple track sizes are given, the pattern is repeated as necessary to find the size of the implicit tracks. The first implicit grid track after the explicit grid receives the first specified size, and so on forwards; and the last implicit grid track before the explicit grid receives the last specified size, and so on backwards.”
Therefore with the following CSS, if five implicit rows were needed they would be as follows:
100px
1fr
auto
100px
1fr
.grid {
display: grid;
grid-auto-rows: 100px 1fr auto;
}
Initial
We can now move to the next line in the box, and you’ll be glad to know that it isn’t going to require as much unpacking! This simply defines the initial value for grid-auto-rows. If you do not specify anything, created rows will be auto sized. All CSS properties have an initial value that they will use if they are invoked as part of the usage of the specification they are in, but you do not set a value for them. In the case of grid-auto-rows it is used whenever rows are created in the implicit grid, so it needs to have a value to be used even if you do not set one.
Applies to
This line tells us what this property is used for. Some properties are used in multiple places. For example if you look at the definition for justify-content in the Box Alignment specification you can see it is used in multicol containers, flex containers, and grid containers. In our case the property only applies for grid containers.
Inherited
This tells us if the property can be inherited from a parent element if it is not set. In the case of grid-auto-rows it is not inherited. A property such as color is inherited, so you do not need to set it on each element.
Percentages
Are percentages allowed for this property, and if so how are they calculated. In this case we are referred to the definition for grid-template-columns and grid-template-rows where we discover that the percentage is from the corresponding dimension of the content area.
Media
This defines the media group that the property belongs to. In this case visual.
Computed Value
This details how the value is resolved. The grid-auto-rows property again refers to track sizing as defined for grid-template-columns and grid-template-rows, which tells us the computed value is as specified with lengths made absolute.
Canonical Order
If you have a property–generally a shorthand property–which takes multiple values in a set order, then those values need to be serialized in the order detailed in the grammar for that property. In general you don’t need to worry about this value in the table.
Animation Type
This details whether the property can be animated, and if so what type of animation. This is useful if you are trying to animate something and not getting the result that you expect. Note that just because something is listed in the spec as animatable does not mean that browsers will have implemented animation for that property yet!
That’s (mostly) it!
Sometimes the property will have additional examples - there is one underneath the table for grid-auto-rows. These are worth looking at as they will highlight usage of the property that the spec editor has felt could use an example. There may also be some additional text explaining anythign specific to this property.
In selecting grid-auto-rows I chose a fairly complex property in terms of the work we needed to do to unpack the value. Many properties are far simpler than this. However ultimately, even when you come across a complex value, it really is just a case of stepping through the definitions until you come to the bottom of the rabbit hole.
Being able to work out what is valid for each property is incredibly useful. It means you don’t waste time trying to use a value that doesn’t work for that property. You also may find that there are values you weren’t aware of, that solve problems for you.
Further reading
Specifications are not designed to be user manuals, and while they often contain examples, these are pretty terse as they need to be clear to demonstrate their particular point. The manual for the Web Platform is MDN Web Docs. Pairing reading a specification with the examples and information on an MDN property page such as the one for grid-auto-rows is a really great way to ensure that you have all the information and practical usage examples you might need.
You may also find useful:
Value Definition Syntax on MDN.
The MDN Glossary defines many common terms.
Understanding the CSS Property Value Syntax goes into more detail in terms of reading the syntax.
How to read W3C Specs - from 2001 but still relevant.
I hope this article has gone some way to demystify CSS specifications for you. Even if the specifications are not your preferred first stop to learn about new CSS, being able to go directly to the source and avoid having your understanding filtered by someone else, can be very useful indeed.",2018,Rachel Andrew,rachelandrew,2018-12-14T00:00:00+00:00,https://24ways.org/2018/researching-a-property-in-the-css-specifications/,code
255,Inclusive Considerations When Restyling Form Controls,"I would like to begin by saying 2018 was the year that we, as developers, visual designers, browser implementers, and inclusive design and experience specialists rallied together and achieved a long-sought goal: We now have the ability to fully style form controls, across all modern browsers, while retaining their ease of declaration, native functionality and accessibility.
I would like to begin by saying all these things. However, they’re not true. I think we spent the year debating about what file extension CSS should be written in, or something. Or was that last year? Maybe I’m thinking of next year.
Returning to reality, styling form controls is more tricky and time consuming these days rather than flat out “hard”. In fact, depending on the length of the styling-leash a particular browser provides, there are controls you can style quite a bit. As for browsers with shorter leashes, there are other options to force their controls closer to the visual design you’re tasked to match.
However, when striving for custom styled controls, one must be careful not to forget about the inherent functionality and accessibility that many provide. People expect and deserve the products and services they use and pay for to work for them. If these services are visually pleasing, but only function for those who fit the handful of personas they’ve been designed for, then we’ve potentially deprived many people the experiences they deserve.
Quick level setting
Getting down to brass tacks, when creating custom styled form controls that should retain their expected semantics and functionality, we have to consider the following:
Many form elements can be styled directly through standard and browser specific selectors, as well as through some clever styling of markup patterns. We should leverage these native options before reinventing any wheels.
It is important to preserve the underlying semantics of interactive controls. We must not unintentionally exclude people who use assistive technologies (ATs) that rely on these semantics.
Make sure you test what you create. There is a lot of underlying complexity to form controls which may not be immediately apparent if they’re judged solely by their visual presentation in a single browser, or with limited AT testing.
Visually resetting and restyling form controls
Over the course of 2018, I worked on a project where I tested and reported on the accessibility impact of styling various form controls. In conducting my research, I reviewed many of the form controls available in HTML, testing to see how malleable they were to direct styling from standardized CSS selectors.
As I expected, controls such as the various text fields could be restyled rather easily. However, other controls like radio buttons and checkboxes, or sub-elements of special text fields like date, search, and number spinners were resistant to standard-based styling. These particular controls and their sub-elements required specific pseudo-elements to reset and allow for restyling of some of their default presentation.
See the Pen form control styling comparisons by Scott (@scottohara) on CodePen.
https://codepen.io/scottohara/pen/gZOrZm/
Over the years, the ability to directly style form controls has been something many people have clamored for. However, one should realize the benefits of being able to restyle some of these controls may involve more effort than originally anticipated.
If you want to restyle a control from the ground up, then you must also recreate any :active, :focus, and :hover states for the control—all those things that were previously taken care of by browsers. Not only that, but anything you restyle should also work with Windows High Contrast mode, styling for dark mode, and other OS-level settings that browser respect without you even realizing.
You ever try playing with the accessibility settings of your display on macOS, or similar Windows setting?
It is also worth mentioning that any browser prefixed pseudo-elements are not standardized CSS selectors. As MDN mentions at the top of their pages documenting these pseudo-elements:
Non-standard
This feature is non-standard and is not on a standards track. Do not use it on production sites facing the Web: it will not work for every user. There may also be large incompatibilities between implementations and the behavior may change in the future.
While this may be a deterrent for some, it’s my opinion the risks are often only skin-deep. By which I mean if a non-standard selector does change, the control may look a bit quirky, but likely won’t cease to function. A bug report which requires a CSS selector change can be an easy JIRA ticket to close, after all.
Can’t make it? Fake it.
Internet Explorer 11 (IE11) is still neck-and-neck with other browsers in vying for the number 2 spot in desktop browser share. Due to IE not recognizing vendor-prefixed appearance properties, some essential controls like checkboxes won’t render as intended.
Additionally, some controls like select boxes, file uploads, and sub-elements of date fields (calendar popups) cannot be modified by just relying on styling their HTML selectors alone. This means that unless your company designs and develops with a progressive enhancement, or graceful degradation mindset, you’ll need to take a different approach in styling.
Getting clever with markup and CSS
The following CodePen demonstrates how we can create a custom checkbox markup pattern. By mindfully utilizing CSS sibling selectors and positioning of the native control, we can create custom visual styling while also retaining the functionality and accessibility expectations of a native checkbox.
See the Pen Accessible Styled Native Checkbox by Scott (@scottohara) on CodePen.
https://codepen.io/scottohara/pen/RqEayN/
Customizing checkboxes by visually hiding the input and styling well-placed markup with sibling selectors may seem old hat to some. However, many variations of these patterns do not take into account how their method of visually hiding the checkboxes can create discovery issues for certain screen reader navigation methods. For instance, if someone is using a mobile device and exploring by touch, how will they be able to drag their finger over an input that has been reduced to a single pixel, or positioned off screen?
As we move away from the simplicity of declaring a single HTML element and using clever CSS and markup patterns to create restyled form controls, we increase the need for additional testing to ensure no expected behaviors are lost. In other words, what should work in theory may not work in practice when you introduce the various different ways people may engage with a form control. It’s worth remembering: what might be typical interactions for ourselves may be problematic if not impossible for others.
Limitations to cleverness
Creative coding will allow us to apply more consistent custom styles to some of the more problematic form controls. There will be a varied amount of custom markup, CSS, and sometimes JavaScript that will be needed to preserve the control’s inherent usability and accessibility for each control we take this approach to.
However, this method of restyling still doesn’t solve for the lack of feature parity across different browsers. Nor is it a means to account for controls which don’t have a native HTML element equivalent, such as a switch or multi-thumb range slider? Maybe there’s a control that calls for a visual design or proposed user experience that would require too much fighting with a native control’s behavior to be worth the level of effort to implement. Here’s where we need to take another approach.
Using ARIA when appropriate
Sometimes we have no other option than to roll up our sleeves and start building custom form controls from scratch. Fair warning though: just because we’re not leveraging a native HTML control as our foundation, it doesn’t mean we have carte blanche to throw semantics out the window. Enter Accessible Rich Internet Applications (ARIA).
ARIA is a set of attributes that can modify existing elements, or extend HTML to include roles, properties and states that aren’t native to the language. While divs and spans have no meaningful semantic information for us to leverage, with help from the ARIA specification and ARIA Authoring Practices we can incorporate these elements to help create the UI that we need while still following the first rule of Using ARIA:
If you can use a native HTML element or attribute with the semantics and behavior you require already built in, instead of re-purposing an element and adding an ARIA role, state or property to make it accessible, then do so.
By using these documents as guidelines, and testing our custom controls with people of various abilities, we can do our best to make sure a custom control performs as expected for as many people as possible.
Exceptions to the rule
One example of a control that allows for an exception to the first rule of Using ARIA would be a switch control.
Switches and checkboxes are similar components, in that they have both on/checked and off/unchecked states. However, checkboxes are often expected within the context of forms, or used to filter search queries on e-commerce sites. Switches are typically used to instantly enable or deactivate a particular setting at a component or app-based level, as this is their behavior in the native mobile apps in which they were popularized.
While a switch control could be created by visually restyling a checkbox, this does not automatically mean that the underlying semantics and functionality will match the visual representation of the control. For example, the following CodePen restyles checkboxes to look like a switch control, but the semantics of the checkboxes remain which communicate a different way of interacting with the control than what you might expect from a native switch control.
See the Pen Switch Boxes - custom styled checkboxes posing as switches by Scott (@scottohara) on CodePen.
https://codepen.io/scottohara/pen/XyvoeE/
By adding a role=""switch"" to these checkboxes, we can repurpose the inherent checked/unchecked states of the native control, it’s inherent ability to be focused by Tab key, and Space key to toggle state.
But while this is a valid approach to take in building a switch, how does this actually match up to reality?
Does it pass the test(s)?
Whether deconstructing form controls to fully restyle them, or leveraging them and other HTML elements as a base to expand on, or create, a non-native form control, building it is just the start. We must test that what we’ve restyled or rebuilt works the way people expect it to, if not better.
What we must do here is run a gamut of comparative tests to document the functionality and usability of native form controls. For example:
Is the control implemented in all supported browsers?
If not: where are the gaps? Will it be necessary to implement a custom solution for the situations that degrade to a standard text field?
If so: is each browser’s implementation a good user experience? Is there room for improvement that can be tested against the native baseline?
Test with multiple input devices.
Where the control is implemented, what is the quality of the user experience when using different input devices, such as mouse, touchscreen, keyboard, speech recognition or switch device, to name a few.
You’ll find some HTML5 controls (like date pickers and number spinners) have additional UI elements that may not be announced to AT, or even allow keyboard accessibility. Often these controls can be adjusted by other means, such as text entry, or using arrow keys to increase or decrease values. If restyling or recreating a custom version of a control like these, it may make sense to maintain these native experiences as well.
How well does the control take to custom styles?
If a control can be styled enough to not need to be rebuilt from scratch, that’s great! But make sure that there are no adverse affects on the accessibility of it. For instance, range sliders can be restyled and maintain their functionality and accessibility. However, elements like progress bars can be negatively affected by direct styling.
Always test with different browser and AT pairings to ensure nothing is lost when controls are restyled.
Do specifications match reality?
If recreating controls to get around native limitations, such as the inability to style the options of a select element, or requiring a Switch control which is not native to HTML, do your solutions match user expectations?
For instance, selects have unique picker interfaces on touch devices. And switches have varied levels of support for different browser and screen reader pairings. Test with real people, and check your analytics. If these experiences don’t match people’s expectations, then maybe another solution is in order?
Wrapping up
While styling form controls is definitely easier than it’s ever been, that doesn’t mean that it’s at all simple, nor will it likely ever be. The level of difficulty you’re going to face is going to depend entirely on what it is you’re hoping to style, add-on to, or recreate. And even if you build your custom control exactly to specification, you’ll still be reliant on browsers and assistive technologies being able to fully understand the component they’ve been presented.
Forms and their controls are an incredibly important part of what we need the Internet for. Paying bills, scheduling appointments, ordering groceries, renewing your license or even ordering gifts for the holidays. These are all important tasks that people should be able to complete with as little effort as possible. Especially since for some, completing these tasks online might be their only option.
2018 didn’t end up being the year we got full customization of form controls sorted out. But that’s OK. If we can continue to mindfully work with what we have, and instead challenge ourselves to follow inclusive design principles, well thought out Form Design Patterns, and solve problems with an accessibility first approach, we may come to realize that we can get along just fine without fully branded drop downs.
And hey. There’s always next year, right?",2018,Scott O'Hara,scottohara,2018-12-13T00:00:00+00:00,https://24ways.org/2018/inclusive-considerations-when-restyling-form-controls/,code
249,Fast Autocomplete Search for Your Website,"Every website deserves a great search engine - but building a search engine can be a lot of work, and hosting it can quickly get expensive.
I’m going to build a search engine for 24 ways that’s fast enough to support autocomplete (a.k.a. typeahead) search queries and can be hosted for free. I’ll be using wget, Python, SQLite, Jupyter, sqlite-utils and my open source Datasette tool to build the API backend, and a few dozen lines of modern vanilla JavaScript to build the interface.
Try it out here, then read on to see how I built it.
First step: crawling the data
The first step in building a search engine is to grab a copy of the data that you plan to make searchable.
There are plenty of potential ways to do this: you might be able to pull it directly from a database, or extract it using an API. If you don’t have access to the raw data, you can imitate Google and write a crawler to extract the data that you need.
I’m going to do exactly that against 24 ways: I’ll build a simple crawler using wget, a command-line tool that features a powerful “recursive” mode that’s ideal for scraping websites.
We’ll start at the https://24ways.org/archives/ page, which links to an archived index for every year that 24 ways has been running.
Then we’ll tell wget to recursively crawl the website, using the --recursive flag.
We don’t want to fetch every single page on the site - we’re only interested in the actual articles. Luckily, 24 ways has nicely designed URLs, so we can tell wget that we only care about pages that start with one of the years it has been running, using the -I argument like this: -I /2005,/2006,/2007,/2008,/2009,/2010,/2011,/2012,/2013,/2014,/2015,/2016,/2017
We want to be polite, so let’s wait for 2 seconds between each request rather than hammering the site as fast as we can: --wait 2
The first time I ran this, I accidentally downloaded the comments pages as well. We don’t want those, so let’s exclude them from the crawl using -X ""/*/*/comments"".
Finally, it’s useful to be able to run the command multiple times without downloading pages that we have already fetched. We can use the --no-clobber option for this.
Tie all of those options together and we get this command:
wget --recursive --wait 2 --no-clobber
-I /2005,/2006,/2007,/2008,/2009,/2010,/2011,/2012,/2013,/2014,/2015,/2016,/2017
-X ""/*/*/comments""
https://24ways.org/archives/
If you leave this running for a few minutes, you’ll end up with a folder structure something like this:
$ find 24ways.org
24ways.org
24ways.org/2013
24ways.org/2013/why-bother-with-accessibility
24ways.org/2013/why-bother-with-accessibility/index.html
24ways.org/2013/levelling-up
24ways.org/2013/levelling-up/index.html
24ways.org/2013/project-hubs
24ways.org/2013/project-hubs/index.html
24ways.org/2013/credits-and-recognition
24ways.org/2013/credits-and-recognition/index.html
...
As a quick sanity check, let’s count the number of HTML pages we have retrieved:
$ find 24ways.org | grep index.html | wc -l
328
There’s one last step! We got everything up to 2017, but we need to fetch the articles for 2018 (so far) as well. They aren’t linked in the /archives/ yet so we need to point our crawler at the site’s front page instead:
wget --recursive --wait 2 --no-clobber
-I /2018
-X ""/*/*/comments""
https://24ways.org/
Thanks to --no-clobber, this is safe to run every day in December to pick up any new content.
We now have a folder on our computer containing an HTML file for every article that has ever been published on the site! Let’s use them to build ourselves a search index.
Building a search index using SQLite
There are many tools out there that can be used to build a search engine. You can use an open-source search server like Elasticsearch or Solr, a hosted option like Algolia or Amazon CloudSearch or you can tap into the built-in search features of relational databases like MySQL or PostgreSQL.
I’m going to use something that’s less commonly used for web applications but makes for a powerful and extremely inexpensive alternative: SQLite.
SQLite is the world’s most widely deployed database, even though many people have never even heard of it. That’s because it’s designed to be used as an embedded database: it’s commonly used by native mobile applications and even runs as part of the default set of apps on the Apple Watch!
SQLite has one major limitation: unlike databases like MySQL and PostgreSQL, it isn’t really designed to handle large numbers of concurrent writes. For this reason, most people avoid it for building web applications.
This doesn’t matter nearly so much if you are building a search engine for infrequently updated content - say one for a site that only publishes new content on 24 days every year.
It turns out SQLite has very powerful full-text search functionality built into the core database - the FTS5 extension.
I’ve been doing a lot of work with SQLite recently, and as part of that, I’ve been building a Python utility library to make building new SQLite databases as easy as possible, called sqlite-utils. It’s designed to be used within a Jupyter notebook - an enormously productive way of interacting with Python code that’s similar to the Observable notebooks Natalie described on 24 ways yesterday.
If you haven’t used Jupyter before, here’s the fastest way to get up and running with it - assuming you have Python 3 installed on your machine. We can use a Python virtual environment to ensure the software we are installing doesn’t clash with any other installed packages:
$ python3 -m venv ./jupyter-venv
$ ./jupyter-venv/bin/pip install jupyter
# ... lots of installer output
# Now lets install some extra packages we will need later
$ ./jupyter-venv/bin/pip install beautifulsoup4 sqlite-utils html5lib
# And start the notebook web application
$ ./jupyter-venv/bin/jupyter-notebook
# This will open your browser to Jupyter at http://localhost:8888/
You should now be in the Jupyter web application. Click New -> Python 3 to start a new notebook.
A neat thing about Jupyter notebooks is that if you publish them to GitHub (either in a regular repository or as a Gist), it will render them as HTML. This makes them a very powerful way to share annotated code. I’ve published the notebook I used to build the search index on my GitHub account.
Here’s the Python code I used to scrape the relevant data from the downloaded HTML files. Check out the notebook for a line-by-line explanation of what’s going on.
from pathlib import Path
from bs4 import BeautifulSoup as Soup
base = Path(""/Users/simonw/Dropbox/Development/24ways-search"")
articles = list(base.glob(""*/*/*/*.html""))
# articles is now a list of paths that look like this:
# PosixPath('...24ways-search/24ways.org/2013/why-bother-with-accessibility/index.html')
docs = []
for path in articles:
year = str(path.relative_to(base)).split(""/"")[1]
url = 'https://' + str(path.relative_to(base).parent) + '/'
soup = Soup(path.open().read(), ""html5lib"")
author = soup.select_one("".c-continue"")[""title""].split(
""More information about""
)[1].strip()
author_slug = soup.select_one("".c-continue"")[""href""].split(
""/authors/""
)[1].split(""/"")[0]
published = soup.select_one("".c-meta time"")[""datetime""]
contents = soup.select_one("".e-content"").text.strip()
title = soup.find(""title"").text.split("" ◆"")[0]
try:
topic = soup.select_one(
'.c-meta a[href^=""/topics/""]'
)[""href""].split(""/topics/"")[1].split(""/"")[0]
except TypeError:
topic = None
docs.append({
""title"": title,
""contents"": contents,
""year"": year,
""author"": author,
""author_slug"": author_slug,
""published"": published,
""url"": url,
""topic"": topic,
})
After running this code, I have a list of Python dictionaries representing each of the documents that I want to add to the index. The list looks something like this:
[
{
""title"": ""Why Bother with Accessibility?"",
""contents"": ""Web accessibility (known in other fields as inclus..."",
""year"": ""2013"",
""author"": ""Laura Kalbag"",
""author_slug"": ""laurakalbag"",
""published"": ""2013-12-10T00:00:00+00:00"",
""url"": ""https://24ways.org/2013/why-bother-with-accessibility/"",
""topic"": ""design""
},
{
""title"": ""Levelling Up"",
""contents"": ""Hello, 24 ways. Iu2019m Ashley and I sell property ins..."",
""year"": ""2013"",
""author"": ""Ashley Baxter"",
""author_slug"": ""ashleybaxter"",
""published"": ""2013-12-06T00:00:00+00:00"",
""url"": ""https://24ways.org/2013/levelling-up/"",
""topic"": ""business""
},
...
My sqlite-utils library has the ability to take a list of objects like this and automatically create a SQLite database table with the right schema to store the data. Here’s how to do that using this list of dictionaries.
import sqlite_utils
db = sqlite_utils.Database(""/tmp/24ways.db"")
db[""articles""].insert_all(docs)
That’s all there is to it! The library will create a new database and add a table to it called articles with the necessary columns, then insert all of the documents into that table.
(I put the database in /tmp/ for the moment - you can move it to a more sensible location later on.)
You can inspect the table using the sqlite3 command-line utility (which comes with OS X) like this:
$ sqlite3 /tmp/24ways.db
sqlite> .headers on
sqlite> .mode column
sqlite> select title, author, year from articles;
title author year
------------------------------ ------------ ----------
Why Bother with Accessibility? Laura Kalbag 2013
Levelling Up Ashley Baxte 2013
Project Hubs: A Home Base for Brad Frost 2013
Credits and Recognition Geri Coady 2013
Managing a Mind Christopher 2013
Run Ragged Mark Boulton 2013
Get Started With GitHub Pages Anna Debenha 2013
Coding Towards Accessibility Charlie Perr 2013
...
There’s one last step to take in our notebook. We know we want to use SQLite’s full-text search feature, and sqlite-utils has a simple convenience method for enabling it for a specified set of columns in a table. We want to be able to search by the title, author and contents fields, so we call the enable_fts() method like this:
db[""articles""].enable_fts([""title"", ""author"", ""contents""])
Introducing Datasette
Datasette is the open-source tool I’ve been building that makes it easy to both explore SQLite databases and publish them to the internet.
We’ve been exploring our new SQLite database using the sqlite3 command-line tool. Wouldn’t it be nice if we could use a more human-friendly interface for that?
If you don’t want to install Datasette right now, you can visit https://search-24ways.herokuapp.com/ to try it out against the 24 ways search index data. I’ll show you how to deploy Datasette to Heroku like this later in the article.
If you want to install Datasette locally, you can reuse the virtual environment we created to play with Jupyter:
./jupyter-venv/bin/pip install datasette
This will install Datasette in the ./jupyter-venv/bin/ folder. You can also install it system-wide using regular pip install datasette.
Now you can run Datasette against the 24ways.db file we created earlier like so:
./jupyter-venv/bin/datasette /tmp/24ways.db
This will start a local webserver running. Visit http://localhost:8001/ to start interacting with the Datasette web application.
If you want to try out Datasette without creating your own 24ways.db file you can download the one I created directly from https://search-24ways.herokuapp.com/24ways-ae60295.db
Publishing the database to the internet
One of the goals of the Datasette project is to make deploying data-backed APIs to the internet as easy as possible. Datasette has a built-in command for this, datasette publish. If you have an account with Heroku or Zeit Now, you can deploy a database to the internet with a single command. Here’s how I deployed https://search-24ways.herokuapp.com/ (running on Heroku’s free tier) using datasette publish:
$ ./jupyter-venv/bin/datasette publish heroku /tmp/24ways.db --name search-24ways
-----> Python app detected
-----> Installing requirements with pip
-----> Running post-compile hook
-----> Discovering process types
Procfile declares types -> web
-----> Compressing...
Done: 47.1M
-----> Launching...
Released v8
https://search-24ways.herokuapp.com/ deployed to Heroku
If you try this out, you’ll need to pick a different --name, since I’ve already taken search-24ways.
You can run this command as many times as you like to deploy updated versions of the underlying database.
Searching and faceting
Datasette can detect tables with SQLite full-text search configured, and will add a search box directly to the page. Take a look at http://search-24ways.herokuapp.com/24ways-b607e21/articles to see this in action.
SQLite search supports wildcards, so if you want autocomplete-style search where you don’t need to enter full words to start getting results you can add a * to the end of your search term. Here’s a search for access* which returns articles on accessibility:
http://search-24ways.herokuapp.com/24ways-ae60295/articles?_search=acces%2A
A neat feature of Datasette is the ability to calculate facets against your data. Here’s a page showing search results for svg with facet counts calculated against both the year and the topic columns:
http://search-24ways.herokuapp.com/24ways-ae60295/articles?_search=svg&_facet=year&_facet=topic
Every page visible via Datasette has a corresponding JSON API, which can be accessed using the JSON link on the page - or by adding a .json extension to the URL:
http://search-24ways.herokuapp.com/24ways-ae60295/articles.json?_search=acces%2A
Better search using custom SQL
The search results we get back from ../articles?_search=svg are OK, but the order they are returned in is not ideal - they’re actually being returned in the order they were inserted into the database! You can see why this is happening by clicking the View and edit SQL link on that search results page.
This exposes the underlying SQL query, which looks like this:
select rowid, * from articles where rowid in (
select rowid from articles_fts where articles_fts match :search
) order by rowid limit 101
We can do better than this by constructing a custom SQL query. Here’s the query we will use instead:
select
snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet,
articles_fts.rank, articles.title, articles.url, articles.author, articles.year
from articles
join articles_fts on articles.rowid = articles_fts.rowid
where articles_fts match :search || ""*""
order by rank limit 10;
You can try this query out directly - since Datasette opens the underling SQLite database in read-only mode and enforces a one second time limit on queries, it’s safe to allow users to provide arbitrary SQL select queries for Datasette to execute.
There’s a lot going on here! Let’s break the SQL down line-by-line:
select
snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet,
We’re using snippet(), a built-in SQLite function, to generate a snippet highlighting the words that matched the query. We use two unique strings that I made up to mark the beginning and end of each match - you’ll see why in the JavaScript later on.
articles_fts.rank, articles.title, articles.url, articles.author, articles.year
These are the other fields we need back - most of them are from the articles table but we retrieve the rank (representing the strength of the search match) from the magical articles_fts table.
from articles
join articles_fts on articles.rowid = articles_fts.rowid
articles is the table containing our data. articles_fts is a magic SQLite virtual table which implements full-text search - we need to join against it to be able to query it.
where articles_fts match :search || ""*""
order by rank limit 10;
:search || ""*"" takes the ?search= argument from the page querystring and adds a * to the end of it, giving us the wildcard search that we want for autocomplete. We then match that against the articles_fts table using the match operator. Finally, we order by rank so that the best matching results are returned at the top - and limit to the first 10 results.
How do we turn this into an API? As before, the secret is to add the .json extension. Datasette actually supports multiple shapes of JSON - we’re going to use ?_shape=array to get back a plain array of objects:
JSON API call to search for articles matching SVG
The HTML version of that page shows the time taken to execute the SQL in the footer. Hitting refresh a few times, I get response times between 2 and 5ms - easily fast enough to power a responsive autocomplete feature.
A simple JavaScript autocomplete search interface
I considered building this using React or Svelte or another of the myriad of JavaScript framework options available today, but then I remembered that vanilla JavaScript in 2018 is a very productive environment all on its own.
We need a few small utility functions: first, a classic debounce function adapted from this one by David Walsh:
function debounce(func, wait, immediate) {
let timeout;
return function() {
let context = this, args = arguments;
let later = () => {
timeout = null;
if (!immediate) func.apply(context, args);
};
let callNow = immediate && !timeout;
clearTimeout(timeout);
timeout = setTimeout(later, wait);
if (callNow) func.apply(context, args);
};
};
We’ll use this to only send fetch() requests a maximum of once every 100ms while the user is typing.
Since we’re rendering data that might include HTML tags (24 ways is a site about web development after all), we need an HTML escaping function. I’m amazed that browsers still don’t bundle a default one of these:
const htmlEscape = (s) => s.replace(
/>/g, '>'
).replace(
/Autocomplete search
And now the autocomplete implementation itself, as a glorious, messy stream-of-consciousness of JavaScript:
// Embed the SQL query in a multi-line backtick string:
const sql = `select
snippet(articles_fts, -1, 'b4de2a49c8', '8c94a2ed4b', '...', 100) as snippet,
articles_fts.rank, articles.title, articles.url, articles.author, articles.year
from articles
join articles_fts on articles.rowid = articles_fts.rowid
where articles_fts match :search || ""*""
order by rank limit 10`;
// Grab a reference to the
const searchbox = document.getElementById(""searchbox"");
// Used to avoid race-conditions:
let requestInFlight = null;
searchbox.onkeyup = debounce(() => {
const q = searchbox.value;
// Construct the API URL, using encodeURIComponent() for the parameters
const url = (
""https://search-24ways.herokuapp.com/24ways-866073b.json?sql="" +
encodeURIComponent(sql) +
`&search=${encodeURIComponent(q)}&_shape=array`
);
// Unique object used just for race-condition comparison
let currentRequest = {};
requestInFlight = currentRequest;
fetch(url).then(r => r.json()).then(d => {
if (requestInFlight !== currentRequest) {
// Avoid race conditions where a slow request returns
// after a faster one.
return;
}
let results = d.map(r => `
${htmlEscape(r.author)} - ${r.year}
${highlight(r.snippet)}
`).join("""");
document.getElementById(""results"").innerHTML = results;
});
}, 100); // debounce every 100ms
There’s just one more utility function, used to help construct the HTML results:
const highlight = (s) => htmlEscape(s).replace(
/b4de2a49c8/g, ''
).replace(
/8c94a2ed4b/g, ' '
);
This is what those unique strings passed to the snippet() function were for.
Avoiding race conditions in autocomplete
One trick in this code that you may not have seen before is the way race-conditions are handled. Any time you build an autocomplete feature, you have to consider the following case:
User types acces
Browser sends request A - querying documents matching acces*
User continues to type accessibility
Browser sends request B - querying documents matching accessibility*
Request B returns. It was fast, because there are fewer documents matching the full term
The results interface updates with the documents from request B, matching accessibility*
Request A returns results (this was the slower of the two requests)
The results interface updates with the documents from request A - results matching access*
This is a terrible user experience: the user saw their desired results for a brief second, and then had them snatched away and replaced with those results from earlier on.
Thankfully there’s an easy way to avoid this. I set up a variable in the outer scope called requestInFlight, initially set to null.
Any time I start a new fetch() request, I create a new currentRequest = {} object and assign it to the outer requestInFlight as well.
When the fetch() completes, I use requestInFlight !== currentRequest to sanity check that the currentRequest object is strictly identical to the one that was in flight. If a new request has been triggered since we started the current request we can detect that and avoid updating the results.
It’s not a lot of code, really
And that’s the whole thing! The code is pretty ugly, but when the entire implementation clocks in at fewer than 70 lines of JavaScript, I honestly don’t think it matters. You’re welcome to refactor it as much you like.
How good is this search implementation? I’ve been building search engines for a long time using a wide variety of technologies and I’m happy to report that using SQLite in this way is genuinely a really solid option. It scales happily up to hundreds of MBs (or even GBs) of data, and the fact that it’s based on SQL makes it easy and flexible to work with.
A surprisingly large number of desktop and mobile applications you use every day implement their search feature on top of SQLite.
More importantly though, I hope that this demonstrates that using Datasette for an API means you can build relatively sophisticated API-backed applications with very little backend programming effort. If you’re working with a small-to-medium amount of data that changes infrequently, you may not need a more expensive database. Datasette-powered applications easily fit within the free tier of both Heroku and Zeit Now.
For more of my writing on Datasette, check out the datasette tag on my blog. And if you do build something fun with it, please let me know on Twitter.",2018,Simon Willison,simonwillison,2018-12-19T00:00:00+00:00,https://24ways.org/2018/fast-autocomplete-search-for-your-website/,code