1 row where "published" is on date 2018-12-19

rowid: 249
title: Fast Autocomplete Search for Your Website
year: 2018
author: Simon Willison
author_slug: simonwillison
published: 2018-12-19T00:00:00+00:00
url: https://24ways.org/2018/fast-autocomplete-search-for-your-website/
topic: code
contents:

Every website deserves a great search engine - but building a search engine can be a lot of work, and hosting it can quickly get expensive.

I’m going to build a search engine for 24 ways that’s fast enough to support autocomplete (a.k.a. typeahead) search queries and can be hosted for free. I’ll be using wget, Python, SQLite, Jupyter, sqlite-utils and my open source Datasette tool to build the API backend, and a few dozen lines of modern vanilla JavaScript to build the interface.

Try it out here, then read on to see how I built it.

First step: crawling the data

The first step in building a search engine is to grab a copy of the data that you plan to make searchable.

There are plenty of potential ways to do this: you might be able to pull it directly from a database, or extract it using an API. If you don’t have access to the raw data, you can imitate Google and write a crawler to extract the data that you need.

I’m going to do exactly that against 24 ways: I’ll build a simple crawler using wget, a command-line tool that features a powerful “recursive” mode that’s ideal for scraping websites. We’ll start at the https://24ways.org/archives/ page, which links to an archived index for every year that 24 ways has been running.

Then we’ll tell wget to recursively crawl the website, using the --recursive flag.

We don’t want to fetch every single page on the site - we’re only interested in the actual articles. Luckily, 24 ways has nicely designed URLs, so we can tell wget that we only care about pages that start with one of the years it has been running, using the -I argument like this:

-I /2005,/2006,/2007,/2008,/2009,/2010,/2011,/2012,/2013,/2014,/2015,/2016,/2017

We want to be polite, so let’s wait for 2 seconds between each request rather than hammering the site as fast as we can:

--wait 2

The first time I ran this, I accidentally downloaded the comments pages as well. We don’t want those, so let’s exclude them from the crawl using -X "/*/*/comments".

Finally, it’s useful to be able to run the command multiple times…
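
Assembled from the flags quoted above, the crawl command might look roughly like this. It is only a sketch pieced together from the excerpt (the text cuts off before the article's final command, which may add further options), so treat it as illustrative rather than the exact command used:

wget --recursive --wait 2 \
  -I /2005,/2006,/2007,/2008,/2009,/2010,/2011,/2012,/2013,/2014,/2015,/2016,/2017 \
  -X "/*/*/comments" \
  https://24ways.org/archives/

Run from an empty directory, this mirrors the article pages for the listed years into a local 24ways.org/ directory tree that matches the site's URL structure, pausing two seconds between requests.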

CREATE TABLE [articles] (
   [title] TEXT,
   [contents] TEXT,
   [year] TEXT,
   [author] TEXT,
   [author_slug] TEXT,
   [published] TEXT,
   [url] TEXT,
   [topic] TEXT
);
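
The article names sqlite-utils alongside SQLite and Datasette, and a table with exactly this shape could be produced by handing sqlite-utils a JSON list of objects whose keys match these columns. As a hedged sketch (the 24ways.db and articles.json filenames are hypothetical, not taken from the article):

sqlite-utils insert 24ways.db articles articles.json

Because every value scraped from the pages would be a string, sqlite-utils infers TEXT for each column, which matches the schema above.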