I've started working on the data in this file, but it's getting late (at time of commit). So, this is an end-of-session commit and the code in this file is in a mess and not finished.
master
Craig Oates
2 months ago
1 changed files with 131 additions and 0 deletions
@@ -0,0 +1,131 @@
#+options: ':nil *:t -:t ::t <:t H:3 \n:nil ^:t arch:headline author:t
#+options: broken-links:nil c:nil creator:nil d:(not "LOGBOOK") date:t e:t
#+options: email:nil expand-links:t f:t inline:t num:t p:nil pri:nil prop:nil
#+options: stat:t tags:t tasks:t tex:t timestamp:t title:t toc:t todo:t |:t
#+title: Ideal Flatmate Manchester
#+date: \today
#+author: Craig Oates
#+email: craig@craigoates.net
#+language: en
#+select_tags: export
#+exclude_tags: noexport
#+creator: Emacs 29.1.90 (Org mode 9.7-pre)
#+cite_export:

* Gather Ideal Flatmate Data (Manually)

- [[https://www.idealflatmate.co.uk/][Ideal Flatmate]]

I had a quick look at the website and ran a search with the following
filters:

- Date: 2024-02-24 Sat
- Location: Manchester (City)
- Price Range: £0-1200
- Distance: +20 km

There are only two pages of results – sixteen listings in total. So, I’ve just
saved the HTML manually, from within the browser. Because they are HTML files
and come with JavaScript, CSS, images etc., I stored them in
=raw-data/external/2024-02-24_ideal-flatmate-manc=. These files will not be
committed to the repository because I don’t want to clog it up with excess files
and images. I just want the rent rates and location data.

* Setup Common Lisp Environment

You will not need to execute this code block if you've already set up SLIME in
another Org file. This is just in case this is the only file you're working on
today, or it's your first file of the day.

*Run ~M-x slime~ before running the following code.* And, make note of the
~:session~ attribute. It allows the code in this code block to be used in
other code blocks which also use the ~:session~ attribute.

#+begin_src lisp :session :results silent
  (ql:quickload :com.inuoe.jzon) ; JSON parser.
  (ql:quickload :dexador)        ; HTTP requests.
  (ql:quickload :plump)          ; HTML/XML parser.
  (ql:quickload :lquery)         ; HTML/DOM manipulation.
  (ql:quickload :lparallel)      ; Parallel programming.
  (ql:quickload :cl-ppcre)       ; RegEx. library.
  (ql:quickload :plot/vega)      ; Vega plotting library.
  (ql:quickload :lisp-stat)      ; Statistics library.
  (ql:quickload :data-frame)     ; Data frame library, roughly eqv. to Python's pandas.
  (ql:quickload :str)            ; String library, expands on the built-in string functions.
#+end_src

* Clean Up and Parse Data

I'm taking a leaf out of the [[file:./spare-room-manchester.org][Spare Room (Manc)]] book and separating the
individual listings into their own files. I've already got code I can quickly
adapt to do this and it gives me more confidence that I'm not attaching values
to the wrong listings.

#+begin_src shell :results silent
  mkdir -p raw-data/external/2024-02-24_ideal-flatmate-manc-listings/
#+end_src

#+begin_src lisp :results silent
  ;; Split every listing card in the saved search pages into its own file.
  (let ((counter 0))
    (loop for file-path
            in (directory #P"raw-data/external/2024-02-24_ideal-flatmate-manc/*.html")
          do (with-open-file (in-stream file-path)
               (let* ((doc (plump:parse in-stream))
                      (listings (lquery:$ doc ".card-infos-flex-row" (serialize))))
                 (loop for item across listings
                       do (let ((out-path
                                  (merge-pathnames #P"raw-data/external/2024-02-24_ideal-flatmate-manc-listings/"
                                                   (format nil "listing-~a.html" counter))))
                            (with-open-file (out-stream
                                             out-path
                                             :direction :output
                                             :if-exists :supersede)
                              (format out-stream "~a" item))
                            (incf counter)))))))
#+end_src
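
Since the search showed sixteen listings, it's worth checking how many files the
block above actually wrote. This is only a minimal sketch using standard Common
Lisp. ~count-html-files~ is a name made up for this example, and it assumes the
listing files sit directly inside the directory it is given.

#+begin_src lisp :results value
  ;; Hypothetical helper, not part of the original notes.
  (defun count-html-files (dir)
    "Return how many .html files sit directly inside DIR (a directory pathname)."
    (length (directory (merge-pathnames #P"*.html" dir))))

  (count-html-files #P"raw-data/external/2024-02-24_ideal-flatmate-manc-listings/")
#+end_src

If the number is higher than sixteen, some cards may appear more than once
across the saved pages, which is worth knowing before building the CSV.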

** TODO Create CSV of Listings

I need to come back to this and finish it off. I've left it in a mess because
it's the end of the day (at time of commit) and I need to get some sleep.

#+begin_src lisp :results output raw
  ;; (with-open-file (out-stream
  ;;                  #P"working-data/2024-02-24-ideal-flatmate-manc.csv"
  ;;                  :direction :output
  ;;                  :if-exists :supersede)
  ;;   (format out-stream "ROW-ID, OTHER STUFF")
  ;; Until the CSV output above is finished, print the rows as an Org table.
  (let ((row-id 0))
    (format t "|ROW-ID|LISTING-INFO|URL|~%")
    (format t "|-|-|-|~%")
    (loop for file-path
            in (directory #P"raw-data/external/2024-02-24_ideal-flatmate-manc-listings/*.html")
          do (with-open-file (in-stream file-path)
               (let* ((doc (plump:parse in-stream))
                      (listing (lquery:$ doc ".card-infos-left" (text)))
                      (url (lquery:$ doc "a" (attr "href"))))
                 (format t "|~a|~a|~a|~%" row-id (aref listing 0) (aref url 0))))
             (incf row-id)))
#+end_src

#+RESULTS:
| ROW-ID | LISTING-INFO                                                        | URL                                                                     |
|--------+---------------------------------------------------------------------+-------------------------------------------------------------------------|
|      0 | £690/month per roomChapel Street, Salford M3 5DZ, UK                | https://www.idealflatmate.co.uk/spare-room/manchester/property-id113377 |
|      1 | £740/month per roomChapel Street, Salford M3 5DZ, UK                | https://www.idealflatmate.co.uk/spare-room/manchester/property-id113378 |
|      2 | £841 - £842/month per roomMiddlewood Street, Salford, M5 4YW, UK    | https://www.idealflatmate.co.uk/spare-room/salford/property-id120130    |
|      3 | £746 - £750/month per roomSalford, M5 4ZF, UK                       | https://www.idealflatmate.co.uk/spare-room/salford/property-id122936    |
|      4 | £200/month 100, 100 Lloyd Mansions, Salford M6 6HA, UK              | https://www.idealflatmate.co.uk/spare-room/salford/property-id122970    |
|      5 | £488/month per roomJoshua Grange, Pluto Cl, Salford M6 6HF, UK      | https://www.idealflatmate.co.uk/spare-room/salford/property-id122929    |
|      6 | £580/month per roomJoshua Grange, Pluto Cl, Salford M6 6HF, UK      | https://www.idealflatmate.co.uk/spare-room/salford/property-id123025    |
|      7 | £480/month per roomGreater Manchester, Manchester, M31 4HZ, 296, UK | https://www.idealflatmate.co.uk/spare-room/manchester/property-id122962 |
|      8 | £580/month per roomJoshua Grange, Pluto Cl, Salford M6 6HF, UK      | https://www.idealflatmate.co.uk/auth/signup?f=b&uid=242365&pid=123025   |
|      9 | £480/month per roomGreater Manchester, Manchester, M31 4HZ, 296, UK | https://www.idealflatmate.co.uk/auth/signup?f=b&uid=210168&pid=122962   |
|     10 | £850/month per room7 Symphony Park, Manchester M1 7GB, UK           | https://www.idealflatmate.co.uk/spare-room/manchester/property-id121033 |
|     11 | £850/month per room7 Symphony Park, Manchester M1 7GB, UK           | https://www.idealflatmate.co.uk/spare-room/manchester/property-id121032 |
|     12 | £956 - £957/month per room7 Symphony Park, Manchester M1 7GB, UK    | https://www.idealflatmate.co.uk/spare-room/manchester/property-id121034 |
|     13 | £980/month per room7 Symphony Park, Manchester M1 7GB, UK           | https://www.idealflatmate.co.uk/spare-room/manchester/property-id121030 |
|     14 | £678 - £679/month per roomSalford M5 4YW, UK                        | https://www.idealflatmate.co.uk/spare-room/salford/property-id120131    |
|     15 | £708 - £709/month per roomSalford M5 4YW, UK                        | https://www.idealflatmate.co.uk/spare-room/salford/property-id120127    |
|     16 | £725/month per roomSalford M5 4YW, UK                               | https://www.idealflatmate.co.uk/spare-room/salford/property-id120128    |
|     17 | £775/month per roomMiddlewood Street, Salford, M5 4YW, UK           | https://www.idealflatmate.co.uk/spare-room/salford/property-id120129    |
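
The =LISTING-INFO= column still has the rent and the address fused together, and
rows 8 and 9 carry sign-up URLs rather than property URLs. As a possible next
step towards the CSV, here is a minimal sketch (not the finished code) which
pulls those values apart with ~cl-ppcre~. The function names
~parse-listing-info~ and ~listing-id~ are made up for illustration, and both
patterns assume every listing follows the shapes seen in the table above.

#+begin_src lisp :results silent
  (ql:quickload :cl-ppcre :silent t)

  ;; Hypothetical helper: split the fused rent/address text.
  (defun parse-listing-info (text)
    "Return (values MIN-RENT MAX-RENT ADDRESS) for TEXT, or NIL on no match.
  Assumes TEXT starts with a price such as £690/month or £841 - £842/month."
    (ppcre:register-groups-bind ((#'parse-integer low high) address)
        ("^£(\\d+)(?:\\s*-\\s*£(\\d+))?/month\\s*(?:per room)?\\s*(.*)$" text)
      ;; HIGH is NIL when the listing shows a single price, not a range.
      (values low (or high low) address)))

  ;; Hypothetical helper: find the property ID in either kind of URL.
  (defun listing-id (url)
    "Return the property ID in URL as an integer, or NIL if none is found.
  Assumes the ID follows property-id or pid=, as in the table above."
    (ppcre:register-groups-bind ((#'parse-integer id))
        ("(?:property-id|pid=)(\\d+)" url)
      id))
#+end_src

For example, calling ~parse-listing-info~ on the text from row 0 should give
690, 690 and =Chapel Street, Salford M3 5DZ, UK= as three values. Calling
~listing-id~ on the URLs from rows 6 and 8 should give the same ID for both,
which suggests those two rows describe the same property.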