#+title: Data Exploration of Artwork Section
#+date: 2023-03-26 Sun
#+author: Craig Oates
#+email: craig@craigoates.net
#+options: ':nil *:t -:t ::t <:t H:3 \n:nil ^:t arch:headline author:t
#+options: broken-links:nil c:nil creator:nil d:(not "LOGBOOK") date:t e:t
#+options: email:nil f:t inline:t num:t p:nil pri:nil prop:nil stat:t tags:t
#+options: tasks:t tex:t timestamp:t title:t toc:t todo:t |:t
#+language: en
#+select_tags: export
#+exclude_tags: noexport
#+creator: Emacs 29.0.60 (Org mode 9.6.1)
#+cite_export:
#+export_file_name: ./exported/artwork.html

* Summary & Set-up

#+begin_quote
Make sure you have gone through the [[file:./README.org][README]] and set up the environment on your machine.
#+end_quote

The code in this file explores the [[https://www.craigoates.net/art][Artworks]] section of the site.

* Clean Data

This is the SQL used to remove data I don't want in a public-facing repository. The database is not included. I'm keeping the SQLite code for future reference and for the sake of completeness.

#+header: :list
#+header: :separator \ 
#+header: :results raw
#+header: :dir data
#+header: :db co-production-2023-03-21.db
#+begin_src sqlite
  .headers on
  .mode csv
  .output artwork-2023-03-21.csv
  select
    id,
    title,
    slug,
    published,
    category,
    width,
    height,
    depth,
    pixel_width,
    pixel_height,
    play_length,
    medium,
    created_at,
    updated_at
  from artwork;
#+end_src

#+RESULTS:

#+begin_src shell :results code
  # Use -l to check file permissions.
  ls -h data/artwork*.csv
#+end_src

#+RESULTS:
#+begin_src shell
data/artwork-2023-03-21.csv
#+end_src

To view the data in =data/artwork-2023-03-21.csv=, you need ~csvlook~ installed.

#+begin_src shell :results silent
  sudo apt update
  sudo apt install csvkit
#+end_src

If ~csvlook~ isn't installed, skip the following code block. It prints a sample of the data this file uses to explore the Artworks section of my site.

#+begin_src shell :results output
  head -n 4 data/artwork-2023-03-21.csv | csvlook
#+end_src

#+RESULTS:
: | id | title                         | slug                 | published           | category | width | height | depth | pixel_width | pixel_height | play_length | medium            | created_at                  | updated_at                  |
: | -- | ----------------------------- | -------------------- | ------------------- | -------- | ----- | ------ | ----- | ----------- | ------------ | ----------- | ----------------- | --------------------------- | --------------------------- |
: |  1 | Drop and Run (Purple Squares) | drop-and-run         | 2012-05-07 00:00:00 | Video    |       |        |       |             |              |           4 | Digital Animation | 2022-04-11 00:00:00.000000Z | 2022-05-09 14:43:28.379441Z |
: |  2 | Eje x, Exio y, Z-Achse        | eje-x-exio-y-z-achse | 2016-11-11 00:00:00 | Prints   |    15 |     21 |       |             |              |             | Digital Print     | 2022-04-11                  |                             |
: |  3 | Up This Way                   | up-this-way          | 2016-01-24 00:00:00 | Prints   |    21 |     30 |       |             |              |             | Digital Print     | 2022-04-11                  |                             |

* Set-Up SLIME

Run =M-x slime= before executing the following code.

#+begin_src lisp
  (format nil "SLIME and Common Lisp is up and running!")
#+end_src

#+RESULTS:
: SLIME and Common Lisp is up and running!

#+begin_src lisp :results silent :session
  (ql:quickload :plot/vega)
  (ql:quickload :lisp-stat)
  (ql:quickload :data-frame)
#+end_src

#+begin_src lisp :session
  (defparameter *artworks*
    (lisp-stat:read-csv #P"data/artwork-2023-03-21.csv")
    "The data read in from data/artwork-2023-03-21.csv.")
#+end_src

#+RESULTS:
: *ARTWORKS*

* Explore Data

#+begin_src lisp :session :results output
  (format t "Number of artworks: ~A" (lisp-stat:nrow *artworks*))
#+end_src

#+RESULTS:
: Number of artworks: 375

#+begin_src lisp :session
  (lisp-stat:defdf *artworks-df* *artworks*)
#+end_src

#+RESULTS:
: #

** Data Heuristics

#+begin_src lisp :session :results output
  (lisp-stat:heuristicate-types *artworks-df*)
  (lisp-stat:describe *artworks-df*)
#+end_src

#+RESULTS:
#+begin_example
,*ARTWORKS-DF*
  A data-frame with 375 observations of 14 variables

Variable     | Type         | Unit | Label
--------     | ----         | ---- | -----------
ID           | INTEGER      | NIL  | NIL
TITLE        | INTEGER      | NIL  | NIL
SLUG         | INTEGER      | NIL  | NIL
PUBLISHED    | STRING       | NIL  | NIL
CATEGORY     | STRING       | NIL  | NIL
WIDTH        | DOUBLE-FLOAT | NIL  | NIL
HEIGHT       | DOUBLE-FLOAT | NIL  | NIL
DEPTH        | INTEGER      | NIL  | NIL
PIXEL-WIDTH  | INTEGER      | NIL  | NIL
PIXEL-HEIGHT | INTEGER      | NIL  | NIL
PLAY-LENGTH  | INTEGER      | NIL  | NIL
MEDIUM       | STRING       | NIL  | NIL
CREATED-AT   | STRING       | NIL  | NIL
UPDATED-AT   | SYMBOL       | NIL  | NIL
#+end_example

** Create Sample Data-Frame

#+begin_src lisp :session
  (defparameter *artworks-sm-list*
    (select:select *artworks-df* (select:range 0 10) t)
    "A small sample of artwork for quickly testing code.")
#+end_src

#+RESULTS:
: #

** Summary: Width

#+begin_src lisp :session :results drawer
  (lisp-stat:summarize-column '*artworks-df*:width)
#+end_src

#+RESULTS:
:results:
WIDTH ()
n: 375
missing: 55
min=6.50
q25=25.02
q50=34.73
mean=34.62
q75=42.07
max=70
:end:

** Summary: Height

#+begin_src lisp :session :results drawer
  (lisp-stat:summarize-column '*artworks-df*:height)
#+end_src

#+RESULTS:
:results:
HEIGHT ()
n: 375
missing: 55
min=10
q25=26.84
q50=29.93
mean=37.04
q75=42.47
max=70
:end:

** Summary: Depth

#+begin_src lisp :session :results drawer
  (lisp-stat:summarize-column '*artworks-df*:depth)
#+end_src

#+RESULTS:
:results:
DEPTH ()
n: 375
missing: 374
min=7
q25=7
q50=7
mean=7
q75=7
max=7
:end:

** Summary: Pixel Width

#+begin_src lisp :session :results drawer
  (lisp-stat:summarize-column '*artworks-df*:pixel-width)
#+end_src

#+RESULTS:
:results:
PIXEL-WIDTH ()
n: 375
missing: 341
min=2480
q25=2550.93
q50=2952.86
mean=2927.82
q75=3298.00
max=3508
:end:

#+begin_src lisp
  (format nil "Total (2D) Digital Artworks: ~A" (- 375 341))
#+end_src

#+RESULTS:
: Total (2D) Digital Artworks: 34

** Summary: Pixel Height

#+begin_src lisp :session :results drawer
  (lisp-stat:summarize-column '*artworks-df*:pixel-height)
#+end_src

#+RESULTS:
:results:
PIXEL-HEIGHT ()
n: 375
missing: 341
min=2480
q25=3326.59
q50=4467.47
mean=4012.12
q75=4700.63
max=4722
:end:

#+begin_src lisp :session :results drawer
  (lisp-stat:summarize-column '*artworks-df*:medium)
#+end_src

#+RESULTS:
:results:
119 (32%) x "Digital Photograph",
73 (19%) x "Digital Print",
70 (19%) x "Watercolour and ink",
45 (12%) x "Pen and ink",
23 (6%) x "Felt-tip marker on paper",
21 (6%) x "Digital Animation",
11 (3%) x "Watercolour on paper",
5 (1%) x "Screen-print on paper",
2 (1%) x "Screen-print and felt-tip marker on paper",
1 (0%) x "Lino. print on paper",
1 (0%) x "Graphite on paper",
1 (0%) x "Drawing",
1 (0%) x "Glass light bulb and jar",
1 (0%) x "Felt tip marker on paper",
1 (0%) x "Staples on paper",
:end:

*NOTE:* ~created-at~ refers to the time I added the artwork to the website's database. See [[published_summary][Published Summary]] below for the artwork creation date.

#+begin_src lisp :session :results drawer
  (lisp-stat:summarize-column '*artworks-df*:created-at)
#+end_src

#+RESULTS:
:results:
338 (90%) x "2022-04-11",
2 (1%) x "2022-04-11 00:00:00.000000Z",
1 (0%) x "2022-06-25 20:47:51.282515",
1 (0%) x "2022-07-19 02:21:20.722981Z",
1 (0%) x "2022-07-19 04:38:06.685195Z",
1 (0%) x "2022-07-19 04:42:14.648641",
1 (0%) x "2022-07-19 04:45:03.533171",
1 (0%) x "2022-07-19 04:46:57.297030",
1 (0%) x "2022-07-19 04:49:38.193585",
1 (0%) x "2022-07-19 04:58:04.069055",
1 (0%) x "2022-07-19 04:59:38.732074Z",
1 (0%) x "2022-07-19 05:00:55.259252",
1 (0%) x "2022-07-19 05:02:04.145161",
1 (0%) x "2022-07-19 05:03:20.898681",
1 (0%) x "2022-07-19 05:04:35.132294",
1 (0%) x "2022-07-19 05:05:38.856980",
1 (0%) x "2022-07-19 05:06:48.692528",
1 (0%) x "2022-08-15 18:16:52.321678",
1 (0%) x "2022-08-15 19:11:08.879204",
1 (0%) x "2022-08-15 19:14:40.060236",
1 (0%) x "2022-08-15 19:17:03.134433",
1 (0%) x "2022-08-15 19:20:02.404717",
1 (0%) x "2022-08-15 19:22:00.766659",
1 (0%) x "2022-08-15 19:24:06.150506",
1 (0%) x "2022-08-15 19:27:47.224984",
1 (0%) x "2022-08-15 19:49:19.064553",
1 (0%) x "2022-08-15 19:57:22.403963",
1 (0%) x "2022-08-15 20:00:46.926246",
1 (0%) x "2022-08-15 20:04:02.172163",
1 (0%) x "2022-08-15 20:06:48.419529",
1 (0%) x "2022-08-15 20:10:49.282631",
1 (0%) x "2022-08-15 20:13:10.251745",
1 (0%) x "2022-08-15 20:15:20.199923",
1 (0%) x "2022-08-15 20:18:57.298303",
1 (0%) x "2022-08-15 20:54:31.246681",
1 (0%) x "2022-08-15 21:10:15.367998",
1 (0%) x "2022-08-15 21:15:46.119031",
:end:

*NOTE:* ~published~ refers to when I finished the artwork. The most common date is 2022-08-15, with 20 artworks (about 5%), but 20 rows were also added to the database on that day (see the ~created-at~ summary above), so that figure may reflect data entry rather than one day's work. After that comes 2020-03-13, with 14 artworks, which is about 4% of my total /finished/ output. One day produced 4%; of course it was around the Covid pandemic.

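
As a rough check on the rounded percentages in these summaries, the share of a count out of the 375 rows can be worked out in the shell. This is only a sketch: the helper name ~pct~ is made up for illustration, and 14 and 20 are the two largest counts in the ~published~ summary below.

#+begin_src shell :results output
  # Hypothetical helper: print COUNT as a percentage of TOTAL.
  pct() {
      # $1 = count, $2 = total; one decimal place.
      awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f%%\n", n / d * 100 }'
  }

  pct 14 375   # 3.7%
  pct 20 375   # 5.3%
#+end_src

Both figures round to the whole-number percentages shown in the summary (4% and 5%).
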
#+name: published_summary
#+begin_src lisp :session :results output drawer
  (pprint (lisp-stat:summarize-column '*artworks-df*:published))
#+end_src

*I have removed most of the dates with only 1 entry (0%).*

#+RESULTS: published_summary
:results:
20 (5%) x "2022-08-15",
14 (4%) x "2020-03-13",
2 (1%) x "2016-01-23 00:00:00.000",
2 (1%) x "2012-05-26 00:00:00.000",
2 (1%) x "2017-08-19
1 (0%) x "2016-11-11 00:00:00.000",
1 (0%) x "2016-01-24 00:00:00.000",
:end:

I keep forgetting about ~output~. Leaving this ~(lisp-stat:head...~ example here to help me remember to use it.

#+begin_src lisp :session :results output code
  (lisp-stat:head *artworks-df*)
#+end_src

#+RESULTS:
#+begin_src lisp
;;    ID  TITLE                          SLUG                   PUBLISHED                CATEGORY  WIDTH  HEIGHT  DEPTH  PIXEL-WIDTH  PIXEL-HEIGHT  PLAY-LENGTH  MEDIUM             CREATED-AT                   UPDATED-AT
;; 0   1  Drop and Run (Purple Squares)  drop-and-run           2012-05-07               Video        NA      NA     NA           NA            NA            4  Digital Animation  2022-04-11 00:00:00.000000Z  2022-05-09 14:43:28.379441Z
;; 1   2  Eje x, Exio y, Z-Achse         eje-x-exio-y-z-achse   2016-11-11 00:00:00.000  Prints     15.0    21.0     NA           NA            NA           NA  Digital Print      2022-04-11                   NA
;; 2   3  Up This Way                    up-this-way            2016-01-24 00:00:00.000  Prints     21.0    30.0     NA           NA            NA           NA  Digital Print      2022-04-11                   NA
;; 3   4  Now Then                       now-then               2016-01-23 00:00:00.000  Prints     21.0    30.0     NA           NA            NA           NA  Digital Print      2022-04-11                   NA
;; 4   5  Here Now There                 here-now-there         2016-01-23 21:31:24.000  Prints     21.0    30.0     NA           NA            NA           NA  Digital Print      2022-04-11                   NA
;; 5   6  Everything In-between          everything-in-between  2015-07-07 00:00:00.000  Prints     21.0    30.0     NA           NA            NA           NA  Digital Print      2022-04-11                   NA
NIL
#+end_src

#+begin_src lisp :session :results drawer
  (length (lisp-stat:select *artworks-df* t '(width height)))
#+end_src

#+RESULTS:
:results:
#
:end:

** Plot: Width Vs Height Scatter (Non-Digital)

#+begin_src lisp :session :results file
  (vega:defplot width-height
    `(:title "Art: Width vs Height (Non-Digital)"
      :description "Comparison between the physical dimensions of artworks."
      :width 400
      :height 400
      :mark :circle
      :data ,*artworks-df*
      :selection (:grid (:type :interval :bind :scales))
      :encoding (:x (:field :width :title "Width (cm)" :type :quantitative)
                 :y (:field :height :title "Height (cm)" :type :quantitative)
                 :tooltip (:field :title :type :nominal)
                 :color (:field :title :legend :null))))
  (vega:write-html width-height "output/art-width-height-2023-03-21.html")
#+end_src

#+RESULTS:
[[file:output/art-width-height-2023-03-21.html]]

*** Note: A Line of Lines has wrong dimensions

They should be =21 x 14.8 cm= and not =210 x 148 cm=. *I have updated the dimensions on the live site.* I did not notice it until I saw the chart. Basically, the decimal point was shifted one place to the right.

[[file:output/art-width-height-2023-03-21.png]]

** Plot: Pixel-Width vs Pixel-Height (digital only)

#+begin_src lisp :session :results file
  (defparameter *artworks-px-w-h-df*
    (lisp-stat:df-remove-duplicates
     (lisp-stat:drop-missing
      (lisp-stat:select *artworks-df* t '(pixel-width pixel-height))))
    "A data-frame containing the unique `PIXEL-WIDTH' and `PIXEL-HEIGHT' pairs.
  Rows with missing/null values have been removed.")

  (vega:defplot px-width-px-height
    `(:title "Art: Pixel-Width vs Pixel-Height (2D Digital)"
      :description "Comparison between the pixel width and height dimensions of digital artworks."
      :width 400
      :height 400
      :mark :circle
      :data ,*artworks-px-w-h-df* ; ,*artworks-df*
      :selection (:grid (:type :interval :bind :scales))
      :encoding (:x (:field :pixel-width :title "Pixel-Width (px)" :type :quantitative)
                 :y (:field :pixel-height :title "Pixel-Height (px)" :type :quantitative)
                 :tooltip (:field :title :type :nominal)
                 :color (:field :PIXEL-WIDTH :legend :null))))
  ;; Write the pixel plot defined above, not the earlier WIDTH-HEIGHT plot.
  (vega:write-html px-width-px-height "output/art-px-width-px-height-2023-03-21.html")
#+end_src

#+RESULTS:
[[file:output/art-px-width-px-height-2023-03-21.html]]

#+begin_src lisp :session :results output raw
  (lisp-stat:df-print
   (lisp-stat:df-remove-duplicates
    (lisp-stat:drop-missing
     (lisp-stat:select *artworks-df* t '(pixel-width pixel-height))
     (lambda (x) (eql :na x)))))
#+end_src

#+RESULTS:
| PIXEL-WIDTH | PIXEL-HEIGHT |
|-------------+--------------|
|    3142.0d0 |         4722 |
|    2480.0d0 |         3508 |
|    3508.0d0 |         2480 |
|    3456.0d0 |         4608 |
|    3402.0d0 |         4536 |

There are a lot of duplicated sizes in these columns. The chart *had* loads of dots resting on top of each other, so you only see five at any one point. I've removed all the rows with missing values and the duplicates to help show how thirty-four (2D) digital images only show up as five points in the chart.

** TODO Compare landscape to portrait

** Note: Corrected A Line of Lines in CSV file

I've made a note of the error in [[<2023-03-27 Mon> Note: A Line of Lines has wrong dimensions][A Line of Lines has wrong dimensions]]. I made the correction directly in =data/artwork-2023-03-21.csv= because I am lazy. I didn't want to download the recently updated database from the live site and run the scripts to remove/clean it again. The manual change takes a few seconds (on my machine), whereas downloading and cleaning the database from the server, exporting the data to a CSV file and adding said CSV file does not.

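
A scripted version of the same correction might look like the sketch below. It is an illustration only, not the method used here (the edit was made by hand). It assumes the column order from the export in Clean Data (=width= and =height= are the sixth and seventh fields) and that the title contains no commas; the function name ~fix_dimensions~ and the output path are made up.

#+begin_src shell :results silent
  # Hypothetical helper: divide WIDTH and HEIGHT by ten for one title.
  # Assumes the matching row has no commas before the height field, so a
  # plain comma split is safe for it. Every other row is printed untouched.
  fix_dimensions() {
      # $1 = exact title to match, $2 = CSV file to read
      awk -F',' -v OFS=',' -v title="$1" '
          NR > 1 {
              t = $2
              gsub(/^"|"$/, "", t)   # the title may or may not be quoted
              if (t == title) { $6 = $6 / 10; $7 = $7 / 10 }
          }
          { print }
      ' "$2"
  }

  # Usage (writes a new file instead of editing in place):
  # fix_dimensions "A Line of Lines" data/artwork-2023-03-21.csv > /tmp/artwork-fixed.csv
#+end_src

Writing to a new file and comparing it with ~diff~ before replacing the original keeps the change easy to review.
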
** Plot: Corrected Width Vs Height Scatter (Non-Digital)

#+begin_src lisp :session :results file
  (vega:defplot width-height
    `(:title "Art: (Corrected) Width vs Height (Non-Digital)"
      :description "Comparison between the physical dimensions of artworks (corrected)."
      :width 400
      :height 400
      :mark :circle
      :data ,*artworks-df*
      :selection (:grid (:type :interval :bind :scales))
      :encoding (:x (:field :width :title "Width (cm)" :type :quantitative)
                 :y (:field :height :title "Height (cm)" :type :quantitative)
                 :tooltip (:field :title :type :nominal)
                 :color (:field :title :legend :null))))
  (vega:write-html width-height "output/art-width-height-2023-03-21-corrected.html")
#+end_src

#+RESULTS:
[[file:output/art-width-height-2023-03-21-corrected.html]]

*** Plot: Side-by-Side of Width Vs Height (Corrected and Original)

I included these images side-by-side just to see how the correction changes the feel of the graph.

[[file:output/art-width-height-2023-03-21.png]]

[[file:output/art-width-height-2023-03-21-corrected.png]]

** Note: Added missing depth dimension to Touching but Not Connected

Depth is =7 cm=. *I have updated it on the live site* and in =data/artwork-2023-03-21.csv=. There is only one sculpture, so there is no point plotting a graph.

** TODO Plot yearly totals

#+begin_src lisp :session

#+end_src