A data exploration project using data from: https://www.craigoates.net

Data Exploration of Artwork Section

Summary & Set-up

Make sure you have gone through the README and set up the environment on your machine.

The code in this file explores the Artworks section of the site.

Clean Data

This is the SQL used to remove data I don't want in a public-facing repository. The database is not included. I'm keeping the SQLite code for future reference and for the sake of completeness.

  .headers on
  .mode csv
  .output artwork-2023-03-21.csv
  select
    id,
    title,
    slug,
    published,
    category,
    width,
    height,
    depth,
    pixel_width,
    pixel_height,
    play_length,
    medium,
    created_at,
    updated_at
  from
    artwork;

  # Use -l to check file permissions.
  ls -h data/artwork*.csv
data/artwork-2023-03-21.csv
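
As a quick sanity check on the export, the header row can be compared with the column list in the select statement. This is a minimal Python sketch and not part of the project's Lisp code. It reads an inlined sample instead of the real file, and the names EXPECTED_COLUMNS, SAMPLE and unexpected_columns are made up for illustration.

```python
import csv
import io

# Columns named in the select statement above.
EXPECTED_COLUMNS = [
    "id", "title", "slug", "published", "category", "width", "height",
    "depth", "pixel_width", "pixel_height", "play_length", "medium",
    "created_at", "updated_at",
]

# Inlined sample standing in for data/artwork-2023-03-21.csv.
SAMPLE = (
    "id,title,slug,published,category,width,height,depth,pixel_width,"
    "pixel_height,play_length,medium,created_at,updated_at\n"
    "3,Up This Way,up-this-way,2016-01-24 00:00:00,Prints,21,30,,,,,"
    "Digital Print,2022-04-11,\n"
)


def unexpected_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return header names that are not in the expected column list."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in expected]


# An empty list means no extra column made it into the export.
print(unexpected_columns(SAMPLE))
```

To run it against the real export, pass the text of the CSV file instead of SAMPLE.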

To view the data in data/artwork-2023-03-21.csv, you need csvlook installed.

  sudo apt update
  sudo apt install csvkit

If csvlook isn't installed, skip the following code block. It prints a sample of the data this file uses to explore the Artworks section of my site.

  head -n 4 data/artwork-2023-03-21.csv | csvlook
| id | title                         | slug                 |           published | category | width | height | depth | pixel_width | pixel_height | play_length | medium            | created_at                  | updated_at                  |
| -- | ----------------------------- | -------------------- | ------------------- | -------- | ----- | ------ | ----- | ----------- | ------------ | ----------- | ----------------- | --------------------------- | --------------------------- |
|  1 | Drop and Run (Purple Squares) | drop-and-run         | 2012-05-07 00:00:00 | Video    |       |        |       |             |              |           4 | Digital Animation | 2022-04-11 00:00:00.000000Z | 2022-05-09 14:43:28.379441Z |
|  2 | Eje x, Exio y, Z-Achse        | eje-x-exio-y-z-achse | 2016-11-11 00:00:00 | Prints   |    15 |     21 |       |             |              |             | Digital Print     | 2022-04-11                  |                             |
|  3 | Up This Way                   | up-this-way          | 2016-01-24 00:00:00 | Prints   |    21 |     30 |       |             |              |             | Digital Print     | 2022-04-11                  |                             |

Set-Up SLIME

Run M-x slime before executing the following code.

  (format nil "SLIME and Common Lisp is up and running!")
SLIME and Common Lisp is up and running!
  (ql:quickload :plot/vega)
  (ql:quickload :lisp-stat)
  (ql:quickload :data-frame)
  (defparameter *artworks* (lisp-stat:read-csv #P"data/artwork-2023-03-21.csv")
    "The data read in from data/artwork-2023-03-21.csv.")
*ARTWORKS*

Explore Data

  (format t "Number of artworks: ~A" (lisp-stat:nrow *artworks*))
Number of artworks: 375
  (lisp-stat:defdf *artworks-df* *artworks*)
#<DATA-FRAME:DATA-FRAME (375 observations of 14 variables)>

Data Heuristics

  (lisp-stat:heuristicate-types *artworks-df*)
  (lisp-stat:describe *artworks-df*)
*ARTWORKS-DF*
  A data-frame with 375 observations of 14 variables

Variable     | Type         | Unit | Label      
--------     | ----         | ---- | -----------
ID           | INTEGER      | NIL  | NIL        
TITLE        | INTEGER      | NIL  | NIL        
SLUG         | INTEGER      | NIL  | NIL        
PUBLISHED    | STRING       | NIL  | NIL        
CATEGORY     | STRING       | NIL  | NIL        
WIDTH        | DOUBLE-FLOAT | NIL  | NIL        
HEIGHT       | DOUBLE-FLOAT | NIL  | NIL        
DEPTH        | INTEGER      | NIL  | NIL
PIXEL-WIDTH  | INTEGER      | NIL  | NIL        
PIXEL-HEIGHT | INTEGER      | NIL  | NIL        
PLAY-LENGTH  | INTEGER      | NIL  | NIL        
MEDIUM       | STRING       | NIL  | NIL        
CREATED-AT   | STRING       | NIL  | NIL        
UPDATED-AT   | SYMBOL       | NIL  | NIL

Create Sample Data-Frame

  (defparameter *artworks-sm-list*
    (select:select *artworks-df* (select:range 0 10) t)
    "A small sample of artwork for quickly testing code.")
#<DATA-FRAME:DATA-FRAME (10 observations of 14 variables)>

Summary: Width

  (lisp-stat:summarize-column '*artworks-df*:width)

WIDTH () n: 375 missing: 55 min=6.50 q25=25.02 q50=34.73 mean=34.62 q75=42.07 max=70

Summary: Height

  (lisp-stat:summarize-column '*artworks-df*:height)

HEIGHT () n: 375 missing: 55 min=10 q25=26.84 q50=29.93 mean=37.04 q75=42.47 max=70

Summary: Depth

  (lisp-stat:summarize-column '*artworks-df*:depth)

DEPTH () n: 375 missing: 374 min=7 q25=7 q50=7 mean=7 q75=7 max=7

Summary: Pixel Width

  (lisp-stat:summarize-column '*artworks-df*:pixel-width)

PIXEL-WIDTH () n: 375 missing: 341 min=2480 q25=2550.93 q50=2952.86 mean=2927.82 q75=3298.00 max=3508

  (format nil "Total (2D) Digital Artworks: ~A" (- 375 341))
Total (2D) Digital Artworks: 34
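
The 375 and 341 above are copied by hand from the summary, so they go stale if the CSV changes. A minimal Python sketch of the same count, taken from the column itself, might look like this. It is not part of the project's Lisp code, the sample column is illustrative and count_present is a made-up helper name.

```python
def count_present(values):
    """Count entries that are neither None nor an empty string."""
    return sum(1 for value in values if value not in (None, ""))


# Illustrative column: two digital works and three without pixel sizes.
pixel_widths = ["2480", "", "", "3508", ""]

total = len(pixel_widths)
present = count_present(pixel_widths)
missing = total - present  # the figure the summary reports as "missing"

print(f"Total (2D) Digital Artworks: {present} (n: {total}, missing: {missing})")
```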

Summary: Pixel Height

  (lisp-stat:summarize-column '*artworks-df*:pixel-height)

PIXEL-HEIGHT () n: 375 missing: 341 min=2480 q25=3326.59 q50=4467.47 mean=4012.12 q75=4700.63 max=4722

  (lisp-stat:summarize-column '*artworks-df*:medium)

119 (32%) x "Digital Photograph", 73 (19%) x "Digital Print", 70 (19%) x "Watercolour and ink", 45 (12%) x "Pen and ink", 23 (6%) x "Felt-tip marker on paper", 21 (6%) x "Digital Animation", 11 (3%) x "Watercolour on paper", 5 (1%) x "Screen-print on paper", 2 (1%) x "Screen-print and felt-tip marker on paper", 1 (0%) x "Lino. print on paper", 1 (0%) x "Graphite on paper", 1 (0%) x "Drawing", 1 (0%) x "Glass light bulb and jar", 1 (0%) x "Felt tip marker on paper", 1 (0%) x "Staples on paper",

NOTE: created-at refers to the time I added the artwork to the website's database. See Published Summary below for the artwork creation date.

  (lisp-stat:summarize-column '*artworks-df*:created-at)

338 (90%) x "2022-04-11", 2 (1%) x "2022-04-11 00:00:00.000000Z", 1 (0%) x "2022-06-25 20:47:51.282515", 1 (0%) x "2022-07-19 02:21:20.722981Z", 1 (0%) x "2022-07-19 04:38:06.685195Z", 1 (0%) x "2022-07-19 04:42:14.648641", 1 (0%) x "2022-07-19 04:45:03.533171", 1 (0%) x "2022-07-19 04:46:57.297030", 1 (0%) x "2022-07-19 04:49:38.193585", 1 (0%) x "2022-07-19 04:58:04.069055", 1 (0%) x "2022-07-19 04:59:38.732074Z", 1 (0%) x "2022-07-19 05:00:55.259252", 1 (0%) x "2022-07-19 05:02:04.145161", 1 (0%) x "2022-07-19 05:03:20.898681", 1 (0%) x "2022-07-19 05:04:35.132294", 1 (0%) x "2022-07-19 05:05:38.856980", 1 (0%) x "2022-07-19 05:06:48.692528", 1 (0%) x "2022-08-15 18:16:52.321678", 1 (0%) x "2022-08-15 19:11:08.879204", 1 (0%) x "2022-08-15 19:14:40.060236", 1 (0%) x "2022-08-15 19:17:03.134433", 1 (0%) x "2022-08-15 19:20:02.404717", 1 (0%) x "2022-08-15 19:22:00.766659", 1 (0%) x "2022-08-15 19:24:06.150506", 1 (0%) x "2022-08-15 19:27:47.224984", 1 (0%) x "2022-08-15 19:49:19.064553", 1 (0%) x "2022-08-15 19:57:22.403963", 1 (0%) x "2022-08-15 20:00:46.926246", 1 (0%) x "2022-08-15 20:04:02.172163", 1 (0%) x "2022-08-15 20:06:48.419529", 1 (0%) x "2022-08-15 20:10:49.282631", 1 (0%) x "2022-08-15 20:13:10.251745", 1 (0%) x "2022-08-15 20:15:20.199923", 1 (0%) x "2022-08-15 20:18:57.298303", 1 (0%) x "2022-08-15 20:54:31.246681", 1 (0%) x "2022-08-15 21:10:15.367998", 1 (0%) x "2022-08-15 21:15:46.119031",
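
The created-at values come in several formats (date only, with a time, with or without a trailing Z), so the summary treats nearly every row as unique. A rough way to group them by day is to keep only the first ten characters. This is a minimal Python sketch, not part of the project's Lisp code; day_of and count_by_day are made-up helper names and the sample holds only a few of the values listed above.

```python
from collections import Counter


def day_of(timestamp):
    """Return the YYYY-MM-DD part of a created-at value."""
    return timestamp.strip()[:10]


def count_by_day(timestamps):
    """Count timestamps per calendar day, ignoring the time part."""
    return Counter(day_of(ts) for ts in timestamps)


# A few of the formats seen in the summary above.
sample = [
    "2022-04-11",
    "2022-04-11 00:00:00.000000Z",
    "2022-07-19 04:42:14.648641",
    "2022-07-19 04:59:38.732074Z",
    "2022-08-15 18:16:52.321678",
]

for day, count in sorted(count_by_day(sample).items()):
    print(day, count)
```

Applied to the full column listed above, this groups the values into 340 on 2022-04-11, 1 on 2022-06-25, 14 on 2022-07-19 and 20 on 2022-08-15. The slice ignores time zones, so treat it as a rough grouping.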

NOTE: published refers to when I finished the artwork.

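2022-08-15 tops the list below with 20 artworks (5%), but 20 artworks were also added to the database that day (see created-at above), so it may be an upload date rather than a finish date. Setting it aside, my most prolific day looks to be 2020-03-13, with 14 artworks, which is about 4% of my total finished output. One day produced 4%; of course it was around the Covid pandemic.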

  (pprint (lisp-stat:summarize-column '*artworks-df*:published))

I have removed most of the dates with only 1 entry (0%).

20 (5%) x "2022-08-15", 14 (4%) x "2020-03-13", 2 (1%) x "2016-01-23 00:00:00.000", 2 (1%) x "2012-05-26 00:00:00.000", 2 (1%) x "2017-08-19", 1 (0%) x "2016-11-11 00:00:00.000", 1 (0%) x "2016-01-24 00:00:00.000",
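
The percentages in the summary appear to be each count's share of the 375 rows, rounded to a whole number. A minimal Python sketch of that arithmetic follows. It is not part of the project's Lisp code, share is a made-up helper name, and the rounding rule is my assumption about how the summary is produced.

```python
TOTAL_ARTWORKS = 375  # row count reported by the data-frame above


def share(count, total=TOTAL_ARTWORKS):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


for day, count in [("2022-08-15", 20), ("2020-03-13", 14)]:
    print(f"{day}: {count} artworks, about {share(count)}%")
```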

I keep forgetting about printing output. I'm leaving this (lisp-stat:head ...) example here to help me remember to use it.

  (lisp-stat:head *artworks-df*)

;;   ID TITLE                         SLUG                  PUBLISHED               CATEGORY WIDTH HEIGHT DEPTH PIXEL-WIDTH PIXEL-HEIGHT PLAY-LENGTH MEDIUM            CREATED-AT                                   UPDATED-AT
;; 0  1 Drop and Run (Purple Squares) drop-and-run          2012-05-07              Video       NA     NA    NA          NA           NA           4 Digital Animation 2022-04-11 00:00:00.000000Z 2022-05-09 14:43:28.379441Z
;; 1  2 Eje x, Exio y, Z-Achse        eje-x-exio-y-z-achse  2016-11-11 00:00:00.000 Prints    15.0   21.0    NA          NA           NA          NA Digital Print     2022-04-11                                           NA
;; 2  3 Up This Way                   up-this-way           2016-01-24 00:00:00.000 Prints    21.0   30.0    NA          NA           NA          NA Digital Print     2022-04-11                                           NA
;; 3  4 Now Then                      now-then              2016-01-23 00:00:00.000 Prints    21.0   30.0    NA          NA           NA          NA Digital Print     2022-04-11                                           NA
;; 4  5 Here Now There                here-now-there        2016-01-23 21:31:24.000 Prints    21.0   30.0    NA          NA           NA          NA Digital Print     2022-04-11                                           NA
;; 5  6 Everything In-between         everything-in-between 2015-07-07 00:00:00.000 Prints    21.0   30.0    NA          NA           NA          NA Digital Print     2022-04-11                                           NA

NIL
  (lisp-stat:select *artworks-df* t '(width height))

#<DATA-FRAME:DATA-FRAME (375 observations of 2 variables)>

Plot: Width Vs Height Scatter (Non-Digital)

  (vega:defplot width-height
    `(:title "Art: Width vs Height (Non-Digital)"
      :description "Comparison between the physical dimensions of artworks."
      :width 400
      :height 400
      :mark :circle
      :data ,*artworks-df*
      :selection (:grid (:type :interval :bind :scales))
      :encoding (:x (:field :width :title "Width (cm)" :type :quantitative)
                 :y (:field :height :title "Height (cm)" :type :quantitative)
                 :tooltip (:field :title :type :nominal)
                 :color (:field :title :legend :null))))

  (vega:write-html width-height "output/art-width-height-2023-03-21.html")

/craig.oates/co-data/src/branch/master/output/art-width-height-2023-03-21.html

Note: A Line of Lines has wrong dimensions

They should be 21 x 14.8 cm and not 210 x 148 cm. I have updated the dimensions on the live site. I did not notice it until I saw the chart. Basically, the decimal point was shifted one place to the right.
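
A shifted decimal point like this can also be caught without a chart by flagging any dimension above a plausible limit. This is a minimal Python sketch, not part of the project's Lisp code. The 100 cm limit and the name suspicious_rows are assumptions for illustration; the two rows use values quoted in this file.

```python
# Hypothetical upper bound in cm; pick one that suits the collection.
MAX_PLAUSIBLE_CM = 100.0


def suspicious_rows(rows, limit=MAX_PLAUSIBLE_CM):
    """Return titles whose width or height exceeds the limit."""
    return [
        title
        for title, width, height in rows
        if width > limit or height > limit
    ]


rows = [
    ("Up This Way", 21.0, 30.0),
    ("A Line of Lines", 210.0, 148.0),  # the uncorrected values
]

for title in suspicious_rows(rows):
    print("Check dimensions:", title)
```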

/craig.oates/co-data/src/branch/master/output/art-width-height-2023-03-21.png

Plot: Pixel-Width vs Pixel-Height (digital only)

  (defparameter *artworks-px-w-h-df*
    (lisp-stat:df-remove-duplicates
     (lisp-stat:drop-missing
      (lisp-stat:select *artworks-df* t '(pixel-width pixel-height))))
    "A data-frame containing all the `PIXEL-WIDTH' and `PIXEL-HEIGHT' values.
  All the missing/null values have been removed from the list.")

  (vega:defplot px-width-px-height
    `(:title "Art: Pixel-Width vs Pixel-Height (2D Digital)"
      :description
      "Comparison between the pixel width and height dimensions of digital artworks."
      :width 400
      :height 400
      :mark :circle
      :data ,*artworks-px-w-h-df* ; ,*artworks-df*
      :selection (:grid (:type :interval :bind :scales))
      :encoding (:x (:field :pixel-width :title "Pixel-Width (px)" :type :quantitative)
                 :y (:field :pixel-height :title "Pixel Height (px)" :type :quantitative)
                 :tooltip (:field :title :type :nominal)
                 :color (:field :PIXEL-WIDTH :legend :null))))

  (vega:write-html px-width-px-height "output/art-px-width-px-height-2023-03-21.html")

/craig.oates/co-data/src/branch/master/output/art-px-width-px-height-2023-03-21.html

  (lisp-stat:df-print
   (lisp-stat:df-remove-duplicates
    (lisp-stat:drop-missing
     (lisp-stat:select *artworks-df* t '(pixel-width pixel-height))
     (lambda (x) (eql :na x)))))
PIXEL-WIDTH PIXEL-HEIGHT
3142.0d0 4722
2480.0d0 3508
3508.0d0 2480
3456.0d0 4608
3402.0d0 4536

There are a lot of duplicated sizes in these columns. The chart had loads of dots resting on top of each other, so you only ever see five. I've removed all the rows with missing values and the duplicates to help show how thirty-four (2D) digital images only show up as five points in the chart.
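
Counting how many artworks share each size shows how the rows collapse into a handful of points. This is a minimal Python sketch, not part of the project's Lisp code. The sizes are taken from the output above, but the number of repeats is made up to show the effect.

```python
from collections import Counter

# Illustrative rows only: the sizes are from the output above,
# the repeats are made up to show the effect.
sizes = [
    (2480, 3508), (2480, 3508), (2480, 3508),
    (3508, 2480), (3508, 2480),
    (3142, 4722),
]

# Identical (width, height) pairs land on the same spot in the scatter plot.
counts = Counter(sizes)

print(f"{len(sizes)} artworks, {len(counts)} distinct points")
for (width, height), n in counts.most_common():
    print(f"{width} x {height}: {n}")
```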

TODO Compare landscape to portrait

Note: Corrected A Line of Lines in CSV file

I made a note of the error in "A Line of Lines has wrong dimensions" above. I made the correction directly in data/artwork-2023-03-21.csv because I am lazy. I didn't want to download the recently updated database from the live site and run the scripts to clean it again. The manual change takes a few seconds (on my machine); downloading and cleaning the database from the server, exporting the data to a CSV file and adding said CSV file does not.

Plot: Corrected Width Vs Height Scatter (Non-Digital)

  (vega:defplot width-height
    `(:title "Art: (Corrected) Width vs Height (Non-Digital)"
      :description "Comparison between the physical dimensions of artworks (corrected)."
      :width 400
      :height 400
      :mark :circle
      :data ,*artworks-df*
      :selection (:grid (:type :interval :bind :scales))
      :encoding (:x (:field :width :title "Width (cm)" :type :quantitative)
                 :y (:field :height :title "Height (cm)" :type :quantitative)
                 :tooltip (:field :title :type :nominal)
                 :color (:field :title :legend :null))))

  (vega:write-html width-height "output/art-width-height-2023-03-21-corrected.html")

/craig.oates/co-data/src/branch/master/output/art-width-height-2023-03-21-corrected.html

Plot: Side-by-Side of Width Vs Height (Corrected and Original)

I included these images side by side just to see how the correction changes the feel of the graph.

/craig.oates/co-data/src/branch/master/output/art-width-height-2023-03-21.png /craig.oates/co-data/src/branch/master/output/art-width-height-2023-03-21-corrected.png

Note: Added missing depth dimension to Touching but Not Connected

Depth is 7 cm. I have updated it on the live site and data/artwork-2023-03-21.csv.

There is only one sculpture, so there is no point plotting a graph.

TODO Plot yearly totals