Data Exploration of Artwork Section
- Summary & Set-up
- Clean Data
- Set-Up SLIME
- Explore Data
- Data Heuristics
- Create Sample Data-Frame
- Summary: Width
- Summary: Height
- Summary: Depth
- Summary: Pixel Width
- Summary: Pixel Height
- Plot: Width Vs Height Scatter (Non-Digital)
- Plot: Pixel-Width vs Pixel-Height (digital only)
- Compare landscape to portrait
- Note: Corrected A Line of Lines in CSV file
- Plot: Corrected Width Vs Height Scatter (Non-Digital)
- Note: Added missing depth dimension to Touching but Not Connected
- Plot yearly totals
Summary & Set-up
Clean Data
This is the SQL used to export only the data I want in a public-facing repository. The database itself is not included. I'm keeping the SQLite code for future reference and for the sake of completeness.
.headers on
.mode csv
.output artwork-2023-03-21.csv
select
id,
title,
slug,
published,
category,
width,
height,
depth,
pixel_width,
pixel_height,
play_length,
medium,
created_at,
updated_at
from
artwork;
# Use -l to check file permissions.
ls -h data/artwork*.csv
data/artwork-2023-03-21.csv
To view the data in data/artwork-2023-03-21.csv, you need csvlook (part of csvkit) installed.
sudo apt update
sudo apt install csvkit
If csvlook isn't installed, skip the following code block. It prints a small
sample of the data this file uses to explore the Artworks section of my site.
head -n 4 data/artwork-2023-03-21.csv | csvlook
| id | title                         | slug                 | published           | category | width | height | depth | pixel_width | pixel_height | play_length | medium            | created_at                  | updated_at                  |
| -- | ----------------------------- | -------------------- | ------------------- | -------- | ----- | ------ | ----- | ----------- | ------------ | ----------- | ----------------- | --------------------------- | --------------------------- |
|  1 | Drop and Run (Purple Squares) | drop-and-run         | 2012-05-07 00:00:00 | Video    |       |        |       |             |              |           4 | Digital Animation | 2022-04-11 00:00:00.000000Z | 2022-05-09 14:43:28.379441Z |
|  2 | Eje x, Exio y, Z-Achse        | eje-x-exio-y-z-achse | 2016-11-11 00:00:00 | Prints   |    15 |     21 |       |             |              |             | Digital Print     | 2022-04-11                  |                             |
|  3 | Up This Way                   | up-this-way          | 2016-01-24 00:00:00 | Prints   |    21 |     30 |       |             |              |             | Digital Print     | 2022-04-11                  |                             |
Set-Up SLIME
Run M-x slime before executing the following code.
(format nil "SLIME and Common Lisp is up and running!")
SLIME and Common Lisp is up and running!
(ql:quickload :plot/vega)
(ql:quickload :lisp-stat)
(ql:quickload :data-frame)
(defparameter *artworks* (lisp-stat:read-csv #P"data/artwork-2023-03-21.csv")
"The data read in from data/artwork-2023-03-21.csv.")
*ARTWORKS*
Explore Data
(format t "Number of artworks: ~A" (lisp-stat:nrow *artworks*))
Number of artworks: 375
(lisp-stat:defdf *artworks-df* *artworks*)
#<DATA-FRAME:DATA-FRAME (375 observations of 14 variables)>
Data Heuristics
(lisp-stat:heuristicate-types *artworks-df*)
(lisp-stat:describe *artworks-df*)
*ARTWORKS-DF*
A data-frame with 375 observations of 14 variables

Variable | Type | Unit | Label
-------- | ---- | ---- | -----------
ID | INTEGER | NIL | NIL
TITLE | INTEGER | NIL | NIL
SLUG | INTEGER | NIL | NIL
PUBLISHED | STRING | NIL | NIL
CATEGORY | STRING | NIL | NIL
WIDTH | DOUBLE-FLOAT | NIL | NIL
HEIGHT | DOUBLE-FLOAT | NIL | NIL
DEPTH | INTEGER | NIL | NIL
PIXEL-WIDTH | INTEGER | NIL | NIL
PIXEL-HEIGHT | INTEGER | NIL | NIL
PLAY-LENGTH | INTEGER | NIL | NIL
MEDIUM | STRING | NIL | NIL
CREATED-AT | STRING | NIL | NIL
UPDATED-AT | SYMBOL | NIL | NIL
Create Sample Data-Frame
(defparameter *artworks-sm-list*
(select:select *artworks-df* (select:range 0 10) t)
"A small sample of artwork for quickly testing code.")
#<DATA-FRAME:DATA-FRAME (10 observations of 14 variables)>
Summary: Width
(lisp-stat:summarize-column '*artworks-df*:width)
WIDTH () n: 375 missing: 55 min=6.50 q25=25.02 q50=34.73 mean=34.62 q75=42.07 max=70
Summary: Height
(lisp-stat:summarize-column '*artworks-df*:height)
HEIGHT () n: 375 missing: 55 min=10 q25=26.84 q50=29.93 mean=37.04 q75=42.47 max=70
Summary: Depth
(lisp-stat:summarize-column '*artworks-df*:depth)
DEPTH () n: 375 missing: 374 min=7 q25=7 q50=7 mean=7 q75=7 max=7
Summary: Pixel Width
(lisp-stat:summarize-column '*artworks-df*:pixel-width)
PIXEL-WIDTH () n: 375 missing: 341 min=2480 q25=2550.93 q50=2952.86 mean=2927.82 q75=3298.00 max=3508
(format nil "Total (2D) Digital Artworks: ~A" (- 375 341))
Total (2D) Digital Artworks: 34
Summary: Pixel Height
(lisp-stat:summarize-column '*artworks-df*:pixel-height)
PIXEL-HEIGHT () n: 375 missing: 341 min=2480 q25=3326.59 q50=4467.47 mean=4012.12 q75=4700.63 max=4722
(lisp-stat:summarize-column '*artworks-df*:medium)
119 (32%) x "Digital Photograph",
73 (19%) x "Digital Print",
70 (19%) x "Watercolour and ink",
45 (12%) x "Pen and ink",
23 (6%) x "Felt-tip marker on paper",
21 (6%) x "Digital Animation",
11 (3%) x "Watercolour on paper",
5 (1%) x "Screen-print on paper",
2 (1%) x "Screen-print and felt-tip marker on paper",
1 (0%) x "Lino. print on paper",
1 (0%) x "Graphite on paper",
1 (0%) x "Drawing",
1 (0%) x "Glass light bulb and jar",
1 (0%) x "Felt tip marker on paper",
1 (0%) x "Staples on paper",
NOTE: created-at refers to the time I added the artwork to the website's
database. See Published Summary below for the artwork creation date.
(lisp-stat:summarize-column '*artworks-df*:created-at)
338 (90%) x "2022-04-11",
2 (1%) x "2022-04-11 00:00:00.000000Z",
1 (0%) x "2022-06-25 20:47:51.282515",
1 (0%) x "2022-07-19 02:21:20.722981Z",
1 (0%) x "2022-07-19 04:38:06.685195Z",
1 (0%) x "2022-07-19 04:42:14.648641",
1 (0%) x "2022-07-19 04:45:03.533171",
1 (0%) x "2022-07-19 04:46:57.297030",
1 (0%) x "2022-07-19 04:49:38.193585",
1 (0%) x "2022-07-19 04:58:04.069055",
1 (0%) x "2022-07-19 04:59:38.732074Z",
1 (0%) x "2022-07-19 05:00:55.259252",
1 (0%) x "2022-07-19 05:02:04.145161",
1 (0%) x "2022-07-19 05:03:20.898681",
1 (0%) x "2022-07-19 05:04:35.132294",
1 (0%) x "2022-07-19 05:05:38.856980",
1 (0%) x "2022-07-19 05:06:48.692528",
1 (0%) x "2022-08-15 18:16:52.321678",
1 (0%) x "2022-08-15 19:11:08.879204",
1 (0%) x "2022-08-15 19:14:40.060236",
1 (0%) x "2022-08-15 19:17:03.134433",
1 (0%) x "2022-08-15 19:20:02.404717",
1 (0%) x "2022-08-15 19:22:00.766659",
1 (0%) x "2022-08-15 19:24:06.150506",
1 (0%) x "2022-08-15 19:27:47.224984",
1 (0%) x "2022-08-15 19:49:19.064553",
1 (0%) x "2022-08-15 19:57:22.403963",
1 (0%) x "2022-08-15 20:00:46.926246",
1 (0%) x "2022-08-15 20:04:02.172163",
1 (0%) x "2022-08-15 20:06:48.419529",
1 (0%) x "2022-08-15 20:10:49.282631",
1 (0%) x "2022-08-15 20:13:10.251745",
1 (0%) x "2022-08-15 20:15:20.199923",
1 (0%) x "2022-08-15 20:18:57.298303",
1 (0%) x "2022-08-15 20:54:31.246681",
1 (0%) x "2022-08-15 21:10:15.367998",
1 (0%) x "2022-08-15 21:15:46.119031",
NOTE: published refers to when I finished the artwork.
Going by the summary below, the date with the most entries is 2022-08-15, with 20 artworks (about 5% of my total finished output), followed by 2020-03-13 with 14 artworks (about 4%). One day produced 4% of everything; of course it was around the Covid pandemic.
(pprint (lisp-stat:summarize-column '*artworks-df*:published))
I have removed dates with only 1 entry (0%).
20 (5%) x "2022-08-15",
14 (4%) x "2020-03-13",
2 (1%) x "2016-01-23 00:00:00.000",
2 (1%) x "2012-05-26 00:00:00.000",
2 (1%) x "2017-08-19
1 (0%) x "2016-11-11 00:00:00.000",
1 (0%) x "2016-01-24 00:00:00.000",
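The rounded percentages in that summary can be double-checked with plain arithmetic. This is only a minimal sketch in standard Common Lisp: `percent-of-total` is a made-up helper name and not part of Lisp-Stat, and the counts (20, 14 and the total of 375) are the ones reported above.

```lisp
;; PERCENT-OF-TOTAL is a hypothetical helper, not a Lisp-Stat function.
(defun percent-of-total (part whole)
  "Return PART as a percentage of WHOLE, rounded to one decimal place."
  ;; Work in tenths of a percent so ROUND does the rounding on integers.
  (/ (round (* 1000 part) whole) 10.0))

(percent-of-total 20 375) ; => 5.3
(percent-of-total 14 375) ; => 3.7
```

So the "5%" and "4%" shown by the summary are the rounded forms of roughly 5.3% and 3.7%.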
I keep forgetting about output. Leaving this (lisp-stat:head ...) example here to
help me remember to use it.
(lisp-stat:head *artworks-df*)
;; ID TITLE SLUG PUBLISHED CATEGORY WIDTH HEIGHT DEPTH PIXEL-WIDTH PIXEL-HEIGHT PLAY-LENGTH MEDIUM CREATED-AT UPDATED-AT
;; 0 1 Drop and Run (Purple Squares) drop-and-run 2012-05-07 Video NA NA NA NA NA 4 Digital Animation 2022-04-11 00:00:00.000000Z 2022-05-09 14:43:28.379441Z
;; 1 2 Eje x, Exio y, Z-Achse eje-x-exio-y-z-achse 2016-11-11 00:00:00.000 Prints 15.0 21.0 NA NA NA NA Digital Print 2022-04-11 NA
;; 2 3 Up This Way up-this-way 2016-01-24 00:00:00.000 Prints 21.0 30.0 NA NA NA NA Digital Print 2022-04-11 NA
;; 3 4 Now Then now-then 2016-01-23 00:00:00.000 Prints 21.0 30.0 NA NA NA NA Digital Print 2022-04-11 NA
;; 4 5 Here Now There here-now-there 2016-01-23 21:31:24.000 Prints 21.0 30.0 NA NA NA NA Digital Print 2022-04-11 NA
;; 5 6 Everything In-between everything-in-between 2015-07-07 00:00:00.000 Prints 21.0 30.0 NA NA NA NA Digital Print 2022-04-11 NA
NIL
(lisp-stat:select *artworks-df* t '(width height))
#<DATA-FRAME:DATA-FRAME (375 observations of 2 variables)>
Plot: Width Vs Height Scatter (Non-Digital)
(vega:defplot width-height
`(:title "Art: Width vs Height (Non-Digital)"
:description "Comparison between the physical dimensions of artworks."
:width 400
:height 400
:mark :circle
:data ,*artworks-df*
:selection (:grid (:type :interval :bind :scales))
:encoding (:x (:field :width :title "Width (cm)" :type :quantitative)
:y (:field :height :title "Height (cm)" :type :quantitative)
:tooltip (:field :title :type :nominal)
:color (:field :title :legend :null))))
(vega:write-html width-height "output/art-width-height-2023-03-21.html")
/craig.oates/co-data/src/branch/master/output/art-width-height-2023-03-21.html
Note: A Line of Lines has wrong dimensions
They should be 21 x 14.8 cm and not 210 x 148 cm. I have updated the dimensions
on the live site. I did not notice the error until I saw the chart. Basically, the
decimal point was shifted one place to the right.
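A slip like this can also be caught without a chart. The following is a minimal sketch only: it works on a plain list of (title width height) entries rather than the data-frame, `oversized-artworks` is a hypothetical helper, and the 100 cm cut-off is an arbitrary assumption picked for illustration. The first two rows come from the CSV preview earlier in this file and the third uses the mistaken 210 x 148 entry.

```lisp
;; Hypothetical sample rows: (title width-cm height-cm).
(defparameter *sample-dimensions*
  '(("Eje x, Exio y, Z-Achse" 15 21)
    ("Up This Way" 21 30)
    ("A Line of Lines" 210 148)))

(defun oversized-artworks (rows &optional (limit-cm 100))
  "Return the titles in ROWS whose width or height exceeds LIMIT-CM.
LIMIT-CM defaults to an assumed cut-off of 100 cm."
  (loop for (title width height) in rows
        when (or (> width limit-cm) (> height limit-cm))
          collect title))

(oversized-artworks *sample-dimensions*) ; => ("A Line of Lines")
```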
Plot: Pixel-Width vs Pixel-Height (digital only)
(defparameter *artworks-px-w-h-df*
(lisp-stat:df-remove-duplicates
(lisp-stat:drop-missing
(lisp-stat:select *artworks-df* t '(pixel-width pixel-height))))
"A data-frame containing all the `PIXEL-WIDTH' and `PIXEL-HEIGHT' values.
All the missing/null values have been removed from the list.")
(vega:defplot px-width-px-height
`(:title "Art: Pixel-Width vs Pixel-Height (2D Digital)"
:description
"Comparison between the pixel width and height dimensions of digital artworks."
:width 400
:height 400
:mark :circle
:data ,*artworks-px-w-h-df* ; ,*artworks-df*
:selection (:grid (:type :interval :bind :scales))
:encoding (:x (:field :pixel-width :title "Pixel-Width (px)" :type :quantitative)
:y (:field :pixel-height :title "Pixel Height (px)" :type :quantitative)
:tooltip (:field :title :type :nominal)
:color (:field :PIXEL-WIDTH :legend :null))))
(vega:write-html px-width-px-height "output/art-px-width-px-height-2023-03-21.html")
/craig.oates/co-data/src/branch/master/output/art-px-width-px-height-2023-03-21.html
(lisp-stat:df-print
(lisp-stat:df-remove-duplicates
(lisp-stat:drop-missing
(lisp-stat:select *artworks-df* t '(pixel-width pixel-height))
(lambda (x) (eql :na x)))))
PIXEL-WIDTH | PIXEL-HEIGHT |
---|---|
3142.0d0 | 4722 |
2480.0d0 | 3508 |
3508.0d0 | 2480 |
3456.0d0 | 4608 |
3402.0d0 | 4536 |
There are a lot of duplicated sizes in these columns. The chart had loads of dots resting on top of each other, so only five distinct points are ever visible. I've removed all the rows with missing values and the duplicates to help show how thirty-four (2D) digital images show up as only five points in the chart.
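The same collapsing of repeated sizes can be reproduced with standard Common Lisp, independent of Lisp-Stat. This is a minimal sketch: `distinct-sizes` is a made-up helper, and the sample list is hypothetical (the sizes are taken from the table above, but how often each one repeats is invented for illustration).

```lisp
;; Hypothetical (pixel-width . pixel-height) pairs with made-up repetition.
(defparameter *pixel-sizes*
  '((2480 . 3508) (2480 . 3508) (3508 . 2480)
    (3456 . 4608) (3456 . 4608) (3456 . 4608)))

(defun distinct-sizes (sizes)
  "Return SIZES with repeated (width . height) pairs removed."
  ;; EQUAL compares the contents of each pair; the default test, EQL,
  ;; would treat separately created pairs as different objects.
  (remove-duplicates sizes :test #'equal))

(length *pixel-sizes*)                  ; => 6
(length (distinct-sizes *pixel-sizes*)) ; => 3
```

Six entries collapse to three distinct sizes here, in the same way the thirty-four digital images collapse to five points.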
TODO Compare landscape to portrait
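One possible starting point for this TODO, as a rough sketch in standard Common Lisp: classify each size by comparing its width to its height. `orientation` and `count-orientations` are hypothetical helpers, and the sample list holds only the five distinct pixel sizes from the table above, so the counts describe distinct sizes and not all thirty-four digital artworks.

```lisp
(defun orientation (width height)
  "Classify a WIDTH x HEIGHT pair as :LANDSCAPE, :PORTRAIT or :SQUARE."
  (cond ((> width height) :landscape)
        ((< width height) :portrait)
        (t :square)))

;; The five distinct (pixel-width pixel-height) sizes from the table above.
(defparameter *distinct-pixel-sizes*
  '((3142 4722) (2480 3508) (3508 2480) (3456 4608) (3402 4536)))

(defun count-orientations (sizes)
  "Return a plist with the number of landscape, portrait and square SIZES."
  (loop for (width height) in sizes
        for kind = (orientation width height)
        count (eq kind :landscape) into landscape
        count (eq kind :portrait) into portrait
        count (eq kind :square) into square
        finally (return (list :landscape landscape
                              :portrait portrait
                              :square square))))

(count-orientations *distinct-pixel-sizes*)
;; => (:LANDSCAPE 1 :PORTRAIT 4 :SQUARE 0)
```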
Note: Corrected A Line of Lines in CSV file
I've made a note of the error in A Line of Lines has wrong dimensions. I made
the correction directly in data/artwork-2023-03-21.csv because I am lazy. I
didn't want to download the recently updated database from the live site and
run the scripts to clean it again. Editing the CSV takes a few seconds (on my
machine); downloading and cleaning the database from the server, exporting the
data to a CSV file and adding said CSV file does not.
Plot: Corrected Width Vs Height Scatter (Non-Digital)
(vega:defplot width-height
`(:title "Art: (Corrected) Width vs Height (Non-Digital)"
:description "Comparison between the physical dimensions of artworks (corrected)."
:width 400
:height 400
:mark :circle
:data ,*artworks-df*
:selection (:grid (:type :interval :bind :scales))
:encoding (:x (:field :width :title "Width (cm)" :type :quantitative)
:y (:field :height :title "Height (cm)" :type :quantitative)
:tooltip (:field :title :type :nominal)
:color (:field :title :legend :null))))
(vega:write-html width-height "output/art-width-height-2023-03-21-corrected.html")
/craig.oates/co-data/src/branch/master/output/art-width-height-2023-03-21-corrected.html
Plot: Side-by-Side of Width Vs Height (Corrected and Original)
Included these images side-by-side just to see how the correction changes the feel of the graph.
Note: Added missing depth dimension to Touching but Not Connected
Depth is 7 cm. I have updated it on the live site and in
data/artwork-2023-03-21.csv.
Only one sculpture so no point plotting a graph.
TODO Plot yearly totals
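A first step towards this TODO could be tallying artworks per year from the published column before plotting anything. The sketch below is plain Common Lisp and assumes every published value starts with a four-digit year, as the values shown earlier do. `published-year` and `yearly-totals` are hypothetical helpers, and the sample list holds just the six published values from the lisp-stat:head output.

```lisp
;; The six PUBLISHED values shown in the head output earlier in this file.
(defparameter *sample-published*
  '("2012-05-07"
    "2016-11-11 00:00:00.000"
    "2016-01-24 00:00:00.000"
    "2016-01-23 00:00:00.000"
    "2016-01-23 21:31:24.000"
    "2015-07-07 00:00:00.000"))

(defun published-year (published)
  "Return the leading four-digit year of the string PUBLISHED as an integer."
  (parse-integer published :end 4))

(defun yearly-totals (dates)
  "Return an alist of (year . total) for DATES, sorted by year."
  (let ((totals (make-hash-table)))
    (dolist (date dates)
      (incf (gethash (published-year date) totals 0)))
    (sort (loop for year being the hash-keys of totals
                  using (hash-value total)
                collect (cons year total))
          #'< :key #'car)))

(yearly-totals *sample-published*)
;; => ((2012 . 1) (2015 . 1) (2016 . 4))
```

The resulting alist could then be turned into whatever shape the plotting code needs.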