Data processing and plotting for 'Personal Flash' artworks. https://www.nicolaellisandritherdon.com
* Ritherdon Charts
This project assumes you have knowledge of:
- [[https://www.linux.org/pages/download/][Linux]] ([[https://www.debian.org/][Debian]]/[[https://ubuntu.com/][Ubuntu]])
- [[https://en.wikipedia.org/wiki/Bash_(Unix_shell)][Bash]]
- [[https://en.wikipedia.org/wiki/AWK][Awk]]
- [[https://github.com/BurntSushi/ripgrep][Ripgrep]] (rg)
- [[https://www.python.org/][Python]]
- [[https://bokeh.org/][Bokeh]]
- [[https://en.wikipedia.org/wiki/Comma-separated_values][CSV]] file format
** Summary
Here lies a loose collection of Bash and Python scripts to process data
collected by the /Personal Flash in Real-Time/ artworks. They were part of the [[https://www.castlefieldgallery.co.uk/event/nicola-ellis-solo-exhibition-coming-in-2021/][No
Gaps in the Line]] exhibition by [[http://www.nicolaellis.com][Nicola Ellis]], hosted at [[https://www.castlefieldgallery.co.uk/][Castlefield Gallery]] in
Manchester, U.K.
This project ties into a larger collection of software projects related to the
/Personal Flash in Real-Time/ artworks, and those artworks are a small piece of
the much larger [[https://www.nicolaellisandritherdon.com/][Return to Ritherdon]] project (devised and completed by Nicola
Ellis). For more information on the artworks and where they sit in the larger
project, please use the links below:
- [[https://git.abbether.net/return-to-ritherdon/rtr-docs][rtr-docs]] (The documentation repository for all the /Personal Flash in Real-Time/
software projects, with an overview of how they tie into the Return to
Ritherdon project)
- [[https://git.abbether.net/return-to-ritherdon][Return to Ritherdon Org. Page]] (The 'home page' for the /Return to Ritherdon/
project on this site, containing a list of all the publicly available
repositories)
Before continuing, I thought it would be appropriate to briefly mention who/what
[[https://ritherdon.co.uk/about-us/][Ritherdon]] is. It is a business/factory in Darwen, U.K. and specialises in
manufacturing electrical enclosures and other related products. So, if you have
spent any time in the U.K. and seen one of those green electrical boxes lurking
on a street corner, there is a good chance these folks made it.
*NOTE: The documentation for this project is not in the [[https://git.abbether.net/return-to-ritherdon/rtr-docs][rtr-docs]] repository.*
This is a self-contained mini-project which is not /directly related/ to the
/Personal Flash in Real-Time/ artworks.
*** Examples/Screenshots
- [[./docs][docs]] (For more information on the types of charts produced)
At the time of writing, the scripts in this repository produce over one hundred
charts/files. So, here is a selection of the types of charts produced after
processing the data in the =data/lm1-exhibiton-all.csv= and
=data/lm2-exhibiton-all.csv= files.
[[file:assets/daily-totals.png]]
[[file:assets/lm1-hour-totals.png]]
[[file:assets/lm1-overlayed.png]]
[[file:assets/lm2-readings-for-2021-07-22.png]]
[[file:assets/side-by-side-day-19.png]]
** Overview of the /Personal Flash in Real-Time/ Artworks
/Personal Flash in Real-Time/ consists of two artworks, named /Personal Flash in
Real-Time (Andy)/ and /Personal Flash in Real-Time (Tony)/. Each one measured
the light in the welding booths in the Ritherdon Factory and forwarded those
readings on to a server running in Amazon's 'cloud' -- see [[https://aws.amazon.com/][Amazon Web Services]]
(AWS) for more information. While this was happening, two sets of lights,
residing in Castlefield Gallery, would turn on and off whenever the system
detected someone welding in Ritherdon. This would happen because the Relays
controlling the lights would receive the latest Light Meter readings taken in
Ritherdon via the server (AWS).
The (AWS) server stored every reading in a SQLite database, and this project
pokes and prods at that data to plot charts/graphs.
** Design Notes and Trade-Off Decisions
1. Essentially, this project is about taking the data from
=data/lm1-exhibiton-all.csv= and =data/lm2-exhibiton-all.csv= and producing
interactive charts for Nicola (the artist) to utilise how she sees fit.
2. The =separator.sh= and =totalilator.sh= scripts split the CSV files mentioned
above into smaller files, in an attempt to make them easier to work with on
average hardware.
   1. On top of that, I have only committed the CSV files mentioned in point 1
      to the repository, as a means to reduce the clutter in the repository's
      Git commit history.
   2. You will need to split the CSV files up yourself after you have cloned
      the repository, using the scripts mentioned in point 2.
3. *The database containing the actual data is not included with this repository.*
   1. The database used for the /No Gaps in the Line/ exhibition is
      approximately 500MB, and I thought it was unreasonable to expect people
      to download and work with a repository of that size -- for a repository
      of this nature.
4. The data exported from the database covers 2021-06-13 (13^th June, 2021) to
2021-08-01 (1^st August, 2021) for both Light Meters (the length of the
exhibition). There is a little more (test) data in the database, but the
selected/exported range seemed the most appropriate choice.
5. I chose to work with CSV files out of convenience more than anything else --
it is the easiest format to export to from the SQLite database.
6. I used [[https://en.wikipedia.org/wiki/Bash_(Unix_shell)][Bash]], [[https://en.wikipedia.org/wiki/AWK][Awk]] and [[https://github.com/BurntSushi/ripgrep][Ripgrep]] (rg) out of convenience as well; they were
already on my computer.
7. I used [[https://bokeh.org/][Bokeh]] because I have used it before and it is the only tool I know of
which can create interactive charts as individual HTML files, which I can
share with anyone not comfortable with computers.
8. I used [[https://www.python.org/][Python]] because of Bokeh.
9. Overall, Nicola wants to work with the charts produced from this data, so
any decisions made should be in service to that end.
10. I have taken a hard-coded approach to filenames because the code is not the
main objective here, the charts are. In other words, *long-term flexibility and
maintenance are not concerns here.*
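For context on point 5, an export like this can be produced with the =sqlite3=
command-line tool (e.g. ~sqlite3 -header -csv readings.db "SELECT ..." > out.csv~)
or in Python. The sketch below is an assumption-heavy illustration, not the
actual export: the database name, table name (=readings=) and column names
(=value=, =timestamp=) are made up, as the real schema is not included here.

#+begin_src python
import csv
import sqlite3

# Sketch only: the table and column names ('readings', 'value', 'timestamp')
# are assumptions; the real database is not included in this repository.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(312, "2021-06-13 09:15:42"), (87, "2021-06-14 10:00:01")],
)

# Dump the table to CSV, mirroring what a sqlite3 CLI export would produce.
with open("readings-sketch.csv", "w", newline="") as f:
    csv.writer(f).writerows(conn.execute("SELECT value, timestamp FROM readings"))
#+end_src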
** Set-Up and Using the Code
Open your terminal, making sure you are in the directory you want the repository
cloned to.
*The Bash (.sh) scripts need calling before the Python (.py) ones.* You need to
process the =lm1-exhibiton-all.csv= and =lm2-exhibiton-all.csv= files first
because the Python (.py) scripts assume certain files are already in the =/data=
directory.
#+begin_src shell
cd <INSERT YOUR PATH OF CHOICE HERE...>
git clone https://git.abbether.net/return-to-ritherdon/ritherdon-charts.git
cd ritherdon-charts
# You need to split the two files in /data first...
./separator.sh
./totalilator.sh
# You should have new files and directories in /data now...
# Then create a Python virtual-environment (venv) to make the charts...
# The second 'venv' below is the environment's name; the environment is stored
# in the root of this project's directory.
python3 -m venv venv
# Activate the virtual-environment...
source venv/bin/activate
# You should see '(venv)' in your (terminal) prompt, for example,
# (venv) yourname@yourpc:~/local-dev/ritherdon-charts$
# Install Python dependencies/packages via pip...
pip install -r requirements.txt
#+end_src
When the packages have finished installing (via ~pip~), you should be ready to
go. From there, you can simply call the Python (.py) scripts (from a terminal
with the =venv= activated). For example,
#+begin_src shell
# Make sure you are in the terminal with the virtual-environment activated...
python lm1-hourly-totals.py
# Output from script...
python daily-totals.py
# Output from script...
#+end_src
When you have finished, you will need to deactivate the virtual-environment. You
can do that by entering ~deactivate~ in your terminal. You should see the ~(venv)~
part of your prompt removed.
#+begin_src shell
# Before you deactivate your Python virtual-environment (venv)...
# (venv) yourname@yourpc:~/local-dev/ritherdon-charts$
# Deactivate your Python venv.
deactivate
# After you have deactivated your Python virtual-environment (venv)...
# yourname@yourpc:~/local-dev/ritherdon-charts$
#+end_src
From here, you can either write your own scripts to form new charts or just play
with the CSV files in something like Microsoft Excel or LibreOffice Calc.
** Working with the Files/Data Produced After Running the Project's Bash Scripts
#+begin_quote
=/data= stores the CSV files and =/output= stores the charts. Run ~./separator.sh~
to get started.
#+end_quote
When you clone the repository, you will find the =/data= directory has the
following layout,
#+begin_src shell :results code
tree -L 1 data
#+end_src
#+RESULTS:
#+begin_src shell
data
├── lm1-exhibiton-all.csv # Approx. 60MB
└── lm2-exhibiton-all.csv # Approx. 96MB
0 directories, 2 files
#+end_src
This =/data= directory is responsible for storing the /raw/ data (i.e. the CSV
files). The charts, created via the Python (.py) scripts, reside in the
=/output= directory.
*The =/output= directory should not exist until you run ~./separator.sh~.*
After you run the Bash scripts (~./separator.sh~ and ~./totalilator.sh~), you should see
something like the following in the =/data= directory,
#+begin_src shell :results code
tree -L 2 data
#+end_src
#+RESULTS:
#+begin_src shell
data
├── light-meter-1
│   ├── 2021-06-13 # Directory of readings taken per hour on 13/06/2021.
│   ├── 2021-06-13.csv # File containing all the readings taken on 13th June 2021.
│   ├── 2021-06-14 # Directory of readings taken per hour on 14/06/2021.
│   ├── 2021-06-14.csv # File containing all the readings taken on 14th June 2021.
# More files and directories here...
│   ├── 2021-07-30 # Directory of readings taken per hour on 30/07/2021.
│   └── 2021-07-30.csv # File containing all the readings taken on 30th July 2021.
├── light-meter-1-daily-totals.csv # Total number of readings recorded for each day.
├── light-meter-1-hourly-totals
│   ├── 2021-06 # Directory containing files with hourly totals (per day) for June.
│   └── 2021-07 # Directory containing files with hourly totals (per day) for July.
├── light-meter-2
│   ├── 2021-06-13 # Directory of readings taken per hour on 13/06/2021.
│   ├── 2021-06-13.csv # File containing all the readings taken on 13th June 2021.
│   ├── 2021-06-14 # Directory of readings taken per hour on 14/06/2021.
│   ├── 2021-06-14.csv # File containing all the readings taken on 14th June 2021.
# More files and directories here...
│   ├── 2021-07-30 # Directory of readings taken per hour on 30/07/2021.
│   └── 2021-07-30.csv # File containing all the readings taken on 30th July 2021.
├── light-meter-2-daily-totals.csv # Total number of readings recorded for each day.
├── light-meter-2-hourly-totals
│   ├── 2021-06 # Directory containing files with hourly totals (per day) for June.
│   └── 2021-07 # Directory containing files with hourly totals (per day) for July.
├── lm1-exhibiton-all.csv # Original file.
└── lm2-exhibiton-all.csv # Original file.
104 directories, 100 files
#+end_src
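The per-day split that ~./separator.sh~ performs can be sketched with Awk. This
is a sketch only: the sample rows and the =value,timestamp= column layout are
assumptions for illustration, not the actual schema of =lm1-exhibiton-all.csv=.

#+begin_src shell
# Hypothetical sample rows; the real lm1-exhibiton-all.csv layout may differ.
printf '312,2021-06-13 09:15:42\n87,2021-06-14 10:00:01\n' > sample.csv
mkdir -p out
# Write each row to a per-date file, which is roughly what ./separator.sh
# does for the full exhibition data.
awk -F',' '{ split($2, dt, " "); print > ("out/" dt[1] ".csv") }' sample.csv
#+end_src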
With the overview/top layer explained, now is a good time to expand on the
directories produced in =data/light-meter-1= and =data/light-meter-2=. As an
example, I will focus on the =data/light-meter-1/2021-06-13= directory (the
first dated directory in the =tree= output below).
#+begin_src shell :results code
tree data/light-meter-1
#+end_src
#+RESULTS:
#+begin_src shell
data/light-meter-1
├── 2021-06-13
│   ├── 2021-06-13--00.csv
│   ├── 2021-06-13--01.csv
│   ├── 2021-06-13--02.csv
│   ├── 2021-06-13--03.csv
│   ├── 2021-06-13--04.csv
│   ├── 2021-06-13--05.csv
│   ├── 2021-06-13--06.csv
│   ├── 2021-06-13--07.csv
│   ├── 2021-06-13--08.csv
│   ├── 2021-06-13--09.csv # All readings recorded between the hours of 09:00 and 10:00.
│   ├── 2021-06-13--10.csv
│   ├── 2021-06-13--11.csv
│   ├── 2021-06-13--12.csv
│   ├── 2021-06-13--13.csv
│   ├── 2021-06-13--14.csv
│   ├── 2021-06-13--15.csv
│   ├── 2021-06-13--16.csv
│   ├── 2021-06-13--17.csv # All readings recorded between the hours of 17:00 and 18:00.
│   ├── 2021-06-13--18.csv
│   ├── 2021-06-13--19.csv
│   ├── 2021-06-13--20.csv
│   ├── 2021-06-13--21.csv
│   ├── 2021-06-13--22.csv
│   ├── 2021-06-13--23.csv
│   └── 2021-06-13--24.csv
├── 2021-06-13.csv # All readings recorded on 2021-06-13 (13th June 2021).
# This file/directory pattern repeats for all dates up to the end of July...
├── 2021-07-30
│   ├── 2021-07-30--00.csv
│   ├── 2021-07-30--01.csv # All readings recorded between the hours of 01:00 and 02:00.
# Usual '2021-07-30--07' type files here...
└── 2021-07-30.csv # All readings recorded on 2021-07-30 (30th July 2021).
48 directories, 1248 files
#+end_src
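The hourly totals that ~./totalilator.sh~ produces can be sketched in the same
spirit. Again, the sample rows and column layout below are assumptions for
illustration, not the repository's actual per-day file format.

#+begin_src shell
# Hypothetical per-day file; the real per-day CSV layout may differ.
printf '312,2021-06-13 09:15:42\n87,2021-06-13 09:47:10\n54,2021-06-13 17:03:55\n' > day.csv
# Count the readings recorded in each hour of the day.
awk -F',' '{ split($2, dt, " "); count[substr(dt[2], 1, 2)]++ }
           END { for (h in count) print h "," count[h] }' day.csv | sort > hourly.csv
#+end_src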
If you have run ~./separator.sh~, the =/output= directory should now exist in
the project's root directory -- with nothing in it. This is where the charts
(or any other artefacts) will go after running the Python (.py) scripts. The
project's root directory should now look like the following,
#+begin_src shell :results code
tree -L 1
#+end_src
#+RESULTS:
#+begin_src shell
.
├── assets
├── dailies-overlayed.py
├── dailies-side-by-side.py
├── daily-breakdowns.py
├── daily-totals.py
├── data # This should have a load of new files and directories...
├── day-to-day-comparisons.py
├── LICENSE
├── lm1-hourly-totals.py
├── lm2-hourly-totals.py
├── output # *NEW* This should now exist and charts go in here...
├── README.org
├── requirements.txt
├── separator.sh
├── sql-statements.sql
├── totalilator.sh
└── venv # See 'Set-Up and Using the Code' section if you do not see this...
4 directories, 13 files
#+end_src
From here, you can either create some charts with the Python (.py) scripts and
take a look at them in =/output= or open the CSV files in =/data= and inspect the
data.
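As a starting point for your own scripts, here is a minimal sketch that
inspects daily totals. The two sample rows are hypothetical, and the
=date,total= row layout is an assumption about the daily-totals files, not a
documented schema.

#+begin_src python
import csv
import io

# Hypothetical sample; the daily-totals files are assumed to hold
# date,total rows (this layout is a guess, not the documented schema).
sample = "2021-06-13,1440\n2021-06-14,2210\n"
rows = [(date, int(total)) for date, total in csv.reader(io.StringIO(sample))]

# Find the day with the most recorded readings.
busiest = max(rows, key=lambda row: row[1])
print(busiest[0])  # → 2021-06-14
#+end_src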
*** Overview of Scripts and Charts
- [[./docs][/docs]]
To stop this file growing too much as I write more scripts (which create more
charts), all the overviews/summaries/breakdowns of the scripts and charts are in
the [[./docs][/docs]] directory. Some script and chart entries might be out-of-date or
missing, though.