Compare commits

This repo is archived. You can view files and clone it, but cannot push or open issues/pull-requests.

11 Commits

  1. README.md (45)
  2. app/coordinators/art_coordinator.py (47)
  3. app/main.py (11)
  4. app/services/art_services.py (92)
  5. app/services/data_services.py (21)
  6. app/services/logging_services.py (6)
  7. app/services/parser_services.py (21)
  8. requirements.txt (18)

README.md (45)

@@ -1,3 +1,44 @@
# Skivvy
This is a Python program which parses the data in the coblob database and transforms it into a format which the co-data project can use. One of the main goals of this project is to reduce the load on the CPU in the co-data project.
## Quick Start
1. `python3 -m venv venv`
2. `. venv/bin/activate`
3. `pip install -r requirements.txt`
To run the program, run the following command (assuming you are in the project's root directory):
```bash
# -v is for verbose output. Remove if not wanted.
# -t (target) is the directory you want the data to be saved in.
# -t is required.
python app/main.py -t save/data/location/path -v
```
## Architecture Overview
The program itself is situated in the `app` folder. The access point is `main.py` and the bulk of the work is shared between the code in the `coordinators` and `services` directories.
```
# The architecture's (layered) flow.
Input  -> main.py -> coordinators -> services
                                         |
Output <- main.py <- coordinators <- services
```
You should not need to touch much of the code in `main.py`. Its main focus is stating the program's tasks at a high level. The calls in `main.py` are passed on to the `coordinators` layer, which then makes the necessary function calls into `services` to achieve the result stated in `main.py`. The flow of the code is rigid: `main.py` does not interact with the `services` layer directly; it goes through the `coordinators` layer. The same applies in reverse: code in the `services` layer never calls into `main.py`.
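The layered flow can be sketched with a minimal, illustrative example (the function names below are hypothetical stand-ins, not the project's actual code):

```python
# Illustrative sketch of the main -> coordinators -> services layering.
# These names are hypothetical; see the real modules under app/.

def service_fetch():
    # services layer: does the raw work (e.g. an API call).
    return {"artwork": 3}

def coordinator_update():
    # coordinators layer: sequences the service calls.
    data = service_fetch()
    return f"processed {data['artwork']} records"

def main():
    # main.py: states the program's tasks at a high level only.
    return coordinator_update()

print(main())  # processed 3 records
```

Note that `main` never calls `service_fetch` directly; every request flows through the coordinator, mirroring the rigid layering described above.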
For the list of requirements for this project, please view the `requirements.txt` file in the project's root directory.
## Note About Intended Usage of This Project
While the program can be executed as a standalone tool, its main reason for existing is to reduce the CPU load on the [co-data](https://git.abbether.net/craig.oates/co-data) project. It does this by running as a cron job once a day; co-data then uses the results from that job to build the charts for that day. The data this program transforms is generated by the [co-api](https://git.abbether.net/craig.oates/co-api) project. The data needs to be transformed because it is not usable in its raw (REST API) form when called directly from the co-api project.
The slow rate of change in the co-api data is what brought about the decision to make this program. It is unnecessary for the server to re-transform the data with every request it receives, so this program acts as a cache for co-data to use. The reduction in data-transformation work also reduces the load on the CPU at the time of a web request.
Debian (or Debian-based) operating systems are the intended systems for this program to run on. To set the cron job on these systems, use `crontab -e`.
When the file is open, enter the following to make this program run once a day at 6 A.M.: `0 06 * * * /path/to/venv/python /path/to/project/app/main.py`. Do not forget to change the paths before saving the file.
For the sake of clarity, make sure this program runs on the same computer (or at least the same local network) as the co-data project, which needs the data this program produces; otherwise it will not work as intended.

app/coordinators/art_coordinator.py (47)

@@ -0,0 +1,47 @@
from services import art_services, data_services, logging_services


def update_data(arguments):
    directory = arguments.target
    v_setting = arguments.verbose
    v_out = logging_services.log  # Alias -- for brevity.
    save = data_services.store_json  # Alias -- for easier reading.
    v_out(v_setting, "Beginning to update Art data...")
    try:
        raw_art_data = data_services.get_json(
            "https://api.craigoates.net/api/1.0/Artwork")
        v_out(v_setting, "Data from API retrieved.")
        save(art_services.get_creation_date_totals(raw_art_data),
             f"{directory}/art_creation_dates.json")
        v_out(v_setting, "Art creation dates processed.")
        save(art_services.get_db_column_totals(raw_art_data, "category"),
             f"{directory}/art_category.json")
        v_out(v_setting, "Art categories processed.")
        save(art_services.get_db_column_totals(raw_art_data, "medium"),
             f"{directory}/art_medium.json")
        v_out(v_setting, "Art medium totals processed.")
        save(art_services.get_dimension_totals(raw_art_data,
                                               "dimensions", "width"),
             f"{directory}/art_width.json")
        v_out(v_setting, "Art width totals processed.")
        save(art_services.get_dimension_totals(raw_art_data,
                                               "dimensions", "height"),
             f"{directory}/art_height.json")
        v_out(v_setting, "Art height totals processed.")
        save(art_services.get_dimension_totals(raw_art_data,
                                               "dimensions", "depth"),
             f"{directory}/art_depth.json")
        v_out(v_setting, "Art depth totals processed.")
        v_out(v_setting, "Completed updating Art data.")
    except Exception as error:
        print(f"ERROR: [art_coordinator] Unable to update Art data. ({error})")

app/main.py (11)

@@ -0,0 +1,11 @@
from services import parser_services
from coordinators import art_coordinator


def main():
    args = parser_services.create_args()
    art_coordinator.update_data(args)
    # update_software_data(args)  # Future update.
    # update_article_data(args)  # Future update.


if __name__ == "__main__":
    main()

app/services/art_services.py (92)

@@ -0,0 +1,92 @@
from datetime import datetime

'''
Note: Hard-Coding the "Months" and "Days" Sets
======================================================================
I have hard-coded the "months" and "days" sets because they are
fixed values -- unless something monumental happens scientifically
or politically. On top of that, this makes the graphs easier to
read because they are in chronological order. This is not
guaranteed if the "keys" for "months" and "days" are formed from
the data-object this function receives.

Unfortunately, I cannot do the same for years. That set will
continue to grow as the years roll on through here -- unless
something monumental happens scientifically or politically.

This code is intended to be used in graphs -- in the co-data project.
'''
def get_creation_date_totals(data):
    years = {}
    months = {"1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0,
              "7": 0, "8": 0, "9": 0, "10": 0, "11": 0, "12": 0}
    days = {"1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0,
            "7": 0, "8": 0, "9": 0, "10": 0, "11": 0, "12": 0,
            "13": 0, "14": 0, "15": 0, "16": 0, "17": 0, "18": 0,
            "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0,
            "25": 0, "26": 0, "27": 0, "28": 0, "29": 0, "30": 0,
            "31": 0}
    for item in data:
        ft = datetime.fromisoformat(item["dateCreated"])
        if str(ft.year) in years:
            years[str(ft.year)] += 1
        else:
            years[str(ft.year)] = 1
        if str(ft.month) in months:
            months[str(ft.month)] += 1
        else:
            months[str(ft.month)] = 1
        if str(ft.day) in days:
            days[str(ft.day)] += 1
        else:
            days[str(ft.day)] = 1
    return [years, months, days]
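As a condensed sketch of the tallying above, two sample records (the `dateCreated` values here are made-up ISO-8601 strings, assumed to match what the co-api returns) bucket like this:

```python
from datetime import datetime

# Condensed sketch of get_creation_date_totals' tallying, with
# hard-coded sample records instead of the full months/days sets.
data = [{"dateCreated": "2019-05-17T00:00:00"},
        {"dateCreated": "2020-05-03T00:00:00"}]

years, months, days = {}, {}, {}
for item in data:
    ft = datetime.fromisoformat(item["dateCreated"])
    years[str(ft.year)] = years.get(str(ft.year), 0) + 1
    months[str(ft.month)] = months.get(str(ft.month), 0) + 1
    days[str(ft.day)] = days.get(str(ft.day), 0) + 1

print(years)   # {'2019': 1, '2020': 1}
print(months)  # {'5': 2}
print(days)    # {'17': 1, '3': 1}
```

The real function starts from pre-populated month/day sets (hence the chronological chart ordering noted above); this sketch builds the keys on the fly purely to keep the example short.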
def get_category_totals(data):
    categories = {}
    for item in data:
        cat = item["category"]
        # The join and split is because the data returned from the API
        # call contains a lot of white space. This just cleans it up.
        # The white space was also making the chart render incorrectly.
        cat = ''.join(cat.split())
        if cat in categories:
            categories[cat] += 1
        else:
            categories[cat] = 1
    return categories
def get_db_column_totals(data, column_name):
    column_data = {}
    for item in data:
        col = item[column_name]
        col = " ".join(col.split())  # Collapse the API's excess white space.
        if col in column_data:
            column_data[col] += 1
        else:
            column_data[col] = 1
    return column_data
def get_dimension_totals(data, column_name, dimension):
    dimensions = {}
    for item in data:
        dim = item[column_name]
        distance = dim[dimension]["value"]["distance"]
        if distance is not None:
            key = str(distance)
            if key in dimensions:
                dimensions[key] += 1
            else:
                dimensions[key] = 1
    return dimensions
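A quick sketch of the dimension tally, using sample data shaped like the nested `dimensions` column the code above expects (the structure and values here are assumptions inferred from that code, not real co-api output):

```python
# Sketch of get_dimension_totals' counting, with a hand-made
# "dimensions" column. Records with a None distance are skipped.
data = [
    {"dimensions": {"width": {"value": {"distance": 30.0}}}},
    {"dimensions": {"width": {"value": {"distance": 30.0}}}},
    {"dimensions": {"width": {"value": {"distance": None}}}},
]

dimensions = {}
for item in data:
    distance = item["dimensions"]["width"]["value"]["distance"]
    if distance is not None:
        key = str(distance)
        dimensions[key] = dimensions.get(key, 0) + 1

print(dimensions)  # {'30.0': 2}
```

Keying the totals by `str(distance)` means the result serialises cleanly to JSON for the co-data charts.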

app/services/data_services.py (21)

@@ -0,0 +1,21 @@
import requests
import json


def get_data(url):
    return requests.get(url)


def get_json(url):
    return requests.get(url).json()


def store_json(data, file_name):
    with open(file_name, "w") as outfile:
        json.dump(data, outfile, indent=4)


def load_json(file_name):
    with open(file_name, "r") as infile:
        return json.load(infile)


def store_txt(data, file_name):
    with open(file_name, "w") as outfile:
        outfile.write(data)
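The JSON helpers are a straightforward dump/load pair; a round-trip looks like this (written inline against a temp file so it is self-contained, rather than importing the module):

```python
import json
import os
import tempfile

# Round-trip sketch of store_json followed by load_json.
data = {"artwork": [1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), "sample.json")

with open(path, "w") as outfile:   # store_json's body
    json.dump(data, outfile, indent=4)

with open(path, "r") as infile:    # load_json's body
    loaded = json.load(infile)

print(loaded == data)  # True
```

The `indent=4` in `store_json` keeps the cached files human-readable, which is handy when checking the cron job's output by hand.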

app/services/logging_services.py (6)

@@ -0,0 +1,6 @@
# This is for outputting the program's status when the verbose switch
# is used.
def log(log_output, message):
    if log_output:
        print(message)
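In use, the verbose gate simply decides whether the message reaches stdout (a minimal sketch restating the function above):

```python
# Minimal sketch of the verbose-gated logger.
def log(log_output, message):
    if log_output:
        print(message)

log(True, "shown")    # prints "shown"
log(False, "hidden")  # prints nothing
```

This is what `v_out(v_setting, ...)` in `art_coordinator.py` resolves to: with `-v` the status lines appear, without it the run is silent.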

app/services/parser_services.py (21)

@@ -0,0 +1,21 @@
import argparse
import os


def dir_path(string):
    if os.path.isdir(string):
        return string
    else:
        raise NotADirectoryError(string)


def create_args():
    parser = argparse.ArgumentParser(
        description="Parses the coblob database and transforms it. "
        "This is mostly for the benefit of the co-data project. "
        "It also requires access to the co-api project, via the internet.")
    parser.add_argument("-t", "--target", type=dir_path, required=True,
                        help="the location you would like the data to be stored at.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="provides detailed output when the program is running.")
    args = parser.parse_args()
    return args
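The parser's behaviour can be sketched by feeding `parse_args` an explicit argv list instead of letting it read `sys.argv` (the `["-t", ".", "-v"]` arguments here are illustrative):

```python
import argparse
import os

def dir_path(string):
    # Same validator as above: accept only existing directories.
    if os.path.isdir(string):
        return string
    raise NotADirectoryError(string)

parser = argparse.ArgumentParser(
    description="Parses the coblob database and transforms it.")
parser.add_argument("-t", "--target", type=dir_path, required=True)
parser.add_argument("-v", "--verbose", action="store_true")

# Parse an explicit list instead of sys.argv, for demonstration.
args = parser.parse_args(["-t", ".", "-v"])
print(args.target, args.verbose)  # . True
```

Because `-t` is validated by `dir_path`, a non-existent target directory fails fast at argument-parsing time rather than midway through a cron run.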

requirements.txt (18)

@@ -0,0 +1,18 @@
bokeh==2.0.2
certifi==2020.4.5.1
chardet==3.0.4
idna==2.9
Jinja2==2.11.2
MarkupSafe==1.1.1
numpy==1.18.3
packaging==20.3
Pillow==7.1.1
pyparsing==2.4.7
python-dateutil==2.8.1
PyYAML==5.3.1
requests==2.23.0
six==1.14.0
tornado==6.0.4
typing-extensions==3.7.4.2
urllib3==1.25.9