Compare commits
11 Commits
Author | SHA1 | Date |
---|---|---|
Craig Oates | e7a78ae4cb | 4 years ago |
Craig Oates | 955cb55941 | 4 years ago |
Craig Oates | 47bc90cf85 | 4 years ago |
Craig Oates | 5f805a9841 | 4 years ago |
Craig Oates | 89e9d6f0d4 | 4 years ago |
Craig Oates | 69e3a29c99 | 4 years ago |
Craig Oates | dc91abf4a7 | 4 years ago |
Craig Oates | a6776a963e | 4 years ago |
Craig Oates | fd00b314ae | 4 years ago |
Craig Oates | 0ef9366102 | 4 years ago |
Craig Oates | 38189510f7 | 4 years ago |
8 changed files with 259 additions and 2 deletions
README.md
@@ -1,3 +1,44 @@

- # skivvy
+ # Skivvy

- A Python program which parses the data in the coblob database and transforms it into a format which the co-data project can use. One of the main goals of this project is to reduce the load on the CPU in the co-data project.
+ This is a Python program which parses the data in the coblob database and transforms it into a format which the co-data project can use. One of the main goals of this project is to reduce the load on the CPU in the co-data project.

## Quick Start

1. `python3 -m venv venv`
2. `. venv/bin/activate`
3. `pip install -r requirements.txt`

To run the program, enter the following command (assuming you are in the project's root directory):

```bash
# -v is for verbose output. Remove if not wanted.
# -t (target) is the directory you want the data to be saved to.
# -t is required.
python app/main.py -t save/data/location/path -v
```

## Architecture Overview

The program itself is situated in the `app` folder. The access point is `main.py` and the bulk of the work is shared between the code in the `coordinators` and `services` directories.

```
# The architecture's (layered) flow.
Input  -> main.py -> coordinators -> services
                                        |
Output <- main.py <- coordinators <- services
```

You should not need to touch much of the code in `main.py`. Its main focus is stating the program's tasks at a high level. The calls in `main.py` are passed on to the `coordinators` layer, which then makes the necessary function calls into `services` to reach the result stated in `main.py`. The flow of the code is rigid: `main.py` does not interact with the `services` layer directly, it goes through the `coordinators` layer, and the same applies to the code in the `services` layer (it cannot call back into `main.py`).
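The layering rule above can be sketched with a toy example. The function names below are illustrative only; they are not the project's real API.

```python
# A minimal sketch of the layered flow: main -> coordinators -> services,
# with results flowing back up the same path.

def fetch_totals():
    # services layer: does the actual work (here, a canned result).
    return {"artworks": 3}

def update_data():
    # coordinators layer: sequences the service calls.
    return fetch_totals()["artworks"]

def main():
    # main.py layer: states the task at a high level and only ever
    # talks to the coordinators layer, never to services directly.
    total = update_data()
    print(f"Artwork total: {total}")

main()  # prints "Artwork total: 3"
```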

For the list of requirements for this project, please view the `requirements.txt` file in the project's root directory.

## Note About Intended Usage of This Project

While the program can be executed as a standalone tool, its main reason for existing is to reduce the C.P.U. load on the [co-data](https://git.abbether.net/craig.oates/co-data) project. It does this by running as a cron job once a day; co-data then uses the results from that job to build the charts (for that day). The data this program transforms is called/generated from the [co-api](https://git.abbether.net/craig.oates/co-api) project. The data needs to be transformed because it is not usable in its raw (REST-API) form when called directly from the co-api project.

The rate of change of the (co-api) data is what brought about the decision to make this program. The rate is very slow, and it is unnecessary for the server to transform the data with every request it receives. This program acts as a cache for co-data to use. The reduction in data-transformation work, also, reduces the load on the C.P.U. at the time of a web request.

Debian (or Debian-based) operating systems are the intended systems for this program to run on. To set the cron job on these systems, use `crontab -e`. When the file is open, enter the following to make this program run once a day at 6 A.M.: `0 06 * * * /path/to/venv/python /path/to/project/app/main.py`. Do not forget to change the paths before saving the file.

For the sake of clarity, make sure this program is on the same computer (or at least the same local network) as the co-data project. It needs the data; otherwise it will not run as intended.
app/coordinators/art_coordinator.py
@@ -0,0 +1,47 @@
from services import art_services, data_services, logging_services


def update_data(arguments):
    directory = arguments.target
    v_setting = arguments.verbose
    v_out = logging_services.log     # Alias -- for brevity.
    save = data_services.store_json  # Alias -- for easier reading.

    v_out(v_setting, "Beginning to update Art data...")

    try:
        raw_art_data = data_services.get_json(
            "https://api.craigoates.net/api/1.0/Artwork")
        v_out(v_setting, "Data from API retrieved.")

        save(art_services.get_creation_date_totals(raw_art_data),
             f"{directory}/art_creation_dates.json")
        v_out(v_setting, "Art creation dates processed.")

        save(art_services.get_db_column_totals(raw_art_data, "category"),
             f"{directory}/art_category.json")
        v_out(v_setting, "Art categories processed.")

        save(art_services.get_db_column_totals(raw_art_data, "medium"),
             f"{directory}/art_medium.json")
        v_out(v_setting, "Art medium(s) totals processed.")

        save(art_services.get_dimension_totals(
                 raw_art_data, "dimensions", "width"),
             f"{directory}/art_width.json")
        v_out(v_setting, "Art width totals processed.")

        save(art_services.get_dimension_totals(
                 raw_art_data, "dimensions", "height"),
             f"{directory}/art_height.json")
        v_out(v_setting, "Art height totals processed.")

        save(art_services.get_dimension_totals(
                 raw_art_data, "dimensions", "depth"),
             f"{directory}/art_depth.json")
        v_out(v_setting, "Art depth totals processed.")

        v_out(v_setting, "Completed updating Art data.")

    except Exception as error:
        print(f"ERROR: [art_coordinator] Unable to update Art data. {error}")
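The `v_out = logging_services.log` line above is a plain alias rather than a true partial function; if the verbose flag were baked in with `functools.partial`, each call site could drop the first argument. A sketch, assuming the same `log(log_output, message)` signature:

```python
from functools import partial

def log(log_output, message):
    # Same shape as logging_services.log: print only when verbose is on.
    if log_output is True:
        print(message)

v_out = partial(log, True)           # verbose flag baked in once
v_out("Data from API retrieved.")    # prints the message
quiet = partial(log, False)
quiet("Never shown.")                # prints nothing
```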
app/main.py
@@ -0,0 +1,11 @@
from services import parser_services
from coordinators import art_coordinator


def main():
    args = parser_services.create_args()
    art_coordinator.update_data(args)
    # update_software_data(args)  # Future update.
    # update_article_data(args)   # Future update.


if __name__ == "__main__":
    main()
app/services/art_services.py
@@ -0,0 +1,92 @@
from datetime import datetime

'''
Note: Hard-Coding "Months" and "Days" Sets
======================================================================
I have hard-coded the "months" and "days" sets because they are
fixed values -- unless something monumental happens scientifically
or politically. On top of that, this makes the graphs easier to
read because they are in chronological order. This is not
guaranteed if the "keys" for "months" and "days" are formed from
the data-object this function receives.

Unfortunately, I cannot do the same for years. That will continue
to grow as the years roll on through here -- unless something
monumental happens scientifically or politically.

This code is intended to be used in graphs -- in the co-data project.
'''
def get_creation_date_totals(data):
    years = {}

    months = {"1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0,
              "7": 0, "8": 0, "9": 0, "10": 0, "11": 0, "12": 0}

    days = {"1": 0, "2": 0, "3": 0, "4": 0, "5": 0, "6": 0,
            "7": 0, "8": 0, "9": 0, "10": 0, "11": 0, "12": 0,
            "13": 0, "14": 0, "15": 0, "16": 0, "17": 0, "18": 0,
            "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0,
            "25": 0, "26": 0, "27": 0, "28": 0, "29": 0, "30": 0,
            "31": 0}

    for item in data:
        ft = datetime.fromisoformat(item["dateCreated"])

        if str(ft.year) in years:
            years[str(ft.year)] += 1
        else:
            years[str(ft.year)] = 1

        if str(ft.month) in months:
            months[str(ft.month)] += 1
        else:
            months[str(ft.month)] = 1

        if str(ft.day) in days:
            days[str(ft.day)] += 1
        else:
            days[str(ft.day)] = 1

    return [years, months, days]

def get_category_totals(data):
    categories = {}
    for item in data:
        cat = item["category"]
        '''
        The join and split is because the data returned from the A.P.I.
        call contains a lot of white space. This just cleans it up.
        The white space was, also, making the chart render incorrectly.
        '''
        cat = ''.join(cat.split())
        if cat in categories:
            categories[cat] += 1
        else:
            categories[cat] = 1
    return categories

def get_db_column_totals(data, column_name):
    column_data = {}
    for item in data:
        col = item[column_name]
        col = " ".join(col.split())
        if col in column_data:
            column_data[col] += 1
        else:
            column_data[col] = 1
    return column_data

def get_dimension_totals(data, column_name, dimension):
    dimensions = {}
    for item in data:
        dim = item[column_name]
        distance = dim[dimension]["value"]["distance"]
        if distance is not None:
            key = str(distance)
            if key in dimensions:
                dimensions[key] += 1
            else:
                dimensions[key] = 1
    return dimensions
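The manual tallying in `get_creation_date_totals` and `get_db_column_totals` is equivalent to counting with `collections.Counter`; a quick check on hypothetical sample records (the `dateCreated` values below are made up):

```python
from collections import Counter
from datetime import datetime

# Hypothetical records shaped like the API's "dateCreated" field.
sample = [{"dateCreated": "2019-05-01T00:00:00"},
          {"dateCreated": "2019-07-12T00:00:00"},
          {"dateCreated": "2020-05-03T00:00:00"}]

# Tally the creation years, mirroring the years dict above.
years = Counter(str(datetime.fromisoformat(r["dateCreated"]).year)
                for r in sample)
print(dict(years))  # {'2019': 2, '2020': 1}
```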
app/services/data_services.py
@@ -0,0 +1,21 @@
import requests
import json


def get_data(url):
    return requests.get(url)

def get_json(url):
    return requests.get(url).json()

def store_json(data, file_name):
    with open(file_name, "w") as outfile:
        json.dump(data, outfile, indent=4)

def load_json(file_name):
    with open(file_name, "r") as infile:
        data = json.load(infile)
        return data

def store_txt(data, file_name):
    with open(file_name, "w") as outfile:
        outfile.write(data)
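The `store_json`/`load_json` pair above can be round-trip tested against a temporary directory. A self-contained sketch (the two helpers are reproduced here so the snippet runs on its own):

```python
import json
import os
import tempfile

def store_json(data, file_name):
    # Same behaviour as the service above: pretty-printed JSON to disk.
    with open(file_name, "w") as outfile:
        json.dump(data, outfile, indent=4)

def load_json(file_name):
    with open(file_name, "r") as infile:
        return json.load(infile)

# Round trip: what goes to disk comes back unchanged.
path = os.path.join(tempfile.mkdtemp(), "art_category.json")
store_json({"sculpture": 4, "painting": 9}, path)
assert load_json(path) == {"sculpture": 4, "painting": 9}
```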
app/services/logging_services.py
@@ -0,0 +1,6 @@
# This is for outputting the program's status when the verbose switch
# is used.

def log(log_output, message):
    if log_output is True:
        print(message)
app/services/parser_services.py
@@ -0,0 +1,21 @@
import argparse
import os


def dir_path(string):
    if os.path.isdir(string):
        return string
    else:
        raise NotADirectoryError(string)

def create_args():
    # Note: the description must be passed as a keyword argument; the
    # first positional argument of ArgumentParser is the program name.
    parser = argparse.ArgumentParser(
        description="Parses the coblob database and transforms it. "
                    "This is mostly for the benefit of the co-data project. "
                    "It, also, requires access to the co-api project, via "
                    "the internet.")
    parser.add_argument("-t", "--target", type=dir_path, required=True,
                        help="the location you would like the data to be stored at.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="provides detailed output when the program is running.")

    args = parser.parse_args()
    return args
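Passing the description to `ArgumentParser` as a positional argument would set `prog` (the displayed program name) instead, so the keyword form above matters. The parser's behaviour can be checked by handing `parse_args` an explicit argument list; a sketch using the current working directory as a directory that is guaranteed to exist:

```python
import argparse
import os

def dir_path(string):
    # Validate that the -t/--target value is an existing directory.
    if os.path.isdir(string):
        return string
    raise NotADirectoryError(string)

parser = argparse.ArgumentParser(description="demo of the -t/-v switches")
parser.add_argument("-t", "--target", type=dir_path, required=True)
parser.add_argument("-v", "--verbose", action="store_true")

# parse_args accepts an explicit list, which makes the parser testable.
args = parser.parse_args(["-t", os.getcwd(), "-v"])
print(args.verbose)  # True
```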
requirements.txt
@@ -0,0 +1,18 @@
bokeh==2.0.2
certifi==2020.4.5.1
chardet==3.0.4
idna==2.9
Jinja2==2.11.2
MarkupSafe==1.1.1
numpy==1.18.3
packaging==20.3
Pillow==7.1.1
pkg-resources==0.0.0
pyparsing==2.4.7
python-dateutil==2.8.1
PyYAML==5.3.1
requests==2.23.0
six==1.14.0
tornado==6.0.4
typing-extensions==3.7.4.2
urllib3==1.25.9