Apparently ChatGPT can help you write code / programs?

GTX 560 is Fermi and most certainly has CUDA support. Maybe/probably not the wanted feature set/version, though.

I just wish this whole LLM thing would run on OpenCL instead, and be vendor agnostic :(

edit: Seems I can play with this a bit even with my very limited hardware..
With an Intel i5-10400 (6 cores, 12 threads) + RX 580 (4GB VRAM), you have a decent setup for running small LLMs with CPU+OpenCL acceleration.

Best Model Choices (4GB VRAM Limit)

  1. Tiny LLMs (Best Fit)
    • Phi-2 (2.7B) – Efficient and should run smoothly.
    • Llama 2 7B (4-bit) – Will run with CPU help.
    • Mistral 7B (4-bit quantized) – Might work but will be slow.
  2. Optimized Inference Tools for OpenCL
    • llama.cpp (has OpenCL support)
    • KoboldCPP (for chat-based AI, supports OpenCL)
    • Text-generation-webui (use CPU+OpenCL hybrid mode)

How to Run LLMs on Your Hardware

  1. Install llama.cpp (best OpenCL-compatible runtime)
  2. Use GGUF Models (Pre-Quantized for Small VRAM)
    • Get 4-bit GGUF models from TheBloke on Hugging Face
    • Load them with llama.cpp or KoboldCPP.

Performance Notes

  • RX 580 will help, but CPU will do most work
  • Use 4-bit models to fit in VRAM
  • Enable swap space if running out of RAM
Let me know if you want a detailed setup guide! 🚀

I bet it would be dog slow but still, might be fun.
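Here's roughly what the GGUF + llama.cpp route it suggests looks like in practice, using the llama-cpp-python bindings with a 4-bit GGUF file (the model filename and the layer/thread counts below are just placeholders, and GPU offload only does anything if the underlying llama.cpp build was compiled with an OpenCL/Vulkan/CUDA backend):

# Minimal sketch: load a 4-bit GGUF model with llama-cpp-python and generate a reply.
# Assumes `pip install llama-cpp-python` and a downloaded GGUF file; paths/values are examples only.
from llama_cpp import Llama

llm = Llama(
    model_path="models/phi-2.Q4_K_M.gguf",  # any small 4-bit GGUF that fits the hardware
    n_ctx=2048,        # context window; larger costs more RAM
    n_gpu_layers=20,   # layers offloaded to the GPU backend, the rest stay on the CPU
    n_threads=6,       # physical cores on the i5-10400
)

out = llm("Q: What is OpenCL?\nA:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])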
 
Given your reply above, I was curious, so I asked Deepseek:

Amberwolf said:
Please give me a list (with url links to their codebase pages) of AIs (including LLMs) that I could run on an HP ProLiant DL380G6 48GB RAM non-networked server with a NVIDIA GeForce GTX 560 1GB PCIE video card.

Deepseek said:
Running modern AI models, especially large language models (LLMs), on an HP ProLiant DL380 G6 with 48GB RAM and an NVIDIA GeForce GTX 560 (1GB VRAM) is challenging due to the hardware limitations. The GTX 560 is an older GPU with limited VRAM, which restricts the size of models you can run. However, you can still run smaller AI models or optimized versions of larger models. Below is a list of AI models and frameworks that might be feasible on your hardware, along with links to their codebases:

---

### **1. GPT-NeoX (Smaller Models)**
- **Description**: GPT-NeoX is an open-source implementation of GPT-style models. You can run smaller versions of these models (e.g., 125M or 350M parameters) on your hardware.
- **Codebase**: [https://github.com/EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox)
- **Note**: Larger models like GPT-NeoX-20B will not fit on your GPU.

---

### **2. GPT-2 (Small Models)**
- **Description**: OpenAI's GPT-2 (small or medium versions) can run on your hardware. The 117M or 345M parameter models are suitable for your GPU.
- **Codebase**: [https://github.com/openai/gpt-2](https://github.com/openai/gpt-2)
- **Note**: Use the smaller models due to VRAM limitations.

---

### **3. LLaMA (Small Models)**
- **Description**: Meta's LLaMA models are efficient and can run on limited hardware. The 7B model might be too large, but smaller versions or quantized models could work.
- **Codebase**: [https://github.com/facebookresearch/llama](https://github.com/facebookresearch/llama)
- **Note**: Quantization (e.g., 4-bit or 8-bit) is recommended to reduce memory usage.

---

### **4. Alpaca-LoRA (Fine-Tuned LLaMA)**
- **Description**: Alpaca-LoRA is a fine-tuned version of LLaMA that uses Low-Rank Adaptation (LoRA) to reduce memory usage. It can run on lower-end hardware.
- **Codebase**: [https://github.com/tloen/alpaca-lora](https://github.com/tloen/alpaca-lora)
- **Note**: Requires quantization for your hardware.

---

### **5. BERT (Small Models)**
- **Description**: BERT is a transformer-based model for NLP tasks. Smaller versions like BERT-base or DistilBERT can run on your hardware.
- **Codebase**: [https://github.com/google-research/bert](https://github.com/google-research/bert)
- **Note**: Use the smaller variants for your GPU.

---

### **6. DistilGPT-2**
- **Description**: A distilled version of GPT-2 that is smaller and faster, making it suitable for limited hardware.
- **Codebase**: [https://github.com/huggingface/transformers](https://github.com/huggingface/transformers)
- **Note**: Available via Hugging Face's `transformers` library.

---

### **7. TinyBERT**
- **Description**: A compressed version of BERT designed for resource-constrained environments.
- **Codebase**: [https://github.com/huawei-noah/Pret...etrained-Language-Model/tree/master/TinyBERT)
- **Note**: Ideal for your hardware.

---

### **8. FastText**
- **Description**: A lightweight library for text classification and word embeddings. It runs efficiently on CPUs and does not require a GPU.
- **Codebase**: [https://github.com/facebookresearch/fastText](https://github.com/facebookresearch/fastText)
- **Note**: Does not require GPU.

---

### **9. Scikit-learn (Classical ML Models)**
- **Description**: For non-neural-network-based AI tasks, Scikit-learn provides a wide range of classical machine learning models.
- **Codebase**: [https://github.com/scikit-learn/scikit-learn](https://github.com/scikit-learn/scikit-learn)
- **Note**: Runs on CPU and does not require a GPU.

---

### **10. ONNX Runtime (Optimized Inference)**
- **Description**: ONNX Runtime allows you to run optimized versions of models. You can convert models to ONNX format and run them efficiently.
- **Codebase**: [https://github.com/microsoft/onnxruntime](https://github.com/microsoft/onnxruntime)
- **Note**: Useful for running quantized models.

---

### **11. TensorFlow Lite**
- **Description**: TensorFlow Lite is designed for running models on resource-constrained devices. You can use pre-trained models or convert existing ones.
- **Codebase**: [https://github.com/tensorflow/tenso...rflow/tensorflow/tree/master/tensorflow/lite)
- **Note**: Suitable for smaller models.

---

### **12. Hugging Face Transformers (Small Models)**
- **Description**: Hugging Face provides a wide range of pre-trained models, including smaller versions of popular architectures.
- **Codebase**: [https://github.com/huggingface/transformers](https://github.com/huggingface/transformers)
- **Note**: Use models like `distilbert`, `tiny-gpt`, or `mobilebert`.

---

### **13. Open Assistant (Small Models)**
- **Description**: Open Assistant provides smaller conversational models that can run on limited hardware.
- **Codebase**: [https://github.com/LAION-AI/Open-Assistant](https://github.com/LAION-AI/Open-Assistant)
- **Note**: Use smaller variants.

---

### **14. Stable Diffusion (Optimized Versions)**
- **Description**: For image generation, you can run optimized versions of Stable Diffusion (e.g., using `--lowvram` mode).
- **Codebase**: [https://github.com/CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion)
- **Note**: Requires significant optimization for your GPU.

---

### **15. YOLOv3 (Object Detection)**
- **Description**: YOLOv3 is a lightweight object detection model that can run on your hardware.
- **Codebase**: [https://github.com/ultralytics/yolov3](https://github.com/ultralytics/yolov3)
- **Note**: Use smaller configurations.

---

### Recommendations:
- **Quantization**: Use quantization techniques (e.g., 4-bit or 8-bit) to reduce memory usage.
- **CPU Mode**: Many models can run on CPU if the GPU is insufficient.
- **Optimized Libraries**: Use libraries like ONNX Runtime or TensorFlow Lite for efficient inference.

Let me know if you need help setting up any of these models!


I also asked it
Amberwolf said:
Which specific models can run on CPU instead of GPU, and how do I make them do that? Which specific models can run on CPU in addition to GPU, and how do I make them do that?

I couldn't get a reply, even after a while of trying "regenerate" or just reposting the two questions (singly or together)
Deepseek said:
The server is busy. Please try again later.
 
Those answers about small models sound approximately right.
The smaller the model, the less quality of result you get though :/

I suspect the graphics card is so old that it doesn't support the features. But I could be wrong.
You might be able to run a 7B or 3B model in ollama, for example, llama3.x ( different sizes available ) or qwen2.5-7B

The public facing version of deepseek has pretty bad uptime, lol.. see:
DeepSeek Service Status

If you find yourself hooked on deepseek, yeah, you can run it on your own hardware or you can pay another provider to resell it to you since the main demo page is kinda unreliable.
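If you do try ollama, once a model has been pulled it exposes a small HTTP API on localhost, so a quick smoke test is only a few lines (the model name here is just an example):

# Minimal sketch: query a locally running ollama server.
# Assumes ollama is installed and `ollama pull qwen2.5:7b` (or similar) has already been done.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5:7b", "prompt": "Say hello in five words.", "stream": False},
)
print(resp.json()["response"])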
 
Oh, I wasn't thinking of using any of these as the kind of thing they're actually used for. I hope you weren't being so silly as expecting me to do something as sensible as using something as-intended. :lol:

Just trying to see what level of thing, if any, could run on the hardware I have, to adapt it (haha) to be used for the primitive-needs behavioral AI for the wolfy project. That doesn't have to do a lot; all it has to do is "be a dog", and that can't be anywhere near as complex as "being a human", which is what all the big chat AIs are trying to do. (not really "being" but "fooling", still.....)

I have a feeling :oops: that the IMU processing/decoding system itself might work better as an AI that learns the best way to process the data; after that it can just be a "hardcoded" set of code that can be optimized to just do that job and not have to be learning anymore, and take less hardware to run it. But I am going on intuition based on a very limited and fragile understanding of what's going on inside these AIs, and what is needed to do what I am trying to do.
 
There are some people who run low-intelligence models like that on the newer Raspberry Pis.
Some of those things have up to 8GB, pretty amazing.

How is the performance and accuracy? Probably not great.
 
It's probably OK if it's not accurately reproducing dog behavior--neither does JellyBean (also known as ThePerfectlyNormalSchmoo(ButAVeryStrangeDog)), who is supposedly an actual dog. ;)

I guess we'll see what happens by the time I get that far--maybe I'll have better GPU hardware by then that someone else has tossed out (which is how I've ended up with the stuff I've got now).

(I know there are already PCIe video cards in each of the racks I've got ATM, but don't recall what they are; can't imagine any of them are better than the GTX560; I don't think any of them even have fans...)
 
How viable are these?

ChatGPT is telling me 67 TOPS 'and this is suitable for advanced AI applications'
compared to a pi with Coral accelerator card at 4 TOPS 'adequate for less demanding tasks'

I mean, 4 to 67, and 'adequate' to 'suitable', but how should I read 'less demanding' and 'advanced'?
 
Now you can use a program on your own computer to generate sounds based on text descriptions
Haven't tried it, but probably will when I get one of those HP servers set up. (not enough space or RAM on my laptop, based on some other people's feedback about it).

Here's hoping that it works better than existing systems.

There are already websites that let you create sounds from descriptions, but so far I haven't gotten very good results from any of them; they don't understand any kind of musical terms, and they don't understand any kind of sound-descriptive terms, any ways of describing frequencies, modulations, etc., so they're pretty useless. (why would anyone even *make* a program to create a certain thing from words without the program having any idea what the words are that are used to describe that kind of thing?!?!?!? :roll: :/ )

One of them is here
and it's apparently so hard to use that someone made this site to help you generate useable prompts for it
but...it's still awful.


Kinda like how the google Imagefx thing is there to create images and artwork and stuff, but it has no idea what any of the terms that are used for describing any sort of specifics mean--it doesn't know what any of the POV (point of view) terms mean, doesn't understand camera or art terminology, doesn't even understand common everyday terminology for lighting, angle, focus, etc.


I don't know how the people that write these things expect them to be successfully used if they don't write them so they *can* be used. They ought to try consulting with people that do this kind of work with various tools, to find out how to make these AI versions of the tools respond in a useful way.

I suggested this and many other things to each of the places that do this, but haven't seen any sign that they give a crap, much less intend to actually change anything to make it more useful. OK, well, Google finally added a thumbs-up/down to each result, but the program doesn't appear to respond in any way to that.
 
As a former music producer..

Language to sound effect or instrument sounds like a pain in the ass because we don't have some wide vocabulary to create a sound.

"Give me a sharp, punchy, and gritty bass from the DX7 sound library but add a subtle reverb and phasing effect "
^-- takes me longer to think this out in text than to actually produce just that in FL studio

I think you would get faster results starting from a preset and twiddling knobs and playing with effects until it sounds good.
Maybe it could make some novel sounds though which are easy to produce with AI but hard otherwise.


Text to image is so much easier. The vocabulary exists. And you can also throw it sloppy sketches to start from, which allows you to 'pilot' the process quite well.

I've had good results in doing fancy logos etc that started with a sketch i spent 5 minutes doing in Affinity Designer ( my favorite alternative to Adobe Illustrator ), and sent to the LLM.

[attached image: dreamstudio line input.jpg]

Here's some copy and paste from google images in Adobe Photoshop that i fed it:

[attached image: dreamstudio photo input.jpg]

With a few more rolls of the dice and some post-editing in Photoshop, I could get some pretty professional results out of this that look like I spent 100x the amount of time on it.

Amazing what they can do just with text though. Here's some nice outputs from my Nvidia 4070:

[attached image: Gf6kDMUawAAc0f2.jpg]


I found another application for Deepseek recently.
I do systems administration on one system that's written in 4 programming languages. All 4, I don't know.
I fed it little snippets of what I suspected was faulty code and asked it to explain exactly what it does in a few parts of the system that were bugging out.
And I was able to get it to write me working code in 2 of the languages and was able to fix the problem properly.

Got done in 10 minutes what would have normally required an hour of research, not bad.
Very handy to have a polyglot to consult when you're deeply lost and don't have experts on hand!
 
As a former music producer..

Language to sound effect or instrument sounds like a pain in the ass because we don't have some wide vocabulary to create a sound.

"Give me a sharp, punchy, and gritty bass from the DX7 sound library but add a subtle reverb and phasing effect "
^-- takes me longer to think this out in text than to actually produce just that in FL studio
There are quite a few descriptive terms that can be used for sounds, that are quite specific and useful for describing them. Most of these are fairly technical or even math-related terms, but they exist. :)

There are also many terms for describing music that can be used to transcribe a song without actually having any sound involved; I don't know most of them but they do exist and are widely used.

The problem is that none of the tools I've tried so far have any idea what any of those terms are or mean, and/or the terms are simply ignored (like so many other parts of user-input given to them).



I think you would get faster results starting from a preset and twiddling knobs and playing with effects until it sounds good.
Yes, I do this all the time in various synths, especially Z3TA2+. :) (and before the computer age I wished I had a moog). I've been a sound designer since I discovered it was possible, with objects and my voice before I had access to electronics.

And as a kid in the electronics age of the 80s by holding my fingers on various parts of circuit boards of devices that had speakers, to see what kinds of sounds that would make.

Later one of those was a music keyboard, and when I got bored with the preset sounds I did that sort of thing to make my own sounds while I played the keyboard with the other hand. (not that I was any good at any of this, but it was FASCINATING).

If I was doing it now I'd build things to modify the sounds, and be on YT as a circuit-bender (there's a lot of them out there).

But now I prefer to do it in the computer synths, because the sounds are precisely reproducible and I can save presets of each of them to recall later, and not have to make pages of notes on the modifications made to a circuit, or the exact way to hold my hand, with a sketch of my hand and the board to guide me. :lol:



Maybe it could make some novel sounds though which are easy to produce with AI but hard otherwise.
That's probably what I'll use it for, if it works at all.

Text to image is so much easier. The vocabulary exists. And you can also throw it sloppy sketches to start from, which allows you to 'pilot' the process quite well.
The same can be done with sound--the vocabulary exists for that, as well, as noted above. But the ai would need to be able to recognize all the parts of a sound, all the things that went into creating it, so it could disassemble it and reassemble it in the new way desired, or resynthesize it from scratch.

I can envision how it *should* be able to work, for most of these types of creation, but none of them actually do work that way, and I don't have access to the creators to get them to make any of them work like this.
 
Wait a minute. LLMs are usually used for text output. Does the wolfie dog plan to talk?
 
No. I don't know if an LLM is the "AI" I am after, but it needs to understand what I say in its presence in a limited way, as a dog would, and ignore things that don't concern it, but react to things that do. It also has to use visual cues to assist with this--like if I'm looking right at it, then it would "know" this is probably directed at it; if I've got the phone to my ear, then nothing I'm saying is probably intended for it. Etc.

It also requires a filtering and mapping routine, which is probably "easier" to do with some form of learning AI, to sort out all the inputs it gets from touch, visual, audio, temperature, etc., to track what these inputs are, where they came from, what they mean, and to track where its own body parts are relative to everything else inside and out (so it can know how to move and where it *can* move, and what to expect when it does, so it can react correctly when that either does or does not happen).

Then another (probably AI) as a behavioral-learning routine to learn to react in the way a dog would. Some of that would be preprogrammed, as "instincts", but like a real dog it would be modifiable by the user intentionally, by teaching it like a dog, and self-modifiable to correct behavioral details that conflict with reality / results of a behavior causing a problem of some kind (some of which would be preprogrammed, and some of which it would have to learn are problems).

I don't expect it to be easy...but it's certainly possible, for someone that actually knows how to use these things. :) (which isn't me, yet, hopefully someday).


My present knowledge of AI learning technology is limited to the very basic concepts of its general function, where it self-reinforces some paths vs others (something like what we think a real neuronal system does). Beyond that....I don't know details of any of them yet. I'm still trying to learn enough coding of the right kinds to begin creating the sensor network, and the motor control system, let alone the brain that has to go between them. :oops:
 
You'd have to custom train an AI to do that because they aren't capable of learning in real time yet.
( That's complex and very expensive )

It'd be better to use code and use AI to do facial recognition / sound recognition
 
You'd have to custom train an AI to do that because they aren't capable of learning in real time yet.
( That's complex and very expensive )

Why is ChatGPT's behavior changing the longer I use it then? It seems to remember corrections to its answers, and requests on how I want them structured.

Maybe the definition of learning doesn't fit; rather, the question is whether the AI is dynamic or static, e.g. does it update its dataset and retrain itself? For instance, like fraud detection processes that are designed to recognize not only known but also new patterns?

It'd be better to use code and use AI to do facial recognition / sound recognition

Just playing around a bit..
import cv2
import numpy as np
import sqlite3
from datetime import datetime
import soundfile as sf
import yamnet  # assumes a local YAMNet wrapper module exposing YAMNet() and class_names
import tensorflow as tf

# Initialize YAMNet model for sound event detection
model = yamnet.yamnet_model.YAMNet()
class_names = yamnet.class_names

# Function to compute face direction as a unit vector from yaw and pitch
def calculate_face_direction(yaw, pitch):
    # Convert yaw and pitch to radians
    yaw_rad = np.radians(yaw)
    pitch_rad = np.radians(pitch)

    # Face direction vector (unit vector representation)
    direction_vector = np.array([np.sin(yaw_rad) * np.cos(pitch_rad),
                                 np.sin(pitch_rad),
                                 np.cos(yaw_rad) * np.cos(pitch_rad)])

    # Normalize the direction vector
    return direction_vector / np.linalg.norm(direction_vector)

# Initialize OpenCV face detection (Haar cascade; MTCNN would be more robust)
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# Initialize database
def create_database():
    conn = sqlite3.connect('event_tracking.db')
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS face_events (
            event_id INTEGER PRIMARY KEY,
            face_id INTEGER,
            bounding_box TEXT,
            timestamp TEXT,
            face_direction TEXT,
            angle REAL,
            holding_phone BOOLEAN,
            phone_proximity REAL,
            attention_needed BOOLEAN
        )''')
    conn.commit()
    conn.close()

# Function to insert a face event into the database
def insert_face_event(face_id, bounding_box, timestamp, face_direction, angle,
                      holding_phone, phone_proximity, attention_needed):
    conn = sqlite3.connect('event_tracking.db')
    cursor = conn.cursor()
    cursor.execute('''INSERT INTO face_events (face_id, bounding_box, timestamp, face_direction,
                                               angle, holding_phone, phone_proximity, attention_needed)
                      VALUES (?, ?, ?, ?, ?, ?, ?, ?)''',
                   (face_id, str(bounding_box), timestamp, str(face_direction), angle,
                    holding_phone, phone_proximity, attention_needed))
    conn.commit()
    conn.close()

# Function to classify sound using YAMNet (dummy example: reads a file instead of a live mic stream)
def classify_sound(frame, sound_event_data):
    audio_data, samplerate = sf.read('audio_file.wav')  # this would be a live microphone stream
    prediction = model.predict(audio_data)

    # Collect any sound classes scoring above the threshold
    for i, score in enumerate(prediction[0]):
        if score > 0.5:
            event_name = class_names[i]
            sound_event_data.append({'event': event_name, 'confidence': score})

    return sound_event_data

# Detect faces and their direction, then store in the database
def process_frame(frame, sound_event_data):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 4)
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    # Loop through detected faces and get direction and face properties
    for face_id, (x, y, w, h) in enumerate(faces):
        # Example: face direction should come from face landmark / pose detection
        yaw, pitch = 10, 5
        face_direction = calculate_face_direction(yaw, pitch)

        # Simulate detecting phone near head (this would come from another sensor or model)
        holding_phone = False

        # Simulate proximity of phone (distance in meters, 0 indicates no phone detected)
        phone_proximity = 0.5  # example value, replace with actual sensor data

        # Angle between face direction and the camera's forward vector
        angle = np.degrees(np.arccos(np.dot(face_direction, [0, 0, 1])))
        attention_needed = bool((angle < 30)
                                and ("speech" in [e['event'] for e in sound_event_data])
                                and not holding_phone)

        # Insert the event into the database
        insert_face_event(face_id, (x, y, w, h), timestamp, face_direction, angle,
                          holding_phone, phone_proximity, attention_needed)

    return sound_event_data

# Query database for face attention events
def query_attention():
    conn = sqlite3.connect('event_tracking.db')
    cursor = conn.cursor()
    cursor.execute('''SELECT * FROM face_events WHERE attention_needed = 1''')
    rows = cursor.fetchall()
    conn.close()
    return rows

# Create the database
create_database()

# Process frames from the camera feed
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Classify sound for this frame
    sound_event_data = classify_sound(frame, [])

    # Process face detection and store events (pass the sound events so the attention check can use them)
    process_frame(frame, sound_event_data)

    # Optionally, query the database for faces needing attention
    attention_events = query_attention()
    print(attention_events)

    # Display the resulting frame
    cv2.imshow('Frame', frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

and then do event handling. Haven't touched Python in years so if there are glaring obvious errors my apologies, I just used ChatGPT ;)

But this should be able to detect sound events known to YAMNet, record any detected faces and label them and detect the direction the face is looking, then use YOLOv3 for object recognition (phones) and proximity to each face.

And you're proposing to do the behavioral tree in code, based on the events detected? That would still be incredibly complex, and would require, for instance, manually classifying each 'option' like 'is he holding a phone'.

I think what 'we' would ideally want is to be able to feed the entire process to an LLM and let it handle it.

And just be able to ask: in this frame, is there a person which wants our attention? Because now, I would have to go back and do the above step again for 'is he wearing a headset'? 'Is the detected sound singing, so is it unlikely it is addressed to me'? And so on and so on.


using System;
using System.Data.SQLite;
using System.Numerics;
using Emgu.CV;
using Emgu.CV.CvEnum;
using Emgu.CV.Structure;
using System.Drawing;

class AttentionEvent
{
    private static CascadeClassifier faceCascade;
    private static VideoCapture capture;
    private static SQLiteConnection dbConnection;

    static void Main()
    {
        // Initialize face detection
        faceCascade = new CascadeClassifier("haarcascade_frontalface_default.xml");
        capture = new VideoCapture(0);

        // Initialize database
        InitializeDatabase();

        while (true)
        {
            Mat frame = capture.QueryFrame();
            if (frame == null) break;

            ProcessFrame(frame);

            CvInvoke.Imshow("Video", frame);
            if (CvInvoke.WaitKey(1) == 27) // ESC to exit
                break;
        }

        capture.Dispose();
        CvInvoke.DestroyAllWindows();
    }

    static void InitializeDatabase()
    {
        dbConnection = new SQLiteConnection("Data Source=event_tracking.db;Version=3;");
        dbConnection.Open();

        string createTableQuery = @"
            CREATE TABLE IF NOT EXISTS face_events (
                event_id INTEGER PRIMARY KEY AUTOINCREMENT,
                face_id INTEGER,
                bounding_box TEXT,
                timestamp TEXT,
                face_direction TEXT,
                angle REAL,
                holding_phone BOOLEAN,
                phone_proximity REAL,
                attention_needed BOOLEAN
            )";

        SQLiteCommand command = new SQLiteCommand(createTableQuery, dbConnection);
        command.ExecuteNonQuery();
    }

    static void ProcessFrame(Mat frame)
    {
        Image<Gray, byte> grayFrame = frame.ToImage<Gray, byte>();
        Rectangle[] faces = faceCascade.DetectMultiScale(grayFrame, 1.1, 4);
        string timestamp = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss");

        foreach (Rectangle face in faces)
        {
            int faceId = face.GetHashCode();
            Vector3 faceDirection = CalculateFaceDirection(10, 5); // Example yaw/pitch
            double angle = Math.Acos(Vector3.Dot(faceDirection, new Vector3(0, 0, 1))) * (180 / Math.PI);
            bool holdingPhone = DetectPhoneNearHead(frame, face);
            double phoneProximity = holdingPhone ? 0.5 : 0.0;
            bool attentionNeeded = angle < 30 && DetectSpeechEvent() && !holdingPhone;

            StoreFaceEvent(faceId, face, timestamp, faceDirection, angle, holdingPhone, phoneProximity, attentionNeeded);
        }
    }

    static Vector3 CalculateFaceDirection(float yaw, float pitch)
    {
        double yawRad = yaw * (Math.PI / 180);
        double pitchRad = pitch * (Math.PI / 180);

        float x = (float)(Math.Sin(yawRad) * Math.Cos(pitchRad));
        float y = (float)Math.Sin(pitchRad);
        float z = (float)(Math.Cos(yawRad) * Math.Cos(pitchRad));

        return Vector3.Normalize(new Vector3(x, y, z));
    }

    static bool DetectPhoneNearHead(Mat frame, Rectangle face)
    {
        // Placeholder: Implement phone detection using a model (TensorFlow, YOLO, etc.)
        return false;
    }

    static bool DetectSpeechEvent()
    {
        // Placeholder: Implement YAMNet sound classification for speech detection
        return false;
    }

    static void StoreFaceEvent(int faceId, Rectangle boundingBox, string timestamp, Vector3 faceDirection, double angle, bool holdingPhone, double phoneProximity, bool attentionNeeded)
    {
        string query = @"
            INSERT INTO face_events (face_id, bounding_box, timestamp, face_direction, angle, holding_phone, phone_proximity, attention_needed)
            VALUES (@faceId, @boundingBox, @timestamp, @faceDirection, @angle, @holdingPhone, @phoneProximity, @attentionNeeded)";

        using (SQLiteCommand command = new SQLiteCommand(query, dbConnection))
        {
            command.Parameters.AddWithValue("@faceId", faceId);
            command.Parameters.AddWithValue("@boundingBox", boundingBox.ToString());
            command.Parameters.AddWithValue("@timestamp", timestamp);
            command.Parameters.AddWithValue("@faceDirection", faceDirection.ToString());
            command.Parameters.AddWithValue("@angle", angle);
            command.Parameters.AddWithValue("@holdingPhone", holdingPhone);
            command.Parameters.AddWithValue("@phoneProximity", phoneProximity);
            command.Parameters.AddWithValue("@attentionNeeded", attentionNeeded);

            command.ExecuteNonQuery();
        }
    }
}
ahh... it can do civilized code as well. That's neat :) Edit: wait...I have to call Python to run LLMs.. can't run them natively in C#.. wth :/ :/
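For the 'behavioral tree in code' question above: a first pass at the event-handling glue could be as simple as polling that database and mapping events to reactions. A toy Python sketch against the same event_tracking.db schema (the 'reactions' are obviously placeholders for whatever would drive the dog's motors/sounds):

# Toy event loop over the face_events table written by the detection code above.
import sqlite3
import time

def new_attention_events(since_id):
    # Return attention-worthy events we haven't handled yet
    conn = sqlite3.connect('event_tracking.db')
    rows = conn.execute(
        "SELECT event_id, face_id, angle FROM face_events "
        "WHERE attention_needed = 1 AND event_id > ?", (since_id,)).fetchall()
    conn.close()
    return rows

last_seen = 0
while True:
    for event_id, face_id, angle in new_attention_events(last_seen):
        last_seen = max(last_seen, event_id)
        if angle < 10:
            print(f"face {face_id}: looking straight at me -> perk ears, turn head")
        else:
            print(f"face {face_id}: glancing my way -> wag tail")
    time.sleep(0.2)  # poll a few times per second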
 
ChatGPT, and for that matter all large language models right now, have what's called a context, and the context might be what you're referring to as its 'memory'. The context might be 16,000 tokens, which is roughly 16,000 words, but after you roll over that it doesn't know anything previous to that, and after you close that chat, that context is erased.
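If you want to see what tokens actually look like, OpenAI's tiktoken library will count them for you; this is only an illustration, since every model family uses its own tokenizer:

# Rough illustration of context-window accounting (assumes `pip install tiktoken`).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several OpenAI models
text = "Gtx560 is Fermi and most certainly has cuda support."
tokens = enc.encode(text)
print(len(text.split()), "words ->", len(tokens), "tokens")
# Once the running total of tokens in a chat exceeds the model's context window
# (e.g. ~16,000), the oldest messages simply fall out of the model's view.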
 
Convo with ChatGPT

You said:

Do you have a memory, will you adapt to my questions and request on how to format replies?

ChatGPT said:

Yes, I remember details from our conversations, including your preferences for response formatting and technical interests. I adapt to your requests on how to structure replies, such as focusing on information delivery without subjective commentary and using vector-based representations for face direction instead of labels. Let me know if you want any adjustments.

You said:

How long does this memory last?

ChatGPT said:

My memory persists across conversations unless you ask me to forget something. It doesn't reset when you start a new chat, so I can retain details over time. However, you can always update or remove anything if needed.


You said:

So this memory is perpetual and tied to my account?

ChatGPT said:

Yes, my memory is tied to your account and persists indefinitely unless you ask me to forget something. This allows me to remember your preferences, technical setup, and prior discussions across different sessions. Let me know if you ever want to update or clear anything
Edit:
So you do not have a maximum number of tokens for each session?

ChatGPT said:

Correct, I don’t have a fixed token limit per session like in a single conversation window. Since I have memory, I persist relevant details across sessions, meaning I don’t "forget" once a conversation ends. However, within a single conversation, I still have a context window limit, so very long discussions might push older messages out of immediate recall. But for important details, my memory retains them beyond the session.

Since I have a free account, sometimes I run out of session tokens and my immediate interactions are reset. But it does retain certain information; I can ask it about each occurrence where I corrected its reply or asked it to structure its output in a certain manner, for instance. And it will recall each time I've done so, though I haven't really probed to see how much of the context it remembers. I reckon there will certainly be a limit to this, as I am just a single free user and I don't think it's cheap to retain all that context (even if storage is dirt cheap in bulk).

So either I am being lied to, or I am hallucinating in seeing it 'remember' things from days ago. Days, not sessions. Because I run into session limits constantly.
 
Online LLMs can remember some things about you, and lately, they can do web searches.. but cannot update their knowledge in real time. They usually take 6 months to train. That's why they have a knowledge cut-off date.

Training an LLM takes >1,000,000x the compute resources that asking it a question does.

I wouldn't bother trying to train a dog-behavior AI model. I'd adapt existing methods to create one, using code as the glue.
It would be the only sane way to achieve it in your lifetime / on your budget :)


This is an extremely good video about the limitations of today's AI:
 
I have a picture of a boy. Want to give him long girl hair. Found a picture of a pretty girl with long hair on the internet. I could combine two pictures with Photoshop. But being a human with not a lot of Photoshop experience, a professional photography editor would do a better job.

Where could I feed 2 pictures to an AI machine and give text instructions to give the boy long girl hair? Or would it be better to let the AI machine find its own long girl hair to paste on the boy?

Who needs humans? Let's ask the above question at chatgpt.com, which was the first listing when I searched Google for AI.

You can use a variety of AI tools to modify images with specific instructions. For your use case—giving a boy long girl hair—you could either provide the AI with text instructions and images, or let it generate hair based on its interpretation of your request. The choice depends on the level of control and creativity you're looking for.

Here are some tools that might be helpful:

  1. Stable Diffusion (via platforms like DreamStudio or Automatic1111): You can input images and provide text prompts to modify them. With tools like img2img (which allows you to upload an image and edit it), you can guide the AI to change the hairstyle according to your prompt (e.g., “Add long girl hair to this boy”). The AI might generate a hairstyle based on its training, and you can fine-tune the prompt.
  2. MidJourney: This platform is known for generating detailed images from text, and you could provide it with a prompt such as "Give this boy long girl hair" along with the image. It’s more focused on creating new images, so the result could involve some creative interpretation, which may or may not match exactly what you want.
  3. Artbreeder: A creative tool where you can blend and modify images of faces and hair types, though the control isn’t as precise as with others.
  4. Runway ML: It offers image generation tools that let you manipulate existing images with text instructions or other prompts.
  5. DeepArt or Fotor: These platforms allow for some image transformations using AI but might not be as flexible in combining specific aspects of images and instructions.
If you want the AI to do the work of finding the right hairstyle itself, tools like MidJourney and Stable Diffusion could work great, especially with detailed prompts that guide the AI. But if you have specific hair or facial details in mind, providing a couple of reference photos might be helpful for more accurate results.
 
DreamStudio Wants $ but will give me a 3 day free trial.

Automatic1111 takes me to Stable Diffusion Art - Tutorials for ComfyUI, Forge, and AUTOMATIC1111 looks interesting.

img2img takes me to AI Art Generator Online Free looks like art not photographs.

https://www.midjourney.com/home Weird lots of blue background. Click on Explore to see lots of pictures.

Artbreeder Try for free. See no pricing. Sorry no sale. That's no way to do business.

Runway | Tools for human imagination. Pricing? Will hire a team of accountants to try to figure this out.

Deep Art AI: Be an artist! Turn your photos into awesome artworks Looks like a program that you install on a computer. My computer is getting old. Windows 7

Photo Editor: Free Online Photo Editing & Image Editor | Fotor "Online photo editor for everyone" Simple easy to understand prices.

Any other suggestions to change boys to girls. I no pay money.
 
You will pay money one way or another, these kinds of tools consume a ton of computing resources and therefore electricity.
For image creation, i can run 'flux' via the 'invoke ai' frontend on a Nvidia 4070, but it's slow.. scroll up for more details.
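If you're curious what running one of these locally looks like in code rather than through a website, a minimal img2img sketch with the Hugging Face diffusers library goes roughly like this (model id, filenames and settings are just example values, and it wants a reasonably recent GPU):

# Minimal img2img sketch using diffusers; everything here is illustrative, not a recommendation.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

init = Image.open("boy.jpg").convert("RGB").resize((512, 512))
result = pipe(prompt="portrait of a boy with long hair",
              image=init,
              strength=0.55,        # how far the output may drift from the original photo
              guidance_scale=7.5).images[0]
result.save("boy_long_hair.png")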
 
You will pay money one way or another, these kinds of tools consume a ton of computing resources and therefore electricity.
For image creation, i can run 'flux' via the 'invoke ai' frontend on a Nvidia 4070, but it's slow.. scroll up for more details.
OK, you have an AI machine inside your computer that can create graphics? Is it connected to the internet?

This AI stuff is pretty amazing. Thinking about the brain inside my skull. Way long time ago it listened to words, looked at pictures, learned how answer questions, and create graphics with crayons, paint and then computers.

I like the best tools. [Think Snap-on] Would the BEST AI photo and graphic tools be inside my computer or far away in some giant Google type computer?

Instead of using Adobe Photoshop and Illustrator to create graphics. Want to make a text request and have a completed project in minutes. Or should I just hire a human graphic artist?
 
I have a middle of the price range Nvidia graphics card doing the majority of the work.

Right now, the best models all run on big expensive remote services.. because they can get extremely expensive commercial GPUs with >10x the RAM and >2x the compute power.. then run those things on 220v outlets, or some unusually high amps at 110v that you couldn't get out of your wall.

But the gap between what you can run on your own computer versus what big companies run is shrinking.
You want to run an image generator that's 80%-90% as good as state of the art? you can do it.. but you need a $2000 GPU
In a few years, you might be able to run one that's 90-95% as good as state of the art on a $1000 GPU.

From a privacy perspective, you can basically assume that any commercial service is stealing 100% of what you send it.
If that doesn't matter to you.. the most cost effective move is to rent AI through a paid online service.
..but right now there's lots of opportunities to get free accounts with limited capabilities.
 
This AI stuff is pretty amazing. Thinking about the brain inside my skull. Way long time ago it listened to words, looked at pictures, learned how answer questions, and create graphics with crayons, paint and then computers.

I like the best tools. [Think Snap-on] Would the BEST AI photo and graphic tools be inside my computer or far away in some giant Google type computer?

Instead of using Adobe Photoshop and Illustrator to create graphics. Want to make a text request and have a completed project in minutes. Or should I just hire a human graphic artist?

I have yet to find any of the services that have a free option (trial, limited time, all free, etc) that actually do what you tell them. AFAICT none of the paid ones would be any different.

The more detailed you get in what you describe, the less likely it is to do any of the things you tell them; as if it got confused. The less detail you give, the less likely it is to do what you want. None of them understand any kind of artistic terms, photography terms, etc., not even simple things like point-of-view or focus; most of the time they can't even put things in the right order front to back or left to right, etc.

Hard to create things that way....

If you look at the various images I used as cover art on my Amberwolf page, you'll see some weird stuff in them, but that's nothing compared to the many many iterations I had to go thru to get those (some of them more than a hundred to even come close).

None of the services I can find will let you edit the output either. At best you can have it "paint over" a section with some new generation, but it always looks pasted on (or worse), because it does not use the original generation stuff to simply recreate the new one with the edit. Most of them all you get is a totally new output from the text you enter, which is intensely frustrating because you can't just iterate until you get the right thing, you just get new outputs every time, each of which will always have something wrong with it that you can NEVER correct because it doesn't take your corrections and fix it, it just makes a totally new one that might be *similar* but it isn't the same. :/


The best you can do to create what you really want is to have it create different pieces and then you can cut them all up and layer them together in a graphics program (I use the ancient paintshoppro5); the Gareki image was created this way, as was the banner at the top of that page.
 